
GDrive Migration Information

CHPC will use this page to share information (how-to guides, tips, etc.) on the migration of data from Google Drive.

CHPC will hold a series of office-hour-style sessions to work with users, starting the week of February 20, 2022. These sessions will be informal and guided by user questions. CHPC staff will be available to field questions from users on any aspect of the move off Google Drive, including the storage options available, use of rclone, and other data movement options.

Campus Level Alternatives

As mentioned in our announcement of 14 February 2022, current campus-level options include Box and Microsoft OneDrive, though both are limited in the amount of storage provided. There is also a UIT Knowledge Base article with information on the options and their suitability for different types of data (public/sensitive/restricted).

Added 4 March 2022/Updated 15 March 2022 

Another alternative to consider, depending on the nature of the data, may be LabArchives. For information, including links to a set of webinar recordings, see the Marriott Library LabArchive page. We have heard that LabArchives was just renewed for an additional year, until the end of March 2023.

###

As we gather information that we believe will be of general interest to others, we will add it here.

General Information

This information is not in any specific order.

##Use the Data Transfer Nodes (DTNs) for data transfers; see the DTN page for details. You can spread transfers across different DTNs.

##OneDrive file size limit is 250 GB; Box file size limit is 50 GB (previously listed here as 15 GB).
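Because each service rejects files over its per-file limit, it can help to scan the data for oversized files before starting a transfer. The following is a minimal sketch, not a CHPC-provided tool: the function name `files_over_limit` is hypothetical, and it assumes 1 GB means 10**9 bytes, which may not match how each service counts.

```python
import os

GB = 10**9  # assumption: decimal gigabytes; a service may count differently


def files_over_limit(root, limit_bytes):
    """Return sorted (path, size) pairs for regular files under root
    whose size is greater than limit_bytes. Symbolic links are skipped."""
    oversized = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size > limit_bytes:
                oversized.append((path, size))
    return sorted(oversized)


# Example call (the path is a placeholder), using the 250 GB OneDrive limit:
#   for path, size in files_over_limit("/path/to/data", 250 * GB):
#       print(f"{size / GB:.1f} GB  {path}")
```

Files reported by a scan like this would need to be split or compressed before upload, or sent to a service with a higher limit.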

13 April 2022: 

Change in limit for the University Google Drive - now 25 GB instead of 5 GB.  Central IT is working on a new contract with Google that will also allow departments to purchase additional space.  Pricing and other details to follow once the contract has been finalized.

What happens after 29 April 2022: After this date, Google will not allow users who are over the limit to write to their GDrive space. No data will be deleted at that point, but deletion is expected later in the year for those still over the limit.

 

Links to pricing of other cloud storage offerings

Amazon S3 Pricing page: https://aws.amazon.com/s3/pricing/
Wasabi: https://wasabi.com/paygo-pricing-faq/ ($6/TB/month, no ingress/egress)
Backblaze B2: https://www.backblaze.com/b2/cloud-storage-pricing.html ($5/TB/month, $10/TB egress)
Wasabi and Backblaze both say they're willing to sign a BAA
 
##With cloud vendors, be careful to also take into account any ingress/egress charges.
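To see how egress charges can change the comparison, here is a rough sketch of a cost estimate using only the per-TB prices quoted above for Wasabi and Backblaze B2. The function name `estimate_cost` and the 10 TB, 12 month scenario are illustrative assumptions; actual bills depend on each vendor's current pricing and terms.

```python
def estimate_cost(stored_tb, months, egress_tb, price_per_tb_month, egress_per_tb=0.0):
    """Storage cost over the period plus a one-time egress (download) cost."""
    storage = stored_tb * months * price_per_tb_month
    egress = egress_tb * egress_per_tb
    return storage + egress


# Hypothetical scenario: keep 10 TB for 12 months, then download all of it once.
wasabi = estimate_cost(10, 12, 10, price_per_tb_month=6.0)
backblaze = estimate_cost(10, 12, 10, price_per_tb_month=5.0, egress_per_tb=10.0)

print(f"Wasabi:    ${wasabi:,.2f}")     # prints "Wasabi:    $720.00"
print(f"Backblaze: ${backblaze:,.2f}")  # prints "Backblaze: $700.00"
```

In this scenario the two end up close; repeated downloads would shift the balance toward the vendor without egress fees.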
 

Linux backup to Box

Here is an example script that Martin is using to back up his desktop to Box, along with the script he was using to do this with Google Drive:

Differences with Box as compared to Google Drive:
 
Box is not case sensitive, so files or directories whose names differ only in capitalization are treated as the same (for example, Ireland, IRELAND, ireland). Workarounds: tar the data first, giving the tar file a unique name; use the crypt layer (if you do not need to share). See https://forum.rclone.org/t/box-duplicates-due-to-case-insensitive/5400
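One way to find out ahead of time whether a directory tree will hit this problem is to look for names in the same directory that differ only by case. The sketch below is an illustration, not part of Martin's script; the function names `colliding_names` and `case_collisions` are hypothetical.

```python
import os
from collections import defaultdict


def colliding_names(names):
    """Group names that are identical when case is ignored.
    Returns a sorted list of groups, each with two or more names."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


def case_collisions(root):
    """Map each directory under root to the groups of entries in it
    whose names differ only in capitalization."""
    collisions = {}
    for dirpath, dirnames, filenames in os.walk(root):
        clashes = colliding_names(dirnames + filenames)
        if clashes:
            collisions[dirpath] = clashes
    return collisions


# Example call (the path is a placeholder):
#   for directory, groups in case_collisions("/path/to/data").items():
#       print(directory, groups)
```

Any directory reported here would need its entries renamed, or the tar workaround applied, before copying to Box.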

 

Box

Looking at other options available via Box (thanks to Heidi Schubert!). Will post more here when available.

OneDrive

Information on limits/restrictions on OneDrive. This includes file/folder names, characters, and file types.
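A quick pre-check of file and folder names can catch many problems before a transfer. The sketch below is illustrative only: the character set and reserved names in it are assumptions based on commonly documented OneDrive restrictions, not a complete or authoritative list, so check them against the linked information. The function name `name_problems` is hypothetical.

```python
# Assumed restrictions; verify against the current OneDrive documentation.
INVALID_CHARS = set('"*:<>?/\\|')
RESERVED_NAMES = {".lock", "desktop.ini", "con", "prn", "aux", "nul"}


def name_problems(name):
    """Return a list of human-readable problems found in a single
    file or folder name (not a full path)."""
    problems = []
    bad = sorted(set(name) & INVALID_CHARS)
    if bad:
        problems.append("invalid characters: " + " ".join(bad))
    if name != name.strip():
        problems.append("leading or trailing whitespace")
    if name.lower() in RESERVED_NAMES:
        problems.append("reserved name")
    return problems


# Example:
#   name_problems("results:final?.txt")
```

A name with no detected problems returns an empty list; a full scan would apply this check to every entry found while walking the directory tree.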

Hints with rclone

More information about rclone options for gdrive is found at https://rclone.org/drive/

Handling shortcuts on gdrive:

  • You can choose not to follow shortcuts in the copy.
  • By default, shortcuts are followed (traversed), so no additional flags are needed if that is what you want. If you follow them, watch for loops, which can occur if something in the shortcut points back to the original location. When shortcuts are copied or synced, they become copies of what the shortcut pointed at, rather than keeping their shortcut (symlink-like) behavior. If you do not want to follow shortcuts, a flag is needed:
    • Example line to avoid shortcuts: 'rclone copy --drive-skip-shortcuts gdrive: /path/to/destination/'

Dealing with duplicate file name: 

  • There is a dedupe command in rclone, with which you can state how duplicates should be handled.
  • Not all duplicates are obvious. For example, if you have a Microsoft Office file (Word, PowerPoint, Excel) in Google Drive and edit it, Google Drive converts it to a Google document and drops the extension (ppt, doc, etc.), so in Google Drive the two appear to have different names. When the file is moved back, it is reverted to the original format and the extension is added back, producing a duplicate name.
  • Example of how to auto-rename duplicates to ${name}-N (where N is an incrementing integer): 'rclone dedupe --dedupe-mode rename --drive-shared-with-me gdrive:'

Dealing with a "shared drive" from google drive:

  • For a Google shared drive, you need to set up a new rclone configuration specifically for this drive.

Dealing with "shared with me" content on Google Drive:

  • Note that this is not the same as Google shared drives, formerly known as Google team drives.

rclone mount from local desktop:
  • A mount like this has many limitations, and it carries risks for the user, such as slow responses to commands that query file systems. We do not encourage this for general use; it is a shim to help in some cases.
  • If you will be using this mount to transfer files, we recommend that you do this from one of the Data Transfer Nodes.
  • Info here: https://rclone.org/commands/rclone_mount/
    • Example: 'rclone mount --dir-perms 0700 --daemon --allow-non-empty gdrive: ~/gdrive/' &
      • This mounts the Google Drive rclone configuration named 'gdrive' on a directory called gdrive under your home directory.
      • You can then navigate to ~/gdrive and you will see the contents of your GDrive space
    • Example to unmount: 'fusermount -u ~/gdrive'

 

Cyberduck

 

For GDrive:
  • Select Open Connection. In the new window, select Google Drive from the pull-down menu at the top, then select Connect at the bottom of the window.
  • That opens a web page where you select the Google account to be used (use your UNID@gcloud.utah.edu one if you have multiple Gmail accounts). After you choose to allow the connection, you will get an authentication code.
  • At the same time the web page is opened, a Cyberduck login window will appear asking for the authentication code.
  • Enter the code and you will be able to see your files on GDrive.
  • The Get Info button gives you information on files (size, dates, URL). The actions give you options to download or sync content from GDrive to your computer, or to upload content to GDrive from the computer on which you are running Cyberduck.

For Box:

  • The connection was not allowed. I am exploring options with UIT.

For OneDrive:

  • Need to test.

Last Updated: 4/13/22