Transferring data in and out of CHPC resources is a critical part of many science workflows. A workflow may transfer a few small text files between collaborators or sites, or it may transfer multiple terabytes or petabytes of data. CHPC offers a number of options for moving data to and from CHPC storage resources. In some cases the data may not need to be moved at all, as some CHPC file systems can be mounted from local resources. This page describes four common scenarios and how to use each:
- Direct mounts of CHPC file systems
- Local authenticated transfers of small files or small data sets
- Optimized Wide Area Network (WAN) authenticated transfers for multiple large files or large datasets
- Temporary guest transfers for those without CHPC accounts
A good resource for information on data transfer considerations is the ESnet Faster Data site. Specifically, for setting expectations regarding transfer times, see the page "expected time to transfer data". For understanding the impacts of dropped packets on network protocols and the corresponding impacts to large science transfers, see the page regarding "tcp for long range data transfers".
NOTE: Direct mounts are NOT allowed for the CHPC PE fileserver(s), e.g., mammoth. sshfs can be used instead.
For situations where you do not need a second copy on a local machine, but instead
only need to access a file, you can mount the CHPC file system in which the file is
located on your local machine. CHPC allows mounts of home directories as well as
group owned file systems on local machines; however, we do not allow mounts of the general
CHPC scratch file systems such as /scratch/kingspeak/serial and /scratch/general/lustre.
If the local machine is off campus, you must connect via the University of Utah VPN. Below is information on how to mount home and group spaces on Windows,
Mac and Linux systems. Note that there is also a short training video that covers this topic. In all of the following, you must replace
<uNID> with your uNID.
UPDATE - October 2020: CHPC is moving to a new clustered implementation of Samba called CTDB (Clustered Trivial DataBase). We have set up a cluster of five nodes, each of which has access to all Samba shares. New Samba connections are distributed among the five nodes via a load-balancing mechanism, and mounts are redistributed should one of the nodes fail. This new implementation changes the path users will use to set up mounts for their home directory and group spaces on Windows and Mac systems. Instead of having to know the specific mount path for each file share, e.g., cottonwood-vg1-0-lv1.chpc.utah.edu, the mount paths will now all be to samba.chpc.utah.edu. Below are updated examples for mounting both home and group spaces. Please start to move to this new solution. After a test period we will disable the old method (tentatively scheduled for early January 2021).
- For home directories: The information needed to mount your home directory space is
in the Account Creation Notification email sent to new users. Alternatively, you can
get this information when on a CHPC resource by issuing a
df | grep <uNID> command. As an example, if the output is
cottonwood-vg2-1-lv1.chpc.utah.edu:/uufs/cottonwood/common/cottonwood-vg2-1-lv1/XX/<uNID>, then using the value of XX and your uNID you would map the network drive using the following paths (NOTE - when authenticating, your username is ad\<uNID>):
- On Windows:
- On a MAC:
- On Linux: If you have root, you can CIFS mount CHPC file spaces with the following command (you will be prompted
for your password):
mount.cifs //cottonwood-vg2-1-lv1.chpc.utah.edu/XX-home/<uNID> /mnt -o user=<uNID>
- For group directories: When the PI obtains group space, CHPC provides the information
on mounting and using the group spaces. A user in a particular group can also get
this information by connecting to an interactive node via ssh, and then issuing a
cd to the group space followed by
df -h | grep -B 1 groupspacename. As an example, for a space called name-group1, if the df command gives
cottonwood-vg1-3-lv1.chpc.utah.edu:/uufs/cottonwood/common/cottonwood-vg1-3-lv1/name, then you would use the group space name name-group1 in the following paths (NOTE - when authenticating, your username is ad\<uNID>):
- On Windows:
- On a MAC:
- On Linux: If you have root, you can CIFS mount CHPC file spaces with the following command (you will be prompted for your password):
mount.cifs //cottonwood-vg1-3-lv1.chpc.utah.edu/name-group1 /mnt -o user=<uNID>
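The mapping from df output to a share path in the two Linux examples above can be sketched as a pair of small shell helpers. This is only an illustration of the pattern shown in those examples; the function names home_share_from_df and group_share_from_df are hypothetical, and the pattern is an assumption that may not hold for every CHPC file server.

```shell
#!/bin/sh
# Hypothetical helpers: derive a CIFS share path from the source column of df,
# following the pattern of the two Linux examples above.

# Home directory: host:/.../XX/<uNID>  ->  //host/XX-home/<uNID>
home_share_from_df() {
    host=${1%%:*}          # text before the first colon
    path=${1#*:}           # exported path after the colon
    unid=${path##*/}       # last path component
    parent=${path%/*}
    xx=${parent##*/}       # second-to-last component (the XX value)
    printf '//%s/%s-home/%s\n' "$host" "$xx" "$unid"
}

# Group space: host from the df source plus the group space name
#   ->  //host/<groupspace>
group_share_from_df() {
    host=${1%%:*}
    printf '//%s/%s\n' "$host" "$2"
}
```

The printed path can then be passed to mount.cifs as in the examples above, e.g. mount.cifs "$(home_share_from_df "$df_source")" /mnt -o user=<uNID>.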
NOTE 18 April 2019 - we are seeing issues with SSHFS connections to the CHPC PE and general file system mounts leading to AD lockouts. We continue to look for a workaround or a fix. If you are seeing AD lockouts, that is, you are unable to connect to the PE or general resources, please determine whether any of your devices are using SSHFS. If so, remove the SSHFS connection and try logging in again.
SSHFS allows you to mount a remote file system using SFTP, a secure file transfer protocol. Access is equivalent to terminal access with ssh.
sshfs on Linux
To use sshfs on Linux, one needs to install the sshfs package and create a mount point.
The installation of the sshfs package requires root privileges, but the rest can be
done as a regular user, e.g. on CentOS:
sudo yum install fuse-sshfs
This is followed by mounting the remote file server, e.g. for CHPC's PE home directory:
sshfs uNID@redwood1.chpc.utah.edu:/uufs/chpc.utah.edu/common/HIPAA/uNID $HOME/HIPAA
Unmounting is achieved by:
fusermount -u $HOME/HIPAA
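The Linux steps above (create a mount point, mount, and later unmount) can be combined into one helper. The following is a minimal sketch, not a CHPC-provided script: the function name mount_pe_home and the SSHFS override variable are hypothetical, and the check for an existing mount assumes the mountpoint utility from util-linux is available.

```shell
#!/bin/sh
# Sketch: create the mount point, then mount the PE home directory unless
# something is already mounted there. Setting SSHFS=echo gives a dry run.
mount_pe_home() {
    unid=$1
    mount_point=${2:-$HOME/HIPAA}
    mkdir -p "$mount_point" || return 1
    # mountpoint(1) returns 0 when the path is already a mount point.
    if mountpoint -q "$mount_point" 2>/dev/null; then
        echo "already mounted: $mount_point"
        return 0
    fi
    "${SSHFS:-sshfs}" \
        "$unid@redwood1.chpc.utah.edu:/uufs/chpc.utah.edu/common/HIPAA/$unid" \
        "$mount_point"
}

# Usage (replace uNID with your uNID):
#   mount_pe_home uNID
#   fusermount -u "$HOME/HIPAA"
```

Checking for an existing mount first avoids stacking a second sshfs session on the same directory.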
sshfs on macOS
Sshfs on macOS is similar to Linux. To use sshfs on macOS, one needs to install the sshfs packages and create a mount point. The installation of the sshfs package requires administrative privileges, and the rest can be done as a regular user. First, you must install the FUSE for macOS and sshfs packages from https://osxfuse.github.io/ . These are distributed as two separate packages.
Next create a directory to use as a mount point, either in a terminal window or in the Finder, for example $HOME/HIPAA.
Finally, open a Terminal session to execute the following statement (replacing "uNID"
with your uNID):
sshfs uNID@redwood1.chpc.utah.edu:/uufs/chpc.utah.edu/common/HIPAA/uNID $HOME/HIPAA
To unmount the volume, execute:
Interaction between the Finder and the remote SSHFS file system is a little peculiar. Rather than finding your files under your home in the HIPAA folder (as in this example) they will be located under your home in the folder "OSXFUSE Volume 0 (sshfs)". That mount will not be listed among other mounts in the Locations section of the Finder. Within the terminal, however, your files will be located under the HIPAA mount point, or whatever mount point you specified in your sshfs command.
sshfs on Windows
On Windows the situation is not as simple, as there are several different sshfs clients which vary in their stability. Only SFTP Net Drive seems to support the two-factor authentication needed for the Protected Environment.
In the general environment, we have had success with two sshfs clients which seem to be actively developed:
- SFTP Net Drive - this is a simple install and running the program provides a dialog where one enters
the machine to sshfs to (e.g. kingspeak1.chpc.utah.edu). Disconnect by clicking the
Disconnect button in the program's dialog window.
- For PE, in the connection dialog, choose Authentication: Keyboard-interactive and Server: redwood1.chpc.utah.edu. Then click Connect. At the first password prompt enter your password; at the second prompt enter 1, which will push a two-factor authentication request to your phone. Approve that request and the PE home directory should get mounted.
- SSHFS-Win - to install and use this program do the following:
- Install the latest stable version of WinFSP
- Install the latest stable version of SSHFS-Win
- Open File Explorer, right click on This PC, choose Map Network Drive, and enter a drive letter and uNID@kingspeak1.chpc.utah.edu. Windows will ask for your CHPC password, and your CHPC home directory will appear under the drive letter of your choice.
- To disconnect, right click on that drive letter and choose Disconnect.
In some scenarios, a research workflow requires the movement of a few small input files to a computational cluster and/or a few small output files back to a local desktop for storage, final reporting and/or analysis. For this case, CHPC suggests the following options:
- the use of simple graphical tools to move data to/from a local machine. Examples are WinSCP on Windows machines, CyberDuck on Macs. In these cases you can "drag and drop" files from one system to the other.
- the use of Linux commands and tools (scp, rsync over ssh, sftp, wget, curl are examples) on CHPC interactive nodes or on the specialized Data Transfer Nodes (DTNs). Note the DTN usage is described in the next section. Detailed information on the use of these commands can readily be obtained either from the man pages or via a web search.
- to get a download link for data that are on a website with no direct link, e.g. for Dropbox or Sharefile, try the Chrome extension CurlWget or the Firefox extension cliget. These extensions generate a URL that can be used with wget or curl on a terminal command line.
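As an illustration of the command-line option above, the following sketch wraps scp and rsync over ssh in two helpers. The host is an interactive node named elsewhere on this page; the function names, the DRY_RUN switch, and the remote directories inputs/ and results/ (relative to the remote home directory) are hypothetical and only serve the example.

```shell
#!/bin/sh
# Sketch of small transfers with scp and rsync over ssh.
CHPC_HOST=${CHPC_HOST:-kingspeak1.chpc.utah.edu}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Copy local input files to the (hypothetical) inputs/ directory with scp.
push_inputs() {
    unid=$1; shift
    run scp "$@" "$unid@$CHPC_HOST:inputs/"
}

# Pull the (hypothetical) results/ directory back with rsync over ssh.
# -a preserves permissions and timestamps, -v lists the files transferred.
pull_results() {
    run rsync -av -e ssh "$1@$CHPC_HOST:results/" "$2"
}

# Usage:
#   push_inputs uNID run1.in run2.in
#   pull_results uNID ./results/
```

For repeated pulls of the same directory rsync is usually the better fit, since files that are unchanged on both sides are skipped.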
For science workflows that transfer very large files and/or very large data sets between national labs, industry partners, or peer institutions, users require more advanced parallel transfer tools running on tuned endpoint devices such as Data Transfer Nodes (DTNs). CHPC supports various parallel transfer tools that support the known science workflows at the University of Utah. If a research group requires the support of an additional transfer tool, the group may request it through email@example.com.
Network traffic from most systems located at CHPC and at other locations on campus
has traditionally passed through the campus firewall when communicating with resources
off campus. For many small data transfer usage cases, this traffic flow is acceptable.
However, large research computing workflows require more bandwidth and
more connections/sessions than the campus firewall can reasonably handle. These
characteristics of research computing workflows can easily overwhelm the capacity
of the campus firewall, impacting much of the day to day usage for the rest of campus.
To help address these research computing workflow needs, the campus has created a
Science DMZ which is a different network segment with different security approaches.
This network segment allows for transfers with specific high performance, low latency,
and other special network and security characteristics. CHPC offers a number of dedicated
Data Transfer Nodes (DTNs) that utilize the Science DMZ for data transfers. For more information regarding the specifics of a Data Transfer Node, see "What makes a Data Transfer Node?".
The general environment DTNs are - NOTE: Updated Feb 2021 to reflect new dtn05-8, replacing airplane01-4; updated Jun 2021 to remove dtn01-dmz and dtn04-dmz:
- dtn03.chpc.utah.edu (connected at 10gbs, no dmz, use for internal campus transfers)
- dtn05.chpc.utah.edu (connected via dmz at 100gbs)
- dtn06.chpc.utah.edu (connected via dmz at 100gbs)
- dtn07.chpc.utah.edu (connected via dmz at 100gbs)
- dtn08.chpc.utah.edu (connected via dmz at 100gbs)
All CHPC users are able to utilize these DTNs and leverage all the parallel transfer tools that CHPC supports. The new dtn05-08 nodes operate individually as well as together as a Globus endpoint with concurrent machines for moving large data sets with many files (see the Globus section below for CHPC endpoints).
Also note that dtn03.chpc.utah.edu does not use the Science DMZ; this is the choice if doing an internal to campus transfer.
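The choice between the DTNs listed above can be written down as a small lookup. This is a sketch only: the function name pick_dtn and the keywords campus and wan are hypothetical, and dtn05 stands in for any of dtn05 through dtn08.

```shell
#!/bin/sh
# Sketch: pick a general environment DTN by transfer type.
pick_dtn() {
    case $1 in
        campus) echo dtn03.chpc.utah.edu ;;   # no Science DMZ, internal campus transfers
        wan)    echo dtn05.chpc.utah.edu ;;   # Science DMZ; dtn06-08 are equivalent choices
        *)      echo "usage: pick_dtn campus|wan" >&2; return 2 ;;
    esac
}

# Usage: copy a file through the DTN suited to an off-campus transfer.
#   scp bigfile.dat "uNID@$(pick_dtn wan):"
```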
CHPC also supports additional specialized tools for moving data to/from cloud storage. Some of these tools are specific to a single cloud storage provider (such as s3cmd for Amazon cloud services), whereas others such as rclone work with different cloud storage providers.
CHPC also supports a number of group owned DTNs. If you need any information about an existing group owned DTN or are interested in having a group dedicated DTN, please contact us.
We also have a set of DTNs in the protected environment, updated in 2021:
You can use one of the above or use pe-dtn.chpc.utah.edu, which will round-robin between the two servers. These are currently connected at 40gbs. Globus is now an option for data transfers in the PE. See the Globus page referenced below for details.
In April 2021 we enabled the use of the DTNs via Slurm, both in the general environment on notchpeak and in the protected environment on redwood. See https://www.chpc.utah.edu/documentation/software/slurm-datatransfernode.php for details.
There are a number of options available to use for large scale data transfer. In the following we list ones that we have installed on CHPC resources. See the application database for more information on CHPC installations of these tools. These can be used on the Data Transfer Nodes mentioned in the above section (recommended for any large file transfers), or from the other CHPC maintained resources that have access to the CHPC application.
Rclone is a command-line program that supports file transfers and syncing of files between local storage and Google Drive as well as a number of other storage services, including Dropbox and Swift/S3-based services. Rclone offers options to optimize a transfer and reach higher transfer speeds than other common transfer tools such as scp and rsync. For more information on using rclone as well as configuring it for transferring data both to Google Drive storage and to the CHPC archive storage, see the CHPC Rclone Software Documentation page.
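A minimal sketch of an rclone copy with its parallelism options is shown below. It assumes a remote has already been set up with rclone config; the remote name gdrive, the function name rclone_push, and the RCLONE override variable are hypothetical, and the values for --transfers and --checkers are illustrative rather than tuned recommendations.

```shell
#!/bin/sh
# Sketch: copy a local directory to a configured rclone remote.
# --transfers sets the number of parallel file transfers, --checkers the number
# of parallel comparisons, and --progress prints live statistics.
rclone_push() {
    src=$1
    dest=$2
    ${RCLONE:-rclone} copy "$src" "$dest" --transfers 8 --checkers 16 --progress
}

# Usage ("gdrive" is a hypothetical remote name):
#   rclone_push ./data gdrive:backup/data
```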
The Globus service is comprised of a set of tools developed to facilitate parallel, load-balanced, fault tolerant data transfers. There are a few steps involved in getting set up to use this service at CHPC, which are detailed on the Globus Software Documentation page. For further information please visit: https://www.globus.org/quickstart
FDT, Fast Data Transport, is a Java based file transfer application. CHPC has an installation that is located at /uufs/chpc.utah.edu/sys/pkg/fdt/0.9.20. FDT can be used in three modes: server, client and SCP.
- In Server mode the FDT will start listening for incoming client connections. The server may or may not stop after the last client finishes the transfer.
- In Client mode the client will connect to the specified host, where an FDT Server is expected to be running. The client can either read or write file from/to the server.
- In the SCP (Secure Copy) mode the local FDT instance will use SSH to start/stop the FDT server and/or client. The security is based on ssh credentials. The server started in this mode will accept connections ONLY from the "SCP" client.
For usage information, run java -jar /uufs/chpc.utah.edu/sys/pkg/fdt/0.9.20/fdt.jar --help or visit the FDT site.
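The server and client modes described above might be invoked as sketched here. The helpers only print the commands so they can be reviewed before running; the function names are hypothetical, and the server host and destination directory in the usage lines are placeholders. Check the options against the --help output of the installed version.

```shell
#!/bin/sh
# Sketch of FDT server and client invocations using the CHPC jar path.
FDT_JAR=/uufs/chpc.utah.edu/sys/pkg/fdt/0.9.20/fdt.jar

# Print the command that starts an FDT server, which listens for clients.
fdt_server_cmd() {
    printf 'java -jar %s\n' "$FDT_JAR"
}

# Print the client command: -c names the server host, -d the destination
# directory on the server; remaining arguments are the local files to send.
fdt_client_cmd() {
    server=$1
    dest=$2
    shift 2
    printf 'java -jar %s -c %s -d %s %s\n' "$FDT_JAR" "$server" "$dest" "$*"
}

# Usage (host and directory are placeholders):
#   fdt_server_cmd                                   # run the output on the receiving host
#   fdt_client_cmd <server-host> <dest-dir> a.dat b.dat
```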
BBCP is a point-to-point network file copy application that allows for sending data in multiple simultaneous streams. It was developed at SLAC National Accelerator Laboratory for the BaBar particle physics experiment. Complete documentation on bbcp usage can be found at the BBCP website.
For usage information, run /uufs/chpc.utah.edu/sys/pkg/bbcp/std/bin/bbcp --help or visit the BBCP site.
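A multi-stream bbcp copy could look like the sketch below, which prints the command rather than running it. The function name bbcp_cmd is hypothetical, and the stream count and window size are illustrative values, not tuned recommendations; bbcp must also be installed on the remote side.

```shell
#!/bin/sh
# Sketch: build a bbcp command line with multiple streams.
BBCP=/uufs/chpc.utah.edu/sys/pkg/bbcp/std/bin/bbcp

# -P 2  prints progress every 2 seconds
# -s N  uses N parallel streams
# -w 8m sets the window size to 8 MB
bbcp_cmd() {
    src=$1
    dest=$2
    streams=${3:-8}
    printf '%s -P 2 -s %s -w 8m %s %s\n' "$BBCP" "$streams" "$src" "$dest"
}

# Usage (remote user, host and path are placeholders):
#   bbcp_cmd bigfile.dat <user>@<remote-host>:<remote-dir>/ 16
```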
Aspera is IBM's high-performance file transfer software, which allows the transfer of large files and data sets with predictable, reliable and secure delivery regardless of file size or transfer distance, from a location that has the Aspera transfer server running. NCBI recommends the use of Aspera for transfer of data sets from their site. The command line client is installed on CHPC at /uufs/chpc.utah.edu/sys/installdir/aspera/3.6.1/connect/bin.
To use the command line client:
module load aspera/3.6.1
See the CHPC Aspera software page for additional information.
The Aspera transfer guide from NCBI can be found at https://www.ncbi.nlm.nih.gov/books/NBK242625/
The currently up-to-date documentation for ascp can be found at http://downloads.asperasoft.com/en/documentation/8
Fpsync, part of the fpart package, is a shell script that wraps fpart and rsync to launch multiple rsync processes in parallel.
module load fpsync
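After loading the module, a parallel sync might be started as sketched below. The helper prints the command for review; its name fpsync_cmd is hypothetical and the job count is an illustrative value. The check for absolute paths reflects the fpsync requirement that source and destination be given as absolute paths.

```shell
#!/bin/sh
# Sketch: build an fpsync command that runs several rsync jobs in parallel.
# -n sets the number of concurrent sync jobs, -v enables verbose output.
fpsync_cmd() {
    src=$1
    dst=$2
    jobs=${3:-4}
    case $src in /*) ;; *) echo "source must be an absolute path" >&2; return 2 ;; esac
    case $dst in /*) ;; *) echo "destination must be an absolute path" >&2; return 2 ;; esac
    printf 'fpsync -n %s -v %s %s\n' "$jobs" "$src" "$dst"
}

# Usage (paths are placeholders):
#   fpsync_cmd /path/to/source/ /path/to/destination/ 8
```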
NOTE: Large transfers are very dependent on the characteristics of the resources on both ends of the transfer. If you need assistance in initiating, monitoring, or troubleshooting large transfers, you can reach out to CHPC via firstname.lastname@example.org.
CHPC provides a mechanism for our users to transfer files to and/or from individuals without CHPC accounts. This service is called guest-transfer.
What is it for?
- At times, CHPC users need to transfer files to or from individuals that don't have CHPC accounts. These files are often quite large (many gigabytes), and thus unsuitable for other transport mechanisms (email, DVD).
- These file transfers often need to happen with little or no warning. They may also need to occur outside CHPC's support hours. Thus, the guest-transfer service must function without intervention or assistance from CHPC staff.
What is it not for?
- The guest transfer service is not for repeated events.
- The guest transfer service is not for long-term data storage.
- The guest transfer service is not for restricted (PHI/HIPAA/HITECH/FERPA) or sensitive data.
- The guest transfer service is not for huge data transfers (it's currently restricted to approximately 5 terabytes).
How do I get a guest account?
- When you need to use the guest transfer service, visit https://guest-transfer.chpc.utah.edu/ and fill out the form. This form creates a guest transfer account. You then give the guest account username and password to your colleague. You and your colleague can now share files.
How do I use the service?
- Once you have created a guest-transfer account and given it to your colleague, you and your colleague can copy your files to guest-transfer.chpc.utah.edu with your scp/sftp client (scp, sftp, WinSCP, etc).
Things to remember:
- The process is completely automatic. When you fill out the form, it immediately gives you a guest account.
- Only CHPC users can create accounts.
- The person who creates the guest account is responsible for all activity of the guest account.
- This guest account is only usable for the guest transfer service. It provides no access to any other CHPC or University resources.
- Files are transferred via scp/sftp. Interactive logins are disabled.
- Files are automatically removed after 90 days (based on last-access time).
- Files in the guest-transfer service can be read or changed by any other user.
- Consider using encryption to protect and verify your files.
- DO NOT USE THIS SERVICE FOR SENSITIVE OR RESTRICTED DATA!
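Because files on the guest-transfer service can be read or changed by any other user, one way to act on the advice about encryption and verification is sketched below. It assumes sha256sum (GNU coreutils) and, for the optional encryption step, GnuPG are installed; the function names and file names are hypothetical.

```shell
#!/bin/sh
# Sketch: record a SHA-256 checksum before upload and verify it after download.

# Write <file>.sha256 next to the file.
make_checksum() {
    ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Return 0 if the file still matches its recorded checksum, non-zero otherwise.
verify_checksum() {
    ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}

# Optional symmetric encryption with GnuPG (prompts for a passphrase):
#   gpg --symmetric --cipher-algo AES256 data.tar        # writes data.tar.gpg
#   gpg --output data.tar --decrypt data.tar.gpg
# Upload with scp, using the guest account created through the form:
#   scp data.tar.gpg <guest-username>@guest-transfer.chpc.utah.edu:
```

Since the checksum file could be altered on the server as well, it is safer to send the checksum and any passphrase to your colleague through a separate channel.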