Storage Services at CHPC
CHPC currently offers four different types of storage: home directories, group space, scratch file systems and a new archive storage system. All storage types except for the archive storage system are accessible from all CHPC resources. Data on the new archive storage space must be moved to one of the other spaces in order to be accessible. Home directories and group spaces can also be mounted on local desktops. See the Data Transfer Services page for information on mounting CHPC file systems on local machines along with details on moving data to and from CHPC file systems.
In addition, we have limited tape backup systems for both home directories and group spaces.
Note that the information below is specific to the general environment. In the protected environment (PE) all four types of storage exist; however, the nature of the storage, the pricing, and the policies do vary in the PE. See the Protected Environment page for more details.
For more information on CHPC Data policies, including details on the current backup policies, visit: File Storage Policies
***Remember that you should always have an additional copy or possibly multiple copies, on independent storage systems, for any crucial/critical data. While storage systems built with data resiliency mechanisms (such as RAID and erasure coding mentioned in the offerings listed below or other similar technologies) allow for multiple component failures, they do not offer any protection for large scale hardware failures, software failures leading to corruption, or for accidental deletion or overwriting of data. Please take the necessary steps to protect your data to the level you deem necessary.***
By default each user is provided with a 50 GB home directory free of charge.
THIS SPACE IS NOT BACKED UP -- It is assumed important data will be copied to a departmental file server or another location.
The quota on this directory is enforced through a two-level system. Once you exceed the 50 GB quota, you have 7 days to clean up your space so that you are using less than 50 GB; if you do not, after 7 days you will no longer be able to write any files until you clean up your usage. If your home directory grows to 75 GB, you will immediately lose the ability to write any files until you bring your usage back under 50 GB. Please note: when over quota, you will not be able to start a FastX session or edit a file, as those tasks write to your home directory.
CHPC can provide a temporary increase to your home directory; please reach out via firstname.lastname@example.org and include the reason that the temporary increase is needed as well as for how long you will need the increase.
To view your current home directory usage and quota status, run the quota command that CHPC provides.
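The CHPC-specific quota command is not named here; as a rough, hypothetical substitute, standard Linux tools can approximate your usage (they estimate space used but do not show the two-level quota state):

```shell
# Approximate home directory usage with standard coreutils.
# These are generic Linux commands, not the CHPC quota tool.
du -sh "$HOME"    # total size of everything under your home directory
df -h "$HOME"     # capacity and usage of the underlying file system
```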
CHPC also allows CHPC PIs to buy larger home directory storage at a price determined based on hardware cost recovery. All home directories of members of the group will then be provisioned in this space. The purchase price currently includes CHPC managed backup of this space. If you are interested in this option, please contact us by emailing email@example.com to discuss your storage needs.
UPDATE - January 2022: We are in the process of purchasing a replacement for the current home directory file system, which was installed in 2017 and is going out of warranty in Spring 2022. In the near future we will provide details of the new home directory hardware along with the cost of purchasing owner home directory space. Once we have these details we will reach out to groups who own home directory space in our current solution to see if they want to buy into the new solution.
Added March 2022: The hardware for the new home directory solution is described in an article in the Spring 2022 newsletter. This new solution has a cost of $900/TB for the 5-year warranty lifetime. This includes the cost of the space on the VAST storage system along with backup of this space. The backup will be to the CHPC object storage, pando, and will consist of a weekly full backup with nightly incremental backups and a two-week retention window. CHPC will be taking delivery of the new system during the last week of March. After installation and testing, we will reach out to all groups who have purchased home directory space on our current Compellent system to discuss moving to the new solution. We anticipate that this will be sometime in May. At that time we will also make arrangements to move the default 50 GB home directories of users to the new file system.
CHPC currently allows CHPC PIs with sponsored research projects to buy in to group-level file storage at a price determined based on cost recovery. A more detailed description of this storage offering is available. The current pricing is $150/TB for the lifetime of the hardware, which is purchased with a 5-year warranty; we are usually able to get an additional 2 years of warranty added after purchase. CHPC purchases the hardware for this storage in bulk and then sells it to individual groups in TB quantities, so depending on the amount of group storage space you are interested in purchasing, CHPC may have the storage to meet your needs on hand. Please contact us by emailing firstname.lastname@example.org and request to meet with us to discuss your needs and timing. BY DEFAULT THIS SPACE IS NOT BACKED UP; HOWEVER, CHPC PROVIDES A BACKUP OPTION.
It should also be noted that this group storage space is for use in the general environment, and is not for use with regulated data; there is a separate group-level storage option (project space) in the protected environment.
NOTE -- March 2019: We are no longer offering backup of NEW group spaces to tape. We will continue to provide backups of group spaces for groups who have already purchased tapes. Details of the new options for backup of group spaces are given in CHPC's Spring 2019 Newsletter as well as in the Backup section below.
Update -- August 2021: The Summer 2021 quarter (July 1 - September 30) will be the last quarterly archive to tape of any CHPC general environment group space that is still being backed up to tape. All owners of these spaces have been notified. After that time, until XX, we will only maintain the tape infrastructure needed to do restores from the existing tapes.
New archive backups of group-level storage will be to the Archive Storage discussed below. Details on the current backup policies are given at File Storage Policies. Contact us at email@example.com to set up any group space backup. CHPC also provides information on a number of user-driven alternatives to this service; see the section on User Driven Backup Options below.
There are various scratch file systems available on the HPC clusters. THE SCRATCH FILE SYSTEMS ARE NOT BACKED UP. This space is provided for users to store intermediate files needed during a job on one of the HPC clusters. On these scratch file systems, files that have not been accessed for 60 days are automatically scrubbed. There is no charge for this service.
The current scratch file systems are:
- /scratch/general/lustre - a 700 TB Lustre parallel file system accessible from all CHPC resources
- /scratch/kingspeak/serial - a 175 TB NFS system accessible from all CHPC resources - READ ONLY as of May 26; will be retired July 1
- /scratch/general/nfs1 - a 595 TB NFS system accessible from all CHPC resources
- /scratch/general/vast - a 1 PB file system accessible from all CHPC resources
Linux defines a temporary file system at /var/tmp where temporary user and system files are stored. CHPC cluster nodes set up this temporary file system as a RAM disk with limited capacity. All interactive and compute nodes also have spinning-disk local storage at /scratch/local. If a user program is known to need temporary storage, it is advantageous to set the environment variable TMPDIR, which defines the location of the temporary storage, and point it to /scratch/local. Local disk drives range from 40 to 500 GB depending on the node, which is much more than the default RAM disk provides.
/scratch/local can also be used for storing intermediate files during a calculation; however, be aware that getting to these files after the job finishes will be difficult, since they are local to the compute node and not directly accessible from the cluster interactive nodes.
**July 2020 Update -- changes to the use of /scratch/local on cluster compute nodes**: Users can still make use of /scratch/local on the compute nodes of the cluster for their batch jobs. However, access permissions are now set such that users can no longer create directories in the top-level /scratch/local directory. Instead, as part of the slurm job prolog (before the job is started), a job-level directory, /scratch/local/$USER/$SLURM_JOB_ID, will be created. Only the job owner will have access to this directory. At the end of the job, in the slurm job epilog, this job-level directory will be removed.
These changes are being made to solve two main issues. First, this process achieves isolation among jobs, thereby allowing for cleanup of the /scratch/local file system and eliminating cases where a new job starts on a node with a full or nearly full /scratch/local. In addition, this also eliminates the situation where a job starts only to fail when it cannot write to /scratch/local due to hardware issues. By moving the top-level directory creation to the job prolog, if there are any issues creating this job-level directory, the node will be off-lined and the job will be started on a different node.
All slurm scripts that make use of /scratch/local must be adapted to accommodate this change. Additional updated information is provided on the CHPC slurm page.
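As an illustrative sketch (the #SBATCH values and file names below are placeholders, not CHPC-recommended settings), a batch script using the prolog-created job directory might look like this:

```shell
#!/bin/bash
#SBATCH --time=01:00:00        # illustrative values only
#SBATCH --ntasks=1

# The slurm prolog has already created this job-level directory;
# do NOT attempt to mkdir directly under /scratch/local.
SCRATCHDIR=/scratch/local/$USER/$SLURM_JOB_ID
export TMPDIR=$SCRATCHDIR      # programs that honor TMPDIR write here

cd "$SCRATCHDIR"
# ... run your application, writing intermediate files here ...

# Copy anything you want to keep before the job ends -- the epilog
# removes this directory when the job finishes.
cp -r results "$HOME"/
```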
One additional part of the change to /scratch/local is that the space is now software encrypted. Each time a node is rebooted, this software encryption is set up again from scratch, which in effect purges the content of this space. There is also a cron job in place to scrub /scratch/local of content that has not been accessed for over 2 weeks. This scrub policy can be adjusted on a per-host basis; a group that owns a node can opt to have us disable the scrub (declare the node as an exception for the scrub script) so that it does not run on that host.
CHPC now has a new archive storage solution based on object storage, specifically Ceph, a distributed object storage suite originally developed at UC Santa Cruz. We are offering a 6+3 erasure-coding configuration, which results in a price of $150/TB of usable capacity for the 7-year lifetime of the hardware. As we currently do with our group space, we will operate this space in a condominium model by reselling it in TB chunks.
This space is a stand alone entity, and will not be mounted on other CHPC resources.
One of the key features of the archive system is that users manage the archive directly, unlike the tape archive option. Users can move data in and out of the archive storage as needed: they can archive milestone moments in their research, store an additional copy of crucial instrument data, and retrieve data as needed. This archive storage solution is accessible via applications that use Amazon's S3 API. GUI tools such as Transmit (for Mac) as well as command-line tools such as s3cmd and rclone can be used to move the data.
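As a hedged sketch of the rclone workflow (the remote name "archive", the bucket, the endpoint, and the keys below are all hypothetical placeholders; the real values come from CHPC when you purchase archive space):

```shell
# One-time setup: define an S3-compatible remote named "archive".
# All values are placeholders, not real CHPC endpoints or credentials.
rclone config create archive s3 \
    provider Ceph \
    access_key_id YOUR_KEY \
    secret_access_key YOUR_SECRET \
    endpoint https://archive.example.edu

# Copy a directory into a bucket, then verify the listing.
rclone copy ~/project/results archive:mybucket/results --progress
rclone ls archive:mybucket/results
```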
It should also be noted that this archive storage space is for use in the general environment, and is not for use with regulated data; there is a separate archive space in the protected environment.
The backup policy of the individual file systems is mentioned above.
Note -- March 2019: CHPC is migrating backups of group spaces from tape to the disk-based archive storage mentioned above. For CHPC-managed backups of any newly purchased space, groups will need to purchase space, typically twice the capacity of the group space, on the archive storage solution mentioned above. Alternate user-driven backup solutions are mentioned in the next section.
Update -- August 2021: The Summer 2021 quarter (July 1 - September 30) will be the last quarterly archive to tape of any CHPC general environment group space.
Campus level options for a backup location include Box and Microsoft OneDrive. Note: There is a UIT Knowledge Base article with information on the suitability of the campus level options for different types of data (public/sensitive/restricted). Please follow these university guidelines to determine a suitable location for your data.
Note: one option in the past has been owner backup to Google Drive; however, with the recent changes in available capacity noted below, this is no longer an option. While the storage offered via Google's G Suite for Education had been unlimited, Google announced in Spring 2020 that its storage model would change effective July 2022. Under the new Google Workspace for Education model, storage is no longer unlimited but is limited to a baseline of 100 TB per institution. In February 2021, University of Utah central IT (UIT) announced that the new per-user limit will be 5 GB and that users should reduce their usage to below this limit by April 29, 2022. As we obtain additional information, we will make announcements to the CHPC user mailing list.
Added 17 Feb 2022: CHPC has started a page to provide information (tips, how-to information, etc.) about migrating data from Google Drive; it will be updated as we gather more information.
Owner backup to University of Utah Box : This is an option suitable for sensitive/restricted data. See the link to get more information about the limitations. Note that if using rclone, the credentials expire and have to be reset periodically.
Owner backup to University of Utah Microsoft OneDrive: As with Box, this option is suitable for sensitive/restricted data. See the link for more information about the limitations.
Owner backup to CHPC archive storage -- pando in the general environment and elm in the Protected Environment: This choice, mentioned in the Archive storage section above, requires that the group purchase the required space on CHPC's archive storage options.
Owner backup to other storage external to CHPC: Some groups have access to other storage resources, external to CHPC, whether at the University or at other sites. The tools that can be used for doing this are dependent on the nature of the target storage.
There are a number of tools, mentioned on our Data Transfer Services page, that can be used. Several places above we mentioned rclone, which is the tool best suited for transfers to object storage systems; others are fpsync, a parallel version of rsync suited for transfers between typical Linux "POSIX-like" file systems, and Globus, best suited for transfers to and from resources outside of CHPC.
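To illustrate the POSIX-to-POSIX case, the sketch below uses plain rsync on throwaway directories; fpsync takes the same source/destination arguments and adds parallel workers. The paths here are stand-ins, not real CHPC locations:

```shell
# Stand-ins for a group space and an independent backup target.
SRC=$(mktemp -d)
DST=$(mktemp -d)
echo "important result" > "$SRC/result.txt"

# -a preserves permissions and timestamps; the trailing slash on $SRC
# copies its contents rather than the directory itself.
rsync -a "$SRC"/ "$DST"/

# fpsync equivalent, splitting the tree across 8 rsync workers:
#   fpsync -n 8 "$SRC" "$DST"
cat "$DST/result.txt"
```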
In addition we have a page that presents a number of considerations and tips for user driven backups.
For making direct mounts of home and group space on your local machine see the instructions provided on our Data Transfer Services page.