New archive storage solution available to Utah researchers state-wide
With the aid of an NSF Campus Cyberinfrastructure (CC*) award, the CHPC has built a prototype state-wide archive system. This system provides infrastructure that allows researchers to satisfy the data sharing, resiliency, and retention requirements placed on published and complete datasets. The system will also provide an opportunity for researchers to explore sharing datasets with national data-sharing platforms, such as the National Data Platform (NDP) and the Open Science Data Federation (OSDF), to promote caching datasets close to computational resources. This system can only house open data; that is, it may only be used for data that is not subject to security regulations or restrictions.
This system is composed of two pieces:
- A disk-based object store built on the Ceph software stack and located at the Downtown Data Center (DDC)
- A tape-based library located at the Tonaquint Data Center (TDC) in St. George
Users interact with the object store at the DDC, called ARC-A (Archive-A), and copy their datasets to it through an S3 interface with tools like Rclone and Globus. The datasets are then automatically replicated to the Spectra Logic BlackPearl tape library system, called ARC-B, where two copies are written. ARC-A has a capacity of 2.8 PB and ARC-B has a capacity of 7.2 PB.
If you have large datasets that (a) have accessibility requirements, (b) are associated with a publication, or (c) are, or will be, broadly used by other institutions, we encourage you to apply for an allocation of space. Because this system was created with grant funds, it will be provided free of charge for the life of the grant. After the grant ends, there will be a charge per terabyte, which we are working to determine. We are limiting allocations of space to 50 TB per group. To apply for an allocation of space on this system, please contact us. As part of the application process, we ask that you provide a coarse manifest of the datasets you plan to store on the archive, along with a description of the broader significance of your data, the capacity you require, and the duration of any applicable data retention requirements.
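One possible way to prepare the coarse manifest and capacity estimate mentioned above is sketched below in Python. This is only an illustration, not a format the CHPC requires: the function names (`coarse_manifest`, `total_terabytes`, `fits_allocation`, `format_manifest`) are hypothetical, the one-line-per-top-level-entry layout is an assumption, and a terabyte is assumed to be 10^12 bytes because the announcement does not say which convention applies.

```python
import os
from pathlib import Path

# Assumption: decimal terabytes (10**12 bytes); the announcement does not specify.
TB = 10**12
# Per-group allocation limit stated in the announcement.
ALLOCATION_LIMIT_TB = 50


def coarse_manifest(root):
    """Map each top-level entry under root to (file_count, total_bytes)."""
    manifest = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_file():
            manifest[entry.name] = (1, entry.stat().st_size)
            continue
        if not entry.is_dir():
            continue  # skip broken links and special files
        count, size = 0, 0
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                path = os.path.join(dirpath, name)
                if os.path.islink(path):
                    continue  # do not count symlinked files
                count += 1
                size += os.path.getsize(path)
        manifest[entry.name] = (count, size)
    return manifest


def total_terabytes(manifest):
    """Total size of all manifest entries, in (decimal) terabytes."""
    return sum(size for _count, size in manifest.values()) / TB


def fits_allocation(manifest, limit_tb=ALLOCATION_LIMIT_TB):
    """True if the total size is within the per-group allocation limit."""
    return total_terabytes(manifest) <= limit_tb


def format_manifest(manifest):
    """Render the manifest as tab-separated lines plus a total line."""
    lines = [
        f"{name}\t{count} files\t{size / TB:.3f} TB"
        for name, (count, size) in manifest.items()
    ]
    lines.append(f"TOTAL\t{total_terabytes(manifest):.3f} TB")
    return "\n".join(lines)
```

For example, calling `print(format_manifest(coarse_manifest("/path/to/datasets")))`, with the placeholder path replaced by your own dataset directory, prints one summary line per top-level dataset and a total that can be compared against the 50 TB limit with `fits_allocation`.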