YETI is an active archive hosted at EURECOM that provides cold storage for data generated by various research partners in 4D-Omics. This archive has several requirements.

First, it must provide high-density storage for several petabytes of data at a very low price point. This differentiates it from “hot” storage solutions based on hard disk drives (HDD) and/or solid-state drives (SSD), which prioritize fast data access over low cost. Our solution therefore uses tape as the storage medium.

Second, our cold storage solution must be able to store data over long durations (at least 8 years). Tape addresses the media failure issue, since it can last up to 20 years, whereas HDDs and SSDs last only 3–5 years. However, we also need to design our solution to withstand other types of hardware failure, such as failures of the robotic arms in the tape library or of the tape-interface servers.

Third, we need to provide cold storage to the community through an API while meeting security, privacy, and performance requirements. To meet the performance requirements, our solution must use a front-end cache that can absorb sudden write bursts and buffer recently read data, so that foreground activity does not always have to be serviced by the tape library, which can be slow.
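The role of the front-end cache can be illustrated with a small Python sketch. This is not YETI's implementation: the `TapeBackend` and `FrontEndCache` classes, their method names, and the object-count capacity are all hypothetical, and the tape tier is simulated with an in-memory dictionary. The sketch only shows the idea of buffering writes before they reach tape and serving recently used objects without a tape recall.

```python
from collections import OrderedDict


class TapeBackend:
    """Hypothetical stand-in for the slow tape tier (an in-memory dict)."""

    def __init__(self):
        self.objects = {}
        self.reads = 0   # number of recalls from "tape"
        self.writes = 0  # number of migrations to "tape"

    def write(self, key, data):
        self.writes += 1
        self.objects[key] = data

    def read(self, key):
        self.reads += 1
        return self.objects[key]


class FrontEndCache:
    """Hypothetical cache: buffers writes and keeps recently used objects."""

    def __init__(self, backend, capacity):
        self.backend = backend
        self.capacity = capacity     # max number of clean cached objects
        self.dirty = {}              # written, not yet migrated to tape
        self.recent = OrderedDict()  # LRU of objects that are already on tape

    def put(self, key, data):
        # Absorb the write in the cache; the tape tier is not touched here.
        self.dirty[key] = data
        self.recent.pop(key, None)   # drop any stale clean copy

    def get(self, key):
        if key in self.dirty:
            return self.dirty[key]
        if key in self.recent:
            self.recent.move_to_end(key)  # mark as most recently used
            return self.recent[key]
        data = self.backend.read(key)     # cache miss: recall from tape
        self._remember(key, data)
        return data

    def flush(self):
        # Background migration of all buffered writes to the tape tier.
        for key, data in list(self.dirty.items()):
            self.backend.write(key, data)
            self._remember(key, data)
        self.dirty.clear()

    def _remember(self, key, data):
        self.recent[key] = data
        self.recent.move_to_end(key)
        while len(self.recent) > self.capacity:
            self.recent.popitem(last=False)  # evict least recently used
```

In this toy model, a burst of `put` calls leaves the tape write counter at zero until `flush` runs, and a repeated `get` for the same object triggers at most one recall while the object stays in the cache. A real deployment would bound the cache by bytes rather than object count and would migrate data according to policy rather than on an explicit call.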

Taking all of these aspects into consideration, YETI uses an IBM TS4500 tape library as its core long-term storage solution. To ensure high reliability, the tape library has been provisioned in a dual-arm, three-frame setup, so that even if one arm fails, the robotic machinery will continue to service requests. Built-in mechanisms for tape verification are used to periodically scrub data and verify its integrity. Access to the tape library is provided by IBM Spectrum Archive software, which has been deployed in a dual-server, fail-over setup to ensure reliability even in the presence of an interface server failure. An HDD-based, scalable front-end cache managed by the Spectrum Scale file system has been deployed on FS5035 servers that use a dual-controller setup to provide high-performance access to the tape backend.

Taken together, YETI has a maximum storage capacity of 20 PB, making it the largest archive in the south of France.
