CTA Presentations

2022

How CERN Leverages Tape in Support of Active Physics Data Archives

24 June 2022
Fujifilm 12th Annual Global IT Executive Summit, San Diego, USA

Presentation to senior storage industry executives on how CERN addresses the challenges of high performance, resilient, scalable and efficient massive archives at exabyte scale.

Liquibase—A tool to facilitate CTA Catalogue database schema deployment

2 June 2022
IT Activities and Services Discussion Forum (ASDF), CERN, Geneva, Switzerland

The Catalogue database is an essential component of the CERN Tape Archive (CTA) system. It contains a huge collection of information about the files that are stored on tape. Like most software, CTA faces a steady stream of new feature requests from end-users, operators and its growing community. Some of these features require a modification of the Catalogue database schema, and applying such changes in production is often tricky. This presentation gives a quick overview of the tool that the CTA team has chosen to perform in-production database schema changes: Liquibase.
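As a flavour of how Liquibase tracks schema changes, a changeset in Liquibase's SQL format pairs each change with an author/id and a rollback statement. The example below is purely illustrative; the table and column names are hypothetical, not taken from the CTA Catalogue:

```sql
--liquibase formatted sql

--changeset cta-team:add-user-comment-column
ALTER TABLE ARCHIVE_FILE ADD (USER_COMMENT VARCHAR2(1000));
--rollback ALTER TABLE ARCHIVE_FILE DROP COLUMN USER_COMMENT;
```

Liquibase records each applied changeset in a tracking table, so the same changelog can be replayed safely against development, pre-production and production databases.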

Schema upgrade procedure for the CERN Tape Archive (CTA)

2 June 2022
IT Activities and Services Discussion Forum (ASDF), CERN, Geneva, Switzerland

This talk will present how the use of Liquibase in the IT-SD-TAB section has evolved over the past year. The continuous improvement of our processes allowed us to perform the largest schema update yet with minimal downtime and service disruption. We will go over how we perform schema upgrades, the auxiliary tools we have created and improved to perform and test the upgrades, what we have learned using Liquibase over the last year and how we make sure that the upgrade will work before deploying it.

The CERN Tape Archive (CTA)—running Tier 0 tape

27 April 2022
HEPiX Spring 2022 Online Workshop

During the ongoing long shutdown, all elements in LHC data-taking have been upgraded. As the last step in the T0 data-taking chain, the CERN Tape Archive (CTA) has done its homework and redesigned its full architecture in order to match LHC Run 3 data rates.

This contribution will give an overview of the CTA service and how it has been deployed in production. We discuss the measures taken to assess and improve its performance and efficiency against various workflows, especially the latest data challenges realised on T0 tape endpoints. We illustrate the monitoring and alerting which is required to maintain performance and reliability during operations, and discuss the outlook for service evolution.

CERN’s Run 3 Tape Infrastructure

27 April 2022
HEPiX Spring 2022 Online Workshop

LHC Run 3 is imposing unprecedented data rates on the tape infrastructure at CERN T0. Here we report on the nature of the challenge in terms of performance and reliability, on the hardware we have procured, and how it is deployed, configured and managed. We share details of our experience with the technology selected, a mix of IBM and SpectraLogic libraries and Enterprise and LTO drives. In particular, LTO-9 is a new technology and we cover low level details including media initialisation and its native Recommended Access Order (RAO). We conclude with an outlook on the likely evolution of the infrastructure.

The CTA project, team and community

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

Introduction to the first CTA Day at the EOS Workshop, a full day of presentations and discussions about CTA.

CTA at AARNet

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

In this presentation, we will report on how we at AARNet deployed CTA along with the restic backup client as a backup/archive solution for our production EOS clusters. The solution has been in production since late 2021. We will cover why we chose CTA, how CTA is deployed, and how it is integrated into our backup workflow.

EOS and CTA Status at IHEP

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

EOS is now the main storage system for IHEP experiments such as LHAASO and JUNO. CASTOR has long been used at IHEP to back up experiment data, but it struggles to satisfy the backup requirements of new experiments like LHAASO and JUNO. As EOSCTA became stable enough to replace CASTOR in production, we started the EOSCTA evaluation and the CASTOR migration. In this talk, we will give a brief introduction to the current EOS status at IHEP, and focus on our efforts on CTA deployment and migration.

CTA at RAL

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

This talk will present details of the deployment of Antares, the EOS-CTA service at RAL Tier-1, which replaces Castor.

Evaluation of CTA for use at Fermilab

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

Fermilab is the primary research lab dedicated to particle physics in the United States and is also home to the largest archival HEP data store outside of CERN. Fermilab currently employs an HSM based on Enstore, a Fermilab product, and dCache, for tape and disk respectively. This Enstore+dCache HSM manages nearly 300 PB of active data on tape. Because of the development work that would be necessary to ensure Enstore works at expected HL-LHC data scales, Fermilab is exploring the use of CTA to replace it. We will report on the progress of this evaluation, including the deployment of CTA using containerized systems as well as the ability to read tapes formatted with CPIO tape wrappers.

dCache integration with CTA

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

The ever-increasing amount of data produced by modern scientific facilities like EuXFEL or LHC puts high pressure on the data management infrastructure at the laboratories. This includes poorly shareable resources of archival storage, typically tape libraries. To achieve maximal efficiency of the available tape resources, a deep integration between hardware and software components is required.

The CERN Tape Archive (CTA) is an open-source storage management system developed by CERN to manage LHC experiment data on tape. Although today CTA's primary target is CERN Tier-0, the data management group at DESY considers CTA a main alternative to commercial HSM systems.

dCache has a flexible tape interface which allows connectivity to any tape system. There are two ways that a file can be migrated to tape: either dCache calls a tape-system-specific copy command, or it interacts through an in-dCache tape-system-specific driver. The latter has been shown (by the NDGF, TRIUMF and KIT Tier-1s) to provide better resource utilization and efficiency. Together with the CERN Tape Archive team, dCache developers are working on a seamless integration of CTA into dCache.

This presentation will show the design of the dCache-CTA integration, its current status and the first test results at DESY.

CTA Status and Roadmap

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

CTA entered production at CERN in 2020 and physics data taking into CTA started in July 2021. 2022 will see the start of LHC Run-3, with combined experiment data rates of up to 40 GB/s. This presentation will give an overview of CTA's preparation and readiness for the upcoming Run, as well as a look forward to software features in the development pipeline.

How to enable EOS for tape

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

An EOSCTA instance is an EOS instance commonly called a tape buffer configured with a CERN Tape Archive (CTA) back-end. This EOS instance is entirely bandwidth oriented: it offers an SSD-based tape interconnection, it can contain spinning disks if needed and it is optimized for the various tape workflows. This talk will present how to enable EOS for tape using CTA and the Swiss horology gears in place to maximize tape hardware usage while meeting experiment workflow requirements.

Configuring user access control in CTA

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

CTA uses the access mechanisms provided by EOS and adds a tape-specific layer. If one of these elements is misconfigured, a user won't be able to read a file or, on the contrary, unauthorized access may be granted.

This talk explains how the combination of ACLs, Unix permissions and mount rules works in CTA. We show which tools we use for permissions management and what the capabilities and limitations of our system are.
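The layered check can be modelled roughly like this (an illustrative simplification, not CTA source code): a retrieve succeeds only if the disk layer (Unix permission bits, possibly overridden by an ACL entry) and the CTA mount-rule layer both grant access.

```python
def can_retrieve(user, file_entry, mount_rules):
    """All layers must agree before a tape retrieve is allowed.

    file_entry: {"readers": set of users with Unix read access,
                 "acl": dict mapping user -> explicit grant/deny}
    mount_rules: set of users with a CTA requester mount rule.
    """
    unix_ok = user in file_entry["readers"]          # stand-in for mode bits
    disk_ok = file_entry["acl"].get(user, unix_ok)   # an ACL entry overrides
    return disk_ok and user in mount_rules           # no mount rule, no tape
```

Note the failure modes the talk alludes to: a user with a valid ACL but no mount rule cannot recall the file, while an overly broad mount rule combined with a permissive ACL grants unintended access.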

Tape Drive Status Lifecycle

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

Explanation of the CTA Tape Drive status during a data transfer session.

EOSCTA file restoring

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

This talk summarizes the new file restoring feature of CTA: how it works, how to configure it, when it should be used and its current limitations.

Maintaining consistency in an EOSCTA system

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

This presentation summarizes the current effort to detect, and subsequently remedy, inconsistencies in the file metadata stored in EOS and CTA. We show how we combine and validate the EOSCTA namespaces in order to produce a summary of healthy files for experiments and a troubleshooting tool for operators.

An HTTP Rest API as SRM replacement for tape access

9 March 2022
6th EOS Workshop, CERN, Geneva, Switzerland

Imagine a world where SRM is no longer needed to dialogue with tape storage systems. A world where only one standard protocol can be used across the entire WLCG to access tape storage systems. This dream will soon become reality on EOS...

After several discussions about the specifications of the new WLCG tape REST API, a prototype of the final API has been developed in EOS. To give a good idea of the functionality the API offers, I will compare the current XRootD workflows at CERN with the new HTTP ones that will be used once the REST API is deployed.
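As a rough sketch of what such an HTTP workflow could look like, the snippet below builds the JSON body of a bulk stage (recall) request. The endpoint layout and field names belong to the evolving WLCG specification and are assumptions here, not the final API:

```python
import json

# Hypothetical sketch of a bulk "stage" request body in the spirit of the
# WLCG tape REST API drafts; field names are assumptions, not the final spec.
def build_stage_request(paths):
    return json.dumps({"files": [{"path": p} for p in paths]})
```

A client would POST such a body to the storage endpoint over HTTPS and then poll the returned request identifier to follow the recall's progress, replacing the SRM `bringOnline` dialogue with plain HTTP.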

The CERN Tape Archive: Archival Storage for Scientific Computing

26 January 2022
Workshop on Cloud Storage, Synchronization and Sharing, CS3 2022

The CERN Tape Archive (CTA) is the tape back-end to EOS disk. CTA went into production in June 2020 and currently stores around 400 Petabytes of physics data. During 2022, CTA will ramp up to full production data-taking volumes, with the start of Run-3 of the Large Hadron Collider (LHC). CTA is an open-source system which is being evaluated and adopted by a number of scientific institutes besides CERN.

This presentation covers the outlook for archival storage and gives an overview of how tape storage fits into CERN's integrated storage strategy and the suite of storage and data transfer products/services provided by CERN's IT Department.

2021

Migration from CASTOR to the CERN Tape Archive

27 October 2021
HEPiX Autumn 2021 Online Workshop

CASTOR was used as CERN's primary archival storage system for the last two decades, including Run-1 and Run-2 of the LHC. For Run-3, CASTOR has been replaced by the CERN Tape Archive (CTA). At the end of Run-2, there were 340 Petabytes of data stored in CASTOR, which had to be migrated to CTA during Long Shutdown 2. Over 90% of this data is an active archive—the custodial copy of physics data belonging to the four LHC experiments and around a dozen smaller experiments at CERN. The migration and switch from CASTOR to CTA had to be accomplished with minimal interruption to experiment activities; to further complicate the problem, each experiment has a slightly different workflow and data management stack. This presentation will describe our experiences and lessons learned during the two-year period of the migration.

Tape Storage: Update on CASTOR to CTA Migration

18 October 2021
34th IT Technical Users Meeting (ITUM–34), CERN, Geneva, Switzerland

Since the last ITUM, all remaining SME experiments have been migrated to CTA, and SPS experiments have started data taking with archival to CTA. There remain a few CASTOR use cases still to be migrated: backups, some projects, cold/legacy data, and user home directories. We will give a progress update on the clean-up of CASTOR home directories and present the plan to migrate personal data to CERNBox.

CTA production experience

17 March 2021
HEPiX Spring 2021 Online Workshop

The CERN Tape Archive is the tape back-end to EOS and the replacement for CASTOR as the Run-3 physics archival system. The EOSCTA service entered production at CERN during summer 2020 and since then the four biggest LHC experiments have been migrated. This talk will outline the challenges and the experience we accumulated during CTA service production ramp-up as well as an updated overview of the next milestones towards Run-3 final deployment.

ALICE and the CTA Garbage Collectors

3 March 2021
5th EOS Workshop, CERN, Geneva, Switzerland

In the standard layout of an EOSCTA deployment there are two SSD buffers in front of the tape drives. One is called the "default" space and is used for writing files to tape; the other is called the "retrieve" space and is used for reading them back. These buffers prevent direct file transfers between HDDs and tape drives. Such direct transfers would suffer from the unacceptable performance penalties incurred by mixing the preferred access patterns of disk and tape: a HDD usually has thousands of concurrently open files with data bandwidth shared across them, whereas a tape drive simply reads or writes one file at a time at high speed. The mechanical thrashing of a HDD that is associated with thousands of open files may be acceptable to end users, but it is unacceptable to a tape drive requiring high bandwidth for a single file.

The lifetime of the files within the two SSD buffers is relatively short. Files being written to tape are deleted from the default space as soon as they have been safely stored on tape. Files being retrieved from tape are deleted from the retrieve space as soon as they have been copied to their destination system.

The layout of the EOSCTA deployment for the ALICE experiment differs from the standard layout because it has an additional HDD disk cache, called the "spinners" space, which sits between the retrieve SSD buffer and the ALICE end users. The spinners space is a true disk cache because the lifetime of files within it is relatively long. These files are automatically deleted by one of two garbage collectors when space needs to be freed up to make room for newly retrieved files. This workshop presentation describes the ALICE HDD disk cache and the automatic garbage collectors that free up space within it.
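A minimal sketch of the eviction idea, assuming a least-recently-used policy and simplified bookkeeping (this is not the actual CTA garbage-collector code):

```python
import heapq

def garbage_collect(cache_files, used_bytes, capacity_bytes, needed_bytes):
    """Evict least-recently-used files until needed_bytes of free space exist.

    cache_files: list of (last_access_time, size_bytes, name) tuples.
    Returns the names of the evicted files, oldest access first.
    """
    free = capacity_bytes - used_bytes
    heapq.heapify(cache_files)            # min-heap: oldest access time on top
    evicted = []
    while free < needed_bytes and cache_files:
        _, size, name = heapq.heappop(cache_files)
        free += size                      # reclaim the file's space
        evicted.append(name)
    return evicted
```

The real system runs such collectors continuously in the background so that a newly retrieved file never has to wait for space to be freed.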

A brief overview of the CTA mount scheduling logic

3 March 2021
5th EOS Workshop, CERN, Geneva, Switzerland

Accessing data in a tape archival system can be costly in terms of time. Mounting a tape into a drive, positioning the tape head to a file and unmounting the tape once the file has been read can take more than 2 minutes, and a tape drive cannot archive or retrieve data while a tape is being mounted or unmounted. We therefore need a solution to avoid mounting a tape when it is not worth it. Imagine a user who retrieves a single file from a tape and then, 5 minutes later, wants another file from the same tape: without the CTA scheduling logic, the drive would pay the mount, unmount and positioning cost twice! A CTA tape server contains the scheduling logic that decides when to mount a tape in order to optimise drive usage for reading and writing data. The aim of this presentation is to explain the different elements that the scheduler of each CTA tape server takes into account to decide whether or not a tape is worth mounting.
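A toy decision function captures the two main criteria such a scheduler has to weigh: enough queued data to amortise the mount cost, and a bound on how long the oldest request may wait. The thresholds below are illustrative assumptions, not CTA's actual parameters:

```python
def should_mount(queued_bytes, oldest_request_age_s,
                 min_bytes=500 * 1000**3, max_wait_s=900):
    """Decide whether mounting a tape is worth its ~2 minutes of overhead.

    Mount when enough data is queued to amortise the mount cost, or when
    the oldest queued request has already waited longer than max_wait_s.
    """
    return queued_bytes >= min_bytes or oldest_request_age_s >= max_wait_s
```

Batching requests this way is exactly what saves the double mount in the example above: the second file's request joins the queue for the same tape before the mount is triggered.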

CTA best practices for data taking workflows

3 March 2021
5th EOS Workshop, CERN, Geneva, Switzerland

There is significant diversity in the Data Acquisition (DAQ) systems of the non-LHC experiments supported at CERN. Each system can potentially have its own data-taking software and helper scripts, and each can use its preferred data transfer commands and apply different checks and retry policies. The task of the CERN Tape Archive (CTA) team is to support all of these use cases and to define the best practices for integrating with an EOSCTA instance. In this talk we will present an overview of typical DAQ workflows and discuss which protocols, commands and APIs we recommend using with EOSCTA. We will provide examples of submitting archive and retrieve requests using FTS and XRootD tools, explain how to monitor the status of a file on tape and the best way to ensure a file is safely stored, and give an overview of the CTA authentication policies.

Running an EOS instance with tape on the back

3 March 2021
5th EOS Workshop, CERN, Geneva, Switzerland

An EOSCTA instance is an EOS instance commonly called a tape buffer configured with a CERN Tape Archive (CTA) back-end. This EOS instance is entirely bandwidth oriented: it offers an SSD-based tape interconnection, it can contain disks if needed and it is optimized for the various tape workflows. This talk will present the specific details of the EOS tape buffer tweaks and the Swiss horology gears in place to maximize tape hardware usage while meeting experiment workflow requirements.

EOS+CTA WorkFlows: Tape Archival and Retrieval

3 March 2021
5th EOS Workshop, CERN, Geneva, Switzerland

The CERN Tape Archive (CTA) is the tape back-end to EOS. EOS provides an event-driven interface, the WorkFlow Engine (WFE), which is used to trigger the processes of archival and retrieval. When EOS is configured with its tape back-end enabled, the CREATE and CLOSEW (CLOSE Write) events are used to trigger the archival of a file to tape, while the PREPARE event triggers the retrieval of a file from tape and the creation of a disk replica. This talk will present the details of these tape-related workflows, including the state machine for the processes of archival and retrieval, and the metadata which is communicated between EOS and CTA.
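The event-to-action mapping described above can be sketched as a simple dispatcher (a simplified model for illustration, not the actual WFE code):

```python
# Simplified model of the tape-related workflow events: CLOSEW (close after
# write) queues an archival to tape, PREPARE queues a recall from tape.
def handle_event(event, path, queue):
    if event in ("CREATE", "CLOSEW"):
        # CREATE registers the file in the namespace; only CLOSEW, fired
        # once the replica is complete, queues the archival to tape.
        if event == "CLOSEW":
            queue.append(("ARCHIVE", path))
    elif event == "PREPARE":
        # PREPARE queues a tape recall that will create a disk replica.
        queue.append(("RETRIEVE", path))
    else:
        raise ValueError(f"unhandled workflow event: {event}")
```

In the real system the queued actions carry the metadata exchanged between EOS and CTA (file identifiers, checksums, storage class), and the state machine tracks each request until the tape copy or disk replica exists.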

2020

CTA Roadmap

23 November 2020
2nd HSF WLCG Virtual Workshop

Status update and roadmap for CTA, looking forward to the start of Run–3. Part of the Storage Technologies track.

CERN Tape Archive (CTA) in Production: ATLAS migration report and next steps

29 June 2020
30th IT Technical Users Meeting (ITUM–30), CERN, Geneva, Switzerland

The ATLAS experiment is migrating from CASTOR to CTA during the week of 22 June 2020. This marks the end of the CASTOR ATLAS service and the first production instance of CTA. This talk reports on the migration of ATLAS and gives a status update on the migration of the other experiments.

Migrating from CASTOR to the new CERN Tape Archive (CTA)

17 February 2020
29th IT Technical Users Meeting (ITUM–29), CERN, Geneva, Switzerland

The CERN Tape Archive is the tape back-end to EOS and the replacement for CASTOR. CTA will go into production service in March 2020 and all LHC experiments will be migrated to the new system by the end of LS2. This talk outlines the plans for the migration from CASTOR to CTA.

EOS+CTA: Adding tape storage capability to EOS

3 February 2020
4th EOS Workshop, CERN, Geneva, Switzerland

The CERN Tape Archive (CTA) is the tape back-end to EOS. Configuring EOS to work with CTA allows event-based triggering of tape archivals and retrievals. As well as controlling the tape hardware (libraries and drives), CTA provides an advanced queue manager and scheduler to manage how and when tapes will be mounted, to optimise the use of the tape infrastructure. This presentation will provide an overview of CTA's features and how they integrate with EOS.

2019

CTA Deployment and Migration from CASTOR

11 December 2019
Grid Deployment Board (GDB), CERN, Geneva, Switzerland

CTA production status and migration plan.

Current Status of Tape Storage at CERN

17 October 2019
HEPiX Fall/Autumn 2019 Workshop, Amsterdam, The Netherlands

The IT storage group at CERN provides tape storage to its users in the form of three services, namely TSM, CASTOR and CTA. Both TSM and CASTOR have been running for several decades whereas CTA is currently being deployed for the very first time. This deployment is for the LHC experiments starting with ATLAS this year. This contribution describes the current status of tape storage at CERN and expands on the strategy and architecture of the current deployment of CTA.

Why do we still use Tape at CERN?

French version: Pourquoi utilisons-nous encore les bandes magnétiques au CERN ?
14–15 September 2019
CERN Open Days, CERN, Switzerland

The CERN Tape Archive: Preparing for the Exabyte Storage Era

14 June 2019
Mini-symposium on the Exabyte Data Challenge
6th Platform for Advanced Scientific Computing Conference (PASC 2019), Zürich, Switzerland

The High Energy Physics experiments at CERN generate a deluge of data which must be efficiently archived for later retrieval and analysis. During the first two Runs of the LHC (2009–2018), over 250 PB of physics data was collected and archived to tape. CERN is facing two main challenges for archival storage over the next decade. First, the rate of data taking and the total volume of data will increase exponentially due to improvements in the luminosity and availability of the LHC and upgrades to the detectors and data acquisition system. Data archival is expected to reach 150 PB/year during Run–3 (2021–2023), increasing to 400 PB/year during Run–4 (2025–). The integrated total data on tape will exceed one Exabyte within a few years from now. Second, constraints in available computing power and disk capacity will change the way in which archival storage is used by the experiments. This presentation will describe these challenges and outline the preparations that the CERN IT Storage Group are making to prepare for the Exabyte storage era.

2018

CERN Tape Archive Initial Deployments

12 October 2018
HEPiX Fall/Autumn 2018 Workshop, Barcelona, Spain

CTA is designed to replace CASTOR as the CERN Tape Archive solution, in order to face the scalability and performance challenges arriving with LHC Run–3. This presentation will give an overview of the initial software deployment on production-grade infrastructure. We discuss its performance against various workloads, from artificial stress tests to production-condition data transfer sessions with an LHC experiment. It will also cover CTA's recent participation in the Heavy Ion Data Challenge and a roadmap for future deployments.

CERN Tape Archive (CTA)—from Development to Production Deployment

9 July 2018
23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018), Sofia, Bulgaria

The first production version of the CERN Tape Archive (CTA) software is planned for release by the end of 2018. CTA is designed to replace CASTOR as the CERN tape archive solution, in order to face the scalability and performance challenges arriving with LHC Run-3. This contribution will describe the main commonalities and differences between CTA and CASTOR. We outline the functional enhancements and integration steps required to add the CTA tape back-end to an EOS disk storage system. We present and discuss the different deployment and migration scenarios for replacing the five CASTOR instances at CERN, including a description of how FTS will interface with EOS and CTA.

2017

The Outlook for Archival Storage at CERN

19 October 2017
HEPiX Fall/Autumn 2017 Workshop, Tsukuba, Japan

The CERN Physics Archive is projected to reach 1 Exabyte during LHC Run 3. As the custodial copy of the data archive is stored on magnetic tape, it is very important to CERN to predict the future of tape as a storage medium. This talk will give an overview of recent developments in tape storage, and a look forward to how the archival storage market may develop over the next decade. The presentation will include a status update on the new CERN Tape Archive software.

CTA—CERN Tape Archive

8 February 2017
Grid Deployment Board (GDB), CERN, Geneva, Switzerland

What, Why and When; CTA and the Tier–1s

2016

CTA: CERN (CASTOR) Tape Archive Rationale and Status

22 January 2016
Storage Development Workshop, CERN, Geneva, Switzerland

What is CTA? Requirements of a tape storage system. Architecture of CTA. Rationale and status. Prototype/proof of concept.

Email

CTA Support

Address

CERN
Esplanade des Particules 1
Geneva, 1211
Switzerland