2024 Improving Scientific Software Conference
- Lee Liming
- Boulder, CO and Virtual
Globus will be giving the following talk at this year’s conference:
Date: April 18, 2024, 11:30 a.m. to 12:00 noon
Title: Modernizing the Earth System Grid Federation’s Data Repositories: Leveraging Institutional Capacity and PaaS
Abstract: The Earth System Grid Federation (ESGF) is an international federation of data repositories for climate model simulation outputs and is part of the World Climate Research Programme (WCRP). More than 35 repositories worldwide, all using a common software base, currently hold 26M+ datasets (~50 PB of data, including replicas). The federation has been active for close to 15 years, and its hosting software was most recently redesigned in 2019. In 2022, international participants, including the U.S. Department of Energy’s ESGF2-US project (Oak Ridge National Laboratory, Lawrence Livermore National Laboratory, Argonne National Laboratory), began a new modernization program. Its key goals are to increase repository scalability in preparation for the next wave of climate simulations in 2025, improve repository sustainability, and broaden access to data analysis capabilities. Its key strategies are to leverage institutional mass storage and computing capabilities (e.g., the Argonne and Oak Ridge Leadership Class Facilities) and to replace self-hosted repository software with broadly available platform-as-a-service (PaaS) offerings (e.g., Amazon Web Services and Globus). Results include significantly increased capacity in both storage and server-side computing environments; a lightweight, open source codebase for data publication workflows and data repositories; and new mechanisms for server-side data analysis. In this talk, we will describe the current and future architecture of ESGF2-US data repositories (authentication, search index, data storage and access, compute capabilities), progress to date and some of the challenges we’ve overcome, key design touchpoints with the international federation (API agreements and open design), and the impact of these changes on future code maintenance and repository sustainability.