NOAA Deploys Globus in R&D labs to accelerate data transfers

May 30, 2024   |  Susan Tussy

Five years ago, the Research and Development (R&D) Group at the National Oceanic and Atmospheric Administration (NOAA) recognized Globus as a good solution for fast, reliable data transfer. They subscribed to the Globus service and began deploying Globus endpoints across their High Performance Computing (HPC) resources. Today, NOAA’s on-premises R&D HPC systems, as well as multiple public cloud storage systems, are accessible via Globus. While NOAA researchers have a choice of tools, they are encouraged to use Globus to take advantage of the parallelism and transfer speeds the service provides. This is particularly helpful for large datasets, such as the inputs needed to initialize models and the model output files needed by a diverse set of users. Additional labs, such as the Global Systems Laboratory (GSL) and its Data Services Group (DSG), are now setting up Globus endpoints as well.

Democratizing Access to Data in the Cloud through Globus

The NOAA Open Data Dissemination (NODD) service was created to give the public free, easy access to NOAA’s high-quality environmental data via the cloud. As a result of NODD, experimental models can be shared with the public. Large data sets, typically 30 TB per session, are transferred from on-premises storage to the cloud for distribution to external collaborators, using the Globus service and the Globus Amazon Web Services (AWS) Simple Storage Service (S3) connector.
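As a rough sketch of how such a session might be staged, the helper below groups files into batches capped at 30 TB. The file names, sizes, and batching logic are illustrative assumptions, not NOAA's actual workflow; in practice each batch would be submitted as a Globus transfer task targeting an S3-backed collection through the AWS S3 connector.

```python
# Sketch: group files into transfer sessions capped at ~30 TB, as described
# above. File names and sizes are hypothetical; in a real workflow each
# batch would become one Globus transfer task to an S3-backed collection.

TB = 1024**4                 # bytes in a tebibyte
SESSION_CAP = 30 * TB        # ~30 TB per session, per the article

def batch_files(files, cap=SESSION_CAP):
    """Split (name, size_bytes) pairs into batches whose total size <= cap."""
    batches, current, current_size = [], [], 0
    for name, size in files:
        if current and current_size + size > cap:
            batches.append(current)
            current, current_size = [], 0
        current.append((name, size))
        current_size += size
    if current:
        batches.append(current)
    return batches

# Hypothetical example: 50 model-output files of 1 TB each.
files = [(f"output_{i:03d}.nc", 1 * TB) for i in range(50)]
sessions = batch_files(files)
print(len(sessions))      # 2 sessions
print(len(sessions[0]))   # 30 files in the first, 20 in the second
```

Capping each batch keeps any single transfer task to a predictable size, which makes retries and monitoring simpler when staging tens of terabytes.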

Multiple entities provide the data to NODD (Image courtesy of NOAA)

Moving large datasets with Globus

NOAA’s operational and experimental weather forecasting models run on regular schedules to keep up with the ever-changing atmosphere: every six hours for global models, and typically every hour for the high-resolution models that detail the evolution of finer-scale weather events. Before the models can run, they must have large volumes of input data: atmospheric observations, radar and satellite measurements, and 3-D data from previous model runs. All of these data must be delivered on time to meet tight HPC job schedules. Likewise, the large output files that provide the model forecast solutions to forecasters, emergency managers, researchers, and others must be delivered as quickly as possible for maximum value.

One particularly large data set, used in a statistical “initial conditions” analysis, comprises the 80 member files of a global ensemble run. These files, roughly 2 GB each, are transferred with Globus four times a day from the NOAA R&D HPC system in West Virginia to another HPC system in Colorado, where NOAA’s next-generation high-resolution weather model is being developed and tested.
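As a back-of-the-envelope sketch, the daily volume of this ensemble transfer works out as follows (the member file-name pattern is a hypothetical placeholder, not NOAA's actual naming; the counts and sizes come from the figures above):

```python
# Sketch of the daily data volume for the ensemble transfer described above:
# 80 ensemble member files, ~2 GB each, moved four times a day.
# The file-name pattern below is hypothetical.

GB = 1024**3
members = [f"ensemble_mem{m:03d}.grib2" for m in range(1, 81)]  # hypothetical names
per_run_bytes = len(members) * 2 * GB
per_day_bytes = per_run_bytes * 4   # the transfer runs four times a day

print(len(members))           # 80 member files per run
print(per_run_bytes // GB)    # 160 GB per run
print(per_day_bytes // GB)    # 640 GB per day
```

At roughly 640 GB moved cross-country every day on a fixed schedule, the parallelism and automatic retries of a managed transfer service matter more than raw single-stream speed.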

Creating a weather-ready nation

Near-surface smoke forecast (Image courtesy of NOAA)

As we continue to experience climate change, NOAA is delivering more services, and making more data and services publicly available, to create a weather-ready nation. For example, the agency plays a vital role in helping federal, state, local, and tribal partners prepare for the threat of wildfires. Its near-surface smoke concentration forecast is one service that is instrumental in assisting people combating wildfires. In this scenario, the Data Services Group acquires the necessary data sets and manages the delivery of graphics files from the HPC systems to GSL’s web servers using Globus.