Bright lights, big data: how Argonne is bringing supercomputing and X-rays together for scientific breakthroughs
With the combination of Argonne’s supercomputer, Polaris, and the powerful X-ray beams of the Advanced Photon Source, the future of science is ultrafast
April 13, 2023 | Argonne National Laboratory
By Andre Salles
The Advanced Photon Source (APS) is one of the most productive X-ray light sources in the world. In a typical year, roughly 5,500 scientists from around the world use the ultrabright light beams it generates to look deep into all kinds of materials. Researchers using the APS are able to catch the movement of single ions through a battery, trace even the subtlest changes in catalysts as they react and observe the makeup of proteins atom by atom.
But for those researchers, what their various X-ray techniques detect is only half the story. That data then needs to be analyzed, and for that, scientists need advanced computing. That’s what the Argonne Leadership Computing Facility (ALCF) provides, with its array of powerful analysis machines. The APS and ALCF are both Department of Energy (DOE) Office of Science user facilities at DOE’s Argonne National Laboratory, and both are open to the world’s scientific community.
“We are creating a new generation of smart instruments, in which advanced computing is not just an important adjunct to experiment, but an integral part of the scientific apparatus. We expect the new computationally enhanced APS to enable new discoveries in many domains.”
–Ian Foster, Argonne National Laboratory
This marriage of experimental science and data analysis leads to faster breakthroughs in many scientific areas, from drug discovery to materials science. But as fast as the combined power of these facilities is — and it is incredibly fast, even compared with the speeds of a few years ago — the goal is always to get faster. To speed up scientific discovery, the APS must capture more data more quickly, and the ALCF must process and return that data at lightning speed.
To that end, the APS is undergoing an extensive upgrade that will increase the brightness of its X-ray beams by up to 500 times. When the upgraded APS goes online in 2024, it will enable scientists to see things at scales we can scarcely imagine, and to capture data at far faster rates. The ALCF, meanwhile, is deploying more powerful supercomputers and enhanced data transfer capabilities to allow researchers to see the analysis of their data more quickly.
In August 2022, the ALCF unveiled Polaris, the latest supercomputer to join its ranks. Built by Hewlett Packard Enterprise, Polaris is the ALCF’s most powerful system yet. Since its arrival, Polaris has been helping to enable science across the Argonne campus and at other institutions, carrying out complex data analysis tasks in a fraction of the time the ALCF’s previous systems were capable of.
“Polaris is here to serve the scientific community, including the thousands of scientists using Argonne’s user facilities,” said Michael Papka, director of the ALCF and a deputy associate laboratory director at Argonne. “We’ve dedicated four racks of nodes to look at the integration of experimental science and high performance computing, and we’re excited about how we can implement the capabilities of Polaris across many DOE user facilities.”
One of those facilities is the APS, and leading the charge there is Nicholas Schwarz, principal computer scientist and group leader. Schwarz has been working for months with his colleagues at the ALCF to test out faster data processing with Polaris. The eventual goal, he says, is real-time autonomous data analysis that scientists can use to drive their experiments.
Imagine, he says, that you are trying to use X-rays to trace microscopic cracks as they form in a new type of material. You want to be able to not only identify where the cracks are, but quickly train the X-ray instruments on the likely spot where they will form next, to see how the material behaves. Getting your data back in seconds instead of hours will enable you to change the experiment as it’s running, to get the most and best observations possible.
“The computing needs to be ready for the science,” Schwarz said. “You can’t tell a material to stop cracking or a cell to stop dividing until computing resources are ready. You can’t wait months to get analyzed data back.”
Fast turnaround of this kind is exactly what Schwarz and his colleagues at the ALCF have been testing with Polaris. Using data from four different X-ray techniques, all of which will be greatly enhanced by the upgraded APS, the team has been working on using Polaris to respond to urgent scientific data requests and turn them around without delay.
This sounds simple, but Bill Allcock, director of operations at the ALCF, will tell you that it’s more complicated than it seems. The ALCF serves many different facilities and scientific endeavors at once, and scheduling computing time on the machine is a constantly shifting proposition. Among the biggest issues the ALCF team is working out with Polaris is preemption, or recognizing which projects are more urgent than others and moving them to the front of the queue.
“We need near-real-time analysis for deadline-sensitive jobs, which conflicts with our traditional workload of large, long running jobs,” Allcock said. “To manage that conflict efficiently, we need to look at the available rack space and find out where those jobs fit. It’s like playing Tetris. With proper scheduling we can keep the racks busy and still make room when jobs show up that need the time quickly.”
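The scheduling problem Allcock describes can be illustrated with a toy model. The sketch below is a minimal illustration only, not how the ALCF's scheduler works: the `Job` and `Cluster` classes, the node counts and the "preempt the largest job first" policy are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    name: str
    nodes: int            # number of nodes the job needs
    urgent: bool = False  # deadline-sensitive, e.g. analysis for a live experiment


@dataclass
class Cluster:
    """Toy machine with a fixed number of nodes (hypothetical model)."""
    total_nodes: int
    running: List[Job] = field(default_factory=list)
    waiting: List[Job] = field(default_factory=list)

    def free_nodes(self) -> int:
        return self.total_nodes - sum(j.nodes for j in self.running)

    def submit(self, job: Job) -> List[Job]:
        """Start the job if it fits; return the jobs preempted to make room."""
        preempted: List[Job] = []
        if job.urgent:
            # Only non-urgent running jobs may be preempted.
            victims = sorted((j for j in self.running if not j.urgent),
                             key=lambda j: j.nodes)
            reclaimable = sum(j.nodes for j in victims)
            # Preempt only if doing so actually makes the urgent job fit.
            if self.free_nodes() + reclaimable >= job.nodes:
                while self.free_nodes() < job.nodes:
                    victim = victims.pop()          # largest job first (assumed policy)
                    self.running.remove(victim)
                    self.waiting.insert(0, victim)  # requeue at the front
                    preempted.append(victim)
        if self.free_nodes() >= job.nodes:
            self.running.append(job)
        else:
            self.waiting.append(job)
        return preempted

    def finish(self, job: Job) -> None:
        """Remove a finished job, then start any waiting jobs that now fit."""
        self.running.remove(job)
        for queued in list(self.waiting):
            if queued.nodes <= self.free_nodes():
                self.waiting.remove(queued)
                self.running.append(queued)
```

In this model, submitting an urgent 4-node job to a full 10-node machine pushes a long-running 8-node job back to the front of the waiting list, and that job resumes as soon as the urgent one finishes. A production scheduler would also have to weigh checkpointing costs, fairness and how long each job has already run.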
The team recently finished its first fully automated end-to-end test of the preemptible queues on Polaris using data collected during an APS experiment. The process relies on Globus, a research automation platform created by researchers at Argonne and the University of Chicago, to run the computational flows that link the two facilities. Globus manages the numerous high-speed data transfers, the ALCF computations, and the data cataloging and distribution steps involved in an experiment.
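The idea of a flow that chains transfer, computation and cataloging steps with no human in the loop can be sketched in a few lines. This is a generic illustration and does not use or represent the Globus API: `run_flow`, the three step functions and the `frames` data are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Tuple[str, Callable[[State], State]]


def transfer(state: State) -> State:
    # Stand-in for a high-speed transfer from the beamline to the computing facility.
    return {**state, "location": "compute-facility"}


def analyze(state: State) -> State:
    # Stand-in for the analysis job: here, just the mean of the raw values.
    frames = state["frames"]
    return {**state, "result": sum(frames) / len(frames)}


def catalog(state: State) -> State:
    # Stand-in for recording the result so collaborators can find it.
    return {**state, "cataloged": True}


FLOW: List[Step] = [
    ("transfer", transfer),
    ("analyze", analyze),
    ("catalog", catalog),
]


def run_flow(steps: List[Step], state: State) -> Tuple[State, List[str]]:
    """Run each step in order, feeding the output of one into the next."""
    completed: List[str] = []
    for name, action in steps:
        state = action(state)
        completed.append(name)
    return state, completed
```

Calling `run_flow(FLOW, {"frames": [1.0, 2.0, 3.0], "location": "beamline"})` runs all three steps in sequence and returns the final state along with the list of completed step names. A real flow would also need error handling, retries and authentication at each step.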
“Earlier this year, we successfully carried out our first experimental runs with no humans in the loop,” Papka said. “This is truly the vision we have been working toward, and now through a colossal effort by the Argonne-Globus team, we have it working beyond the one-off demonstrations of the past. The goal is to enable this at as many APS experiment stations as possible ahead of the upgrade.”
Though a powerful machine in its own right, Polaris also serves as the next step on the road that leads to Aurora, Argonne’s first exascale supercomputer. Aurora is currently being installed at the ALCF, and when it is complete, its processing capabilities will dwarf those of Polaris — it will be able to deliver more than 2 billion billion calculations per second.
Aurora is scheduled to come online later this year. Meanwhile, the upgraded APS will shine its first light in 2024, after a year-long installation period during which the X-ray beams will be shut down. The goal of teams at both the APS and the ALCF is to deliver as much science and data-processing speed as possible on the first day the upgraded APS is online.
But both teams know that the capabilities of their combined efforts will only grow from there.
“For APS users, our goal is to have all the computing power for them when they need it, on demand,” Schwarz said. “From a scientist’s perspective, the bottleneck is not having analyzed data when the experiment needs it. For the ALCF, this is a capability that can be emulated for other user facilities, and can serve as a model for how experimental and observational facilities integrate with computing centers.”