By Erica Yee • August 28, 2018

The Science IT Profiles series highlights how the Scientific Computing Group supports the work of Berkeley Lab researchers spanning various disciplines.

Scientists can’t go back in time to witness the origin of the universe, so they do the next best thing: recreate the conditions of the early universe in the laboratory. A few microseconds after the Big Bang, the universe was filled with hot, dense matter called the quark-gluon plasma (QGP). As the universe cooled, the quarks and gluons coalesced into protons and neutrons — the subatomic particles contained in the nuclei of atoms, which make up all ordinary matter in today’s universe. Scientists study this transition by colliding heavy nuclei at CERN’s Large Hadron Collider (LHC). These collisions generate temperatures more than 100,000 times hotter than the center of the sun, causing the protons and neutrons in these nuclei to “melt” and create QGP.

*The conventional way of looking at phases of matter — solid, liquid, gas — can be applied to nuclear matter at various temperatures and densities. (Credit: Peter Jacobs)*

Connecting researchers to data stored around the world

The LHC runs heavy-ion experiments with ALICE for four weeks each year. The huge amount of data produced in that period — around 8 GB stored per second — is made available to ALICE researchers via the Worldwide LHC Computing Grid. About 80 computing facilities at academic and research institutions around the world make up the ALICE Grid. The distributed infrastructure allows multiple copies of data to be kept at different sites, with no single point of failure. Using SCG and National Energy Research Scientific Computing Center (NERSC) resources, Berkeley Lab stores 1.5 PB of the total 63 PB of data in the ALICE file system distributed around the world.

Over half of the Grid’s computing power goes to simulating the heavy-ion collisions ALICE detects. “There's an extensive program to understand the performance of the experiment by doing simulated experiments,” said Porter. “This way we understand what our detector response should be and what our efficiencies are.”

Any participating researcher can request analysis jobs on both simulated and experimental collision data by accessing a virtual centralized database that knows what data are stored where. When a researcher submits a job to the central service, the job is rerouted to run locally at whichever site holds the relevant data. This task queue system eliminates the need for individual sites to figure out which remote site has the information they need.

“In order to do efficient processing of the data, you don't want to be pulling the data around the network. What you do is you send the data out, park it somewhere where you know where it is and have computing that's connected to that storage,” explained Porter. “It’s such a big operation, and almost everything is automated. There are algorithms that say, ‘I’m reading some data. I’m producing some other data. I store a copy locally, but I also want to store it somewhere else on the Grid.’”
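The data-placement pattern Porter describes (route each job to a site that already holds its input, then keep a local copy of the output and replicate it to a second site) can be pictured with a small sketch. The version below is illustrative only: the site names, dataset names, and functions are hypothetical and are not part of the actual ALICE Grid software.

```python
# Illustrative sketch (not ALICE's actual software) of data-aware job routing:
# jobs run where their input data already lives, and new outputs are stored
# locally plus replicated to a second Grid site. All names are hypothetical.
import random

# Central catalog: which sites hold a copy of each dataset.
CATALOG = {
    "PbPb_run_block_042": {"LBNL", "CERN", "GSI"},
    "sim_collisions_batch_007": {"CERN", "KISTI"},
}

ALL_SITES = {"LBNL", "CERN", "GSI", "KISTI"}

def route_job(dataset):
    """Pick a site that already stores the input, so the job's data
    never has to be pulled across the network."""
    holders = CATALOG.get(dataset)
    if not holders:
        raise LookupError(f"no Grid site holds {dataset}")
    return random.choice(sorted(holders))  # simple tie-break among holders

def store_output(dataset, local_site):
    """Keep one copy at the site that produced it and replicate a second
    copy elsewhere, so no single site is a point of failure."""
    replicas = {local_site}
    remote_options = ALL_SITES - replicas
    if remote_options:
        replicas.add(random.choice(sorted(remote_options)))
    CATALOG[dataset] = replicas

# A researcher submits an analysis job against stored heavy-ion data.
site = route_job("PbPb_run_block_042")
store_output("analysis_output_001", site)
print(f"job ran at {site}; output replicas: {sorted(CATALOG['analysis_output_001'])}")
```

The design choice the sketch highlights is the one in Porter's quote: the catalog is central, but the data and the computing stay co-located, so large files are not dragged across the network for every analysis job.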
That data started flowing in when systems at the Lab joined the Grid. "We've seen as much as 58 TB transferred in one day to the Berkeley Lab site as we were ramping up production," said SCG storage architect John White.

Porter works with the SCG team to ensure its computing system is configured correctly to handle jobs and the storage system is operational. “The team is very responsive, and now they’re on the ALICE growth plan,” he said. “We expect to add a significant amount of resources in the next year.”

The size of the Grid grows by around 20 percent annually, usually by adding more computing resources at individual sites. Over the past six years, the total number of jobs processed at any given time has grown from 30,000 to a current peak of 130,000. That number is expected to grow as the demand for scientific computing and storage continues to outstrip available resources.

The Grid runs around the clock every day of the year, with the maintenance and management of the system distributed to the sites. “It is an extremely useful system that everyone is using all the time,” said Porter. “The fact that it's worldwide means you're not tied to any clock-time. There are physicists in China who are submitting jobs at a different time from the physicists in Europe and the U.S.”

The Scientific Computing Group (also known as High Performance Computing Services) under the Science IT Department supports the mission of Lawrence Berkeley National Laboratory by providing technology and consulting support for science and technical programs in the areas of data management, HPC cluster computing, and Cloud services.