This story appeared today in iSGTW.
Amid all the hype and excitement of the new physics being announced from experiments at the Large Hadron Collider in 2011, there was another, little-known cause for celebration: the anniversary of the Worldwide LHC Computing Grid (WLCG).
It was 10 years ago, in September 2001, that the huge computing grid was conceived and approved by the CERN Council to handle the large volumes of data expected from the LHC. By March 2002, a plan of action had formed.
And, now that the LHC is up and running, “the biggest achievement,” said Ian Bird, the head of the WLCG at CERN, in Geneva, Switzerland, “is that it works so well and so early in the life of LHC.”
Data pours out of each of the four detectors at a ripping pace. The ATLAS detector alone produces about one petabyte per second (or 1,000,000 GB per second). A farm of processors pares back the data, filtering out the majority of it, until 300 MB per second is chosen to be stored on the grid. One copy of the data is kept at CERN (Tier 0), while another copy is transferred and shared among the 11 major computing centers (Tier 1).
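To put those figures in perspective, the arithmetic can be sketched in a few lines of Python. This is a rough illustration only: the two rates are the ones quoted above, the function names are made up for this example, and the daily total assumes, hypothetically, that data is stored at the full rate around the clock, which is not how an accelerator actually runs.

```python
# Rates quoted in the article, in bytes per second (decimal units, 1 PB = 1,000,000 GB).
RAW_RATE = 10**15           # about one petabyte per second from the ATLAS detector
STORED_RATE = 300 * 10**6   # 300 MB per second kept after filtering

SECONDS_PER_DAY = 24 * 60 * 60


def reduction_factor(raw_rate, stored_rate):
    """How many times smaller the stored stream is than the raw stream."""
    return raw_rate / stored_rate


def stored_terabytes_per_day(stored_rate):
    """Terabytes written per day, ASSUMING continuous running (illustrative only)."""
    return stored_rate * SECONDS_PER_DAY / 10**12


if __name__ == "__main__":
    print(f"Reduction factor: about {reduction_factor(RAW_RATE, STORED_RATE):,.0f} to 1")
    print(f"Stored per day (if running nonstop): {stored_terabytes_per_day(STORED_RATE):.2f} TB")
```

In other words, roughly one byte in every three million survives the filtering, and even that trickle would add up to about 26 TB for every full day of running.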
“The most amazing thing is that we can actually handle this kind of data,” said Bird. “Data rates today are much higher than anything we ever planned for during a normal year of data taking.”
As well as the sheer volume of data, the WLCG has also faced the unique challenges of computationally intensive simulations, and the fact that the 8,000 or so physicists involved in the projects must be able to access the data from their home institutions around the world.
“The grid is pretty much the only way that the masses of data produced by the collider can be processed. Without it, the LHC would be an elaborate performance art project on the French-Swiss border,” wrote Geoff Brumfiel in a Nature blog, after his valiant attempt to follow a single piece of data through the grid (“Down the Petabyte Highway,” published January 2011).
The WLCG is successfully handling the data today. But 10 years ago, with preparations for the Large Hadron Collider well underway, there was a hole in the funding bucket: the computing resources required to handle the avalanche of LHC data had been left behind as preparations were made for the collider.
A hole in the funding bucket
“Computing wasn’t included in the original costs of the LHC,” Les Robertson, who was the head of the computing grid from 2002 to 2008, told iSGTW in 2008.
This decision left a big hole in funding for IT crucial to the ultimate success of the LHC. “We clearly required computing,” said Robertson, “but the original idea was that it could be handled by other people.”
But by 2001, these “other people” had not stepped forward. “There was no funding at CERN or elsewhere,” Robertson said. “A single organization could never find the money to do it.”
“Early on, it became evident that, for various reasons, placing all of the computing and storage power at CERN to satisfy all the [computing] needs would not be possible. First, the infrastructure of the CERN computing facility could not scale to the required level without significant investments in a power and cooling infrastructure many times larger than what was available at the time,” Ian Bird said.
And in 2001, “CERN’s dramatic advance in computing capacity is urgent,” the press release read.
A dramatic advance in computing capacity
There were two phases to the WLCG. From 2002 to 2005, staff at CERN and collaborating institutes around the world developed prototypes, which would eventually be incorporated into the system. Then, from 2006, the LHC Computing Grid formally became a worldwide collaboration (the Worldwide LCG, or WLCG), and computing centers around the world were connected to CERN to help store data and provide computing power. Throughout its lifetime, the WLCG has worked closely with large-scale grid projects such as EGEE (Enabling Grids for E-sciencE) and, more recently, EGI (European Grid Initiative), both funded by the European Commission, and OSG (Open Science Grid), funded by the National Science Foundation in the USA. Today, EGI and OSG support not only high-energy physics but also a variety of other science experiments and simulations.
Using the grid for real computation began as early as 2003, with the experiments using it to run simulations. And since 2004 a series of data and service challenges has been performed (see timeline, below) to test things such as the reliability of data transfers.
“The grid’s performance during the first two years of LHC running has been impressive and has enabled very rapid production of physics results,” Bird said.
Data flow and hybrid clouds
The first model of distributed computing, proposed in 1999 and called the MONARC model, was the one on which all the experiments originally based their own computing models. But this model was much more complicated than it has to be today, according to Bird. The complexity was added because it was thought that the weakest link in the chain would be the networks linking together all the computing centers and allowing for fast, reliable data transfer. Today, computing models are more likely to see extensive data flows between Tier 2 and Tier 3 centers.
As well as a new model for data flow, other future challenges include the use of multicore and other CPU types, the replacement of certain components of grid middleware with more standard software and the use of virtualization. And the use of cloud computing is a matter of “when, not if,” Bird said.
“The LHC computing environment will outlive the accelerator itself, but it will evolve along with technology and is likely to become very different over the next few years,” Bird said.