This story appeared Dec. 21 in iSGTW.
This is the first part of a two-part series on the contribution Tevatron-related computing has made to the world of computing. This part begins in 1981, when the Tevatron was under construction, and brings us up to recent times. The second part will focus on the most recent years, and look ahead to future analysis.
Few laypeople think of computing innovation in connection with the Tevatron particle accelerator, which shut down earlier this year. Mention of the Tevatron inspires images of majestic machinery, or thoughts of immense energies and groundbreaking physics research, not circuit boards, hardware, networks, and software.
Yet over the course of more than three decades of planning and operation, a tremendous amount of computing innovation was necessary to keep the data flowing and physics results coming. In fact, computing continues to do its work. Although the proton and antiproton beams no longer brighten the Tevatron’s tunnel, physicists expect to be using computing to continue analyzing a vast quantity of collected data for several years to come.
When all that data is analyzed, when all the physics results are published, the Tevatron will leave behind an enduring legacy. Not just a physics legacy, but also a computing legacy.
In the beginning: The fixed-target experiments
1981. The first Indiana Jones movie is released. Ronald Reagan is the U.S. President. Prince Charles makes Diana a Princess. And the first personal computers are introduced by IBM, setting the stage for a burst of computing innovation.
Meanwhile, at the Fermi National Accelerator Laboratory in Batavia, Illinois, the Tevatron has been under development for two years. In 1982, the Advanced Computer Program (ACP) was formed to confront key particle physics computing problems. ACP tried something new in high-performance computing: building custom systems using commercial components, which were rapidly dropping in price thanks to the introduction of personal computers. For a fraction of the cost, the resulting 100-node system doubled the processing power of Fermilab’s contemporary mainframe-style supercomputers.
“The use of farms of parallel computers based upon commercially available processors is largely an invention of the ACP,” said Mark Fischler, a Fermilab researcher who was part of the ACP. “This is an innovation which laid the philosophical foundation for the rise of high throughput computing, which is an industry standard in our field.”
The Tevatron fixed-target program, in which protons were accelerated to record-setting speeds before striking a stationary target, launched in 1983 with five separate experiments. When ACP’s system went online in 1986, the experiments were able to rapidly work through an accumulated three years of data in a fraction of that time.
Entering the collider era: Protons and antiprotons and run one
1985. NSFNET (National Science Foundation Network), one of the precursors to the modern Internet, is launched. And the Tevatron’s CDF detector sees its first proton-antiproton collisions, although the Tevatron’s official collider run one won’t begin until 1992.
The experiment’s central computing architecture filtered incoming data by running Fortran-77 algorithms on ACP’s 32-bit processors. But for run one, the experiment needed more powerful computing systems.
By that time, commercial workstation prices had dropped so low that networking them together was simply more cost-effective than a new ACP system. ACP had one more major contribution to make, however: the Cooperative Processes Software.
CPS divided a computational task into a set of processes and distributed them across a processor farm – a collection of networked workstations. Although the term “high throughput computing” was not coined until 1996, CPS fits the HTC mold. As with modern HTC, farms using CPS are not supercomputer replacements. They are designed to be cost-effective platforms for solving specific compute-intensive problems in which each byte of data read requires 500-2000 machine instructions.
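The farm model described above can be illustrated with a short sketch. This is not CPS itself: it is a minimal Python illustration, assuming each event can be processed independently, and the names `split_into_chunks`, `process_chunk`, and `run_farm` are hypothetical. Local worker processes stand in for the networked workstations of a real farm.

```python
from multiprocessing import Pool


def split_into_chunks(events, n_workers):
    """Deal events round-robin into one chunk per worker."""
    return [events[i::n_workers] for i in range(n_workers)]


def process_chunk(chunk):
    """Hypothetical stand-in for compute-heavy per-event work."""
    # Each event is independent of the others, so chunks can be
    # processed on separate nodes without any communication.
    return [sum(i * i for i in range(event)) for event in chunk]


def run_farm(events, n_workers=4):
    """Process all chunks in parallel worker processes and flatten the results."""
    chunks = split_into_chunks(events, n_workers)
    with Pool(processes=n_workers) as pool:
        per_chunk = pool.map(process_chunk, chunks)
    # Results come back grouped by chunk, not in the original event order.
    return [value for chunk in per_chunk for value in chunk]


if __name__ == "__main__":
    print(run_farm(list(range(10)), n_workers=3))
```

Because the work per event is large compared with the data read, the cost of handing chunks to workers stays small relative to the computation, which is the regime the article describes.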
CPS went into production-level use at Fermilab in 1989; by 1992 it was being used by nine Fermilab experiments as well as a number of other groups worldwide.
1992 was also the year that the Tevatron’s second detector experiment, DZero, saw its first collisions. DZero launched with 50 traditional compute nodes running in parallel, connected to the detector electronics; the nodes executed filtering software written in Fortran, E-Pascal, and C.
Gearing up for run two
1990. CERN’s Tim Berners-Lee launches the first publicly accessible World Wide Web server using his URL and HTML standards. One year later, Linus Torvalds releases Linux to several Usenet newsgroups. And both DZero and CDF begin planning for the Tevatron’s collider run two.
Between the end of collider run one in 1996 and the beginning of run two in 2001, the accelerator and detectors were scheduled for substantial upgrades. Physicists anticipated more particle collisions at higher energies, and multiple interactions that were difficult to analyze and untangle. That translated into managing and storing 20 times the data from run one, and a growing need for computing resources for data analysis.
Enter the Run Two Computing Project (R2CP), in which representatives from both experiments collaborated with Fermilab’s Computing Division to find common solutions in areas ranging from visualization and physics analysis software to data access and storage management.
R2CP officially launched in 1996. It was the early days of the dot-com era. eBay had existed for a year, and Google was still under development. IBM’s Deep Blue defeated chess master Garry Kasparov. And Linux was well-established as a reliable open-source operating system. The stage was set for experiments to get wired and start transferring their irreplaceable data to storage via Ethernet.
“It was a big leap of faith that it could be done over the network rather than putting tapes in a car and driving them from one location to another on the site,” said Stephen Wolbers, head of the scientific computing facilities in Fermilab’s computing sector. He added ruefully, “It seems obvious now.”
The R2CP’s philosophy was to use commercial technologies wherever possible. In the realm of data storage and management, however, none of the existing commercial software met their needs. To fill the gap, teams within the R2CP created Enstore and the Sequential Access Model (SAM, which later stood for Sequential Access through Meta-data). Enstore interfaces with the data tapes stored in automated tape robots, while SAM provides distributed data access and flexible dataset history and management.
By the time the Tevatron’s run two began in 2001, DZero was using both Enstore and SAM, and by 2003, CDF was also up and running on both systems.
Linux comes into play
The R2CP’s PC Farm Project targeted the issue of computing power for data analysis. Between 1997 and 1998, the project team successfully ported CPS and CDF’s analysis software to Linux. To take the next step and deploy the system more widely for CDF, however, they needed their own version of Red Hat Linux. Fermi Linux was born, offering improved security and a customized installer; CDF migrated to the PC Farm model in 1998.
Fermi Linux enjoyed limited adoption outside of Fermilab until 2003, when Red Hat stopped offering a free version of its distribution. The Fermi Linux team rebuilt Red Hat Enterprise Linux into the prototype of Scientific Linux and formed partnerships with colleagues at CERN in Geneva, Switzerland, as well as a number of other institutions. Scientific Linux was designed for site customizations, so that in supporting it they also supported Scientific Linux Fermi and Scientific Linux CERN.
Today, Scientific Linux is ranked 16th among open source operating systems; the latest version was downloaded over 3.5 million times in the first month following its release. It is used at government laboratories, universities, and even corporations all over the world.
“When we started Scientific Linux, we didn’t anticipate such widespread success,” said Connie Sieh, a Fermilab researcher and one of the leads on the Scientific Linux project. “We’re proud, though, that our work allows researchers across so many fields of study to keep on doing their science.”
Grid computing takes over
As both CDF and DZero datasets grew, so did the need for computing power. Dedicated computing farms reconstructed data, and users analyzed it using separate computing systems.
“As we moved into run two, people realized that we just couldn’t scale the system up to larger sizes,” Wolbers said. “We realized that there was really an opportunity here to use the same computer farms that we were using for reconstructing data, for user analysis.”
Today, the concept of opportunistic computing is closely linked to grid computing. But in 1996 the term “grid computing” had yet to be coined. The Condor Project had been developing tools for opportunistic computing since 1988. In 1998, the first Globus Toolkit was released. Experimental grid infrastructures were popping up everywhere. In 2003, Fermilab researchers, led by DZero, partnered with the US Particle Physics Data Grid, the UK’s GridPP, CDF, the Condor team, the Globus team, and others to create the Job and Information Management system, JIM. Combining JIM with SAM resulted in a grid-enabled version of SAM: SAMGrid.
“A pioneering idea of SAMGrid was to use the Condor Match-Making service as a decision making broker for routing of jobs, a concept that was later adopted by other grids,” said Fermilab-based DZero scientist Adam Lyon. “This is an example of the DZero experiment contributing to the development of the core Grid technologies.”
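The brokering idea Lyon describes can be sketched in a few lines. The example below is a simplified, one-sided illustration and not Condor’s actual matchmaking or SAMGrid’s code: the function `match_job`, the resource fields, and the site names are all hypothetical. A job states which resources are acceptable and how it ranks them, and the broker picks the best acceptable site.

```python
def match_job(job, resources):
    """Return the name of the best resource that meets the job's requirements, or None."""
    candidates = [r for r in resources if job["requirements"](r)]
    if not candidates:
        return None
    # Among acceptable resources, choose the one the job ranks highest.
    return max(candidates, key=job["rank"])["name"]


# Hypothetical descriptions of three grid sites.
resources = [
    {"name": "site_a", "free_slots": 0, "memory_gb": 16, "has_dataset": True},
    {"name": "site_b", "free_slots": 12, "memory_gb": 8, "has_dataset": False},
    {"name": "site_c", "free_slots": 4, "memory_gb": 8, "has_dataset": True},
]

# The job needs a free slot and enough memory; it prefers sites that
# already hold its dataset, then sites with more free slots.
job = {
    "requirements": lambda r: r["free_slots"] > 0 and r["memory_gb"] >= 4,
    "rank": lambda r: (r["has_dataset"], r["free_slots"]),
}

print(match_job(job, resources))  # prints "site_c"
```

In this sketch only the job expresses requirements. A fuller matchmaker would also let each resource state which jobs it is willing to accept.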
By April 2003, the SAMGrid prototype was running on six clusters across two continents, setting the stage for the transition to the Open Science Grid in 2006.
From the Tevatron to the LHC - and beyond
Throughout run two, researchers continued to improve the computing infrastructure for both experiments. A number of computing innovations emerged before the run ended in September 2011. Among these was CDF’s GlideCAF, a system that used the Condor glide-in system and Generic Connection Brokering to provide an avenue through which CDF could submit jobs to the Open Science Grid. GlideCAF served as the starting point for the subsequent development of a more generic glidein Workload Management System. Today glideinWMS is used by a wide variety of research projects across diverse research disciplines.
Another notable contribution was the Frontier system, which was originally designed by CDF to distribute data from central databases to numerous clients around the world. Frontier is optimized for applications where there are large numbers of widely distributed clients that read the same data at about the same time. Today, Frontier is used by CMS and ATLAS at the LHC.
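One common way to serve many clients that read the same data at about the same time is to place a cache between them and the central database. The sketch below illustrates that general read-through caching pattern. It is an assumption made for illustration, not a description of Frontier’s implementation, and `ReadThroughCache` and `fake_database` are hypothetical names.

```python
class ReadThroughCache:
    """Answer repeated queries from memory, contacting the database only on a miss."""

    def __init__(self, fetch_from_database):
        self._fetch = fetch_from_database
        self._store = {}
        self.backend_calls = 0

    def get(self, query):
        if query not in self._store:
            # First request for this query: go to the central database once.
            self.backend_calls += 1
            self._store[query] = self._fetch(query)
        return self._store[query]


def fake_database(query):
    """Hypothetical stand-in for a central database lookup."""
    return f"calibration for {query}"


cache = ReadThroughCache(fake_database)
# A thousand clients asking for the same data trigger a single database read.
answers = [cache.get("run 1234") for _ in range(1000)]
print(cache.backend_calls)  # prints 1
```

The more clients share the same query, the more load the cache removes from the central database, which suits the read pattern described above.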
“By the time the Tevatron shut down, DZero was processing collision events in near real-time and CDF was not far behind,” said Patricia McBride, the head of scientific programs in Fermilab’s computing sector. “We’ve come a long way; a few decades ago the fixed-target experiments would wait months before they could conduct the most basic data analysis.”
One of the key outcomes of computing at the Tevatron was the expertise developed at Fermilab over the years. Today, the Fermilab computing sector has become a worldwide leader in scientific computing for particle physics, astrophysics, and other related fields. Some of the field’s top experts worked on computing for the Tevatron. Some have since moved on to work elsewhere, while others remain at Fermilab, where work continues on Tevatron data analysis, a variety of Fermilab experiments, and of course the LHC.
The accomplishments of the many contributors to Tevatron-related computing are noteworthy. But there is a larger picture here.
“Whether in the form of concepts, or software, over the years the Tevatron has exerted an undeniable influence on the field of scientific computing,” said Ruth Pordes, Fermilab’s head of grids and outreach. “We’re very proud of the computing legacy we’ve left behind for the broader world of science.”