Meyrin, Switzerland, sits serenely near the Swiss-French border, surrounded by green fields and the beautiful Rhône river. But a hundred meters beneath the surface, protons traveling at nearly the speed of light collide and create spectacular displays of subatomic fireworks inside the experimental detectors of the Large Hadron Collider at CERN, the European particle physics laboratory.
One detector, called ATLAS, is five stories tall and has the largest volume of any particle detector in the world. It captures the trajectories of particles from collisions that happen a billion times a second and measures their energy and momentum. Those collisions produce incredible amounts of data for researchers to scour in search of evidence of new physics. For decades, ATLAS scientists have been refining ways to archive their analyses of that data so these rich datasets can be reused and reinterpreted.
Twenty years ago, during a panel discussion at CERN’s First Workshop on Confidence Limits, participants unanimously agreed to start publishing likelihood functions with their experimental results. These functions are essential to particle physics research because they encode all the information physicists need to statistically analyze their data through the lens of a particular hypothesis. This includes allowing them to distinguish signal (interesting events that may be clues to new physics) from background (everything else) and to quantify the significance of a result.
As it turns out, though, getting a room full of particle physicists to agree to publish this information was the easiest part.
In fact, it was not until 2020 that ATLAS researchers actually started publishing likelihood functions along with their experimental results. These “open likelihoods” are freely available on the open-access site HEPData as part of a push to make LHC results more transparent and available to the wider community.
“One of my goals in physics is to try and make it more accessible,” says Giordon Stark, a postdoctoral researcher at the University of California, Santa Cruz, who is on the development team for the open-source software used to publish the likelihood functions.
The US Department of Energy's Office of Science and the National Science Foundation support US involvement in the ATLAS experiment.
Stark says releasing the full likelihoods is a good step toward his goal.
The problem with randomness
Why are likelihoods so essential? Because particle collision experiments are inherently random. Unlike in a deterministic experiment, where a researcher does “x” and expects “y” to happen, in a random experiment (like throwing dice or colliding beams of protons), a researcher can do “x” the same way every time but can only predict the random outcome probabilistically.
Because of the inherent randomness of particle interactions in the ATLAS detector, physicists need to construct what is called a “probability model” to mathematically describe the experiment and form meaningful conclusions about how the resulting data relate to a theory.
The probability model is a mathematical representation of all the possible outcomes. It’s represented by the expression p(x|θ): the probability “p” of obtaining data “x,” given the parameters “θ.”
The data are observations from the ATLAS detector, while the parameters are everything influencing the system, from the laws of physics to the calibration constants of the detector. A few of these parameters are central to a physicist’s model (they’re called “parameters of interest”—things like the mass of the Higgs boson), but hundreds of other “nuisance parameters” (things like detector responses, calibration constants and the behavior of the particles themselves) also need to be taken into account.
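As a rough sketch (using illustrative notation rather than the article's own), a binned LHC-style probability model with a single parameter of interest often takes a form like the one below, where μ is a signal strength, ν collects the nuisance parameters, n_b is the observed count in bin b, s_b and b_b are the expected signal and background yields, and the c_j are constraint terms tying the nuisance parameters to auxiliary measurements a_j:

    p(x \mid \mu, \nu) \;=\; \prod_{b \,\in\, \text{bins}} \mathrm{Pois}\!\left( n_b \,\middle|\, \mu\, s_b(\nu) + b_b(\nu) \right) \times \prod_{j} c_j\!\left( a_j \mid \nu_j \right)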
When the experimentally observed data are plugged into the probability model, it becomes a likelihood function: a function of the parameters alone, which tells physicists which values of those parameters best describe the observed data.
Importantly, the process answers the question of how likely it would be for a physicist’s theory to have produced the data they observe.
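As a toy illustration (with made-up numbers, standing in for the far more elaborate machinery of a real analysis), here is what "plug in the observed data, then find the parameter values that describe them best" looks like for a single-bin counting experiment in Python:

    import math

    # Toy one-bin counting experiment: an expected background b, an expected
    # signal of mu * s events, and an observed count (all numbers made up).
    s, b = 10.0, 50.0
    n_observed = 58

    def likelihood(mu, n):
        """Poisson probability of observing n events for signal strength mu."""
        expected = mu * s + b
        return math.exp(-expected) * expected**n / math.factorial(n)

    # Plugging the observed count into the probability model turns it into a
    # function of the parameter mu alone: the likelihood. A brute-force scan
    # then finds the value of mu that best describes the observed data.
    best_mu = max((i / 100 for i in range(301)),
                  key=lambda mu: likelihood(mu, n_observed))
    print(best_mu)  # prints 0.8, the best-fit signal strength for these numbers

A real ATLAS likelihood does the same job with hundreds of bins and parameters, which is why dedicated software is needed to build, maximize and share it.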
A new tool comes to the rescue
When you consider the hundreds of parameters in an ATLAS analysis, each with their respective uncertainties, along with the layers of functions relating the parameters to each other, calculating the likelihoods gets pretty complicated—and so does presenting them. While likelihoods for one or two parameters can be plotted on a graph, this clearly isn’t possible when there are hundreds of them—making the question of how to publish the likelihoods much more challenging than whether this should be done.
In 2011, ATLAS researchers Kyle Cranmer, Wouter Verkerke and their team released two tools to help with this. One, called the RooFit Workspace, allowed researchers to save their likelihoods in a digital file. The other, called HistFactory, made it easier for users to construct a likelihood function for their theory. Since then, the HistFactory concept has evolved into an open-source software package, spearheaded by Stark and fellow physicists Matthew Feickert and Lukas Heinrich, called pyhf [pronounced in three syllables: py h f].
Cranmer says it’s important to understand that pyhf isn’t some magical black box where you put data in and get a likelihood out. Researchers need to make lots of decisions in the process, and “every little bit of that likelihood function should be tied to part of the justification that you have for it and the story that you’re telling as a scientist,” he says.
pyhf encodes those decisions and exports the probability model in a plain-text, easy-to-read format called JSON that can be read across a range of platforms, making it easier for other researchers to access the likelihood function and see how the analysis was done.
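To sketch the idea (with made-up numbers, and simplified relative to the full workspace files ATLAS actually publishes), pyhf's helper functions can build a small model whose specification is an ordinary dictionary, and serializing that dictionary to JSON yields the plain-text description another researcher can load back:

    import json
    import pyhf

    # A toy one-channel model: expected signal, expected background and the
    # background uncertainty in each bin (numbers made up for illustration).
    model = pyhf.simplemodels.uncorrelated_background(
        signal=[5.0, 10.0], bkg=[50.0, 60.0], bkg_uncertainty=[7.0, 8.0]
    )

    # The model's specification is a plain dictionary of channels, samples and
    # modifiers; writing it out as JSON gives a human-readable text file.
    with open("model_spec.json", "w") as spec_file:  # hypothetical file name
        json.dump(model.spec, spec_file, indent=2)

    # Reading the JSON back reconstructs the same probability model.
    with open("model_spec.json") as spec_file:
        rebuilt = pyhf.Model(json.load(spec_file))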
“The important part is that it’s readable,” says Cranmer. “The stuff you’re reading is not some random, technical gobbledygook. It’s tied to how a physicist thinks about the analysis.”
Making old data do new work
Before the RooFit Workspace came along, the thousands of researchers involved in the ATLAS collaboration had no standardized way to format and store their likelihood functions. Much of the meticulous data analysis was done by PhD students who eventually graduated and left for new positions, taking their intimate familiarity with likelihood construction along with them.
Without the full likelihood function, it is impossible to reproduce or reinterpret published ATLAS results without resorting to potentially crude approximations. But with the layers of rich metadata embedded in the pyhf likelihoods, including background estimates, systematic uncertainties and observed data counts from the detector, scientists have everything they need to mathematically reconstruct the analysis. This allows them to reproduce and reinterpret previously published results without repeating the time-consuming and expensive process of analyzing the data from scratch.
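In practice, reuse can be as simple as downloading a published workspace file from HEPData and handing it to pyhf. The sketch below assumes a hypothetical file name and shows one typical statistical test; the details differ from analysis to analysis:

    import json
    import pyhf

    # Load a published likelihood: a pyhf workspace in JSON format, downloaded
    # from HEPData (the file name here is hypothetical).
    with open("published_workspace.json") as workspace_file:
        workspace = pyhf.Workspace(json.load(workspace_file))

    # Rebuild the probability model and the observed data exactly as the
    # original analysis defined them.
    model = workspace.model()
    data = workspace.data(model)

    # Re-run a standard hypothesis test, here for signal strength mu = 1,
    # without redoing the original event-level analysis.
    cls_observed = pyhf.infer.hypotest(1.0, data, model, test_stat="qtilde")
    print(cls_observed)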
Public likelihoods also provide fantastic opportunities for reinterpretation by theorists, says Sabine Kraml, a theoretical physicist at the Laboratory of Subatomic Physics and Cosmology in Grenoble, France, who has been involved with helping establish how LHC data, including the likelihoods, should be presented.
With full likelihood functions, theorists can calculate how well their theories fit the data collected by the detector “at a completely different level of reliability and precision,” says Kraml.
To understand just how much more sophisticated and complex the analysis becomes, she says, consider the difference between a simple song and a full orchestral symphony.
Although this precise model-fitting is limited to theories that share the same statistical model as the one originally tested by the experiment—“It’s a restricted playground,” Cranmer says—there is a work-around. Full likelihoods can be put through an additional round of processing called recasting, using a service Cranmer proposed called RECAST, which generates a new likelihood function in the context of a physicist’s theory. Armed with this new likelihood, scientists can test their theories against existing ATLAS data, searching for new physics in old datasets.
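RECAST itself is a broader analysis-preservation service, but when the published likelihood is in pyhf's JSON format, one concrete way a new signal hypothesis gets slotted into an existing analysis is through a published "signal patch." The sketch below is a loose illustration with hypothetical file and patch names:

    import json
    import pyhf

    # A published background-only workspace plus a set of signal patches, as
    # released alongside some ATLAS searches (file names hypothetical).
    with open("bkgonly_workspace.json") as workspace_file:
        background_only = json.load(workspace_file)
    with open("patchset.json") as patchset_file:
        patchset = pyhf.PatchSet(json.load(patchset_file))

    # Applying one of the patches swaps a specific signal model into the
    # otherwise unchanged analysis (the patch name here is invented).
    patched_spec = patchset.apply(background_only, "signal_point_600_280")
    workspace = pyhf.Workspace(patched_spec)

    # Test the new signal hypothesis against the existing ATLAS data.
    model = workspace.model()
    data = workspace.data(model)
    print(pyhf.infer.hypotest(1.0, data, model, test_stat="qtilde"))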
So far, two ATLAS searches have been repurposed using RECAST. One used a dark-matter search to study a Higgs boson decaying to bottom quarks. The other used a search for displaced hadronic jets to look at three new physics models.
Cranmer says he hopes the ATLAS experimental community will continue to publish their likelihoods and take advantage of RECAST so the wider scientific community can test more and more theories.