In 1801 the orbit of the newly discovered asteroid Ceres carried it behind the sun, and astronomers worried they had lost it forever. A young mathematical prodigy named Carl Friedrich Gauss developed a new statistical technique to find it. Called “least squares regression,” that technique is now a fundamental method of statistical analysis.
For about 200 years after that, however, astronomers and statisticians had little to do with one another. But in the last decade or so, astronomy and statistics have finally begun to formalize a promising relationship. Together they are developing the new discipline of astrostatistics.
Jogesh Babu, a Pennsylvania State University professor and the director of the Penn State Center for Astrostatistics, remembers when the new age of astrostatistics dawned for him. Twenty-five years ago, when Babu’s focus was statistical theory, astronomy professor Eric Feigelson asked to meet with him to talk about a problem. At the end of the conversation, Babu says, “we realized we both speak English but we didn’t understand a word the other said.”
To address that disconnect, the statistician and the astrophysicist organized a continuing series of conferences at Penn State. They also wrote a book, Astrostatistics, which effectively christened the new field. But collaborations between astrophysicists and statisticians remained small and scattered, only really starting to pick up in 2006, says Babu.
“The development of statistical techniques useful to advanced astronomical research progressed very slowly, and until recently most all analyses had to be done by hand,” says Joseph Hilbe, a statistics professor at Arizona State University. Before the advent of computers with sufficient capacity to do the work, certain useful calculations could take statisticians weeks or months to complete, he says.
In addition, says Tom Loredo, an astrostatistician at Cornell University, “astrophysicists are some of the more mathematically literate scientists, and we thought we could do it on our own.”
Other fields had already embraced statistics. Statistics is vital to geology and to all branches of biology, especially epidemiology, medical research, and public health. In fact, in the 1990s Hilbe developed some of the first advanced statistical tools used to analyze Medicare data. Statisticians also contribute to the social sciences, economics, and the environmental and ecological sciences, as well as to the insurance and risk analysis industries.
Slowly, though, astronomers began to realize that they might be able to benefit from the expert help of a statistician.
“I believe the large surveys shocked astronomers with how much data there is,” Hilbe says. “The Sloan Digital Sky Survey [one of the first digitized comprehensive astronomical sky surveys] told them they needed statistics.”
Although he was aware of Babu and Feigelson’s nascent community, Hilbe decided to go bigger. In 2008 he founded the International Statistical Institute’s Astrostatistics Interest Group, the first astrostatistics interest group or committee authorized by an astronomical or statistical association. The formation of working groups within the American Astronomical Society and the International Astronomical Union followed in 2012. In the same year Hilbe was elected the first president of the newly formed International Astrostatistical Association.
All told, about 700 scientists belong to the various groups, which have been gathered together under the umbrella of the Astrostatistics and Astroinformatics Portal, which is hosted by Penn State and co-edited by Feigelson and Hilbe. The International Astrostatistical Association also sponsors the new Cosmostatistics Consortium.
One of the recently formed groups is the LSST Informatics and Statistical Science collaboration, organized in preparation for the Large Synoptic Survey Telescope, which, beginning in 2022, will photograph the entire southern sky every three days for 10 years. Babu and his collaborator Feigelson are members, as is Loredo.
“One of the virtues of big data is that it gives you access to rare events,” Loredo says. He likens it to sifting through the trillions of bytes of Large Hadron Collider data to find a handful of Higgs bosons.
“Now that we have a billion galaxies, what are the rare events that we wouldn’t ever see in only a million galaxies? Studying those will require statistical methods that are as good with small data sets as with big data sets.”