If your dataset is so voluminous, arrives at such high velocity, or contains such great variety that you must reimagine the way you use it, you have a big data problem.
Google, for example, has a big data problem. To produce a useful search product, it needs to crawl the entire World Wide Web. That’s a problem of volume. Google needs to update its search index at the speed of the internet. That’s a problem of velocity. Google needs to categorize the near-infinite range of topics covered online. That’s a problem of variety.
Similarly, the leap in the data-collecting capacity of the upcoming Legacy Survey of Space and Time, conducted with the world’s largest digital camera for science at the NSF-DOE Vera C. Rubin Observatory, presents researchers with a big data problem.
LSSTCam will collect more data than all previous astronomical surveys combined. That’s a problem of volume. Each night, LSST will report ten times as many changes to the night sky as previous surveys did. That’s a problem of velocity. LSSTCam will collect multi-band time-series data on galaxies, stars and solar-system objects that other surveys could not even detect. That’s a problem of variety.
Fortunately, scientists love problems! Efficient search arose in response to the big data problem of the internet. Applications of machine learning techniques are emerging in response to big data problems in physics. The gift of Rubin Observatory will be both an enormous, invaluable dataset and the problems that dataset gives researchers to solve.