2024-04-02 by CMS Collaboration
The CMS experiment at CERN is proud to announce the first release of 13 TeV proton-proton collision data collected in 2016. Over 70 TB of 13 TeV collision data and 830 TB of corresponding simulations are now accessible to the global scientific community and enthusiasts alike through the CERN Open Data Portal.
For the first time, the scientific community has access to substantial datasets of 13 TeV collisions. This release augments the 2015 data and simulation that were made public in 2021. Over 20,000 simulations of different physics processes have been released alongside the collision data, as well as new software containers and a new virtual machine for analysis.
As a reflection of its commitment to accessible open science, CMS has released both 2016 collision data and simulation in the new "NanoAOD" (Analysis Object Data) format, a streamlined and condensed storage format that can be analyzed directly by open data users. NanoAOD is extremely efficient: it encapsulates key physics information while reducing file sizes by about 95% and stores data in standard structures that can be analyzed without dedicated CMS software. This format takes a big step toward easily reusable CMS data. All data and simulation in this release are also available in the more comprehensive "MiniAOD" format for full preservation, and a subset of the collision data is available in an expanded NanoAOD format that includes information about particle candidates from the CMS "particle flow" algorithm.
The 2016 data released today, about half of the total collected in 2016, were used by CMS in over 200 publications exploring the nature of the Higgs boson, searching for new and rare physics processes, performing precision measurements of standard model processes, studying heavy flavor physics, and more. Scientists, researchers, educators, and students worldwide are encouraged to explore this rich dataset. The open nature of the CERN Open Data Portal aligns with CERN's commitment to transparency and knowledge-sharing, ensuring that the global community can collectively advance our understanding of the universe.
"I can't wait to see my university students dig into this new data from CMS; earlier releases had small examples of NanoAOD data for education and outreach, but now we can explore so much more," says Julie Hogan, a leader in the CMS Data Preservation and Open Access group who uses CMS open data in her undergraduate laboratory course. The group will offer its fifth annual Open Data Workshop in July 2024 to teach users how to analyze the 2016 data. To celebrate the release of data in the NanoAOD format, the workshop will feature "hackathon" segments where researchers can actively launch new projects using the open data. Register now to participate!
"The entire CMS collaboration has worked for many years to understand the Run 2 data and produce innovative new algorithms, data formats, and analysis tools," adds Hogan. "We are excited to open the doors to the wider community of researchers and see how our data can spur their creativity." Users can give feedback on CMS open data and seek specific help on the official CERN Open Data forum.
As before, all of the CMS open data are released into the public domain under the Creative Commons CC0 waiver via the CERN Open Data Portal. The portal is openly developed on GitHub by the CERN Information Technology team in cooperation with the experimental collaborations. CMS would like to thank CERN for providing resources and expertise to build and maintain the portal. We would also like to acknowledge the continuous effort of many of our collaboration members who have helped us release this latest batch of CMS open data.