The dataset from the ATLAS Higgs Machine Learning Challenge has been released on the CERN Open Data Portal.
The Challenge, which ran from May to September 2014, was to develop an algorithm that improved the detection of the Higgs boson signal. The specific sample used simulated Higgs bosons decaying into two tau particles inside the ATLAS detector. The downloadable sample was provided to participants on the host platform, Kaggle. With 1,785 teams competing, the event was a huge success. Participants applied and developed cutting-edge machine learning techniques, which have been shown to perform better than existing traditional high-energy physics tools.
The dataset was removed at the end of the Challenge but, due to high public demand, ATLAS, as organizer of the event, has decided to host it on the CERN Open Data Portal, where it will be available permanently. The 60 MB zipped ASCII file can be decoded without special software, and a few scripts are provided to help users get started. Detailed documentation for physicists and data scientists is also available. Thanks to the Digital Object Identifiers (DOIs) in the CERN Open Data Portal, the dataset and accompanying material can be cited like any other paper.
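As a rough illustration of reading such a file with only standard tools, the sketch below uses the Python standard library alone. It rests on assumptions not stated above: that the file is a gzip-compressed, comma-separated text file with a header row, and that it has a column named `Label`. The function name `count_labels` and the file name in the usage note are hypothetical.

```python
import csv
import gzip
from collections import Counter


def count_labels(path, label_column="Label"):
    """Count rows per label value in a gzip-compressed CSV file.

    Assumes a header row and a column called `label_column`
    (an assumption; check the dataset documentation for real names).
    """
    counts = Counter()
    # "rt" decompresses and decodes to text; newline="" is what csv expects.
    with gzip.open(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle):
            counts[row[label_column]] += 1
    return counts
```

Calling `count_labels("higgs_sample.csv.gz")` on a downloaded copy (the file name here is a placeholder) would return the number of events recorded under each label value. The scripts provided with the dataset remain the recommended starting point.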
The Challenge's winner, Gábor Melis, and the recipients of the Special High Energy Physics meets Machine Learning Award, Tianqi Chen and Tong He, will visit CERN on 19 May to deliver talks on their winning algorithms.