Machine Learning Wins the Higgs Challenge
20th November 2014
The winner of the four-month-long Higgs Machine Learning Challenge, launched on 12 May, is Gábor Melis from Hungary, followed closely by Tim Salimans from the Netherlands and Pierre Courtiol from France. They will receive cash prizes of $7000, $4000 and $2000 respectively, sponsored by the Paris-Saclay Centre for Data Science and Google. The three winners have been invited to participate in the Neural Information Processing Systems conference on 13 December in Canada.
The Special High Energy Physics meets Machine Learning Award went to team Crowwork's Tianqi Chen and Tong He. Although their score of 3.72 trailed Melis's 3.81, close scrutiny showed that Crowwork's algorithm struck an excellent compromise between performance and simplicity, and that it could improve tools currently used in high-energy physics. The team has been invited to CERN next year for a workshop on applying machine learning techniques in high-energy physics. The Challenge, hosted by Kaggle, drew an all-time record of 1,785 participating teams.
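The scores quoted above are values of the approximate median significance (AMS), the metric the Challenge used to rank submissions: roughly, how significant a Higgs signal the events selected by a classifier would show. The sketch below computes it, assuming the formula and the regularization term `b_reg = 10` given in the Challenge documentation. `ams` and `ams_of_selection` are hypothetical helper names, and `s` and `b` stand for the weighted sums of selected signal and background events.

```python
import math


def ams(s: float, b: float, b_reg: float = 10.0) -> float:
    """Approximate median significance for weighted signal s and background b.

    b_reg is the regularization term, assumed here to be 10 as in the
    Challenge documentation.
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative weighted counts")
    bb = b + b_reg
    radicand = 2.0 * ((s + bb) * math.log1p(s / bb) - s)
    return math.sqrt(max(radicand, 0.0))  # guard against tiny negative rounding


def ams_of_selection(scores, labels, weights, threshold):
    """AMS of the events whose classifier score exceeds `threshold`.

    Hypothetical helper. labels: 1 for simulated signal, 0 for background;
    weights: per-event weights of the simulated sample.
    """
    s = b = 0.0
    for score, label, weight in zip(scores, labels, weights):
        if score > threshold:
            if label == 1:
                s += weight
            else:
                b += weight
    return ams(s, b)
```

When `s` is much smaller than `b + b_reg`, the expression reduces to approximately `s / sqrt(b + b_reg)`, so the metric rewards selections that keep a lot of signal while admitting little background.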
"The Challenge is done, but we are only really half-way through the project. We now have to digest the many ideas submitted by the participants, and establish long-term collaboration between the high-energy physics and machine learning communities," says David Rousseau, ATLAS physicist and organizer of the Challenge.
Participants had to develop an algorithm that improves the detection of Higgs boson signal events, in which the boson decays into two tau particles, in a sample of simulated ATLAS data where signal events are rare and most events are non-Higgs "background". The observation of the Higgs boson, a groundbreaking result published by the ATLAS and CMS experiments in 2012, led a year later to the award of the Nobel Prize in Physics to the theorists who proposed the underlying mechanism. For the Challenge, however, no knowledge of particle physics was required; what counted was skill in machine learning, the training of computers to recognize patterns in data.
"I hoped to win, but even if everything goes smoothly, there is always an element of luck involved. How much physics knowledge would be required was uncertain. I expected a machine learning savvy physicist to win," says Melis, a graduate in software engineering and mathematics. His algorithm is an ensemble of deep neural networks, each trained on a random subset of the provided data, with very little feature engineering and no physics knowledge.
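The idea of training ensemble members on random subsets of the data and averaging their outputs (bagging) can be sketched as follows. This is not Melis's code: to keep the example self-contained, each member is a single logistic-regression unit standing in for a deep neural network, each member sees a random 70% of the events drawn without replacement, and all function names are hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(rows, labels, epochs=200, lr=0.1, rng=None):
    """Fit weights (last entry is the bias) by SGD on the log loss.

    Stand-in for a neural network; any trainer with the same shape would do.
    """
    rng = rng or random.Random(0)
    n_features = len(rows[0])
    w = [0.0] * (n_features + 1)
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])
            err = p - y  # gradient of the log loss with respect to the logit
            for j in range(n_features):
                w[j] -= lr * err * x[j]
            w[-1] -= lr * err
    return w


def predict_logistic(w, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_bagged(rows, labels, n_members=5, subset_fraction=0.7, seed=0):
    """Train each member on its own random subset (sampled without replacement)."""
    rng = random.Random(seed)
    k = max(1, int(subset_fraction * len(rows)))
    members = []
    for _ in range(n_members):
        idx = rng.sample(range(len(rows)), k)
        members.append(
            train_logistic([rows[i] for i in idx], [labels[i] for i in idx], rng=rng)
        )
    return members


def predict_bagged(members, x):
    # Ensemble output: mean of the members' predicted signal probabilities.
    return sum(predict_logistic(w, x) for w in members) / len(members)
```

Averaging members trained on different subsets mainly reduces the variance of the prediction; swapping `train_logistic` for a real neural-network trainer would leave the ensemble logic unchanged.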
"Competitions like these offer a great platform to try out new modelling techniques and update skills. The competitive element makes them fun," says Salimans, who holds a PhD in Econometrics and works as a data science consultant. He describes his solution as a combination of a large number of boosted decision tree ensembles, with some tricks to improve statistical efficiency.
"The Higgs Machine Learning Challenge was special. It shows that machine learning and data mining are very important to the world, not just in generating profit but also in cutting-edge research. In the course of the Challenge, I learned that physicists and computer scientists think in different ways. Combining their strengths might lead to better results," says Crowwork's Tong He, a master's student in Data Mining and Bioinformatics at Simon Fraser University, Canada.
Team Crowwork developed XGBoost, an optimized, general-purpose gradient boosting (boosted trees) algorithm that learns effective high-order combinations of input variables. The toolkit is efficiently parallelized and accurate, and the team made it available to other participants early in the competition.
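The core of second-order gradient boosting, the approach XGBoost is built around, can be sketched from scratch as below. This is an illustration, not XGBoost's implementation or API: it leaves out XGBoost's parallelism, missing-value handling and most regularization options, and every name is hypothetical. Each round fits a small regression tree to the gradients g = p - y and Hessians h = p(1 - p) of the log loss; a leaf's weight is -G/(H + lam), where G and H are the sums of g and h over the events in that leaf, and splits are chosen by the resulting gain. Trees deeper than one level let the model combine several input variables.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def build_tree(rows, g, h, idx, depth, max_depth, lam):
    """Grow a regression tree on the events in `idx` from gradient statistics."""
    G = sum(g[i] for i in idx)
    H = sum(h[i] for i in idx)
    if depth == max_depth or len(idx) < 2:
        return {"value": -G / (H + lam)}
    parent = G * G / (H + lam)
    best = None  # (gain, feature, threshold)
    for f in range(len(rows[0])):
        ordered = sorted(idx, key=lambda i: rows[i][f])
        GL = HL = 0.0
        for pos in range(len(ordered) - 1):
            i = ordered[pos]
            GL += g[i]
            HL += h[i]
            x_here, x_next = rows[i][f], rows[ordered[pos + 1]][f]
            if x_here == x_next:
                continue  # no threshold separates equal values
            GR, HR = G - GL, H - HL
            # XGBoost's gain also has a factor 1/2 and a penalty gamma; omitted here.
            gain = GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent
            if best is None or gain > best[0]:
                best = (gain, f, (x_here + x_next) / 2.0)
    if best is None or best[0] <= 0.0:
        return {"value": -G / (H + lam)}  # no split improves the objective
    _, f, thr = best
    left = [i for i in idx if rows[i][f] < thr]
    right = [i for i in idx if rows[i][f] >= thr]
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree(rows, g, h, left, depth + 1, max_depth, lam),
        "right": build_tree(rows, g, h, right, depth + 1, max_depth, lam),
    }


def tree_predict(node, x):
    while "value" not in node:
        node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
    return node["value"]


def boost(rows, labels, rounds=20, max_depth=2, eta=0.3, lam=1.0):
    """Return a model dict holding the trees and the learning rate `eta`."""
    margins = [0.0] * len(rows)  # log-odds scores, starting from p = 0.5
    trees = []
    for _ in range(rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, labels)]  # first derivative of log loss
        h = [pi * (1.0 - pi) for pi in p]  # second derivative of log loss
        tree = build_tree(rows, g, h, list(range(len(rows))), 0, max_depth, lam)
        trees.append(tree)
        margins = [m + eta * tree_predict(tree, x) for m, x in zip(margins, rows)]
    return {"eta": eta, "trees": trees}


def boost_predict(model, x):
    margin = sum(model["eta"] * tree_predict(t, x) for t in model["trees"])
    return sigmoid(margin)
```

Because each new tree is fitted to the current gradients, later rounds concentrate on events the ensemble still gets wrong; `eta` shrinks each tree's contribution so that the model improves in many small steps.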
Crowwork's Tianqi Chen developed XGBoost because his research code base was too slow. "XGBoost has some nice features that go beyond what existing tools offer, so we thought it would be helpful to share the toolkit. We were very glad that XGBoost was adopted by many users; their feedback helped us improve it," says Chen, a PhD student in Machine Learning at the University of Washington, Seattle.
"The huge success of the Challenge shows the fascination that the discovery of the Higgs boson, including the statistical tools used for it, holds for the public. It also reveals that experimental particle physics, in spite of its sophistication, can learn a lot from machine learning science," says Andreas Hoecker, physics coordinator of the ATLAS experiment.
For the detailed documentation of their optimization work, CSE_Team_0 members Chamila Wijayarathna, Dimuthu Upeksha, Maduranga Siwriwardena and Sachith Withana received a Special Mention, as did contestant Dhiana Deva. The simulated data used for the Challenge will be made available on opendata.cern.ch for anyone who would like to test new ideas.