Exploring machine learning
We investigate whether machine learning can improve traditional high performance computing. We also explore scalable ways of training neural networks, for instance for image recognition. In machine learning, a computer learns from data and input on its own, rather than following explicitly programmed rules.
Deep learning enhanced HPC applications
Traditionally, the main workloads run on a supercomputer consist of various forms of numerical simulation. Recently, scientists have started exploring machine learning techniques to enhance traditional simulations, such as weather prediction. Early results indicate that these hybrid models, which combine machine learning and traditional simulation, can improve accuracy, shorten time to solution and significantly reduce costs.
In this project we investigate whether and how machine learning and deep learning are suitable technologies to augment, accelerate or replace scientific workloads such as numerical simulations. In that context, is machine learning a pre- or post-processing step that helps filter and understand the input data or the final simulation results, or is it poised to (partly) replace the decades-old codes that make up many high performance computing (HPC) workloads?
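To make the "augment or replace" option concrete, the sketch below swaps one expensive term in a time-stepping loop for a learned approximation. It is a toy under stated assumptions, not code from any of the projects: `expensive_rhs` is a hypothetical stand-in for a costly physics term, and a least-squares polynomial fit plays the role that a neural network would play in practice.

```python
import random

def expensive_rhs(y):
    # Hypothetical stand-in for a costly physics term (e.g. a sub-grid closure).
    return y - y ** 3

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (A^T A) c = A^T y."""
    n = degree + 1
    ata = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    aty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]
    # Back substitution yields the polynomial coefficients c_0 .. c_degree.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(ata[i][k] * coeffs[k] for k in range(i + 1, n))
        coeffs[i] = (aty[i] - s) / ata[i][i]
    return coeffs

def make_surrogate(coeffs):
    # The cheap learned model that replaces expensive_rhs inside the solver.
    return lambda y: sum(c * y ** i for i, c in enumerate(coeffs))

def integrate(rhs, y0, dt, steps):
    # Explicit Euler time stepping: the "traditional" solver loop.
    y = y0
    for _ in range(steps):
        y = y + dt * rhs(y)
    return y

# Train the surrogate offline on samples of the expensive term.
random.seed(0)
train_x = [random.uniform(-2.0, 2.0) for _ in range(200)]
train_y = [expensive_rhs(x) for x in train_x]
surrogate = make_surrogate(fit_polynomial(train_x, train_y, degree=3))

reference = integrate(expensive_rhs, 0.1, 0.01, 1000)
hybrid = integrate(surrogate, 0.1, 0.01, 1000)
print(f"reference={reference:.6f} hybrid={hybrid:.6f}")
```

Because the toy term is itself a cubic, the fit is essentially exact here; with a real closure term, the surrogate's error and how it accumulates over many time steps is precisely what has to be evaluated.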
To validate this approach and its potential, we stimulate and support new and advanced use cases that enhance traditional HPC simulations with machine learning algorithms, in close collaboration with scientific research groups. Four research proposals in various scientific domains have been selected and awarded a grant:
- Chiel van Heerwaarden (WUR): Machine-Learned turbulence in next-generation weather models
- Sascha Caron (Radboud University): The Quantum-Event-Creator: Generating physics events without an event generator
- Alexandre Bonvin (Utrecht University): 3DeepFace: Distinguishing biological interfaces from crystal artifacts in biomolecular complexes using deep learning
- Simon Portegies Zwart (Leiden University): Machine learning for accelerating planetary dynamics in stellar clusters
- Whitepaper: Deep-learning enhancement of large scale numerical simulations
- Presentation for the workshop 'Deep learning for high performance computing' on 15 October 2019: Deep Learning for HPC - Experiences of SURF & project partners
- Article on The Next Platform: Transforming HPC research with AI approaches
- Blog: How machine learning can improve HPC applications
Scalable high performance training of deep neural networks
Caffe is one of the most popular frameworks for image recognition. Intel has contributed to this framework by improving Caffe's performance on Intel Xeon processors. The goal of this project is to improve the scalability of Intel's Caffe on supercomputing systems for large-scale neural network training.
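Scaling training across many nodes is most commonly done with synchronous data parallelism: each node holds a full copy of the model, computes gradients on its own slice of the minibatch, and the gradients are averaged (an allreduce) before every node applies the same update. The sketch below simulates this in a single process for a small linear model; `data_parallel_step` and `allreduce_mean` are hypothetical names, and we do not claim this mirrors Intel Caffe's actual multi-node implementation.

```python
def gradient(w, b, batch):
    """Mean-squared-error gradient of a linear model y ~ w*x + b."""
    gw = gb = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    n = len(batch)
    return gw / n, gb / n

def allreduce_mean(grads):
    """Stand-in for an MPI-style allreduce: average the workers' gradients."""
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k

def data_parallel_step(w, b, global_batch, num_workers, lr):
    # Split the global minibatch into equal shards, one per worker.
    shard = len(global_batch) // num_workers
    shards = [global_batch[i * shard:(i + 1) * shard] for i in range(num_workers)]
    # Each worker computes a local gradient on its shard (in parallel in practice).
    local = [gradient(w, b, s) for s in shards]
    gw, gb = allreduce_mean(local)
    # Every worker applies the same averaged update, so the replicas stay identical.
    return w - lr * gw, b - lr * gb

# Synthetic data from y = 3x + 1; 64 samples split over 8 simulated workers.
data = [(x / 10.0, 3.0 * (x / 10.0) + 1.0) for x in range(64)]
w, b = 0.0, 0.0
for _ in range(2000):
    w, b = data_parallel_step(w, b, data, num_workers=8, lr=0.05)
print(f"w={w:.3f} b={b:.3f}")
```

With equal shard sizes, the averaged gradient equals the gradient of the full minibatch, which is why adding workers lets the global batch grow without changing the mathematics of a single SGD step.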
Our focus is on highly scalable, high performance training of deep neural networks and its application to various scientific challenges, such as diagnosing lung disease, plant classification and high-energy physics. For example, we are porting large-batch Stochastic Gradient Descent (SGD) training techniques to the popular TensorFlow framework. We also focus in particular on the rapidly developing field of medical imaging: because medical image data is high-dimensional, the field requires large-scale compute as well as high memory bandwidth and capacity.
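Large-batch SGD usually needs learning-rate adjustments to preserve accuracy. A widely used recipe (Goyal et al., 2017) combines a linear scaling rule, multiplying the learning rate by the factor by which the batch grows, with a gradual warmup over the first steps. The function below sketches that recipe; `large_batch_lr` and its default values are illustrative assumptions, not the settings used in this project, and the decay phase that normally follows warmup is omitted.

```python
def large_batch_lr(step, base_lr=0.1, base_batch=256, batch=8192,
                   warmup_steps=500):
    """Learning rate for large-minibatch SGD (illustrative defaults).

    Linear scaling rule: the target rate grows in proportion to batch size.
    Gradual warmup: ramp linearly from base_lr to the target over
    warmup_steps to avoid instability early in training.
    """
    target = base_lr * batch / base_batch
    if step < warmup_steps:
        return base_lr + (target - base_lr) * step / warmup_steps
    # After warmup, hold the scaled rate (a decay schedule would follow here).
    return target

for step in (0, 250, 500, 10_000):
    print(step, round(large_batch_lr(step), 4))
```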
We have already succeeded in substantially reducing the time-to-train of several deep convolutional neural networks on ImageNet and other computer vision datasets. Highlights of 2017 included a training time of less than 30 minutes on the popular ImageNet-1K dataset, as well as state-of-the-art accuracy on other datasets such as the full ImageNet (ImageNet-22K) and Places-365.
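Part of the reduction in time-to-train comes from simple arithmetic: for a fixed number of epochs, the number of sequential optimizer steps shrinks in proportion to the global batch size, and each step can be spread over more nodes. The sketch below counts steps per epoch on ImageNet-1K (1,281,167 training images) for a few illustrative batch sizes, which are assumptions rather than the configurations used here; it ignores per-step compute and communication cost, which ultimately determine wall-clock time.

```python
import math

# Number of images in the ImageNet-1K (ILSVRC-2012) training set.
IMAGENET_1K_TRAIN_IMAGES = 1_281_167

def steps_per_epoch(global_batch):
    # One optimizer step per global minibatch; the last partial batch counts.
    return math.ceil(IMAGENET_1K_TRAIN_IMAGES / global_batch)

for batch in (256, 8192, 32768):
    print(f"batch={batch:>6}  steps/epoch={steps_per_epoch(batch)}")
```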
- Article: Changing Course: Rethinking How AI Can Interpret X-Rays
- Article: When Dense Matrix Representations Beat Sparse
- Blog: Achieving Deep Learning Training in less than 40 Minutes on ImageNet-1K & Best Accuracy and Training Time on ImageNet-22K & Places-365 with Scale-out Intel® Xeon®/Xeon Phi™ Architectures
- Article: Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
- Conference paper: Large Minibatch Training on Supercomputers with Improved Accuracy and Reduced Time to Train
- Presentation Teratec Forum 2018: Towards the recognition of the world’s flora
- Presentation IXPUG workshop – ISC 2018: Deep Learning for fast simulation
- Article: Diagnosing Lung Disease Using Deep Learning
Project team SURF
Caspar van Leeuwen