Exploring machine learning

We investigate whether machine learning can improve traditional high performance computing. We also explore scalable ways of training neural networks, for instance in the field of image recognition. Machine learning means that a computer learns independently from data and input.

Part of SURF Open Innovation Lab

Deep learning enhanced HPC applications

Traditionally, the main workloads on a supercomputer are various forms of numerical simulation. Recently, scientists have started to use machine learning techniques to enhance traditional simulations, such as weather predictions. Early results indicate that models combining machine learning with traditional simulation can improve accuracy, shorten time to solution and significantly reduce costs.

In this project we investigate whether and how machine learning and deep learning can augment, accelerate or replace scientific workloads, such as numerical simulations. In that context, we ask whether machine learning is a pre- or post-processing step that helps filter and understand the input data or the final simulation results, or whether it is poised to (partly) replace the decades-old codes that make up many high performance computing (HPC) workloads.

Use cases

To validate this approach and its potential, we stimulate and support new and advanced use cases that enhance traditional HPC simulations with machine learning algorithms. We do so in close collaboration with scientific research groups. Four research proposals from various scientific domains have been selected and granted:

Chiel van Heerwaarden (WUR): Machine-Learned turbulence in next-generation weather models
Sascha Caron (Radboud University): The Quantum-Event-Creator: Generating physics events without an event generator
Alexandre Bonvin (Utrecht University): 3DeepFace: Distinguishing biological interfaces from crystal artifacts in biomolecular complexes using deep learning
Simon Portegies Zwart (Leiden University): Machine learning for accelerating planetary dynamics in stellar clusters

Scalable high performance training of deep neural networks

Caffe is one of the most popular frameworks for image recognition. Intel has contributed to this framework by improving Caffe performance on Intel Xeon processors. The goal of this project is to improve how well Intel's version of Caffe scales on supercomputing systems for large-scale neural network training.

Our focus is on highly scalable, high performance training of deep neural networks and its application to various scientific challenges, such as diagnosing lung disease, plant classification, and high-energy physics. For example, we are working on porting large-batch Stochastic Gradient Descent (SGD) training techniques to the popular TensorFlow framework. We also pay particular attention to the rapidly developing field of medical imaging. Because its data is large and high-dimensional, medical imaging needs large-scale compute as well as high memory bandwidth and capacity.
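The text above does not say which large-batch SGD techniques are involved. As an illustration only, one widely used technique is to scale the learning rate linearly with the batch size and to ramp it up gradually during the first training steps. The sketch below assumes that technique; the function name, parameter names and default values are hypothetical and are not taken from the project.

```python
def large_batch_lr(step, batch_size, base_lr=0.1, base_batch=256, warmup_steps=500):
    """Learning rate at a given step, using linear scaling with gradual warmup.

    All names and defaults are illustrative assumptions, not project settings.
    """
    # Linear scaling: a batch k times larger gets a learning rate k times larger.
    target_lr = base_lr * batch_size / base_batch
    if step < warmup_steps:
        # Warmup: ramp linearly from the small-batch rate to the scaled rate.
        return base_lr + (target_lr - base_lr) * step / warmup_steps
    # After warmup, hold the scaled rate (a decay schedule would normally follow).
    return target_lr


if __name__ == "__main__":
    # Print the schedule at a few steps for a batch of 8192 samples.
    for step in (0, 250, 500, 1000):
        print(step, large_batch_lr(step, batch_size=8192))
```

The returned value can be passed to the optimizer of any framework at each step. The warmup phase is there because applying the full scaled rate from the first step often makes training unstable at large batch sizes.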

We have already succeeded in minimizing the time-to-train of several deep convolutional neural networks on state-of-the-art computer vision datasets such as ImageNet and beyond. Highlights of 2017 include a training time of less than 30 minutes on the popular ImageNet-1K dataset, as well as state-of-the-art accuracy on other datasets such as the full ImageNet and Places-365 datasets.

Project team SURF

Valeriu Codreanu

Damian Podareanu

Caspar van Leeuwen