Machine learning with Apache Spark

Date: 13 FEB 2018 

Apache Spark is one of the most popular computing frameworks for large-scale data processing. It also includes a machine learning library (MLlib) with distributed versions of many machine learning algorithms.

SURFsara, Amsterdam
In this workshop we give an introduction to Apache Spark and explain how to use it for distributed machine learning. For the hands-on sessions we will use PySpark, Spark's Python API, from a Jupyter notebook environment.

Please bring your own laptop (with an SSH client installed) for the hands-on sessions!

Prior knowledge needed?

  • Experience with the Python programming language
  • Basic knowledge of supervised machine learning methods

The training is organized by SURFsara as a PRACE Training Centre.

Last modified: 20 Dec 2017