SURF works on improving data management for AI-based projects
Many open-source tools exist, but each solves only part of the challenge. Where one tool provides data versioning, another is needed to run experiments at scale, and yet another to inspect experiment results. Each requires a different level of expertise, knowledge of programming languages, and type of infrastructure. Commercial platforms, on the other hand, provide integrated solutions but lock the user into a specific infrastructure.
The importance of reproducibility
Reproducibility of scientific research is vital to maintain transparency and trust among scientists, as well as between the scientific community and the public. In computational research, one of the first steps towards improving reproducibility is the application of version control. With version control, changes to software are tracked, allowing other researchers to replicate experiments exactly. For machine learning research, however, tracking only software changes is often not enough. With data changing as frequently as models, how can machine learning workflows remain transparent and reproducible?
Experimenting with available tools
In the past few months, SURF has run experiments with several available tools in order to understand which solutions already exist and where they fall short. With Data Version Control (DVC), both machine learning model code and data are versioned, allowing researchers to automatically keep an exact history of all relevant aspects of their workflow. With Ray Tune or Optuna, hyperparameter tuning jobs are distributed across the Lisa cluster to run experiments at scale, and researchers can easily keep track of experiments through the web interface of MLflow.
Investigating services of public cloud providers
In addition, SURF is investigating the machine learning services of public cloud providers, such as AWS SageMaker, Azure Machine Learning, and Google's Vertex AI. These cloud providers claim to offer a unified platform for preprocessing data and for training, tuning, and deploying machine learning models. Do these claims hold up, how do these platforms differ from existing open-source tools, and what functionality do they provide for reproducible machine learning research projects?
Get in touch
Are you working on a machine learning project and do you recognize any of these challenges? Please contact us at (email@example.com). We are currently preparing resources to help users on our infrastructure implement this workflow and hope to be able to provide more generally available support in the future.