What is serverless computing?
The term serverless may suggest that no servers are used for computation, but that is not the case. Instead, it means that the user, typically a software developer, does not have to set up servers (virtual or bare metal) and does not have to worry about maintenance, upgrades, or backups.
Serverless services come in roughly two flavours:
- Database services (relational or otherwise) are often called serverless when a cloud provider offers them as a service that is always online, needs no maintenance, scales automatically, and has a pay-per-use pricing model.
- Function as a Service (FaaS) is the most interesting aspect of serverless computing. FaaS platforms allow developers to write small functions in a variety of programming languages. The cloud provider makes sure the functions are executed when needed. Typically, these functions run for a short time and are triggered by an event, such as the upload of a file, a visit to a website, or a signal from a device. The function is deployed automatically and scaled up by the cloud provider when necessary. The developer does not need to set up servers for running software or worry about running large-scale deployments. This is all handled by the serverless platform.
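The event-driven model described above can be illustrated with a minimal sketch. It is not a real FaaS platform: the `on` decorator and the `dispatch` function are hypothetical stand-ins for what a provider does behind the scenes, and the event fields (`name`, `size_bytes`) are assumptions made for the example.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Hypothetical in-process stand-in for a FaaS platform:
# it maps event types to the functions registered for them.
_handlers: Dict[str, List[Callable[[dict], str]]] = defaultdict(list)


def on(event_type: str):
    """Register a function to run whenever an event of this type occurs."""
    def register(func):
        _handlers[event_type].append(func)
        return func
    return register


@on("file_uploaded")
def describe_upload(event: dict) -> str:
    # A short-lived function: read the event, do a small job, return.
    return f"received {event['name']} ({event['size_bytes']} bytes)"


def dispatch(event_type: str, event: dict) -> List[str]:
    """Run every function registered for the event.

    A real platform would also deploy and scale the functions;
    this sketch simply calls them one after another.
    """
    return [func(event) for func in _handlers[event_type]]


if __name__ == "__main__":
    print(dispatch("file_uploaded", {"name": "sample.csv", "size_bytes": 2048}))
```

The developer only writes the body of `describe_upload`; everything else in the sketch represents work that the serverless platform would take over.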
Commercial serverless offerings are pay-per-use: you pay only for the time during which your functions are actually executing. As an example, consider a webserver that hosts a website. Traditionally, a webserver runs on a (virtual) server and waits for incoming traffic. This running server incurs costs even when nobody visits the site. In contrast, an idling serverless website only exists as a file on some storage service. No compute costs are incurred when there is no traffic. When users want to visit the site, functions are triggered and the website comes alive. When millions of users want to see the website, the cloud provider scales the execution of the functions automatically. Many companies have reduced the cost of hosting their websites by making them serverless.
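To make the difference between the two pricing models concrete, here is a small sketch of the arithmetic. All prices and traffic figures are made-up placeholders, not rates from any real provider, and storage costs are ignored for simplicity.

```python
def always_on_cost(hours: float, price_per_hour: float) -> float:
    """Cost of a server that is billed whether or not there is traffic."""
    return hours * price_per_hour


def pay_per_use_cost(invocations: int, seconds_per_invocation: float,
                     price_per_second: float) -> float:
    """Cost when only the actual execution time of functions is billed."""
    return invocations * seconds_per_invocation * price_per_second


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    hours_in_month = 30 * 24
    server = always_on_cost(hours_in_month, price_per_hour=0.05)
    idle = pay_per_use_cost(0, 0.2, price_per_second=0.00002)
    busy = pay_per_use_cost(100_000, 0.2, price_per_second=0.00002)
    print(f"always-on: {server:.2f}")
    print(f"serverless, no traffic: {idle:.2f}")
    print(f"serverless, 100,000 requests: {busy:.2f}")
```

The point of the sketch is the shape of the two formulas: the always-on cost depends only on elapsed time, while the pay-per-use cost drops to zero when there are no invocations.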
Apart from the serverless offerings from cloud providers, there are a number of open-source FaaS projects. At SURF we looked mainly at OpenFaaS and Kubeless, but there are many more. OpenFaaS is currently used in our streaming data platform (which is used by The Green Village) to convert incoming data streams to a format that is easier to process further. The advantage is that computation only takes place when data comes in and that scaling is automatic. When there is no data, no resources are used. When there is a lot of data, the functions run in parallel.
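As an illustration of such a conversion function, the sketch below follows the shape of the classic OpenFaaS Python template, in which a `handle(req)` function receives the request body as a string and returns the response. The input format (a comma-separated line with sensor, timestamp and value) and the JSON field names are assumptions made for this example; the formats actually used in the SURF platform are not described here.

```python
import json


def handle(req: str) -> str:
    """Convert one raw reading, e.g. "sensor-1,2021-03-01T12:00:00Z,21.5", to JSON.

    The comma-separated input layout is a hypothetical example format.
    """
    sensor, timestamp, value = (part.strip() for part in req.strip().split(","))
    return json.dumps({"sensor": sensor, "timestamp": timestamp, "value": float(value)})


if __name__ == "__main__":
    print(handle("sensor-1,2021-03-01T12:00:00Z,21.5"))
```

Because the function keeps no state between calls, the platform can run as many copies in parallel as the incoming data requires, and none at all when no data arrives.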
Experiment with large data processing
Because serverless functions scale so well, it is tempting to see whether they can be used for large-scale distributed computations. At SURF we tried this together with researchers from Wageningen University. The goal was to analyse genomic data sets; the analysis of a file should start as soon as it was uploaded to the iRODS data store at SURF.
This use case provided some insight into the potential, but also the limitations, of serverless computing for large data processing. The upload of a file can be seen as a trigger for the analysis, and this part worked. However, we found three main limitations:
- Serverless functions are designed to be short-lived (minutes) and use limited resources. In our experiment, the analysis of a single file took much longer than a typical serverless function: hours or even days.
- There is little support for error recovery when part of the analysis fails.
- Serverless functions must be written in a language supported by the platform. In our case, the analysis relied on existing software tools, which we made available in a container.
Our solution was to orchestrate containers in a way that is very similar to serverless functions. Using the orchestration tool Kubernetes, containers with the analysis software were deployed only when a file was uploaded. In this way the software was not restricted in either run time or use of resources. In addition, any software in a container can be run in this way, not only functions written in a supported programming language. The advantage for the researchers is that they only need to provide a container with the appropriate software that can analyse a single file. The execution and scaling of the analysis are taken care of by the platform. Wageningen University continues to use this solution, and we are currently considering similar solutions for other users.
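The idea of starting a container per uploaded file can be sketched as follows. The text does not say which Kubernetes resource was used, so this example assumes one Kubernetes Job per file; the function name `job_for_upload`, the file path and the image name are hypothetical. The sketch only builds the manifest and does not talk to a cluster.

```python
import json
import re


def job_for_upload(file_path: str, image: str) -> dict:
    """Build a Kubernetes batch/v1 Job manifest that analyses one uploaded file."""
    # Derive a name from the path using only lowercase letters, digits and '-',
    # and keep it within 63 characters, a safe length for Kubernetes names.
    slug = re.sub(r"[^a-z0-9]+", "-", file_path.lower()).strip("-")
    name = f"analysis-{slug}"[:63].rstrip("-")
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "backoffLimit": 2,  # retry a failed analysis at most twice
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "analysis",
                        "image": image,
                        "args": [file_path],  # the container analyses a single file
                    }],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical path and image name, for illustration only.
    manifest = job_for_upload("/zone/home/genomes/Sample_01.fastq",
                              "example.org/analysis:latest")
    print(json.dumps(manifest, indent=2))
```

A manifest like this could be saved to a file and submitted with `kubectl apply -f`. Unlike a serverless function, a Job has no built-in limit of a few minutes, which matches the need for analyses that run for hours or days.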
Serverless technology is very promising. It allows developers to concentrate on code instead of peripheral issues like deploying and maintaining servers. It can be much cheaper than traditional solutions. Serverless platforms take care of scaling and as such are interesting for distributed applications. Their event-driven nature makes them well suited to domains like the Internet of Things and automated workflows. At SURF we continue to follow new developments in serverless technology.
We would love to hear from people who are interested in using this technology in their research project. Please contact Machiel Jansen: email@example.com or tel. 020-8001300.