Exploring infrastructure for Dutch speech recognition

Due to developments in AI, the world of automatic speech recognition (ASR) is rapidly changing. New ASR systems seem to provide overwhelmingly accurate transcription of speech. But how do these systems perform under atypical conditions and in large-scale applications?

25 Jun 2024
SURF Utrecht

ASR systems that have recently become available on the market, such as Whisper, seem to provide overwhelmingly accurate transcription of speech. But how do these systems perform under atypical conditions? For example, in the case of dialects, speech from children or the elderly, or speech from non-native speakers of Dutch? What happens if there are multiple speakers, crosstalk and background noise? And what should you do if you want to transcribe very large amounts of speech data? What's the best way to handle this in a more (infra)structural way?

In this seminar, we will show examples from different application areas and discuss practical, operational, and strategic aspects of:

  • The necessity of making high-quality (Dutch) speech recognition engines available in research and educational contexts, e.g., to transcribe speech from lectures, interviews or meetings to text.
  • Whether it is important that speech recognition engines can be upgraded when better engines or models become available, or that different versions of models can be selected for specific tasks (e.g., specific types of speech).
  • How open standards can be applied and ‘explainability’ can be fostered as much as possible (how models are created, which data sets are used, and what their performance specifications are).

This seminar does not aim to facilitate research on speech technology; rather, it addresses the use of existing speech technology solutions and how this use can be (further) optimized. Let's move together towards sustainable solutions for research and education!


Chair: Annette Langedijk (SURF)

12:45h Doors open
13:00h Welcome and Setting the Stage (Roeland Ordelman, CTO CLARIAH)

User perspectives on ASR

  • Jeffrey van Woensel and Annabel de Ruiter (Nederlands Veteraneninstituut)
  • Can You Hear Me, Loud and Clear? Advantages and Limitations of Voice Recorded Speech to Text Answers in the Online LISS panel - Joris Mulder (Centerdata)
  • How we got a supercomputer to listen and write down all the Dutch podcasts - Sahra Mohamed (NL podcasts)
  • Speech-to-text in user generated video - Arnout Probst (UvA/HvA) 
14:30h Break

Technological perspectives on ASR

  • Speech Technology: Trends, Limitations, and Future - Vivian van Oijen (SURF)
  • (Infra)structural considerations for high quality ASR for a variety of research domains - Henk van den Heuvel (Radboud Universiteit)
15:50h Panel discussion
16:30h Closing and networking reception

Note: The language of the meeting will be English.

For whom?

The event is of particular interest to: 

  • Researchers, educators and support staff from various disciplines interested in the application of automatic speech recognition
  • Research & education infrastructure providers


SURF Utrecht (Kantoren Hoog Overborch - Hoog Catharijne) 
Moreelsepark 48 
3511 EP Utrecht 


SURF in cooperation with Stichting Open Spraaktechnologie.