Multilingual text-to-image model makes AI more inclusive

The national supercomputer Snellius is extremely powerful, but sometimes even more computing power is required. In such cases, SURF can provide capacity on a European scale. Dheeraj Varghese explains how he used the European supercomputer LUMI to develop an AI model that performs better than models from American and Chinese Big Tech companies.

30 June 2026

Key facts

Who: Dheeraj Varghese
Position: PhD student
Organisation: University of Amsterdam
Service: European supercomputer LUMI
Challenge: to develop an AI model that generates images based on prompts in six languages, without first translating those prompts into English
Solution: the necessary computing power was provided by the European supercomputer LUMI, with the help and guidance of SURF

English is the native language of the ICT world. This goes beyond mere terminology: the large language models that have enabled AI to make such significant strides in recent years are all based on English vocabularies. The same applies to many models that generate images based on textual prompts.

This has consequences. You can feed a Dutch-language prompt into such a model, but it is translated into English in the background. And that’s where things can go wrong: solutions based on translations often lead to inefficiency and misinterpretations of language and culture. “As a result, a large proportion of the world’s population is disadvantaged,” says Dheeraj Varghese.

He is a PhD student in Cees Snoek’s research group at the University of Amsterdam (UvA). Together with his colleague Mohammad Derakhshani, he developed NeoBabel. Their aim was for this model to be able to generate images based on prompts in no fewer than six very different languages: English, Dutch, Chinese, Hindi, French and Persian.

“The text-to-image model NeoBabel required more computing power than is available in the Netherlands”

Too big a job for Snellius

That was easier said than done: it would require an enormous amount of computing power. Not only to train the model, but also to obtain usable datasets. After all, words in six languages had to be linked to images. When they started, 40 million ‘image-label pairs’ were publicly available, but that was nowhere near enough. Algorithms they developed themselves had to drastically expand this number using open-source language files.

All of this required more computing power than is available in the Netherlands. Even for the national supercomputer Snellius, the task was too great.

Support at every step

Fortunately, there is international cooperation in Europe regarding the use of supercomputers: EuroHPC. Through SURF and NWO, Dheeraj and Mohammad gained access to LUMI in Finland, one of the most powerful supercomputers in the world. SURF represents the Netherlands in the LUMI consortium, which also includes 10 other European countries.

“That changed everything,” says Dheeraj. “We were allowed to work with no fewer than 1,000 GPUs simultaneously over a period of several months. That enabled us to achieve our goal. We went from 40 to 124 million image-label pairs.”

However, it didn’t happen overnight. Mohammad and Dheeraj were the first Dutch AI researchers to use this supercomputer. They were used to systems with Nvidia GPUs, but LUMI uses AMD GPUs. “It was as if we had to learn a different language.”

Fortunately, they were able to call on the LUMI User Support Team (LUST). Dheeraj: “We received help at every step: how to log into the system, which workflows we could use, the container setup… They have documentation for everything. That saved us an enormous amount of time and effort.”

“SURF helped us every step of the way. That saved us a huge amount of time and effort”

Mistakes at Big Tech

NeoBabel has been a success. The result can be viewed online by anyone. Dheeraj shows an example: “We gave the model a prompt in Dutch: ‘A large brown bear is sitting next to a wooden table with a glass of golden-coloured beer, a forest in the background, warm light, a humorous scene’. NeoBabel produced an image that shows exactly that.”

“We also submitted that same prompt to two leading models from Big Tech companies: BLIP3o, from the American firm Salesforce, and Janus Pro 7B, from the Chinese firm DeepSeek. Strangely enough, there’s no bear to be seen in their results: they were thrown off track by the fact that the Dutch word ‘beer’ (bear) is spelled the same as the English word ‘beer’. Words such as ‘bench’ or ‘light’ also led to incorrect results in the Big Tech models during our tests.”

What is remarkable here is that NeoBabel is four times smaller than the other two models. And when given prompts in English, it performs just as well as the competition.

One prompt, two results: NeoBabel generates a bear, whilst other AI models display a beer.

“It turned out to be possible to develop a powerful model for generating images at a Dutch university”

Next step: world models

All of NeoBabel’s material is now available as open source. Anyone is free to use it and build on it to make AI more inclusive, across linguistic and cultural boundaries.

What’s next for Dheeraj? “We’ve learnt an awful lot from NeoBabel, not least from working with LUMI. It turned out to be possible to develop such a powerful model in Europe, at a Dutch university. We can put all that experience to excellent use to take our research a step further.”

That will be a major step for Dheeraj, as he is thinking about world models. He explains: “An AI model like this tries to understand how the world evolves over time. One example is Google’s Genie: a virtual, self-learning world with which you can interact in real time.” This goes far beyond games, as in games everything is programmed by humans.

One thing is clear: in the near future, we’ll still have a great need for LUMI.

Text: Aad van de Wijngaart
