Added value for users
Martha Larson and her students are carrying out research into multimedia retrieval and recommender systems. She explains: "We use data generated by people during their day-to-day Internet use in order to design intelligent systems that can offer added value to the users. Imagine that you order a book from a web store and receive suggestions for other books that you might also be interested in. Those suggestions are calculated using algorithms that make use of earlier interactions on the website. You leave behind your data and receive recommendations in return."
Pixels predict photo locations
Larson is also researching the algorithms that play a role in searching for multimedia, such as photos: "We are currently trying to use the content of photos – i.e. the pixels – to predict where the photo was taken. For example, you can retrieve photos of Amsterdam where the user has turned off the GPS. But we also do the reverse. A person who turned off their GPS may have done so deliberately in order to prevent anyone from knowing where the photo was taken. The question is then: how can you alter photos so that it is no longer possible to determine the location? People will still be able to recognise the location, but the computer will no longer be able to do so."
The kind of research Larson does requires large data collections. For her photo research, for example, she uses a Yahoo dataset of photos from Flickr. Her research group also receives datasets from businesses that provide insight into, for instance, the click behaviour of a group of people. These datasets are smaller, but analysing them requires a lot of processing power because the researchers need to run a large number of algorithms.
Plenty of processing power required
Work that demands this much processing power also calls for good computing facilities, says Larson: "Sometimes the universities' computers are inadequate, so it's really great that we can call on SURFsara. We have recently done a lot of work with the Spark cluster. This is important, as it allows students to learn about tools that are increasingly being used in the business world. The students who worked with Spark were extremely enthusiastic." Support is also an important part of the process, and Larson speaks very positively about the support received from SURF: "We don't have intensive contact, but our experiences have been great: the sky is the limit."
Recording personal data
Recommender systems do not just involve technology; privacy is also important. "That is definitely an issue, even for my students," says Larson. "Until now it was often assumed that the more data, the better. One of my students is now researching how much data you can leave out without having an impact on the quality of the recommendation. That offers opportunities to show more restraint in the recording of personal data. It might sound strange that you would use high-performance computing in order to leave out data, but you need to try out a large number of parameters and calculate scenarios."