Use case: On top of the news with HPC Cloud

Interview with Dr Damian Trilling, University of Amsterdam

In a time when interference with elections and fake news regularly make the headlines, a thorough analysis of media content is as welcome as it is relevant. At the University of Amsterdam (UvA), communication scientists are working on methods to gain better and more rapid insight into trends in the news.
"A few years ago we started thinking about how to do this more efficiently and faster. All the newspapers are now available digitally, so automation was the obvious next step. In addition, there is a need for this type of research to be done on a larger scale. Analysis of the media goes beyond just analysing the newspapers. There are a number of other prominent channels in the provision of news online."
Answer new questions
The speaker is Damian Trilling (35), a university lecturer in the Political Communication & Journalism programme group at the UvA's Faculty of Social and Behavioural Sciences. "Our research can often be automated very easily, simply by counting words, but our work goes much further than that: we use, for example, 'topic modelling', a method for discovering hidden semantic structures in text, and machine learning. In this way, we can answer new questions."
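As a deliberately simplified illustration of the counting end of this spectrum, the sketch below tallies preset keywords across a handful of articles using only the Python standard library. The articles and keywords are invented for the example; real topic modelling would instead infer latent topics from the data rather than count preset terms.

```python
from collections import Counter
import re

def word_frequencies(articles, keywords):
    """Count how often each keyword appears across a set of articles.

    A toy illustration of frequency-based content analysis. Topic
    modelling (e.g. LDA) goes further: it discovers the topics itself
    instead of relying on a hand-picked keyword list.
    """
    counts = Counter()
    for text in articles:
        # crude tokenisation: lowercase, keep letters and apostrophes
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t in keywords)
    return counts

articles = [
    "The election campaign dominated the news again today.",
    "Fake news about the election spread quickly online.",
]
print(word_frequencies(articles, {"election", "news", "campaign"}))
```

Running this over millions of articles is the same loop, just repeated at a scale where a single machine stops being practical.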
Prejudices and stereotypes
"For example, I am working with my colleague Anne Kroon on a project in which we look at 15 years of reporting and investigate to what extent prejudices and stereotypes exist in relation to specific groups of migrants. In this research group, many people are working on subjects relating to populism. Or on questions like: are press releases copied verbatim, or, far more interestingly, are they not? Answering that requires advanced techniques that can, for example, reliably recognise the use of synonyms. What matters most to us is the relationship between the public, journalism and politics. There are some interesting cross-connections to be made between them."
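One simple way to check whether an article copies a press release verbatim is raw string similarity. The sketch below uses Python's standard difflib with invented texts; it deliberately does not attempt the harder, synonym-aware comparison the quote refers to, which would need word embeddings or similar semantic models.

```python
from difflib import SequenceMatcher

def overlap_ratio(press_release: str, article: str) -> float:
    """Similarity between two texts on a 0..1 scale (1.0 = identical).

    Character-level matching only: a paraphrase using synonyms scores
    low even when the meaning is the same, which is exactly the gap
    that synonym-aware techniques try to close.
    """
    return SequenceMatcher(None, press_release.lower(), article.lower()).ratio()

release = "The company announced record profits for the third quarter."
copied = "The company announced record profits for the third quarter."
paraphrase = "Profits reached a record high in Q3, the firm said."

print(overlap_ratio(release, copied))      # identical texts
print(overlap_ratio(release, paraphrase))  # rewritten, scores much lower
```

A verbatim copy scores 1.0; a rewritten version scores much lower even though it says the same thing, so a threshold on this ratio only catches the easy cases.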
Tens of millions of articles
"It is difficult to estimate how much data we work with, but it is at least tens of millions of articles. Because these are mainly written text, the size of the data sets is not really a major challenge – the challenge is that we want to search through them intelligently and process the results rapidly. For this you need computing power and working memory, plus the certainty that our data collection and analyses keep running 24 hours a day. We cannot manage this on our own computers alone. For some analyses, we are talking about processes that run for a week. Recently, we wanted to filter the names of people out of a few million articles. The software to do that exists, but running it on your own laptop would take a very long time. Besides, nobody wants to tie up their own PC for a few weeks just for that."
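The name-filtering step can be sketched in miniature. The toy below (invented example texts, Python standard library only) uses a crude capitalised-word heuristic in place of the trained named-entity recognition models real pipelines use; it only illustrates the shape of the task, and why running it over millions of articles pushes the work onto cloud infrastructure.

```python
import re
from collections import Counter

# Two or more consecutive capitalised words, e.g. "Damian Trilling".
NAME_RE = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")

def extract_name_candidates(articles):
    """Count candidate person names across a collection of articles.

    A crude stand-in for named-entity recognition: it also picks up
    place names and sentence-initial word pairs, which a trained NER
    model would disambiguate.
    """
    counts = Counter()
    for text in articles:
        counts.update(NAME_RE.findall(text))
    return counts

articles = [
    "Damian Trilling of the University of Amsterdam studies news flows.",
    "Anne Kroon and Damian Trilling analyse fifteen years of reporting.",
]
print(extract_name_candidates(articles))
```

In a real run, the article collection could be sharded across many cloud workers, each producing a partial Counter that is merged afterwards – an embarrassingly parallel job, which is why a week-long laptop run shrinks dramatically on shared infrastructure.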
"We work in a faculty that traditionally has not had much to do with ICT. For average data sets of about 2,000 respondents, you can usually manage with SPSS and a laptop, which is why relatively little infrastructure is available here. Our research is creating greater demands, so a while back I started looking for alternatives. Of course, you can obtain such resources commercially, but the HPC Cloud service from SURFsara was a more obvious choice, if only because it keeps the data in the Netherlands. That avoids jumping through a lot of legal hoops."
Differences between news sources
"My aim? To get a better understanding of how the news supply works: how people keep up to date with the news, what news actually is, and the differences between news sources. Also how we link the news to user data, so that we can track how news is consumed. This of course has implications for privacy agreements. We recently had a discussion about the new data protection legislation (GDPR). Security is essential, and that is something SURF can provide, along with the practical functionality we want: data can be processed in an analytical environment and colleagues can log in remotely."
"Because we are detecting changes over the long term, our research is never finished. Conclusions are, of course, published regularly; this yields a series of articles every year. We hope to complete the research into the differences in reporting on groups of migrants around the end of the year. In addition, we have just started a small project that we call the 'hype detector', with which we want to predict peaks in the news. But if you ask me what is at the heart of my motivation: I want communication science to keep searching for new ways to analyse data, because otherwise we are always one step behind. Using old methods you can gain insight into part of reality, but by no means all of it."
Photos: Vera Duivenvoorde
Strycharz, J., Strauss, N., & Trilling, D. (2018). The role of media coverage in explaining stock market fluctuations: Insights for strategic financial communication. International Journal of Strategic Communication, 12(1), 67–85. doi:10.1080/1553118X.2017.1378220
Trilling, D., Tolochko, P., & Burscher, B. (2017). From newsworthiness to shareworthiness: How to predict news sharing based on article characteristics. Journalism & Mass Communication Quarterly, 94(1), 38–60. doi:10.1177/1077699016654682
Jonkman, J. G. F., Trilling, D., Verhoeven, P., & Vliegenthart, R. (2016). More or less diverse: An assessment of the effect of attention to media salient company types on media agenda diversity in Dutch newspaper coverage between 2007 and 2013. Journalism. doi:10.1177/1464884916680371
- In this video, Damian Trilling and Anne Kroon explain how knowledge of programming helps them to analyse the current media landscape – and why this is a good thing to do.
- Our HPC Cloud service
This article also appeared in SURF Magazine 03 (September 2018)
Author: Edwin Ammerlaan