Improving the prediction of regional well-being from tweets

Social media posts can offer insights into residents’ well-being, which can help governments plan better support systems.

By Assistant Professor Kokil Jaidka

In the COVID-19 era that we are living in today, social media posts can help us understand how people are adapting to, and coping with, the new normal. But our words are useful not just to understand what we — as individuals — think and feel. They are also useful clues about the community we live in. Why is this so? 

The puzzle of why people talk the way they do, and why different groups might speak the same language differently, has always been of interest. In the olden days, studying how people “live a language,” and what they feel, was rather cumbersome. For example, in the early 1900s, a linguist in Paris recruited a grocer to bicycle to hundreds of French villages and collect answers to 1,500 questions about how the villagers expressed themselves.

Luckily, we no longer need to plan cross-country cycling trips to get our answers! In essence, we can simply study how people in different places in the world post on social media. Their intermittent bursts of words and emojis in cyberspace can provide important signals to understand people and cultures. Other scientists have shown that people’s social media posts can reveal their age, gender, general happiness, and even their likelihood of being diagnosed with depression. 

But if we consider all the social media posts of all the people living in a region, would it help us understand regional happiness? In the pursuit of this answer, many studies have tried to measure the health and happiness of regions using the social media posts of the people who live there. Still, the findings have been mixed so far. 

There might finally be a convincing answer in a new study that I worked on with a team of researchers from the U.S. and Australia. Published this week in the Proceedings of the National Academy of Sciences, our study collected and analysed 1.53 billion Twitter posts made over seven years in the U.S. We found that Twitter posts can produce regional happiness and life satisfaction estimates that are about as accurate as the methods that governments and economists typically use to measure the same things. Doing this correctly, however, requires sophisticated machine learning methods anchored in previously labeled data.

The study also dispels some misconceptions about measurement. For instance, positive-sounding words such as “lol,” “love,” and “good” can actually throw the analysis off. In fact, these three words alone caused many of the measurement errors reported in previous studies, and removing them could improve well-being predictions made through simple word-counting approaches. We tested our findings over time, across different population samples, and against other kinds of regional indicators to ensure that our techniques were robust and that the findings held across the situations we examined.
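To make the word-counting idea concrete, here is a minimal Python sketch of how a dictionary-based score for a region might be computed, with and without the three misleading words. It is an illustration only and not the method used in the study: the lexicon `TOY_LEXICON`, its weights, the function names, and the example posts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> sentiment weight). Illustrative only;
# this is NOT the dictionary used in the study.
TOY_LEXICON = {
    "lol": 1.0, "love": 1.0, "good": 1.0,
    "grateful": 1.0, "excited": 1.0,
    "bored": -1.0, "tired": -1.0, "hate": -1.0,
}

# The three words the article names as a source of measurement error.
MISLEADING_WORDS = frozenset({"lol", "love", "good"})


def tokenize(text):
    """Lowercase a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def region_score(posts, lexicon, exclude=frozenset()):
    """Average lexicon weight per token across all posts from one region.

    Words listed in `exclude` are ignored when summing weights.
    """
    counts = Counter()
    total_tokens = 0
    for post in posts:
        tokens = tokenize(post)
        total_tokens += len(tokens)
        counts.update(tokens)
    if total_tokens == 0:
        return 0.0
    weighted = sum(
        weight * counts[word]
        for word, weight in lexicon.items()
        if word not in exclude
    )
    return weighted / total_tokens


# Made-up posts in which "lol" and "good" do not signal happiness.
example_posts = [
    "lol so tired of this",
    "good grief, i hate mondays lol",
]

raw = region_score(example_posts, TOY_LEXICON)
filtered = region_score(example_posts, TOY_LEXICON, exclude=MISLEADING_WORDS)
print(f"raw score: {raw:.3f}, filtered score: {filtered:.3f}")
```

In this made-up example, the raw score comes out positive because “lol” and “good” are counted as happy words, while the filtered score is negative, which better matches the tone of the posts. A data-driven approach of the kind the study recommends would instead learn word weights from labeled well-being data rather than rely on a fixed list.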

So, why is “lol” so misleading? Our study found, as internet linguist Gretchen McCulloch has also pointed out, that words like “lol” are used to express many different things on social media. I have myself tweeted it to flirt and to express irony, annoyance, and sometimes just pure surprise. But the preferred methods of social media language measurement, which were trained on language from the 1990s, still assume that “lol” carries its original meaning of “laughing out loud.”

What do the findings mean? Our results show that social media posts can offer a signal of residents’ well-being, over and above their socioeconomic markers. Our study humbly contributes better methods to unobtrusively measure people’s mental and emotional health through social media posts. In the current climate, measurements based on social media posts can help governments plan better support systems, better infrastructure, and better techniques for interventions and outreach.

Read the complete study: Estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods


About the author


Dr Kokil Jaidka is an Assistant Professor with the Department of Communications and New Media at the NUS Faculty of Arts and Social Sciences, and a Principal Investigator at the NUS Centre for Trusted Internet and Community. She is interested in examining the role of social media platforms in enabling self-presentation and social behaviour, particularly in developing computational models of language for the measurement and understanding of computer-mediated communication.