2019 on #econtwitter, in a million tweets
What did academic economists talk about in 2019? I collected one million tweets from popular academic economists over the year, and analysed the topics discussed.
What did economists talk about in 2019? To understand this, I analysed one million tweets from academic economists. The list of academics is based on the RePEcs’ ranking of the 25% most popular economists by number of followers. The map below shows where they reside, and as we can see the vast majority comes from the US.
In the wordcloud below I visualised the most frequent words in the corpus. Which are the topics inside this corpus? How can we understand and separate the different discussions? In order to analyse this text, I applied a specific Natural Language Processing technique known as topic modeling.
Topic modelling starts from a corpus of texts (be it of news articles, tweets or legal documents), about which we have no information a priori and helps us extract topics from it by modelling the frequency distribution of the words (interested readers can find more details in the technical note below). I allowed the model to recognise a maximum of 15 topics in the corpus of tweets.
The graph below shows 8 clusters of words. For each cluster, the graph shows the 10 most salient words that it contains, where the size of the font represents the frequency used. The next thing to do is to see whether the clustering produced and the salience of the keywords can be interpreted as a coherent topic.
This interpretation process is neither obvious nor trivial. Following the numbering in the picture: “Brexit”, “Climate Change”, “Monetary Policy” and “Trade Wars” were quite easy to interpret. I use the label “Publications and Conferences” for topic 4, as it refers to the announcement of new papers, books and research. Also, I label topic number 6 in the picture “Inequality and Fiscal Policy”.
Topic 5 was somehow more difficult to interpret, as it listed women and gender as salient words for the topic, but seemed to be broader than gender inequality, given that also black, and children are relevant here. Therefore, I gave it a more general label of “Discrimination and Diversity”.
Two topics were the hardest to interpret, out of the total of 15: one seemed to be about housing and cities (topic 7 in the picture), and the other a sort of “epistemology” cluster where this community would discuss social science itself (not shown). For the rest of the topics, out of the total 15, the top keywords were very conversational (really, true, think, right, read etc..) and have been aggregated into a single topic named “General Conversation”.
Which were the topics most talked about, by this list of RePEc economists? I first filtered out those tweets for which my topic modelling technique assigns any tweet to any single topic with a probability less than 90%. This has left me with 100 thousand tweets for which the tool has assigned probability greater than 90% of them belonging to one topic.
The following chart plots the volume of tweets by topic, excluding those classified as “General Conversation”. Perhaps unsurprisingly, given that RePEc is a network created for academic economists to promote academic research, the most popular topic was the one on “Publications and conferences”.
The topics appeared to be quite stable over time, with some exceptions linked to particular events. In the following chart, I plot the volume of tweets classified as “Brexit”, observing spikes at certain dates: May 27th (Brexit Party wins the European elections), September 24th (when the Supreme Court ruled the Parliament Suspension as unlawful) or October 19th (vote on Boris Johnson’s Deal).
Who is tweeting the most? Using RePEc’s information, I have been able to classify this group of economists according to their academic fields. The 344 economists were associated with 89 unique fields. The following chart represents this information in three different panes.
In the first pane, I plot the volume of tweets per field, while the second pane represents the number of economists listed by field. In the third panel I derived a normalised measure of twitter activity by field, as the average number of tweets by field, restricting the field to only where there were at least 10 economists listed. In all cases, the top 10 fields are represented. Labour Economists on the Repec list seem to be tweeting more than the colleagues from other fields, followed by the fields “European Economics” and “Public Economics”.
Who is talking about which topic? Are specialists tweeting about their area of expertise? I break down the topics by academic field, so to observe which topics are the most associated with different fields of economics.
In the chart, the area of each square is proportional to the number of tweets by economists register under a field, in the topic of interest (it is possible to explore the different topics by changing the menu). Only the most relevant fields are shown in the chart.
Unsurprisingly, the specialists tweet the most in their respective topics. For example, environmental economists and energy economists tweet mostly about climate change, whereas Macroeconomists are not as present in climate discussions. It is Urban and Real Estate economists, that tweet about Housing and cities, international economists that tweet about Trade wars. As for the topic of Discrimination and diversity, it is Labour and Urban economists that tweet the most, as well as academics in the fields of Education and Informal and Underground Economics.
The good consistency between fields and topics doesn’t only tell us that, in general, specialists tend to write about their area of domain (making the scientific community probably different from other Twitter communities), but also gives confidence in this first topic modelling exercise.
These types of exercises can feed into research that studies how academics and scientists communicate on social media. See for example a study by Marina della Giusta (LSE) who compared RePEc economists to natural scientists, and showed that economists use a more complex and less inclusive vocabulary.
In this blog post, I attempted to study what academics have talked about on Twitter in 2019, in future research it would be interesting to study how information about economics propagates between different communities, how academics relate to journalists and the general public sphere.
Annex: data and methodology
Over 2019, this list of 344 economists have tweeted more than one million times. I have been collecting this dataset using the Twitter API. I subsequently geolocated the accounts by crossing RePEc’s and Twitter information, and merged the data with RePEc’s classification by academic field.
In order to be able to process only the English tweets for topic modelling, I applied a common language-recognition algorithm, that helped separate the tweets into several languages: 600 thousand of the tweets collected were in English, followed by Spanish and Italian (see below).
Natural language processing was applied to the sub-corpus of English tweets. As for the modelling technique, a common Latent Dirichlet Allocation (LDA), in its standard form, is agnostic about the topics in the corpus.
In this case, I applied a slightly different version, known as guided LDA, based on Jagarlamudi, Jagadeesh, Hal Daumé III, and Raghavendra Udupa (2012) “Incorporating lexical priors into topic models.” With respect to a standard LDA, this allows me to “suggest” to the model a topic to recognise. I have mapped two topics that I knew a priori would have been in the corpus: Brexit and Climate Change.
I nudge the model into recognising Brexit by suggesting the words Brexit, Johnson, deal, leave, negotiate, backstop as a first topic, and the words climate, change, emissions, carbon, temperature, energy, clean as another topic. The rest of the topics were first recognised and iteratively nudged a priori. The model’s performance was assessed with a coherence score matched against a standard LDA.
Republishing and referencing
Bruegel considers itself a public good and takes no institutional standpoint. Anyone is free to republish and/or quote this post without prior consent. Please provide a full reference, clearly stating Bruegel and the relevant author as the source, and include a prominent hyperlink to the original post.