Using group detection and computational text analysis to examine mechanisms of disinformation on Reddit

Tyler Crick


This paper presents initial findings from a research project on the diffusion and impacts of online disinformation campaigns. Previous research suggests that misinformation, disinformation, and other false information spread widely, enter echo chambers, and far outpace the reach of any subsequent corrective information. It is therefore crucial to understand the mechanisms of disinformation in order to counteract it before it becomes pervasive, and network analysis offers uniquely powerful concepts and methods for examining the dynamics of disinformation diffusion. Empirically, this paper presents the results of network and computational text analyses of Reddit, which Alexa, the de facto web traffic analyzer, ranks ahead of Google as the top news aggregator in the world, fifth among all sites in Canada, and sixth among all sites in the United States. Reddit has also housed some of the most infamous online gatherings for controversial ideologies, including r/The_Donald and r/TheRedPill, where a number of known ideologically motivated disinformation campaigns started. I present a network analysis of a 2-billion-post subset of Reddit from 2015 to 2017, during and immediately following Donald Trump’s 2016 campaign for the US presidency. This period was a kind of “golden age” for overtly fake news. Reddit is organized into communities, called subreddits, that are usually centered on a common interest, providing an excellent opportunity to study group cohesion using community detection methods. In this paper, I apply these methods in three separate analyses. The first network consists of subreddits connected by common users who shared fake news. The second is made up of users with subreddit activity in common. The third is made up of users tied by replying to each other.
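As an illustration only, the subreddit network described above, in which subreddits are tied by common users who shared fake news, could be built along the following lines. This is a minimal sketch and not the project's actual pipeline: the function name `subreddit_co_user_edges`, the `(user, subreddit)` input format, and the toy data are all assumptions made for the example, and the edge weight is assumed to be a simple count of shared users.

```python
from collections import Counter, defaultdict
from itertools import combinations


def subreddit_co_user_edges(posts):
    """Build weighted subreddit-subreddit edges from (user, subreddit) pairs.

    `posts` is assumed to contain one pair per fake-news post. The weight of
    an edge is the number of distinct users active in both subreddits.
    """
    # Collect the set of subreddits each user posted fake news in.
    subs_by_user = defaultdict(set)
    for user, subreddit in posts:
        subs_by_user[user].add(subreddit)

    # Every pair of subreddits sharing a user gets one unit of weight.
    edges = Counter()
    for subs in subs_by_user.values():
        for a, b in combinations(sorted(subs), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy data: three users, three subreddits.
toy_posts = [
    ("u1", "r/A"), ("u1", "r/B"),
    ("u2", "r/A"), ("u2", "r/B"), ("u2", "r/C"),
    ("u3", "r/C"),
]
edges = subreddit_co_user_edges(toy_posts)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The user-level network based on shared subreddit activity could be sketched the same way by swapping the roles of users and subreddits in the input pairs.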
A number of community detection methods, such as Louvain and its Leiden extension for signed networks, are employed and then validated by referring to the central topics of the subreddits that form the communities. This research also uses machine learning and computational text analysis to characterize the topics that disinformation focuses on, as well as the interests of the users who post it. Because subreddits are divided by topic, it is possible to train highly accurate topic modelling algorithms. The same can be said for sentiment or stance analysis towards these topics (for example, regular members of r/HillaryForPrison can be expected to make many posts expressing negative sentiment towards Hillary Clinton). With this information, as well as Reddit’s in-built post score metric, the influence of fake news posts is investigated by tracking conversation disruption and shifts of sentiment. Influential posts thus contribute more to edge weights, which makes centrality measures more informative than those based on raw post volume alone. The results show very clear ideological divisions between the communities detected. One community groups r/The_Donald, r/HillaryForPrison, r/MensRights, and r/progun, whereas another contains r/The_Farage, r/Le_Pen, r/europeannationalism, and r/euromigration. At the same time, the edge weights between r/The_Donald and both r/politics and r/news show that these subcommunities do spread disinformation beyond their own boundaries.
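To make the idea of influence-weighted edges concrete, the sketch below compares a weighted degree (strength) computed from raw post volume with one computed from score-weighted edges. The abstract does not specify how influence enters the edge weights, so the `influence` formula here (one unit per post plus a log-damped bonus for positive score) is purely a hypothetical stand-in, as are the function names and the toy data.

```python
import math
from collections import defaultdict


def influence(score):
    """Hypothetical influence weight (an assumption, not the paper's formula):
    1 for any post, plus a log-damped bonus for a positive post score."""
    return 1.0 + math.log1p(max(score, 0))


def edge_weights(interactions, weight_fn):
    """Sum per-post weights into undirected edge weights.

    `interactions` is assumed to hold (source, target, post_score) triples.
    """
    weights = defaultdict(float)
    for source, target, score in interactions:
        key = tuple(sorted((source, target)))
        weights[key] += weight_fn(score)
    return dict(weights)


def strength(weights):
    """Weighted degree: the sum of the weights of edges incident to each node."""
    totals = defaultdict(float)
    for (a, b), w in weights.items():
        totals[a] += w
        totals[b] += w
    return dict(totals)


# Hypothetical toy data: three low-score posts link r/X and r/Y,
# while a single high-score post links r/X and r/Z.
toy = [
    ("r/X", "r/Y", 0),
    ("r/X", "r/Y", 0),
    ("r/X", "r/Y", 0),
    ("r/X", "r/Z", 500),
]
by_volume = strength(edge_weights(toy, lambda score: 1.0))
by_influence = strength(edge_weights(toy, influence))
print(by_volume)
print(by_influence)
```

In this toy case r/Y outranks r/Z by post volume, but the ordering reverses once the single high-score post is weighted by its influence, which is the kind of difference the weighting is meant to surface.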
