Top(ological) News: Examining the Social-Media Discourse Around Current Events as a Dynamic Weighted Network
Samuel Rosenblatt, Joshua Minot

Micro-blogging sites like Twitter are rich in information about discourse around current events, social movements, and public entities such as brands and popular figures. However, the high volume and dimensionality of the available lexical data, inconsistent grammatical and syntactic norms, and rapidly changing dynamics make it difficult to distill meaningful temporal information about a particular topic from these data without context. Simple measures, such as time series of Zipf (rank-frequency) distributions, can describe general trends, but they require additional context to interpret and can overlook certain dynamics. Advances in computational linguistics, such as target-dependent sentiment analysis, are promising in many cases, but additional issues arise when a topic of interest is highly polarized, highly popular, and rapidly developing. Highly polarized topics, such as political scandals, can elicit tweets with negative sentiment from multiple discordant groups; standard positive-versus-negative sentiment analysis therefore fails to capture the nuance of the discourse. Highly popular topics are often targets of spam or narrative injection, in which opportunistic agents ‘piggyback’ on a topic's popularity by inserting its key words into otherwise unrelated tweets that promote their own products or narratives, boosting exposure by exploiting the fact that users find and follow topics by key words. These unrelated tweets muddy the data and make text-mining-based tasks harder or less accurate, a problem sometimes exacerbated by the sheer volume of data associated with popular topics. Rapidly developing topics pose difficulties of their own: sweeping language changes over short timespans alter the underlying structure of the data, which makes analyses difficult to compare across time.
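For reference, one of the ‘simple measures’ mentioned above, a time series of Zipf (rank-frequency) distributions, can be sketched as follows: for each day, count word frequencies across that day's tweets, rank words by frequency, and normalize. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `zipf_distribution` and `zipf_time_series` are hypothetical, and the tokenizer is a naive lowercase word-character split.

```python
import re
from collections import Counter, defaultdict
from datetime import date


def zipf_distribution(texts):
    """Return [(rank, word, relative_frequency), ...] for a collection of texts.

    Assumption: a naive tokenizer (lowercased runs of word characters);
    real tweet tokenization would also need to handle URLs, emoji, etc.
    """
    counts = Counter(w for t in texts for w in re.findall(r"\w+", t.lower()))
    total = sum(counts.values())
    # Rank 1 is the most frequent word; ties are broken alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, n / total) for rank, (word, n) in enumerate(ranked, start=1)]


def zipf_time_series(tweets):
    """tweets: iterable of (day, text). Returns {day: rank-frequency list for that day}."""
    by_day = defaultdict(list)
    for day, text in tweets:
        by_day[day].append(text)
    return {day: zipf_distribution(texts) for day, texts in sorted(by_day.items())}


# Toy usage with made-up strings (not data from the study).
series = zipf_time_series([
    (date(2020, 1, 1), "a b a"),
    (date(2020, 1, 1), "a c"),
    (date(2020, 1, 2), "c c b"),
])
print(series[date(2020, 1, 1)])
```

Comparing these per-day lists shows which words rise or fall in rank, but, as the text notes, interpreting such shifts still requires outside context.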
Network analysis offers a host of methods for understanding dynamical systems of relational data, and word co-occurrence networks have been studied for decades to shed light on language and its dynamics across time and place. We use these methods to examine the dynamics of Twitter discourse over a period surrounding polarized, popular, and rapidly evolving events, using word co-occurrence networks constructed daily from a ten-percent sample of the tweets that include a ‘target word’ signifying the event. Specifically, we use these networks to perform efficient data cleaning and to measure the dynamics of the discourse at the micro- and meso-scales, going beyond positive-versus-negative sentiment analysis. Because of the extreme scale and dimensionality of the data, we pay special attention to computational methods for efficient processing, and because of the challenges described above, we rigorously analyze the effects of different preprocessing and data-cleaning methods, comparing multiple existing methods with those we develop.
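The daily word co-occurrence networks described above can be illustrated with a short sketch. It keeps only tweets containing the target word, treats two words as co-occurring when they appear in the same tweet, and sets each edge's weight to the number of that day's tweets containing both words. These choices (tweet-level co-occurrence window, tweet-count weights, dropping the target word itself) are assumptions for illustration, and `tokenize` and `daily_cooccurrence_networks` are hypothetical names, not the authors' code.

```python
import re
from collections import Counter, defaultdict
from datetime import date
from itertools import combinations


def tokenize(text):
    # Hypothetical tokenizer: lowercased runs of word characters, so "#scandal" -> "scandal".
    return re.findall(r"\w+", text.lower())


def daily_cooccurrence_networks(tweets, target_word, drop_target=True):
    """Build one weighted, undirected word co-occurrence network per day.

    tweets: iterable of (day, text) pairs.
    Returns {day: Counter mapping (word_u, word_v) -> weight}, with word_u < word_v.
    Assumption: words co-occur if they appear in the same tweet, and an edge's
    weight is the number of that day's tweets containing both words.
    """
    target = target_word.lower()
    networks = defaultdict(Counter)
    for day, text in tweets:
        tokens = set(tokenize(text))  # count each word at most once per tweet
        if target not in tokens:
            continue  # keep only tweets that mention the target word
        if drop_target:
            # The target word occurs in every kept tweet, so it would link to every node.
            tokens.discard(target)
        # Sorting makes each unordered pair appear in one canonical (u, v) order.
        for u, v in combinations(sorted(tokens), 2):
            networks[day][(u, v)] += 1
    return dict(networks)


# Toy usage with made-up tweets (not data from the study).
tweets = [
    (date(2020, 1, 1), "The #scandal hearing starts today"),
    (date(2020, 1, 1), "Hearing on the scandal was chaotic"),
    (date(2020, 1, 2), "New scandal details emerge"),
    (date(2020, 1, 2), "Unrelated tweet about lunch"),
]
nets = daily_cooccurrence_networks(tweets, "scandal")
print(nets[date(2020, 1, 1)][("hearing", "the")])  # 2
```

Storing each day's network as a pair-count mapping keeps construction cheap; for network measures, each mapping can be loaded into a graph library, for example via NetworkX's `Graph.add_weighted_edges_from`.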