Extracting Conflict Networks from the News with Semantic Hypergraphs

Telmo Menezes, Camille Roth

Contact: telmo@telmomenezes.net

We recently introduced a formal language and knowledge representation model that we call "semantic hypergraph" (arXiv:1908.10784), which aims to give natural language explicit structure and to reduce its ambiguity as much as possible. We take advantage of modern NLP tools, such as part-of-speech tagging, dependency parsing and co-reference resolution, to transform free text into semantic hypergraphs, where units of speech (typically sentences) are represented as recursive ordered hyperedges. This has advantages over conventional approaches such as semantic graphs, including the ability to build new concepts from existing ones, to organize statements into regular structures of predicates followed by an arbitrary number of entities, and to represent statements about other statements, at an arbitrary level of nesting. Of particular interest to this conference session, this representation facilitates the discovery of views or perspectives that summarize and aggregate a certain aspect of what is contained in the text, but that is potentially spread across a large corpus, possibly implicitly, and in an entangled fashion. Our contribution demonstrates the extraction of networks of conflicts between actors from large corpora of news headlines. We consider two sets of news headlines, collected from Reddit -- a popular social news aggregator and discussion forum. One set is extracted from the "worldnews" subreddit (dedicated to global news in English), the other from "politics" (dedicated to U.S. politics). They both cover the time period from Jan 1st, 2013 to Aug 1st, 2017 and consist of approximately 4 million headlines. We show how we use the semantic hypergraph model to identify actors as well as conflict relationships in the context of some topic. Furthermore, co-references (such as "President Obama" and "Barack Obama") and some actor types (e.g., "female", "male", "non-human", "group") are automatically inferred.
This higher-order inferred knowledge can then be used to generate both ego-centered conflict networks and contextual conflict networks (summarizing the network of conflicts surrounding a given topic). We automatically build, for example, the conflict network surrounding the topic of "Syria" in English-speaking world news. In this case, we use a simple alliance detection algorithm to show that the factions expressed in this conflict correspond convincingly to actors (countries, politicians and others) aligned with NATO on one side, and with Russia/China on the other. More broadly, our contribution aims to advance the state of the art in text understanding for social and semantic network analysis by going beyond bag-of-words approaches and conventional topic models.
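
As a rough illustration of the pipeline described above, the following Python sketch represents statements as nested tuples (a stand-in for recursive ordered hyperedges), collects conflict pairs at any nesting depth, and splits actors into two factions. Everything in it is an assumption made for illustration: the tuple notation, the toy statements, the CONFLICT_PREDICATES set, and the "opponent of my opponent is an ally" heuristic are hypothetical and are not the authors' notation, data, or alliance detection algorithm.

```python
from collections import defaultdict, deque

# Toy hyperedges as nested tuples: (predicate, arg1, arg2, ...).
# An argument may itself be a hyperedge (a statement about a statement).
EDGES = [
    ("accuses", "usa", "russia"),
    ("says", "turkey", ("attacks", "russia", "rebels")),
    ("condemns", "russia", "usa"),
    ("warns", "china", "usa"),
    ("praises", "usa", "turkey"),  # not a conflict predicate
]

# Hypothetical list of predicates treated as expressing conflict.
CONFLICT_PREDICATES = {"accuses", "attacks", "condemns", "warns"}


def conflicts(edge):
    """Yield (source, target) actor pairs from an edge and its nested edges."""
    if not isinstance(edge, tuple):
        return
    pred, *args = edge
    if (pred in CONFLICT_PREDICATES and len(args) >= 2
            and isinstance(args[0], str) and isinstance(args[1], str)
            and args[0] != args[1]):
        yield args[0], args[1]
    for arg in args:
        yield from conflicts(arg)


def conflict_network(edges):
    """Count conflict mentions per unordered pair of actors."""
    weights = defaultdict(int)
    for edge in edges:
        for source, target in conflicts(edge):
            weights[frozenset((source, target))] += 1
    return dict(weights)


def two_factions(weights):
    """Greedy two-way split: each newly reached actor takes the side
    opposite to the opponent it was reached from (breadth-first)."""
    adjacency = defaultdict(set)
    for pair in weights:
        a, b = tuple(pair)
        adjacency[a].add(b)
        adjacency[b].add(a)
    side = {}
    for start in sorted(adjacency):
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            actor = queue.popleft()
            for opponent in sorted(adjacency[actor]):
                if opponent not in side:
                    side[opponent] = 1 - side[actor]
                    queue.append(opponent)
    return side


network = conflict_network(EDGES)
sides = two_factions(network)
```

On this toy input, "russia" and "china" land on one side and "usa" and "rebels" on the other. The split is only a heuristic: when the conflict network contains odd cycles, no clean two-faction assignment exists and the result depends on traversal order.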
