Nate: A Python Package for Integrating Network Analysis and Applied Natural Language Processing
John McLevey, Tyler Crick, Pierson Browne
This paper introduces Nate, an open-source Python package for empirical research at the intersection of network analysis and applied natural language processing. Nate is designed to (1) construct various kinds of networks from unstructured text sources; (2) enrich existing social network datasets with structured information extracted or computed from text sources; and (3) easily manage useful metadata on node and edge attributes, including information about when and where specific connections form. It scales efficiently to large and complex datasets. In this paper, we explain the motivation for Nate and describe some foundational design principles, and then demonstrate how to use Nate to construct and analyze networks.
Nate can construct a variety of social and semantic networks from unstructured source text. The nodes in these networks can be social actors (e.g. people, organizations) or semantic units (e.g. words, concepts, topics), and the edges can be directed or undirected. To construct directed networks, we extract sequences of subject > verb > object when the nodes are semantic units and actor_i > action > actor_j when the nodes are social actors. In both cases, we extract the directed relationships by parsing syntactic dependencies in the source texts and exploiting known structural properties of narratives. In the latter case, we also use state-of-the-art machine learning methods to identify social actors and other named entities. To construct undirected networks, we assign edges between nodes when they appear in the same semantic context, such as a phrase, sentence, paragraph, or a sliding window. Nodes can be supplied by the researchers themselves or extracted automatically from the text.
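The undirected case can be illustrated with a minimal sketch. This is not Nate's API: the function names `cooccurrence_edges` and `sentence_edges`, the whitespace tokenization, and the choice to weight an edge by the number of co-occurrences are all assumptions made for illustration. The directed case is not shown because it depends on a dependency parser.

```python
from collections import Counter


def cooccurrence_edges(tokens, window=3):
    """Count undirected co-occurrence edges within a sliding window.

    Two distinct tokens are linked each time they occur fewer than
    `window` positions apart. Edge keys are alphabetically sorted pairs,
    so (u, v) and (v, u) are treated as the same undirected edge.
    """
    if window < 2:
        raise ValueError("window must cover at least two tokens")
    edges = Counter()
    for i, left in enumerate(tokens):
        # Compare each token with the next (window - 1) tokens.
        for right in tokens[i + 1:i + window]:
            if left != right:  # skip self-loops
                edges[tuple(sorted((left, right)))] += 1
    return edges


def sentence_edges(sentences, window=3):
    """Sum window-based edges over sentences without crossing sentence boundaries."""
    total = Counter()
    for sentence in sentences:
        # Naive lowercase/whitespace tokenization, for illustration only.
        total.update(cooccurrence_edges(sentence.lower().split(), window))
    return total


if __name__ == "__main__":
    corpus = ["Nate builds networks", "Networks need text"]
    for edge, weight in sorted(sentence_edges(corpus, window=2).items()):
        print(edge, weight)
```

Treating the sentence as the outer context and the window as the inner one is only one possible design. Passing a single token list for a whole paragraph to `cooccurrence_edges` gives a pure sliding-window network instead.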
Nate can also construct socio-semantic networks. It does this primarily by enriching existing social networks with information extracted from, or computed from, text data produced by actors in a network. Examples include co-authorship networks enriched with the content of publications, social media interaction networks enriched with the text of shared messages, and conversation networks enriched with the content of claims that people make to one another. Among other things, Nate can estimate degrees of similarity or difference between the nodes based on the content of the text they produce. To do this, we combine network data structures with word embedding models from natural language processing.
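One common way to estimate text-based similarity between connected actors is to average the word vectors of each actor's text and compare the averages with cosine similarity. The sketch below assumes that approach; it is not a description of Nate's implementation. The vectors in `TOY_VECTORS` are invented for illustration, and every function name is hypothetical. In practice the vectors would come from a trained embedding model.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
TOY_VECTORS = {
    "network": [1.0, 0.0, 0.0],
    "graph": [0.9, 0.1, 0.0],
    "poem": [0.0, 1.0, 0.0],
    "verse": [0.0, 0.9, 0.1],
}


def document_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def edge_similarities(edges, texts, vectors):
    """Score each edge of an existing network by the similarity of its endpoints' texts."""
    doc_vecs = {actor: document_vector(toks, vectors) for actor, toks in texts.items()}
    scores = {}
    for u, v in edges:
        vec_u, vec_v = doc_vecs.get(u), doc_vecs.get(v)
        if vec_u is not None and vec_v is not None:
            scores[(u, v)] = cosine_similarity(vec_u, vec_v)
    return scores


if __name__ == "__main__":
    texts = {
        "ana": ["network", "graph"],
        "ben": ["graph", "network"],
        "cy": ["poem", "verse"],
    }
    edges = [("ana", "ben"), ("ana", "cy")]
    for edge, score in edge_similarities(edges, texts, TOY_VECTORS).items():
        print(edge, round(score, 3))
```

Edges whose endpoints have no usable text are left unscored rather than assigned a default value, so missing data is not mistaken for dissimilarity.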
Nate facilitates multiple modes of analysis for all of the networks it constructs – social, semantic, or socio-semantic; directed or undirected. Each network can be analyzed separately, or several can be superimposed on one another to form multi-level networks. Finally, Nate stores useful metadata on the context, timing, and sequence of connections between nodes in a network. This metadata enables novel analyses of the contexts and temporal dynamics of social, semantic, and socio-semantic networks.
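To make the idea of edge metadata concrete, the sketch below keeps one record per observed connection, with its source document, its position in that document, and its date. It then answers two temporal questions: when an edge first appeared, and what the network looked like up to a given date. The classes `EdgeObservation` and `EdgeLog` and their fields are hypothetical and are not taken from Nate.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EdgeObservation:
    """One occurrence of a connection (hypothetical record layout)."""
    source_doc: str   # context: which document the connection came from
    position: int     # sequence: where in that document it occurred
    timestamp: date   # timing: when the document was produced


class EdgeLog:
    """Undirected edges, each carrying the full list of its observations."""

    def __init__(self):
        self._obs = defaultdict(list)

    def add(self, u, v, observation):
        # frozenset makes (u, v) and (v, u) the same undirected edge.
        self._obs[frozenset((u, v))].append(observation)

    def first_seen(self, u, v):
        """Earliest date on which the edge was observed, or None if never."""
        obs = self._obs.get(frozenset((u, v)), [])
        return min((o.timestamp for o in obs), default=None)

    def history(self, u, v):
        """Observations of one edge, ordered by date and then by position."""
        obs = self._obs.get(frozenset((u, v)), [])
        return sorted(obs, key=lambda o: (o.timestamp, o.position))

    def snapshot(self, until):
        """Edges observed on or before `until`, weighted by observation count."""
        weights = {}
        for edge, obs in self._obs.items():
            count = sum(1 for o in obs if o.timestamp <= until)
            if count:
                weights[edge] = count
        return weights


if __name__ == "__main__":
    log = EdgeLog()
    log.add("ana", "ben", EdgeObservation("doc1", 0, date(2020, 1, 5)))
    log.add("ben", "ana", EdgeObservation("doc2", 3, date(2019, 6, 1)))
    log.add("ana", "cy", EdgeObservation("doc2", 4, date(2021, 2, 1)))
    print(log.first_seen("ana", "ben"))
    print(log.snapshot(date(2020, 12, 31)))
```

Storing individual observations instead of a single aggregated weight keeps the context and ordering recoverable, at the cost of more memory per edge.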
Nate is well-suited for a wide variety of substantive research agendas, especially research that focuses in some way on cultural cognition (e.g. comparing frames, cultural schemas, or narratives used by subgroups of people in a social network), or where the context, timing, or sequential ordering of connections is important.