Analyzing collections of annotated connections: automated text analysis at the edge, vertex, group and graph resolutions with NodeXL

Marc Smith, Harald Meier

Contact: marc@smrfoundation.org

Text analysis is enhanced when it is integrated with network analysis. A common approach to understanding a collection of content is to treat all of it as a single "bag of words". Network analysis improves on this by organizing content into four resolutions: the individual message or edge, the vertex (a collection of messages), the group (a collection of vertices), and the network graph as a whole. With NodeXL, an analysis pipeline that runs from data collection, through network cluster and vertex analysis, to text analysis can be performed easily. NodeXL can be automated and applied to a wide range of annotated connections, and it outputs text analysis data at each of the four network resolutions. Social media data is a good domain for building clusters from patterns of connections. Using NodeXL, we demonstrate how network-driven text analysis reveals significant differences between the content associated with different clusters, and between each cluster and the global content summary. By leveraging the sociological process of homophily, network analysis can cluster text into relevant collections based on patterns of observed interactions. We present results from an analysis of recent social media content about politics that illustrates how network clusters cleanly separate subgroups that might otherwise be conflated. This approach highlights the diversity of opinions around controversial topics.
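
The four resolutions can be illustrated with a minimal Python sketch. It is not NodeXL code and does not use NodeXL's API: the edge list is invented for illustration, the function names are hypothetical, and connected components stand in for whatever clustering algorithm a real analysis would use. Each message is assumed to be attributed to its source vertex.

```python
from collections import Counter, defaultdict

# Hypothetical annotated connections: (source, target, message text).
EDGES = [
    ("ann", "bob", "tax cuts help growth"),
    ("bob", "ann", "growth needs tax cuts"),
    ("cat", "dan", "fund public schools"),
    ("dan", "cat", "public schools need funds"),
]


def tokenize(text):
    """Lowercase and split on whitespace (deliberately simplistic)."""
    return text.lower().split()


def connected_components(edges):
    """Stand-in clustering: map each vertex to a connected-component label."""
    neighbors = defaultdict(set)
    for src, dst, _ in edges:
        neighbors[src].add(dst)
        neighbors[dst].add(src)
    group_of = {}
    next_label = 0
    for start in sorted(neighbors):
        if start in group_of:
            continue
        stack = [start]
        while stack:
            vertex = stack.pop()
            if vertex in group_of:
                continue
            group_of[vertex] = next_label
            stack.extend(n for n in neighbors[vertex] if n not in group_of)
        next_label += 1
    return group_of


def word_counts_by_resolution(edges, group_of):
    """Count words per edge, per vertex, per group, and for the whole graph."""
    edge_counts = []
    vertex_counts = defaultdict(Counter)
    group_counts = defaultdict(Counter)
    graph_counts = Counter()
    for src, _dst, text in edges:
        words = Counter(tokenize(text))
        edge_counts.append(words)                 # edge resolution
        vertex_counts[src].update(words)          # vertex resolution (author)
        group_counts[group_of[src]].update(words) # group resolution
        graph_counts.update(words)                # graph resolution
    return edge_counts, vertex_counts, group_counts, graph_counts


group_of = connected_components(EDGES)
edge_counts, vertex_counts, group_counts, graph_counts = (
    word_counts_by_resolution(EDGES, group_of)
)

for label in sorted(group_counts):
    print(label, group_counts[label].most_common(3))
```

In this toy data the graph-level counts mix both conversations, while the group-level counts keep the vocabulary of each cluster separate, which is the contrast between a single bag of words and network-driven text analysis.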
