Geosemantic Network of Greater Boston Neighborhoods [oral presentation preferred]
Dmitry Zinoviev, Aleksandra Nenko
Geosemantic networks (GSNs) represent geographical, social, and anthropological aspects of compact spatialized communities, such as urban neighborhoods and metropolitan areas. They have become a crucial tool in defining spatial borders of cultures or culturally uniform communities and studying their evolution.
In general, a GSN is a weighted graph composed of tag nodes (stems, words, or expressions). Two nodes are connected with an edge if they are similar to some degree, and the edge weight is a measure of that similarity. Each node belongs either to the geographical domain (location) or the semantic domain (social and anthropological phenomena, e.g., topics discussed by communities). Unlike bipartite networks, GSNs do not inhibit connections between nodes that belong to the same domain. Thus, they have three subsets of edges: two homogeneous (geographic-to-geographic and semantic-to-semantic) and one heterogeneous (cross-domain).
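The three edge subsets can be illustrated with a minimal Python sketch. The tags, domain labels, and weights below are made-up toy values, and `split_edges` is a hypothetical helper, not part of the study's code.

```python
# Hypothetical toy data: each tag node belongs to exactly one domain.
domain = {
    "#salem": "geo",
    "#lynn": "geo",
    "#foodie": "sem",
    "#realestate": "sem",
}

# Weighted undirected edges: (tag_a, tag_b) -> similarity (invented values).
edges = {
    ("#salem", "#lynn"): 0.8,
    ("#salem", "#foodie"): 0.5,
    ("#foodie", "#realestate"): 0.1,
}


def split_edges(edges, domain):
    """Partition edges into geographic, semantic, and cross-domain subsets."""
    subsets = {"geographic": {}, "semantic": {}, "cross": {}}
    for (a, b), weight in edges.items():
        if domain[a] != domain[b]:
            key = "cross"        # heterogeneous edge
        elif domain[a] == "geo":
            key = "geographic"   # homogeneous, both endpoints are locations
        else:
            key = "semantic"     # homogeneous, both endpoints are topics
        subsets[key][(a, b)] = weight
    return subsets


subsets = split_edges(edges, domain)
```

Each homogeneous subset induces its own subgraph, while the cross-domain subset links locations to topics.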
Ample, easily accessible data from major social networking websites make it possible to construct and analyze large geosemantic networks the size of a metropolitan area. The goal of this study is to explore network neighborhoods and their interactions defined by each of the edge subsets. The study is based on the Instagram posts associated with the urban neighborhoods and suburbs of the Metropolitan Boston area. Our dataset consists of ~75,000 first comments (usually from the original posters) made over several weeks in 2019-2020. Sixty-six thousand of the comments have Instagram hashtags that we used to construct a geosemantic network. Two hashtags are connected if they are used together in a significant number of comments. For further study, we kept only those hashtags that appeared in the corpus at least 20 times (geographic tags) or at least 75 times (all other tags). Finally, we eliminated all hashtags related directly to Boston as such (e.g., "#boston"), Instagram as a medium (e.g., "#igers"), and photography techniques (e.g., "#bwphoto").
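The construction steps described above could look roughly like the following Python sketch. It rests on several assumptions that the abstract does not state: the set of geographic tags is known in advance, tag frequency is counted as the number of comments containing the tag, and "a significant number of comments" is modeled by a plain threshold (`min_cooccur`). The function name and all parameters are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG = re.compile(r"#\w+")


def build_network(comments, geo_tags, stop_tags,
                  min_geo=20, min_other=75, min_cooccur=2):
    """Return {(tag_a, tag_b): number of comments using both tags}."""
    # Unique, lowercased hashtags per comment, minus the excluded tags.
    tagsets = [
        {t.lower() for t in HASHTAG.findall(c)} - stop_tags
        for c in comments
    ]
    # Assumption: frequency = number of comments that contain the tag.
    freq = Counter(t for tags in tagsets for t in tags)
    keep = {
        t for t, n in freq.items()
        if n >= (min_geo if t in geo_tags else min_other)
    }
    # Count co-occurrences among the surviving tags.
    pairs = Counter()
    for tags in tagsets:
        pairs.update(combinations(sorted(tags & keep), 2))
    # Assumption: a fixed count threshold stands in for "significant".
    return {pair: n for pair, n in pairs.items() if n >= min_cooccur}


# Toy corpus with thresholds lowered so the example is not empty.
comments = [
    "Sunset in #Salem #foodie",
    "#salem #foodie brunch",
    "#salem #boston",
    "#foodie only",
]
network = build_network(
    comments,
    geo_tags={"#salem"},
    stop_tags={"#boston"},
    min_geo=2, min_other=2, min_cooccur=2,
)
```

The default thresholds of 20 and 75 mirror the cutoffs quoted in the text; the co-occurrence cutoff is a placeholder for whatever significance criterion is actually used.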
Both homogeneous subgraphs of the GSN are small and have a well-defined network community structure. The geographic subgraph has 363 nodes, 3,742 edges, and 19 node clusters that match the traditional Metro Boston neighborhoods, such as East Boston, Dorchester, and the North Shore. The semantic (socio-anthropological) subgraph has 885 nodes, 20,761 edges, and 28 node clusters referring to such topics as real estate, lifestyle, food, alcohol, pets, and small businesses.
We estimated the most likely socio-anthropological topics for each geographic neighborhood by cross-tabulating the number of posts that simultaneously refer to each neighborhood and each topic. The cross-tabulation matrix reveals strong preferences for each neighborhood. For example, the conglomerate of small towns to the west of Boston (such as Newton and Waltham) is strongly associated with hair salons and local shopping, while the North Shore (such as Salem and Lynn) is known on Instagram for its #foodies.
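The cross-tabulation step might be sketched in Python as follows. The cluster assignments and posts are toy data, the helper names are hypothetical, and the sketch uses raw counts because the abstract does not say whether the matrix is normalized.

```python
from collections import Counter


def cross_tabulate(post_tags, geo_cluster, topic_cluster):
    """Count posts that mention both a neighborhood and a topic."""
    table = Counter()
    for tags in post_tags:
        # Map tags to cluster labels; sets ensure a post counts once per cell.
        hoods = {geo_cluster[t] for t in tags if t in geo_cluster}
        topics = {topic_cluster[t] for t in tags if t in topic_cluster}
        for hood in hoods:
            for topic in topics:
                table[hood, topic] += 1
    return table


def top_topic(table, hood):
    """Return the topic with the highest count for one neighborhood."""
    row = {t: n for (h, t), n in table.items() if h == hood}
    return max(row, key=row.get) if row else None


# Hypothetical cluster assignments and posts (sets of hashtags).
geo_cluster = {"#salem": "North Shore", "#lynn": "North Shore", "#newton": "West"}
topic_cluster = {"#foodie": "food", "#hairsalon": "beauty"}
posts = [
    {"#salem", "#foodie"},
    {"#lynn", "#foodie"},
    {"#newton", "#hairsalon"},
    {"#salem", "#hairsalon"},
]
table = cross_tabulate(posts, geo_cluster, topic_cluster)
```

Row-normalizing the table before comparing neighborhoods would be a natural refinement when neighborhoods differ greatly in post volume.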
Finally, we calculated VADER sentiment polarity scores for all posts from each neighborhood. The majority of the neighborhoods evoke moderately positive sentiments, with the notable exceptions of the highly positive Coolidge Corner and Newton and the highly negative area around Massachusetts General Hospital. So far, we have not found a relationship between preferred semantic topics and general sentiment levels.
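A per-neighborhood sentiment summary could be sketched in Python as below. Averaging the scores is an assumption, since the abstract does not say how post-level scores are aggregated; `mean_sentiment` is a hypothetical helper that accepts any scoring function.

```python
from collections import defaultdict
from statistics import mean


def mean_sentiment(posts, score):
    """Average a per-post score within each neighborhood.

    posts: iterable of (neighborhood, text) pairs
    score: callable mapping text to a number
    """
    by_hood = defaultdict(list)
    for hood, text in posts:
        by_hood[hood].append(score(text))
    return {hood: mean(values) for hood, values in by_hood.items()}


# With the third-party vaderSentiment package installed, the VADER
# compound polarity could serve as the scorer (not run here):
#
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   analyzer = SentimentIntensityAnalyzer()
#   result = mean_sentiment(
#       posts, lambda text: analyzer.polarity_scores(text)["compound"]
#   )
```

Keeping the scorer as a parameter makes it easy to swap in another sentiment model and compare the resulting neighborhood rankings.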