## Scalable Statistics for Social Networks of Entire Societies

*Sahil Loomba, Nick Jones*

Social networks play a crucial role in determining social outcomes, particularly those related to people's health and well-being. The spread of health behaviors and outcomes (smoking, obesity, mental health, opinions on vaccination, etc.) has network correlates of connectedness, reinforcement and influence. These notions may correspond to different measurements of network structure, such as the number of friends, the reciprocity of friendships, and the modularity of societal networks. However, collecting the complete topology of an entire community is hard, particularly in regions with low technology penetration. Learning these measures scalably for individuals, cities, counties, countries and continents remains a challenge in sociology and public health.
Traditional social surveys are an inexpensive and efficient means of collecting data, but cannot elicit a full observation of even people's local networks. Using aggregate relational data (ARD) type questions that ask "what proportion of your friends are of the kind X?", one can make partial observations of how people make connections. Our objective is to develop a mathematical framework that can extract statistics of large-scale social networks from cheaply and widely available socio-demographic survey data, in a manner that is sociologically motivated, computationally feasible, directly interpretable, and amenable to hypothesis testing.
We present a method to learn connectivity models cheaply from egocentric surveys. Statistical models of networks define a probability space over graph structures that, once learnt, can concisely summarize noisy real-world networks. Since we do not have access to the true network structure, we aim instead to query the model for the distribution of network statistics. Employing stochastic block models (SBM) from the perspective of parameter tying, we analytically describe expectations and uncertainties of nodal, local and global network statistics. Centrality measures are statistics of particular interest to researchers in the field. Through a random-walk approach, we generate model-level definitions of popular centrality measures like Katz centrality, betweenness, closeness and communicability. This exemplifies the use of appropriate matrix functions for computing centralities, which can serve as network structural measures of social capital. In a nutshell, our approach allows us to analytically convert knowledge of an SBM into probability distributions over nodal properties, permitting an estimate for every individual in a society without knowing the full network, and without laborious sampling from the model.
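As a rough illustration of how parameter tying makes such statistics cheap, the sketch below computes block-level expected degree and a Katz-style centrality directly from SBM parameters. It is not the authors' derivation: it assumes an undirected SBM given by a block connection-probability matrix `P` and block sizes `n`, and it applies the Katz resolvent to the *expected* adjacency matrix (a mean-field simplification that gives point values only, not the distributions or uncertainties described above). The function names are hypothetical.

```python
import numpy as np

def block_expected_degree(P, n):
    """Expected degree of a node in each block of an SBM.

    P[a, b] is the assumed probability of a tie between a node in block a
    and a node in block b; n[b] is the number of nodes in block b.
    The node's own self-pair is excluded, hence the diagonal correction.
    """
    P = np.asarray(P, dtype=float)
    n = np.asarray(n, dtype=float)
    return P @ n - np.diag(P)

def block_katz(P, n, alpha):
    """Katz-style centrality per block, using the expected adjacency matrix.

    Nodes in the same block share parameters, so the N x N linear system
    x = 1 + alpha * E[A] x collapses to a K x K system in M = P diag(n).
    For simplicity this sketch keeps the diagonal of E[A] (self-pairs).
    """
    P = np.asarray(P, dtype=float)
    n = np.asarray(n, dtype=float)
    M = P * n  # broadcasting over columns: M[a, b] = P[a, b] * n[b]
    spectral_radius = np.max(np.abs(np.linalg.eigvals(M)))
    if alpha * spectral_radius >= 1:
        raise ValueError("alpha must be below 1 / spectral radius of M")
    K = len(n)
    return np.linalg.solve(np.eye(K) - alpha * M, np.ones(K))

# Illustrative two-block example (values are made up, not survey estimates).
P = np.array([[0.10, 0.02],
              [0.02, 0.08]])
n = np.array([50, 30])
degrees = block_expected_degree(P, n)
katz = block_katz(P, n, alpha=0.1)
```

The cost depends on the number of blocks rather than the number of people, which is what lets a value be assigned to every individual without constructing or sampling the full network.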
To test the applicability of our model, we use a social survey study in the UK, called Understanding Society. This detailed study asks around 40,000 respondents ARD-type questions across age, sex, distance, ethnicity, education, employment and income, from which we learn the SBM. Our centrality measures explain variation in self-reported health and objective well-being, even after controlling for age and income. In particular, degree and homophily appear to be positively correlated with material deprivation. Among health behaviors, substance abuse does not seem to be informed by social connectivity, while its frequency does seem to be. Some of our conclusions concur with known relationships between traditional social network statistics and health. We hope that our method serves as a novel scalable tool in the sociologist's kit, one that places societal outcomes directly in the purview of the social connections that drive them.