Missing Tie Imputation in Sparse Networks with Ego-Centric Responses

Brennan Antone, Bálint Néray, Wolfgang Munar, Johan Koskinen, Noshir Contractor

Contact: williamantone2017@u.northwestern.edu

It is often practically impossible to collect full network data on many populations of interest, yet even a relatively small number of missing entities may bias estimates of network structure. An effective way to impute missing ties can help minimize these biases. A recent body of work has shown that network models, in particular exponential random graph models (ERGMs), can be used to perform imputation. Compared with alternatives that do not use network models, the advantage of this approach is that interdependencies between ties (e.g., reciprocity and preferential attachment) can be identified from the observed ties and then used to inform the imputation of unobserved ties. This work has encompassed both frequentist (Khanna et al., 2018; Sha et al., 2018; Wang et al., 2016) and Bayesian (Koskinen et al., 2013) approaches to tie imputation. In testing these approaches, extant work has primarily focused on networks of moderate size and moderate density. Large, sparse networks introduce new statistical and computational challenges for estimating tie probabilities. To address these challenges, we propose a modified frequentist approach to imputing ties, which estimates the probability of each missing tie conditional on the values of the observed ties and on simulated values of the other missing ties. We compare this modified frequentist approach with an approach based on Bayesian estimation (Koskinen et al., 2013) and an existing approach based on frequentist estimation (Wang et al., 2016). We evaluate each method using network data collected from 665 individuals in a village in rural Kenya, a region where promoting the adoption of modern contraceptive methods is an important health issue. We have collected a complete sociocentric network of who reports discussing modern contraception with whom.
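
The idea of conditioning each missing tie on the observed ties and on simulated values of the other missing ties can be illustrated with a small Gibbs-sampling sketch. It is not the authors' modified estimator: it assumes a toy ERGM with only an edges term and a mutual (reciprocity) term, treats the parameters `theta_edges` and `theta_mutual` as already estimated, and uses hypothetical function names chosen for illustration.

```python
import math
import random


def conditional_tie_prob(y, i, j, theta_edges, theta_mutual):
    """P(Y_ij = 1 | all other dyads) under a toy ERGM with edges + mutual terms.

    Turning tie i->j on adds 1 to the edge count and y[j][i] to the mutual
    count, so logit P = theta_edges + theta_mutual * y[j][i].
    """
    eta = theta_edges + theta_mutual * y[j][i]
    return 1.0 / (1.0 + math.exp(-eta))


def impute_missing_ties(y, missing, theta_edges, theta_mutual,
                        n_sweeps=2000, burn_in=500, seed=0):
    """Gibbs-sample the missing dyads given fixed (assumed) parameters.

    Each sweep redraws every missing dyad from its conditional probability
    given the observed ties and the current simulated values of the other
    missing ties. Returns the post-burn-in mean of each missing dyad, an
    estimate of its marginal tie probability.
    """
    rng = random.Random(seed)
    y = [row[:] for row in y]  # never overwrite the caller's observed data
    totals = {d: 0 for d in missing}
    for sweep in range(n_sweeps):
        for (i, j) in missing:
            p = conditional_tie_prob(y, i, j, theta_edges, theta_mutual)
            y[i][j] = 1 if rng.random() < p else 0
        if sweep >= burn_in:
            for (i, j) in missing:
                totals[(i, j)] += y[i][j]
    kept = n_sweeps - burn_in
    return {d: totals[d] / kept for d in missing}


# Example: 3 actors; tie 1->0 is observed, ego 0's outgoing ties are unobserved.
y_obs = [[0, 0, 0],
         [1, 0, 0],
         [0, 1, 0]]
probs = impute_missing_ties(y_obs, [(0, 1), (0, 2)],
                            theta_edges=-2.0, theta_mutual=3.0)
print(probs)
```

Because the reverse ties 1->0 and 2->0 are observed here, each conditional probability is fixed, so the sampled means should land near 1/(1 + e^-1) ≈ 0.73 for (0, 1) and 1/(1 + e^2) ≈ 0.12 for (0, 2). When both directions of a dyad are missing, the sampler instead lets the simulated reverse tie feed back into each draw, which is the interdependence the abstract describes.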
We use this network to evaluate the imputation models by simulating missing data: we remove a subset of egos (between 10% and 60% of the network) from the data set, together with all information about their outgoing ties. We then compare the networks imputed under these simulated response rates (between 40% and 90%) with the ground-truth network to assess model performance. Performance is evaluated both at the micro level, using predictive metrics such as precision and recall (how accurately the presence or absence of each imputed tie is predicted), and at the macro level (how well the imputed network allows estimation of global properties of the ground-truth network). We conclude with a principled evaluation of the strengths and weaknesses of the traditional frequentist approach, the modified frequentist approach for sparse networks, and the Bayesian approach to network tie imputation.
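
The evaluation protocol can be sketched as follows. This is an illustration, not the authors' evaluation code: the helper names (`simulate_nonresponse`, `precision_recall`, `density`) are hypothetical, networks are assumed to be directed 0/1 adjacency lists, and density stands in for the macro-level properties, which the abstract does not enumerate.

```python
import random


def simulate_nonresponse(n, response_rate, seed=0):
    """Remove a random set of egos; all of their outgoing dyads become missing.

    Incoming ties reported by the remaining respondents stay observed.
    """
    rng = random.Random(seed)
    n_removed = round(n * (1 - response_rate))
    removed = set(rng.sample(range(n), n_removed))
    missing = [(i, j) for i in sorted(removed) for j in range(n) if j != i]
    return removed, missing


def precision_recall(truth, imputed, missing):
    """Micro-level accuracy of imputed ties, scored over missing dyads only."""
    tp = sum(1 for i, j in missing if imputed[i][j] == 1 and truth[i][j] == 1)
    fp = sum(1 for i, j in missing if imputed[i][j] == 1 and truth[i][j] == 0)
    fn = sum(1 for i, j in missing if imputed[i][j] == 0 and truth[i][j] == 1)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall


def density(y):
    """One macro-level property: share of ordered pairs (i != j) that are ties."""
    n = len(y)
    return sum(y[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))


# Example: 10 actors at a 60% simulated response rate -> 4 egos removed.
removed, missing = simulate_nonresponse(10, 0.6)
print(len(removed), len(missing))  # 4 36

truth = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
imputed = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]
print(precision_recall(truth, imputed, [(0, 1), (0, 2)]))  # (0.5, 1.0)
print(density(truth), density(imputed))
```

Scoring precision and recall only over the missing dyads keeps the observed ties, which every method reproduces trivially, from inflating the micro-level metrics; the macro-level comparison, by contrast, is computed on the whole imputed network.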
