Examining the Sensitivity of Empirical Reference Distributions for Networks
Zachary Gibson

The ability to compare networks is essential. Whether understanding how brains function (van Wijk, Stam, & Daffertshofer, 2010), identifying differences between healthy and unhealthy individuals (Christakis & Fowler, 2007, 2008a, 2008b), examining the effectiveness of community interventions (Provan & Milward, 1995), or distinguishing proteins from one another (Ideker & Sharan, 2008), networks increasingly lie at the heart of research. Comparison becomes even more necessary when examining how phenomena of interest change in response to shifts in network structure, including both natural shifts over time and shifts that result from an intervention (Valente, 2012). However, without established techniques for network comparison, both scholars and practitioners lack the ability to establish why and how networks differ structurally and how networks relate to various outcomes of interest.
Scholars have thus explored a handful of approaches over the last 20 years to grapple with the mathematical challenges behind network comparison. The earliest approaches clarified a number of these challenges, including size-density dependence (Anderson, Butts, & Carley, 1999), whether to pool networks into a single sample (Martins, 1999), and how to compute standard errors of network indices (Snijders & Borgatti, 1999). Later, with the rise of exponential random graph models (ERGMs), a small wave of studies emerged that combined ERGMs with correspondence analysis (e.g., Box-Steffensmeier & Christenson, 2015; Faust, 2006; Faust & Skvoretz, 2002). These analyses provide a clever but complex way of drawing conclusions about how networks of different relations and actors differ.
As a result of these challenges, most comparative network studies either directly compare network indices—which is often inappropriate—or compare networks qualitatively. These issues of complexity and conceptual meaning have pervaded network comparison. Now, a promising “bootstrapping” technique, published by Smith and colleagues (2016), may offer a way to overcome these issues and provide fertile ground for future comparative research. The current study uses this technique and addresses two questions: 1) how sensitive the technique is to the sample used for bootstrapping, and 2) whether the technique is applicable to multiplex data. By assessing its sensitivity, the study either affirms the technique’s ease of use and generalizability or serves as a guide for when the technique is appropriate. Further, through an exploratory analysis of multiplex data, this study examines whether the technique is sensitive to the multiplexity of network data and how such multiplex data might be compared.
To examine the first question, I analyze distributions of network indices from the L.A. FANS data set, also used in the original publication, first using a leave-one-out method and then a leave-n-out method at three different levels of sampling. This provides insight into whether the technique is sensitive to which networks from the sample are used in bootstrapping and into the number of simulations necessary to develop an adequate reference distribution. I then address the second question by applying the technique to Krackhardt’s office data and comparing distributions generated using only the advice data, only the friendship data, and both sets combined.
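The leave-one-out and leave-n-out logic above can be sketched in stdlib-only Python. This is a minimal illustration, not the procedure from Smith et al. (2016) or the actual L.A. FANS analysis: the sample of networks is synthetic (random node/edge counts), the index is simple density, and all function names are my own. The idea is to drop some networks from the sample, bootstrap-resample the remainder, and compare the resulting reference distribution of the mean index to the full-sample distribution.

```python
import random
import statistics

def density(n_nodes, n_edges):
    """Density of an undirected simple graph from node/edge counts."""
    possible = n_nodes * (n_nodes - 1) / 2
    return n_edges / possible if possible else 0.0

def bootstrap_distribution(networks, n_boot=1000, index=density):
    """Resample networks with replacement n_boot times and record the
    mean index of each resample, yielding an empirical reference
    distribution for the sample mean."""
    means = []
    for _ in range(n_boot):
        resample = random.choices(networks, k=len(networks))
        means.append(statistics.mean(index(n, e) for n, e in resample))
    return means

def leave_n_out(networks, n_out, n_boot=1000, index=density):
    """Drop n_out networks at random (n_out=1 gives leave-one-out),
    then bootstrap the remaining networks."""
    kept = random.sample(networks, len(networks) - n_out)
    return bootstrap_distribution(kept, n_boot=n_boot, index=index)

# Hypothetical sample: (n_nodes, n_edges) summaries of 50 small networks,
# with edge counts capped at the maximum possible for the node count.
random.seed(42)
sample = [(random.randint(5, 15), random.randint(4, 30)) for _ in range(50)]
sample = [(n, min(e, n * (n - 1) // 2)) for n, e in sample]

full = bootstrap_distribution(sample)
loo = leave_n_out(sample, n_out=1)
print(round(statistics.mean(full), 3), round(statistics.mean(loo), 3))
```

Comparing the two distributions (e.g., their means and spreads, or a formal two-sample test) indicates how sensitive the reference distribution is to the particular networks included, which is the question the leave-n-out design probes at increasing levels of removal.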