Structural Sources of Open Collaboration in GitHub’s Online Community

Chao Liu, Steve McDonald


The advent of online “open-source” platforms has transformed the ways that team-based work is done. These platforms have raised new questions about the conditions under which team-based voluntary collaboration is possible on a virtual platform. In other words, why would we expect people to contribute to open source projects? While existing research has explored the attributes of individual team members that are associated with project involvement online, less is known about the relational dynamics that may affect collaboration. Specifically, how does network embeddedness within an online community affect individual contributions to team-based production? We examine this question using data from the GitHub community, an open source platform for software developers. The data comprise a random sample of 581,273 users, 624,481 repositories, and 272,683 organizations from 2015. The software developers are users. They maintain and share repositories, which serve as storehouses for software projects. Users may be affiliated with organizations, which are shared accounts that allow workers to collaborate on many projects at once. These organizations do not equate to entire companies, as any one company might contain multiple organizations. The data contain a wide range of information about how users interact with other users and with repositories, including the extent to which users commit (contribute code) to repositories, open issues on repositories, watch repositories, and follow other users. The number of commits by each user serves as the key outcome variable.

Network measures derive from the user-organization co-membership network. From this network, we obtain three indicators. First, degree centrality is a standardized measure of the number of different organizations with which a user is affiliated. Second, eigenvector centrality is a weighted measure of closeness to the center of the user-organization network. Third, constraint is a measure of the extent to which the organizations that a user is connected to tend to contain the same individuals. The two centrality measures capture the extent to which users are central to the network of organizations. High centrality should be positively related to engagement and acceptance, so users who are central to the network are likely to contribute more to the repositories. Constraint provides a sense of whether users are embedded in tightly clustered sets of organizations and other users. Users who are members of tight clusters (or cliques) of organizations should exhibit lower levels of engagement. Therefore, users with high constraint are less likely to contribute to the repositories. Preliminary results support these hypotheses: degree centrality and eigenvector centrality are positively associated with commits, while constraint is negatively associated with commits. This suggests that relationships are especially important status signals in online communities, where ascriptive characteristics that might otherwise signal status are more difficult to discern. Organizational affiliation offers signals of reputation and status within and across communities. These signals affect the ways that individuals are perceived and treated, which has direct impacts on individuals’ behavior in virtual communities.
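To make the three network indicators concrete, the following is a minimal Python sketch of how they could be computed from user-organization affiliations. It is an illustration under stated assumptions, not the authors' procedure: the toy `memberships` data and all function names are hypothetical, eigenvector centrality and constraint are computed on a user-by-user projection weighted by the number of shared organizations, and constraint follows Burt's standard formula summed over a user's direct contacts.

```python
import math
from itertools import combinations

# Hypothetical toy affiliations (user -> organizations); not GitHub data.
memberships = {
    "ana": {"org_a", "org_b", "org_c"},
    "ben": {"org_a", "org_b"},
    "cy": {"org_a", "org_b"},
    "dee": {"org_c"},
}

def degree_centrality(memberships):
    """Number of organizations per user, standardized by the total count."""
    n_orgs = len(set().union(*memberships.values()))
    return {u: len(orgs) / n_orgs for u, orgs in memberships.items()}

def project(memberships):
    """User-user tie weights: the number of organizations two users share."""
    weights = {u: {} for u in memberships}
    for u, v in combinations(memberships, 2):
        shared = len(memberships[u] & memberships[v])
        if shared:
            weights[u][v] = weights[v][u] = shared
    return weights

def eigenvector_centrality(weights, iters=500, tol=1e-10):
    """Power iteration on A + I (the shift prevents oscillation)."""
    x = {u: 1.0 for u in weights}
    for _ in range(iters):
        new = {u: x[u] + sum(wt * x[v] for v, wt in weights[u].items())
               for u in weights}
        norm = math.sqrt(sum(val * val for val in new.values())) or 1.0
        new = {u: val / norm for u, val in new.items()}
        if max(abs(new[u] - x[u]) for u in weights) < tol:
            return new
        x = new
    return x

def constraint(weights):
    """Burt's constraint: sum over contacts j of (p_ij + sum_q p_iq * p_qj)^2."""
    p = {}
    for u, nbrs in weights.items():
        total = sum(nbrs.values())
        p[u] = {v: wt / total for v, wt in nbrs.items()}
    scores = {}
    for i, nbrs in p.items():
        if not nbrs:  # isolates have no contacts; constraint is undefined
            scores[i] = float("nan")
            continue
        score = 0.0
        for j, p_ij in nbrs.items():
            indirect = sum(p_iq * p[q].get(j, 0.0)
                           for q, p_iq in nbrs.items() if q != j)
            score += (p_ij + indirect) ** 2
        scores[i] = score
    return scores

deg = degree_centrality(memberships)
ties = project(memberships)
eig = eigenvector_centrality(ties)
con = constraint(ties)
for user in memberships:
    print(user, round(deg[user], 3), round(eig[user], 3), round(con[user], 3))
```

In this toy network, "ana" belongs to the most organizations and bridges otherwise separate groups, so she scores highest on both centrality measures and lower on constraint than "ben" or "cy", whose contacts are also tied to each other. "dee" has a single contact and therefore the maximal dependence on it.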
