Comparing two approaches to bibliographic network construction
Daria Maltseva, Vladimir Batagelj
In bibliometric network analysis, studies follow two opposite strategies for constructing bibliographic networks. The first relies on extensive, though time-consuming, cleaning and entity resolution/disambiguation of the initial dataset. The second, the so-called “believe in statistics” approach, assumes that all important information will surface in any case. It applies mostly automatic data cleaning procedures, with manual cleaning only for the important units identified by a preliminary analysis. This raises the question of whether the first approach necessarily leads to better results, and whether the second approach is really able to identify the most important parts of the networks. With datasets on the same topic constructed in both ways, this question can be addressed.
The dataset on the literature in social network science produced by Lietz (2017), which consists of 25,760 bibliographic records retrieved from Web of Science up to 2012, required considerable effort in cleaning and disambiguation, and can therefore be seen as an example of the first strategy. Another dataset, on the literature in social network analysis, presented by Maltseva and Batagelj (2019) and consisting of 70,792 bibliographic records from Web of Science up to 2018, can be seen as an example of the second strategy. Because the two datasets cover the same topic, parallel datasets (both up to 2012) can be constructed and used for the analysis. Comparing them, we check whether several methods lead to equal or different results and evaluate the differences between them.
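As an illustration only, one simple way to compare two parallel datasets is to cut both at the same year, rank units by some importance score, and measure how much the top of the two rankings overlaps. The sketch below is not the procedure used in the study: the record layout (`year`, `authors`), the toy records, the use of the number of works per author as the score, and the Jaccard index as the overlap measure are all assumptions made for the example.

```python
# Minimal sketch: compare the top-k units of two parallel datasets.
# All names and the toy records are hypothetical, for illustration only.

def restrict_to_year(records, last_year):
    """Keep only records published up to and including last_year."""
    return [r for r in records if r["year"] <= last_year]

def works_per_author(records):
    """Count how many records each author name appears in."""
    counts = {}
    for r in records:
        for name in r["authors"]:
            counts[name] = counts.get(name, 0) + 1
    return counts

def top_k(scores, k):
    """Return the k highest-scoring units; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [unit for unit, _ in ranked[:k]]

def jaccard(a, b):
    """Jaccard index of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Toy stand-in for a carefully cleaned dataset (first strategy).
cleaned = [
    {"year": 2010, "authors": ["A", "B"]},
    {"year": 2011, "authors": ["A", "C"]},
    {"year": 2012, "authors": ["B"]},
]

# Toy stand-in for a larger, mostly automatically cleaned dataset
# (second strategy) that extends beyond 2012.
raw = [
    {"year": 2010, "authors": ["A", "B"]},
    {"year": 2011, "authors": ["A", "C"]},
    {"year": 2012, "authors": ["B"]},
    {"year": 2012, "authors": ["D"]},
    {"year": 2015, "authors": ["D", "A"]},
    {"year": 2018, "authors": ["D"]},
]

# Build the parallel dataset by cutting the larger one at 2012.
raw_2012 = restrict_to_year(raw, 2012)

top_cleaned = top_k(works_per_author(cleaned), 2)
top_raw = top_k(works_per_author(raw_2012), 2)
overlap = jaccard(top_cleaned, top_raw)
print(top_cleaned, top_raw, overlap)
```

A high overlap among the top-ranked units would be consistent with the “believe in statistics” assumption for that particular score, while a low overlap would point to units whose importance depends on the cleaning strategy.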