Analysis of the evolution of scientific collaboration networks for the prediction of new co-authorships


Keywords:

Co-authorship networks, Scientific data repositories, Lattes Platform



When researchers publish an article together, links form between them, and the set of these links constitutes a scientific collaboration network. In such a network, the authors are represented by the nodes and the co-authored papers by the edges. This raises the question: how does the network evolve over time? Answering it requires understanding which factors are essential for the creation of a new connection. The purpose of this article is therefore to predict new connections in co-authorship networks formed by PhDs with curricula registered in the Lattes Platform in the areas of Information Sciences and Biology. The Lattes Platform holds 6.6 million researcher curricula and is one of the most relevant and recognized scientific repositories worldwide. The following steps are performed. First, the data is extracted and organized, a step on which the rest of the process depends. Then, co-authorship networks are generated from articles published jointly. Next, the attributes to be used are defined and the corresponding metrics are calculated. Finally, machine learning algorithms estimate future scientific collaborations in the selected areas. The random forest and logistic regression algorithms achieved the highest hit rates, and the preferential attachment attribute was identified as the most influential in the emergence of new scientific collaborations. These results make it possible to characterize the evolution of the network of scientific collaboration among researchers at a national level, assisting development agencies in selecting future outstanding researchers.
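
As an illustration of the preferential attachment attribute mentioned above, the following minimal sketch builds a co-authorship network from a list of papers and scores every pair of authors who have not yet collaborated. It assumes the common definition of preferential attachment as the product of the two authors' degrees (their numbers of distinct co-authors); the abstract does not state the exact formulation used in the study. The author names, the sample papers, and the function names are hypothetical and serve only as an example.

```python
from collections import defaultdict
from itertools import combinations


def build_coauthorship_graph(papers):
    """Map each author to the set of their distinct co-authors.

    `papers` is a list of author lists, one list per paper.
    """
    neighbors = defaultdict(set)
    for authors in papers:
        # Every pair of authors on the same paper becomes an edge.
        for a, b in combinations(sorted(set(authors)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return neighbors


def preferential_attachment(neighbors, u, v):
    """Assumed definition: product of the degrees of u and v."""
    return len(neighbors[u]) * len(neighbors[v])


def candidate_scores(neighbors):
    """Score every pair of authors that is not yet connected."""
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v not in neighbors[u]:
            scores[(u, v)] = preferential_attachment(neighbors, u, v)
    return scores


# Hypothetical sample data, not taken from the Lattes Platform.
papers = [
    ["Ana", "Bruno"],
    ["Ana", "Carla"],
    ["Ana", "Bruno", "Davi"],
    ["Carla", "Edu"],
]

graph = build_coauthorship_graph(papers)
scores = candidate_scores(graph)

# List candidate pairs from the highest to the lowest score.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

In a pipeline like the one described, a score of this kind would be one feature among several passed to a classifier such as random forest or logistic regression, with the label indicating whether the pair co-authored a paper in a later time window.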


