Career-long data outperforms recent data in predicting the publication success of physicists
Keywords:
Machine learning, Multilayer perceptron, Neural networks, Physicists, Success prediction
Abstract
Predicting the future success of researchers is an important topic that is attracting increasing attention in the research community. However, no comprehensive study has shown which bibliometric indices best predict the short-term success of researchers. In particular, it remains unclear whether the better predictors are indices based on a researcher’s most recent publications or indices calculated from all publications across the researcher’s career. In this study, we examine how the choice between these two time windows for calculating predictive bibliometric indices affects the empirical performance of a classifier that predicts researchers’ success. Using the American Physical Society dataset, we compare classifiers with 10-fold cross-validation to determine the most accurate classifier in each of the two scenarios. The results disprove our initial hypothesis that it is better to train on bibliometric indices computed from recent publications. This suggests that scientists need to be assessed more comprehensively: focusing on recent publications while ignoring results from earlier career stages may lead to poorer predictions of scientists’ publication success.
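To make the two time windows concrete, the following minimal Python sketch computes a few simple indices for one author, once from the whole career and once from recent publications only. The abstract does not state which indices or which window length the study uses, so the choice of paper count, total citations, and h-index, the 5-year window, the names `Paper`, `h_index`, and `window_features`, and the sample record are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    year: int        # publication year
    citations: int   # citations received up to the cutoff year


def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def window_features(papers, cutoff_year, recent_years=None):
    """Compute simple indices from papers published up to cutoff_year.

    recent_years=None uses the whole career; an integer n keeps only the
    papers from the last n years, counting cutoff_year itself.
    """
    selected = [
        p for p in papers
        if p.year <= cutoff_year
        and (recent_years is None or p.year > cutoff_year - recent_years)
    ]
    cites = [p.citations for p in selected]
    return {
        "n_papers": len(selected),
        "total_citations": sum(cites),
        "h_index": h_index(cites),
    }


# Hypothetical publication record, invented for illustration only.
record = [
    Paper(2001, 40), Paper(2003, 25), Paper(2006, 12),
    Paper(2009, 7), Paper(2011, 3), Paper(2012, 1),
]

# Assumed setup: indices are computed at a 2012 cutoff, and "recent"
# means the last 5 years (2008 to 2012).
career = window_features(record, cutoff_year=2012)
recent = window_features(record, cutoff_year=2012, recent_years=5)
```

In a comparison like the one the abstract describes, `career` and `recent` would serve as two alternative feature sets for the same author, each passed to the same classifier and evaluated under the same cross-validation folds.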
License
Copyright (c) 2025 Transinformação

This work is licensed under a Creative Commons Attribution 4.0 International License.



