Career-long data outperforms recent data in predicting the publication success of physicists

Authors

  • Inez Cavalcanti Dantas Universidade Federal do Maranhão, Centro de Ciências Exatas e Tecnológicas, Programa de Pós-Graduação em Ciência da Computação. https://orcid.org/0009-0009-1955-1957
  • Antonio de Abreu Batista-Júnior Universidade Federal do Maranhão, Centro de Ciências Exatas e Tecnológicas, Programa de Pós-Graduação em Ciência da Computação. https://orcid.org/0000-0002-6013-9704
  • Luciano Reis Coutinho Universidade Federal do Maranhão, Centro de Ciências Exatas e Tecnológicas, Programa de Pós-Graduação em Ciência da Computação. https://orcid.org/0000-0001-7996-7334
  • Jesús Pascual Mena-Chalco Universidade Federal do ABC, Centro de Matemática, Computação e Cognição, Programa de Pós-Graduação em Ciência da Computação. https://orcid.org/0000-0001-7509-5532

Keywords:

Machine learning, Multilayer perceptron, Neural networks, Physicists, Success prediction

Abstract

Predicting the future success of researchers is attracting increasing attention in the research community. However, no comprehensive study shows which bibliometric indices best predict researchers’ short-term success, nor whether those indices should be computed from a researcher’s most recent publications or from all publications across the researcher’s career. In this study, we examine how these two time windows, used as the basis for calculating predictive bibliometric indices, affect the empirical results of a classifier that predicts researchers’ success. Using the American Physical Society dataset, we compare classifiers under 10-fold cross-validation to determine the most accurate one in each of the two scenarios. The results disprove our initial hypothesis that it is better to train with bibliometric indices computed from recent publications. This suggests that scientists need to be assessed more comprehensively: focusing on recent publications while ignoring results from earlier career stages may lead to poorer predictions of scientists’ publication success.
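As an illustration of the two time windows contrasted in the abstract, the following minimal sketch computes the same indices once over a whole career and once over recent years only. The abstract does not list the indices actually used, so the paper count, citation count, and h-index here are assumptions chosen for illustration, as are the `Paper` record, the `window_features` helper, the five-year window, and the toy publication list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical record; not the schema of the American Physical Society dataset.
    year: int        # publication year
    citations: int   # citations received up to the reference year


def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def window_features(papers, reference_year, window_years=None):
    """Compute indices from papers published up to reference_year.

    window_years=None uses the whole career; an integer keeps only the
    most recent window_years years, counting reference_year itself.
    """
    first_year = None if window_years is None else reference_year - window_years + 1
    selected = [
        p for p in papers
        if p.year <= reference_year and (first_year is None or p.year >= first_year)
    ]
    cites = [p.citations for p in selected]
    return {
        "n_papers": len(selected),
        "n_citations": sum(cites),
        "h_index": h_index(cites),
    }


# Toy career, invented for illustration only.
career = [Paper(2005, 40), Paper(2009, 12), Paper(2014, 7), Paper(2018, 3), Paper(2019, 1)]

career_long = window_features(career, reference_year=2019)
recent = window_features(career, reference_year=2019, window_years=5)

print(career_long)  # {'n_papers': 5, 'n_citations': 63, 'h_index': 3}
print(recent)       # {'n_papers': 2, 'n_citations': 4, 'h_index': 1}
```

In a comparison of the kind the abstract describes, each researcher would yield one feature vector per window, and the same classifier would be trained and evaluated separately on each set. For this toy researcher the recent window discards the most-cited early papers, which shows how the two windows can describe the same person very differently.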

Published

2025-11-24

How to Cite

Cavalcanti Dantas, I., de Abreu Batista-Júnior, A., Reis Coutinho, L., & Pascual Mena-Chalco, J. (2025). Career-long data outperforms recent data in predicting the publication success of physicists. Transinformação, 37. Retrieved from https://seer.sis.puc-campinas.edu.br/transinfo/article/view/15056

Section

Original