Scientific document classification algorithms: past and present
DOI:
https://doi.org/10.3145/infonomy.25.026
Keywords:
Classification algorithms, Document-level classifications, Classifications, Classification of science, Scientific databases, Scientometrics, Citation, Classification schemes, ASJC, Scopus, Web of Science
Abstract
This paper surveys algorithms for classifying research at the article level, as an alternative to the journal-based classifications used in large scientific databases such as Web of Science and Scopus. Journal-based classifications introduce considerable imprecision into searches and into research evaluation, because articles are not categorized faithfully according to their actual content. We first review the main ideas proposed over the years, starting from the emergence of these databases, and identify their contributions and limitations. Automatic clustering and community-detection algorithms have brought major advances in the organization of science, but they are not applicable as an alternative to journal-based classification. Other algorithms, such as those based on neural networks or text mining, do not scale to the whole of science because of their complexity. The most recent and promising proposals are simple algorithms that start from the journal-based categorization and reclassify articles into the same subject hierarchies used by the databases, by analyzing only citations and references.
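The reference-based reclassification idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the algorithm of any of the works surveyed; it only assumes that each cited reference inherits the categories of its journal, that a reference spreads one unit of weight evenly over those categories, and that the citing paper receives the normalized totals. The function name `classify_by_references`, the `min_share` cutoff, and the category labels are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def classify_by_references(reference_categories, min_share=0.0):
    """Return fractional category weights for one paper (illustrative only).

    reference_categories: one list of category labels per cited reference,
    assumed to come from the journal-level classification of the cited item.
    min_share: hypothetical cutoff; categories below it are dropped and the
    remaining weights are renormalized to sum to 1.
    """
    totals = defaultdict(float)
    for categories in reference_categories:
        unique = set(categories)
        if not unique:
            continue  # a reference with no known category contributes nothing
        share = 1.0 / len(unique)  # one unit of weight split across categories
        for category in unique:
            totals[category] += share

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}

    # Normalize, drop weak categories, then renormalize what is left.
    weights = {c: v / grand_total for c, v in totals.items()}
    kept = {c: w for c, w in weights.items() if w >= min_share}
    kept_total = sum(kept.values())
    return {c: w / kept_total for c, w in kept.items()} if kept_total else {}


if __name__ == "__main__":
    # Hypothetical paper with four references; the last has no known category.
    refs = [["Physics"], ["Physics", "Chemistry"], ["Chemistry"], []]
    print(classify_by_references(refs))
```

Under these assumptions the paper is assigned to the database's own categories rather than to newly discovered clusters, which is the property the abstract highlights for the recent proposals. A real system would also need rules for papers with few or no classified references.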
License
Copyright 2025 Jesús M. Álvarez-Llorente, Vicente P. Guerrero-Bote, Félix De-Moya-Anegón

This work is licensed under a Creative Commons Attribution 4.0 International license.