The stopword dilemma in human review of automated processing

Authors

Fernanda Peset

DOI:

https://doi.org/10.3145/infonomy.23.011

Keywords:

Empty words, Stopwords, Dictionaries, Pre-processing, Manual versus automated processes, Weak review, Role of librarians and information scientists

Abstract

This piece discusses the need to pre-process term corpora to remove stopwords and presents the dilemma of whether to do so manually or with an automated system. It suggests that librarians and information scientists should work on building dictionaries and on the semi-automatic creation of domain-specific vocabularies.

Author Biography

Fernanda Peset, Universitat Politècnica de València

References

Blasco-Gil, Yolanda; González, Luis M.; Pavón-Romero, Armando; Mercado-Estrada, Mariano; Pavón-Romero, Carlos; Cabrera, Ana M.; Garzón-Farinós, Fernanda; Peset, Fernanda (2020). “Enriqueciendo la investigación en humanidades digitales. Análisis de textos de claustros académicos de la Universidad de Valencia (1775-1779) con KH Coder”. Revista española de documentación científica, v. 43, n. 1, e257. https://doi.org/10.3989/redc.2020.S1

Burns, Collin; Izmailov, Pavel; Kirchner, Jan H.; Baker, Bowen; Gao, Leo; Aschenbrenner, Leopold; Chen, Yining; Ecoffet, Adrien; Joglekar, Manas; Leike, Jan; Sutskever, Ilya; Wu, Jeff (2023). Weak-to-strong generalization: Eliciting strong capabilities with weak supervision. https://cdn.openai.com/papers/weak-to-strong-generalization.pdf

Calabuig, José-Manuel; Ferrer-Sapena, Antonia; Garcia-Raffi, Lluís-Miquel; Peset, Fernanda; Sánchez-Pérez, Enrique A.; Sánchez-Del-Toro, M. Isabel (2023). “Algoritmos matemáticos para una inteligencia artificial responsable, ética y transparente”. Revista Valenciana d’Estudis Autonòmics, n. 68, pp. 283-305. https://presidencia.gva.es/es/web/begv-gavina/politica/-/asset_publisher/MBYQ47LTEnde/content/revista-valenciana-d-estudis-autonomics

Published

2023-12-30

How to Cite

Peset, F. (2023). The stopword dilemma in human review of automated processing. Infonomy, 1(1). https://doi.org/10.3145/infonomy.23.011

Issue

Vol. 1 No. 1 (2023)

Section

Outreach