Text analysis with Quanteda: an introduction
DOI:
https://doi.org/10.3145/infonomy.25.040
Keywords:
Text analysis, Quanteda, Digital humanities, R, Quantitative analysis, Text mining, Natural language processing, Topic modeling, Data visualization, Computational methodology
Abstract
Quantitative text analysis has become a core methodology within the computational turn in the social sciences and humanities. This paper presents an applied introduction to the Quanteda package for systematic text analysis in the R environment. The aim is to demonstrate, through a tutorial-oriented approach, how to design and implement a basic quantitative text analysis workflow. A set of diary entries by Alexei Navalny is used as a case study to illustrate corpus loading, preprocessing, tokenization, and stopword removal. Results obtained from frequency analysis, word clouds, and topic modeling show Quanteda’s ability to identify relevant lexical and thematic patterns. The paper demonstrates that Quanteda is an accessible and reproducible tool, particularly well suited for teaching contexts and exploratory research in digital humanities.
References
Arcila-Calderón, Carlos; Barbosa-Caro, Eduar; Cabezuelo-Lorenzo, Francisco (2016). Técnicas big data: Análisis de textos a gran escala para la investigación científica y periodística. El Profesional de la Información, 25(4), 623-631. https://doi.org/10.3145/epi.2016.jul.12
Arnold, Taylor; Ballier, Nicolas; Lissón, Paula; Tilton, Lauren (2019). Beyond lexical frequencies: Using R for text analysis in the digital humanities. Language Resources and Evaluation, 53(4), 707-733. https://doi.org/10.1007/s10579-019-09456-6
Benoit, Kenneth; Watanabe, Kohei; Wang, Haiyan; Nulty, Paul; Obeng, Adam; Müller, Stefan; Matsuo, Akitaka (2018). Quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774
Berry, David M. (2011). The computational turn: Thinking about the digital humanities. Culture Machine, 12. https://culturemachine.net/wp-content/uploads/2019/01/10-Computational-Turn-440-893-1-PB.pdf
Gallego-Cuiñas, Ana; Torres-Salinas, Daniel (Eds.). (2024). Humanities and Big Data in Ibero-America: Theory, methodology and practical applications. De Gruyter. https://doi.org/10.1515/9783110753523
Grolemund, Garrett (2014). Hands-on programming with R: Write your own functions and simulations (1st ed.). O’Reilly Media.
Navalny, Alexei (2024). Patriot: A memoir. Alfred A. Knopf.
R Core Team (2021). R: A language and environment for statistical computing (Version 4.1.2) [Software]. R Foundation for Statistical Computing. https://www.r-project.org
Stone, Philip J. (2020). Thematic text analysis: New agendas for analyzing text content. In C. W. Roberts (Ed.). Text analysis for the social sciences (1st ed., pp. 35-54). Routledge. https://doi.org/10.4324/9781003064060-3
Wickham, Hadley; Grolemund, Garrett (2017). R for data science: Import, tidy, transform, visualize, and model data (1st ed.). O’Reilly Media.
License
Copyright (c) 2025 Sergio Castro-Cortacero, Nicolás Robinson-García

This work is licensed under a Creative Commons Attribution 4.0 International License.