Text analysis with Quanteda: an introduction

Authors

Sergio Castro-Cortacero, Nicolás Robinson-García

DOI:

https://doi.org/10.3145/infonomy.25.040

Keywords:

Text analysis, Quanteda, Digital humanities, R, Quantitative analysis, Text mining, Natural language processing, Topic modeling, Data visualization, Computational methodology

Abstract

Quantitative text analysis has become a core methodology within the computational turn in the social sciences and humanities. This paper offers an applied, tutorial-oriented introduction to Quanteda, an R package for systematic text analysis, and shows how to design and implement a basic quantitative text analysis workflow. A set of diary entries by Alexei Navalny serves as a case study to illustrate corpus loading, preprocessing, tokenization, and stopword removal. Frequency analysis, word clouds, and topic modeling applied to this corpus show that Quanteda can identify relevant lexical and thematic patterns. The paper concludes that Quanteda is an accessible and reproducible tool, particularly well suited to teaching contexts and exploratory research in the digital humanities.

Author Biographies

Sergio Castro-Cortacero, Universidad de Granada

Nicolás Robinson-García, Universidad de Granada

Published

2026-01-14

How to Cite

Castro-Cortacero, S., & Robinson-García, N. (2026). Text analysis with Quanteda: an introduction. Infonomy, 3(6). https://doi.org/10.3145/infonomy.25.040

Section

Research