A review of query systems for temporal n-gram corpora.
CEUR Workshop Proc. 4022, 18-31 (2025)
Natural languages evolve over time, and with increasing digitalization these changes are studied quantitatively in the humanities and social sciences. One important observable is the frequency of individual words, as well as of word tuples (n-grams), over time. Different tools exist to analyze these changing frequencies in large text corpora, with different levels of complexity and efficiency. However, a systematic overview and evaluation of the expressiveness and practical usability of these tools is missing. In this article, we present a structured approach to such an evaluation by defining a query algebra and a set of information needs expressed therein, followed by a comparison of 12 query systems. Overall, we identify several systems as similar to the Google Books Ngram Viewer (GBNV) or as specific to a subcorpus, and find that the theoretically most powerful and flexible systems lack a practical implementation, which points to a need for further research.
Publication type
Article: Journal article
Document type
Scientific Article
Keywords
Google Books Ngram Corpus ; Query Algebra ; Query System
ISSN (print) / ISBN
1613-0073
Journal
CEUR Workshop Proceedings
Source details
Volume: 4022
Pages: 18-31
Publisher
RWTH
Publishing Place
Aachen
Institute(s)
Helmholtz AI - KIT (HAI - KIT)