PuSH - Publication Server of Helmholtz Zentrum München

Richter, F.* ; Schäfer, B.* ; Böhm, K.*

A review of query systems for temporal n-gram corpora.

CEUR Workshop Proc. 4022, 18-31 (2025)
Natural languages evolve over time, and with increasing digitalization these changes are studied quantitatively in the humanities and social sciences. One important observable is the frequency of individual words, as well as of word tuples (n-grams), over time. Different tools exist to analyze these changing frequencies in large text corpora, with different levels of complexity and efficiency. However, a systematic overview and evaluation of the expressiveness and practical usability of these tools is missing. In this article, we present a structured approach to such an evaluation by defining a query algebra and a set of information needs expressed therein, followed by a comparison of 12 different query systems. Overall, we identify several systems as similar to the Google Books Ngram Viewer (GBNV) or as specific to a subcorpus, and find that the theoretically most potent and flexible systems lack a practical implementation, which points to further research needs.
Publication type Article: Journal article
Document type Scientific Article
Keywords Google Books Ngram Corpus ; Query Algebra ; Query System
ISSN (print) / ISBN 1613-0073
Source details Volume: 4022, Pages: 18-31
Publisher RWTH
Publishing Place Aachen
Institute(s) Helmholtz AI - KIT (HAI - KIT)