PuSH - Publication Server of Helmholtz Zentrum München

Richter, F.* ; Schäfer, B.* ; Böhm, K.*

A review of query systems for temporal n-gram corpora.

CEUR Workshop Proc. 4022, 18-31 (2025)
Natural languages evolve over time, and with increasing digitalization these changes are studied quantitatively in the humanities and social sciences. One important observable is the frequency of individual words and of word tuples (n-grams) over time. Different tools exist to analyze these changing frequencies in large text corpora, with different levels of complexity and efficiency. However, a systematic overview and evaluation of the expressiveness and practical usability of these tools is missing. In this article, we present a structured approach to such an evaluation by defining a query algebra and a set of information needs expressed in it, followed by a comparison of 12 different query systems. Overall, we find that several systems either resemble the Google Books Ngram Viewer (GBNV) or are specific to a subcorpus, and that the theoretically most powerful and flexible systems lack a practical implementation, which points to further research needs.
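The observable named in the abstract, the frequency of a word or n-gram over time, can be illustrated with a minimal sketch. It assumes a toy corpus given as a mapping from year to a token list and normalizes each yearly count by the number of n-grams of the same length in that year. The function names `ngrams` and `frequency_series`, the toy corpus, and this normalization are illustrative assumptions; this is not the query algebra defined in the article, nor the implementation of any of the reviewed systems.

```python
from collections import Counter
from typing import Dict, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams (word tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_series(corpus: Dict[int, List[str]],
                     query: Tuple[str, ...]) -> Dict[int, float]:
    """Relative frequency of `query` per year.

    Assumed normalization (illustrative only): count of the query n-gram
    divided by the total number of n-grams of the same length in that year.
    """
    n = len(query)
    series: Dict[int, float] = {}
    for year, tokens in sorted(corpus.items()):
        counts = Counter(ngrams(tokens, n))
        total = sum(counts.values())
        # Counter returns 0 for unseen n-grams; guard against empty years.
        series[year] = counts[query] / total if total else 0.0
    return series


# Hypothetical toy corpus: one short "text" per year.
corpus = {
    1900: "the wireless telegraph and the telegraph office".split(),
    2000: "the wireless network and the wireless phone".split(),
}

print(frequency_series(corpus, ("wireless",)))        # unigram series
print(frequency_series(corpus, ("the", "wireless")))  # bigram series
```

For this toy corpus, the unigram "wireless" rises from 1/7 of all tokens in 1900 to 2/7 in 2000. Systems such as the GBNV operate on precomputed n-gram counts over very large corpora rather than scanning raw text for each query.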
Publication type: Article / Journal article
Document type: Scientific article
Keywords: Google Books Ngram Corpus; Query Algebra; Query System
ISSN (print) / ISBN: 1613-0073
Source details: Volume: 4022, Pages: 18-31
Publisher: RWTH
Place of publication: Aachen
Institute(s): Helmholtz AI - KIT (HAI - KIT)