A review of query systems for temporal n-gram corpora.
CEUR Workshop Proc. 4022, 18-31 (2025)
Natural languages evolve over time, and with increasing digitalization these changes are studied quantitatively in the humanities and social sciences. One important observable is the frequency of individual words, as well as of word tuples (n-grams), over time. Different tools exist to analyze these changing frequencies in large text corpora, with different levels of complexity and efficiency. However, a systematic overview and evaluation of the expressiveness and practical usability of these tools is missing. In this article, we present a structured approach to such an evaluation by defining a query algebra and a set of information needs expressed therein, followed by a comparison of 12 different query systems. Overall, we identify several systems as similar to the Google Books Ngram Viewer (GBNV) or as specific to a subcorpus, and find that the theoretically most powerful and flexible systems lack a practical implementation, which points to further research needs.
Publication type
Article: Journal article
Document type
Scientific article
Keywords
Google Books Ngram Corpus ; Query Algebra ; Query System
ISSN (print) / ISBN
1613-0073
Journal
CEUR Workshop Proceedings
Source details
Volume: 4022,
Pages: 18-31
Publisher
RWTH
Place of publication
Aachen
Institute(s)
Helmholtz AI - KIT (HAI - KIT)