PuSH - Publikationsserver des Helmholtz Zentrums München

Zwerschke, P.* ; Weyrauch, A.* ; Götz, M.* ; Debus, C.*

Taylor Expansion in Neural Networks: How Higher Orders Yield Better Predictions.

Front. Artif. Intell. 392, 2983-2989 (2024)
Deep learning has become a popular tool for solving complex problems in a variety of domains. Transformers and the attention mechanism have contributed substantially to this success. We hypothesize that the enhanced predictive capabilities of the attention mechanism can be attributed to higher-order terms in the input. Expanding on this idea and taking inspiration from Taylor series approximation, we introduce “Taylor layers” as higher-order polynomial layers for universal function approximation. We evaluate Taylor layers of second and third order on the task of time series forecasting, comparing them to classical linear layers as well as the attention mechanism. Our results on two commonly used datasets demonstrate that higher expansion orders can improve prediction accuracy given the same number of trainable model weights. Interpreting higher-order terms as a form of token mixing, we further show that second-order (quadratic) Taylor layers can efficiently replace canonical dot-product attention, increasing prediction accuracy while reducing computational requirements.
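The abstract's core idea — a layer whose output is a truncated Taylor expansion of the input — can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual parameterization (which the abstract does not specify): the class name, weight shapes, and initialization scales are all assumptions, and the quadratic term is written as a dense third-order tensor for clarity rather than efficiency.

```python
import numpy as np

class TaylorLayer:
    """Hypothetical second-order (quadratic) polynomial layer.

    Computes a truncated Taylor expansion of the input x:
        y = b + W1 @ x + (x^T W2 x per output unit)
    Zeroth order = bias, first order = classical linear layer,
    second order = quadratic cross-terms between input features.
    """

    def __init__(self, d_in, d_out, seed=None):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_in)
        self.b = np.zeros(d_out)                            # zeroth-order term
        self.W1 = rng.normal(0.0, scale, (d_out, d_in))     # first-order (linear) weights
        # Second-order weights: one (d_in x d_in) matrix per output unit.
        self.W2 = rng.normal(0.0, scale / np.sqrt(d_in), (d_out, d_in, d_in))

    def __call__(self, x):
        linear = self.W1 @ x
        # Quadratic term: for each output o, sum_ij W2[o, i, j] * x[i] * x[j].
        quadratic = np.einsum("oij,i,j->o", self.W2, x, x)
        return self.b + linear + quadratic

layer = TaylorLayer(d_in=4, d_out=2, seed=0)
y = layer(np.ones(4))
print(y.shape)  # (2,)
```

Note the trade-off the abstract alludes to: the quadratic term lets input features interact multiplicatively (a form of token mixing when applied across a sequence), but a dense `W2` costs O(d_out · d_in²) parameters, so practical variants would need a factorized or shared parameterization to match a linear layer's weight budget.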
Impact Factor: 3.000
Scopus SNIP: 0.475
Altmetric
Publication type: Article: Journal article
Document type: Scientific article
Language: English
Year of publication: 2024
HGF reporting year: 2024
ISSN (print) / ISBN: 2624-8212
e-ISSN: 2624-8212
Citation: Volume: 392, Issue: , Pages: 2983-2989, Article number: , Supplement: 
Publisher: Frontiers
Review status: Peer reviewed
Institute(s): Helmholtz AI - KIT (HAI - KIT)
Scopus ID: 85216667330
Date recorded: 2025-02-11