PuSH - Publikationsserver des Helmholtz Zentrums München

Zwerschke, P.* ; Weyrauch, A.* ; Götz, M.* ; Debus, C.*

Taylor Expansion in Neural Networks: How Higher Orders Yield Better Predictions.

Front. Artif. Intell. 392, 2983-2989 (2024)
Deep learning has become a popular tool for solving complex problems in a variety of domains. Transformers and the attention mechanism have contributed substantially to this success. We hypothesize that the enhanced predictive capabilities of the attention mechanism can be attributed to higher-order terms in the input. Expanding on this idea and taking inspiration from Taylor series approximation, we introduce “Taylor layers” as higher-order polynomial layers for universal function approximation. We evaluate Taylor layers of second and third order on the task of time series forecasting, comparing them to classical linear layers as well as the attention mechanism. Our results on two commonly used datasets demonstrate that higher expansion orders can improve prediction accuracy given the same number of trainable model weights. Interpreting higher-order terms as a form of token mixing, we further show that second-order (quadratic) Taylor layers can efficiently replace canonical dot-product attention, increasing prediction accuracy while reducing computational requirements.
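The abstract's core idea — a layer whose output is a truncated Taylor expansion of the input — can be sketched as follows. This is a minimal illustrative implementation, not the paper's actual parameterization (which the abstract does not specify): the class name, weight shapes, and initialization scales are all assumptions, and the quadratic term is written as a dense third-order tensor for clarity rather than efficiency.

```python
import numpy as np

class TaylorLayer:
    """Hypothetical second-order (quadratic) polynomial layer.

    Computes a truncated Taylor expansion of the input x:
        y = b + W1 @ x + (x^T W2 x per output unit)
    Zeroth order = bias, first order = classical linear layer,
    second order = quadratic cross-terms between input features.
    """

    def __init__(self, d_in, d_out, seed=None):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_in)
        self.b = np.zeros(d_out)                            # zeroth-order term
        self.W1 = rng.normal(0.0, scale, (d_out, d_in))     # first-order (linear) weights
        # Second-order weights: one (d_in x d_in) matrix per output unit.
        self.W2 = rng.normal(0.0, scale / np.sqrt(d_in), (d_out, d_in, d_in))

    def __call__(self, x):
        linear = self.W1 @ x
        # Quadratic term: for each output o, sum_ij W2[o, i, j] * x[i] * x[j].
        quadratic = np.einsum("oij,i,j->o", self.W2, x, x)
        return self.b + linear + quadratic

layer = TaylorLayer(d_in=4, d_out=2, seed=0)
y = layer(np.ones(4))
print(y.shape)  # (2,)
```

Note the trade-off the abstract alludes to: the quadratic term lets input features interact multiplicatively (a form of token mixing when applied across a sequence), but a dense `W2` costs O(d_out · d_in²) parameters, so practical variants would need a factorized or shared parameterization to match a linear layer's weight budget.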
Impact Factor: 3.000
Scopus SNIP: 0.475
Altmetric
Publication type: Article: Journal article
Document type: Scientific article
Language: English
Year of publication: 2024
HGF reporting year: 2024
ISSN (print) / ISBN: 2624-8212
e-ISSN: 2624-8212
Citation: Volume: 392, Issue: , Pages: 2983-2989, Article number: , Supplement: 
Publisher: Frontiers
Review status: Peer reviewed
Institute(s): Helmholtz AI - KIT (HAI - KIT)
Scopus ID: 85216667330
Date recorded: 2025-02-11