PuSH - Publication Server of Helmholtz Zentrum München

Zwerschke, P.* ; Weyrauch, A.* ; Götz, M.* ; Debus, C.*

Taylor Expansion in Neural Networks: How Higher Orders Yield Better Predictions.

Front. Artif. Intell. 392, 2983-2989 (2024)
Deep learning has become a popular tool for solving complex problems in a variety of domains. Transformers and the attention mechanism have contributed substantially to this success. We hypothesize that the enhanced predictive capabilities of the attention mechanism can be attributed to higher-order terms in the input. Expanding on this idea and taking inspiration from Taylor series approximation, we introduce “Taylor layers” as higher-order polynomial layers for universal function approximation. We evaluate Taylor layers of second and third order on the task of time series forecasting, comparing them to classical linear layers as well as the attention mechanism. Our results on two commonly used datasets demonstrate that higher expansion orders can improve prediction accuracy given the same number of trainable model weights. Interpreting higher-order terms as a form of token mixing, we further show that second-order (quadratic) Taylor layers can efficiently replace canonical dot-product attention, increasing prediction accuracy while reducing computational requirements.
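The abstract only sketches the idea of a higher-order polynomial layer. As a minimal illustrative sketch (not the authors' implementation — the names `taylor_layer`, `W1`, `W2`, and `b` are assumptions for illustration), a second-order Taylor layer can be read as a linear term plus a quadratic term over the outer product of the input, mirroring a Taylor expansion truncated after second order:

```python
import numpy as np

rng = np.random.default_rng(0)

def taylor_layer(x, W1, W2, b):
    """Hypothetical second-order 'Taylor layer': a linear (first-order)
    term plus a quadratic (second-order) term over the flattened outer
    product x ⊗ x, i.e. a Taylor expansion truncated after order two."""
    quad = np.outer(x, x).reshape(-1)   # second-order features x_i * x_j
    return W1 @ x + W2 @ quad + b       # linear term + quadratic term + bias

# Illustrative dimensions (assumed, not from the paper)
d_in, d_out = 4, 3
x  = rng.standard_normal(d_in)
W1 = rng.standard_normal((d_out, d_in))          # first-order weights
W2 = rng.standard_normal((d_out, d_in * d_in))   # second-order weights
b  = rng.standard_normal(d_out)

y = taylor_layer(x, W1, W2, b)
```

With `W2` set to zero the layer reduces to an ordinary linear layer, which matches the paper's framing of linear layers as the first-order special case; the quadratic term is what the abstract interprets as a form of token mixing.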
Impact Factor 3.000
Scopus SNIP 0.475
Publication type Article: Journal article
Document type Scientific Article
Language English
Publication Year 2024
HGF-reported in Year 2024
ISSN (print) / ISBN 2624-8212
e-ISSN 2624-8212
Source details Volume: 392, Pages: 2983-2989
Publisher Frontiers
Reviewing status Peer reviewed
Institute(s) Helmholtz AI - KIT (HAI - KIT)
Scopus ID 85216667330
Date of entry 2025-02-11