Taylor Expansion in Neural Networks: How Higher Orders Yield Better Predictions.
Front. Artif. Intell. 392, 2983-2989 (2024)
Deep learning has become a popular tool for solving complex problems across a variety of domains, and Transformers with their attention mechanism have contributed substantially to this success. We hypothesize that the enhanced predictive capability of the attention mechanism can be attributed to higher-order terms in the input. Expanding on this idea and taking inspiration from the Taylor series approximation, we introduce “Taylor layers” as higher-order polynomial layers for universal function approximation. We evaluate Taylor layers of second and third order on the task of time series forecasting, comparing them to classical linear layers as well as to the attention mechanism. Our results on two commonly used datasets demonstrate that higher expansion orders can improve prediction accuracy given the same number of trainable model weights. Interpreting higher-order terms as a form of token mixing, we further show that second-order (quadratic) Taylor layers can efficiently replace canonical dot-product attention, increasing prediction accuracy while reducing computational requirements.
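The abstract describes Taylor layers only at a high level; the following PyTorch sketch illustrates one plausible second-order (quadratic) Taylor layer using a low-rank factorized quadratic term. The class name TaylorLayer2, the rank parameter, and the factorization are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class TaylorLayer2(nn.Module):
    # Hypothetical second-order Taylor layer:
    # y = b + W1 x + Out((U x) * (V x)),
    # where the elementwise product of the two rank-sized projections
    # yields quadratic monomials of the input (a low-rank quadratic form).
    def __init__(self, d_in: int, d_out: int, rank: int = 8):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)           # zeroth- and first-order terms
        self.u = nn.Linear(d_in, rank, bias=False)     # factor U of the quadratic term
        self.v = nn.Linear(d_in, rank, bias=False)     # factor V of the quadratic term
        self.out = nn.Linear(rank, d_out, bias=False)  # mixes quadratic features into the output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        quadratic = self.u(x) * self.v(x)              # rank-many quadratic terms (u_i . x)(v_i . x)
        return self.linear(x) + self.out(quadratic)

layer = TaylorLayer2(d_in=64, d_out=64)
x = torch.randn(32, 96, 64)   # (batch, sequence length, features)
y = layer(x)                  # same shape as a linear layer's output

Applied token-wise to a forecasting model's hidden states, such a layer adds quadratic input interactions at a parameter cost comparable to a linear layer, which is the trade-off the abstract evaluates.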
Impact Factor
3.000
Scopus SNIP
0.475
Annotations
Special Publication
Publication type
Article: Journal article
Document type
Scientific Article
Language
English
Publication Year
2024
HGF-reported in Year
2024
ISSN (print) / ISBN
2624-8212
e-ISSN
2624-8212
Source information
Volume: 392
Pages: 2983-2989
Publisher
Frontiers
Reviewing status
Peer reviewed
Institute(s)
Helmholtz AI - KIT (HAI - KIT)
Scopus ID
85216667330
Entry date
2025-02-11