PuSH - Publication Server of Helmholtz Zentrum München

Zwerschke, P.* ; Weyrauch, A.* ; Götz, M.* ; Debus, C.*

Taylor Expansion in Neural Networks: How Higher Orders Yield Better Predictions.

Front. Artif. Intell. 392, 2983-2989 (2024)
Deep learning has become a popular tool for solving complex problems in a variety of domains. Transformers and the attention mechanism have contributed substantially to this success. We hypothesize that the enhanced predictive capabilities of the attention mechanism can be attributed to higher-order terms in the input. Expanding on this idea and taking inspiration from Taylor series approximation, we introduce “Taylor layers” as higher-order polynomial layers for universal function approximation. We evaluate Taylor layers of second and third order on the task of time series forecasting, comparing them to classical linear layers as well as the attention mechanism. Our results on two commonly used datasets demonstrate that higher expansion orders can improve prediction accuracy given the same number of trainable model weights. Interpreting higher-order terms as a form of token mixing, we further show that second-order (quadratic) Taylor layers can efficiently replace canonical dot-product attention, increasing prediction accuracy while reducing computational requirements.
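The abstract only sketches the idea of a higher-order polynomial layer. As a minimal illustrative sketch (not the authors' implementation — the names `taylor_layer`, `W1`, `W2`, and `b` are assumptions for illustration), a second-order Taylor layer can be read as a linear term plus a quadratic term over the outer product of the input, mirroring a Taylor expansion truncated after second order:

```python
import numpy as np

rng = np.random.default_rng(0)

def taylor_layer(x, W1, W2, b):
    """Hypothetical second-order 'Taylor layer': a linear (first-order)
    term plus a quadratic (second-order) term over the flattened outer
    product x ⊗ x, i.e. a Taylor expansion truncated after order two."""
    quad = np.outer(x, x).reshape(-1)   # second-order features x_i * x_j
    return W1 @ x + W2 @ quad + b       # linear term + quadratic term + bias

# Illustrative dimensions (assumed, not from the paper)
d_in, d_out = 4, 3
x  = rng.standard_normal(d_in)
W1 = rng.standard_normal((d_out, d_in))          # first-order weights
W2 = rng.standard_normal((d_out, d_in * d_in))   # second-order weights
b  = rng.standard_normal(d_out)

y = taylor_layer(x, W1, W2, b)
```

With `W2` set to zero the layer reduces to an ordinary linear layer, which matches the paper's framing of linear layers as the first-order special case; the quadratic term is what the abstract interprets as a form of token mixing.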
Impact Factor 3.000
Scopus SNIP 0.475
Publication type Article: Journal article
Document type Scientific Article
Language English
Publication Year 2024
HGF-reported in Year 2024
ISSN (print) / ISBN 2624-8212
e-ISSN 2624-8212
Source details Volume: 392, Pages: 2983-2989
Publisher Frontiers
Reviewing status Peer reviewed
Institute(s) Helmholtz AI - KIT (HAI - KIT)
Scopus ID 85216667330
Date of entry 2025-02-11