Fleischmann, S.* ; Dietz, S.* ; Shanbhag, J.* ; Wuensch, A.* ; Nitschké, M.J.E.* ; Miehling, J.* ; Wartzack, S.* ; Leyendecker, S.* ; Eskofier, B.M. ; Koelewijn, A.D.*
Exploring dataset bias and scaling techniques in multi-source gait biomechanics: An explainable machine learning approach.
ACM Trans. Intell. Syst. Technol. 16:20 (2024)
Machine learning has become increasingly important in biomechanics. It can reveal hidden patterns in large and complex data, leading to a more comprehensive understanding of biomechanical processes and deeper insights into human movement. However, machine learning models are often trained on a single dataset with a limited number of participants, which negatively affects their robustness and generalizability. Combining data from multiple existing sources offers an opportunity to overcome these limitations without spending additional time recruiting participants and recording new data. It also benefits researchers who lack the funding or laboratory equipment to conduct expensive motion capture studies themselves. At the same time, subtle interlaboratory differences can be problematic because of the bias they introduce into an analysis. In our study, we investigated differences between motion capture datasets in the context of machine learning, combining overground walking trials from four existing studies. Specifically, we examined whether a machine learning model could predict the original data source from the marker and ground reaction force (GRF) trajectories of single strides, and how different scaling methods and pooling procedures affected the outcome. Layer-wise relevance propagation (LRP) was applied to understand which factors were influential in distinguishing the original data sources. We found that the model could predict the original data source with very high accuracy (up to 99%), which decreased by about 15 percentage points when we scaled every dataset individually prior to pooling. However, none of the proposed scaling methods could fully remove the dataset bias. Layer-wise relevance propagation revealed that no single factor differed consistently between all datasets. Instead, every dataset had its own unique characteristics that were picked up by the model.
The variables identified as relevant differed between the scaling and pooling approaches but were mostly consistent across trials belonging to the same dataset. Our results show that motion capture data are sensitive even to small deviations in marker placement and experimental setup, and that small intergroup differences should not be overinterpreted during data analysis, especially when the data were collected in different labs. Furthermore, we recommend scaling datasets individually prior to pooling them, as this approach led to the lowest accuracy. We want to raise awareness that differences between datasets always exist and are recognizable by machine learning models. Researchers should therefore consider how these differences might affect their results when combining data from different studies.
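The abstract contrasts scaling every dataset individually before pooling with scaling after pooling, but it does not specify the scaling methods used. The following is a minimal sketch, assuming per-feature z-score standardization and synthetic data in which a hypothetical constant offset stands in for a lab-specific bias. The function names `zscore`, `scale_then_pool`, `pool_then_scale`, and `mean_gap` are illustrative only and are not taken from the study.

```python
import numpy as np


def zscore(x, mean, std):
    """Standardize each feature (column); a zero std is replaced by 1 to avoid division by zero."""
    std = np.where(std == 0, 1.0, std)
    return (x - mean) / std


def scale_then_pool(datasets):
    """Scale every dataset with its own per-feature statistics, then concatenate."""
    scaled = [zscore(d, d.mean(axis=0), d.std(axis=0)) for d in datasets]
    return np.concatenate(scaled, axis=0)


def pool_then_scale(datasets):
    """Concatenate all datasets first, then scale with the pooled per-feature statistics."""
    pooled = np.concatenate(datasets, axis=0)
    return zscore(pooled, pooled.mean(axis=0), pooled.std(axis=0))


def mean_gap(x, n_first):
    """Largest absolute difference in feature means between the first n_first rows and the rest."""
    return np.abs(x[:n_first].mean(axis=0) - x[n_first:].mean(axis=0)).max()


# Synthetic stand-ins for two labs: rows are strides, columns are features.
# Lab B carries a hypothetical constant offset (e.g. from different marker placement).
rng = np.random.default_rng(0)
lab_a = rng.normal(loc=0.0, scale=1.0, size=(100, 3))
lab_b = rng.normal(loc=2.0, scale=1.0, size=(100, 3))

sp = scale_then_pool([lab_a, lab_b])
ps = pool_then_scale([lab_a, lab_b])

gap_sp = mean_gap(sp, len(lab_a))  # near zero: per-dataset means were removed
gap_ps = mean_gap(ps, len(lab_a))  # clearly nonzero: the offset survives pooled scaling
print(f"scale then pool: {gap_sp:.3g}, pool then scale: {gap_ps:.3g}")
```

In this sketch, per-dataset standardization removes differences in each feature's mean and spread between sources, which is one plausible reason why classification accuracy dropped. Differences in trajectory shape or in the correlations between features are left untouched, which would be consistent with the finding that no scaling method fully removed the dataset bias.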
Publication type
Article: Journal article
Document type
Scientific article
Keywords
Biomechanics ; Dataset Combination ; Datasets ; Explainable AI ; LRP ; Machine Learning ; Motion Capture ; Neural Networks ; Scaling
Language
English
Publication year
2024
HGF reporting year
2025
ISSN (print) / ISBN
2157-6904
e-ISSN
2157-6912
Source details
Volume: 16
Issue: 1
Article number: 20
Publisher
Association for Computing Machinery
Place of publication
1601 Broadway, 10th Floor, New York, NY, USA
Review status
Peer reviewed
POF Topic(s)
30205 - Bioengineering and Digital Health
Research field(s)
Enabling and Novel Technologies
PSP element(s)
G-540008-001
Funding
Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)
Date of record entry
2025-04-15