PuSH - Publication Server of Helmholtz Zentrum München

Fuchs, M.* ; Krautenbacher, N.

Minimization and estimation of the variance of prediction errors for cross-validation designs.

J. Stat. Theory Pract. 10, 420-443 (2016)
Open Access Green: possible as soon as the postprint has been submitted to the ZB.
We consider the mean prediction error of a classification or regression procedure as well as its cross-validation estimates, and investigate the variance of this estimate as a function of an arbitrary cross-validation design. We decompose this variance into a scalar product of coefficients and certain covariance expressions, such that the coefficients depend solely on the resampling design, and the covariances depend solely on the data’s probability distribution. We rewrite this scalar product in such a form that the initially large number of summands can gradually be decreased down to three under the validity of a quadratic approximation to the core covariances. We show an analytical example in which this quadratic approximation holds true exactly. Moreover, in this example, we show that the leave-p-out estimator of the error depends on p only by means of a constant and can, therefore, be written in a much simpler form. Furthermore, there is an unbiased estimator of the variance of K-fold cross-validation, in contrast to a claim in the literature. As a consequence, we can show that Balanced Incomplete Block Designs have smaller variance than K-fold cross-validation. In a real data example from the UCI machine learning repository, this property can be confirmed. We finally show how to find Balanced Incomplete Block Designs in practice.
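The notion of a cross-validation design used in the abstract can be illustrated with a small sketch. It is not the authors' code or their estimator of the variance: it treats a design as a plain list of test sets, uses the training mean as a stand-in learner with squared-error loss, and builds one textbook Balanced Incomplete Block Design (the 12 lines of the affine plane on 9 points). The function names `affine_plane_blocks`, `k_fold_blocks`, `is_balanced`, and `cv_estimate`, as well as the toy data, are assumptions made for this illustration.

```python
from itertools import combinations


def affine_plane_blocks(q=3):
    """Lines of the affine plane over Z_q (q prime), points indexed as q*x + y.

    For q = 3 this gives 12 blocks of size 3 on 9 points in which every
    pair of points lies in exactly one block.
    """
    blocks = []
    for m in range(q):            # lines y = m*x + b
        for b in range(q):
            blocks.append(frozenset(q * x + (m * x + b) % q for x in range(q)))
    for c in range(q):            # vertical lines x = c
        blocks.append(frozenset(q * c + y for y in range(q)))
    return blocks


def k_fold_blocks(n, k):
    """Contiguous K-fold test sets (assumes k divides n)."""
    size = n // k
    return [frozenset(range(f * size, (f + 1) * size)) for f in range(k)]


def is_balanced(blocks, n):
    """True if block sizes, point counts, and pair counts are all constant."""
    sizes = {len(b) for b in blocks}
    point_counts = {sum(i in b for b in blocks) for i in range(n)}
    pair_counts = {
        sum(i in b and j in b for b in blocks)
        for i, j in combinations(range(n), 2)
    }
    return len(sizes) == 1 and len(point_counts) == 1 and len(pair_counts) == 1


def cv_estimate(y, blocks):
    """Cross-validation error for a design given as a list of test sets.

    Illustrative learner: predict the mean of the training responses.
    """
    n = len(y)
    total = 0.0
    for test in blocks:
        train = [y[i] for i in range(n) if i not in test]
        pred = sum(train) / len(train)
        total += sum((y[i] - pred) ** 2 for i in test) / len(test)
    return total / len(blocks)


# Toy responses, invented for this illustration only.
y = [0.3, 1.2, -0.7, 2.1, 0.0, 1.5, -1.1, 0.8, 0.4]
n = len(y)

bibd = affine_plane_blocks(3)
three_fold = k_fold_blocks(n, 3)
leave_3_out = [frozenset(c) for c in combinations(range(n), 3)]

print("BIBD balanced:", is_balanced(bibd, n))
print("3-fold balanced:", is_balanced(three_fold, n))
print("3-fold estimate:", cv_estimate(y, three_fold))
print("BIBD estimate:", cv_estimate(y, bibd))
print("leave-3-out estimate:", cv_estimate(y, leave_3_out))
```

With this particular learner the squared error is a quadratic form in the responses, so the estimate depends only on how often single points and pairs of points are covered by the test sets. The 12-block balanced design therefore reproduces the complete leave-3-out estimate over all 84 test sets up to rounding, while the 3-fold partition, which covers most pairs zero times, generally does not. Other learners need not share this property.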
Impact Factor 0.370
Scopus SNIP 0.753
Scopus Cited By 5
Tags
Icb_biostatistics
Publication type Article: Journal article
Document type Scientific article
Keywords Cross-validation ; Design ; Model Selection ; U-statistic
Language English
Publication year 2016
HGF reporting year 2016
ISSN (print) / ISBN 1559-8608
e-ISSN 1559-8616
Source details Volume: 10, Issue: 2, Pages: 420-443
Publisher Taylor & Francis
Place of publication Colchester
Review status Peer reviewed
POF Topic(s) 30205 - Bioengineering and Digital Health
Research field(s) Enabling and Novel Technologies
PSP element(s) G-503800-001
Scopus ID 84964687886
Date of entry 2016-04-19