PuSH - Publication Server of Helmholtz Zentrum München: Differentially Private Active Learning: Balancing Effective Data Selection and Privacy.


Schwethelm, K.* ; Kaiser, J.* ; Kuntzer, J.* ; Yigitsoy, M.* ; Rueckert, D.* ; Kaissis, G.

Differentially Private Active Learning: Balancing Effective Data Selection and Privacy.

In: 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 09-11 April 2025, Copenhagen, Denmark. 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1264 USA: IEEE Computer Society, 2025. pp. 858-878


Abstract

Active learning (AL) is a widely used technique for optimizing data labeling in machine learning by iteratively selecting, labeling, and training on the most informative data. However, its integration with formal privacy-preserving methods, particularly differential privacy (DP), remains largely underexplored. While some works have explored differentially private AL for specialized scenarios like online learning, the fundamental challenge of combining AL with DP in standard learning settings has remained unaddressed, severely limiting AL's applicability in privacy-sensitive domains. This work addresses this gap by introducing differentially private active learning (DP-AL) for standard learning settings. We demonstrate that naively integrating DP-SGD training into AL presents substantial challenges in privacy budget allocation and data utilization. To overcome these challenges, we propose step amplification, which leverages individual sampling probabilities in batch creation to maximize data point participation in training steps, thus optimizing data utilization. Additionally, we investigate the effectiveness of various acquisition functions for data selection under privacy constraints, revealing that many commonly used functions become impractical. Our experiments on vision and natural language processing tasks show that DP-AL can improve performance for specific datasets and model architectures. However, our findings also highlight the limitations of AL in privacy-constrained environments, emphasizing the trade-offs between privacy, model accuracy, and data selection accuracy.
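The abstract mentions batch creation with individual sampling probabilities but gives no algorithmic detail. The following is a minimal sketch of that general idea only, not the authors' step amplification method. The function names, the `target_participations` parameter, and the rule "probability = target / remaining steps" are all assumptions made for illustration.

```python
import random


def individual_sampling_probs(remaining_steps, target_participations):
    """Hypothetical rule: choose a per-point inclusion probability so that
    each point is expected to take part in about `target_participations`
    training steps, capped at 1.0."""
    return [min(1.0, target_participations / s) for s in remaining_steps]


def poisson_batch(probs, rng):
    """Build one batch by including point i independently with probability
    probs[i] (Poisson subsampling with per-point rates)."""
    return [i for i, p in enumerate(probs) if rng.random() < p]


# Assumed toy setup: points 0-1 were labeled early (100 steps remain),
# points 2-3 were acquired in a later round (only 25 steps remain).
remaining_steps = [100, 100, 25, 25]
probs = individual_sampling_probs(remaining_steps, target_participations=10)

rng = random.Random(0)
batch = poisson_batch(probs, rng)
```

In this toy setup the late-acquired points receive a four times higher inclusion probability, so their expected number of participations matches that of the early points. In a real DP-SGD pipeline a higher sampling probability also changes the per-step privacy cost, which this sketch does not account for.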


Publication type: Article: Conference contribution

Keywords: active learning; differential privacy; data selection

ISSN (print) / ISBN: 979-8-3315-1711-3

Conference title: 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)

Conference date: 09-11 April 2025

Conference location: Copenhagen, Denmark

Source details: Pages 858-878

Publisher: IEEE Computer Society

Place of publication: 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1264 USA

Institute(s): Institute for Machine Learning in Biomed Imaging (IML)

Funding: Medical Informatics Initiative as part of the PrivateAIM Project, and from the German Academic Exchange Service (DAAD) under the Konrad Zuse School of Excellence for Reliable AI (RelAI)
Bavarian State Ministry for Science and the Arts under the Munich Centre for Machine Learning (MCML), from the German Ministry of Education and Research
German Federal Ministry of Education and Research
Bavarian Collaborative Research Project PRIPREKI of the Free State of Bavaria Funding Programme "Artificial Intelligence - Data Science"