Tag: Privacy-Preserving Machine Learning

All the articles with the tag "Privacy-Preserving Machine Learning".

How much do language models memorize?

Published: 3 Jun, 2025 at 11:44 AM

87.61 🤔

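This paper proposes an information-theoretic method for quantifying memorization that separates unintended memorization from generalization. Using it, the authors measure the capacity of GPT-style language models at roughly 3.6 bits per parameter and show how the ratio of dataset size to model capacity affects double descent and membership inference performance.
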
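
The ~3.6 bits-per-parameter figure turns into a quick back-of-the-envelope estimate of how much raw information a model of a given size can store. The sketch below assumes that figure scales linearly with parameter count. The helper names (`capacity_bits`, `capacity_megabytes`, `data_to_capacity_ratio`) are hypothetical illustrations, not code from the paper.

```python
# Back-of-the-envelope memorization capacity.
# Assumption: the ~3.6 bits/parameter estimate from the summary applies linearly.
BITS_PER_PARAM = 3.6


def capacity_bits(num_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Estimated total memorization capacity in bits."""
    return num_params * bits_per_param


def capacity_megabytes(num_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Same estimate expressed in megabytes (10**6 bytes)."""
    return capacity_bits(num_params, bits_per_param) / 8 / 1e6


def data_to_capacity_ratio(dataset_bits: float, num_params: int) -> float:
    """Hypothetical helper: dataset information content divided by model capacity.
    Values above 1 mean the data cannot all be stored verbatim."""
    return dataset_bits / capacity_bits(num_params)


if __name__ == "__main__":
    for n in (125_000_000, 1_000_000_000):
        print(f"{n:>13,} params -> ~{capacity_megabytes(n):,.1f} MB of capacity")
```

Under this linear assumption a 1B-parameter model stores on the order of 450 MB, which is why the data-to-capacity ratio, rather than dataset size alone, is the quantity the summary ties to double descent and membership inference.
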
CB-cPIR: Code-Based Computational Private Information Retrieval

Published: 8 May, 2025 at 11:08 AM

93.98 🤔

CB-cPIR introduces a code-based single-server computational private information retrieval scheme that enhances security against subquery attacks by using high-weight secret vectors and dual queries, achieving lower communication and computational costs compared to lattice-based schemes like XPIR and SimplePIR.
Differentially Private Bilevel Optimization

Published: 14 May, 2025 at 11:12 AM

92.72 🤔

This paper introduces the first differentially private first-order algorithms for bilevel optimization, ensuring privacy with theoretical convergence guarantees for hypergradient norms in both empirical and population settings while avoiding Hessian computations.
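
Differentially private first-order methods are typically built from the Gaussian mechanism applied to gradients: clip each per-example gradient to a fixed norm, sum, add Gaussian noise scaled to that norm, then take a step. The sketch below shows only that generic building block for a single-level objective. It is not the paper's bilevel algorithm, it performs no privacy accounting, and the names `private_mean_gradient` and `dp_step` are hypothetical.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Gaussian mechanism: clip each per-example gradient, sum, add noise, average.
    Clipping bounds each example's contribution to the sum by max_norm."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for g in per_example_grads:
        for i, v in enumerate(clip(g, max_norm)):
            total[i] += v
    sigma = noise_multiplier * max_norm  # noise scale matched to per-example sensitivity
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


def dp_step(params, per_example_grads, lr=0.1, max_norm=1.0,
            noise_multiplier=1.0, rng=None):
    """One noisy gradient-descent step (hypothetical helper)."""
    if rng is None:
        rng = random.Random(0)
    g = private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng)
    return [p - lr * gi for p, gi in zip(params, g)]
```

Setting `noise_multiplier=0.0` recovers plain clipped gradient descent, which is a convenient sanity check; the privacy guarantee comes entirely from a positive noise multiplier combined with an accountant that this sketch omits.
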
The Mosaic Memory of Large Language Models

Published: 17 May, 2025 at 11:08 AM

89.04 🤔

This paper introduces the concept of 'mosaic memory' in Large Language Models. Through experiments with canaries and real-world datasets such as SlimPajama, it shows that LLMs memorize training data from fuzzy duplicates (sequences that only partially overlap) and that this memorization is predominantly syntactic rather than semantic. These findings challenge existing deduplication practices and raise concerns for privacy, model utility, and benchmark fairness.
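
To make "fuzzy duplicate" concrete: a copy of a sequence with a fraction of its tokens replaced still shares most of its content with the original, yet exact-match deduplication treats the two as unrelated. The sketch below is an illustration of that idea under simple assumptions (whitespace tokens, random replacement at distinct positions); it is not the paper's canary protocol, and all function names are hypothetical.

```python
import hashlib
import random


def fuzzy_duplicate(tokens, replace_frac, vocab, seed):
    """Copy of tokens with a fraction of distinct positions replaced by vocab tokens."""
    rng = random.Random(seed)
    out = list(tokens)
    k = int(round(replace_frac * len(tokens)))
    for i in rng.sample(range(len(tokens)), k):
        out[i] = rng.choice(vocab)
    return out


def exact_hash(tokens):
    """Key used by exact-match deduplication."""
    return hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()


def token_overlap(a, b):
    """Fraction of aligned positions where the two sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


if __name__ == "__main__":
    canary = [f"t{i}" for i in range(100)]
    dup = fuzzy_duplicate(canary, 0.1, vocab=["x", "y", "z"], seed=1)
    print("same exact hash:", exact_hash(canary) == exact_hash(dup))
    print("token overlap:", token_overlap(canary, dup))
```

Because the replacement vocabulary is disjoint from the canary here, exactly 10% of positions differ: the two sequences agree on 90% of tokens but hash differently, so an exact-match deduplicator would keep both copies.
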
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon

Published: 10 May, 2025 at 10:59 AM

85.36 🤔

This paper introduces a taxonomy of language model memorization into recitation, reconstruction, and recollection, demonstrating through experiments with Pythia models that different factors influence each category, with a taxonomy-based predictive model outperforming baselines in predicting memorization likelihood.
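
The three-way taxonomy can be pictured as a simple rule-based labeller over memorized samples: highly duplicated text is recitation, inherently predictable (templated) text is reconstruction, and everything else is recollection. The sketch below assumes each sample carries a corpus duplicate count; the threshold `DUPLICATE_THRESHOLD` and the `is_templated` heuristic are hypothetical choices for illustration, not the paper's exact criteria.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    tokens: list[str]
    duplicate_count: int  # how many times the sequence appears in the training corpus


DUPLICATE_THRESHOLD = 5  # hypothetical cut-off, chosen for illustration only


def is_templated(tokens):
    """Hypothetical heuristic: one repeated token, or a run of incrementing integers."""
    if not tokens:
        return False
    if len(set(tokens)) == 1:
        return True
    try:
        nums = [int(t) for t in tokens]
    except ValueError:
        return False
    return all(b - a == 1 for a, b in zip(nums, nums[1:]))


def categorize(sample: Sample) -> str:
    if sample.duplicate_count > DUPLICATE_THRESHOLD:
        return "recitation"      # highly duplicated text
    if is_templated(sample.tokens):
        return "reconstruction"  # inherently predictable pattern
    return "recollection"        # rare, non-templated sequences
```

The ordering matters: a templated sequence that is also heavily duplicated is labelled recitation, since duplication is checked first.
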