Tag: Privacy-Preserving Machine Learning
All the articles with the tag "Privacy-Preserving Machine Learning".
-
How much do language models memorize?
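This paper proposes an information-theoretic method for quantifying memorization. By separating unintended memorization from generalization, it measures the capacity of GPT-style language models at approximately 3.6 bits per parameter, and it shows how the ratio of dataset size to model capacity affects double descent and membership inference performance.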
-
CB-cPIR: Code-Based Computational Private Information Retrieval
CB-cPIR introduces a code-based single-server computational private information retrieval scheme that enhances security against subquery attacks by using high-weight secret vectors and dual queries, achieving lower communication and computational costs compared to lattice-based schemes like XPIR and SimplePIR.
-
Differentially Private Bilevel Optimization
This paper introduces the first differentially private first-order algorithms for bilevel optimization, ensuring privacy with theoretical convergence guarantees for hypergradient norms in both empirical and population settings while avoiding Hessian computations.
-
The Mosaic Memory of Large Language Models
This paper introduces the concept of 'mosaic memory' in Large Language Models. Experiments on canaries and real-world datasets such as SlimPajama show that LLMs memorize training data through fuzzy duplicates with partial overlaps, and that this memorization is predominantly syntactic. The findings challenge existing deduplication practices and raise concerns for privacy, model utility, and benchmark fairness.
-
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
This paper introduces a taxonomy that divides language model memorization into recitation, reconstruction, and recollection. Experiments with Pythia models show that different factors influence each category, and a predictive model built on the taxonomy outperforms baselines at predicting memorization likelihood.