Contextual Privacy in LLMs: Benchmarking and Mitigating Inference-Time Risks

Digital Life Seminar

Niloofar Mireshghallah

Meta AI’s FAIR Alignment Group

When

October 16, 2025 at 5:25 PM

Where

Bloomberg 161/165

Abstract

As large language models integrate into daily workflows—from personal assistants to workplace tools—they handle sensitive information from multiple sources yet struggle to reason about what to share, with whom, and when. In this talk, we explore critical gaps in LLMs' privacy reasoning through complementary benchmarks. First, ConfAIde reveals that even advanced models like GPT-4 inappropriately disclose private information in contexts where humans would maintain boundaries. Second, we extend this analysis to persistent memories—an increasingly adopted personalization feature—showing failures in handling compositional secrets with multiple attributes and contextual cues. We then present a data minimization framework that formally defines the least privacy-revealing disclosure that maintains task utility. Our experiments show frontier models can tolerate up to 85% data redaction without losing functionality, yet they lack awareness of what information they actually need—leading to systematic oversharing. We conclude with techniques for restoring performance when privacy measures are applied, offering a path toward AI systems that respect contextual privacy norms while remaining useful.

About

Niloofar Mireshghallah is a Research Scientist at Meta AI’s FAIR Alignment group in San Francisco. Beginning Fall 2026, she will join Carnegie Mellon University’s Engineering & Public Policy (EPP) Department and Language Technologies Institute (LTI) as an Assistant Professor.

Niloofar's research interests are privacy, natural language processing, and the societal implications of ML. She explores the interplay between data, its influence on models, and the expectations of the people who regulate and use these models. Her work has been recognized by the NCWIT Collegiate Award and the Rising Star in Adversarial ML Award. Previously, Niloofar was a postdoctoral scholar at the University of Washington, advised by Yejin Choi and Yulia Tsvetkov. She received her PhD from UC San Diego, advised by Taylor Berg-Kirkpatrick. During that time she was also a part-time researcher and intern at Microsoft Research, working with the Privacy in AI, Algorithms, and Semantic Machines teams on differential privacy, model compression, and data synthesis.
