1. Intuition: Plausible Deniability
Before diving into complex algorithms, we start with the simplest mechanism: Randomized Response. Imagine we ask a sensitive question: "Have you ever cheated on a test?" To protect your privacy, we introduce a coin-flip mechanism. This gives you plausible deniability: even if you say "Yes," we can't tell whether it's the truth or the coin.
The Protocol
- Flip a coin (hidden from the analyst).
- If Heads: Answer Truthfully.
- If Tails: Flip again. Answer Yes if Heads, No if Tails (Random Answer).
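The protocol above can be sketched in a few lines of Python (the function name is illustrative, and a fair coin is assumed):

```python
import random

def randomized_response(truth: bool) -> bool:
    """One round of the two-coin protocol."""
    if random.random() < 0.5:      # first flip is Heads: answer truthfully
        return truth
    return random.random() < 0.5   # first flip is Tails: second flip picks Yes/No at random
```

Whatever the respondent's true answer, both "Yes" and "No" occur with nonzero probability, which is exactly the plausible deniability described above.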
Analyst View
We recover the true statistics by mathematically subtracting the expected noise.
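With fair coins, the probability of hearing "Yes" is P(Yes) = 0.5 · p + 0.25, where p is the true rate, so the analyst can invert the bias. A minimal sketch (the function name is illustrative):

```python
def estimate_true_rate(reported_yes_fraction: float) -> float:
    """Invert the bias of randomized response with fair coins.

    P(report Yes) = 0.5 * p_true + 0.25,
    so  p_true = 2 * P(report Yes) - 0.5.
    """
    return 2 * reported_yes_fraction - 0.5
```

For example, if 40% of respondents answered "Yes," the estimated true rate is 30%.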
2. The Privacy Budget (ε)
Modern DP adds noise from a specific distribution (like Laplace). The amount of noise is controlled by Epsilon (ε).
Scenario: Employee Salaries
We are querying the average salary of a department. To protect any single high-earner, we add Laplace Noise.
The Laplace Mechanism
Noise is drawn from a Laplace distribution:
Lap(sensitivity / ε).
Lower ε = Wider curve = More possible noise.
Noise Distribution Visualization
The blue area is the probability distribution of the noise. The orange bar is a single random sample added to the data.
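Sampling from Lap(sensitivity / ε) can be sketched as follows, using the fact that a Laplace variable is the difference of two independent exponential variables (the function name is illustrative):

```python
import random

def laplace_noise(sensitivity: float, epsilon: float) -> float:
    """Draw one sample from Lap(scale = sensitivity / epsilon).

    A Laplace(0, b) variable equals b times the difference of two
    independent Exp(1) variables.
    """
    scale = sensitivity / epsilon
    return scale * (random.expovariate(1.0) - random.expovariate(1.0))
```

Note how a lower ε produces a larger scale, and hence the wider curve in the visualization.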
Key Algorithms
Different data types require different noise mechanisms.
1. Laplace Mechanism
Used for counting queries (e.g., "How many people have diabetes?"). Adds noise from the Laplace distribution centered at 0. Simple and effective for numerical answers.
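A private counting query might look like this (a sketch; NumPy's Laplace sampler is used for brevity):

```python
import numpy as np

def private_count(records, predicate, epsilon: float) -> float:
    """Counting query under DP.

    Adding or removing one person changes the count by at most 1,
    so sensitivity = 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)
```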
2. Exponential Mechanism
Used when picking the "best" item (e.g., "What is the most common disease?"). Instead of adding noise to a count, it makes the probability of selecting an item proportional to its score.
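A sketch of the exponential mechanism (the candidate names and scores below are illustrative; the standard form selects with probability proportional to exp(ε · score / (2 · sensitivity))):

```python
import math
import random

def exponential_mechanism(candidates, scores, sensitivity, epsilon):
    """Select one candidate with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    weights = [math.exp(epsilon * s / (2 * sensitivity)) for s in scores]
    return random.choices(candidates, weights=weights, k=1)[0]
```

High-scoring items are exponentially more likely to be chosen, but every candidate keeps some probability, which is what provides the privacy.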
3. Local DP (Randomized Response)
Noise is added on the user's device before data is ever sent to the server. Used by Apple and Google for keyboard metrics.
DP vs. Others
Why Differential Privacy is the gold standard.
| Method | Technique | Vulnerability |
|---|---|---|
| Anonymization | Removing names/IDs | Linkage Attacks (Netflix Prize) |
| K-Anonymity | Grouping k-users together | Homogeneity Attacks |
| Differential Privacy | Adding calibrated mathematical noise | None known (provable guarantee) |
How to Apply (The Recipe)
Identify Sensitivity
Determine the "Sensitivity" of your query. How much can one person change the result? (e.g., for a count query, sensitivity is 1).
Set Budget (ε)
Choose your Epsilon based on privacy needs. Common values are between 0.1 (strict) and 10 (loose).
Add Noise
Sample from the mechanism (Laplace/Gaussian) and add it to the final aggregate result. Never release raw data.
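Putting the three steps together for the salary scenario (a sketch; the clamping bounds are illustrative assumptions, and the dataset size is treated as public):

```python
import numpy as np

def private_mean(values, lower, upper, epsilon):
    """DP average following the recipe above.

    Step 1 (sensitivity): clamp values to [lower, upper] so one person
    can shift the mean by at most (upper - lower) / n.
    Step 2 (budget): epsilon is chosen by the caller.
    Step 3 (noise): add Laplace noise and release only the aggregate.
    """
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)
    return clipped.mean() + np.random.laplace(0.0, sensitivity / epsilon)
```

Tighter clamping bounds mean lower sensitivity and therefore less noise, at the cost of biasing any values that get clipped.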