Differential Privacy: The Interactive Guide

1. Intuition: Plausible Deniability

Before turning to more complex algorithms, we start with Randomized Response. Suppose we ask a sensitive question: "Have you ever cheated on a test?" To protect your privacy, we introduce a coin-flip mechanism. This gives you plausible deniability: even if you say "Yes", we can't tell whether that is the truth or the coin.

The Protocol

Flip a coin (hidden from the analyst).
If Heads: Answer Truthfully.
If Tails: Flip again. Answer Yes if Heads, No if Tails (Random Answer).
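
The protocol above can be sketched in a few lines of Python. This is a minimal illustration, not the guide's own implementation: the function name `randomized_response` and the fixed seeds are assumptions made only so the example is repeatable.

```python
import random

def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Return the answer one user reports under the two-coin protocol."""
    first_flip_is_heads = rng.random() < 0.5
    if first_flip_is_heads:
        return truth  # Heads: answer truthfully
    # Tails: flip again and report that coin (Heads -> Yes, Tails -> No)
    return rng.random() < 0.5

rng = random.Random(0)  # fixed seed (assumption), only for repeatability
trials = 100_000

# Fraction of "Yes" reports among users whose true answer is Yes / No.
yes_given_yes = sum(randomized_response(True, rng) for _ in range(trials)) / trials
yes_given_no = sum(randomized_response(False, rng) for _ in range(trials)) / trials

print(f"P(report Yes | truth Yes) ~ {yes_given_yes:.3f}")  # theory: 0.75
print(f"P(report Yes | truth No)  ~ {yes_given_no:.3f}")   # theory: 0.25
```

A user whose true answer is Yes reports Yes with probability 3/4, and a user whose true answer is No still reports Yes with probability 1/4, which is where the deniability comes from.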

[Interactive demo: Step 1, The Coin Flip, with a running count of simulated users]

Analyst View

We recover the true statistics by removing the expected noise mathematically.

Insight: Even though about 25% of answers are forced lies (half of all users answer randomly, and half of those random answers contradict the truth), with enough data we can estimate the true percentage accurately, while no single user can be proven to have answered truthfully.
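
Removing the expected noise amounts to one line of algebra: under the protocol, P(report Yes) = 0.5 × p + 0.25, where p is the true Yes rate, so p = 2 × (reported rate − 0.25). The sketch below checks this on simulated data; the true rate of 30%, the population size, and the function name `estimate_true_yes_rate` are hypothetical values chosen for illustration.

```python
import random

def estimate_true_yes_rate(reported_yes_rate: float) -> float:
    """Invert P(report Yes) = 0.5 * p + 0.25 to recover the true rate p."""
    return 2.0 * (reported_yes_rate - 0.25)

rng = random.Random(42)   # fixed seed (assumption), only for repeatability
true_rate = 0.30          # hypothetical ground truth, unknown to the analyst
n_users = 100_000

reports = []
for _ in range(n_users):
    truth = rng.random() < true_rate
    if rng.random() < 0.5:                 # first coin is Heads: truthful answer
        reports.append(truth)
    else:                                  # first coin is Tails: random answer
        reports.append(rng.random() < 0.5)

reported_rate = sum(reports) / n_users
estimate = estimate_true_yes_rate(reported_rate)
print(f"reported Yes rate: {reported_rate:.3f}, estimated true rate: {estimate:.3f}")
```

The estimate gets tighter as the number of users grows, because the sampling error of the reported rate shrinks while the correction itself stays fixed.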

2. The Privacy Budget (ε)

Modern DP adds noise from a specific distribution (like Laplace). The amount of noise is controlled by Epsilon (ε).

High ε (e.g., 10)

Low Privacy, High Accuracy

Low ε (e.g., 0.1)

High Privacy, Low Accuracy

[Interactive slider: adjust ε (default 1.0), from more privacy (more noise) to more utility (more accuracy)]

Scenario: Employee Salaries

We are querying the average salary of a department. To protect any single high-earner, we add Laplace Noise.

[Interactive demo readout: True Average $75,000, shown next to the DP-reported value and its error]
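
The salary scenario can be approximated offline with the sketch below. Several things here are assumptions rather than part of the guide: salaries are clipped to a hypothetical range of $0 to $200,000, the department size of 50 is treated as public, and the sensitivity of the average is taken as (upper − lower) / n. The helper names `laplace_noise` and `dp_average` are illustrative, and the sampler is not hardened for production use.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_average(salaries, lower, upper, epsilon, rng):
    """Average of clipped salaries plus Laplace noise (n assumed public)."""
    clipped = [min(max(s, lower), upper) for s in salaries]
    # Changing one person's clipped salary moves the average by at most this much.
    sensitivity = (upper - lower) / len(clipped)
    return sum(clipped) / len(clipped) + laplace_noise(sensitivity / epsilon, rng)

rng = random.Random(7)              # fixed seed (assumption), only for repeatability
salaries = [75_000.0] * 50          # hypothetical department; true average is $75,000

for epsilon in (0.1, 1.0, 10.0):
    reported = dp_average(salaries, 0.0, 200_000.0, epsilon, rng)
    error_pct = abs(reported - 75_000.0) / 75_000.0 * 100
    print(f"epsilon={epsilon:>4}: reported ${reported:,.0f} (error {error_pct:.1f}%)")
```

Clipping is what keeps a single very high earner from having unbounded influence on the result; without a bound, the sensitivity of an average is not well defined.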

The Laplace Mechanism

Noise is drawn from a Laplace distribution centered at 0 whose scale is sensitivity / ε:
Lap(sensitivity / ε).
Lower ε = Wider curve = Larger noise on average.
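
Written out, with Δf standing for the sensitivity and b for the scale (notation assumed here, not taken from the guide), the noise density and its spread are:

```latex
\[
\mathrm{Lap}(x \mid b) = \frac{1}{2b}\exp\!\left(-\frac{|x|}{b}\right),
\qquad b = \frac{\Delta f}{\varepsilon},
\qquad \mathbb{E}\,|x| = b,
\qquad \operatorname{Var}(x) = 2b^{2}.
\]
```

For a query with sensitivity 1, ε = 0.1 gives b = 10 (a typical error of about 10), while ε = 10 gives b = 0.1.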

Noise Distribution Visualization

The blue area is the probability distribution of the noise. The orange bar is a single random sample added to the data.

Key Algorithms

Different data types require different noise mechanisms.

1. Laplace Mechanism

Numerical Data

Used for numeric queries such as counts (e.g., "How many people have diabetes?"). Adds noise drawn from a Laplace distribution centered at 0 and scaled to the query's sensitivity. Simple and effective for numerical answers.

2. Exponential Mechanism

Categorical / Best Item

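Used when picking the "best" item (e.g., "What is the most common disease?"). Instead of adding noise to a count, it selects each item with probability proportional to exp(ε × score / (2 × sensitivity)), so higher-scoring items are exponentially more likely to be chosen but any item can be returned.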
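
A minimal sketch of this selection step follows, using the standard formulation in which each item's weight is exp(ε × score / (2 × sensitivity)). The disease counts, the function name `exponential_mechanism`, and the choice of the count itself as the score (sensitivity 1) are assumptions made for illustration.

```python
import math
import random

def exponential_mechanism(scores: dict, epsilon: float, sensitivity: float,
                          rng: random.Random):
    """Pick one key of `scores`, favoring high scores exponentially."""
    items = list(scores)
    best = max(scores.values())
    # Subtracting the maximum score avoids overflow and leaves the
    # selection probabilities unchanged (it rescales every weight equally).
    weights = [math.exp(epsilon * (scores[item] - best) / (2.0 * sensitivity))
               for item in items]
    return rng.choices(items, weights=weights, k=1)[0]

counts = {"flu": 120, "diabetes": 90, "asthma": 60}  # hypothetical counts
rng = random.Random(0)  # fixed seed (assumption), only for repeatability

for epsilon in (0.01, 0.1, 1.0):
    picks = [exponential_mechanism(counts, epsilon, 1.0, rng) for _ in range(1000)]
    share = picks.count("flu") / len(picks)
    print(f"epsilon={epsilon}: 'flu' selected in {share:.0%} of 1000 runs")
```

At small ε the choice is close to uniform across items; as ε grows, the top-scoring item is returned almost every time.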

3. Local DP (Randomized Response)

Client-Side

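Noise is added on the user's device before data is ever sent to the server, so the server never sees a raw value. Apple uses it for keyboard and emoji usage statistics, and Google has used it (as RAPPOR) for Chrome settings telemetry.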

DP vs. Others

Why Differential Privacy is the gold standard.

Method	Technique	Vulnerability
Anonymization	Removing names/IDs	Linkage Attacks (Netflix Prize)
K-Anonymity	Grouping k-users together	Homogeneity Attacks
Differential Privacy	Adding calibrated random noise	None from auxiliary data; the guarantee weakens as ε grows

* Comparison shows that removing identifiers is insufficient. DP provides a mathematical guarantee regardless of auxiliary information an attacker might possess.

How to Apply (The Recipe)

Identify Sensitivity

Determine the "Sensitivity" of your query: the maximum amount by which one person's data can change the result (e.g., for a count query, sensitivity is 1).

Set Budget (ε)

Choose your Epsilon based on privacy needs. Common values are between 0.1 (strict) and 10 (loose).

Add Noise

Sample noise from the chosen mechanism's distribution (Laplace or Gaussian), calibrated to the sensitivity and ε, and add it to the final aggregate result. Never release raw data.
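
The three steps can be strung together for a count query, as in the sketch below. The patient records, the helper names `laplace_noise` and `private_count`, and the seed are hypothetical; the code assumes the Laplace mechanism and is meant as an illustration of the recipe rather than a production implementation.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # Difference of two i.i.d. exponential draws with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    # Step 1: identify sensitivity. Adding or removing one person
    # changes a count by at most 1.
    sensitivity = 1.0
    # Step 2: set the budget. Epsilon must be positive; smaller means more noise.
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Step 3: add noise to the aggregate only; raw records are never returned.
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(sensitivity / epsilon, rng)

# Hypothetical dataset: 1000 patients, 250 of whom have diabetes.
patients = [{"diabetes": i % 4 == 0} for i in range(1000)]
noisy = private_count(patients, lambda p: p["diabetes"],
                      epsilon=1.0, rng=random.Random(3))
print(f"noisy count: {noisy:.1f} (true count is 250)")
```

With ε = 1 and sensitivity 1, the noise has scale 1, so the reported count is typically within a few units of the true value.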