Writing

Informal notes and explainers on machine learning, memorization, and AI & law.

2025

A brief(ish) note on (probabilistic) extraction, memorization and language models (coming soon)

A self-contained tour of what memorization and extraction actually mean for language models, and why thinking in terms of extraction probabilities sharpens the picture.

2024

The Files are in the Computer: Copyright, Memorization, and Generative-AI Systems (The Blog Post)

With James Grimmelmann. Memorized training data are “copied” inside models, in the sense of “copies” that copyright law cares about.

2023

Talkin’ ’Bout AI Generation: Copyright and the Generative-AI Supply Chain (The Blog Post)

With Katherine Lee and James Grimmelmann. Mapping the many relationships between complex generative-AI supply chains and U.S. copyright law.

The Devil is in the Training Data

With Katherine Lee and Daphne Ippolito. Generative-AI training datasets are really different from datasets in more traditional machine learning.