Writing
2025
A probabilistic perspective on memorization in LLMs
: Coming soon
2024
The Files are in the Computer: Copyright, Memorization, and Generative-AI Systems (The Blog Post)
: (with James Grimmelmann) Memorized training data are "copied" inside models (in the way that copyright law cares about "copies")
2023
Talkin’ ’Bout AI Generation: Copyright and the Generative-AI Supply Chain (The Blog Post)
: (with Katherine Lee and James Grimmelmann) Outlining the very many relationships between complex generative-AI supply chains and U.S. copyright law
The Devil is in the Training Data
: (with Katherine Lee and Daphne Ippolito) Generative-AI training datasets are really different from training datasets in more traditional machine learning