
Challenges in Scalable Training Data Attribution

Roger Grosse


Time: 4:00-5:00 PM; 04/23/2024
Location: Allen 101X


Roger Grosse
Associate Professor,
Computer Science Department,
University of Toronto


How can we trace surprising behaviors of machine learning models back to their training data? Influence functions and related methods aim to predict how the trained model would change if a specific training example were added or removed. Two issues have blocked their applicability to large neural nets: the difficulty of computing with neural net Hessians, and the inability of influence functions to capture the implicit bias of optimizers. To address both issues, we reformulate training data attribution in terms of differentiating through the training procedure and present a scalable algorithm for approximating this higher-order derivative. This opens up the possibility of training data attribution in multi-stage training settings such as continual learning or foundation models.



Roger Grosse is an Associate Professor of Computer Science at the University of Toronto, and a founding member of the Vector Institute for Artificial Intelligence. His research focuses on using our understanding of deep learning to improve the safety and alignment of AI systems. He has held the Sloan Research Fellowship, CIFAR Canada AI Chair, and Canada Research Chair. Since 2022, he has also been a Member of Technical Staff on the Alignment Team at Anthropic.

