Logistics
Time: 3:00-4:00 PM; 05/29/2025
Location: 380-380Y, Sloan Math Corner
Presenter
Brad Knox
Associate Professor,
Computer Science Department,
University of Texas at Austin
Abstract
RLHF algorithms assume a preference probability function: a mapping from a human's desires to preference labels, given a pair of options. This mapping is effectively a psychological model of how humans form preferences, and RLHF uses it to invert a preference dataset and infer the human's hidden desires. In the context of general decision-making (e.g., robotics) and LLMs, I will discuss the accuracy of various psychological models of preference, the impact of RLHF assuming a different psychological model than the one that generated the preference labels, and how designers can help people conform more closely to a chosen psychological model. Along the way, I will consider what type of reward function can be recovered from different preference probability functions, which affects when learning superhuman performance is possible.
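As a rough illustration of what a preference probability function can look like, the sketch below uses a logistic (Bradley-Terry style) model over the difference in summed reward of two options, which is one common assumption in RLHF. It is not taken from the talk: the function names, the `temperature` parameter, and the choice of summed reward as the underlying "desire" are all assumptions made for illustration. The second function shows the "inversion" step in its simplest form, scoring how well a candidate set of returns explains observed preference labels.

```python
import math


def preference_probability(return_a, return_b, temperature=1.0):
    """Probability that option A is preferred over option B.

    Assumed model (illustrative only): a logistic function of the
    difference between the two options' summed rewards.
    """
    diff = (return_a - return_b) / temperature
    return 1.0 / (1.0 + math.exp(-diff))


def negative_log_likelihood(pairs, labels, temperature=1.0):
    """Score how well candidate returns explain observed preference labels.

    pairs:  list of (return_a, return_b) under a candidate reward function
    labels: 1 if the human preferred A, 0 if the human preferred B
    Lower is better; minimizing this over reward functions is the
    inversion from preference labels back to hidden desires.
    """
    total = 0.0
    for (return_a, return_b), label in zip(pairs, labels):
        p = preference_probability(return_a, return_b, temperature)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total
```

If the labels were actually produced by a different psychological model than the one assumed here, minimizing this loss can recover a different reward function than the one the human holds, which is the kind of mismatch the abstract refers to.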
Reference
Bio
Brad is a Research Associate Professor of Computer Science at the University of Texas at Austin. His research has largely focused on the human side of reinforcement learning. He is currently concerned with how humans can specify reward functions that are aligned with their interests. Brad’s dissertation, “Learning from Human-Generated Reward”, comprised early pioneering work on human-in-the-loop reinforcement learning and won the 2012 best dissertation award for the UT Austin Department of Computer Science. His postdoctoral research at the MIT Media Lab focused on creating interactive characters through machine learning on puppetry-style demonstrations of interaction. Stepping away from research during 2015–2018, Brad founded and sold his startup, Bots Alive, which worked in the toy robotics sector. In recent years, Brad co-led the Bosch Learning Agents Lab at UT Austin and was a Senior Research Scientist at Google. He has won multiple best paper awards and was named to IEEE Intelligent Systems’ AI’s 10 to Watch in 2013.
Recording