Human reward-guided learning is often modeled with simple reinforcement learning (RL) algorithms that summarize past experience into a few key variables, such as Q-values, which represent expected rewards. However, recent findings suggest that these models oversimplify human memory and decision-making. For instance, individual events and global reward statistics can significantly influence behavior, indicating that memory involves more than summary statistics. Artificial neural networks (ANNs), particularly recurrent neural networks (RNNs), offer a more expressive alternative that can capture long-term dependencies and intricate learning mechanisms, though they are often less interpretable than traditional RL models.
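As a point of reference, the incremental summary behind a Q-value is commonly written as a delta rule. The form below is the generic textbook version, not necessarily the exact update fitted in the study; the learning-rate symbol α and the time indexing are assumptions made for illustration.

```latex
Q_{t+1}(a_t) = Q_t(a_t) + \alpha \, \bigl( r_t - Q_t(a_t) \bigr), \qquad 0 < \alpha \le 1
```

Here a_t is the chosen action and r_t the reward received on trial t. Only the chosen action's value moves, and it moves by a fixed fraction of the prediction error, which is why such models retain only a running summary of past rewards.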
Researchers from institutions including Google DeepMind, University of Oxford, Princeton University, and University College London studied human reward-learning behavior using a hybrid approach that combines RL models with ANNs. Their findings suggest that human behavior is not adequately explained by algorithms that incrementally update choice variables. Instead, human reward learning relies on a flexible memory system that forms complex representations of past events over multiple timescales. By iteratively replacing components of a classic RL model with ANNs, they uncovered insights into how experiences shape memory and guide decision-making.
The dataset came from a reward-learning task involving 880 participants. In this task, participants repeatedly chose between four actions, each rewarded according to a noisy, drifting reward magnitude. After filtering, the study included 862 participants and 617,871 valid trials. Most participants learned the task, consistently choosing actions with higher rewards. The size of this dataset allowed RNNs and hybrid models to extract substantial behavioral variance, and these models outperformed basic RL models in capturing human decision-making patterns.
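To make the task structure concrete, here is a minimal sketch of a four-armed bandit with noisy, drifting reward magnitudes. The article does not give the generative details, so the Gaussian random walk, the 0 to 100 reward range, and every parameter value (`drift_sd`, `noise_sd`, `lo`, `hi`) are assumptions chosen for illustration, not the study's settings.

```python
import random


def simulate_drifting_bandit(n_trials=200, n_arms=4, drift_sd=2.0,
                             noise_sd=4.0, lo=0.0, hi=100.0, seed=0):
    """Return a list of per-trial reward lists, one noisy reward per arm.

    All defaults are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)
    # Each arm starts with a latent mean reward drawn uniformly from the range.
    means = [rng.uniform(lo, hi) for _ in range(n_arms)]
    rewards = []
    for _ in range(n_trials):
        # Observed reward = latent mean + observation noise, clipped to range.
        rewards.append(
            [min(hi, max(lo, rng.gauss(m, noise_sd))) for m in means]
        )
        # Latent means drift as a bounded Gaussian random walk.
        means = [min(hi, max(lo, m + rng.gauss(0.0, drift_sd))) for m in means]
    return rewards


if __name__ == "__main__":
    trials = simulate_drifting_bandit(n_trials=5)
    for t, row in enumerate(trials):
        print(t, [round(r, 1) for r in row])
```

Because the best arm changes over time, a learner has to keep tracking rewards rather than settle on one action, which is what makes the task informative about memory.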
The data were first modeled with a traditional RL model (Best RL) and a flexible Vanilla RNN. Best RL, identified as the most effective of the incremental-update models, used a reward module to update Q-values and an action module to capture action perseveration. However, its simplicity limited its expressivity. The Vanilla RNN, which processes actions, rewards, and latent states together, predicted choices more accurately (68.3% vs. 58.9%). Hybrid models such as RL-ANN and Context-ANN improved on Best RL but still fell short of the Vanilla RNN. Memory-ANN, which incorporates recurrent memory representations, matched the Vanilla RNN’s performance, suggesting that detailed memory use was key to participants’ learning in the task.
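The two-module structure described for Best RL can be sketched as follows. This is a generic incremental-update agent with a perseveration trace, not the authors' fitted model: the class name `IncrementalRLAgent`, the softmax choice rule, the reward scaling to [0, 1], and all parameter values (`alpha`, `alpha_c`, `beta`, `kappa`) are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class IncrementalRLAgent:
    """Hypothetical sketch: reward module (Q-values) + action module (trace)."""

    def __init__(self, n_actions=4, alpha=0.3, alpha_c=0.5, beta=5.0, kappa=1.0):
        self.alpha = alpha      # learning rate of the reward module (assumed)
        self.alpha_c = alpha_c  # update rate of the choice trace (assumed)
        self.beta = beta        # weight of Q-values in the choice rule (assumed)
        self.kappa = kappa      # weight of the perseveration trace (assumed)
        self.q = [0.5] * n_actions  # rewards assumed rescaled to [0, 1]
        self.c = [0.0] * n_actions  # choice trace, one entry per action

    def choice_probs(self):
        # Both modules feed one softmax: value plus tendency to repeat.
        logits = [self.beta * q + self.kappa * c for q, c in zip(self.q, self.c)]
        return softmax(logits)

    def update(self, action, reward):
        # Reward module: delta-rule update of the chosen action's Q-value.
        self.q[action] += self.alpha * (reward - self.q[action])
        # Action module: trace moves toward 1 for the chosen action, 0 otherwise.
        for a in range(len(self.c)):
            target = 1.0 if a == action else 0.0
            self.c[a] += self.alpha_c * (target - self.c[a])


if __name__ == "__main__":
    agent = IncrementalRLAgent()
    agent.update(action=2, reward=1.0)
    print([round(p, 3) for p in agent.choice_probs()])
```

Everything the agent knows about the past is held in `q` and `c`, each updated by a fixed step per trial. That restriction is the kind of limited expressivity the paragraph above attributes to incremental-update models.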
The study shows that traditional RL models, which rely solely on incrementally updated decision variables, fall short in predicting human choices compared with a novel model that incorporates memory-sensitive decision-making. This new model distinguishes between decision variables, which drive choices, and memory variables, which modulate how those decision variables are updated in response to past rewards. Unlike standard RL models, in which decision and learning variables are intertwined, this approach separates them, giving a clearer picture of how learning influences choices. The model suggests that human learning is shaped by compressed memories of task history that reflect both short- and long-term reward and action histories, and that these memories modulate learning regardless of how they are implemented.
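The separation between decision variables and memory variables can be illustrated with a toy example. In the study the memory is learned by a recurrent network; the hand-built stand-in below instead uses two exponential averages of reward (a short and a long timescale) and lets their mismatch set the learning rate. The class name `MemoryModulatedLearner`, the modulation rule, and all parameters (`fast`, `slow`, `bias`, `gain`) are hypothetical and serve only to show the idea.

```python
import math


class MemoryModulatedLearner:
    """Toy stand-in: memory variables gate how decision variables update."""

    def __init__(self, n_actions=4, fast=0.5, slow=0.05, bias=-1.0, gain=4.0):
        self.q = [0.5] * n_actions  # decision variables (drive choices)
        self.short_mem = 0.5        # memory variable: recent reward average
        self.long_mem = 0.5         # memory variable: long-run reward average
        self.fast = fast            # update rate of short-term memory (assumed)
        self.slow = slow            # update rate of long-term memory (assumed)
        self.bias = bias            # baseline of the modulation (assumed)
        self.gain = gain            # sensitivity to memory mismatch (assumed)

    def learning_rate(self):
        # Memory does not enter the choice directly; it only sets how strongly
        # the next reward will change the decision variables.
        mismatch = abs(self.short_mem - self.long_mem)
        return 1.0 / (1.0 + math.exp(-(self.bias + self.gain * mismatch)))

    def update(self, action, reward):
        lr = self.learning_rate()
        # Decision variable update, scaled by the memory-dependent rate.
        self.q[action] += lr * (reward - self.q[action])
        # Memory variables track reward history on two timescales.
        self.short_mem += self.fast * (reward - self.short_mem)
        self.long_mem += self.slow * (reward - self.long_mem)
        return lr


if __name__ == "__main__":
    learner = MemoryModulatedLearner()
    for r in [1.0, 1.0, 0.0, 0.0]:
        print(round(learner.update(action=0, reward=r), 3))
```

The point of the sketch is structural: the same reward can produce a larger or smaller change in `q` depending on the state of memory, which a fixed-learning-rate model cannot express.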
Memory-ANN, the proposed modular cognitive architecture, separates reward-based learning from action-based learning, a division supported by evidence from computational models and neuroscience. The architecture comprises a “surface” level of decision rules that process observable data and a “deep” level that handles complex, context-rich representations. This dual-level system allows for flexible, context-driven decision-making, suggesting that human reward learning involves both simple surface-level processes and deeper memory-based mechanisms. These findings are consistent with the view that models with rich representations are needed to capture the full spectrum of human behavior, particularly in learning tasks. The insights could extend to other learning tasks and to cognitive science more broadly.
The post Unraveling Human Reward Learning: A Hybrid Approach Combining Reinforcement Learning with Advanced Memory Architectures appeared first on MarkTechPost.