The following explanation has been generated automatically by AI and may contain errors.
The provided computational neuroscience code is designed to simulate and analyze reinforcement learning (RL) mechanisms, specifically focusing on striatal dopamine dynamics and their role in flexible learning and decision-making. Here is a breakdown of the biological basis the code is attempting to model:
### Biological Basis
1. **Reinforcement Learning (RL):**
- The code models two RL algorithms: Q-learning and SARSA. Both learn action values from trial-and-error interaction with a rewarding environment and are inspired by the way animals, including humans, learn from rewards and punishments. The two differ only in how they bootstrap the value of the next step, as sketched below.
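The following Python sketch (the repository may use a different language and different variable names; `alpha` and `gamma` are illustrative defaults) shows the standard form of the two update rules:

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Q-learning (off-policy): bootstrap from the best available next action."""
    td_error = r + gamma * max(Q[s_next].values()) - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """SARSA (on-policy): bootstrap from the action actually taken next."""
    td_error = r + gamma * Q[s_next][a_next] - Q[s][a]
    Q[s][a] += alpha * td_error
    return td_error
```

Here `Q` is a dict of dicts mapping each state to its per-action values.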
2. **Cortico-Basal Ganglia Circuits:**
- The code likely targets the neural circuits linking the cortex and basal ganglia, which are critical for action selection and decision-making. The basal ganglia, and in particular the striatum, are known for their role in reward processing and action selection.
3. **Dopamine Ramping:**
- Dopamine is a neuromodulator believed to signal reward prediction errors (a core concept in RL), enabling the organism to learn which actions lead to positive outcomes. "Dopamine ramping" refers to a gradual increase in dopamine levels as a predicted reward approaches, hypothesized to help sustain motivation for delayed rewards and to track the approach of predicted future rewards (see the schematic below).
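As a schematic illustration only (not the model's actual dynamics): with temporal discounting, the learned value of successive states rises as the rewarded state gets closer, and under a value- or TD-error-based reading this rising profile is what would appear as a ramp. The reward size `R`, discount factor `gamma`, and horizon `T` below are arbitrary:

```python
gamma, R, T = 0.8, 1.0, 8                    # discount factor, reward size, steps to reward
values = [gamma ** (T - t) * R for t in range(T + 1)]
print([round(v, 3) for v in values])         # rises monotonically toward R as the reward approaches
```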
4. **Temporal Difference (TD) Errors:**
- The TD errors computed in the model (and analyzed in the code) relate closely to phasic dopamine neuron activity, which correlates with discrepancies between expected and received rewards. This is a critical teaching signal for learning in the cortico-basal ganglia circuits; its standard form is given below.
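In its standard single-step form, the TD error at time $t$ is

$$\delta_t = r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t),$$

where $r_{t+1}$ is the obtained reward, $\gamma$ the discount factor, and $V$ the learned value function. In Q-learning and SARSA, $V(s_{t+1})$ is replaced by $\max_a Q(s_{t+1}, a)$ or $Q(s_{t+1}, a_{t+1})$, respectively.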
5. **Free-Choice vs. Forced-Choice Paradigms:**
- The code distinguishes between free-choice and forced-choice trials, corresponding to experimental paradigms in which the subject either selects among options or has a single option imposed. This allows exploration of how flexible decision strategies are employed under varying circumstances (see the sketch below).
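A common way to simulate this distinction (the repository may implement it differently) is to sample free-choice actions from a softmax over the current action values, while forced-choice trials simply impose the available action; `beta` is an illustrative inverse-temperature parameter:

```python
import numpy as np

def choose_action(q_values, forced_action=None, beta=3.0, rng=None):
    """Free choice: softmax over action values; forced choice: return the imposed action."""
    if forced_action is not None:
        return forced_action
    if rng is None:
        rng = np.random.default_rng()
    q = np.asarray(q_values, dtype=float)
    p = np.exp(beta * (q - q.max()))   # subtract the max for numerical stability
    p /= p.sum()
    return rng.choice(len(q), p=p)
```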
6. **Reward Contingencies:**
- The `rew_S8` and `rew_S9` variables set different reward conditions in the simulation. This setup mirrors experiments in which reward contingencies are varied to study learning dynamics in animal models (an illustrative sketch follows).
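The precise meaning of `rew_S8` and `rew_S9` is defined in the repository itself; as a purely hypothetical sketch, such variables could parameterize the reward magnitude delivered in two different states:

```python
# Hypothetical illustration only: which states rew_S8 / rew_S9 refer to,
# and their actual values, are set in the repository.
rew_S8, rew_S9 = 1.0, 0.5
reward_by_state = {"S8": rew_S8, "S9": rew_S9}

def reward(state):
    """Return the reward delivered on reaching a given state (0 elsewhere)."""
    return reward_by_state.get(state, 0.0)
```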
7. **Forgetting and Value Decay:**
- The RL model includes a decay parameter, which may represent biological forgetting or the diminishing strength of previously learned associations. Such mechanisms are thought to be important for adaptive learning in dynamic environments (see the sketch below).
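A minimal sketch of such a decay step, assuming a single decay rate applied to every stored action value on each time step (the repository's parameterization may differ):

```python
def decay_values(Q, decay_rate=0.01):
    """Shrink every stored action value toward zero, modeling gradual forgetting.
    The parameter name and default rate are illustrative, not the repository's."""
    for s in Q:
        for a in Q[s]:
            Q[s][a] *= (1.0 - decay_rate)
```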
8. **Simulated Trials and Decision Outcomes:**
- Simulated trials and the resulting decision records (`Choices`) mimic laboratory experiments in which animals undergo repeated trials so that learning patterns can be studied. The choices generated by the RL algorithms can then be compared with behavior observed in such experiments (a toy end-to-end example follows).
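As a self-contained toy example (a two-option task, not the repository's actual task structure or parameter values), the ingredients above can be combined into a trial loop whose choice and TD-error records can then be analyzed:

```python
import numpy as np

def simulate_trials(n_trials=200, alpha=0.5, beta=3.0, decay=0.01,
                    rewards=(1.0, 0.5), seed=0):
    """Toy two-option task with softmax choice, value learning, and forgetting."""
    rng = np.random.default_rng(seed)
    Q = np.zeros(2)                        # action values for the two options
    choices, td_errors = [], []
    for _ in range(n_trials):
        p = np.exp(beta * (Q - Q.max()))   # softmax (free) choice
        p /= p.sum()
        a = rng.choice(2, p=p)
        r = rewards[a]                     # deterministic reward for this option
        delta = r - Q[a]                   # single-step prediction (TD) error
        Q[a] += alpha * delta
        Q *= (1.0 - decay)                 # forgetting / value decay
        choices.append(a)
        td_errors.append(delta)
    return np.array(choices), np.array(td_errors)

choices, td_errors = simulate_trials()
print("fraction of trials choosing the larger-reward option:", (choices == 0).mean())
```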
### Connection to Experimental Studies
- The code uses parameters from Morita and Kato (2014), a modeling study proposing that striatal dopamine ramping can arise from temporal-difference learning with value decay (forgetting) in the cortico-basal ganglia circuits, and examining how such signals could support flexible learning and decision-making.
In summary, the code models the biological processes involved in reinforcement learning, emphasizing dopamine's role in reward prediction and learning adaptability within the cortico-basal ganglia circuits. It simulates experimental paradigms that help elucidate how these neural systems might implement RL principles observed in behavior.