The ability to rapidly and flexibly adapt decisions to available rewards is crucial for survival in dynamic environments. Reward-based decisions are guided by reward expectations that are updated based on prediction errors, and processing of these errors involves dopaminergic neuromodulation in the striatum. To test the hypothesis that the COMT gene Val158Met polymorphism leads to interindividual differences in reward-based learning, we exploited the neuromodulatory role of dopamine in signaling prediction errors. We show a behavioral advantage for the phylogenetically ancestral Val/Val genotype in an instrumental reversal learning task that requires rapid and flexible adaptation of decisions to changing reward contingencies in a dynamic environment. Implementing a reinforcement learning model with a dynamic learning rate to estimate the prediction error and the learning rate for each trial, we discovered that a higher and more flexible learning rate underlies the advantage of the Val/Val genotype. Model-based fMRI analysis revealed that greater and more differentiated striatal fMRI responses to prediction errors reflect this advantage on the neurobiological level. Learning rate-dependent changes in effective connectivity between the striatum and prefrontal cortex were greater in the Val/Val than in the Met/Met genotype, suggesting that the advantage results from a downstream effect of the prefrontal cortex that is presumably mediated by differences in dopamine metabolism. These results show a critical role of dopamine in processing the weight a particular prediction error has on the expectation updating for the next decision, thereby providing important insights into neurobiological mechanisms underlying the ability to rapidly and flexibly adapt decisions to changing reward contingencies.
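To make the idea of a trial-wise prediction error and a dynamic learning rate concrete, the following is a minimal sketch. It is not necessarily the model fitted in this study: it assumes a Pearce-Hall-style rule in which the learning rate drifts toward the magnitude of recent prediction errors, and the function name `run_dynamic_lr` and the parameters `alpha0`, `eta`, and `q0` are hypothetical names chosen for illustration. Rewards are assumed to be coded as 0 or 1.

```python
def run_dynamic_lr(choices, rewards, alpha0=0.3, eta=0.2, q0=0.5, n_options=2):
    """Return per-trial prediction errors, per-trial learning rates, and final expectations.

    Assumed update rule (Pearce-Hall-style, illustrative only):
        delta_t     = r_t - Q_t(chosen)
        Q_{t+1}     = Q_t(chosen) + alpha_t * delta_t
        alpha_{t+1} = eta * |delta_t| + (1 - eta) * alpha_t
    """
    q = [q0] * n_options      # reward expectation for each option
    alpha = alpha0            # learning rate applied on the current trial
    deltas, alphas = [], []
    for choice, reward in zip(choices, rewards):
        delta = reward - q[choice]          # prediction error on this trial
        deltas.append(delta)
        alphas.append(alpha)                # learning rate used on this trial
        q[choice] += alpha * delta          # update only the chosen option
        # Large surprises raise the learning rate, small ones let it decay.
        alpha = eta * abs(delta) + (1 - eta) * alpha
    return deltas, alphas, q


if __name__ == "__main__":
    # Option 0 is rewarded twice, then the contingency reverses.
    deltas, alphas, q = run_dynamic_lr([0, 0, 0, 0], [1, 1, 0, 0])
    for t, (d, a) in enumerate(zip(deltas, alphas), start=1):
        print(f"trial {t}: prediction error = {d:+.3f}, learning rate = {a:.3f}")
```

Under these assumptions, an unexpected omission of reward after a reversal produces a large negative prediction error, which increases the learning rate on the following trial. This is one simple way in which a "more flexible" learning rate could speed up adaptation to changed contingencies.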