Reinforcement Learning algorithms
Model-based
Markov Decision Process (MDP)
Dynamic Programming (DP)
Bellman Expectation and Optimality Equations
Policy Iteration and Value Iteration
Model-free
Q-Learning
Temporal Difference, TD(λ)
Sarsa(λ)
Off-policy learning
Monte Carlo
Value Function Approximation
Policy Gradient Methods
Exploration & Exploitation
ε-Greedy
Multi-armed bandits
Contextual bandits