Reinforcement Learning algorithms

Model-based

Markov Decision Process (MDP)

Dynamic Programming (DP)

Bellman Expectation and Optimality Equations

Policy Iteration and Value Iteration

Model-free

Q-Learning

Temporal Difference, TD(λ)

Sarsa(λ)

Off-policy learning

Monte Carlo

Value Function Approximation

Policy Gradient Methods

Exploration & Exploitation

ε-Greedy

Multi-armed bandits

Contextual bandits