Deep Q-Networks: Foundations
From Q-tables to neural networks: scaling reinforcement learning to complex environments.
Why Deep Q-Networks?
Remember Q-learning? We learned optimal action-values $Q^*(s, a)$ by storing them in a table. But what happens when your state space is continuous, or astronomically large?
The CartPole Problem:
- 4 continuous observations (cart position, velocity, pole angle, angular velocity)
- Infinite possible states
- A Q-table would need infinite entries
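To make this concrete, here is a minimal sketch (assuming the Gymnasium `CartPole-v1` environment) that prints the observation and action spaces; the four-dimensional continuous observation is exactly why a Q-table cannot enumerate the states.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

# Box with 4 continuous dimensions: cart position, cart velocity,
# pole angle, pole angular velocity
print(env.observation_space)

# Discrete(2): push the cart left or right
print(env.action_space)

obs, info = env.reset(seed=0)
print(obs)  # an array of 4 floats -- one point in a continuous state space
```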
The Solution: Replace the table with a neural network that learns the Q-function:
$$s \to \left( Q^*(s, a_1), Q^*(s, a_2), \ldots, Q^*(s, a_n) \right)$$
The network takes a state as input and outputs Q-values for all possible actions.
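A minimal sketch of such a Q-network in PyTorch is shown below; the two hidden layers and their width of 128 are assumptions chosen for illustration, not values prescribed by the text.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a state vector to one Q-value per action."""

    def __init__(self, state_dim: int = 4, n_actions: int = 2, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),  # one output per possible action
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim) -> Q-values: (batch, n_actions)
        return self.net(state)

# Example: a single CartPole state maps to two Q-values (left, right).
q_net = QNetwork()
state = torch.zeros(1, 4)
print(q_net(state))  # tensor of shape (1, 2)
```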
The Bellman Target Problem
In tabular Q-learning, we updated:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
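For comparison, a minimal sketch of this tabular update, assuming a small discrete environment where Q is stored as a dict keyed by (state, action); the action set and hyperparameter values are placeholders for illustration.

```python
from collections import defaultdict

Q = defaultdict(float)        # Q-table, entries default to 0.0
alpha, gamma = 0.1, 0.99      # learning rate and discount (assumed values)
actions = [0, 1]              # hypothetical action set

def q_update(s, a, r, s_next):
    # TD target: reward plus discounted value of the best next action
    # (terminal-state handling omitted for brevity)
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

q_update(s=0, a=1, r=1.0, s_next=2)
```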
With a neural network, we want to minimize the temporal difference (TD) error:
$$L(\theta) = \mathbb{E} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta) - Q(s, a; \theta) \right)^2 \right]$$
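A minimal sketch of this loss for a batch of transitions, assuming the `QNetwork` sketch above and tensors of states, actions, rewards, next states, and done flags; detaching the target reflects treating $r + \gamma \max_{a'} Q(s', a')$ as a fixed regression target.

```python
import torch
import torch.nn.functional as F

def td_loss(q_net, states, actions, rewards, next_states, dones, gamma=0.99):
    # Q(s, a; theta) for the actions actually taken
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # max_a' Q(s', a'), zeroed on terminal transitions
        next_q = q_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q

    # mean squared TD error over the batch
    return F.mse_loss(q_values, targets)
```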