RL LUNAR LANDER
Watch Actor-Critic AI learn to land. DQN → A2C → PPO.
ALGO: DQN | EPISODE: 0 | LAND %: 0 | ε: 1.00 | LEVEL: EASY
CRITIC: V(s) = 0.0 (gauge from BAD to GOOD)
ACTOR: π(a|s) over NOOP, MAIN, LEFT, RIGHT, ROT L, ROT R
Deep Q-Network (DQN)
The same algorithm that learned Pong. A neural network (8→48→32→6) estimates a Q-value for each of the 6 actions. It uses Double DQN with a replay buffer and a target network for stability. 2,198 parameters, counting weights and biases. The replay buffer makes DQN the most consistent learner of the three at this scale.
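To make the moving parts concrete, here is a minimal sketch of the three pieces named above: a replay buffer, the Double DQN target, and ε-greedy action selection. It is an illustration under stated assumptions, not this page's actual code. The names `ReplayBuffer`, `double_dqn_target`, and `epsilon_greedy` are hypothetical, the discount factor of 0.99 is assumed (the page does not state one), and reading ε as an ε-greedy exploration rate is also an assumption. Q-values are passed in as plain lists, so no network library is needed.

```python
import random
from collections import deque

GAMMA = 0.99  # assumed discount factor; not stated on the page


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A bounded deque drops the oldest transition once capacity is reached.
        self.items = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.items.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform sampling breaks the correlation between consecutive steps.
        return rng.sample(list(self.items), batch_size)

    def __len__(self):
        return len(self.items)


def argmax(values):
    """Index of the largest entry (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=GAMMA):
    """Double DQN target: the online net picks the action, the target net scores it."""
    if done:
        return reward  # no bootstrapping past a terminal state
    best_action = argmax(q_online_next)
    return reward + gamma * q_target_next[best_action]


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return argmax(q_values)


# Made-up Q-values for the six actions in the next state.
q_online_next = [0.1, 0.9, 0.3, 0.0, 0.2, 0.4]
q_target_next = [0.5, 0.2, 0.8, 0.1, 0.0, 0.3]
target = double_dqn_target(1.0, False, q_online_next, q_target_next)
```

In this made-up example the online network prefers action 1, so the target uses the target network's value for that action (0.2) rather than the target network's own maximum (0.8). Decoupling selection from evaluation this way is what distinguishes Double DQN from plain DQN and is meant to reduce overestimated Q-values.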