RL LUNAR LANDER

Watch Actor-Critic AI learn to land. DQN → A2C → PPO.

CRITIC: V(s), shown on a BAD-to-GOOD scale (initially 0.0)

ACTOR: π(a|s) over six actions: NOOP, MAIN, LEFT, RIGHT, ROT L, ROT R

Deep Q-Network (DQN)

The same algorithm that learned to play Pong. A neural network (8→64→48→6) maps the 8 state inputs to a Q-value for each of the 6 actions. It uses Double DQN with a replay buffer and a target network for stability. 2,198 parameters. Of the three algorithms shown here (DQN, A2C, PPO), DQN is the most consistent learner at this scale, largely because of its replay buffer.
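
The pieces named above can be sketched in a few lines of plain Python. This is a minimal illustration, not the demo's actual code: the names `mlp_param_count`, `double_dqn_target`, and `ReplayBuffer`, the discount `gamma=0.99`, and the buffer capacity are all assumptions. The parameter count assumes fully connected layers with biases; under that assumption 8→64→48→6 comes to 3,990 parameters, while 2,198 is what 8→48→32→6 would give, so one of the two figures quoted above may be out of date.

```python
import random
from collections import deque


def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network with these layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target for one transition.

    The online network chooses the next action (argmax), and the target
    network supplies that action's value. gamma=0.99 is an assumed default.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]


class ReplayBuffer:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop off first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


print(mlp_param_count([8, 64, 48, 6]))   # 3990
print(mlp_param_count([8, 48, 32, 6]))   # 2198

# Online net prefers action 1; the target net's value for action 1 is used,
# even though the target net itself rates action 2 higher.
print(double_dqn_target(1.0, False, [0.1, 0.9, 0.3], [5.0, 2.0, 7.0], gamma=0.5))  # 2.0
```

Decoupling action selection (online network) from action evaluation (target network) is what distinguishes Double DQN from plain DQN, which would take the maximum of the target network's values directly and tends to overestimate. Sampling uniformly from the buffer breaks the correlation between consecutive frames.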