< BLOG.FEED />
Writing
// dispatches from the intersection of math, ML, and madness
notes, reflections, and the occasional deep dive
The Multi-GPU Training Gauntlet: Threads, Processes, and the GIL
Four attempts to parallelize PyTorch training across 8 GPUs — from threading deadlocks to GIL starvation to torch.compile explosions. A war story about why OS process isolation beats every clever concurrency trick.
Landing the Actor-Critic: DQN, A2C & PPO in the Browser
How we built a Lunar Lander with DQN, A2C, and PPO in pure JavaScript — ~4,200 parameters, zero GPUs, and a live actor-critic visualization that makes the GAN-RL connection click.
Teaching Pong to Play Itself: 3 RL Algorithms in the Browser
How we trained Q-Learning, DQN, and REINFORCE to play Pong in pure JavaScript -- 579 parameters, zero GPUs, and the reward shaping tricks the textbooks leave out. A build log of what worked, what didn't, and what we changed.
When GANs Meet RL: The Adversarial Game Behind Generative AI
GANs and RL are secretly the same game. A mathematical deep dive into how generators are actors, discriminators are critics, and RLHF completes the picture -- through game theory, f-divergences, and the Boltzmann connection.
Score Matching Explained: From Theory to Diffusion Models
A deep dive into score matching and DDPM -- the two intertwined threads behind modern diffusion models. From score functions to reverse-time SDEs to classifier-free guidance, with all the math you need.