Archive
Writing
Notes on agentic RL, video AI, and the path from research to production.
A reflection on Hesse's Siddhartha. Free will, awareness, and why Nomadland felt like coming home.
Most RL research treats the agent as a policy optimizing a fixed reward. Agentic RL is different: the agent reasons, plans, uses tools, and retries.
Video generation models need evaluation beyond FID/FVD. Reward design, credit assignment, training instability, and the data flywheel cold start.
The gap between “generates text well” and “evaluates video well” is enormous. From CLIP scores to learned reward models calibrated on expert data.
Comparative analysis of how users engage with ChatGPT and Claude, based on the latest AI research.
How I chose DS/ML over SWE, how I prepared for interviews, and reflections on working at Microsoft.