
Real-time online temporal-difference learning · live, in your browser

predicting your next click

Click anywhere in the grid, in any order; then settle into a rhythm and repeat it. A small horde of SwiftTD predictors, one per cell, learns whatever routine emerges from a continuous online stream of clicks. The RL agent tries to predict where you'll click next.

SwiftTD is a fast, robust algorithm for temporal-difference learning (Javed, Sharifnassab & Sutton, 2024) [1]. It learns predictions online from a single stream of data, with no replay buffer, by adapting a per-feature step size and bounding each update, so it learns quickly without diverging.

[1] Javed, K., Sharifnassab, A., & Sutton, R. S. (2024). SwiftTD: A fast and robust algorithm for temporal difference learning. In Reinforcement Learning Conference.

Nothing is hand-coded. No replay buffers. Online. Real-time.


What you're watching

Each tick (~30 Hz) the demo builds one sparse, binary feature vector from your cursor — tile-coded position plus a one-hot for the target you last clicked — and feeds it to every predictor at once.
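
As a rough illustration of that feature vector, here is a minimal sketch of tile coding a normalised cursor position and appending a one-hot block for the last clicked target. The function name, the number of tilings, the tiles per dimension, and the 16-cell grid are all assumptions made for the example; the demo's actual sizes are not stated on this page.

```python
def active_features(x, y, last_target, n_tilings=4, tiles_per_dim=8, n_targets=16):
    """Return the indices of the active (value-1) binary features.

    x, y are cursor coordinates normalised to [0, 1); last_target is the
    index of the most recently clicked cell, or None before the first click.
    All sizes here are hypothetical defaults, not the demo's real settings.
    """
    active = []
    tiles_per_tiling = tiles_per_dim * tiles_per_dim
    for t in range(n_tilings):
        # Shift each tiling by a fraction of one tile width.
        offset = t / (n_tilings * tiles_per_dim)
        col = min(int((x + offset) * tiles_per_dim), tiles_per_dim - 1)
        row = min(int((y + offset) * tiles_per_dim), tiles_per_dim - 1)
        active.append(t * tiles_per_tiling + row * tiles_per_dim + col)
    if last_target is not None:
        # The one-hot block for the last clicked target follows all tilings.
        active.append(n_tilings * tiles_per_tiling + last_target)
    return active
```

Because the vector is sparse and binary, it can be passed around as a short list of active indices (one per tiling, plus one for the last target) rather than as a dense array.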

Each cell owns a General Value Function: a SwiftTD learner whose reward is 1 only on the tick you click that cell. Its value therefore answers “how soon will I be clicked?”: the higher the value, the sooner the predicted click. The cells' values, rendered together as tints, form the intent field; the brightest cell (excluding the one you just clicked) is the pre-lit next-click guess.
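
The per-cell reward and the next-click guess can be sketched in a few lines. This is only an illustration under assumptions: the helper names are hypothetical, and each cell's value is taken to be linear in the binary features, so it reduces to a sum of the weights at the active indices.

```python
def cumulants(n_cells, clicked_cell):
    """Per-cell reward for this tick: 1.0 for the clicked cell, else 0.0.

    clicked_cell is None on ticks with no click.
    """
    return [1.0 if c == clicked_cell else 0.0 for c in range(n_cells)]


def cell_value(weights, active):
    """Linear value: with binary features, the sum of the active weights."""
    return sum(weights[i] for i in active)


def next_click_guess(values, last_clicked):
    """Index of the brightest cell, skipping the one just clicked."""
    candidates = [c for c in range(len(values)) if c != last_clicked]
    return max(candidates, key=lambda c: values[c])
```

For example, `next_click_guess([0.9, 0.5, 0.7], last_clicked=0)` skips cell 0 even though it has the largest value and picks cell 2.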

SwiftTD adapts a per-feature step size online and bounds each update so it learns fast without diverging — which is why a few repeats of your rhythm are enough for it to light up your next click before you make it.
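
To make the two ideas in that sentence concrete, the sketch below uses a deliberately simplified stand-in, not SwiftTD itself: plain TD(0) with an IDBD-style meta-gradient on each feature's log step size, plus a cap `eta` on the sum of the step sizes of the active features. The class name, the discount `gamma`, and every hyperparameter value are assumptions chosen for illustration; see Javed et al. (2024) for the actual algorithm.

```python
import math


class BoundedStepTD:
    """TD(0) with per-feature step sizes; a simplified stand-in, not SwiftTD."""

    def __init__(self, n_features, gamma=0.9, init_alpha=0.01, meta=0.01, eta=0.5):
        self.w = [0.0] * n_features                      # value weights
        self.beta = [math.log(init_alpha)] * n_features  # log step sizes
        self.h = [0.0] * n_features                      # trace of recent weight updates
        self.gamma, self.meta, self.eta = gamma, meta, eta

    def value(self, active):
        return sum(self.w[i] for i in active)

    def update(self, active, reward, next_active):
        """One online update; features are lists of unique active indices."""
        delta = reward + self.gamma * self.value(next_active) - self.value(active)
        for i in active:
            # Meta-gradient step on the log step size (IDBD-style).
            self.beta[i] += self.meta * delta * self.h[i]
        alphas = {i: math.exp(self.beta[i]) for i in active}
        total = sum(alphas.values())
        # Bound the update: active step sizes may sum to at most eta.
        scale = min(1.0, self.eta / total) if total > 0 else 1.0
        for i in active:
            a = alphas[i] * scale
            self.beta[i] = math.log(a)  # store the bounded step size
            self.w[i] += a * delta
            self.h[i] = self.h[i] * max(0.0, 1.0 - a) + a * delta
        return delta
```

On a toy three-state loop that is rewarded only when leaving the last state, the learned value is highest for the state closest to the reward, mirroring the "how soon will I be clicked?" reading of each cell's value:

```python
td = BoundedStepTD(n_features=3)
for _ in range(20000):
    td.update([0], 0.0, [1])
    td.update([1], 0.0, [2])
    td.update([2], 1.0, [0])
print(td.value([0]), td.value([1]), td.value([2]))
```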