Real-time online temporal-difference learning · live, in your browser

predicting your next click

Click anywhere in the grid, in any order; then settle into a rhythm and repeat it. A small horde of SwiftTD predictors, one per cell, learns whatever routine emerges from a continuous online stream of clicks. The RL agent tries to predict where you'll click next.

Nothing is hand-coded. No replay buffers. Online. Real-time.

● click anywhere · find a rhythm and repeat it

Anticipation

0.00 v(next)

predicted intent of your next click

Value trace · prediction · reward

Next-click accuracy —

Clicks: 0

Streak: 0

Step-size: 0.000

Settings

Meta-step θ: 1e-3 · Trace λ: 0.90

What you're watching

Each tick (~30 Hz) the demo builds one sparse, binary feature vector from your cursor — tile-coded position plus a one-hot for the target you last clicked — and feeds it to every predictor at once.
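As a rough illustration of the feature vector described above, here is a minimal Python sketch that returns the indices of the active (value 1) features for one tick. The grid size, number of tilings, tile resolution, and the function name `active_features` are all assumptions made for the example; the demo's actual values and code are not given here.

```python
from typing import List, Optional

NUM_TILINGS = 4      # assumed; the demo's real value is not stated
TILES_PER_DIM = 8    # assumed resolution of each tiling
NUM_CELLS = 9        # assumed 3x3 grid of click targets

# Each tiling is shifted by a fraction of a tile, so it needs one extra
# row and column to cover the whole unit square.
TILES_PER_TILING = (TILES_PER_DIM + 1) ** 2
NUM_FEATURES = NUM_TILINGS * TILES_PER_TILING + NUM_CELLS


def active_features(x: float, y: float, last_clicked: Optional[int]) -> List[int]:
    """Return the indices of the features equal to 1; all others are 0.

    x and y are cursor coordinates normalised to [0, 1].
    last_clicked is the index of the most recently clicked cell, or None.
    """
    active = []
    for t in range(NUM_TILINGS):
        offset = t / NUM_TILINGS  # shift this tiling by a fraction of a tile
        col = int(x * TILES_PER_DIM + offset)
        row = int(y * TILES_PER_DIM + offset)
        active.append(t * TILES_PER_TILING + row * (TILES_PER_DIM + 1) + col)
    if last_clicked is not None:
        # One-hot block for the last clicked target sits after the tile block.
        active.append(NUM_TILINGS * TILES_PER_TILING + last_clicked)
    return active
```

Storing only the active indices keeps each tick cheap: with these assumed sizes, at most five of several hundred features are non-zero.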

Each cell owns a General Value Function: a SwiftTD learner whose reward is 1 only on the tick you click that cell, and 0 otherwise. Its value therefore answers “how soon will I be clicked?” Tinting every cell by its value produces the intent field; the brightest cell (excluding the one you just clicked) is the pre-lit next-click guess.
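The per-cell reward and the next-click guess can be sketched in a few lines of Python. The helper names `cell_rewards` and `next_click_guess` are hypothetical, chosen for this example rather than taken from the demo.

```python
from typing import List, Optional


def cell_rewards(num_cells: int, clicked: Optional[int]) -> List[float]:
    """Reward for every cell on one tick: 1.0 for the clicked cell, else 0.0.

    clicked is None on ticks with no click, so every reward is 0.0.
    """
    return [1.0 if i == clicked else 0.0 for i in range(num_cells)]


def next_click_guess(values: List[float], last_clicked: Optional[int]) -> int:
    """Index of the brightest cell, skipping the one that was just clicked."""
    candidates = [i for i in range(len(values)) if i != last_clicked]
    return max(candidates, key=lambda i: values[i])
```

For example, with cell values `[0.9, 0.2, 0.5]` and cell 0 just clicked, the guess is cell 2: cell 0 has the highest value but is excluded.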

SwiftTD adapts a per-feature step size online and bounds each update so it learns fast without diverging — which is why a few repeats of your rhythm are enough for it to light up your next click before you make it.
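To make the two ideas in that paragraph concrete, the sketch below shows a simplified stand-in, not SwiftTD itself: linear TD(λ) over sparse binary features with replacing traces, an IDBD-style meta-update of per-feature log step sizes, and a cap on the total step size of the active features. The class name `BoundedTD`, the discount `gamma`, the initial step size, and the cap `max_total_alpha` are assumptions for illustration; `lam=0.90` and `theta=1e-3` mirror the defaults shown under Settings.

```python
import math
from typing import List, Optional


class BoundedTD:
    """Linear TD(lambda) with per-feature step sizes (a stand-in, not SwiftTD)."""

    def __init__(self, num_features: int, gamma: float = 0.9, lam: float = 0.90,
                 theta: float = 1e-3, init_alpha: float = 1e-2,
                 max_total_alpha: float = 0.5):
        self.gamma, self.lam, self.theta = gamma, lam, theta
        self.max_total_alpha = max_total_alpha       # assumed cap on summed step sizes
        self.w = [0.0] * num_features                # weights
        self.z = [0.0] * num_features                # eligibility traces
        self.h = [0.0] * num_features                # IDBD-style memory of past updates
        self.beta = [math.log(init_alpha)] * num_features  # log step sizes
        self.prev_active: Optional[List[int]] = None

    def value(self, active: List[int]) -> float:
        return sum(self.w[i] for i in active)

    def step(self, active: List[int], reward: float) -> float:
        """Consume this tick's active features and reward; return the new prediction."""
        if self.prev_active is not None:
            prev = self.prev_active
            prev_set = set(prev)
            delta = reward + self.gamma * self.value(active) - self.value(prev)
            # Meta-step: nudge the log step sizes of the features just used.
            for i in prev:
                self.beta[i] += self.theta * delta * self.h[i]
            # Bound: shrink step sizes so their sum over active features stays capped.
            total = sum(math.exp(self.beta[i]) for i in prev)
            if total > self.max_total_alpha:
                shrink = math.log(self.max_total_alpha / total)
                for i in prev:
                    self.beta[i] += shrink
            # Decay all traces, then set traces of active features to 1 (replacing).
            decay = self.gamma * self.lam
            self.z = [decay * zi for zi in self.z]
            for i in prev:
                self.z[i] = 1.0
            # Update weights and the meta-memory for every feature with a live trace.
            for i, zi in enumerate(self.z):
                if zi == 0.0:
                    continue
                alpha = math.exp(self.beta[i])
                x_i = 1.0 if i in prev_set else 0.0
                self.h[i] = self.h[i] * max(0.0, 1.0 - alpha * x_i * zi) + alpha * delta * zi
                self.w[i] += alpha * delta * zi
        self.prev_active = list(active)
        return self.value(active)
```

In a setup like the one described, one such learner per cell would receive the same active-feature list each tick, with its own reward. Because the summed step size over the active features is capped below 1, a single update can move the prediction for the previous tick by at most that fraction of the TD error, which is the intuition behind learning quickly without diverging.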