A TRMNL for VPS alerts, polling or webhook
I'm thinking about moving my infrastructure alerts to a TRMNL e-ink screen. I'm comparing the two possible paths: polling with a JSON endpoint behind Cloudflare Access, or a webhook from the timers I already have.

After the ntfy pushes, the Telegram alerts fired by my homemade HIDS with AIDE, and a command bot that replies over chat, the alerts on my VPS work reasonably well. If something happens, it reaches my phone with the right priority. The problem, if you can even call it a problem, is that everything lives in my pocket. To find out about anything I have to unlock my phone, and to check the overall status I end up opening the app and scrolling through old notifications until I find the latest one. It's a pattern of "I go looking for the information" disguised as "the information comes to me".
I've been looking at the TRMNL for a few weeks. It's a 7.5" e-ink screen with WiFi, a default refresh every fifteen minutes, and a private plugin system that pulls from your own URLs and renders with Liquid templates. The idea of having a quiet little card next to my monitor, with no notifications to ignore and no app to open, fits how I'd like to consume infra status. If the panel looks green at a glance, I don't touch anything. If I see something red, then I'll go take a look.
The ad hoc plugin I'd need to write is trivial at its core. A grid of status lights, a couple of numbers (CrowdSec alerts in 24 hours, disk space, down containers), and a short list of the latest active alerts. The non-trivial part, and why this post exists, is deciding how that information gets to the TRMNL. There are two clean paths, both compatible with how my network is set up today, and they mainly differ in what surface I expose and how much control I have over what shows up on screen.
Path A, polling with a JSON endpoint
The first path is the pattern closest to what TRMNL calls a Private Plugin. The screen asks one of my URLs for JSON every N minutes, and a Liquid template renders it. I set up an endpoint on the VPS (or a Cloudflare Worker in front of it, so I don't expose anything from the origin) that aggregates the status of the check-* scripts I already have running. CrowdSec, AIDE, disk space, Cloudflare token, pending reboot, down containers. Everything that's already feeding the Telegram bot, read from the same place, returned in a payload of a hundred or two hundred bytes.
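A minimal sketch of what that aggregator could look like, assuming each check-* script drops its last result as a one-word state file ("ok", "alert") somewhere like /var/lib/checks. The directory, file names, and JSON fields are placeholders for this post, not my real scripts:

```shell
#!/bin/sh
# Aggregator behind the polling endpoint. Assumes each check-* script
# writes a one-word state file under STATE_DIR; everything here is a
# placeholder to show the shape, not the real setup.
STATE_DIR="${STATE_DIR:-/var/lib/checks}"

read_state() {
  # "unknown" if a check has never written its file
  cat "$STATE_DIR/$1" 2>/dev/null || echo "unknown"
}

build_status_json() {
  # One compact payload, a couple hundred bytes, ready for the Liquid template
  printf '{"crowdsec":"%s","aide":"%s","disk":"%s","cf_token":"%s","reboot":"%s","containers":"%s","updated":"%s"}\n' \
    "$(read_state crowdsec)" \
    "$(read_state aide)" \
    "$(read_state disk)" \
    "$(read_state cf_token)" \
    "$(read_state reboot)" \
    "$(read_state containers)" \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

build_status_json
```

Serving that output from a tiny HTTP handler, or caching it in a Cloudflare Worker, is the only extra plumbing path A needs.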
The advantage of polling is that control over what's on screen lives in the endpoint, not on the device. If I want to change the rules for "what counts as red", I edit the script on the server and the next refresh applies it. The TRMNL only knows how to render what it receives. The Liquid template doesn't need heavy logic, just a handful of conditionals to color the status lights based on the status field of each block.
The cost is that I have to open an endpoint to the Internet, or at least to TRMNL's IPs. Right now my monitoring uses an outbound channel, pushes to ntfy and Telegram, it doesn't receive anything from outside. Adding a new endpoint means one more piece of surface area, even if it's small. The clean way to close off that surface, and conveniently the one I already use for private Dokploy panels, is to put it behind Cloudflare Access with a Service Token. The TRMNL includes the token in every request and CF Access decides whether it gets through before the request touches the origin. Authentication stops being my code and becomes Cloudflare configuration.
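Concretely, a Service Token rides in two request headers that Cloudflare Access validates at the edge. The header names below are Cloudflare's real ones; the endpoint URL and the credential variables are placeholders, and this assumes the TRMNL polling config can attach custom headers, which its private plugins support. The curl equivalent of the screen's request:

```shell
# CF-Access-Client-Id / CF-Access-Client-Secret are Cloudflare Access's
# Service Token headers; the URL and variables are placeholders.
curl -s \
  -H "CF-Access-Client-Id: ${CF_ACCESS_CLIENT_ID}" \
  -H "CF-Access-Client-Secret: ${CF_ACCESS_CLIENT_SECRET}" \
  "https://status.example.com/trmnl.json"
```

If the token is missing or wrong, Access rejects the request before it ever reaches the origin, which is the whole point.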
The other cost of polling is that it comes with built-in latency. If I set the refresh to every ten minutes, in the worst case an alert takes ten minutes to show up on the screen. For the kind of things I want to see there, aggregate status and not live incidents, I don't care. Serious incidents still go to ntfy and Telegram with priority five. But it's worth being clear about that before committing to this path.
Path B, webhook from my current timers
The second path flips the flow around. Instead of TRMNL asking, my check-*.timer units notify TRMNL when something changes. TRMNL supports its own endpoint that you can POST a payload to, and the screen redraws with that information on the next refresh. I don't expose anything. The requests are still outbound from the VPS, just like the ones it already makes to ntfy or the Telegram API.
The appeal of this model is that it matches the rest of my monitoring. Each timer that's already feeding a Telegram alert would also feed the TRMNL panel in the same run. There's no new endpoint, no CF Access to set up, no extra surface to watch. And the latency goes away, because the panel updates at the exact moment the change is detected, not on the next polling tick.
The problem is that I lose aggregation. Each webhook I send replaces whatever was on the screen, so if I want a full panel (CrowdSec, AIDE, disk, Cloudflare token, reboot) I have to orchestrate a payload that combines all the states. That means each timer needs to know the state of the others before sending the webhook, or I set up an "aggregator" script that reads the other scripts and builds the final JSON. In practice it's almost the same as the endpoint in path A, except instead of serving it on demand I push it every time something changes.
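For completeness, a sketch of what that aggregated push could look like. The payload shape, a JSON object under "merge_variables", follows TRMNL's webhook strategy as I understand it from their docs; the plugin UUID, field names, and states are placeholders:

```shell
#!/bin/sh
# Sketch of the "aggregator pushes" variant of path B. Everything named
# here is a placeholder; the merge_variables wrapper is what TRMNL's
# webhook strategy expects, per their docs.
build_webhook_payload() {
  # merge_variables is what the Liquid template reads on the device
  printf '{"merge_variables":{"crowdsec":"%s","disk":"%s","containers":"%s"}}' \
    "$1" "$2" "$3"
}

payload=$(build_webhook_payload ok ok alert)
echo "$payload"

# The actual push, run at the end of each check-* timer:
# curl -s -X POST "https://usetrmnl.com/api/custom_plugins/${PLUGIN_UUID}" \
#   -H "Content-Type: application/json" \
#   -d "$payload"
```

Which is exactly the point made above: the payload-building step is the same aggregation work path A does, just pushed instead of served.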
I also lose the fallback that polling gives you. If the TRMNL misses a webhook update (WiFi down, device asleep outside the interval) the screen stays on the last state it received until the next push. With polling, the next refresh brings it up to date on its own. You can mitigate that by sending a heartbeat webhook every hour, but then you're basically reinventing polling with more steps.
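That heartbeat mitigation would just be one more unit next to the existing check-*.timer files. The unit name is hypothetical; OnCalendar and Persistent are standard systemd timer options:

```
# heartbeat-trmnl.timer (hypothetical name), paired with a
# heartbeat-trmnl.service that re-sends the current panel state
[Unit]
Description=Hourly TRMNL heartbeat push

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
```

Persistent=true makes systemd fire a missed run after a reboot, which matters if the VPS itself was the thing that went down.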
Which way I'm leaning
Path A makes more sense to me. The surface it adds is very small if it sits behind Cloudflare Access, a Service Token and one rule, the same setup I already have for Dokploy or Umami. And in return I get two things I care about. One is that the logic for "what shows up on the panel" lives in one place, not spread across five timers. The other is that the endpoint I write for TRMNL will be useful tomorrow for something else. An internal status page, a widget on the Mac screen, a Telegram bot command that returns the same summary. The webhook closes that door because it ties the format to what TRMNL expects.
There is one case where path B is still the cleanest option, and that's when I only care about one signal and I want to see it in real time. An "incident mode" screen that only turns on when there's an active incident and turns off when it's resolved. For that, the webhook from the timer that detects the incident is the most direct option. But that's not the use case I'm after, mine is the "everything green" panel that confirms the infra is healthy without me having to go look for it.
What signals would go on the screen
Before I decide anything for good, I want to narrow down what goes in. The rule I'm following is the same one I used with the ntfy digests: I only count things that change state, not dashboard metrics. A CPU load graph on e-ink doesn't add much. A status light that says "CrowdSec, twelve alerts in the last 24 hours, within threshold" does.
The first cut would be the same five events my timers already feed. Cloudflare token alive, AIDE with no unexpected changes, disk below 85%, no reboot pending, all containers up. Five rows in black with a green, red or gray dot next to each one. Under that, a small block with the latest active alert if there is one, and the time of the last refresh so I know whether the screen got stuck.
What won't go in, at least in the first version, are metrics that still aren't alerting me about anything. Blog response latency, number of searches in the FTS, traffic by subdomain. That information exists, but putting it into an alerts panel turns the panel into a dashboard, and I already ruled that out back in the day for the same reason I ruled out Grafana. If I'm not going to look at it, it doesn't deserve pixels.
When I build it I'll write a second post with the endpoint, the Liquid template, and whatever details ended up being more awkward than expected. If anyone already has a TRMNL up and running and has gone down either of these paths, what I'd most like to know is what refresh cadence you ended up with, because that's where e-ink usually starts arguing with reality.

Jose, author of the blog
QA Engineer. I write out loud about automation, AI and software architecture. If something here helped you, write to me and tell me about it.