Cron job monitoring: How to know when your scheduled tasks fail
Last updated: March 03, 2026
Cron jobs have a nasty habit of failing silently.
Your backup script stops running. Your daily report never gets sent. Your database cleanup job crashes halfway through. And you don't find out until someone asks "hey, why haven't we had a backup in three weeks?"
The problem is that cron doesn't care if your job succeeds or fails. It just runs the command and moves on. No alerts, no notifications, nothing.
Let's fix that.
Table of contents
- Why cron jobs fail silently
- How cron job monitoring works
- Setting up cron job monitoring
- Common issues with cron jobs
- What to monitor
Why cron jobs fail silently
Cron was designed in the 1970s. Back then, the assumption was that a sysadmin would check the server regularly and notice if something was wrong.
That's not how most of us work today.
When a cron job fails, a few things might happen:
- Nothing - The job crashes, cron shrugs, and nobody knows
- An email gets sent - Cron can email output to root, but who checks that?
- A log entry appears - Somewhere, buried in /var/log/syslog
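For reference, cron's email behavior is controlled by the MAILTO variable, a standard feature of Vixie cron and its descendants. The address below is just a placeholder:

```shell
# Send any output (stdout/stderr) from the jobs below to this
# address instead of the local root mailbox.
MAILTO=ops@example.com
0 2 * * * /home/user/backup.sh
```

Setting MAILTO="" (empty) silences the emails entirely - which is exactly how jobs end up failing with no trace at all.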
None of these are great for catching problems quickly.
The real killer is when a job doesn't run at all. Maybe the server rebooted and cron didn't start. Maybe someone accidentally deleted the crontab. Maybe the disk filled up and cron couldn't write its lockfile.
In these cases, there's nothing to log. The job just... doesn't happen.
How cron job monitoring works
The solution is a "dead man's switch" (also called heartbeat monitoring).
The idea is simple:
- At the end of your cron job, you ping a URL
- A monitoring service tracks these pings
- If a ping doesn't arrive when expected, you get an alert
It's called a dead man's switch because the alert triggers on absence of activity, not presence. If your job stops running for any reason - crash, server down, crontab deleted - you'll know.
Here's what it looks like in practice:
# Before: Your cron job
0 2 * * * /home/user/backup.sh
# After: With monitoring
0 2 * * * /home/user/backup.sh && curl -fsS --retry 3 https://oonchk.com/abc123
The && is important - the curl only runs if backup.sh exits successfully. If your script fails, the ping doesn't get sent, and you get an alert.
Setting up cron job monitoring
Here's how to set it up with OnlineOrNot:
1. Create a heartbeat monitor
Give it a name (like "nightly-backup") and set the expected schedule. If your job runs daily at 2am, tell the monitor to expect a ping every 24 hours.
2. Add a grace period
Jobs don't always run at exactly the same time. A backup might take 5 minutes one day and 20 minutes the next. Set a grace period that accounts for normal variation.
For a daily job, 30-60 minutes of grace is usually fine. For an hourly job, maybe 10 minutes.
3. Add the ping to your cron job
You'll get a unique URL. Add it to the end of your cron command:
0 2 * * * /home/user/backup.sh && curl -fsS --retry 3 https://oonchk.com/your-unique-id
The flags:
- -f - Fail silently on HTTP errors
- -s - Silent mode (no progress output)
- -S - Show errors if they occur
- --retry 3 - Retry a few times if the network hiccups
4. Configure your alerts
Choose where you want to be notified: Slack, email, PagerDuty, SMS, etc.
For critical jobs (like backups), I'd recommend at least two channels. Slack for awareness, phone/SMS for wake-you-up urgency.
That's it. If your job stops running, you'll know within minutes instead of weeks.
Common issues with cron jobs
Over the years, I've seen the same problems come up again and again:
The job runs but fails
Your script starts, hits an error, and exits with a non-zero status. If you're using && before your ping (as above), this is caught automatically.
If you're not using &&, your monitoring will think everything's fine when it isn't.
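If a bare one-liner gets unwieldy, the same idea works as a small wrapper that captures the job's exit status explicitly. This is a sketch, not a prescribed pattern - the job command and ping URL are whatever you pass in:

```shell
#!/bin/sh
# run_and_ping JOB_COMMAND PING_URL
# Runs the job, pings the monitor only on success (mirroring the &&
# pattern above), and preserves the job's own exit status for cron.
run_and_ping() {
    job="$1"; url="$2"
    sh -c "$job"
    status=$?
    if [ "$status" -eq 0 ]; then
        # Ping failures are deliberately ignored: a missed ping just
        # means the monitor alerts, which is the behavior we want.
        curl -fsS --retry 3 "$url" > /dev/null 2>&1
    fi
    return "$status"
}
```

Returning the job's status (rather than curl's) means cron's own mail/logging still reflects what the job did, independently of the monitoring ping.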
The job takes too long
A job that usually takes 10 minutes suddenly takes 2 hours. This might be fine, or it might mean something's wrong.
Good monitoring tools let you track job duration, not just completion. If a job starts taking significantly longer than usual, that's worth investigating.
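Services that track duration typically do it with a separate "start" ping, so they can measure the gap between start and finish. The /start suffix below is an assumption for illustration, not a documented OnlineOrNot URL - check your service's docs for the real format:

```shell
# Ping at start, run the job, ping again on success.
# Note the ";" after the start ping: the backup should run even if
# that ping fails. The /start endpoint is hypothetical.
0 2 * * * curl -fsS https://oonchk.com/abc123/start; /home/user/backup.sh && curl -fsS --retry 3 https://oonchk.com/abc123
```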
The job overlaps with itself
Your hourly job takes 90 minutes to run. Now you've got two instances running at once, possibly fighting over the same resources.
This isn't strictly a monitoring problem, but it's something to watch for. Use a lockfile or flock to prevent overlapping runs:
0 * * * * flock -n /tmp/myjob.lock /home/user/myjob.sh && curl ...
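The key detail in that one-liner is -n: flock tries to take the lock and exits immediately (with a non-zero status) if another run still holds it, rather than queueing up behind it. A minimal sketch of the same guard as a reusable function - lockfile path and command are placeholders:

```shell
#!/bin/sh
# run_exclusive LOCKFILE COMMAND [ARGS...]
# Refuses to start COMMAND if a previous run still holds LOCKFILE.
# flock -n fails fast instead of waiting, so overlapping cron fires
# simply skip this run rather than piling up.
run_exclusive() {
    lock="$1"; shift
    flock -n "$lock" "$@"
}
```

Because the lock is tied to the open file descriptor, it's released automatically when the job exits - even if it crashes - so there's no stale-lockfile cleanup to worry about.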
Environment differences
This one's classic: the job works when you run it manually, but fails under cron.
Cron runs with a minimal environment. Your $PATH is different, your shell config isn't loaded, and environment variables you take for granted aren't set.
Always use full paths in cron jobs, and set any required environment variables explicitly.
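In practice that means declaring the environment at the top of the crontab. SHELL, PATH, and MAILTO are standard crontab variables; the values below are examples:

```shell
SHELL=/bin/sh
PATH=/usr/local/bin:/usr/bin:/bin
# Absolute paths for the script and for curl, so nothing depends
# on whatever minimal PATH cron happens to provide.
0 2 * * * /home/user/backup.sh && /usr/bin/curl -fsS --retry 3 https://oonchk.com/abc123
```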
The server rebooted
Cron usually starts automatically on boot, but not always. And even if cron is running, your job might depend on other services that aren't ready yet.
This is where heartbeat monitoring really shines - if the job doesn't run after a reboot, you'll know.
What to monitor
Not every cron job needs monitoring. Here's how I think about it:
Always monitor:
- Backups (you really don't want to find out these stopped working when you need them)
- Jobs that affect money (billing, invoices, payments)
- Data pipelines and syncs
- Security-related jobs (certificate renewal, log rotation)
Probably monitor:
- Reports and notifications
- Cleanup jobs
- Health checks
Maybe skip:
- Low-importance jobs you'd barely notice if they stopped
- Jobs that have other visible effects when they fail
The rule of thumb: if you'd want to know within an hour that this job stopped running, monitor it.
Silent failures are the worst kind of failures. By the time you notice something's wrong, the damage is already done.
Adding a simple ping to your cron jobs takes about 30 seconds per job. It's one of those small investments that pays off enormously when things go wrong.
And things always go wrong eventually.
