Cron job monitoring: How to know when your scheduled tasks fail
Last updated: March 03, 2026
Cron jobs have a nasty habit of failing silently.
Your backup script stops running. Your daily report never gets sent. Your database cleanup job crashes halfway through. And you don't find out until someone asks "hey, why haven't we had a backup in three weeks?"
The problem is that cron doesn't care if your job succeeds or fails. It just runs the command and moves on. No alerts, no notifications, nothing.
Let's fix that.
Table of contents
- Why cron jobs fail silently
- How cron job monitoring works
- Setting up cron job monitoring
- Common issues with cron jobs
- What to monitor
Why cron jobs fail silently
Cron was designed in the 1970s. Back then, the assumption was that a sysadmin would check the server regularly and notice if something was wrong.
That's not how most of us work today.
When a cron job fails, a few things might happen:
- Nothing - The job crashes, cron shrugs, and nobody knows
- An email gets sent - Cron can email output to root, but who checks that?
- A log entry appears - Somewhere, buried in /var/log/syslog
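For reference, cron's email behavior is controlled by the MAILTO variable, a standard feature of Vixie cron and its descendants. The address below is just a placeholder:

```shell
# Send any output (stdout/stderr) from the jobs below to this
# address instead of the local root mailbox.
MAILTO=ops@example.com
0 2 * * * /home/user/backup.sh
```

Setting MAILTO="" (empty) silences the emails entirely - which is exactly how jobs end up failing with no trace at all.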
None of these are great for catching problems quickly.
The real killer is when a job doesn't run at all. Maybe the server rebooted and cron didn't start. Maybe someone accidentally deleted the crontab. Maybe the disk filled up and cron couldn't write its lockfile.
In these cases, there's nothing to log. The job just... doesn't happen.
How cron job monitoring works
The solution is a "dead man's switch" (also called heartbeat monitoring).
The idea is simple:
- At the end of your cron job, you ping a URL
- A monitoring service tracks these pings
- If a ping doesn't arrive when expected, you get an alert
It's called a dead man's switch because the alert triggers on absence of activity, not presence. If your job stops running for any reason - crash, server down, crontab deleted - you'll know.
Here's what it looks like in practice:
# Before: Your cron job
0 2 * * * /home/user/backup.sh
# After: With monitoring
0 2 * * * /home/user/backup.sh && curl -fsS --retry 3 https://oonchk.com/abc123
The && is important - the curl only runs if backup.sh exits successfully. If your script fails, the ping doesn't get sent, and you get an alert.
Setting up cron job monitoring
Here's how to set it up with OnlineOrNot:
1. Create a heartbeat monitor
Give it a name (like "nightly-backup") and set the expected schedule. If your job runs daily at 2am, tell the monitor to expect a ping every 24 hours.
2. Add a grace period
Jobs don't always run at exactly the same time. A backup might take 5 minutes one day and 20 minutes the next. Set a grace period that accounts for normal variation.
For a daily job, 30-60 minutes of grace is usually fine. For an hourly job, maybe 10 minutes.
3. Add the ping to your cron job
You'll get a unique URL. Add it to the end of your cron command:
0 2 * * * /home/user/backup.sh && curl -fsS --retry 3 https://oonchk.com/your-unique-id
The flags:
- -f - Fail silently on HTTP errors
- -s - Silent mode (no progress output)
- -S - Show errors if they occur
- --retry 3 - Retry a few times if the network hiccups
4. Configure your alerts
Choose where you want to be notified: Slack, email, PagerDuty, SMS, etc.
For critical jobs (like backups), I'd recommend at least two channels. Slack for awareness, phone/SMS for wake-you-up urgency.
That's it. If your job stops running, you'll know within minutes instead of weeks.
Common issues with cron jobs
Over the years, I've seen the same problems come up again and again:
The job runs but fails
Your script starts, hits an error, and exits with a non-zero status. If you're using && before your ping (as above), this is caught automatically.
If you're not using &&, your monitoring will think everything's fine when it isn't.
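If a bare one-liner gets unwieldy, the same idea works as a small wrapper that captures the job's exit status explicitly. This is a sketch, not a prescribed pattern - the job command and ping URL are whatever you pass in:

```shell
#!/bin/sh
# run_and_ping JOB_COMMAND PING_URL
# Runs the job, pings the monitor only on success (mirroring the &&
# pattern above), and preserves the job's own exit status for cron.
run_and_ping() {
    job="$1"; url="$2"
    sh -c "$job"
    status=$?
    if [ "$status" -eq 0 ]; then
        # Ping failures are deliberately ignored: a missed ping just
        # means the monitor alerts, which is the behavior we want.
        curl -fsS --retry 3 "$url" > /dev/null 2>&1
    fi
    return "$status"
}
```

Returning the job's status (rather than curl's) means cron's own mail/logging still reflects what the job did, independently of the monitoring ping.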
The job takes too long
A job that usually takes 10 minutes suddenly takes 2 hours. This might be fine, or it might mean something's wrong.
Good monitoring tools let you track job duration, not just completion. If a job starts taking significantly longer than usual, that's worth investigating.
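Services that track duration typically do it with a separate "start" ping, so they can measure the gap between start and finish. The /start suffix below is an assumption for illustration, not a documented OnlineOrNot URL - check your service's docs for the real format:

```shell
# Ping at start, run the job, ping again on success.
# Note the ";" after the start ping: the backup should run even if
# that ping fails. The /start endpoint is hypothetical.
0 2 * * * curl -fsS https://oonchk.com/abc123/start; /home/user/backup.sh && curl -fsS --retry 3 https://oonchk.com/abc123
```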
The job overlaps with itself
Your hourly job takes 90 minutes to run. Now you've got two instances running at once, possibly fighting over the same resources.
This isn't strictly a monitoring problem, but it's something to watch for. Use a lockfile or flock to prevent overlapping runs:
0 * * * * flock -n /tmp/myjob.lock /home/user/myjob.sh && curl ...
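The key detail in that one-liner is -n: flock tries to take the lock and exits immediately (with a non-zero status) if another run still holds it, rather than queueing up behind it. A minimal sketch of the same guard as a reusable function - lockfile path and command are placeholders:

```shell
#!/bin/sh
# run_exclusive LOCKFILE COMMAND [ARGS...]
# Refuses to start COMMAND if a previous run still holds LOCKFILE.
# flock -n fails fast instead of waiting, so overlapping cron fires
# simply skip this run rather than piling up.
run_exclusive() {
    lock="$1"; shift
    flock -n "$lock" "$@"
}
```

Because the lock is tied to the open file descriptor, it's released automatically when the job exits - even if it crashes - so there's no stale-lockfile cleanup to worry about.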
Environment differences
This one's classic: the job works when you run it manually, but fails under cron.
Cron runs with a minimal environment. Your $PATH is different, your shell config isn't loaded, and environment variables you take for granted aren't set.
Always use full paths in cron jobs, and set any required environment variables explicitly.
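In practice that means declaring the environment at the top of the crontab. SHELL, PATH, and MAILTO are standard crontab variables; the values below are examples:

```shell
SHELL=/bin/sh
PATH=/usr/local/bin:/usr/bin:/bin
# Absolute paths for the script and for curl, so nothing depends
# on whatever minimal PATH cron happens to provide.
0 2 * * * /home/user/backup.sh && /usr/bin/curl -fsS --retry 3 https://oonchk.com/abc123
```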
The server rebooted
Cron usually starts automatically on boot, but not always. And even if cron is running, your job might depend on other services that aren't ready yet.
This is where heartbeat monitoring really shines - if the job doesn't run after a reboot, you'll know.
What to monitor
Not every cron job needs monitoring. Here's how I think about it:
Always monitor:
- Backups (you really don't want to find out these stopped working when you need them)
- Jobs that affect money (billing, invoices, payments)
- Data pipelines and syncs
- Security-related jobs (certificate renewal, log rotation)
Probably monitor:
- Reports and notifications
- Cleanup jobs
- Health checks
Maybe skip:
- Low-importance jobs you'd barely notice if they stopped
- Jobs that have other visible effects when they fail
The rule of thumb: if you'd want to know within an hour that this job stopped running, monitor it.
Silent failures are the worst kind of failures. By the time you notice something's wrong, the damage is already done.
Adding a simple ping to your cron jobs takes about 30 seconds per job. It's one of those small investments that pays off enormously when things go wrong.
And things always go wrong eventually.
