How OnlineOrNot uses OnlineOrNot to run OnlineOrNot
Jumping into monitoring software for the first time can be pretty overwhelming. If you're not in an exploring mood, it's easy to get lost, unsure of what all the knobs and buttons do.
To make this easier for OnlineOrNot, I thought it might be useful to share how I use OnlineOrNot to monitor OnlineOrNot (as part of running OnlineOrNot day to day).
You might think it's silly for an uptime monitoring service to monitor its own site. However, as our monitoring infrastructure is kept separate from our marketing website and web app, I actually get notified when the site goes down.
To start with, I monitor:
- the main landing page of my marketing site: https://onlineornot.com/
- the sitemap for the marketing site: https://onlineornot.com/sitemap.xml
- the main API endpoint: https://onlineornot.com/api/graphql
As OnlineOrNot's marketing site is mainly static HTML (not powered by anything server-based like WordPress), monitoring the main landing page covers almost every page that could go down. I also monitor the sitemap as it's generated by a script at build time, and that script has failed in the past.
I use the following settings for both my main landing page and the sitemap.xml file.
To start with, I have OnlineOrNot check its own landing page every minute:
Things can get noisy on the internet, and it's possible for a website to "go offline" for a minute or two without it being a particular drama (assuming it's not a regular occurrence). As a result, I only want to be notified if my landing page check fails 5 times in a row:
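To make the "5 times in a row" idea concrete, here is a minimal sketch of consecutive-failure alerting. The `FailureTracker` class and its names are hypothetical, written for illustration only; this is not how OnlineOrNot implements it internally.

```python
from dataclasses import dataclass


@dataclass
class FailureTracker:
    """Counts consecutive failed checks and decides when to alert."""

    threshold: int  # failures in a row required before alerting
    consecutive_failures: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        if check_passed:
            # Any passing check resets the streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire exactly once, on the check that reaches the threshold.
        return self.consecutive_failures == self.threshold


tracker = FailureTracker(threshold=5)
# Two blips, a recovery, then a real outage of five failed checks.
results = [False, False, True, False, False, False, False, False]
alerts = [tracker.record(passed) for passed in results]
```

In this run the two early failures never trigger an alert, because the passing check resets the count. Only the last check, the fifth failure in a row, does.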
To be sure it's actually the page I expect that OnlineOrNot is checking, I also set the 'Text to search for' to look for part of my main heading.
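The reason this matters: a server can happily return 200 OK for the wrong page (an error template, a parked domain, an empty shell). A rough sketch of the idea, with a hypothetical function name and placeholder heading text:

```python
def page_looks_right(status_code: int, body: str, expected_text: str) -> bool:
    """Pass only if the response is a 2xx AND contains the expected text."""
    return 200 <= status_code < 300 and expected_text in body


# Placeholder heading; substitute a phrase that only appears on the real page.
expected = "Example heading"

healthy = page_looks_right(200, "<h1>Example heading</h1>", expected)
wrong_page = page_looks_right(200, "<h1>Domain for sale</h1>", expected)
server_error = page_looks_right(500, "<h1>Example heading</h1>", expected)
```

Here only `healthy` is true. The second response has a good status code but the wrong content, and the third has the right content but a failing status code.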
For APIs, things are a little bit different. If the API check fails, I know something is wrong and needs investigating immediately.
I have OnlineOrNot check its own API every minute:
To check the API properly, I have OnlineOrNot make an API request as a real (test) user, with a valid GraphQL query.
I set the following HTTP request settings:
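For readers who haven't sent a GraphQL query by hand before, the request generally looks like the sketch below: a POST with a JSON body containing a `query` field. The bearer-token header, the token value, and the `viewer` query are assumptions for illustration, not the actual settings I use.

```python
import json
import urllib.request


def build_graphql_request(url: str, token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Hypothetical auth scheme; the real header is not shown in this post.
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical query and token for a dedicated test user.
request = build_graphql_request(
    "https://onlineornot.com/api/graphql",
    token="test-user-token",
    query="{ viewer { id } }",
)
```

The method, headers, and body here map onto the kind of fields you'd fill in for an API check.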
As GraphQL APIs can return 200 OK even when things are going horribly wrong, it's important to set assertions to check the data you queried is coming back correctly:
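As a sketch of what such an assertion amounts to: a GraphQL response typically reports failures in an `errors` array in the body rather than through the status code, so a check needs to look inside the JSON. The function name and the `viewer.id` path below are hypothetical.

```python
import json
from typing import Any


def graphql_check_passes(status_code: int, raw_body: str, required_path: list[str]) -> bool:
    """Pass only if the response is 200, has no errors, and contains the data we asked for."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return False  # e.g. an HTML error page instead of JSON
    if not isinstance(payload, dict) or payload.get("errors"):
        return False  # a 200 OK that still carries GraphQL errors
    # Walk down data -> required_path, failing on any missing or null field.
    node: Any = payload.get("data")
    for key in required_path:
        if not isinstance(node, dict) or node.get(key) is None:
            return False
        node = node[key]
    return True


path = ["viewer", "id"]  # hypothetical field queried by the test user
ok = graphql_check_passes(200, '{"data": {"viewer": {"id": "123"}}}', path)
errored = graphql_check_passes(200, '{"data": null, "errors": [{"message": "boom"}]}', path)
missing = graphql_check_passes(200, '{"data": {"viewer": null}}', path)
```

Both `errored` and `missing` come back false even though the HTTP status is 200, which is exactly the case a plain status-code check would miss.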
Finally, in advanced settings, I set the check to monitor from a location close to my database (for fastest results), and set it to only alert me if two checks in a row fail.
As I'm already checking the response via Assertions, I don't set 'Text to search for' for my API check.
As I check my phone (way too much), I find email notifications work quite well when things go wrong (no additional settings required).
For added redundancy though, I also have alerts sent to Slack and Discord, which I've added as integrations for my account.