What I learned running a SaaS for a second year
Max Rozen / February 20, 2023
Two years ago, OnlineOrNot started as a little toy app I built in an afternoon to see what it's like to use the Next.js framework. All it did was check whether a URL was down from around the world.
I gave myself a week to turn that toy into a SaaS people could pay for. It looked like this when it went live:
It wasn't ready for real users, but that didn't matter. I had something out there that people could sign up for, and they could tell me what they were expecting and how OnlineOrNot fell short of their expectations.
Since then, I've been shipping features in two-hour blocks, cutting scope aggressively to ensure something goes out into the world each time I write code for OnlineOrNot. I then get feedback on what goes out, and the app improves.
These days, OnlineOrNot can do quite a bit more than just visit a page and send an email alert when it's down:
OnlineOrNot Uptime Checks get used to monitor web apps, APIs, internet connections, residential power, IoT devices, blogs, and of course, regular websites. It's also no longer just an uptime monitor. OnlineOrNot is more of a status page service, with built-in uptime monitoring.
I learned quite a bit in my first year of running a SaaS, and this year's learning builds on top of that.
Table of Contents
- OnlineOrNot runs on opportunity cost
- No, running two SaaS businesses isn't a good idea
- Ship small, invalidate your assumptions
- Talk to potential customers, even if you think you have nothing for them
- What I actually shipped
OnlineOrNot runs on opportunity cost
I don't spend anything on customer acquisition apart from opportunity cost. I build features and write content for OnlineOrNot's customers at the expense of other, possibly lucrative, activities I could be doing.
Folks often look at my "2 hours per work day, every work day" rule, calculate roughly what they think my hourly rate is, and say "pffft I'd rather be doing nothing".
The thing is, I'd rather be doing this. To me, OnlineOrNot is like painting, and if years down the track I decide to start a new artwork, I don't have to learn to paint again.
No, running two SaaS businesses isn't a good idea
Early on in the year, I had the bright idea to spin out OnlineOrNot's internal feature flag service, and make it a SaaS project.
The thought process was essentially:
It was not different this time, dear reader.
I spent a weekend building a prototype SaaS app, launched it, and nothing happened. I talked to potential customers throughout my professional network, and folks were just not that keen. It turns out, I wasn't that keen either. I worked a bit on content marketing, let the project sit for a few months, and eventually shut it down.
It cost me time I could have spent building features for OnlineOrNot, and was just a distraction, in the end.
It turns out no one cares if you came up with a really fast way to do feature flags over Cloudflare's shiny new developer platform - your service didn't exist a week ago and has almost no features, and there are feature flag services that have been around for a very long time, with an adequate feature set.
It made me realise OnlineOrNot's moat is that I don't plan on giving up on it, I'm genuinely interested in the problem space, and I keep myself employed full-time so that I can build OnlineOrNot the way I want, with zero risk to my livelihood.
Ship small, invalidate your assumptions
The trouble with holding off on releasing a feature because it "isn't ready" is that folks can't tell you your assumptions are wrong until you've already spent months building on top of them.
OnlineOrNot's Status Page feature came from a conversation with a customer trying to figure out how to sign up hundreds of email addresses to get notifications when an uptime check fails.
I built something as quickly as possible, showed the customer (and a few other customers, friends, and colleagues), realised where I went wrong, and tried again.
OnlineOrNot itself was released after only 7 days of part-time development, after all.
Talk to potential customers, even if you think you have nothing for them
I was contacted by a CTO at some point during the year asking if OnlineOrNot supported some particular feature.
My normal reaction would've been to just say "sorry, no" and leave it at that, but I got curious and started asking what they'd like to achieve with the feature, and told them how I figured I would build the feature.
They signed up to a paid plan the next day, they've been customers ever since, and the feature gets used by other customers too.
That's all for what I learned this year. Below is a sort of CHANGELOG of what I shipped in OnlineOrNot.
What I actually shipped
I recently wrote 2022: I just kept shipping, and while it's nice to tell folks "just keep shipping", I think it's valuable to point out just how much you can get done if you give yourself 2 hours, and ship every day.
There were months where my focus was more on marketing/writing docs/helping and talking to customers/drinking wine by the lake/I just didn't feel like writing code, so not all months resulted in the same amount of feature development, but I still released my articles/docs/etc ASAP, and iterated on them while they were live.
- The uptime dashboard now actually tracks the last 24h of uptime (used to be response time)
- Added auto-refresh to the uptime dashboard so folks don't need to remember to refresh the page
- Made the uptime dashboard mobile-friendly
- Migrated the database from Intel to ARM
- Made the rest of the web app mobile-friendly
- Made it clearer why OnlineOrNot thinks a check is failing
- Reached 50 million all-time uptime checks
- Published Communicating to Users During Incidents
- Cleaned up my sign-up form to remove distractions, halving the drop-off rate in the process
- Built the first screen of a new onboarding flow and immediately released it
- Added a second screen to the onboarding flow to enable people to join the mailing list
- Added a third screen to the onboarding flow to figure out how people found out about OnlineOrNot (the majority of folks come from me shitposting on Twitter and commenting on Hacker News) and what their name/company name is (so I finally stopped sending emails starting with "Hey there,")
- Added a fourth screen to the onboarding flow to let folks add their whole team in one go (it's just a textfield, and I parse it to invite folks)
- Added GitHub Auth to the login/signup screen
- Added a free trial sign-up option to the onboarding flow (that's right, I only added free trials after 11 months of running the business)
- After shipping the free trial sign-up, I had 14 days to implement what happens at the end of a free trial, so I shipped that separately. In the meantime, I was manually sending folks their "getting the most out of your free trial" emails
- Removed the paywall from a few uptime monitoring features
- Wrote the first year's version of this blog post, around 33k people read it
- Wrote How OnlineOrNot uses OnlineOrNot to run OnlineOrNot
- Added a "you're on a free trial!" banner to OnlineOrNot
- Wrote a pair of guides for the docs to help folks get started with their free trial
- Added human-sounding errors when uptime assertions fail
- for example: "Looked for a value at … true, which we expected to be …"
- Added the ability to report on SSL certificate validity
- Worked on the deployment pipeline, getting a 7 minute build down to 1 minute 45 seconds
- Got tired of forgetting to write docs after releasing features, so I moved the docs into my app's monorepo, and gave it the same look and feel as the rest of my web presence (I still get customers telling me they wish their app had this - but I'm not planning on spinning off a docs product)
- Made it possible to BYO Twilio account to enable unlimited SMSes for free
- Added a "home screen" to OnlineOrNot
- Used my new docs as a template for an Incident Management mini-site. Before this, I had a few blog posts about on-call and runbooks, and navigating between them was annoying.
- Moved house, took the time to explore my new town/region, so didn't ship much
- Tested and tweaked copy on my landing pages
- Unified browser checks and regular uptime checks into a single dashboard
- Moved most of my DNS management from AWS to Cloudflare (I got hired by Cloudflare for $DAY_JOB, and it's a lot faster than AWS)
- Built and shipped the DNS system behind OnlineOrNot's Status Pages (what allows me to simultaneously display status pages at custom domains and on OnlineOrNot's subdomain)
- Started showing the last 14 days of incident history on status pages
- Actually made it possible for my paid customers to sign up for a status page at a custom domain
- Made it possible to manually add incidents to status pages
- Made it possible to manually update existing incidents on status pages
- Started an Early Access Program so I didn't have to feature flag early features one account at a time
- Migrated most of my core uptime check workload off AWS and onto fly.io, and wrote about it
- Made it possible to add components to a status page
- Made it possible to pick components affected by an incident
- Published Writing your first runbooks
- Published Guidelines for writing better runbooks
- Made it possible to link existing uptime checks to a status page, and automatically start and resolve incidents based on uptime check data
- Hacker News went down, 19k people ended up checking out https://hackernews.onlineornot.com, and I wrote about the experience
- Made it possible to sign up to status page updates, without needing to sign up to OnlineOrNot
- Published Postmortem Templates
- Edited and republished Improving your team's on-call experience
- Summer holidays
- Added a way to highlight active incidents on the status page
- Made it possible to make a status page private (password-protection)
- Added live system metrics to status pages
- Used OnlineOrNot to acquire the @OnlineOrNot handle on Twitter
- Worked on my landing pages
- Fought off an attack from spammers looking to abuse OnlineOrNot
- Added support for webhook events from Uptime Robot
- Migrated the business from Australia to France
- Holidays in Australia
- Added Cloudflare Workers to the list of "backup uptime checkers" that OnlineOrNot uses to verify a website is actually down
- Published Saving your team from alert fatigue
- Made it even clearer why OnlineOrNot thinks a check is failing
- Added support for HTTP Basic Auth in uptime checks
- Hit 120 million uptime checks
- Made it possible to duplicate an uptime check
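As a rough illustration of the onboarding item above (a plain textfield that gets parsed into team invites), here is a minimal TypeScript sketch. It is not OnlineOrNot's actual code: the `parseInviteEmails` name, the separators it accepts, and the deliberately loose email pattern are all assumptions made for the example.

```typescript
// Hypothetical helper, for illustration only.
// Deliberately loose check: something@something.something, no spaces.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Splits free-form text on whitespace, commas, and semicolons,
// keeps tokens that look like emails, and removes duplicates
// (comparison is case-insensitive because tokens are lowercased first).
function parseInviteEmails(input: string): string[] {
  const seen = new Set<string>();
  const emails: string[] = [];

  for (const token of input.split(/[\s,;]+/)) {
    const candidate = token.toLowerCase();
    if (candidate === "" || !EMAIL_PATTERN.test(candidate)) continue;
    if (seen.has(candidate)) continue;
    seen.add(candidate);
    emails.push(candidate);
  }

  return emails;
}
```

Pasting a comma-separated list, one address per line, or a mix of both would all work with this approach. It does not handle the `Name <email>` format, which a real form would probably want to.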
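The changelog item about linking uptime checks to a status page (automatically starting and resolving incidents from uptime check data) boils down to a small state transition. The sketch below shows one way it could work, under assumed names: `Incident`, `CheckEvent`, and `nextTransition` are hypothetical, and the rule of "one failing check opens an incident, one passing check closes it" is a simplification rather than a description of OnlineOrNot's real behaviour.

```typescript
// Hypothetical types, for illustration only.
type Incident = { checkId: string; startedAt: Date; resolvedAt: Date | null };
type CheckEvent = { checkId: string; ok: boolean; at: Date };

type Transition =
  | { action: "start"; incident: Incident }
  | { action: "resolve"; incident: Incident }
  | { action: "none" };

// Given the currently open incident for a check (or null) and the latest
// check result, decide whether to start an incident, resolve it, or do nothing.
function nextTransition(openIncident: Incident | null, event: CheckEvent): Transition {
  if (!event.ok && openIncident === null) {
    // First failure with nothing open: start a new incident.
    const incident: Incident = { checkId: event.checkId, startedAt: event.at, resolvedAt: null };
    return { action: "start", incident };
  }
  if (event.ok && openIncident !== null) {
    // Check recovered while an incident is open: resolve it.
    return { action: "resolve", incident: { ...openIncident, resolvedAt: event.at } };
  }
  // Still failing with an incident open, or still healthy with none open.
  return { action: "none" };
}
```

Keeping the decision in a pure function like this makes it easy to test without a database or a status page in the loop. The caller is responsible for persisting the returned incident.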
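The "backup uptime checkers" item is about confirming a failure from somewhere else before declaring a site down. The post doesn't say what rule OnlineOrNot applies, so the following is only a sketch of the general idea: `CheckResult`, `isConfirmedDown`, and the `minConfirmations` threshold are assumptions for the example.

```typescript
// Hypothetical types and names, for illustration only.
type CheckResult = { checker: string; ok: boolean };

// Treat the site as down only when the primary check failed AND
// at least `minConfirmations` backup checkers also saw a failure.
function isConfirmedDown(
  primary: CheckResult,
  backups: CheckResult[],
  minConfirmations: number = 1,
): boolean {
  if (primary.ok) return false;
  const failures = backups.filter((result) => !result.ok).length;
  return failures >= minConfirmations;
}
```

With the default threshold, a single agreeing backup is enough to confirm the outage, and a failure that no backup can reproduce never alerts. That trades a little detection speed for fewer false alarms, such as one caused by a network blip at a single checker.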