I have had this article brewing for some time now, but it has never really had a point. I think the drafts in my head were trying to be too measured, even-handed, educational, whereas this probably works better if I just let myself be an old man shaking my fist at a cloud.
Go and read Rachel Kroll on being a sysadmin or what.
Like Rachel, when I was starting my career I looked up to people who ran large systems and called themselves sysadmins. At that time the bleeding edge of scalability was the middle of the sigmoid adoption curve, in universities and early ISPs, so “large” meant 10k to 100k users. And these sysadmins were comfortable with custom kernels and patched daemons.
The first big open source project I got involved with was the Apache httpd, which was started by webmasters who had to fix their web servers, and who helped each other to solve their problems. Hacking C to build the world-wide web.
About ten years later, along came DevOps and SRE, and I thought, yeah, I code and do ops, so what? I like the professionalism both of them have promoted, but they tend to be about how to run LOTS of bespoke code.
Go and read David MacIver on “situated code”.
A lot of the code I have inherited and perpetrated has been glue code that’s inherently “situated”: tied to a particular place or context. ETL scripts and account provisioning, for example.
There are actually two dimensions here: situatedness (local vs. global) and configness vs. hardcodedness. The line between code and config is blurry: if you can’t configure a feature, do you write a wrapper script around the program, or do you patch the code to add it?
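As a minimal sketch of the wrapper-script end of that spectrum (the daemon name and its flag are entirely made up here):

```shell
#!/bin/sh
# Hypothetical glue: "somedaemon" has no config-file setting for a
# connection limit, so the wrapper bolts one on as an environment knob
# instead of patching the daemon's source.

MAX_CONN="${MAX_CONN:-100}"   # site-specific default, overridable per host

# A real deployment would end with:
#   exec /usr/sbin/somedaemon --max-connections="$MAX_CONN" "$@"
# For illustration, just show the invocation the wrapper would build:
echo "somedaemon --max-connections=$MAX_CONN"
```

This is exactly the kind of situated code the rest of this article is complaining about: it works, it is cheap, and five years later someone has to work out why it exists.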
Local patches are just advanced build-time configuration.
Docker images are radically less-situated configuration.
Code is not as bad as personal data - that’s toxic waste. Code is more like a costly byproduct of providing a service. Write code to reduce the drudgery of operating the service; then operations becomes maintaining the bespoke code.
Maintaining the code becomes bureaucratic overhead. A bureaucracy exists to sustain itself.
How to reduce the overhead? DELETE THE CODE. How do you do that? Simplify the code. Share the code. Offload the code.
Unless you are Red Hat (now IBM).
There has been a lot of argument in recent months about open source companies finding it hard to make money when all the money is going to AWS.
Go and read this list of fail.
The code I write is a by-product of providing a service. This is how Apache httpd and Exim came to be. The point was not to make money from the software, the point was to make some non-software thing better. And sharing improvements to code that solves common problems is the point of open source software.
Don’t solve a problem with open source software using code that you can’t share.
Amazon has an exceptionally strict policy of taking open source code and never sharing any of the improvements they make. Their monopolizing success is the main cause of the recent crisis amongst open source software businesses. It isn’t open source’s fault, it’s because Amazon are rapacious fuckers, and monopolies have somehow become OK.
Everything I have learned about software quality in practice I have learned from open source.
From the ops perspective, before you can even start to consider the usual measures of quality (documentation, testing, reliability …) open source forces you to eliminate situatedness. Make the code useful to people other than yourselves. Then you can share it, and if you are lucky, offload it.
If some problem is difficult to solve with your chosen package, you can often solve it with a wrapper script or Dockerfile. You can share your solution in a blog post or on GitHub. That’s all good.
Even better if you can improve the underlying software to make the problem easier to solve, so the blog posts and wrapper scripts can be deleted. It’s a lot more work, but it’s a lot more rewarding.