28. May 2015

How I would have rearchitected GitHub Pages

This is a response to this blog post

The GH engineers correctly identify the MySQL dependency as a problem. It’s such an easy one to remove, though!

The MySQL database is serving as a simple map of URLs to fileservers. We can do that with a hash function instead.

There I fixed it - the original architecture with MySQL scribbled out

In that Lua call, instead of doing a database query, take an md5sum of the request. Compare the result to a static table you hardcode into your script. That table splits up the entire possible range of MD5 hashes into n buckets. It looks something like:

1a..1c
  bucket1
1d..1f
  bucket2
etc...

Every time you get a request for a given URL it will hash exactly the same way and be routed to exactly the same bucket.
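Here's a rough sketch of that lookup. It's in Python rather than Lua purely for illustration, and the names (bucket_for, NUM_BUCKETS, BUCKET_STARTS) are made up for this example. It assumes the thing being hashed is a string such as the requested hostname, and that the table splits the 128-bit MD5 range into equal slices:

```python
import hashlib
from bisect import bisect_right

NUM_BUCKETS = 100        # assumption: the "100 is probably fine" figure
HASH_SPACE = 2 ** 128    # an MD5 digest is 128 bits wide

# The static table: the lowest hash value that belongs to each bucket.
# In a real deployment this would be hardcoded, not computed at startup.
BUCKET_STARTS = [i * HASH_SPACE // NUM_BUCKETS for i in range(NUM_BUCKETS)]


def bucket_for(key: str) -> str:
    """Return the bucket name for a request key (e.g. a hostname)."""
    digest = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    # Find the last table entry that is <= the digest.
    index = bisect_right(BUCKET_STARTS, digest) - 1
    return f"bucket{index + 1}"


if __name__ == "__main__":
    # The same key always lands in the same bucket.
    print(bucket_for("example.github.io"))
    print(bucket_for("example.github.io"))
```

The table never changes at runtime, so there is nothing to query and nothing to keep in sync.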

The Lua function returns that bucket. nginx then proxy_passes the request to a given upstream. Those upstreams are all statically defined in the nginx config file like:

upstream bucket1 {
  server fileserver1.internalDNS.github.com;
  server fileserver2.internalDNS.github.com;
}
upstream bucket2 {
  server fileserver3.internalDNS.github.com;
  server fileserver4.internalDNS.github.com;
}

You can have as many fileservers as you like in each bucket depending on how much redundancy you want. Nginx will balance traffic between them based on one of a few possible methods. You can configure it however you like.

Because a given URL always hashes to the same bucket, the routing itself never needs to change. You can make changes to your infrastructure by pointing your DNS at different places.

The number of buckets to choose is up to you. You want something large enough that it will accommodate future demand (must be greater than the total number of fileservers you will need to store your assets) but small enough that it’s not a pain in the bum to work with. 100 is probably as good a number as any.

The other side of the coin is the Jekyll workers. Presumably in this system there are servers whose job it is to render the markdown into static HTML when you git push. All they need to do is implement the same algorithm with the same table. That way they always know which set of fileservers they should be persisting their assets to. Neat!
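A minimal sketch of the worker side, again in Python for illustration only. The two-bucket table reuses the hostnames from the upstream example, and publish_targets is a hypothetical name. It uses a compact arithmetic form of the range split; whichever form you pick, the router and the workers must share it exactly:

```python
import hashlib

# Assumption: the same bucket-to-fileserver table the nginx config uses.
FILESERVERS = {
    "bucket1": [
        "fileserver1.internalDNS.github.com",
        "fileserver2.internalDNS.github.com",
    ],
    "bucket2": [
        "fileserver3.internalDNS.github.com",
        "fileserver4.internalDNS.github.com",
    ],
}
BUCKETS = sorted(FILESERVERS)  # stable ordering of bucket names


def bucket_for(site: str) -> str:
    """Split the 128-bit MD5 range evenly across the buckets."""
    digest = int(hashlib.md5(site.encode("utf-8")).hexdigest(), 16)
    return BUCKETS[digest * len(BUCKETS) // 2 ** 128]


def publish_targets(site: str) -> list:
    """Fileservers a worker should write a rendered site to."""
    return FILESERVERS[bucket_for(site)]


if __name__ == "__main__":
    for host in publish_targets("example.github.io"):
        print("would copy rendered site to", host)
```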

Yay for fewer moving parts.

23. May 2015

Photobomb

I wrote photobomb about a year ago to scratch an itch. I wanted to share photos with my friends and family, but was sick of handing Facebook all the control. When Facebook inevitably becomes as painfully uncool as Myspace, I don’t want all my memories to be trapped there.

What I needed was something that was really easy to use. That had to be true for both the people uploading images and the people viewing images.

Photobomb creates a not-hideous image gallery based on a folder on disk. It preserves the directory hierarchy so things are kept organised. When you click on an image, you get a bigger version of that image with a pretty lightbox and the option to download the original. It’s lightweight and responsively designed.

List of Galleries

Gallery view

Lightbox

On the upload side, photobomb just watches a directory on disk. Any time an image appears in it, it creates the thumbnails it needs and starts serving it all. That’s it.
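To make the idea concrete, here is a small Python sketch of the "notice new images" step. It is not photobomb's actual code: the function name, the extension list, and the assumption that thumbnails live in a separate directory mirroring the photo hierarchy are all invented for the example. Actually generating the thumbnails is left out.

```python
import os

# Assumption: which file extensions count as images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def images_missing_thumbnails(photo_root, thumb_root):
    """List images under photo_root with no matching file under thumb_root.

    Paths are returned relative to photo_root, so the directory
    hierarchy is preserved. thumb_root is assumed to sit outside
    photo_root.
    """
    missing = []
    for dirpath, _dirnames, filenames in os.walk(photo_root):
        rel_dir = os.path.relpath(dirpath, photo_root)
        for name in filenames:
            extension = os.path.splitext(name)[1].lower()
            if extension not in IMAGE_EXTENSIONS:
                continue
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            if not os.path.exists(os.path.join(thumb_root, rel_path)):
                missing.append(rel_path)
    return sorted(missing)
```

Run on a timer or from a filesystem watcher, a function like this tells you which images still need thumbnails, regardless of how the files arrived.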

Personally I use BTSync to get the photos from my phone/laptop onto my server. You could use Dropbox, rsync, carrier pigeon, etc. Photobomb don’t care. So long as images show up in the filesystem, it’s happy.

As a little bonus, you can also configure a Facebook token so that people can Like and Share your galleries back on the FB platform. The content is still on your domain, on your server, though. If FB goes away in the future, we can just replace that little share button with whatever the new hotness is.

I’ve been happily using it for the last year. It’s been great.

https://github.com/davidbanham/photobomb

20. February 2015

Coffeescript PSA

Coffeescript 1.9.0 contains a backwards-incompatible change. From its changelog:

Changed strategy for the generation of internal compiler variable names. Note that this means that @example function parameters are no longer available as naked example variables within the function body.

I know this, because it broke my production code.

I love coffee-script. I think it’s great. I don’t really care one way or the other about semver as an idea, but on npm I think it’s necessary.

npm uses semver. If you want to publish your package on npm, you need to use semver. If you don’t want to use semver, use a different package manager.

Publishing your package on npm and not using semver is breaking a promise to your users. That’s a crummy thing to do, and it wasted a bunch of my life this evening.
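For anyone wondering why a minor version bump can reach production on its own, here is a small Python sketch of how a caret range behaves. It assumes the dependency was declared with something like ^1.8.0, which is hypothetical: the pinning isn't stated above. It handles plain x.y.z versions only and ignores prerelease tags. caret_allows is a made-up helper, not npm's implementation.

```python
def parse(version):
    """Turn 'x.y.z' into a tuple of ints so versions compare correctly."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_allows(base, candidate):
    """True if candidate satisfies the caret range ^base.

    A caret range allows changes that do not modify the left-most
    non-zero component of the base version.
    """
    low = parse(base)
    if low[0] > 0:
        high = (low[0] + 1, 0, 0)
    elif low[1] > 0:
        high = (0, low[1] + 1, 0)
    else:
        high = (0, 0, low[2] + 1)
    return low <= parse(candidate) < high


if __name__ == "__main__":
    print(caret_allows("1.8.0", "1.9.0"))  # True: installed automatically
    print(caret_allows("1.8.0", "2.0.0"))  # False: needs an explicit opt-in
```

Under semver the breaking change would have shipped as 2.0.0, and a range like this would have kept it out.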