This post was heavily inspired by GoSquared - Lambda@Edge: putting Serverless even closer to your users, which describes how to enable gzip compression using Lambda@Edge along with CloudFront and S3 storage. In this writeup I extend it with three additional points:
tl;dr Source code is available on GitHub.
There's been one thing I've been missing in my AWS S3 + CloudFront setup that I knew from the Spring Boot world - GzipResourceResolver, which lets you aggressively compress your static resources at build time and serve them to clients supporting Content-Encoding: gzip.
Compressing at build time means you can apply more aggressive settings, because you don't need to worry about response latency the way you do with on-the-fly compression. That can give you much better compression ratios, which in turn means your content gets delivered faster and with less bandwidth.
I was really happy to see the GoSquared article - Lambda@Edge: putting Serverless even closer to your users - which showed exactly how to achieve this on AWS infrastructure, leveraging Lambda@Edge. Let's start with their approach, and then I'll show you how to upgrade it with support for a more modern algorithm from Google called Brotli, which is said to improve compression by 37% and is supported by all major browsers!
Let's start with the Lambda@Edge equivalent of Spring Boot's GzipResourceResolver - a simple lambda. Its job is to check whether the client supports gzip encoding and, under the hood, redirect the request to the compressed resource, which is then served to the client.
By default, CloudFront strips everything other than gzip from the Accept-Encoding header before forwarding the request, so the check is fairly simple. Next we need to upload the gzipped files.
Initially the number of files wasn't big enough to push me to write an automated syncing script, but since it was about to double - the second half generated by an automated script - I decided to handle both in the same run.
Let's start with automatic compression
You can notice that the following script does two things:
First, it selectively compresses all text files: html, css, js, svg and m3u8 (a multimedia playlist format).
Secondly, it follows symlinks - in my project multimedia files live outside the code tree and are symlinked in, so it was a bit of a surprise that those weren't getting compressed as well.
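A sketch of such a build-time compression step (the directory layout and extension list are assumptions); `find -L` follows symlinks, and `gzip -9 -k` writes `file.gz` at maximum compression while keeping the original:

```shell
#!/usr/bin/env bash
# Compress all text assets under a directory, following symlinks.
compress_text_files() {
  find -L "$1" -type f \
    \( -name '*.html' -o -name '*.css' -o -name '*.js' \
       -o -name '*.svg' -o -name '*.m3u8' \) \
    -exec gzip -9 -k -f {} \;
}
```

You'd then call it with your build output directory, e.g. `compress_text_files dist`.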
Now that there's a gzipped version of every text file, we simply need to upload them to S3. When uploading gzipped files you need to tell S3 that they should be served as gzip. So I ended up with two scripts - one for syncing the non-compressed files and a second for the gzipped ones.
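As a sketch (the bucket name and local directory are placeholders), the two sync invocations could look like this:

```shell
# Upload everything except the pre-compressed copies as-is.
aws s3 sync dist/ s3://my-bucket/ --exclude '*.gz'

# Upload only the .gz files, setting the metadata so S3/CloudFront
# send Content-Encoding: gzip back to the browser.
aws s3 sync dist/ s3://my-bucket/ \
  --exclude '*' --include '*.gz' \
  --content-encoding gzip
```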
As a way of verifying that everything worked, I invalidated one file so CloudFront would remove it from edge locations - and it simply worked!
If you're like me and you're curious what exactly happened, here's a simple explanation. I checked the metadata of a file to see what the difference is between the compressed and non-compressed versions. You can see screenshots from S3 below for index.html.
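You can do the same check from the CLI (bucket and key are placeholders); the compressed object should report `ContentEncoding: gzip` in its metadata while the plain one doesn't:

```shell
# Inspect object metadata, including Content-Type and Content-Encoding.
aws s3api head-object --bucket my-bucket --key index.html
aws s3api head-object --bucket my-bucket --key index.html.gz
```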
Obviously S3 sync is clever enough to know that gzip will be used only for sending the compressed content to the browser, and it handles all the extra headers for us 😌
As mentioned before, CloudFront by default removes anything other than gzip from the Accept-Encoding header, so to prevent that you need to add this header to the CloudFront whitelist.
So now we're able to extend the original lambda with br encoding.
As you might have noticed, I also had to tweak how encodings are extracted from the Accept-Encoding header - I arrived at this method after some trial and error.
I also added filtering by file type, limited to the ones that actually get compressed, so I don't end up serving already-compressed images or H.264 video as gzip or brotli (you get the point - no compressing the already compressed).
In order to use Brotli compression you need a way of compressing your text files with this algorithm - the easiest option for me was Homebrew, but the Brotli project page also mentions a Node.js module and a 7-Zip plugin.
Now we're ready to brotli our files - I used a method very similar to the gzip one. find provides the list of files to compress and executes the command-line tool on each of them; output is stored in the same directory.
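Adapting the earlier gzip sketch (the paths, extension list and an installed `brotli` CLI are assumptions); by default the `brotli` tool keeps the input file and writes `file.br` next to it, and `-q 11` requests maximum compression:

```shell
#!/usr/bin/env bash
# Brotli-compress all text assets under a directory, following symlinks.
brotli_text_files() {
  find -L "$1" -type f \
    \( -name '*.html' -o -name '*.css' -o -name '*.js' \
       -o -name '*.svg' -o -name '*.m3u8' \) \
    -exec brotli -f -q 11 {} \;
}
```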
You might expect this to be as easy as the gzip sync, but unfortunately it's not. If you simply run aws s3 sync with the updated content-encoding, you'll end up with a binary/octet-stream header 😭.
This means the Content-Type header needs to be provided manually. So, unfortunately, you need a separate command for each MIME type you'd like to upload to S3.
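Sketched out, that means one invocation per MIME type (bucket name and directory are placeholders):

```shell
# One aws s3 sync call per MIME type, since --content-type applies
# to the whole invocation.
aws s3 sync dist/ s3://my-bucket/ \
  --exclude '*' --include '*.html.br' \
  --content-encoding br --content-type 'text/html'

aws s3 sync dist/ s3://my-bucket/ \
  --exclude '*' --include '*.css.br' \
  --content-encoding br --content-type 'text/css'

aws s3 sync dist/ s3://my-bucket/ \
  --exclude '*' --include '*.js.br' \
  --content-encoding br --content-type 'application/javascript'
```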
Now it looks much, much better 😁
That's it - now your lambda can support two types of content encoding, and you can automatically synchronise all your files.