AWS Lambda@Edge brotli content-encoding

This post was heavily inspired by GoSquared - Lambda@Edge: putting Serverless even closer to your users, which describes how to enable gzip compression using Lambda@Edge along with CloudFront and S3 storage. In this write-up I extend it with three additional points:

  1. Brotli compression
  2. A fully automatic sync script (which turned out to be a bit harder than I thought)
  3. The structure of AWS Lambda headers and test events for verifying that your lambda works

tl;dr: Source code is available on GitHub.

Problem description

There's been one thing I've been missing in my AWS S3 + CloudFront setup that I knew from the Spring Boot world - GzipResourceResolver, which lets you aggressively compress your static resources at build time and serve them to clients supporting Content-Encoding: gzip.

Having a build-time script means you can apply more aggressive compression settings, because unlike on-the-fly compression you don't need to worry about response latency. That can give you much better ratios, which in turn means your content gets delivered quicker and with less bandwidth.

I was really happy to see the GoSquared article - Lambda@Edge: putting Serverless even closer to your users - which showed exactly how to achieve this on AWS infrastructure, leveraging Lambda@Edge. Let's start with their approach and then I'll show you how to upgrade it with support for a more modern algorithm from Google called Brotli, which is said to improve compression by around 37% and is supported by all major browsers!

Gzip compression support with GoSquared lambda

Let's start with the Lambda@Edge equivalent of Spring Boot's GzipResourceResolver - a simple lambda. Its job is to check whether the client supports gzip encoding and, under the hood, rewrite the request to the compressed resource, which is then served to the client.

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;

  let gz = false;
  const ae = headers['accept-encoding'];
  if (ae) {
    for (let i = 0; i < ae.length; i++) {
      const value = ae[i].value;
      const bits = value.split(',').map(x => x.split(';')[0].trim());
      if (bits.indexOf('gzip') !== -1) {
        gz = true;
      }
    }
  }

  // If gzip is supported, use the pre-compressed version of the file,
  // which is the same URL with .gz on the end
  if (gz) request.uri += '.gz';

  callback(null, request);
};
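A handy way to check the rewrite logic before wiring everything up is to invoke the function with a test event from the Lambda console. A trimmed viewer-request event shaped like the one CloudFront sends (the distribution id, domain and client IP below are just placeholders) could look like this - after invoking it you should see request.uri rewritten to /index.html.gz:

{
  "Records": [
    {
      "cf": {
        "config": {
          "distributionId": "EDFDVBD6EXAMPLE",
          "eventType": "viewer-request"
        },
        "request": {
          "clientIp": "203.0.113.178",
          "method": "GET",
          "querystring": "",
          "uri": "/index.html",
          "headers": {
            "host": [
              { "key": "Host", "value": "d111111abcdef8.cloudfront.net" }
            ],
            "accept-encoding": [
              { "key": "Accept-Encoding", "value": "gzip, deflate" }
            ]
          }
        }
      }
    }
  ]
}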

By default AWS CloudFront strips everything other than gzip from the Accept-Encoding header, so the check is fairly simple. Next we need to upload the gzipped files.

Automatic gzip compression script

Initially the number of files wasn't big enough to push me towards writing an automated syncing script, but because it was about to double - the second part being generated by an automated script - I decided to do both in the same run.

Let's start with automatic compression:

find -L . \
  \( -name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.svg" -o -name "*.m3u8" \) \
  ! -path "./node_modules/**" ! -path "**/node_modules/**" \
  -exec gzip -f --keep --best {} \;

You can notice that the command above does two things:

First, it selectively compresses all text files: html, css, js, svg and m3u8 (a multimedia playlist format).

Secondly, it follows symlinks (the -L flag) - in my project multimedia files are organised outside the code and symlinked in, so it was a bit of a surprise that without it they didn't get compressed as well.

Automatic gzip sync script

Now that there's a gzipped version of every text file, we simply need to upload them to S3. When uploading gzipped files you need to tell S3 that they should be served with Content-Encoding: gzip. So I ended up with two scripts - one for syncing non-compressed files and a second for the gzipped ones.

#synchronise non-compressed files
aws s3 sync . s3://$bucket_name --exclude="*.gz" --exclude="**/*.gz" \
  --exclude=".git/**" --exclude="*.DS_Store" --exclude=".gitignore" \
  --exclude="*tsconfig.json" --exclude="node_modules/**" --exclude="**/node_modules/**" \
  --exclude="package.json" --exclude="*.ts"

#synchronise gzipped files
aws s3 sync . s3://$bucket_name --exclude="*" --include="*.gz" \
  --include="**/*.gz" \
  --exclude="*.ts" --content-encoding gzip

As a way of verifying that everything worked, I invalidated one file so CloudFront would remove it from edge locations and fetch the new version - and it simply worked!
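For reference, the invalidation itself is a one-liner with the CLI (the distribution id and path below are placeholders):

aws cloudfront create-invalidation \
  --distribution-id EDFDVBD6EXAMPLE \
  --paths "/index.html"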

If you're like me and you're curious what exactly happened, here's a simple explanation. I checked the metadata of a file to see the difference between the compressed and non-compressed versions. You can see screenshots from S3 below for index.html.
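If you prefer the command line to the S3 console, the same metadata can be inspected with head-object (the bucket name is a placeholder) - compare the ContentType and ContentEncoding fields of the two objects:

aws s3api head-object --bucket my-bucket --key index.html
aws s3api head-object --bucket my-bucket --key index.html.gz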

Obviously S3 sync is clever enough to know that gzip will be used only for sending compressed content to the browser, and it handles all the extra headers for us 😌

Brotli compression on AWS

CloudFront brotli accept-encoding setup

As mentioned before, CloudFront by default removes anything other than gzip from the Accept-Encoding header, so in order to prevent that you need to add this header to the CloudFront whitelist of forwarded headers.
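If your distribution is managed with CloudFormation, the whitelisting boils down to a fragment like this (a minimal sketch assuming the classic ForwardedValues-style cache behavior; in the console the equivalent setting lives in the behavior's header whitelisting section):

DefaultCacheBehavior:
  ForwardedValues:
    QueryString: false
    Headers:
      - Accept-Encoding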

So now we're able to extend the original lambda with br encoding:

exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;
  const headers = request.headers;

  if (headers && (
      request.uri.endsWith('.css')
      || request.uri.endsWith('.html')
      || request.uri.endsWith('.js')
      || request.uri.endsWith('.svg')
      || request.uri.endsWith('.m3u8')
  )) {
    let gz = false;
    let br = false;
    const ae = headers['accept-encoding'];
    if (ae) {
      for (let i = 0; i < ae.length; i++) {
        const value = ae[i].value;
        const bits = value.split(/\s*,\s*/);
        if (bits.indexOf('br') !== -1) {
          br = true;
          break;
        } else if (bits.indexOf('gzip') !== -1) {
          gz = true;
          break;
        }
      }
    }

    // If br is supported use the .br suffix, .gz for gzip :)
    if (br) request.uri += '.br';
    else if (gz) request.uri += '.gz';
  }

  callback(null, request);
};

As you might have noticed, I also had to tweak how the encodings are extracted from the Accept-Encoding header - this method was arrived at after some trial and error.

I also added filtering of file types down to the ones that are actually pre-compressed, so I don't end up trying to serve already compressed images or H.264 video in gzip or brotli (you get the point - compressing the already compressed).

Brotli compression setup

In order to use Brotli compression you need a way of compressing your text files with this algorithm - the easiest way for me was Homebrew, but the Brotli project page also mentions a Node.js module and a 7-Zip plugin.

brew install brotli
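If you're not on macOS, the brotli command line tool is usually available from your distribution's package manager as well, e.g. on a Debian-based system:

sudo apt-get install brotli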

Automatic brotli compression script

Now we're ready to brotli our files - I used a method very similar to the gzip one. Using find I provide the list of files to be compressed and execute the command line tool on each of them. The output is stored in the same directory.

find -L . \
  \( -name "*.html" -o -name "*.css" -o -name "*.js" -o -name "*.svg" -o -name "*.m3u8" \) \
  ! -path "./node_modules/**" ! -path "**/node_modules/**" \
  -exec brotli -f --keep -9 {} \;

Automatic synchronisation of brotli files

You might expect this to be as easy as the gzip sync, but unfortunately it's not. If you simply run aws s3 sync with the updated content-encoding, you'll end up with a binary/octet-stream Content-Type header 😭.

#synchronise brotli files
aws s3 sync . s3://$bucket_name --exclude="*" --include="*.br" \
  --include="**/*.br" \
  --exclude="*.ts" --content-encoding br

This means the Content-Type header needs to be provided manually. So, unfortunately, you need a separate command for each mime type you'd like to upload to S3.

#synchronise brotli/html files
aws s3 sync . s3://$bucket_name --exclude="*" --include="**/*.html.br" \
  --include="*.html.br" --content-encoding br --content-type="text/html" \
  --exclude="*.ts"

#synchronise brotli/js files
aws s3 sync . s3://$bucket_name --exclude="*" --include="**/*.js.br" \
  --include="*.js.br" --content-encoding br --content-type="application/javascript" \
  --exclude="*.ts"
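The remaining text types follow exactly the same pattern - only the extension and mime type change, for example:

#synchronise brotli/css files
aws s3 sync . s3://$bucket_name --exclude="*" --include="**/*.css.br" \
  --include="*.css.br" --content-encoding br --content-type="text/css" \
  --exclude="*.ts"

#synchronise brotli/svg files
aws s3 sync . s3://$bucket_name --exclude="*" --include="**/*.svg.br" \
  --include="*.svg.br" --content-encoding br --content-type="image/svg+xml" \
  --exclude="*.ts"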

Now it looks much, much better 😁
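As a final sanity check you can request the same file through CloudFront with different Accept-Encoding values and look at the response headers (the domain is a placeholder) - the first call should come back with content-encoding: br, the second with content-encoding: gzip:

curl -s -o /dev/null -D - -H "Accept-Encoding: br" https://example.com/index.html | grep -i content-encoding
curl -s -o /dev/null -D - -H "Accept-Encoding: gzip" https://example.com/index.html | grep -i content-encoding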

Summary

This is it - now your lambda can support two types of Content-Encoding and you can automatically synchronise all your files.