For my latest project, I've created a set of AWS Lambda functions, each with its own single responsibility. I've written the functions in JavaScript, since I wanted to take advantage of its HTML-parsing ecosystem.
To simplify navigation between the source code of my lambdas, I've organized them all inside a monorepo available on GitHub. Originally, each function had its own node_modules folder with the necessary dependencies and a dedicated test folder. The project structure looked like this.
Let's have a look at two modules located in two different functions.
/*global console, fetch*/
import jsdom from "jsdom";
import readability from "@mozilla/readability";
import sanitize from "./sanitizer.mjs";
import removeFootnotes from "./footnote-remover.mjs";

export const handler = async (url) => {
    console.trace(`entered article tokenizer. processing ${url}`);
    try {
        const res = await fetch(url);
        const html = await res.text();
        const doc = new jsdom.JSDOM(html);
        const reader = new readability.Readability(doc.window.document);
        const article = reader.parse();
        console.trace(`extracted article text: ${article.textContent}`);
        return sanitize(removeFootnotes(article.textContent));
    }
    catch (err) {
        console.error(err);
    }
};
And another one
/*global console*/
import jsdom from "jsdom";

const blacklist = ["youtube.com"];

const extractUrl = (url) => {
    const qIndex = url.indexOf('q=');
    const saIndex = url.indexOf('&sa=');
    return url.substring(qIndex + 2, saIndex);
};

const extractLinks = (data) => {
    console.trace(`extracting links from ${data}`);
    const links = [];
    const dom = new jsdom.JSDOM(data);
    const anchors = dom.window.document.querySelectorAll('a[data-ved]');
    anchors.forEach(a => {
        if (!a.href.startsWith('/search')) {
            links.push(extractUrl(a.href));
        }
    });
    console.trace(`returning ${links}`);
    return links.filter(_ => !containsBlacklistedItems(_));
};

const containsBlacklistedItems = (str) => {
    for (let i = 0; i < blacklist.length; i++) {
        if (str.includes(blacklist[i])) {
            return true;
        }
    }
    return false;
};

export default extractLinks;
As you might notice, although these modules belong to different functions, both of them rely on the same jsdom package. With a separate node_modules per function, this project structure duplicates dependencies across functions.
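Concretely, before the reorganization each function's package.json repeated the same entries. A sketch of the duplication (the second function's folder name and the elided versions are illustrative, not taken from the repo):

```
src/article-extractor/package.json  →  "dependencies": { "jsdom": "…", "@mozilla/readability": "…" }
src/link-extractor/package.json     →  "dependencies": { "jsdom": "…" }
```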
Another source of duplication is the deploy scripts. To deploy a single function, we need to complete the following steps.
- Restore dependencies
- Run linter
- Run tests
- Package function
- Deploy to AWS
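Before the reorganization, every function's workflow repeated these steps. A condensed sketch of such a per-function job (the lint and test commands and the per-function paths are my assumptions; only the package and deploy commands mirror the workflows shown later):

```yaml
steps:
  - uses: actions/checkout@v3
  # 1. Restore dependencies
  - run: cd ./src/article-extractor && npm ci
  # 2. Run linter (command is an assumption)
  - run: cd ./src/article-extractor && npx eslint .
  # 3. Run tests (command is an assumption)
  - run: cd ./src/article-extractor && npm test
  # 4. Package function
  - run: cd ./src/article-extractor && zip -r lambda.zip ./
  # 5. Deploy to AWS
  - run: cd ./src/article-extractor && aws lambda update-function-code --function-name=article-extractor --zip-file=fileb://lambda.zip
```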
To eliminate this duplication, we want to restore dependencies, run the linter, and run the tests once for all functions.
There is also a way to build and package all the dependencies at once: Lambda layers. This approach lets us put all the dependencies into a separate layer that our functions treat as a common runtime.
We’ll reorganize our repository to look like this.
A couple of things changed.
- We've merged the per-function node_modules folders into a single one at the root of src
- We've merged the test folders into a single one
- We've merged all the separate build workflows into a single build.yml
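Pieced together from the paths that appear in the workflows, the resulting layout looks roughly like this (the second function's folder name and the tests folder name are guesses on my part):

```
.github/
  workflows/
    build.yml
    modules-layer-deploy.yml
src/
  node_modules/          (single shared copy)
  package.json
  package-lock.json
  article-extractor/
  link-extractor/        (hypothetical name)
  tests/
```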
Apart from fixed import paths, the source code is largely untouched. Let's, however, take a closer look at the build and deploy actions, since there are a couple of interesting points worth highlighting.
First of all, let’s have a look at a separate action that deploys the layer.
name: Deploy Modules Layer
on:
  workflow_call:
    secrets:
      AWS_ACCESS_KEY_ID:
        required: true
      AWS_SECRET_ACCESS_KEY:
        required: true
jobs:
  layer:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [20.x]
        # See supported Node.js release schedule at https://nodejs.org/en/about/releases/
    steps:
      - uses: actions/checkout@v3
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
          cache-dependency-path: "./src/package-lock.json"
      - uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-central-1
      - run: cd ./src && npm ci
      - run: cd ./src && zip -r layer.zip node_modules
      - run: cd ./src && aws lambda publish-layer-version --layer-name poeme-concrete-modules --zip-file fileb://layer.zip
Nothing special is happening here apart from the final step, where we publish the zipped node_modules with aws lambda publish-layer-version. Now, let's jump to consuming the deployed layer when we deploy our functions.
name: Article Extractor Deploy
on:
  push:
    branches: [ "master" ]
jobs:
  layer:
    uses: ./.github/workflows/modules-layer-deploy.yml
    secrets: inherit
  lambda:
    runs-on: ubuntu-latest
    needs: layer
    strategy:
      matrix:
        node-version: [20.x]
        # See supported Node.js release schedule at https://nodejs.org/en/about/releases/
    steps:
      - uses: actions/checkout@v3
      - name: Use Node.js ${{ matrix.node-version }}
        uses: actions/setup-node@v3
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
          cache-dependency-path: "./src/package-lock.json"
      - uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: eu-central-1
      - run: cd ./src && npm ci
      - run: cd ./src/article-extractor && zip -r lambda1.zip ./
      - run: cd ./src/article-extractor && aws lambda update-function-code --function-name=article-extractor --zip-file=fileb://lambda1.zip
      - run: echo "layer-arn=$(aws lambda list-layer-versions --layer-name poeme-concrete-modules --region eu-central-1 --query 'LayerVersions[0].LayerVersionArn' --output text)" >> $GITHUB_ENV
      - run: aws lambda update-function-configuration --function-name=article-extractor --layers="${{ env.layer-arn }}"
Here, a couple of things are worth noting.
The first is how we rely on the layer deploy job.
jobs:
  layer:
    uses: ./.github/workflows/modules-layer-deploy.yml
    secrets: inherit
The thing that might be unobvious to newcomers is how we use secrets: inherit to pass secrets down to the layer deploy workflow. One might naturally assume that the called workflow reads secrets directly from GitHub's secret storage. This is not the case: a called reusable workflow sees no secrets unless the caller passes them, and secrets: inherit makes it inherit all of the calling workflow's secrets.
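If inheriting everything feels too broad, the caller can instead pass secrets to the reusable workflow explicitly, which also documents exactly which secrets it needs:

```yaml
jobs:
  layer:
    uses: ./.github/workflows/modules-layer-deploy.yml
    secrets:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
```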
Another important thing is forcing newly deployed functions to use the latest version of the published layer. We achieve this in two steps.
Step 1. Querying for the latest layer version and storing it inside an environment variable.
echo "layer-arn=$(aws lambda list-layer-versions --layer-name poeme-concrete-modules --region eu-central-1 --query 'LayerVersions[0].LayerVersionArn' --output text)" >> $GITHUB_ENV
Step 2. Using the stored value to update the function configuration.
aws lambda update-function-configuration --function-name=article-extractor --layers="${{ env.layer-arn }}"