26 Jul 2017

Transferring files with AWS Lambda

Following up on Philippe's excellent review of AWS Lambda, let's use it for a heavy-duty task: transferring files from Autodesk Data Management to another online storage and vice versa.

Why?

Transferring a big file requires a lot of bandwidth (i.e. internet connection). If the server that hosts the entire webapp is dimensioned to handle this transfer, it will most likely be underused most of the time, yet it will be very busy while transferring the big file, with a good chance of becoming unresponsive. With the AWS Lambda serverless approach, the webapp can be hosted on a "small" server and use elastic, scalable computing power ONLY when needed. Isn't that nice?

As per Philippe's article, we need AWS Lambda and AWS API Gateway. This post assumes both services are set up and focuses only on creating the Lambda.

Now we need 2 pieces. First, the webapp itself, which holds the 3-legged OAuth tokens for both Autodesk and the other storage. Second, the Lambda, which needs only the source and destination with the appropriate headers; that is actually a good way to keep it generic.

1. Webapp

This piece prepares the source and destination, each with its headers. For this sample, let's assume a transfer from Autodesk Data Management (BIM 360 Team & Docs, Fusion 360) to Google Drive. As Autodesk keeps the actual storage URL of the file on the VERSION endpoint, the source will look like:

var source = {
  url: version.relationships.storage.meta.link.href,
  method: "GET",
  headers: {
    'Authorization': 'Bearer ' + autodeskBearerToken
  },
  encoding: null // null makes Request treat the body as binary (Buffer), not a string
};

Now the destination on Google Drive is similar, but it requires a file ID and MIME type, which are easy to obtain with their SDK (here for NodeJS):

var destination = {
  url: 'https://www.googleapis.com/upload/drive/v2/files/' + googleFileId + '?uploadType=media',
  method: 'PUT',
  headers: {
    'Content-Type': newFileMimeType,
    'Authorization': 'Bearer ' + googleBearerToken
  }
};

From the above, we notice that (as with RESTful APIs in general) the download is a GET and the upload is a PUT. This sample function has the complete implementation for this route (Autodesk to Google Drive), and you'll find the other routes in the same project.
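To make the webapp side concrete, here is a minimal sketch that builds both objects in one place. The function name `buildAutodeskToGoogle` and its parameters are made up for illustration, and it is not the code from the sample project; it only combines the two snippets shown earlier.

```javascript
// Hypothetical helper (not part of the sample project): returns the source
// and destination for an Autodesk -> Google Drive transfer.
// `version` is assumed to be the "data" object of a Data Management version.
function buildAutodeskToGoogle(version, autodeskBearerToken,
                               googleFileId, newFileMimeType, googleBearerToken) {
  var source = {
    url: version.relationships.storage.meta.link.href,
    method: 'GET',
    headers: {
      'Authorization': 'Bearer ' + autodeskBearerToken
    },
    encoding: null
  };

  var destination = {
    url: 'https://www.googleapis.com/upload/drive/v2/files/' + googleFileId + '?uploadType=media',
    method: 'PUT',
    headers: {
      'Content-Type': newFileMimeType,
      'Authorization': 'Bearer ' + googleBearerToken
    }
  };

  // Plain data only, so the pair can be serialized as JSON and handed over.
  return { source: source, destination: destination };
}
```

Because the result is plain data, the Lambda never has to know which storage services are involved.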

And why not use the respective SDKs to handle download and upload? First, the SDKs are designed to download the entire file and then upload it, which requires some persistence (e.g. AWS S3), and we don't want to keep a temporary copy of the files. Second, why make this copy if we can stream it?

2. Lambda

The code of the function in NodeJS is quite simple with the Request package:

request(source).pipe(request(destination))

Really, that's the essence: download from the source and pipe to the destination! OK, you need at least some error checking and feedback when the task is done. Here is a more complete version and the index.js.
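As a rough idea of what that error checking could look like, here is a minimal sketch (not the code from the linked files). It assumes the Lambda receives the two objects as `event.source` and `event.destination`, which is an assumption about the API Gateway mapping, and `createHandler` is a made-up name. The Request package is passed in as a parameter so the handler can be tried with a stand-in.

```javascript
// Sketch only: `request` is injected instead of required at the top.
// Assumption: event.source and event.destination hold the two option
// objects prepared by the webapp.
function createHandler(request) {
  return function handler(event, context, callback) {
    var finished = false;
    function finish(err, result) {
      if (finished) return; // report back only once
      finished = true;
      callback(err, result);
    }

    var download = request(event.source);
    var upload = request(event.destination);

    // Network-level failures on either side end the transfer.
    download.on('error', finish);
    upload.on('error', finish);

    // Stop early if the source refuses the download, so an error page
    // is not uploaded as if it were the file.
    download.on('response', function (res) {
      if (res.statusCode < 200 || res.statusCode >= 300) {
        download.abort();
        upload.abort();
        finish(new Error('Download failed with status ' + res.statusCode));
      }
    });

    // The destination's response tells us whether the upload succeeded.
    upload.on('response', function (res) {
      if (res.statusCode >= 200 && res.statusCode < 300) {
        finish(null, { statusCode: res.statusCode });
      } else {
        finish(new Error('Upload failed with status ' + res.statusCode));
      }
    });

    // Same one-liner as above: stream from source to destination.
    download.pipe(upload);
  };
}

// In the deployed index.js (with the request package installed):
// exports.handler = createHandler(require('request'));
```

The `finished` flag matters because more than one event can fire for the same transfer (for example an abort followed by an error), and the Lambda callback should run only once.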

Done!

With this approach, the AWS Lambda is generic enough to transfer from any source to any destination. The API Gateway Authorization protects our API from malicious usage; we don't want to expose this to outsiders, right? The webapp delegates every file transfer and provides the user with progress feedback.
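The delegation step could be sketched as below. This is only an illustration under assumptions: `buildLambdaCall`, `endpointUrl` and `apiKey` are made-up names, the endpoint is assumed to accept a POST with a JSON body, and the API is assumed to be protected with an API Gateway API key sent in the `x-api-key` header (your authorization setup may differ).

```javascript
// Hypothetical helper: builds the options for the Request package to call
// the API Gateway endpoint that sits in front of the Lambda.
function buildLambdaCall(endpointUrl, apiKey, source, destination) {
  return {
    url: endpointUrl,
    method: 'POST',
    headers: {
      'x-api-key': apiKey // assumption: API key based protection
    },
    json: true, // Request serializes `body` as JSON and parses the reply
    body: {
      source: source,
      destination: destination
    }
  };
}

// Possible usage in the webapp (requires the request package):
// request(buildLambdaCall(url, key, source, destination), function (err, res, body) {
//   /* report success or failure to the user */
// });
```

The OAuth tokens travel inside `source` and `destination`, so this call should go over HTTPS only.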

The full source code of this sample is available here! Note there is one exception: Box requires a multipart upload, which will be improved in the future :-)

Enjoy!

The sample also works without AWS Lambda: just deploy it without specifying the respective environment variables.
