Migrating a Legacy App to Cloud Native — Part 5

Migrating a Legacy App to Cloud Native — Part 5

A storm in brewing — photo by Adam Fanello

This is part 5 in a series. If you haven’t been following it before now, here are the previous posts:

Now that I have added cloud storage via Amplify CLI, and configured it, it’s time to use the new storage in my app.

Preface

Generally this series is written as I use the tool and presented in the order that I created the content. There’s some editing from draft to publishing, but this series is about the journey and so I present it as such.

This time though, I’m coming back and adding this note before you get into the content of the post. This was absolutely the most challenging part of the migration so far, and I suspect it will remain so throughout the series. Roughly half of my time was spent reverse engineering Amplify and the Javascript SDK due to poor documentation and my frustration shows below. I want to state here that I really do like AWS, and I came into this really looking to succeed with Amplify. The engineers at AWS have created amazing tools and services that have transformed what it is to be a software developer over the past decade, making it possible to build products at scales and paces never before possible. So while I am frustrated with imperfection, I also acknowledge that without the amazing work at Amazon and AWS I’d probably still be back in part 1 building an authentication system. 😬

Be sure to read the conclusion too before getting scared off. Here we go…

Add Storage Support to Amplify App

The first step is to add the Amplify storage module to my Angular app as shown here. In anticipation of this need though, I already did that step when adding authentication. Moving on…

Basic Use — Not so Easy

The web app had been using Feathers’s client library to communicate with the Feathers backend. That must now be replaced with Amplify Storage. Fortunately, I designed the application pretty well by isolating all of the actual communication into a PersistenceService class. 🎉

The first thing to do is write something to storage, which leads to Storage.put(key, content, config). What are the config options? No single place tells you all of them. The Typescript declarations give nothing, only this section of the documentation tells you what may or may not be a complete list through a series of examples. There’s an API reference here, obviously generated from some Typescript and simply listing the config options as type any. 😕 The API for the get function is worse in a way, as it indicates that it returns “either a presigned url or the object”. Which one is a mystery perhaps controlled via the equally mysterious config?: any second parameter. Regardless, the Amplify Storage documentation states that get “retrieves a publicly accessible URL” so this would seem the most likely if less useful result. Perhaps so, but my earlier suspicion appeared confirmed when I tracked down the source code and found that the API documentation is not generated from the entirety of the code. The JsDoc proved more useful:

/**  
 * Get a presigned URL of the file or the object data when download:true  
 *  
 * @param {String} key - key of the object  
 * @param {Object} \[config\] - { level : private|protected|public, download: true|false }  
 * @return - A promise resolves to either a presigned url or the object  
 */  
public async get(key: string, config?): Promise<String | Object> {

Thank goodness for open source! The actual logic is in AWSS3Provider.ts the contents of which answered my next question regarding error handling. I see the Promise returns the raw response from the AWS-SDK’s s3.getObject. So tracking that API down reveals only that the reject data type is Error, as generic as it gets. 😕 Also, only here do I learn that upon choosing to download, the resulting Object will have a property named Body of “Typed Array”… whatever that is. Through experimentation (i.e. console.log), I found that the body is Uint8Array, and a toString(‘UTF-8’) on that gives back the string that I stored. (I'm generally under the opinion that any API with anything that amounts to a getter and setter aught to return a value in the same form it was set.)

The situation is even worse again for Storage.list(). The example here simply indicates that it returns a result. That’s it. It returns something. Once again I traced it down to the SDK s3.listObjects documentation, but this time the actual results did not even match what is there and thus showing that AWS’s documentation problem extends beyond Amplify. (FYI: The actual returned result is just the Contents array, and its fields start with lower-case, not the documented upper-case.)

I can probably dig further into the source, but this exercise has reached a level of absurdity. It is a failure of an API, and a sign of unmaintainable code, when anything past the documentation and function signature needs to be read by someone trying to use it. The point of Amplify is to make using AWS easy. Its storage module turned out to be just another layer of mystery to sleuth my way through. It’s particularly a miss here given that the Amplify source itself has a bit more documentation sitting there ready to provide help, but it is hidden away.

Here are highlight snippets of what actually worked with the Amplify Storage API. The full code can be found in PR #4.

import {AmplifyService} from 'aws-amplify-angular';  
import {StorageClass} from 'aws-amplify';

@Injectable()  
export class PersistenceService {  
  /** API to the cloud storage */
  private readonly cloud: StorageClass;

 constructor(private readonly amplifySvc: AmplifyService) {  
    this.cloud = this.amplifySvc.storage();  
  }

  async loadUser(): Promise<UserSettings> {  
    const downloadedObj = await this.cloud.get(  
      settingsKey,   
      {level: 'private', download: true}  
    );  
    const downloadedStr = (downloadedObj as any).Body.toString('utf-8');
    const downloadedJson = JSON.parse(downloadedStr) as UserSettingsJSON;  
    // etc  
  }

  private async saveModelToCloud<T extends AbstractStorableModel>(model: T, id: string, level: 'private'|'protected'): Promise<T> {  
    let json = model.toJSON() as AbstractStorableModelJSON;  
   await this.cloud.put(  
      id, JSON.stringify(json),  
      {level, contentType: "application/json"}  
    );
  }  
}

The good news is, I now have a file in my storage S3 bucket! 🎉

When sub is not sub

In part 4 of this series I explored the storage policies and found the policy secure the S3 storage path "private/${cognito-identity.amazonaws.com:sub}/" in the CloudFormation.

Once I stored a file though, I saw it actually create the S3 prefix “private/us-west-2:ecee2298-97c5-4331-867c-908eef1660c8/”, while by CognitoUser has sub "508903f1-9203-4cf6-b0d8-353fc54c2916". 🤔 Why aren’t these matching?

After a good bit of digging, I finally noticed that the first part of the CloudFormation says “cognito-identity”. This is the Cognito Identity Pool. Meanwhile, the CognitoUser sub is from the Cognito User Pool. Two different things. Whoops. This is a gotcha and it is talked about at length in Amplify CLI issue #1847 and Amplify JS issue #54, where some commenters have gone a bit bizerk with complicated workarounds. It’s a surprise, but a solvable one with just a couple lines of code. Simply, I call AuthService#currentUserInfo() and use the ID from this (Identity Pool ID) instead of the value from AuthService#authStateChange$. The code changes can be seen in this commit.

Storage NoSQL Option?

In digging through Amplify to discover how to use the Storage module, I learned that it is a thin wrapper over the AWS SDK and its behavior is tightly coupled to S3. What would happen if I had chosen the NoSQL (DynamoDB) option when doing the amplify add storage command? Would the same APIs provide entirely different results? I looked back in the amplify-js storage source code, and found only the S3 provider. 😲

Then I took a peek at the iOS and Android documentation. Like the Javascript documentation, they say to choose the “Content” option. (I missed that remark in my earlier readings.) I suppose the team writing the Angular CLI is simply ahead of the client library teams. I’m glad I choose the S3 option when contemplating this during the architecting phase of the project!

Conclusion

While the Amplify CLI makes setting up AWS infrastructure easier, and the Amplify Authentication module makes managing users really easy, the Storage module has been a bust. Given that I had to dig down to the underlying SDK to figure out how to use any of it, it would have been easier to just use that underlying SDK. It is far from a lost cause though. The capabilities are there; redemption only requires proper API documentation! Proper use of Typescript would also go a long way. (Avoid type any.)

This is unfortunately a too common problem with developer-driven products. We developers like to make features, not documentation. We make the feature to the “works for me” point, call it complete, and move on to the next shiny thing. Few of us like writing for humans. Even when there is documentation (and Amplify does have a lot of documentation written) it turns out to be more fluff. “Look at this shiny new feature! Use it to do great things!” When you actually try to use it though, we get things like this:

The list function takes two parameters of something, and returns a something. 🤷‍♂️ This is not an AWS problem, it’s an industry problem that I’ve seen throughout my career. Perhaps part of the reason open source has become so dominant isn’t economic, but rather because given a choice between an undocumented proprietary library and an undocumented open source library, we use the only one that can be used.

Coming next time…

I need to spend a bit more time testing my app and making sure this storage is working, so there may be a bit of delay before I can move on. The next part shall involve using Amplify CLI to setup the very small GraphQL API so that it will generate the DynamoDB table. To get data into that table though, I’ll be monitoring the S3 data bucket for changes and storing the meta data into DynamoDB. As seen in my architectural diagram (end of part 2), I may need to step outside of Amplify for this. More to come!

(Story originally published here in September 2019.)