Monday, May 30, 2011

Google App Engine: Deployment Ensuring Sustainable Evolution


I have used Google App Engine for Java since it was released. In the past I did a few smaller projects, like a Facebook application for behej.com backed by AppEngine. Recently I started to use AppEngine more seriously, so I had to define environments and procedures that are closer to those commonly used in enterprise environments.

I defined 3 environments:
  • Development
    • This is where I do the actual development of the application, apart from localhost. There are no restrictions on changing the code and data. Data might be corrupted (even intentionally) and/or deleted. The purpose of this environment is to enjoy the freedom of developing and testing AppEngine features and APIs online.
    • It is a separate AppEngine application deployed at a URL like http://foo-development.appspot.com
  • Pre-production
    • I use the pre-production instance to verify that a new version of the application (one that I plan to release) works with the production data and that the upgrade will be smooth and easy. The data is replicated from production to pre-production only from time to time, so there is always a slight difference. That is perfectly OK, because this instance is not supposed to be a backup. For the replication of the data I use the REST interface of my application.
    • Alternatively, you may consider, for example, incremental pull-style replication (the replica pulls data from the master) using transaction-backed tasks (copy only if the TX succeeds).
    • This is also a separate AppEngine application, deployed at a URL like http://projectname-preproduction.appspot.com
  • Production
    • This is where the production version of the application runs and where real users are working with the real data. 
    • This is obviously a separate AppEngine application, deployed at a URL like http://projectname.appspot.com
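Because each environment is a separate application, code can tell where it is running from the application ID alone. Here is a minimal sketch, assuming the IDs follow the suffixes in the URLs above. The `Environment` enum and `fromApplicationId` are hypothetical names, and the ID is passed in as a plain string (on AppEngine it could be read, for example, from `SystemProperty.applicationId`).

```java
/** Hypothetical helper: maps an application ID to one of the three environments. */
enum Environment {
    DEVELOPMENT, PREPRODUCTION, PRODUCTION;

    /**
     * Assumes the naming used in the URLs above: "-development" and
     * "-preproduction" suffixes; any other ID is treated as production.
     */
    static Environment fromApplicationId(String applicationId) {
        if (applicationId.endsWith("-development")) {
            return DEVELOPMENT;
        }
        if (applicationId.endsWith("-preproduction")) {
            return PREPRODUCTION;
        }
        return PRODUCTION;
    }
}
```

Treating every unrecognized ID as production is a deliberate choice in this sketch: destructive development-only features then stay switched off unless the ID explicitly says otherwise.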
There are a couple of AppEngine features that could be used to define these environments, for example namespaces. After comparing the pros and cons I decided to go with the option described above. It gives me the highest level of isolation and data separation, so it seems to be the most robust and safe solution. I also defined upgrade procedures that differ based on the type of the change and its impact:
  • Presumption
    • AppEngine supports multiple deployed versions of an application. Note that these versions share the same storage. Let me presume that the production application is deployed as version 1 and that this version is the default.
  • New version upload
    • Upload the new version of the application as version 2.
  • Test upload
    • Make the final check that the application works as expected by accessing and testing it at http://2.foo.appspot.com
  • Release
    • Make version 2 the default.
This is a zero-downtime upgrade. You don't have to define a new version identifier each time you deploy a new application version - I rotate just 3 versions as shown in the diagram above.
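The rotation of three version identifiers can be sketched as follows. `VersionRotation`, `next` and `testUrl` are hypothetical names for illustration only, and the test URL simply follows the http://2.foo.appspot.com pattern used above.

```java
/** Hypothetical helper for rotating a fixed pool of version identifiers. */
final class VersionRotation {
    private VersionRotation() {}

    /** Returns the version to upload next: 1 -> 2 -> 3 -> 1 for a pool of 3. */
    static int next(int currentDefault, int poolSize) {
        if (poolSize < 2 || currentDefault < 1 || currentDefault > poolSize) {
            throw new IllegalArgumentException("version must be within 1.." + poolSize);
        }
        return currentDefault % poolSize + 1;
    }

    /** URL for testing a non-default version, following the pattern above. */
    static String testUrl(int version, String applicationId) {
        return "http://" + version + "." + applicationId + ".appspot.com";
    }
}
```

With version 1 as the default, `VersionRotation.next(1, 3)` gives the slot to upload to, and `VersionRotation.testUrl(2, "foo")` gives the address for the final check before the switch.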

In case of a major release that includes a data model change (find the definition of a major release in the best practices paragraph below), a scheduled maintenance window might be needed:
  • Presumptions 
    • The same as above.
  • Disable the access 
    • Disable access to the application and display a maintenance info page to users.
    • AppEngine storage can also be switched to read-only mode from the administration console.
  • Backup
    • AppEngine provides no backup. Fortunately there are a couple of doable options:
      • Copy data using Datastore Admin (experimental)
        • With the Datastore Admin tool you can safely copy data between different applications. This tool can be used to migrate from the Master/Slave to the HRD datastore, but also for backup.
      • Export the repository using:
        • Bulk loader tool (yes, it works with Java).
        • Alternatively, your own backup service (for example REST-based) and tool, as I do.
      • Snapshots
        • Although AppEngine doesn't support snapshots (yet ;-) you can easily implement them yourself, for example by creating a set of "snapshot" variants of your persistence beans. By adding a 'snapshot' field to these beans you can even maintain multiple snapshots. Again, you can use namespaces in this case.
        • This is an in-place backup and/or transformation (within the scope of a single application).
    • The backup can include only the entities (and entity instances) impacted by the data model changes. Typically there is no need to dump the whole storage.
  • Transform (if needed)
    • Depending on the size of the repository, the transformation to the new data model schema can be performed:
      • Offline on the exported image, which requires a subsequent drop of all affected datastore tables and an import of the new image.
      • In place, by running a set of statements (export/drop/import not required).
  • Deploy new application code
    • The same steps as above - new version upload, test and release.
  • Enable the access
    • Switch off the maintenance info page and enable access to the application again.
The procedure described above gets pretty expensive as the amount of data grows. You basically have to pay for re-imaging the application (requests, bandwidth, read and write operations). Such an approach obviously becomes impractical once your storage grows beyond a small size.
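The snapshot option from the backup step can be illustrated with a small sketch. It is not real datastore code: `Note` is a hypothetical persistence bean with the extra 'snapshot' field, and `SnapshotStore` is an in-memory stand-in for the datastore, assumed here only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical persistence bean with an extra 'snapshot' field. */
final class Note {
    final long id;
    final String text;
    final String snapshot; // null marks the live entity

    Note(long id, String text, String snapshot) {
        this.id = id;
        this.text = text;
        this.snapshot = snapshot;
    }
}

/** In-memory stand-in for the datastore (an assumption of this sketch). */
final class SnapshotStore {
    private final List<Note> notes = new ArrayList<>();

    void put(Note note) {
        notes.add(note);
    }

    /** Copies every live entity into a snapshot with the given label. */
    int takeSnapshot(String label) {
        List<Note> copies = new ArrayList<>();
        for (Note n : notes) {
            if (n.snapshot == null) {
                copies.add(new Note(n.id, n.text, label));
            }
        }
        notes.addAll(copies);
        return copies.size();
    }

    /** Returns the entities that belong to the given snapshot label. */
    List<Note> inSnapshot(String label) {
        List<Note> result = new ArrayList<>();
        for (Note n : notes) {
            if (label.equals(n.snapshot)) {
                result.add(n);
            }
        }
        return result;
    }
}
```

Because the label is just a field value, several snapshots can coexist next to the live data. The price is that every query for live entities has to filter on a null 'snapshot'.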

Zero downtime could be achieved for example using 2 isolated applications, data replication and DNS reconfiguration. I'm not that far yet. 

Let me also mention data model change best practices. They enable in-place upgrades and work with any size of storage. They might be obvious, but they are worth a reminder. The motivation is to minimize the impact of the changes you make (e.g. to avoid maintenance windows):
  • Do not change field names
    • If there is a real need for such a change, create a field with the new name instead, deprecate the old one and define an administrative task that copies the data from the old field to the new field and/or sets a default value.
  • Do not change field data types
    • Use deprecation in the same way as above.
  • Do not delete fields 
    • Simply deprecate them.
  • Make the fields optional (if possible)
    • This guideline might be translated to: allow null values on the persistence tier, because it makes your upgrades doable on top of the AppEngine repository. For example, consider that you want to introduce a new field and compare boolean vs. Boolean. Entities stored before the change have no value for the new field, and a primitive boolean cannot hold null, so the datastore upgrade will fail in the first case.
  • Use major releases to purge your data model
    • A major application release is an opportunity to remove all deprecated fields, polish your data model and transform the data.
My application has been open to users for several months and I'm still learning new tricks and improving the procedures described above. The obvious motivation is to make sure that the application runs without any problems and outages.
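The guidelines above can be sketched on a single bean. `Account` and its fields are hypothetical, and the persistence annotations are left out so that the example stays plain Java. It only illustrates the deprecate-and-copy pattern and the nullable `Boolean`.

```java
/** Hypothetical bean after one round of changes following the guidelines above. */
final class Account {
    /** Old field, kept so that entities written by earlier versions still load. */
    @Deprecated
    String name;

    /** Replacement for 'name'; null until the administrative task has run. */
    String displayName;

    /** New optional field: Boolean (not boolean), so older entities can hold null. */
    Boolean newsletter;

    /** Treats a missing value as false, the default assumed in this sketch. */
    boolean wantsNewsletter() {
        return Boolean.TRUE.equals(newsletter);
    }

    /**
     * Body of the administrative task: copies the old field into the new one.
     * Returns true only if a value was actually copied.
     */
    boolean migrate() {
        if (displayName != null) {
            return false; // already migrated, nothing to do
        }
        displayName = name;
        return name != null;
    }
}
```

The task is safe to run repeatedly, which matters when it is executed in batches over a large store. The deprecated `name` field can then be dropped in the next major release.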
