Saturday, August 15, 2009

The good, the bad, and the lucky

I recently had to upgrade the schema of one of my App Engine applications. In order to add a new feature, every Bar entity needed a new property "foo" that my app could execute queries on. The data for foo was not new; it was a string to be computed from other properties of the model.

The new computation logic was part of the updated Bar code, so all it would take was a little program to load all entities and re-save them. Tasks like that can be easily done via the remote API by running a relatively straightforward script. Here is a simplified version of it; for a more detailed and stable example take a look at the rietveld project:

from google.appengine.ext import db

# Bar is the application's db.Model subclass (defined elsewhere).

last_key = None  # Last key already handled.
while True:

    # Create a query over all entities. Order by key,
    # so that one can get to
    # all elements in the store.
    q = Bar.all()
    q.order('__key__')

    # Skip entities already handled by a previous iteration.
    if last_key:
        q.filter('__key__ >', last_key)

    # Try to fetch 10 elements. End the loop if
    # nothing was found, since we're done.
    batch = q.fetch(10)
    if not batch:
        break

    # Rewrite to the store. Remember the last key for the
    # next run of the loop.
    keys = db.put(batch)
    last_key = keys[-1]


In summary, App Engine already had provisions that made it easy for me to achieve my task. That was good :-)

The bad


Since I did not want any downtime in my app, I decided to perform some kind of open-heart surgery: I uploaded the latest revision of my app to an inactive version number and would run the remote API against that version. This way, I could keep my original app running for the users and do the migration in the background. The last step would then simply be the push of a button, by which I mean flipping the main version of the app to the new code.

Little had I considered the potential implications of this approach. By the time I was ready to run the script, a few other changes had accumulated in my app. One of them was some "code cleanup", in which I eliminated a couple of properties that my models no longer needed. Other changes were experimental, and I needed to be able to undo them quickly if they proved problematic in the field. No worries, I thought, I could always downgrade to the previous code.

I uploaded the new code, connected my remote terminal, and started executing my script when it hit me: I would never be able to go back to my old version again! I had mixed new features with data migration, and one of the properties my old version relied on would be gone in the new code. Every time the script executed a batch of 10, I would lose that property on another 10 Bar entities. I could never go back if the new features did not work!

The lucky


Since there was no sense leaving the store in an inconsistent state, I allowed the script to finish its job. I then opened the developer console's data viewer to take a look at the damage done. To my great surprise, the old column was still there. Had the script failed to overwrite my entities? Apparently not, since the new column was also shown. What had happened?

Fortunately, since App Engine's API code is part of the Python SDK and thus open for anyone to inspect, the mystery was easily explained. The solution lies in the way entities (the lower-level data objects) are converted into models. Take a look at this excerpt from the db module:


  @classmethod
  def from_entity(cls, entity):
    """Converts the entity representation of this model to an instance.

    Converts datastore.Entity instance to an instance of cls.

    Args:
      entity: Entity loaded directly from datastore.

    Raises:
      KindError when cls is incorrect model for entity.
    """
    if cls.kind() != entity.kind():
      raise KindError('Class %s cannot handle kind \'%s\'' %
                      (repr(cls), entity.kind()))

    entity_values = cls._load_entity_values(entity)
    instance = cls(None, _from_entity=True, **entity_values)
    instance._entity = entity  # THIS IS WHAT SAVED ME!!!!!
    del instance._key_name
    return instance


When a new model is instantiated from an entity, the Model instance remembers the original entity by storing it in the _entity field. An entity is essentially a bag of properties that can be written to the store. Upon Model.put(), each model overwrites the properties in _entity with its newly computed values. As an interesting side effect, this means that a property that no longer exists will remain unchanged in the entity that is written back to the store.
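To make the mechanism concrete, here is a minimal, self-contained sketch that mimics this behaviour in plain Python. It does not use the SDK: the Entity and Model classes, the to_entity method, and the "legacy" property are simplified stand-ins I made up for illustration, so treat it as a toy model of the idea rather than the actual db implementation.

```python
class Entity(dict):
    """Toy stand-in for datastore.Entity: a bag of named properties."""


class Model(object):
    """Toy stand-in for db.Model; subclasses list their property names."""

    properties = ()  # names of the properties this model class declares

    def __init__(self, **values):
        self._entity = None  # only set when the instance came from the store
        for name in self.properties:
            setattr(self, name, values.get(name))

    @classmethod
    def from_entity(cls, entity):
        # Pass along only the values the current class declares ...
        known = dict((k, v) for k, v in entity.items() if k in cls.properties)
        instance = cls(**known)
        # ... but keep the complete original entity around.
        instance._entity = entity
        return instance

    def to_entity(self):
        # Reuse the remembered entity and overwrite only declared properties.
        entity = self._entity if self._entity is not None else Entity()
        for name in self.properties:
            entity[name] = getattr(self, name)
        return entity


class Bar(Model):
    # New schema: "legacy" was dropped from the class, "foo" was added.
    properties = ('name', 'foo')


stored = Entity(name='example', legacy='old value')  # written by the old code
bar = Bar.from_entity(stored)
bar.foo = bar.name.upper()  # hypothetical computation of the new property
resaved = bar.to_entity()
print(sorted(resaved.items()))
```

The re-saved entity still carries the "legacy" value even though the new Bar class knows nothing about it, which is the same effect I saw in the data viewer.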

What does that mean for users of the Model API? It means that as long as one does not bypass Model by using the lower-level datastore API directly, data in unused, hidden columns will be preserved in the store. Important data will not get purged unless one writes a script that explicitly does so. What did that mean for me? It meant I just got extremely lucky!

Lessons learned



  • Higher-level APIs can be a good thing! Among the most visited posts on this site is the series of articles on Hacking Google App Engine (mostly thanks to Alex Martelli and his talk at PyCon 2009). In those posts, I show how bypassing the Model API can provide leverage to do some very powerful stuff. My recent experience, however, also shows that APIs like Model are there for more reasons than just convenience: they can prevent users like myself from doing stupid things. I guess if the food at Google wasn't free anyway, I ought to send a box of bagels to the App Engine team :-).

  • Do not mix schema migration with new features. Had I limited the upload to the schema migration and changed no features, I would have been forced to keep things backward compatible. The problem I was facing only occurred because I did not isolate the new features from the changes I had to weave into the store.

  • Always back up your store before major surgery. While this sounds pretty self-evident, the fact that I did not use a tool like gae-bar before running the script shows how easily it can be forgotten. Part of the problem was certainly that I had not grasped the severity of my changes, but it showed me that I had better err on the side of caution next time. While the loss of the property would not have been catastrophic, it could have caused me a major headache in figuring out how to disable the experimental features in my new version. And while we're at it:

  • Make backup as easy as 1 2 3. Part of why I did not make a backup was that my app did not support it. While a "download/upload snapshot" button in the developer console would have been ideal, I could have achieved something similar myself in the app. Before I work on any new features for the app, this is an internal task I need to complete first. Users will never get to see it, but sometimes the absence of visible positive changes can still be better than the presence of new visible bugs or downtime!
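As a rough illustration of the "snapshot first" idea from the last two points, here is a small sketch in plain Python. It is not App Engine code: the entities are ordinary dicts, and take_snapshot, restore_snapshot, and the "legacy" property are hypothetical names chosen for this example. A real app would have to read the entities from the datastore and offer the resulting string for download.

```python
import json


def take_snapshot(entities):
    """Serialize all entities (plain dicts here) into one JSON string."""
    return json.dumps(entities, sort_keys=True)


def restore_snapshot(snapshot):
    """Rebuild the list of entities from a snapshot string."""
    return json.loads(snapshot)


# Toy "store": a list of dicts standing in for Bar entities.
store = [
    {'key': 'bar1', 'name': 'first', 'legacy': 'a'},
    {'key': 'bar2', 'name': 'second', 'legacy': 'b'},
]

snapshot = take_snapshot(store)  # 1: back up before surgery

for entity in store:  # 2: a migration that really does destroy data
    del entity['legacy']
    entity['foo'] = entity['name'].upper()

restored = restore_snapshot(snapshot)  # 3: roll back if needed
print(restored[0]['legacy'])
```

With a snapshot like this in hand, even a migration that does drop a property can be undone, instead of relying on luck.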

2 comments:

Anil B Pai said...

i want to filter age to get values between 6 and 14.. so i used query.filter('age>=',6).filter('age<=',14) .... Got errors :( :( pls help... I guess some syntax errors in this

The App Engine Fan said...

That doesn't quite seem related to this blog post, but let's see if I can help anyhow :-)

Do you have any other filters in your query, or are those the only ones? In other words, are you by any chance violating any of the restrictions outlined here: http://code.google.com/appengine/docs/java/datastore/queriesandindexes.html#Restrictions_on_Queries

Also, can you post what the error is?