Sunday, December 7, 2008

Gotcha

Every now and then, one runs into something he or she would not have expected. This was recently the case for me, and since it turned out to be a typical case of RTFM, I thought I'd share it with the rest ;-)

I had not used entity groups a lot in the past, but I recently ran into a situation where they would have been useful (I wanted to write several objects in one transaction). Everything seemed to work just fine -- until I tried to retrieve the objects from the store. The following script reproduces what happened to me:

import os
from google.appengine.api import datastore_file_stub
from google.appengine.api import apiproxy_stub_map
from google.appengine.ext import db

class TestModel(db.Model):
    text = db.StringProperty()

# Set up an in-memory db store
os.environ['APPLICATION_ID'] = 'test'
stub = datastore_file_stub.DatastoreFileStub('test', None, None)
apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3', stub)

# Store a model without parent in the db
model1 = TestModel(key_name='the key', text='t1')
key = model1.put()
assert 't1' == TestModel.get(key).text
assert 't1' == TestModel.get_by_key_name('the key').text

# Store a second model with the same key_name,
# but with a parent
parent = TestModel().put()
model2 = TestModel(parent=parent, key_name='the key', text='t2')
key = model2.put()
assert 't2' == TestModel.get(key).text
assert 't2' == TestModel.get_by_key_name('the key').text,\
    'Text is %s' % TestModel.get_by_key_name('the key').text


The very last line of this script fails. Here is the error that it produces:

Traceback (most recent call last):
  File "keytest.py", line 27, in <module>
    'Text is %s' % TestModel.get_by_key_name('the key').text
AssertionError: Text is t1


What exactly had gone wrong? Why did the second model not overwrite the first one? The answer to this question can be found in the documentation:

The keys of two different entities can have similar parts as long as at least one part is different. For instance, two entities can have the same kind and name if they have different parents. Similarly, two entities can have the same parent (or no parent) and name if they are of different kinds.


In other words, just because I specify a key_name in a model's constructor does not mean that I can look this model up with get_by_key_name alone -- I also need to know the parent. In the script above, passing it explicitly, as in TestModel.get_by_key_name('the key', parent=parent), would return the second model. As this example shows, there can be several independent entities that all have 'the key' as their key name but are not the same entity. One can (and probably should) use the key name for fast lookup (it certainly beats GQL where available), but choosing to do so may limit our flexibility in partitioning the data using ancestors.
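To make this concrete, here is a toy model in plain Python of why the two entities do not collide. It is only a sketch and not the App Engine API: the names store, make_key, put and get_by_key_name below are hypothetical helpers, and the id 1 for the parent is made up. The one assumption it encodes is that an entity is identified by its complete key path, that is, the (kind, id_or_name) pairs from the root entity down to the entity itself.

```python
# Toy model of datastore keys: NOT the App Engine API, just plain Python.
# Assumption: a key is the full path of (kind, id_or_name) pairs, root first.

store = {}  # maps a complete key path (a tuple of pairs) to the stored text


def make_key(kind, name, parent=()):
    """Build a key path by appending (kind, name) to the parent's path."""
    return tuple(parent) + ((kind, name),)


def put(key, text):
    """Store text under the complete key path and return that path."""
    store[key] = text
    return key


def get_by_key_name(kind, name, parent=()):
    """Look up by name; without a parent this only finds root entities."""
    return store.get(make_key(kind, name, parent))


# A parent entity (hypothetical id 1), then a root entity and a child
# entity that share both kind and key name.
parent_key = put(make_key('TestModel', 1), None)
put(make_key('TestModel', 'the key'), 't1')
put(make_key('TestModel', 'the key', parent=parent_key), 't2')

root_hit = get_by_key_name('TestModel', 'the key')
child_hit = get_by_key_name('TestModel', 'the key', parent=parent_key)
```

Here root_hit is 't1' and child_hit is 't2': the second put does not overwrite the first because the two key paths differ in length, even though their last (kind, name) pair is identical.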

In retrospect, I probably should have known. App Engine is far from the first store that got me into trouble for using ids for more than they were intended for (in a "past life", I had an experience not unlike the one in this Hibernate article). I guess it's true: Those who do not remember their past are condemned to repeat their mistakes.

PS: This also means that the gae-sqlite project is going to need a schema overhaul at some point, since my store currently discards the entity group entirely...

1 comment:

Arachnid said...

Yup, a key (as opposed to a key name or ID) is an ordered list of (kind, id_or_name) tuples. You can look up an entity 'by name' or 'by id' only if you know all the ids or names in the path.

And as far as gae-sqlite goes, you can continue to throw away the entity group part of the protobuf, as long as you're retaining the entirety of the key. As it stands, the entity group will always be the first tuple of the key.