Sunday, September 14, 2008

AppEngine and SQLite: getting values out of the database

Today, we are going to focus on implementing the _Dynamic_Get method of our SQLite-based datastore implementation (well, at least for the few data types we currently support). At the end of the article, I will also briefly address the what-happens-to-this-code-once-it's-done question I have been asked a few times.

How does a get work?


Let's take a look at the implementation from the file-based datastore:

  def _Dynamic_Get(self, get_request, get_response):
for key in get_request.key_list():
app = self.ResolveAppId(key.app())
key.set_app(app)
last_path = key.path().element_list()[-1]

group = get_response.add_entity()
try:
entity = self.__entities[app, last_path.type()][key]
except KeyError:
entity = None

if entity:
group.mutable_entity().CopyFrom(entity)



Apparently, the request-protocolbuffer contains a list of keys. The method iterates through these keys and does a lookup for each of them in the in-memory datastructure. If the key points to an existing protocol buffer, the result is copied into a list of entities contained in the response-protocolbuffer. Otherwise, no value (an empty group) is added. Since None does not contain any particular information about what key it belongs to, it has to be assumed that the order in which the elements are added to the response must correspond to the order of the keys.

Prep-work


This is test driven development, so let's go ahead and write a unit test first. Come to think of it, let's make it four:

  def testGetSingleElement(self):
"""Gets a single model from the datastore."""
helpers.create_tables([TestModel()], self.connection)
model1 = TestModel(number=1)
model2 = TestModel(number=2, text='#2')
model3 = TestModel(number=3)
key1 = model1.put()
key2 = model2.put()
key3 = model3.put()
fetched = TestModel.get(key2)
self.assertEquals(2, fetched.number)
self.assertEquals('#2', fetched.text)

def testGetSingleElementByCustomKey(self):
"""Gets a single model from the datastore with a string key."""
helpers.create_tables([TestModel()], self.connection)
model1 = TestModel(number=1)
model2 = TestModel(key_name='custom', number=2, text='#2')
model3 = TestModel(number=3)
key1 = model1.put()
key2 = model2.put()
key3 = model3.put()
fetched = TestModel.get_by_key_name('custom')
self.assertEquals(2, fetched.number)
self.assertEquals('#2', fetched.text)


def testGetMultipleElements(self):
"""Gets a several models from the datastore."""
helpers.create_tables([TestModel()], self.connection)
model1 = TestModel(number=1)
model2 = TestModel(number=2)
model3 = TestModel(number=3)
key1 = model1.put()
key2 = model2.put()
key3 = model3.put()
fetched = TestModel.get([key2, key1])
self.assertEquals(2, len(fetched))
self.assertEquals(2, fetched[0].number)
self.assertEquals(1, fetched[1].number)

def testGetOrInsert(self):
"""tests if get_or_insert works correctly"""
helpers.create_tables([TestModel()], self.connection)
model1 = TestModel(number=1)
model1.put()
model2 = TestModel.get_or_insert('foo', number=13, text='t')
self.assertEquals(13, model2.number)
fetched = TestModel.get_by_key_name('foo')
self.assertEquals(13, fetched)



Right now, all those tests fail miserably, but that was to be expected. However, by the end of this article, most of them should work!


Implementation


Let's take the original datastore method and convert it into SQLite.

Warning: this is a naiive and non-performant approach that should be improved in a later version! (I will address this issue a little bit later in the article).

As you might remember from the last article, every model class (or entity kind to be precise) is represented by a separate table in the SQL database. The name of the table equals the entity's kind. Every property is represented by a column, its type being "baked" into its name (a string property named foo would thus be represented as string_foo). Primary keys are either stored in a column called pk_int (for integer primary keys) or pk_string (for custom keys). In other words, the equivalent to the implementation above would be to iterate through all the primary keys, load their values into memory and translate them into an entity protocol buffer. The query that we use to fetch the data depends on the primary key: if it is a "name" (custome string key), query on pk_string, otherwise on pk_int. The following code accomplishes that goal:

  def _Dynamic_Get(self, get_request, get_response):
# Fetch a cursor from the connection
cursor = self.connection.cursor()

for key in get_request.key_list():

# Populate an entity with a clone of the key
entity = entity_pb.EntityProto()
result_key = entity.mutable_key()
result_key.CopyFrom(key)
result_key.set_app('sql-app')
key = result_key
group = get_response.add_entity()

# Build and execute the SQL query
tablename = key.path().element_list()[0].type()
last_path = key.path().element_list()[-1]
if last_path.has_name():
query = 'SELECT * FROM %s WHERE pk_string=?' % tablename
param = str(last_path.name())
else:
query = 'SELECT * FROM %s WHERE pk_int=?' % tablename
param = long(last_path.id())
cursor.execute(query, [param])
data = cursor.fetchone()
if not data: # nothing found
continue
keys = [metadata[0] for metadata in cursor.description]

# Convert the data from the cursor into a map of values
keyvals = dict([(keys[i], data[i]) for i in range(len(keys))])
keyvals.pop('pk_string')
keyvals.pop('pk_int')

# Populate the entity and store it in the response
dictToEntity(keyvals, entity)
entity.mutable_entity_group().CopyFrom(key.path())
group.mutable_entity().CopyFrom(entity)

# TODO: do I need to close the cursor? Does anyone know?


Before I talk about the missing method dictToEntity, let me explain why this is not a very efficient implementation. Suppose you call this method with two-hundred primary keys, each of them for the same entity kind. In this case, this implementation would fire two-hundred queries to the database. However, the same result could easily be achieved with a single query like SELECT * FROM modelName where pk_string='name1' or pk_string='name2'.... A better performing implementation would consider this accordingly.

Now, without further ado, the code that maps the values from the database into the protocol buffer:

def dictToEntity(values, pb):
"""Helper, transfers values from a dictionary into a protocol buffer
This method will only deal with the properties of the entity,
not its primary key.

Args:
values: a dctionary of values
pb: the entity protocol buffer
"""
for key, value in values.items():

# Split up the key (like int64_name) into segments
i = key.find('_')
if i < 1:
continue
p1 = key[0:i]
p2 = key[i+1:]

# Case: type integer
if p1 == 'int64':
prop = pb.add_property()
prop.set_name(p2)
prop.set_multiple(False)
prop.mutable_value().set_int64value(long(value))

# Case: type string
elif p1 == 'string':
prop = pb.add_property()
prop.set_name(p2)
prop.set_multiple(False)
prop.mutable_value().set_stringvalue(value)


What's next


Well, now that we have implemented the get-method, let's take a look at the results of our unit tests:

Importing test modules ... done.

Checks if our TestModel can be converted into a valid SQL table. ... ok
Gets a several models from the datastore. ... ok
tests if get_or_insert works correctly ... FAIL
Gets a single model from the datastore. ... ok
Gets a single model from the datastore with a string key. ... ok
Writes a value into the database twice. ... ok
Writes a single model to the database retrieves it. ... ok

======================================================================
FAIL: tests if get_or_insert works correctly
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/jens/appengine/sqlite/src/unittests.py", line 102, in testGetOrInsert
model2 = TestModel.get_or_insert('foo', number=13, text='t')
File "/home/jens/appengine/google_appengine/google/appengine/ext/db/__init__.py", line 862, in get_or_insert
return run_in_transaction(txn)
File "/home/jens/appengine/google_appengine/google/appengine/api/datastore.py", line 1393, in RunInTransaction
result = function(*args, **kwargs)
File "/home/jens/appengine/google_appengine/google/appengine/ext/db/__init__.py", line 857, in txn
entity = cls.get_by_key_name(key_name, parent=kwds.get('parent'))
File "/home/jens/appengine/google_appengine/google/appengine/ext/db/__init__.py", line 779, in get_by_key_name
return get(*keys)
File "/home/jens/appengine/google_appengine/google/appengine/ext/db/__init__.py", line 974, in get
entities = datastore.Get(keys)
File "/home/jens/appengine/google_appengine/google/appengine/api/datastore.py", line 209, in Get
_MaybeSetupTransaction(req, keys[0])
File "/home/jens/appengine/google_appengine/google/appengine/api/datastore.py", line 1468, in _MaybeSetupTransaction
tx.handle)
File "/home/jens/appengine/google_appengine/google/appengine/api/apiproxy_stub_map.py", line 46, in MakeSyncCall
stub.MakeSyncCall(service, call, request, response)
File "/home/jens/appengine/sqlite/src/datastore_sqlite_stub.py", line 91, in MakeSyncCall
assert response.IsInitialized(explanation), explanation
AssertionError: ['Required field: handle not set.']

----------------------------------------------------------------------
Ran 7 tests in 0.045s

FAILED (failures=1)



Most of it looks great -- hurray :-) Only problem is that the get_or_insert method isn't quite doing its job yet. Turns out that this method requires transactions to work. Sounds like a fun project for our next article...

Parting words for the week.


I would like to thank Dirk for pointing me to Pygments for sourcecode highlighting. It looks like a good choice, and I will give it a try in the next blog post!

I have been asked if I am going to make the source of this project available in its entirety. The answer is yes, but only once it's done. The world is flooded with too many half-baked projects, so I'd rather wait until I am further along. Once that happens, I will do one of the following things:


  • I will dump the code somewhere for download under Apache license and never look back.

  • I will have found an open source project where this is a good fit and that has a license compatible with Apache and donate the code.

  • I will have found a couple of passionate developer who would like to make this code compatible with more databases and start it as a standalone project.



Either way, this is not going away -- don't worry. Just give me a few more weeks to actually implement something worth releasing :-)

2 comments:

Bill said...

Great blog. If you are in the SF bay area, would you be interested in speaking at an App Engine developer meetup? You can wear a hood and voice scrambler to remain anonymous :)

The App Engine Fan said...

> You can wear a hood and voice
> scrambler to remain anonymous :)

:-D
That sounds like fun, but this blog is not a cloak and dagger operation. I'm not handing out my email to avoid being spammed, but that's pretty much it with regards to secrecy. My name is Jens, I'm a Googler (not in the core App Engine team though), and I've been living in the bay area (Santa Clara) since summer 2007.

As far as the meetings are concerned; I'm aware of them but I think there are people with more to say than me. What subject did you have in mind?

Btw, are you talking about these meetups?
http://web.meetup.com/116/