All right, time for the first installment of "getting-things-to-run-in-sqlite" ;-)
Please be patient if I have to change things I'm posting today further down the line -- I am implementing this while typing, so assumptions on how things work in app engine might be wrong and may need to be revised.
Preparations:
All righty then, let's get started. I have already set up my machine by installing the following things:
- python 2.5
- sqlite and python-sqlite library (pysqlite2)
- app engine
- eclipse
- pydev
For console-based execution, I am including all app engine libraries in my python path by setting PYTHONPATH to
:/home/jens/appengine/google_appengine:/home/jens/appengine/google_appengine/lib/yaml/lib:/home/jens/appengine/google_appengine/lib/webob:/home/jens/appengine/google_appengine/lib/django:
Next, I create a python project in eclipse and copy the sources from the first post into it (in other words, the file datastore_sqlite_stub.py).
Last but not least, since I have no clue about sqlite, I read a small tutorial.
First baby steps
In my first iteration, I would like to figure out how the low-level datastore put is supposed to work, but I have to start with some plumbing work. I create a new file with utility methods (helpers.py). The first method sets up a datastore with an in-memory sqlite instance:
from datastore_sqlite_stub import DatastoreSqliteStub
from google.appengine.api import apiproxy_stub_map
from pysqlite2 import dbapi2 as sqlite

def setup_sqlite(name=None):
    """Sets up an in-memory sqlite instance and connects the datastore to it.

    Args:
        name: the name of the instance to connect to,
            None or empty string for in-memory
    Returns:
        a sqlite connection object pointing to the database.
    """
    if name:
        connection = sqlite.connect(name)
    else:
        connection = sqlite.connect(':memory:')
        name = 'memory'
    stub = DatastoreSqliteStub(database_name=name, connection=connection)
    apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3', stub)
    return connection
For this to work, I need to make a small adjustment to the constructor's signature in the stub:
def __init__(self, database_name, connection=None):
    """Constructor.

    Initializes and loads the datastore from the backing files, if they exist.

    Args:
        database_name: the name of the sqlite instance
        connection: a pre-initialized connection for unit tests, optional
    """
    #TODO: initialize sqlite instance
    pass
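One way the TODO might eventually be filled in, as a minimal sketch rather than the actual stub: reuse the connection if one is passed in, otherwise open one by name. The class name SketchStub is hypothetical, and the sketch uses the standard-library sqlite3 module (same DB-API as pysqlite2) so that it runs without the App Engine SDK.

```python
import sqlite3


class SketchStub(object):
    """Hypothetical stand-in for DatastoreSqliteStub; not the real stub."""

    def __init__(self, database_name, connection=None):
        self.database_name = database_name
        if connection is None:
            # Assumption: with no connection handed in, open one by name.
            connection = sqlite3.connect(database_name)
        self.connection = connection


# A unit test can hand in an in-memory connection and keep a reference to it.
conn = sqlite3.connect(':memory:')
stub = SketchStub('memory', connection=conn)
```

Keeping the connection optional means production code only needs a database name, while tests can inspect the very same connection the stub writes to.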
Now, before I do anything else, I create a simple unit-test (file "unittests.py") that tries to write something to the database:
from google.appengine.ext import db
import helpers
import unittest

class TestModel(db.Model):
    text = db.StringProperty(default='some text')
    number = db.IntegerProperty(default=42)

class UnitTests(unittest.TestCase):

    def setUp(self):
        """Set up in-memory connection."""
        self.connection = helpers.setup_sqlite()

    def testWriteSingle(self):
        """Writes a single model to the database."""
        model = TestModel()
        model.put()

if __name__ == '__main__':
    unittest.main()
Now I put a breakpoint into the "_Dynamic_Put" method of my stub and run the test in the debugger. This will tell me what exactly the input and output values for a put are.
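Why "_Dynamic_Put" is the place to break: my understanding (an assumption, not something verified in this post) is that the API proxy finds the handler by name, looking up "_Dynamic_" plus the call name on the registered stub and passing it the request and response. The sketch below imitates that dispatch with hypothetical stand-ins (FakeStub, make_sync_call); it is not SDK code.

```python
class FakeStub(object):
    """Hypothetical stub that records what arrives in a put."""

    def __init__(self):
        self.calls = []

    def _Dynamic_Put(self, request, response):
        # A breakpoint (or a print) here shows the raw request and response.
        self.calls.append(('Put', request))
        response.append('key-for-%s' % request)


def make_sync_call(stub, call, request, response):
    """Imitates name-based dispatch: 'Put' -> stub._Dynamic_Put."""
    method = getattr(stub, '_Dynamic_' + call)
    method(request, response)


stub = FakeStub()
response = []
make_sync_call(stub, 'Put', 'some-entity', response)
```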
Digging through buffers
Turns out that the input into my method is a "PutRequest" object that looks something like this:
PutRequest: entity <
  key <
    app: ":self"
    path <
      Element {
        type: "TestModel"
        id: 0
      }
    >
  >
  entity_group <
  >
  property <
    name: "number"
    value <
      ...
The second parameter is a "PutResponse". A quick text search for PutRequest reveals that its source can be found in the SDK under google/appengine/datastore/datastore_pb.py. Turns out that both parameters are protocol buffers, Google's language-agnostic data structure (see http://code.google.com/p/protobuf/). Makes kind of sense -- if one aims to support more than one programming language and not reinvent the wheel, there has to be some point in the stack where all parts speak the same protocol. Unfortunately, that means the source code is machine generated, without a lot of documentation inside. Oh well, more fun for me :-)
The constructor of the request suggests that the object is mostly a collection of "entities" plus a transaction (and a list of composite indexes):
class PutRequest(ProtocolBuffer.ProtocolMessage):
    def __init__(self, contents=None):
        self.entity_ = []
        self.transaction_ = None
        self.composite_index_ = []
        self.has_transaction_ = 0
        self.lazy_init_lock_ = thread.allocate_lock()
        if contents is not None: self.MergeFromString(contents)
I'm not going to worry about the transaction for now (that can be part of a later post) and focus on the entities. The debugger tells me that an entity looks like this:
EntityProto: key <
  app: ":self"
  path <
    Element {
      type: "TestModel"
      id: 0
    }
  >
>
entity_group <
>
property <
  name: "number"
  value <
    int64Value: 42
  >
  multiple: false...
Another protocol buffer, eh? This one can be found in "entity_pb.py" in the same directory. Let's take a look at the constructor:
def __init__(self, contents=None):
    self.key_ = Reference()
    self.entity_group_ = Path()
    self.owner_ = None
    self.kind_ = 0
    self.kind_uri_ = ""
    self.property_ = []
    self.raw_property_ = []
    self.has_key_ = 0
    self.has_entity_group_ = 0
    self.has_owner_ = 0
    self.has_kind_ = 0
    self.has_kind_uri_ = 0
    self.lazy_init_lock_ = thread.allocate_lock()
    if contents is not None: self.MergeFromString(contents)
Scary stuff. Well, let's make life a little bit easier here and assume that everything that starts with "has_" simply records whether a particular field has been set or not (has_key_ is 0 if no key exists and so on). In that case, our entity seems to have the following content:
- a key object (a Reference-object, another protocol buffer)
- an entity_group (Path-object, protocol buffer)
- an owner (unknown)
- a kind (numeric)
- a kind_uri (string)
- a property and a raw_property (both lists of some sort?)
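The "has_" assumption can be pictured with a small hand-written stand-in. This is only a sketch of the pattern as I read it from the constructor, not the generated code; the class FakeEntity and its methods are hypothetical.

```python
class FakeEntity(object):
    """Hypothetical, hand-written imitation of the has_ flag pattern."""

    def __init__(self):
        self.kind_ = 0
        self.has_kind_ = 0

    def set_kind(self, value):
        # Setting a value also flips the presence flag.
        self.kind_ = value
        self.has_kind_ = 1

    def has_kind(self):
        return bool(self.has_kind_)


entity = FakeEntity()
unset = entity.has_kind()      # the default 0 means "not set", not "kind zero"
entity.set_kind(0)
explicit = entity.has_kind()   # now present, even though the value is still 0
```

The point of the separate flag is that a default value and an explicitly stored value can be told apart.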
In our concrete example (looking into the debugger), the object is populated as follows:
Key is
  Reference: app: ":self"
  path <
    Element {
      type: "TestModel"
      id: 0
    }
Entity group just seems to be the empty Path-object from the constructor.
Owner is not set.
Kind is not set.
kind_uri is not set.
Property contains two more protocol buffers:
  Property: name: "number"
  value <
    int64Value: 42
  >
  multiple: false
  Property: name: "text"
  value <
    stringValue: "some text"
  >
  multiple: false
raw_property is an empty list.
Ok, so in summary, it looks like each model is translated into an entity, and each property holds a value of one of the following types (I'm sparing the reader the digging through some more nested protocol buffers; check out class PropertyValue in entity_pb):
- int64
- boolean
- string
- double
- point (don't know what that means yet)
- user (don't know what that means yet)
- reference (don't know what that means yet)
So what about the output parameter? To get to the proper output of the method, I could replace my stub with the in-memory implementation and debug again, but I'm way too lazy for now. Since I have the files open anyway, I might as well just peek at the PutResponse and take an educated guess ;-) Here's the constructor:
def __init__(self, contents=None):
    self.key_ = []
    if contents is not None: self.MergeFromString(contents)
So, it seems that all the output contains is a collection of keys -- the primary keys of the objects stored in the db, to be precise. A little further digging reveals that these keys are Reference-objects, the same protocol buffer we have already seen before. Makes sense, considering that the put-method of our Model class returns the Key of the stored entity...
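Put together, the contract of a put might look roughly like the sketch below. It uses plain Python stand-ins instead of protocol buffers: an entity is a dict, a key is a (kind, id) tuple, and fake_put is a hypothetical name. It also assumes that "id: 0" in the request means that no id has been assigned yet, which I have not verified.

```python
import itertools

# Hypothetical id source; a real store would take ids from the database.
_next_id = itertools.count(1)


def fake_put(request_entities, response_keys):
    """Assigns an id to every entity that has none and reports the keys."""
    for entity in request_entities:
        if entity['id'] == 0:
            # Assumption: id 0 in the request means "no id assigned yet".
            entity['id'] = next(_next_id)
        response_keys.append((entity['kind'], entity['id']))


request = [{'kind': 'TestModel', 'id': 0, 'number': 42, 'text': 'some text'}]
response = []
fake_put(request, response)
```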
Next steps:
So, now I basically know what I have to do to implement a put in the database. First, I have to decide on how to model the different data types in a relational schema (as a matter of fact, I probably have to do some more digging for point, user and reference). Next, I will write a couple of more in-depth unit tests that validate the different edge cases (note: I do not need to implement get to make this work, if I use the sql-queries directly to test what data was written into the store). Then, I will work on the put-method until all the tests pass. Wish me luck!
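To make the plan a bit more concrete, here is one possible starting point, purely a sketch and not a decision: one table for entities and one for property values, with a column per value type. The table and column names are made up, only int64 and string are covered, and the standard-library sqlite3 module stands in for pysqlite2.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
# Hypothetical schema: one row per entity, one row per property value.
connection.executescript("""
    CREATE TABLE entities (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        kind TEXT NOT NULL
    );
    CREATE TABLE properties (
        entity_id    INTEGER NOT NULL REFERENCES entities(id),
        name         TEXT NOT NULL,
        int64_value  INTEGER,
        string_value TEXT
    );
""")

# Store the TestModel entity seen in the debugger.
cursor = connection.execute(
    "INSERT INTO entities (kind) VALUES (?)", ('TestModel',))
entity_id = cursor.lastrowid
connection.executemany(
    "INSERT INTO properties (entity_id, name, int64_value, string_value) "
    "VALUES (?, ?, ?, ?)",
    [(entity_id, 'number', 42, None),
     (entity_id, 'text', None, 'some text')])

# A unit test can then check the written rows with plain SQL, no get needed.
rows = connection.execute(
    "SELECT name, int64_value, string_value FROM properties "
    "WHERE entity_id = ? ORDER BY name", (entity_id,)).fetchall()
```

Querying the tables directly like this is what lets the put tests run before any get exists.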
PS:
I know my current posts do not syntax-highlight python code, and I would like to change that for the next post. Can anyone recommend a tool to better format the code? Please, do not recommend a Windows-based blogging editor, since I am using a Linux system...
2 comments:
I wrote up a two-parter on the internals of the datastore underneath the Model interface, dealing directly with the dynamic entities. You can find it here and maybe it's useful or maybe it's not.
I'm interested to see where you go with this.