Saturday, September 27, 2008

GQL queries in SQL, part 1

Today, I want to start looking a little bit at queries. This is one of the more complicated subjects, so it will probably take a couple of posts to get it right. Here is the first one...

In this post, I will focus on getting a better understanding on how queries are expressed internally. Queries are one of the most difficult things to get right in this whole coding project -- not because they are evil black voodoo magic, but because what we use most of the time (GQL) and what the lower level APIs actually digest and return are so far apart. When we enter a GQL query on a model class, a parser translates this query into an object structure. This structure then gets pushed down a couple of layers on the stack and ends up looking something like the following (sources were heavily cut down to only show content relevant for this post!):

class Query_Filter(ProtocolBuffer.ProtocolMessage):

_Operator_NAMES = {
1: "LESS_THAN",
2: "LESS_THAN_OR_EQUAL",
3: "GREATER_THAN",
4: "GREATER_THAN_OR_EQUAL",
5: "EQUAL",
6: "IN",
7: "EXISTS",
}

def __init__(self, contents=None):
self.op_ = 0
self.property_ = []


class Query_Order(ProtocolBuffer.ProtocolMessage):

_Direction_NAMES = {
1: "ASCENDING",
2: "DESCENDING",
}

def __init__(self, contents=None):
self.property_ = ""
self.direction_ = 1


class Query(ProtocolBuffer.ProtocolMessage):

_Plan_NAMES = {
1: "ORDER_FIRST",
2: "ANCESTOR_FIRST",
3: "FILTER_FIRST",
}

def __init__(self, contents=None):
self.app_ = ""
self.kind_ = ""
self.ancestor_ = None
self.filter_ = []
self.search_query_ = ""
self.order_ = []
self.hint_ = 0
self.offset_ = 0
self.limit_ = 0
self.composite_index_ = []
self.require_perfect_plan_ = 0



So, how does a generated query object tree exactly look like? Well, let's create a simple unit test and set a break point in our datastore's _Dynamic_RunQuery:

  def testSimpleQuery(self):
helpers.create_tables([TestModel()], self.connection)
model = TestModel(text='t1', number=13)
model.put()
TestModel.gql('WHERE text=:1 and number=:2 order by text desc',
't1', 13).fetch(5)



As we look into the variables in eclipse, we get confirmation that the incoming value really is a Query instance. Here his a rough approximation of the object structure:

kind:TestModel
limit:5
offset:0

order: list with one element:
Query_Order:
property: "text"
direction: 2

filter: list with 2 elements:
Query_Filter:
op: 5
property <
name: "text"
value <
stringValue: "t1"
>
multiple: false
>
Query_Filter:
op: 5
property <
name: "number"
value <
int64Value: 13
>
multiple: false
>



In summary:

  • Search conditions (the "WHERE" clause) are expressed as a chain of "filter" conditions.

  • Sorting conditions ("ORDER BY") are expressed as a chain of "order" conditions.

  • Table name (model kind), maximum amount of search results and initial offset are fields in the top-level query object.



There are probably a few details I missed, but that will hopefully be discovered by unit tests eventually. The next step is going to be to learn more about how query results actually look like. Stay tuned...

1 comments:

Todd Hoff said...

Cool. Thanx for looking behind the vale.