Sunday, February 22, 2009

Higher level hooks

I recently got a mail from a fellow App Engine enthusiast regarding my series on using hooks in Google App Engine. He had played with the low-level hooks, and they had worked for him in some situations. However, he also needed model-specific hooks; higher-level functions that he could use only on a specific class of models. Common use cases would be additional validation before a "foo"-Model is saved, or updating counters specific to only one Model class with special properties. While this was achievable on the protocol buffer level, it was a little bit cumbersome and not overly developer friendly (my own interpretation here, he did not write it like that).

I fully agree that there has to be a better way, so let's talk about it in this post. My fellow developer is right: lower-level hooks are a tool for advanced hacking and should only be used when everything else fails. For most use cases, working with the higher-level APIs that the App Engine team gave us should be just fine. If it is mostly about checking a particular property, providing a custom validator for that field should be enough (see this previous article from my blog). In this post, let's take a shot at a different approach. First, we build a very simple script that introduces a new class, HookedModel, and a simple concrete example that uses it:

# Setup code, not needed in a real App Engine app
from google.appengine.api import apiproxy_stub_map
from google.appengine.api import datastore_file_stub
import os
os.environ['APPLICATION_ID'] = 'test'
stub = datastore_file_stub.DatastoreFileStub('test', None, None)
apiproxy_stub_map.apiproxy.RegisterStub('datastore_v3', stub)

# A Model with pre-write/post-read hooks
from google.appengine.ext import db

class HookedModel(db.Model):
  """A subclass of model that provides hooks for extra checks."""

  def pre_write(self):
    """Called before a model is written to the store."""
    pass

  def post_read(self):
    """Called after a model is read from the store."""
    pass


class TestModel(HookedModel):
  """A model that uses hooks:
  Upon save/load, it prints out the content of a property.
  """
  text = db.StringProperty(default='some text')

  def pre_write(self):
    print 'Writing %s' % self.text

  def post_read(self):
    print 'Reading %s' % self.text


save_this = TestModel()
key = save_this.put()

load_this = TestModel.get(key)
print 'Done :-)'


Our new Model class has two hooks, pre_write and post_read. Our subclass TestModel provides actual implementations. Right now, there is nothing that would trigger these hooks actually being used, so the only output we see on the screen is this:

Done :-)


So, how can we change that? As mentioned before, the Model framework is actually the highest layer on a stack of APIs. When saving a model to the store, the Model class has to actually translate the object into the lower-level data format. This operation is done in a method called _populate_internal_entity, which we will override in our HookedModel:

  def _populate_internal_entity(self, *args, **kwds):
    """Introduces hooks into the entity storing process."""
    self.pre_write()
    return db.Model._populate_internal_entity(self, *args, **kwds)


If we run our script again, we see that we have made some progress:

Writing some text
Done :-)


Now for the second hook. My first instinct was to replace the class method from_entity, but that proved to be beyond my Python skills. The problem was in a check that the original method did:

  @classmethod
  def from_entity(cls, entity):
    if cls.kind() != entity.kind():
      raise KindError('Class %s cannot handle kind \'%s\'' %
                      (repr(cls), entity.kind()))

    entity_values = cls._load_entity_values(entity)
    instance = cls(None, _from_entity=True, **entity_values)
    instance._entity = entity
    del instance._key_name
    return instance


Since I did not want to replicate code, my replacement method would simply call Model.from_entity(entity) -- which would result in a KindError being raised. Fortunately, a second look at the implementation revealed that there was an easier way: when constructing the new model instance, from_entity sets an argument called _from_entity to True when calling the constructor. _from_entity is "intentionally undocumented" in the Model constructor, but if I had to take an educated guess, I would assume that it is set to True whenever a Model is constructed from the lower-level Entity object in the API stack. Based on this assumption, we should be able to create our own custom constructor of HookedModel:

  def __init__(self, *args, **kwds):
    """Introduces hooks into the entity loading process."""
    db.Model.__init__(self, *args, **kwds)
    if kwds.get('_from_entity', False):
      self.post_read()


If we run our test another time, we finally get the intended result:

Writing some text
Reading some text
Done :-)
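Coming back to the from_entity attempt for a moment: the KindError presumably came from calling the method on the Model class directly, which binds cls to Model instead of the subclass. Below is a minimal sketch of one possible alternative. It uses stand-in classes only: KindError, FakeEntity, BaseModel and NoteModel are hypothetical and merely mimic the kind check shown above, and I have not tried this against the real SDK. The point is that calling the parent implementation through super() keeps cls bound to the subclass, so the kind check passes and the hook can run afterwards.

```python
class KindError(Exception):
    """Stand-in for the datastore's KindError."""


class FakeEntity(object):
    """Hypothetical stand-in for a low-level datastore entity."""

    def __init__(self, kind, values):
        self._kind = kind
        self.values = values

    def kind(self):
        return self._kind


class BaseModel(object):
    """Hypothetical stand-in for db.Model; it only mimics the kind check."""

    def __init__(self, **values):
        for name, value in values.items():
            setattr(self, name, value)

    @classmethod
    def kind(cls):
        return cls.__name__

    @classmethod
    def from_entity(cls, entity):
        if cls.kind() != entity.kind():
            raise KindError('Class %r cannot handle kind %r' %
                            (cls, entity.kind()))
        return cls(**entity.values)


class HookedModel(BaseModel):
    def post_read(self):
        """Called after a model is read from the store."""

    @classmethod
    def from_entity(cls, entity):
        # super() keeps cls bound to the subclass, so the kind check passes.
        instance = super(HookedModel, cls).from_entity(entity)
        instance.post_read()
        return instance


class NoteModel(HookedModel):
    def post_read(self):
        self.was_read = True


entity = FakeEntity('NoteModel', {'text': 'some text'})
loaded = NoteModel.from_entity(entity)
print(loaded.text)
```

Whether the real from_entity can be wrapped this way depends on SDK internals that the stand-ins do not model, so the constructor-based approach above remains the one that was actually run.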


Let's take a quick look at the final HookedModel:

class HookedModel(db.Model):
  """A subclass of model that provides hooks for extra checks."""

  def pre_write(self):
    """Called before a model is written to the store."""
    pass

  def post_read(self):
    """Called after a model is read from the store."""
    pass

  def _populate_internal_entity(self, *args, **kwds):
    """Introduces hooks into the entity storing process."""
    self.pre_write()
    return db.Model._populate_internal_entity(self, *args, **kwds)

  def __init__(self, *args, **kwds):
    """Introduces hooks into the entity loading process."""
    db.Model.__init__(self, *args, **kwds)
    if kwds.get('_from_entity', False):
      self.post_read()
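To illustrate the validation use case from the beginning of this post, here is a minimal sketch that runs without the App Engine SDK. FakeModel is a hypothetical stand-in whose put() simply calls _populate_internal_entity first, which is how the text above describes the real save path; Invoice and its amount field are invented for the example.

```python
class FakeModel(object):
    """Hypothetical stand-in for db.Model: put() builds the entity first."""

    def __init__(self, **values):
        self.__dict__.update(values)

    def _populate_internal_entity(self):
        # The real method translates the model into a low-level entity;
        # here a plain dictionary plays that role.
        return dict(self.__dict__)

    def put(self):
        # A real put() would store the entity and return its key.
        return self._populate_internal_entity()


class HookedModel(FakeModel):
    def pre_write(self):
        """Called before a model is written to the store."""

    def _populate_internal_entity(self, *args, **kwds):
        self.pre_write()
        return FakeModel._populate_internal_entity(self, *args, **kwds)


class Invoice(HookedModel):
    def pre_write(self):
        # Model-specific validation: refuse to store negative amounts.
        if self.amount < 0:
            raise ValueError('amount must not be negative: %r' % self.amount)


stored = Invoice(amount=10).put()
print(stored)

try:
    Invoice(amount=-5).put()
except ValueError as error:
    print('Rejected: %s' % error)
```

Since the hook raises before the entity is built, nothing reaches the store in this sketch when validation fails.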


Not only have we added a generic way of executing code before writing to (or after reading from) the store, we have done it using standard object-oriented techniques. No need to hack or monkeypatch! This shows again what a terrific job the App Engine team has done in providing an extensible set of base classes that we as end users can customize and make useful in ways we see fit. Doing this is important but not always easy (see this talk by Joshua Bloch for some of the finer philosophical details). It is well worth it though, and I think it will help keep the user community happy and inspire us to come up with many new cool ways of putting App Engine to use.

Friday, February 13, 2009

Blub paradox

In my last post, I mentioned some of the pitfalls I personally ran into when I started working with App Engine. Being a Python newbie didn't really help either ;-) The one thing, however, that I probably should have stressed a little more in all of this is: it gets better over time. As I keep working on Python-based projects, familiarity and coding speed increase. Having a great community of developers available on forums doesn't hurt either, plus the App Engine team keeps cranking out great usability improvements (just look at the recently released SDK 1.1.9 and you know what I mean). Not only that -- they actually listen to the community and do whatever they can to identify and address common pain points. It is fun to use a platform that is well supported, has a huge fan base and still keeps getting better! Having said that, I need to find something else to complement my App Engine skills, and here is why:

While all these upcoming new and exciting features are really nice, I would like to take a step back today and focus a bit more on one of the basic issues that a developer may run into when switching languages and frameworks: the Blub paradox. Blub, a fictional language discussed in a classic Paul Graham essay, represents a language XYZ that a programmer is used to working in. For me, this has been Java in the past, but it could be pretty much anything else. The Blub paradox states that programmers are "satisfied with whatever language they happen to use, because it dictates the way they think about programs." As a corollary, when having to work in a different language, programmers tend to mentally code in what worked best in "Blub" instead of what works best in the new language. It's like trying to speak Spanish by translating English sentences word by word: something will come out of it, but it is not really ideal.

I recently ran into the paradox myself when I gave bad advice to a fellow developer who pinged me with a question:


Which would you say would be preferred way of implementing the following.

I want to instantiate a subclass of db.Model from a dictionary of values, and I also want to make sure that an instance has valid values for a subset of required fields before saving the object. Should I:

1) Have an __init__ method on the subclass that accepts a dictionary (assuming that is allowed) and have the property fields declared as required=true and set the values from the dictionary in the constructor, or

2) As is usually true for Java for persistent classes, allow for zero-argument constructors, and implement a static create() method on the class to create the object from the dictionary and a static add() method on the db.model subclass that checks for required fields before saving the instance via a put(), or

3) Something completely different, perhaps involving the validation capabilities of properties.


The question (and especially approach number 2) pretty much demonstrates the early stages of switching frameworks: I have problem XYZ and a couple of approaches how to solve this in a language A, but I need to use language B for this project. How do I translate the code?

I have to admit that my initial response to the question was not particularly bright: rather than questioning the rationale behind it, I simply responded with a slight tweak to the code translation:

I would probably go with an __init__ method with an optional parameter, something like

def __init__(self, from_dict={}):
  for key, value in from_dict.items():
    setattr(self, key, value)



While this looked like it should work, the devil turned out to be in the details. A few hours later, I got another mail:


This turned out to be much trickier than I thought, especially for a beginning Python programmer.

What ended up working was:

def __init__(self, plistDict=None, **kwargs):
  if plistDict != None:
    for key, value in plistDict.items():
      kwargs[key] = value

  db.Model.__init__(self, **kwargs)

What made this so tricky was that the implementation of db.Model.__init__ REQUIRES that the values of any properties be in kwargs.

Just thought I'd pass that on.


At that point, it hit me like a ton of bricks: I had fallen into the trap! The App Engine team contains some of the brightest minds in the Python world -- they would never build a framework that makes such a simple task so hard!

After a little extra research I realized that in order to make his use case work, my fellow programmer actually needed exactly zero lines of customization: support for dictionaries was already baked into the standard Model constructor:


someInstance = SomeDbModelSubclass(**myDictionary)


The double asterisk, as described in the Python documentation, allows me to inject arbitrary key/value pairs as keyword arguments into the constructor. To be precise, here is the quote from the documentation:


A function call always assigns values to all parameters mentioned in the parameter list, either from position arguments, from keyword arguments, or from default values. If the form "*identifier" is present, it is initialized to a tuple receiving any excess positional parameters, defaulting to the empty tuple. If the form "**identifier" is present, it is initialized to a new dictionary receiving any excess keyword arguments, defaulting to a new empty dictionary.


If we construct a Model instance in App Engine like

MyModel(prop_1=13, prop_2='hello world')

then the Model constructor actually uses this very same mechanism to populate properties. It is a standard feature of the Python language, beautifully utilized in App Engine's API -- and completely novel to anyone who has only written Java code before! I just got hit by the Blub paradox!
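As a small illustration that runs without the SDK, here is a sketch with a hypothetical Greeting class standing in for a db.Model subclass. Its constructor accepts arbitrary keyword arguments, so a dictionary can be unpacked straight into it with the double asterisk.

```python
class Greeting(object):
    """Hypothetical stand-in for a db.Model subclass."""

    def __init__(self, **kwds):
        # kwds is a new dictionary holding every keyword argument passed in.
        for name, value in kwds.items():
            setattr(self, name, value)


my_dictionary = {'prop_1': 13, 'prop_2': 'hello world'}

# Both calls pass the same keyword arguments; ** unpacks the dictionary.
explicit = Greeting(prop_1=13, prop_2='hello world')
unpacked = Greeting(**my_dictionary)

print(unpacked.prop_1)
print(unpacked.prop_2)
```

The same call shape works for any function or constructor that takes keyword arguments, which is why no custom __init__ was needed.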

Having run into this made me wonder: how much unnecessary code do I have in my other Python-based programs? Have I reinvented the wheel? I have decided that it does not matter too much, as long as my code works and I learn from my mistakes. It does show me, however, that I should keep learning new languages or frameworks on a regular basis. By getting introduced to new concepts I can broaden my perspective and produce cleaner and better code. By just doing "what I've always done", I will stagnate and become a less effective programmer.

So, what should the next thing be? I currently tend towards GWT, because I think it is really cool and fits well into the program-web-apps-with-App-Engine theme. I'm open to recommendations though. Please post comments.

Sunday, February 1, 2009

Don't panic

It is said that despite its many glaring (and occasionally fatal) inaccuracies, the Hitchhiker's Guide to the Galaxy itself has outsold the Encyclopedia Galactica because it is slightly cheaper, and because it has the words "Don't Panic" in large, friendly letters on the cover.

A little while ago at the Cloud Connect, I got into a conversation with a fellow developer who was ramping up on building an App Engine application. Like me, he originally came from a Java+SQL background, and we had an interesting chat on how to convert some data modelling concepts onto the datastore in a performant fashion. At some point, he asked me the following question:
What are the top three pieces of advice that you would give a fellow Java developer when ramping up on App Engine with Python?
The following is my response:

1: Don't panic (I actually said "don't be afraid", but that would be less of a catchy phrase for a blog post, wouldn't it? ;-)
2: Kiss your IDE goodbye
3: Write lots of unit tests

So, why did I pick those three?

Don't panic:
Learning a new framework always takes a certain ramp-up time during which one is less productive. Learning a new framework plus a new language can be even more frustrating. For a seasoned developer who has gone through this a couple of times, this is probably old news. However, as Java has been taught at colleges worldwide for a decade now, chances are that there is quite a segment of developers out there who started out with Java and never had the need to switch to anything else. Trying out something new can be difficult in the beginning, but it really is worth it. Even if web applications were to suddenly disappear, the knowledge I gained about Python during the process would still prove extremely valuable in my job. It is a great tool to have around for the many tasks that require putting something together quickly with little overhead.

Kiss your IDE goodbye:
For my development on App Engine, I use Eclipse with pydev. It serves many purposes like debugging or making the code easier to read through syntax highlighting. However, it seems to lack the one feature that I love most of all: code completion.

When I type code into my editor, I want the development environment to guess for me what I'd like to put in next. Say you work with a couple of languages (Java and Python) and have to jump back and forth. You are familiar enough with them to use them, but do not know all the common libraries by heart. So, assume you have a string variable foo and want to convert it to upper case. What's the method? foo.toUpper()? foo.upper()? foo.to_upper()? foo.upcase()?

Many languages have the same type of features, but there are subtle differences in how they are named. If one works very intensely with one language most days, these kinds of things are no-brainers. However, if one always has to go to the documentation to look up these small things, that accumulates to a lot of time wasted. Good autocompletion reduces the time for lookups and the probability of typos in these cases. Now, I have been told that there are alternatives out there that have significantly improved capabilities, and I am really looking forward to giving those a try at some time. I would imagine that they would still be light-years behind the tools I have for Java though, for a couple of simple reasons:
  • the Java tools have been around much longer, so there is some catching up to do
  • the Python tools have a much harder job at "guessing" because of the way the language works: how should I guess method names on "foo" when I cannot extract the object's type out of its definition ("def bar(foo):")?
  • Python code does not go through a separate compile step before it runs, which means there are fewer "compiler warnings" that could get displayed to me -- less opportunity to make me aware of issues that might blow up in my face once I try to actually run the code.
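A partial substitute for code completion, sketched here as an aside rather than something from the original conversation: Python's built-in dir() lists the attributes of any object, which answers the upper-case question from above directly in the interpreter.

```python
foo = 'hello world'

# List the public string methods whose name mentions "upper".
candidates = [name for name in dir(foo)
              if 'upper' in name and not name.startswith('_')]
print(candidates)

# The method we were looking for turns out to be upper().
print(foo.upper())
```

This only works on a live object, so it does not help with the "def bar(foo):" case where the type is unknown until the code runs.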

Write lots of unit tests:
Typos are my worst enemy in Python, since my editor does not necessarily tell me about them. They hide in my code, patiently waiting for their five seconds of fame once the code gets executed. If the typo is in a branch that rarely runs (such as the handling of errors that do not happen often), it might stay in hibernation for a long time, until it annoys the heck out of my end users (it wouldn't be the first time that Finagle's law sneaks up on me). For that very reason, I urge everyone to test their code. Test it thoroughly and often. Write automatic unit tests (see my article from last year, or also this fairly recent summary of several other posts on this subject). Develop a habit of writing code in small increments and adding unit tests immediately after (or, if you can manage to be even more thorough, try out test-driven development). By the way: this is also a great habit to develop in other languages, including Java. Make faulty code blow up on the local development box -- not in front of your customers!
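As a minimal sketch of what that can look like with the standard unittest module: parse_age and its test class are invented for this example. The second test exists only to force the rarely used error branch to run, so that a misspelled name in it would fail on the development box instead of in production.

```python
import unittest


def parse_age(text):
    """Returns the age as an int, or -1 if the text is not a number."""
    try:
        return int(text)
    except ValueError:
        # A misspelled variable name in this branch would go unnoticed
        # until somebody actually enters bad input.
        fallback = -1
        return fallback


class ParseAgeTest(unittest.TestCase):

    def test_valid_input(self):
        self.assertEqual(42, parse_age('42'))

    def test_invalid_input_runs_error_branch(self):
        # Forces the rarely used branch to execute during testing.
        self.assertEqual(-1, parse_age('not a number'))


# Run the tests programmatically so the result can be inspected.
suite = unittest.TestLoader().loadTestsFromTestCase(ParseAgeTest)
result = unittest.TextTestRunner().run(suite)
```

In a real project the tests would normally live in their own module and be started by a test runner; the programmatic run here just keeps the sketch in one piece.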

In summary...
So, there you have it: my top three pieces of advice for newbies switching to App Engine. They are not perfect, but it's what I believe. What I would be interested in is: what would you have answered if you had been asked the same question? Please post your response to this thread...