Sunday, July 27, 2008

Great Unittesting article

I saw a great article on "App Engine Guy's" blog. He uses in-memory stubs for the datastore to write very concise unit tests. Check it out -- it's worth a read. Btw: if you find certain APIs (like memcache) missing from his example -- they are stubbed out in exactly the same manner!

If you are interested in other techniques to unit test, you might also want to check out

  • GAEUnit, a framework to turn unit tests into browser-callable handlers.
  • pymox, another mocking framework alternative to the python mocker I have been using.
  • nose-gae, a Nose-plugin that adds app engine support. Personally, I have had some trouble with the tool, but that might very well just have been me...

Sunday, July 20, 2008

A matter of trust

Building an application based on Google App Engine is great in many ways: it is fun to do, it scales well, and it is open to a huge number of users. If you have a Gmail account, you can log into an App Engine app -- it's as simple as that!

What can we do, however, if we want to be a bit more selective? For example, assume that you have built a small tool that you would like to share only with family and friends. How could you prevent unauthorized people from simply gaining access to your app?

One solution to the problem is to keep a list of permitted users (either hardcoded or in a database). This will work if you know exactly who the selected few are, but it also means that you have to administer the list and keep it up to date. A more generic system would be invitation-based access like Gmail originally had, but that would mean one would also need to manage those invitations somehow in the database. The following example shows a middle ground: it uses a simple technique called HMAC to make sure a particular Google account is actually supposed to have access.

The following code creates a keyed cryptographic hash for a given username. It uses a convenience method from the cryptutil module that is part of the OpenID sample code. To compute the hash, the app uses a secret key that is never revealed externally. Only the generated hash will be handed out to the end user:

import base64
from openid import cryptutil

def sign(username):
  # In real life, choose a better secret!
  secret = 'superSecretKey'
  return base64.encodestring(
      cryptutil.hmacSha1(secret, username))
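
If you do not have the OpenID sample code at hand, roughly the same thing can be done with the standard library alone. The following is only a sketch for Python 3 (where base64.encodestring no longer exists as of 3.9): the hmac and hashlib modules stand in for cryptutil, the secret is still a placeholder, and the URL-safe base64 variant is my own choice so that the result can go into a link without extra quoting:

```python
import base64
import hashlib
import hmac

# Assumption: a hard-coded demo key. A real app would load a long,
# random secret from its configuration instead.
SECRET = b'superSecretKey'


def sign(username):
    """Return a URL-safe, base64-encoded HMAC-SHA1 signature for username."""
    digest = hmac.new(SECRET, username.encode('utf-8'), hashlib.sha1).digest()
    return base64.urlsafe_b64encode(digest).decode('ascii')


signature = sign('friend@example.com')
```

The same input always yields the same signature, while anybody who does not know the secret has no practical way to produce a matching one for a different username.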




So how does the validation work in practice? Let's look at a very simple request handler that makes use of it:

import cgi
import os
import urllib
import wsgiref.handlers

from google.appengine.api import users
from google.appengine.ext import webapp

class MainPage(webapp.RequestHandler):

  def isValid(self):
    """Determines if a given request is valid.

    To fulfill the criteria, a user has to be logged in
    and either be administrator or pass along a signature
    parameter that matches the given username.
    """
    user = users.get_current_user()
    if not user:
      return False
    if not users.is_current_user_admin():
      expected_signature = sign(user.email().lower())
      given_signature = self.request.get('signature')
      if not given_signature == expected_signature:
        return False
    return True



Our validation method first checks that the user is logged in. Unless the user is an administrator, it then compares the request's signature parameter with the hash computed from the current user's email address. Only if these two values match will it consider the user to have access.
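
To play with the validation logic outside of App Engine, here is a small framework-free sketch for Python 3. The function name is_valid and its parameters are made up for illustration; they stand in for the users API and the request object. One deliberate difference to the handler: the comparison uses hmac.compare_digest from the standard library, which takes the same time regardless of where the two values differ:

```python
import hashlib
import hmac

SECRET = b'superSecretKey'  # assumption: demo key only


def sign(email):
    """Return a hex-encoded HMAC-SHA1 signature for the lowercased email."""
    message = email.lower().encode('utf-8')
    return hmac.new(SECRET, message, hashlib.sha1).hexdigest()


def is_valid(email, given_signature, is_admin=False):
    """Hypothetical stand-in for isValid, without App Engine objects."""
    if not email:
        # Nobody is logged in.
        return False
    if is_admin:
        # Administrators do not need a signature.
        return True
    expected = sign(email).encode('ascii')
    return hmac.compare_digest(expected, given_signature.encode('utf-8'))
```

A request from 'friend@example.com' passes only if it carries sign('friend@example.com'); a signature handed out to somebody else is rejected.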

The following get and post methods build a very simple application that displays an important message to invited users only. By entering another user's email address, an invited user can have the system create a signed access url that he or she can pass along to another, authorized user:

  def get(self):
    """Display a message for invited people only."""

    # Require a login
    user = users.get_current_user()
    if not user:
      self.redirect(users.create_login_url(self.request.uri))
      return

    # If user is not admin, require a valid signature
    if not self.isValid():
      self.error(403)
      return

    # Ok, the user is allowed to see this page!
    # In a real app, this is where we would
    # render a template ;-)
    self.response.out.write("""
<html><body>
<h1>Hello, %s</h1>
<p>The secret message you have been waiting for:
&nbsp;<b>APP ENGINE ROCKS!!!</b>
</p>
<form method="POST">
Pass on the message to
<input name="invitee"/> &nbsp;
<input type="submit"/></form>
<p><a href="%s">Log out</a></p>
</body></html>
""" % (cgi.escape(user.nickname()),
       users.create_logout_url(self.request.uri)))

  def post(self):
    """Create a URL to pass the message on."""

    # If user is not admin, require a valid signature
    if not self.isValid():
      self.error(403)
      return

    # Get the name of the invitee
    invitee = self.request.get('invitee')
    if not invitee:
      invitee = 'anonymous'
    invitee = invitee.lower()

    # create the signature and return the url
    signature = sign(invitee)
    self.response.out.write("""
<html><body>
The invite-url is
http://%s?signature=%s</body></html>""" % (
        os.environ['SERVER_NAME'],
        urllib.quote(signature, '')))




What might seem like a toy at first sight is actually a simplified example of a problem with many practical applications in today's web apps: how to verify that a particular web request came from a trusted source. For a more complete treatment of how to secure almost any GET or POST request, check out the OAuth specification, which describes a generic algorithm for signing web requests using HMAC. App Engine contains all the building blocks for using this technique to make one's products more secure. So, please do :-)
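
To give an idea of what signing a whole request could look like, here is a rough Python 3 sketch. It is loosely inspired by the way OAuth builds a string from the HTTP method, the URL and the sorted parameters, but it is NOT a compliant implementation of the specification; the function names and the exact string layout are my own assumptions:

```python
import hashlib
import hmac
from urllib.parse import quote, urlencode


def sign_request(secret, method, url, params):
    """Return a hex HMAC-SHA1 over method, URL and sorted parameters."""
    # Sort the parameters so that sender and receiver build the same string.
    normalized = urlencode(sorted(params.items()))
    base_string = '&'.join(
        quote(part, safe='') for part in (method.upper(), url, normalized))
    return hmac.new(secret, base_string.encode('utf-8'),
                    hashlib.sha1).hexdigest()


def verify_request(secret, method, url, params, signature):
    """Recompute the signature and compare it in constant time."""
    expected = sign_request(secret, method, url, params)
    return hmac.compare_digest(expected.encode('ascii'),
                               signature.encode('utf-8'))


params = {'invitee': 'friend@example.com', 'page': '1'}
signature = sign_request(b'demo-secret', 'get', 'http://example.com/', params)
```

Because the parameters are sorted before signing, their order in the request does not matter, but changing any value, the URL or the method invalidates the signature.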

Sunday, July 13, 2008

Duh -- It's Python 2.5

My wife and I love the Virtual Console feature of the Nintendo Wii. We have spent many hours playing classics like Donkey Kong, Golden Axe, and especially Mario 64. Some time ago, we were trying to get a particularly tricky star that required you to find 8 red coins on a snowy mountain. After over half an hour, we still had only seven coins and no clue where to find number eight, so we decided to do a quick Google search. This is what we found:



Somebody had uploaded a video of him or her waltzing through the entire level in a little bit more than thirty seconds. In other words, because this particular player had skills and was familiar with the area, he or she was able to achieve something that a "pair" of less experienced people had not been able to do in sixty times the amount of effort! I guess, sometimes "knowing your stuff" really makes a difference.

I was recently reminded of this at work when I was chatting with somebody else about my progress in learning Python. I am mostly a Java guy, I said, so there were a couple of things to get used to. I was doing ok nowadays, but there were still things I wished the language had. A ternary operator, for example.

"Yeah, I've heard of this before," my colleague responded. "A friend once told me about a workaround. Did you know that you can just combine 'and' and 'or' to do the same in Python? Take a look at the following:"

>>> True and 1 or 2
1
>>> False and 1 or 2
2



At first, I was amazed -- why had I not thought of this option myself? The idiom of "condition and true-value or false-value" seemed to fit perfectly. If the condition was true, the logical and would evaluate to the "true-value"; if it was false, it would skip the second argument and proceed directly to the "false-value". Euphoria set in, but it was short-lived: the idiom is actually flawed! Take a look at the following snippet:

def part2(s):
  return s.startswith('part1=') and s[6:].strip() or 'part2'



This snippet defines a very simple parser that is supposed to interpret a string in the format of "part1=somevalue" and return the right-hand-side of the equation. If the string does not match the pattern, it should return part2 as a default.

If we fire up a python shell and test out the method, we expect the following to happen:

>>> part2('part1=3')
'3'
>>> part2('x')
'part2'
>>> part2('part1= ')
''



Unfortunately, the third input returns 'part2' instead. What went wrong? The problem is that a construct of logical operators is not equivalent to the ternary operator! A ternary operator "condition?trueValue:falseValue" is expected to behave as follows:

  • Evaluate condition

  • If condition is true, return trueValue

  • If condition is false, return falseValue



The idiom "condition and trueValue or falseValue", however, behaves more like this:

  • Compute "condition and trueValue"

  • If the result is true, return it

  • If the result is false, return falseValue



In other words, the content of trueValue influences the overall result. If trueValue happens to be evaluated as false (which is the case for None-values and, as in this case, the empty string), our idiom will return the wrong value!
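
The effect is easy to reproduce in isolation. In the following sketch (the helper name choose is made up for illustration), the condition is always true, yet every "true-value" that Python treats as false gets silently replaced by the fallback:

```python
def choose(condition, true_value, false_value):
    """The flawed idiom, wrapped in a function for experimenting."""
    return condition and true_value or false_value


# The condition is True in every call; only the true-value changes.
results = [choose(True, value, 'fallback')
           for value in ('3', '', 0, None, [])]
print(results)
```

Only the first entry survives; the empty string, 0, None and the empty list all turn into 'fallback'.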

By this point in the post, you might be wondering why this whole episode made me think of Mario 64, but I'm getting to it right now: my buddy and I are both reasonably smart people, and we were puzzled at how we could spend so much time thinking about this problem and still run into a wall. Somebody else must have had the same issue before! We did a quick Google search and this is what we came up with:

On 9/29/2005, Guido decided to add conditional expressions in the
form of "X if C else Y". [1]

The motivating use case was the prevalance of error-prone attempts
to achieve the same effect using "and" and "or". [...]



In other words, with Python 2.5, there actually was a ternary operator! This is how to use it:

def part2(s):
  return s[6:].strip() if s.startswith('part1=') else 'part2'
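
As a quick sanity check, here is a sketch that runs the three inputs from before through this version and also confirms that only the chosen branch gets evaluated. The record helper is made up purely for this demonstration:

```python
def part2(s):
    return s[6:].strip() if s.startswith('part1=') else 'part2'


outputs = [part2(text) for text in ('part1=3', 'x', 'part1= ')]

# Track which branch of a conditional expression is actually evaluated.
calls = []


def record(label, value):
    calls.append(label)
    return value


result = record('yes', '') if True else record('no', 'default')
```

This time the third input yields the empty string as intended, and the "else" branch is never touched when the condition holds.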



I had learned Python by copying and pasting snippets and by checking a reference book that was still at the 2.4 level. If I had known the area a little better, I could have saved myself quite a bit of time...

Wednesday, July 2, 2008

Four ways to insult a nose

A bit more than two weeks ago, I suggested a small tweak to help keep user-specific data private. A few days ago, I was asked: "Based on the appengine docs, why not use the 'validator' class method, or is that a different thing?" While responding, I tried to figure out if one way was preferable over the other. I consider myself rather pragmatic, but my gut response (do whatever works) did not really cover the advantages or disadvantages of the different approaches. Thus, I have decided to post this follow-up and come up with as many distinct approaches as I could find, taking a page out of Cyrano's book and listing them by category. Bear in mind that my Python is still not quite up to standard, so unlike his more than a dozen ways to insult a nose, I can only come up with four. Feel free to post other alternatives as a comment. The same is true if you see any coding errors or simpler ways to accomplish the same goal.




The basic challenge:


How can we modify the following model to throw a BadValueError when data for the wrong user is retrieved?

class SSN(db.Model):
  user = db.UserProperty()
  ssn = db.StringProperty(required=True)



Hackish: Monkeypatch the UserProperty


Python is a wonderful language in many ways -- one is that a programmer can modify the contract under which a class operates at runtime with only a few lines of code. The following modification replaces the validate-method of db.UserProperty to raise an error if the user does not match:

if not getattr(db.UserProperty, 'old_validate', None):
  original = db.UserProperty.validate
  db.UserProperty.old_validate = original
  def validate(self, value):
    value = original(self, value)
    if value != users.get_current_user():
      raise db.BadValueError(
          'Property must be the current user')
    return value
  db.UserProperty.validate = validate



Monkey patching is a powerful tool, because it allows us to bend and extend frameworks to fit our needs. Quite a few of the open source projects out there that make web frameworks work in App Engine rely heavily on this technique. Unfortunately, there are many caveats, as described for example in this Wikipedia article. I would recommend staying away from it and only using it as a last resort. And if you do use it: be more elegant with your patches than I am. This patch is an all-or-nothing approach; I cannot choose to activate it for certain models and deactivate it for others.


Classic OOP: subclass the property


In this version, we create a new subclass, CurrentUserProperty, that overrides the validate-method. We then modify the model to use the subclass instead of the original UserProperty:

# Enforce the "current user"
class CurrentUserProperty(db.UserProperty):
  def validate(self, value):
    value = super(CurrentUserProperty, self).validate(value)
    if value != users.get_current_user():
      raise db.BadValueError(
          'Property %s must be the current user' % self.name)
    return value

# Data model
class SSN(db.Model):
  user = CurrentUserProperty()
  ssn = db.StringProperty(required=True)


There are really not a lot of drawbacks to this method, except maybe that one has to change the code for all models that need to use the new subclass. Some people in the favor-composition-over-inheritance crowd (what's that? check out this short description) may cringe at the sight of an actual subclass, but I believe this is one of the few cases where subclassing is philosophically permissible. After all, the CurrentUserProperty really is-a UserProperty.

Nevertheless, thanks to the great designers of the App Engine libraries, people who favor association can easily do so as well. Check out approach #3:


Associative: provide a validator


The idea of this code is to create a custom validation method, checkCurrentUser, and pass it along as an argument in the property's constructor. The validate-method of the Property will detect its presence and execute it on top of the regular validation logic:

# Enforce the "current user"
def checkCurrentUser(value):
  if value != users.get_current_user():
    raise db.BadValueError(
        'Property must be the current user')
  return value

# Data model
class SSN(db.Model):
  user = db.UserProperty(validator=checkCurrentUser)
  ssn = db.StringProperty(required=True)



Better than subclassing or worse? You be the judge, but I think (in this particular case), it is mostly a matter of taste. Both cases require us to modify the model, so there is no gain in effort either way. Personally, I prefer having a separate classname for readability reasons, but that's just me...
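
To compare the two approaches without a datastore, here is a small sketch with stand-in classes. Property, BadValueError and CURRENT_USER are simplified placeholders that I made up for this example; they only mimic the general shape of a property with an optional validator hook and are not App Engine's actual implementation:

```python
class BadValueError(Exception):
    """Stand-in for db.BadValueError."""


class Property(object):
    """Hypothetical, much simplified stand-in for a datastore property."""

    def __init__(self, validator=None):
        self.validator = validator

    def validate(self, value):
        # Regular validation would go here; then the optional hook runs.
        if self.validator is not None:
            self.validator(value)
        return value


CURRENT_USER = 'alice@example.com'  # stand-in for users.get_current_user()


def check_current_user(value):
    if value != CURRENT_USER:
        raise BadValueError('Property must be the current user')


class CurrentUserProperty(Property):
    """Subclassing variant: the check lives in an overridden validate."""

    def validate(self, value):
        value = super(CurrentUserProperty, self).validate(value)
        check_current_user(value)
        return value


by_validator = Property(validator=check_current_user)
by_subclass = CurrentUserProperty()
```

Both objects accept the current user and reject everybody else; the only difference is whether the rule is attached by passing a function or by naming a class.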


Aspect oriented: Black Magic, python style


I don't know if the consultant crowd has already moved on to greener pastures, but AOP used to be the craze only a few years ago. The idea (please don't flame this blog if I oversimplify) is to take certain "concerns", like logging or transaction control, out of the application logic and "weave" it in separately. We can do the same thing for user control: the following method, enforce_user, will detect all UserProperties in a Model class and patch their validation methods to enforce the current user:

# Define the weaving-code
def enforce_user(modelclass):
  for name, prop in modelclass.properties().items():
    if isinstance(prop, db.UserProperty) and \
        not getattr(prop, 'old_validate', None):
      # Bind original and name as default arguments so that each
      # closure keeps its own values; otherwise all patched
      # properties would share those of the last loop iteration.
      def validate(value, original=prop.validate, name=name):
        value = original(value)
        if value != users.get_current_user():
          raise db.BadValueError(
              'Property %s must be the current user' % name)
        return value
      prop.old_validate = prop.validate
      prop.validate = validate
  return modelclass

# Enhance the data model
enforce_user(SSN)



Some out there might say that this is only glorified monkeypatching, and they may be right. Then again, unlike my original hack, we can turn patching on or off for any single model. And if we ever happen to have class decorators available to us, we can even make it look cool by just decorating our model as follows:

@enforce_user
class SSN(db.Model):
  user = db.UserProperty()
  ssn = db.StringProperty(required=True)
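
Class decorators arrived with Python 2.6 and 3.0, so the idea can be tried out today in a plain interpreter. The sketch below uses made-up stand-ins (UserProperty, BadValueError, CURRENT_USER, Document) instead of the App Engine classes, and it inspects the class attributes directly rather than calling properties(). One detail deserves attention when a model has more than one user property: a function defined inside a loop looks up the loop variables when it is called, not when it is defined, so the current values are frozen here via default arguments:

```python
class BadValueError(Exception):
    """Stand-in for db.BadValueError."""


class UserProperty(object):
    """Hypothetical stand-in for db.UserProperty."""

    def validate(self, value):
        return value


CURRENT_USER = 'alice@example.com'  # stand-in for users.get_current_user()


def enforce_user(modelclass):
    for name, prop in vars(modelclass).items():
        if isinstance(prop, UserProperty) and \
                not hasattr(prop, 'old_validate'):
            # Default arguments freeze the current loop values per closure.
            def validate(value, original=prop.validate, name=name):
                value = original(value)
                if value != CURRENT_USER:
                    raise BadValueError(
                        'Property %s must be the current user' % name)
                return value
            prop.old_validate = prop.validate
            prop.validate = validate
    return modelclass


@enforce_user
class Document(object):
    owner = UserProperty()
    editor = UserProperty()
```

With this in place, Document.owner.validate and Document.editor.validate each report their own property name when somebody other than the current user is passed in.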



In summary


As you can see, there are many ways to skin a cat (what a ghastly idiom by the way! my beloved "Princess" heavily objects!!!). What's the best way to get the job done? I don't know, but I have a gut feeling: do whatever works best for you...