Tuesday, April 15, 2008

Multiplexing and Namespaces: Running multiple apps on one App Engine instance

An application is a terrible thing to waste. Since we are currently in the preview period of App Engine, I can only create three instances to host my code. When I recently looked at my Dashboard, I realized that my URL Shortlinker was hardly using up any of the quota. This happened for three reasons that I am sure apply to many other people out there, as well:
  • it was not particularly popular (a low number of shortcuts)
  • it was optimized for performance (as far as I knew how to do that)
  • it was a simple application that did not need many resources to begin with
Now, there are certainly some apps out there that have to worry about running out of quota, but I'd like to bet that for many others (like me), the opposite is true. So, if one has a couple of apps that do not use a lot of space, what is the best way to put them to work?

One way to make better use of one's resources would be to run several applications on the same instance. You have a shortlinker, a guestbook, a site counter and a random fortune generator? Mix them all together, call them "site tools" and run them as one App Engine app. Unfortunately, the merge process can become complicated (what if the apps use Model classes or scripts with the same name but different content?), so this is tougher to achieve than it sounds. I am convinced one could do a little Python magic to have each individual app in a sub-directory of its own and let one root script dispatch the calls, but that's a bit beyond my skills for now (more on this idea further down below).
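To make the "root script dispatches the calls" idea a bit more tangible, here is a minimal sketch in plain WSGI. It is not App Engine specific and it is not something I have deployed: the helper names (make_text_app, make_dispatcher, site_tools) and the URL prefixes are made up for illustration, and the sub-apps are dummies. It only shows the routing part; it does nothing about clashing Model class names.

```python
def make_text_app(text):
    """Build a tiny WSGI app that always answers with the given text."""
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [text.encode('utf-8')]
    return app


def make_dispatcher(mounts, fallback):
    """Return a WSGI app that forwards each request to the sub-app
    whose URL prefix matches the request path."""
    def dispatcher(environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, sub_app in mounts:
            if path == prefix or path.startswith(prefix + '/'):
                # Hand the sub-app a copy of the environment in which the
                # prefix has moved from PATH_INFO to SCRIPT_NAME, so the
                # sub-app sees the same paths as if it were running alone.
                sub_environ = dict(environ)
                sub_environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                sub_environ['PATH_INFO'] = path[len(prefix):]
                return sub_app(sub_environ, start_response)
        return fallback(environ, start_response)
    return dispatcher


# Hypothetical "site tools" bundle: two dummy sub-apps plus a front page.
site_tools = make_dispatcher(
    [('/shortlinker', make_text_app('shortlinker')),
     ('/guestbook', make_text_app('guestbook'))],
    fallback=make_text_app('site tools'))
```

A request for /guestbook/sign would reach the guestbook app with PATH_INFO set to /sign, and anything that matches no prefix falls through to the front page.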

A second trick would be to abuse the major versions. I could upload two different applications to the same application id, one with major version "1" and the other with major version "2". Both should be reachable in parallel, so as long as I keep the names of the data objects separate, I should be good to go.

Example:
The following is the code for a very simple Link scrubber:
import wsgiref.handlers

from google.appengine.ext import webapp

class MainPage(webapp.RequestHandler):
    def get(self):
        if self.request.get('url'):
            self.redirect(self.request.get('url'))
            return
        self.response.out.write('Link scrubber')

def main():
    application = webapp.WSGIApplication([('/.*', MainPage)], debug=True)
    wsgiref.handlers.CGIHandler().run(application)

if __name__ == "__main__":
    main()

You can try it out by going to http://2.2.aef.appspot.com/ -- which is the very same application (aef) that I am using to host my URL shortlinker. The shortlinker just happens to be version 1 of the program and the redirector (although a completely different program) is defined as version 2.

While this is certainly feasible, I do not necessarily recommend this approach. Major versions are there for a good reason, and a little short-term gain here might turn out to become a headache later if any of these apps becomes very popular. With every bugfix for the redirector I upload, the minor version is going to change. I cannot bind an external Google Apps domain to it. In other words, the limitations are rather severe.


The third option can often be done with very few code changes and can open up an application to more users: while it is not that easy to run several applications on one instance, it is possible to run the same application multiple times by using the domain as a namespace.

A single App Engine application can be bound to an arbitrary number of domains. Within the application, a simple lookup of os.environ['SERVER_NAME'] will reveal what namespace we are currently working in. If we are smart about partitioning our data space accordingly, we can run a single service that will seem unique to each of our users.

Let's take a concrete example: download the simple wiki example from code.google.com. It is actually a very nice wiki, but it does not have separate namespaces. If we had three different products (shortcuts, redirector, webhosting) and wanted to build a wiki for each of them on App Engine, our three app-ids would be gone. Or would they?

If you take a look at the main script, wiki.py, you will see that there are actually only two places where data is loaded from the database or written to the database:

def save(self):
    ...
    entity['name'] = self.name
    ...
    datastore.Put(entity)

def load(name):
    ...
    query = datastore.Query('Page')
    query['name ='] = name
    entities = query.Get(1)


Let's ignore for now that the developer is using a slightly different API than models and GQL. What we see is that the code uses the "name" of a wiki-entry as a unique key for loading and storing data in the store. By simply putting our domain name into the code, we can create a distinct namespace for as many wikis as we like, provided that we bind them under different URLs:

import os
def GetDomain():
    return os.environ['SERVER_NAME']

...

def save(self):
    ...
    entity['name'] = GetDomain() + "|" + self.name # CHANGED
    ...
    datastore.Put(entity)

def load(name):
    ...
    query = datastore.Query('Page')
    query['name ='] = GetDomain() + "|" + name # CHANGED
    entities = query.Get(1)

With very few modifications, we now have an application that can be used by many people in parallel without the need to upload multiple instances. Our utilisation of that single instance simply went through the roof. This will also work if the code is slightly more complicated, as seen in this example. However, the more complex a particular app is, the higher the chance of messing up these kinds of modifications.

Conclusion:

Even without any code modifications, it is possible to use one application id to run multiple applications in parallel. Using the domain as a namespace, it is also possible to partition a database to have the same app host distinct data for each user. Both are methods to increase utilization of one's app, but both are rather crude. I believe there is a better way, but I do not know the codebase well enough yet to be sure. Here is what makes me think there is:
  • the wiki in this example uses a simpler persistence API (datastore), and it looks like both GQL and the Model class are using that internally, too (just look into __init__.py in the google/appengine/ext/db folder of your SDK). Depending on how the higher-level classes are using the lower-level datastore, one might be able to monkeypatch it to separate by domain or sub-application automatically. If that is true, there would be no namespace conflicts in the datastore when running separate apps on the same instance.
  • the SDK is written in Python. At some point, there is an entry-level class that, depending on the content of the app.yaml, decides which of an application's scripts to execute. If that code could be generalized, one could have a set of applications, each in a subdirectory, with a simple top-level script that just decides which subfolder's app.yaml to use.
Depending on how long the preview period of App Engine lasts, it might be worth investigating this further. If anyone has looked into this already, please let me know.
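To illustrate what I mean by monkeypatching the lower level, here is a rough sketch of the general technique. I have not tried this against the real SDK: FakeDatastore is a hypothetical stand-in with made-up put and get methods, not the actual datastore API, and patch_with_namespace is my own invention. The point is only that wrapping the low-level calls lets the higher-level code stay unchanged.

```python
import os


class FakeDatastore:
    """Hypothetical stand-in for a low-level persistence layer."""
    def __init__(self):
        self.rows = {}

    def put(self, kind, name, value):
        self.rows[(kind, name)] = value

    def get(self, kind, name):
        return self.rows.get((kind, name))


def patch_with_namespace(store, get_namespace):
    """Replace store.put and store.get with wrappers that prefix every
    kind with the current namespace. Callers need no changes."""
    original_put, original_get = store.put, store.get

    def put(kind, name, value):
        return original_put(get_namespace() + '|' + kind, name, value)

    def get(kind, name):
        return original_get(get_namespace() + '|' + kind, name)

    store.put, store.get = put, get


datastore = FakeDatastore()
patch_with_namespace(datastore, lambda: os.environ['SERVER_NAME'])

# Higher-level code keeps calling put/get exactly as before.
os.environ['SERVER_NAME'] = 'a.example'
datastore.put('Page', 'MainPage', 'wiki A')
os.environ['SERVER_NAME'] = 'b.example'
datastore.put('Page', 'MainPage', 'wiki B')
```

Because the prefix goes onto the kind rather than the entry name, two sub-applications that both define a "Page" kind would end up in separate partitions. Whether the real datastore module can be wrapped this cleanly is exactly the open question.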
