Saturday, February 20, 2010

Jack and the magical, evergrowing taskqueue

an App Engine fairytale



(Based on a true story, somehow. Names have been altered, so that I'm not embarrassing myself too much ;-)

Once upon a time there was a farmer named Jack. Jack's life on the server farm had always been good: he had a little parcel to grow his crop on, and because his land was quite small and did not take up a lot of space, he did not even need to tithe to the lords of the server farms.

The land where Jack's farm grew was governed by friendly wizards. They had set up some rules on how the crop was to be grown, in order to be able to sustain as many people on the farms as possible. There were countless other farmers, just like Jack, and while the variety of crop was great, they all flourished on the land that was given to them. The wizards had also devised several tools to make life easier for the people. They built a huge storage where farmers could bring their crops and retrieve them whenever needed. They also had a bellboy who could be asked to remind farmers to water their field, or do other maintenance related jobs. However, the most wonderful of their contraptions was the magical taskqueue. The magical task queue was a system operated by an ancient race of invisible gnomes. If someone needed a farm hand, he would write the tasks to be done on pieces of paper and pile them up next to the field. The gnomes would then pick them up and execute them, until the stack was empty.

One day, Jack noticed that the pile of unfinished tasks next to his field started growing and growing. Usually, it was not taller than a couple of sheets, but now it had grown to almost the height of his knee. He was completely mystified -- had the magical gnomes fallen ill? He pulled out the admin console, a powerful device the wizards had bestowed upon him. The admin console enabled the farmers to learn and see things about their crop they would otherwise not know. Through its magical eye, he peeked at the stack. He saw a flurry of activity: hundreds of hands were rushing to take the yellow stickies from the pile, yet at the same time, even more invisible forces kept piling additional jobs onto the heap. Luckily, Jack knew one of the wizards responsible for the gnomes, and he cried out for help. How could it be that a small farmer like himself could generate more work than the gnomes could handle? Something was amiss!

The wizard had heard this kind of complaint before, and he knew that it usually was an error by the farmer. "Jack, have you made sure that you configured your queue correctly? There are a couple of things to look out for, like how many tasks a minute the gnomes are allowed to take from the stack. Please double-check your configuration and let me know what happened." Jack trusted the wizard's advice, and he looked at his settings. As it turned out, there were a couple of tweaks he could make that let the gnomes pick up more tasks. Surely that had to be it!

After several hours of waiting, the pile had grown even more in size. It was now the height of a small child. Jack went back to ask for more help. This was unbearable, he cried out. Obviously the gnomes must be doing something wrong! The wizard was a patient man, and he listened calmly to the ranting of the upset farmer. After a few minutes, when the yelling and screaming had subsided, he turned towards his fellow man. "My dear Jack," he responded, "how do you expect me to see anything in that mess beside your field? You have all kinds of tasks mixed together in a single pile. How do you expect me to know what's causing your problems -- the watering of the plants, the sowing of new seeds, or the harvest? Go ahead and split up your tasks into several piles. Then, leave the field alone for a day and tell me what happened." Jack was quite annoyed by the wizard's response, but he did what he'd been told. He took the different kinds of tasks and put them into multiple queues. It was a hard job, but at the end of the day, everything was neatly going into separate piles. Totally exhausted, he fell asleep next to the field.

The next morning, Jack got up early to take a look. Out of his five piles of tasks, four had been reduced to less than a dozen stickies. The fifth one, however, was as big as a grown man by now. He went back to the wizard to report. The wizard took another look and confirmed Jack's findings. However, he also found something that Jack had overlooked. "Most of the tasks on your queue are very old and have been in and out of the queue again and again. You can see it at the head of the note: the retry-counts are through the roof. Looking at the very admin console you hold in your hands, I also see that these tasks are constantly sending "301 redirects" -- in other words, they tell the gnomes to put the paper right back onto the stack. Can you tell me what's up with that?"

The discovery of the sage hit Jack like a lightning bolt. Hadn't he lately made a change to the way his crop was grown, to make things more efficient? Could it be that he had made a mistake? Now knowing what to look for, he discovered the problem within a few minutes. Fixing it was a bit harder, but after a couple of hours of work, the problem was rooted out of the system. It took only a few more hours to watch the man-sized pile of tasks shrink back to the size of a small child, then a melon, and finally a pile not higher than the tiniest of mice.

So what is the moral of the story? Don't make a mess of your task queue, check your admin console closely -- and if something is wrong and you get a friendly wizard's help, don't behave like a Jack@$$ ;-)

Saturday, February 13, 2010

ServiceAsync? We don't need no stinking...

Hosting a web app on App Engine is not only easy, it is also free -- provided that one can manage to stay within the allotted quota limits. One way of achieving this goal: let the browser do as much as possible on its own.



Let's take for example Google's GData APIs. There is a ton of information available out there for the developer, but getting to it from the browser can sometimes be tricky. For security reasons, browsers restrict JavaScript calls to arbitrary web services on other domains, which is why some apps route their requests through a server-side component (and that's exactly what we are trying to avoid). Getting by without the server makes life a bit more complicated -- or does it? Thanks to the newest Google Web Toolkit, and a couple of open source extensions, building an in-browser application that uses rich data APIs has become significantly easier. Let's give it a try and build a browser-only app that sends a user query to Google Base and displays the results in our application.

Preparations


We are using the latest Eclipse plugins for Google App Engine (which also contain the Google Web Toolkit) to build a new standard web application. Next, we download the open-source library gwt-gdata and place it in our WEB-INF/lib folder. We also add that library to the class path of our project, and make it known to the GWT compiler by placing the line
<inherits name='com.google.gwt.gdata.GData' />
in our .gwt.xml file.

Next, we need to sign up for a data API key, which we store in a constant called GDATA_API_KEY. We then open our main class (whatever implements EntryPoint in the standard Google web project) and add code to initialize the API (in this case Google Base):

import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.gdata.client.GData;
import com.google.gwt.gdata.client.GDataSystemPackage;
import com.google.gwt.user.client.Window;
import com.google.gwt.user.client.ui.Label;
import com.google.gwt.user.client.ui.RootPanel;

public class MyApp implements EntryPoint {

  private static final String GDATA_API_KEY = "MYDATA KEY";
  private static final String APPLICATION_NAME = "MY APP NAME";

  /**
   * This is the entry point method.
   */
  public void onModuleLoad() {
    if (!GData.isLoaded(GDataSystemPackage.GBASE)) {
      // Load the Google Base package asynchronously, then start the app.
      GData.loadGDataApi(GDATA_API_KEY, new Runnable() {
        public void run() {
          startApplication();
        }
      }, GDataSystemPackage.GBASE);
    } else {
      startApplication();
    }
  }

  private void startApplication() {
    if (!GData.isLoaded(GDataSystemPackage.GBASE)) {
      Window.alert("GData could not be initialized");
      // Without GData there is nothing useful to show, so stop here.
      return;
    }
    // The following is an example and will depend
    // on your UI classes
    RootPanel.get("application").add(new SearchPanel(APPLICATION_NAME));
  }
}



This code is pretty much boilerplate to bootstrap the application. It initializes GData, and then brings a SearchPanel onto the screen. The SearchPanel is where the magic happens -- we are going to implement it in this blog post:

UiBinder


If I had to point to a single feature that I like best about GWT 2.0, it would be UiBinder. If you still remember my first experience with GWT, you might know that I spent a considerable amount of time and code plugging a set of components together to make my application appear on the screen. With UiBinder, I simply use XML that pretty much resembles an HTML layout.

For example, let's assume that my SearchPanel should contain three elements: a text box to enter the search query, a button to trigger the search, and a text area to show the response. The following XML file completely sets this up for me, including CSS styling:

<!DOCTYPE ui:UiBinder SYSTEM "http://dl.google.com/gwt/DTD/xhtml.ent">
<ui:UiBinder xmlns:ui="urn:ui:com.google.gwt.uibinder"
    xmlns:g="urn:import:com.google.gwt.user.client.ui">
  <ui:style>
    .text {
      font-style: italic;
    }
    .important {
      font-weight: bold;
    }
  </ui:style>
  <g:HTMLPanel>
    <span class="{style.text}">Search for:</span>
    <g:TextBox visibleLength="30" ui:field="query" />
    <g:Button styleName="{style.important}"
        ui:field="button" text="Google Products Search"/>
    <br/>
    <g:TextArea ui:field="results"
        characterWidth="80" visibleLines="50" />
  </g:HTMLPanel>
</ui:UiBinder>


The entire layout is defined in this tiny snippet of XML. If I need access to a particular element from Java code, I can add a ui:field name to it in the XML. Our text box is called query, our button is called button, and our text area is called results. We will see in just a moment how these names are reflected in Java code.

It should be pointed out that the Eclipse plugin has a template for UiBinder (just like "new class" or "new interface", there is also a "new UiBinder" wizard). This makes it very easy to get started on such a component. Most of the boilerplate shown below was generated by Eclipse for us.

And the Java code?


Let's take a look at the Java code that belongs to our UiBinder template:

package com.appenginefan.nookbook.client;

import com.google.gwt.core.client.GWT;
import com.google.gwt.event.dom.client.ClickEvent;
import com.google.gwt.gdata.client.gbase.GoogleBaseService;
import com.google.gwt.gdata.client.gbase.SnippetsEntry;
import com.google.gwt.gdata.client.gbase.SnippetsFeed;
import com.google.gwt.gdata.client.gbase.SnippetsFeedCallback;
import com.google.gwt.gdata.client.gbase.SnippetsQuery;
import com.google.gwt.gdata.client.impl.CallErrorException;
import com.google.gwt.uibinder.client.UiBinder;
import com.google.gwt.uibinder.client.UiField;
import com.google.gwt.uibinder.client.UiHandler;
import com.google.gwt.user.client.ui.Composite;
import com.google.gwt.user.client.ui.TextArea;
import com.google.gwt.user.client.ui.TextBox;
import com.google.gwt.user.client.ui.Widget;

/**
 * A panel where books can be searched and results displayed.
 *
 * @author Jens Scheffler
 */
public class SearchPanel extends Composite {

  private static final String URI = "http://www.google.com/base/feeds/snippets";

  private static SearchPanelUiBinder uiBinder = GWT
      .create(SearchPanelUiBinder.class);

  interface SearchPanelUiBinder extends UiBinder<Widget, SearchPanel> {
  }

  private final GoogleBaseService service;

  @UiField TextBox query;
  @UiField TextArea results;

  public SearchPanel(String applicationName) {
    initWidget(uiBinder.createAndBindUi(this));
    this.service = GoogleBaseService.newInstance(applicationName);
  }

  @UiHandler("button")
  void search(ClickEvent event) {
    SnippetsQuery gdataQuery = SnippetsQuery.newInstance(URI);
    gdataQuery.setMaxResults(25);
    gdataQuery.setBq(query.getText());
    results.setText("Searching...");
    service.getSnippetsFeed(gdataQuery, new SnippetsFeedCallback() {
      @Override
      public void onFailure(CallErrorException caught) {
        results.setText("Call to Google Base failed:\n" + caught);
      }
      @Override
      public void onSuccess(SnippetsFeed result) {
        showData(result.getEntries());
      }
    });
  }

  private void showData(SnippetsEntry[] entries) {
    if (entries.length == 0) {
      results.setText("You have no items.");
    } else {
      StringBuilder output = new StringBuilder();
      for (SnippetsEntry entry : entries) {
        output.append(entry.getTitle().getText());
        output.append("\n");
      }
      results.setText(output.toString());
    }
  }
}



As we can see, there are two fields that are annotated with the @UiField annotation:

@UiField TextBox query;
@UiField TextArea results;


The names of the fields match the corresponding ui:field attributes in our XML file. UiBinder will automatically match the fields with the corresponding XML tags and link our GWT code to the HTML. Notice also that we have not defined such a field for the button. Instead, we have defined a method search and used the UiHandler annotation to let GWT know that this method should be called when the button is clicked.

Within the search method, we perform our AJAX API query: we get the query string out of our text box and submit it as a SnippetsQuery to the GData service. If we get a result back from the server, we call our showData method to populate the results area accordingly; if the call fails, the error is shown in the same text area.

Wednesday, February 10, 2010

The art of the unobtrusive tools

Today, version 1.3.1 of the App Engine SDK was released. A lot of great improvements went into it, especially regarding the datastore. Among the many powerful new features is also the Python release of AppStats, an RPC instrumentation tool that I mentioned a couple of months ago.

While we are waiting with bated breath for the Java version, it is worth taking a closer look at the Python implementation. AppStats is the poster child of an extremely useful library that, if done wrong, could have made life very uncomfortable for its users. First of all, it can collect statistics on every RPC within every HTTP request in the app -- that is a lot of data! Where will this data be stored? If one used the datastore, all those statistics would count against one's quota. Also, if the library happened to use an entity name that a user's application also used, data might get corrupted. An alternative might be to just use memcache -- but wouldn't that potentially collide with cached entries of the user's app? And depending on how big that data is, won't it fill up the app's entire memcache?

Looking into recording.py from the SDK reveals a couple of very interesting implementation details. The first two can be found in the method _save, which stores recorded data for later analysis:

  def _save(self):
    part, full = self.get_both_protos_encoded()
    key = make_key(self.start_timestamp)
    errors = memcache.set_multi({config.PART_SUFFIX: part,
                                 config.FULL_SUFFIX: full},
                                time=36*3600, key_prefix=key,
                                namespace=config.KEY_NAMESPACE)
    if errors:
      logging.warn('Memcache set_multi() error: %s', errors)
    return key, len(part), len(full)



First of all, the data is apparently encoded using protocol buffers. Protocol buffers are not only language-neutral (one could read the binary data out in C++ or Java, for example), their encoding is also designed to produce a very small binary representation (thus minimizing the storage requirements). On top of that, when storing the data to memcache, the programmer sets a namespace property for the key. Thus, as long as nobody uses the same namespace in their own code (according to recording.py, it is '__appstats__'), stats written to memcache will not overwrite user data.
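To illustrate the namespace point, here is a minimal sketch of how a namespace keeps two writers apart even when they pick the same key. FakeMemcache is a hypothetical in-memory stand-in written for this example, not the App Engine memcache API; it only mimics the key_prefix and namespace arguments seen in _save above, and the key 'k' with its ':part' suffix is made up.

```python
class FakeMemcache(object):
    """Hypothetical in-memory stand-in for a namespaced cache."""

    def __init__(self):
        self._data = {}

    def set_multi(self, mapping, time=0, key_prefix='', namespace=None):
        # Entries are stored under (namespace, full key), so equal keys
        # in different namespaces never touch each other. The time
        # argument (expiration) is accepted but ignored in this sketch.
        for suffix, value in mapping.items():
            self._data[(namespace, key_prefix + suffix)] = value
        return []  # like the call in _save: a list of keys that failed

    def get(self, key, namespace=None):
        return self._data.get((namespace, key))


cache = FakeMemcache()

# The application caches a value under its own key ...
cache.set_multi({':part': 'user data'}, key_prefix='k')

# ... and the instrumentation writes the very same key, but in its
# own namespace.
failed = cache.set_multi({':part': 'stats'}, key_prefix='k',
                         namespace='__appstats__')

user_value = cache.get('k:part')
stats_value = cache.get('k:part', namespace='__appstats__')
```

Both values survive side by side: the lookup without a namespace still returns the application's entry, while the stats entry is only visible to code that asks for the '__appstats__' namespace.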

The other interesting aspect is in the make_key method that produces the memcache key:

def make_key(timestamp):
  distance = config.KEY_DISTANCE
  modulus = config.KEY_MODULUS
  tmpl = config.KEY_PREFIX + config.KEY_TEMPLATE
  msecs = int(timestamp * 1000)
  index = ((msecs // distance) % modulus) * distance
  return tmpl % index



It turns out that the amount of data held in memcache is very well controlled. make_key turns the timestamp into a bucket index, using a division and a modulo operation. Because the index wraps around, only a fixed number of distinct keys (KEY_MODULUS is 1000) will ever be used, and newer records simply overwrite older ones. Also, requests that come in too close to each other (within KEY_DISTANCE milliseconds) map to the same key, so that a burst of activity over a short period of time cannot push out all the samples collected over a longer time period.
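A small worked example may make the bucketing easier to see. The sketch below re-implements make_key with module-level constants instead of the config object. Only KEY_MODULUS = 1000 and the '__appstats__' string are taken from this post; the bucket width of 100 ms, the use of that string as key prefix, and the ':%06d' template are assumptions chosen for illustration.

```python
KEY_DISTANCE = 100            # assumed: bucket width in milliseconds
KEY_MODULUS = 1000            # stated in the post
KEY_PREFIX = '__appstats__'   # assumed prefix
KEY_TEMPLATE = ':%06d'        # assumed template


def make_key(timestamp):
    """Same arithmetic as make_key above, with the assumed constants."""
    msecs = int(timestamp * 1000)
    index = ((msecs // KEY_DISTANCE) % KEY_MODULUS) * KEY_DISTANCE
    return (KEY_PREFIX + KEY_TEMPLATE) % index


# Two requests 80 ms apart fall into the same bucket and share a key.
same_bucket = make_key(12.31) == make_key(12.39)

# A request in the next 100 ms bucket gets a different key.
next_bucket = make_key(12.31) != make_key(12.45)

# After KEY_DISTANCE * KEY_MODULUS milliseconds the index wraps around,
# so a later request reuses (and overwrites) the old key.
wrapped = make_key(112.31) == make_key(12.31)

# One timestamp in the middle of each of 3000 consecutive buckets:
# the number of distinct keys never exceeds KEY_MODULUS.
keys = set(make_key((k * 100 + 50) / 1000.0) for k in range(3000))
```

Under these assumed values, the key space behaves like a ring buffer of 1000 slots: each slot holds at most one record, and the whole ring covers a window of KEY_DISTANCE * KEY_MODULUS milliseconds before old slots are reused.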

If you have ever used a profiler to find bottlenecks in a desktop app, you know the performance penalty one usually pays for the extra information. Thanks to a couple of very smart design decisions, the penalty for AppStats is very small. I am looking forward to reading testimonials from people using it in the real world :-)

Sunday, February 7, 2010

Blast from the past

It's been about 4 months since I last logged into my Blogger account to write anything, or even check if anyone had left comments on past articles. To those who left comments with questions: I apologize, I'll try to get answers to you as quickly as possible.

What kept me from posting? Nothing much really: I was focusing on some things at work; there was a longer family vacation around Christmas, and I was a bit busy working on the latest edition of a (German) beginner's book on Java. The fifth edition just launched a few days ago, and I'd like to thank and congratulate my co-authors (especially Dietmar Ratz, without whom this would have been impossible!). My focus this time was on the "online add-on", a collection of chapters about things like annotations or unit tests, and a load of programming examples about the subjects covered in the book. It was an interesting blast from the past, using things like LaTeX for setting text again, and it reminded me how much fun it actually is to write, provided one has an interesting topic to write about.

This week, I will do my best to catch up on the comments that I missed over my extended break from blogging. Once that is done, I hope I can think of a couple of interesting things to write about. I've got enough ideas for a couple of articles, but it can never hurt to have a huge backlog, so feel free to put suggestions into comments to this article.