PasswordData holds information about a particular entry in the password database. Passwords are differentiated by categories, and a client can request either the list of categories or all passwords in a particular category.
I was thinking about how to model the datastore side of the application, and I considered augmenting my existing POJO with JPA annotations. It seemed like a good story at first: not only would I have the same model classes on the GWT client and the server, I could even put the same data object into the store. Could it be any simpler?
After a good night's sleep, however, I found the thought less and less appealing. Annotating PasswordData would certainly not be enough; I would also need to group the data somehow in a Category. I would need to worry about expressing entity relationships, finding all passwords by category, writing queries, and dealing with a lot of plumbing code. Was that really necessary? Schlüsselmeister is an homage to KeePassX after all, and I am sure it does not need all that fluff. So I checked my local database, which contains quite a few passwords, and yet it was only 17K! Do I need all that overhead for just 17K of data?
It turns out that there is a big class of applications that need some way of persisting data, but for which JDO and JPA feel like overkill. Schlüsselmeister, for example, might need to scale to thousands of users, but each of those users has only a small, isolated data set that a request would need to operate on. So why not make it easy to work on a single blob of data in a transactional fashion? This article suggests a couple of tools to do just that. The source code (plus Javadoc and a precompiled jar file) is available under the Apache license at code.google.com/p/aeftools.
The basic idea
Let's start by expressing the concept of access to objects in the store under a single key:
package com.appenginefan.toolkit.persistence;
import java.util.List;
import java.util.Map;
import com.google.common.base.Function;
/**
* Represents a simple way of persisting a particular object
* type.
*/
public interface Persistence<T> {
/**
* Gets an entry from the store.
*
* @param key
* the key to look up the data from
* @return the data or null, if the store does not contain
* data
* @exception NullPointerException
* if either of the arguments is null
* @exception StoreException
* if something went wrong while loading data
*/
public T get(String key);
We have no idea what the data in our store looks like, so we cannot make any queries on its content. However, it should be possible to do a scan on a particular range of keys without too many problems:
/**
* Finds zero or more entries that are within a given
* range
*
* @param start
* a lower bound of the range of keys to look in
* (inclusive)
* @param end
* an upper bound of the range of keys to look in
* (exclusive)
* @param max
* a maximum amount of elements to return. The
* implementation of the store may choose to
* return less (for example, if a store can only
* fetch 10 elements per query, then setting a
* max of 1000 will still only return 10), but
* never more.
* @return a list of up to max key/value pairs, ordered by
* key
*/
public List<Map.Entry<String, T>> scan(String start,
String end, int max);
So, how would we get the data into the store? One of the many things I really loved in the Python version of App Engine was the way transactions were done: one would pass a function to db.run_in_transaction that made a mutation to the store. If the change could not be applied, the function would simply be called again, but on the newly refreshed data from the store. I would like to emulate this behavior in Java, so I am passing my mutator along as a Function object:
/**
* Modifies an entry in the store.
*
* @param key
* the key to modify
* @param mutator
* a function that applies a change to the data
* and stores the new result. If the data does
* not exist in the store yet, null will be
* passed in. If the data should be deleted from
* the store, the function will return null.
* @return the data that was stored
* @exception NullPointerException
* if either of the arguments is null
* @exception StoreException
* if something went wrong while storing data
*/
public T mutate(String key,
Function<? super T, ? extends T> mutator);
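The retry behavior described above can be sketched without any App Engine dependency. The following is only an illustration, not the aeftools implementation: the class name RetryingCell is hypothetical, it guards a single in-memory value with an AtomicReference instead of a datastore transaction, and it uses java.util.function.Function in place of Guava's Function so that the snippet is self-contained.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

/** Hypothetical single-value store; illustrates optimistic retry only. */
class RetryingCell<T> {
  private final AtomicReference<T> current = new AtomicReference<>();

  /**
   * Applies the mutator to the current value. If another thread changed
   * the value in the meantime, the mutator is called again on the
   * refreshed value, mirroring the retry behavior of run_in_transaction.
   */
  T mutate(Function<? super T, ? extends T> mutator) {
    while (true) {
      T before = current.get();         // null if nothing is stored yet
      T after = mutator.apply(before);  // null means "delete"
      if (current.compareAndSet(before, after)) {
        return after;
      }
      // Lost the race: loop and retry on fresh data.
    }
  }

  T get() {
    return current.get();
  }
}
```

Because the mutator may run more than once, it should be free of side effects and depend only on the value passed in.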
So, how can such a store be used to store and retrieve data? For an example, take a quick look at this excerpt from StringPersistenceTest:
public void testOverwrite() {
persistence.mutate("A", Functions.constant("A"));
assertEquals("B", persistence.mutate("A", Functions
.constant("B")));
assertEquals("B", persistence.get("A"));
}
Read/write access is reduced to a few lines of code; transactions and certain edge cases (writing new data versus overwriting existing data) are handled in a consistent fashion.
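To make the contract of get, scan, and mutate more concrete, here is a rough in-memory sketch backed by a TreeMap. It is an assumption-laden stand-in rather than part of aeftools: the class name InMemoryStore is hypothetical, java.util.function.Function replaces Guava's Function, and synchronized methods take the place of real datastore transactions.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical in-memory stand-in for a Persistence<T> implementation. */
class InMemoryStore<T> {
  private final TreeMap<String, T> data = new TreeMap<>();

  synchronized T get(String key) {
    return data.get(key);
  }

  /** Returns up to max entries with start <= key < end, ordered by key. */
  synchronized List<Map.Entry<String, T>> scan(String start, String end, int max) {
    List<Map.Entry<String, T>> result = new ArrayList<>();
    for (Map.Entry<String, T> e : data.subMap(start, true, end, false).entrySet()) {
      if (result.size() >= max) {
        break;
      }
      // Copy the entry so callers cannot write through to the map.
      result.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
    }
    return result;
  }

  /** Stores the mutator's result; a null result deletes the entry. */
  synchronized T mutate(String key, Function<? super T, ? extends T> mutator) {
    T updated = mutator.apply(data.get(key));
    if (updated == null) {
      data.remove(key);
    } else {
      data.put(key, updated);
    }
    return updated;
  }
}
```

With entries stored under "a", "b", and "c", a call to scan("a", "c", 10) returns the entries for "a" and "b" only, because the upper bound is exclusive.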
Bytes and blobs
How do we implement stores for arbitrary objects on App Engine? Let's assume for a moment that we have solved the problem for one particular type, an array of bytes (see the class DatastorePersistence for the implementation), so we can store an arbitrary blob of data in the store. With that in hand, we can simply provide a couple of adapters for other common object types. Anything can be converted to binary data, after all. The following tool class makes such an implementation easy: a subclass only needs to implement two conversion methods:
public abstract class MarshallingPersistence<T> implements
Persistence<T> {
private final Persistence<byte[]> backend;
protected abstract T makeType(byte[] nonNullValue);
protected abstract byte[] makeArray(T nonNullValue);
public MarshallingPersistence(Persistence<byte[]> backend) {
Preconditions.checkNotNull(backend);
this.backend = backend;
}
@Override
public T get(String key) {
byte[] asBytes = backend.get(key);
if (asBytes == null) {
return null;
}
return makeType(asBytes);
}
@Override
public T mutate(String key,
final Function<? super T, ? extends T> mutator) {
byte[] asBytes =
backend.mutate(key, new Function<byte[], byte[]>() {
@Override
public byte[] apply(byte[] arg0) {
T asType =
(arg0 == null) ? null : makeType(arg0);
T mutated = mutator.apply(asType);
if (mutated == null) {
return null;
}
return makeArray(mutated);
}
});
if (asBytes == null) {
return null;
}
return makeType(asBytes);
}
@Override
public List<Entry<String, T>> scan(String start,
String end, int max) {
List<Entry<String, T>> result = Lists.newArrayList();
for (Entry<String, byte[]> entry : backend.scan(start,
end, max)) {
T value = null;
if (entry.getValue() != null) {
value = makeType(entry.getValue());
}
result
.add(Maps.immutableEntry(entry.getKey(), value));
}
return result;
}
}
Building a new store type has just become very easy, as shown here for the String class:
public class StringPersistence
extends MarshallingPersistence<String> {
public StringPersistence(Persistence<byte[]> backend) {
super(backend);
}
@Override
protected byte[] makeArray(String nonNullValue) {
return nonNullValue.getBytes();
}
@Override
protected String makeType(byte[] nonNullValue) {
return new String(nonNullValue);
}
}
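One caveat about the conversion above: getBytes() and new String(byte[]) without a charset argument use the platform's default charset, so bytes written on one machine may decode differently on another. A minimal sketch of the same two conversions with an explicit UTF-8 charset could look as follows. The class name Utf8Codec is hypothetical, and java.nio.charset.StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing charset-explicit conversions. */
final class Utf8Codec {
  private Utf8Codec() {}

  /** Encodes the string as UTF-8, independent of the platform default. */
  static byte[] makeArray(String nonNullValue) {
    return nonNullValue.getBytes(StandardCharsets.UTF_8);
  }

  /** Decodes bytes that were produced by makeArray. */
  static String makeType(byte[] nonNullValue) {
    return new String(nonNullValue, StandardCharsets.UTF_8);
  }
}
```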
We could even be more generic and build such a class for any serializable Java class:
public class ObjectPersistence<T extends Serializable>
extends MarshallingPersistence<T> {
public ObjectPersistence(Persistence<byte[]> backend) {
super(backend);
}
@Override
protected byte[] makeArray(T nonNullValue) {
try {
ByteArrayOutputStream buffer =
new ByteArrayOutputStream();
ObjectOutputStream out =
new ObjectOutputStream(buffer);
out.writeObject(nonNullValue);
out.close();
return buffer.toByteArray();
} catch (IOException e) {
throw new StoreException(
"Object serialization failed", e);
}
}
@SuppressWarnings("unchecked")
@Override
protected T makeType(byte[] nonNullValue) {
try {
return (T) new ObjectInputStream(
new ByteArrayInputStream(nonNullValue))
.readObject();
} catch (ClassCastException e) {
throw new StoreException(
"Object deserialization failed", e);
} catch (IOException e) {
throw new StoreException(
"Object deserialization failed", e);
} catch (ClassNotFoundException e) {
throw new StoreException(
"Object deserialization failed", e);
}
}
}
The class described above works for Integer, String, and even the PasswordData class from my Schlüsselmeister application. For the rest of this post, I am going to explain why I am not going to use it, though, and what I have chosen instead.
Protocol buffers
Converting objects into a binary format is hard. Java serialization makes it look easy, but it comes with a couple of drawbacks:
- It is hard to get right and easy to mess up. Effective Java spends quite some time explaining the pitfalls of serialization. There are more articles on the web, such as this one.
- Even if I do not make any obvious mistake, I would be worried that I might mis-code my class in subtle ways that make it hard to extend later on with more fields. Anyone who has ever had to deal with the dreaded incompatible serial version errors can hopefully relate to that.
- I want to be able to squeeze as much data into my free quota as I possibly can. Object serialization is not optimized to deliver the smallest byte sequences possible.
- I tie myself to the Java language. If I ever wanted to provide a generic RESTful API that can be used by other languages like Python, I would need to come up with a different encoding format for that purpose anyway.
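The size overhead mentioned in the list above is easy to observe. The sketch below serializes a small value class with ObjectOutputStream and compares the result with the raw UTF-8 bytes of its two fields. SampleEntry is a made-up stand-in, not the PasswordData class from Schlüsselmeister, and the helper names are likewise hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical value class used only to measure serialization overhead. */
class SampleEntry implements Serializable {
  private static final long serialVersionUID = 1L;
  final String name;
  final String secret;

  SampleEntry(String name, String secret) {
    this.name = name;
    this.secret = secret;
  }
}

final class SerializationOverhead {
  private SerializationOverhead() {}

  /** Size of the object as written by standard Java serialization. */
  static int serializedSize(Serializable value) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(value);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.size();
  }

  /** Size of the field contents alone, encoded as UTF-8. */
  static int rawSize(SampleEntry e) {
    return e.name.getBytes(StandardCharsets.UTF_8).length
        + e.secret.getBytes(StandardCharsets.UTF_8).length;
  }
}
```

The serialized form carries a stream header and a class descriptor (class name, serialVersionUID, field names and types) in addition to the field values, so it is always larger than the raw field contents.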
For these reasons, I have decided to support another type of encoding: Google's very own Protocol Buffers. Protocol buffers allow me to define my data structure in a language-independent fashion, as shown in this example:
message Person {
required string name = 1;
optional string email = 2;
}
A code generator can then be used to create classes for different languages (currently Java, Python, and C++). For example, click here to take a look at the code that was generated from the message-definition above.
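To give a feel for how compact the encoding is, the following sketch hand-encodes the Person message above according to the protocol buffer wire format: each string field is written as a tag ((field number << 3) | wire type 2), a varint length, and the UTF-8 bytes of the value. This is for illustration only. Real code would use the generated Person class, and PersonWireSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical hand-rolled encoder for illustration; not generated code. */
final class PersonWireSketch {
  private static final int WIRE_TYPE_LENGTH_DELIMITED = 2;

  private PersonWireSketch() {}

  /** Encodes name (field 1) and, if non-null, email (field 2). */
  static byte[] encode(String name, String email) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeString(out, 1, name);
    if (email != null) {
      writeString(out, 2, email);  // optional field: simply omitted when absent
    }
    return out.toByteArray();
  }

  private static void writeString(ByteArrayOutputStream out, int field, String value) {
    byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
    writeVarint(out, (field << 3) | WIRE_TYPE_LENGTH_DELIMITED);
    writeVarint(out, utf8.length);
    out.write(utf8, 0, utf8.length);
  }

  /** Base-128 varint: 7 bits per byte, high bit set on all but the last. */
  private static void writeVarint(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }
}
```

A Person with the name "Bob" and no email encodes to five bytes: the tag 0x0A, the length 0x03, and the three bytes of the name.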
Protocol buffers provide a way of encoding data that has been used by Google very successfully for a long time. Not only does it support multiple languages, it is also very easy to extend. I can easily add new optional fields to my Person data structure and be confident that the class will still be able to read the older versions of my data.
In the next blog post of the Schlüsselmeister series, I will use protocol buffers to define a data model for my App Engine backend. I will persist the data in App Engine's datastore using the ProtocolBufferPersistence class. It will be interesting to see how I fare...
7 comments:
Hi,
Great blog !
I have also created a password manager as my first GAE application: password++
It looks easier using GWT !
Adi
Hi Adi,
Thanks for the feedback :-)
Out of curiosity: what did you use to encrypt the passwords on the client side? I have yet to implement my "Scrambler", and it would be cool if I do not have to reinvent the wheel.
Cheers,
Jens
I used Block TEA (Tiny Encryption Algorithm). After the encryption I converted the encrypted string to longs before posting it to GAE.
Btw, in the protobuf persistence, what was the reason behind the encoding/decoding (prefixed with ':') of the keys?
> what was the reason behind the encoding/decoding (prefixed with ':') of the keys?
Excellent question :-)
It is a little-known fact that all string keys in App Engine have to start with a non-numeric character. If you try to create a primary string key like "123", the API will throw an error. A common pattern to prevent that from happening is to always prepend a non-numeric character (in this example, the colon).
Little known fact indeed ...thanks
Do you know if they have restrictions with long-id keys? (Like maybe some extra encoding/decoding internally)
>Do you know if they have restrictions with long-id keys?
I'm not aware of that, but that doesn't mean there are none. This would probably be an interesting question for the App Engine forum. If you hear anything there, can you please post it back to this discussion thread?
Thanks.
Jens