Most scalable web applications use separate systems for handling web requests and for storing data. The request handling system routes each request to one of many machines, each of which handles the request without knowledge of other requests going to other machines. Each request handler behaves as if it is stateless, acting solely on the content of the request to produce the response. But most web applications need to maintain state, whether it’s remembering that a customer ordered a product, or just remembering that the user who made the current request is the same user who made an earlier request handled by another machine. For this, request handlers must interact with a central database to fetch and update the latest information about the state of the application.
Just as the request handling system distributes web requests across many machines for scaling and robustness, so does the database. But unlike the request handlers, databases are by definition stateful, and this poses a variety of questions. Which machine remembers which piece of data? How does the system route a data query to the machine—or machines—that can answer the query? When a client updates data, how long does it take for all machines that know that data to get the latest version, and what does the system return for queries about that data in the meantime? What happens when two clients try to update the same data at the same time? What happens when a machine goes down?
As with request handling, Google App Engine manages the scaling and maintenance of data storage automatically. Your application interacts with an abstract model that hides the details of managing and growing a pool of data servers. This model and the service behind it provide answers to the questions of scalable data storage specifically designed for web applications.
App Engine’s abstraction for data is easy to understand, but it is not obvious how to best take advantage of its features. In particular, it is surprisingly different from the kind of database with which most of us are most familiar, the relational database. It’s different enough, in fact, that Google doesn’t call it a “database,” but a “datastore.”
We will dedicate the next several chapters to this important subject.
An entity has a key that uniquely identifies the object across the entire system. If you have a key, you can fetch the entity for the key quickly. Keys can be stored as data in entities, such as to create a reference from one entity to another. A key has several parts, some of which we’ll discuss here and some of which we’ll cover later.
One part of the key is the application’s ID, which ensures that nothing else about the key can collide with the entities of any other application. It also ensures that no other app can access your app’s data, and that your app cannot access data for other apps. You won’t see the app ID mentioned in the datastore API; this is automatic.
An important part of the key is the kind. An entity’s kind categorizes the entity for the purposes of queries, and for ensuring the uniqueness of the rest of the key. For example, a shopping cart application might represent each customer order with an entity of the kind “Order.” The application specifies the kind when it creates the entity.
The key also contains an entity ID. This can be an arbitrary string specified by the app, or it can be generated automatically by the datastore. The API calls an entity ID given by the app a key name, and an entity ID generated by the datastore an ID. An entity has either a key name or an ID, but not both.
Once an entity has been created, its key cannot be changed. This applies to all parts of its key, including the kind and the key name or ID.
The data for the entity is stored in one or more properties. Each property has a name and at least one value. Each value is of one of several supported data types, such as a string, an integer, a date-time, or a null value. We’ll look at property value types in detail later in this chapter.
A property can have multiple values, and each value can be of a different type. As you will see in Multivalued Properties, multivalued properties have unusual behavior, but are quite useful for modeling some kinds of data, and surprisingly efficient.
It’s tempting to compare these concepts with similar concepts in relational databases: kinds are tables; entities are rows; properties are fields or columns. That’s a useful comparison, but watch out for differences.
Unlike a table in a relational database, there is no relationship between an entity’s kind and its properties. Two entities of the same kind can have different properties set or not set, and can each have a property of the same name but with values of different types. You can (and often will) enforce a data schema in your own code, and App Engine includes libraries to make this easy, but this is not required by the datastore.
Also unlike relational databases, keys are not properties. You can perform queries on key names just like properties, but you cannot change a key name after the entity has been created.
And of course, a relational database cannot store multiple values in a single cell, while an App Engine property can have multiple values.