SQLAlchemy is a Python library created by Mike Bayer to provide a high-level, Pythonic (idiomatically Python) interface to relational databases such as Oracle, DB2, MySQL, PostgreSQL, and SQLite. SQLAlchemy attempts to be unobtrusive to your Python code, allowing you to map plain old Python objects (POPOs) to database tables without substantially changing your existing Python code. SQLAlchemy includes a database server-independent SQL expression language and an object-relational mapper (ORM) that lets you use SQL to persist your application objects automatically. This chapter will introduce you to SQLAlchemy, illustrating some of its more powerful features. Later chapters will provide more depth for the topics covered here.
If you have used lower-level database interfaces with Python, such as the DB-API, you may be used to writing code such as the following to save your objects to the database:
sql = "INSERT INTO user(user_name, password) VALUES (%s, %s)"
cursor = conn.cursor()
cursor.execute(sql, ('rick', 'parrot'))
Although this code gets the job done, it is verbose, error-prone, and tedious to write. Using string manipulation to build up a query as done here can lead to various logical errors and vulnerabilities such as opening your application up to SQL injection attacks. Generating the string to be executed by your database server verbatim also ties your code to the particular DB-API driver you are currently using, making migration to a different database server difficult. For instance, if we wished to migrate the previous example to the Oracle DB-API driver, we would need to write:
sql = "INSERT INTO user(user_name, password) VALUES (:1, :2)"
cursor = conn.cursor()
cursor.execute(sql, 'rick', 'parrot')
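For comparison, the "qmark" paramstyle used by Python's built-in sqlite3 driver is yet a third variation. This stdlib-only sketch (the table definition is invented for illustration) shows the same insert written for that driver:

```python
import sqlite3

# sqlite3 uses the "qmark" paramstyle: ? placeholders, values in a tuple
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE user (user_name TEXT, password TEXT)")

sql = "INSERT INTO user(user_name, password) VALUES (?, ?)"
cursor = conn.cursor()
cursor.execute(sql, ('rick', 'parrot'))
```

Three drivers, three placeholder syntaxes, for one logical statement.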
In the SQLAlchemy SQL expression language, you could write the following instead:
statement = user_table.insert(user_name='rick', password='parrot')
statement.execute()
To migrate this code to Oracle, you would write, well, exactly the same thing.
SQLAlchemy also allows you to write SQL queries using a Pythonic expression-builder. For instance, to retrieve all the users created in 2007, you would write:
statement = user_table.select(and_(
    user_table.c.created >= date(2007,1,1),
    user_table.c.created < date(2008,1,1)))
result = statement.execute()
In order to use the SQL expression language, you need to provide SQLAlchemy with information about your database schema. For instance, if you are using the user table mentioned previously, your schema definition might be the following:
metadata = MetaData('sqlite://')  # use an in-memory SQLite database
user_table = Table(
    'tf_user', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('email_address', Unicode(255), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False),
    Column('first_name', Unicode(255), default=''),
    Column('last_name', Unicode(255), default=''),
    Column('created', DateTime, default=datetime.now))
If you would rather use an existing database schema definition, you still need to tell SQLAlchemy which tables you have, but SQLAlchemy can reflect the tables using the database server’s introspection capabilities. In this case, the schema definition reduces to the following:
users_table = Table('users', metadata, autoload=True)
Although the SQLAlchemy SQL expression language is quite powerful, it can still be tedious to manually specify the queries and updates necessary to work with your tables. To help with this problem, SQLAlchemy provides an ORM to automatically populate your Python objects from the database and to update the database based on changes to your Python objects. Using the ORM is as simple as writing your classes, defining your tables, and mapping your tables to your classes. In the case of the user table, you could perform a simple mapping via the following code:
class User(object):
    pass

mapper(User, user_table)
Notice that there is nothing particularly special about the User class defined here. It is used to create “plain old Python objects,” or POPOs. All the magic of SQLAlchemy is performed by the mapper.

Although the class definition just shown is empty, you may define your own methods and attributes on a mapped class. The mapper will create attributes corresponding to the column names in the mapped table, as well as some private attributes used by SQLAlchemy internally. Once your table is mapped, you can use a Session object to populate your objects based on data in the user table and flush any changes you make to mapped objects to the database:
>>> Session = sessionmaker()
>>> session = Session()
>>>
>>> # Insert a user into the database
... u = User()
>>> u.user_name='rick'
>>> u.email_address='rick@foo.com'
>>> u.password='parrot'
>>> session.save(u)
>>>
>>> # Flush all changes to the session out to the database
... session.flush()
>>>
>>> query = session.query(User)
>>> # List all users
... list(query)
[<__main__.User object at 0x2abb96dae3d0>]
>>>
>>> # Get a particular user by primary key
... query.get(1)
<__main__.User object at 0x2abb96dae3d0>
>>>
>>> # Get a particular user by some other column
... query.get_by(user_name='rick')
<__main__.User object at 0x2abb96dae3d0>
>>>
>>> u = query.get_by(user_name='rick')
>>> u.password = 'foo'
>>> session.flush()
>>> query.get(1).password
'foo'
As you can see, SQLAlchemy makes persisting your objects simple and concise. You can also customize and extend the set of properties created by SQLAlchemy, allowing your objects to model, for instance, a many-to-many relationship with simple Python lists.
Although a SQL database is a powerful and flexible modeling tool, it is not always a good match for the object-oriented programming style. SQL is good for some things, and object-oriented programming is good for others. This is sometimes referred to as the object/relational “impedance mismatch,” and it is a problem that SQLAlchemy tries to address in the ORM. To illustrate the object/relational impedance mismatch, let’s first look at how we might model a system in SQL, and then how we might model it in an object-oriented way.
SQL databases provide a powerful means for modeling data and allowing for arbitrary queries of that data. The model underlying SQL is the relational model. In the relational model, modeled items (entities) can have various attributes, and are related to other entities via relationships. These relationships can be one-to-one, one-to-many, many-to-many, or complex, multientity relationships. The SQL expression of the entity is the table, and relationships are expressed as foreign key constraints, possibly with the use of an auxiliary “join” table. For example, suppose we have a user permission system that has users who may belong to one or more groups. Groups may have one or more permissions. Our SQL to model such a system might be something like the following:
CREATE TABLE tf_user (
    id INTEGER NOT NULL,
    user_name VARCHAR(16) NOT NULL,
    email_address VARCHAR(255) NOT NULL,
    password VARCHAR(40) NOT NULL,
    first_name VARCHAR(255),
    last_name VARCHAR(255),
    created TIMESTAMP,
    PRIMARY KEY (id),
    UNIQUE (user_name),
    UNIQUE (email_address));

CREATE TABLE tf_group (
    id INTEGER NOT NULL,
    group_name VARCHAR(16) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE (group_name));

CREATE TABLE tf_permission (
    id INTEGER NOT NULL,
    permission_name VARCHAR(16) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE (permission_name));

-- Relate the user and group tables
CREATE TABLE user_group (
    user_id INTEGER,
    group_id INTEGER,
    PRIMARY KEY(user_id, group_id),
    FOREIGN KEY(user_id) REFERENCES tf_user (id),
    FOREIGN KEY(group_id) REFERENCES tf_group (id));

-- Relate the group and permission tables
CREATE TABLE group_permission (
    group_id INTEGER,
    permission_id INTEGER,
    PRIMARY KEY(group_id, permission_id),
    FOREIGN KEY(group_id) REFERENCES tf_group (id),
    FOREIGN KEY(permission_id) REFERENCES tf_permission (id));
Notice the two auxiliary tables used to provide many-to-many joins between users and groups, and between groups and permissions. Once we have this schema in place, a common scenario is to check whether a particular user has a particular permission. In SQL, we might write:
SELECT COUNT(*)
FROM tf_user, tf_permission, user_group, group_permission
WHERE tf_user.user_name='rick'
  AND tf_user.id=user_group.user_id
  AND user_group.group_id = group_permission.group_id
  AND group_permission.permission_id = tf_permission.id
  AND permission_name='admin';
In a single statement, we join the three entities—user, group, and permission—together to determine whether the user “rick” has the “admin” permission.
In the object-oriented world, we would probably model the system quite differently. We would still have users, groups, and permissions, but they would probably have an ownership relationship between them:
class User(object):
    groups=[]

class Group(object):
    users=[]
    permissions=[]

class Permission(object):
    groups=[]
Suppose we wanted to print out a summary of all of a given user’s groups and permissions, something an object-oriented style would do quite well. We might write something like the following:
print 'Summary for %s' % user.user_name
for g in user.groups:
    print '  Member of group %s' % g.group_name
    for p in g.permissions:
        print '  ... which has permission %s' % p.permission_name
On the other hand, if we wanted to determine whether a user has a particular permission, we would need to do something like the following:
def user_has_permission(user, permission_name):
    for g in user.groups:
        for p in g.permissions:
            if p.permission_name == permission_name:
                return True
    return False
In this case, we needed to write a nested loop, examining every group the user is a member of to see if that group had a particular permission. SQLAlchemy lets you use object-oriented programming where appropriate (such as checking for a user’s permission to do something) and relational programming where appropriate (such as printing a summary of groups and permissions). In SQLAlchemy, we could print the summary information exactly as shown, and we could detect membership in a group with a much simpler query. First, we need to create mappings between our tables and our objects, telling SQLAlchemy a little bit about the many-to-many joins:
mapper(User, user_table, properties=dict(
    groups=relation(Group, secondary=user_group, backref='users')))
mapper(Group, group_table, properties=dict(
    permissions=relation(Permission, secondary=group_permission,
                         backref='groups')))
mapper(Permission, permission_table)
Now, our model plus the magic of the SQLAlchemy ORM allows us to detect whether the given user is an administrator:
q = session.query(Permission)
rick_is_admin = q.count_by(permission_name='admin',
                           user_name='rick')
SQLAlchemy was able to look at our mappers, determine how to join the tables, and use the relational model to generate a single call to the database. The SQL generated by SQLAlchemy is actually quite similar to what we would have written ourselves:
SELECT count(tf_permission.id)
FROM tf_permission, tf_user, group_permission, tf_group, user_group
WHERE (tf_user.user_name = ?
       AND ((tf_permission.id = group_permission.permission_id
             AND tf_group.id = group_permission.group_id)
            AND (tf_group.id = user_group.group_id
                 AND tf_user.id = user_group.user_id)))
  AND (tf_permission.permission_name = ?)
SQLAlchemy’s real power comes from its ability to bridge the object/relational divide; it allows you to use whichever model is appropriate to your task at hand. Aggregation is another example of using SQLAlchemy’s relational model rather than the object-oriented model. Suppose we wanted a count of how many users had each permission type. In the traditional object-oriented world, we would probably loop over each permission, then over each group, and finally count the users in the group (without forgetting to remove duplicates!). This leads to something like this:
for p in permissions:
    users = set()
    for g in p.groups:
        for u in g.users:
            users.add(u)
    print 'Permission %s has %d users' % (p.permission_name, len(users))
In SQLAlchemy, we can drop into the SQL expression language to create the following query:
q = select([Permission.c.permission_name,
            func.count(user_group.c.user_id)],
           and_(Permission.c.id==group_permission.c.permission_id,
                Group.c.id==group_permission.c.group_id,
                Group.c.id==user_group.c.group_id),
           group_by=[Permission.c.permission_name],
           distinct=True)
for permission_name, num_users in q.execute():
    print 'Permission %s has %d users' % (permission_name, num_users)
Although the query is a little longer in this case, we are doing all of the work in the database, allowing us to reduce the data transferred and potentially increase performance substantially due to reduced round-trips to the database. The important thing to note is that SQLAlchemy makes “simple things simple, and complex things possible.”
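The round-trip savings are easy to demonstrate with nothing but the standard library's sqlite3 module (the table and data here are invented for illustration): one GROUP BY statement replaces the nested Python loops, and only the summary rows cross the wire.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE user_group (user_id INTEGER, group_id INTEGER);
    INSERT INTO user_group VALUES (1, 10), (2, 10), (2, 11), (3, 11);
""")

# The database does the grouping, counting, and de-duplication;
# Python only sees one row per group.
rows = conn.execute(
    "SELECT group_id, COUNT(DISTINCT user_id) "
    "FROM user_group GROUP BY group_id").fetchall()

for group_id, num_users in rows:
    print('Group %d has %d users' % (group_id, num_users))
```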
SQLAlchemy was created with the goal of letting your objects be objects, and your tables be tables. The SQLAlchemy home page puts it this way:
SQLAlchemy Philosophy
SQL databases behave less and less like object collections the more size and performance start to matter; object collections behave less and less like tables and rows the more abstraction starts to matter. SQLAlchemy aims to accommodate both of these principles.
—From http://www.sqlalchemy.org
Using the object mapper pattern (where plain Python objects are mapped to SQL tables via a mapper object, rather than requiring persistent objects to be derived from some Persistable class) achieves much of this separation of concerns. There has also been a concerted effort in SQLAlchemy development to expose the full power of SQL, should you wish to use it.
In SQLAlchemy, your objects are POPOs until you tell SQLAlchemy about them. This means that it is entirely possible to “bolt on” persistence to an existing object model by mapping the classes to tables. For instance, consider an application that uses users, groups, and permissions, as shown. You might prototype your application with the following class definitions:
class User(object):

    def __init__(self, user_name=None, password=None, groups=None):
        if groups is None:
            groups = []
        self.user_name = user_name
        self.password = password
        self._groups = groups

    def join_group(self, group):
        self._groups.append(group)

    def leave_group(self, group):
        self._groups.remove(group)

class Group(object):

    def __init__(self, group_name=None, users=None, permissions=None):
        if users is None:
            users = []
        if permissions is None:
            permissions = []
        self.group_name = group_name
        self._users = users
        self._permissions = permissions

    def add_user(self, user):
        self._users.append(user)

    def del_user(self, user):
        self._users.remove(user)

    def add_permission(self, permission):
        self._permissions.append(permission)

    def del_permission(self, permission):
        self._permissions.remove(permission)

class Permission(object):

    def __init__(self, permission_name=None, groups=None):
        if groups is None:
            groups = []
        self.permission_name = permission_name
        self._groups = groups

    def join_group(self, group):
        self._groups.append(group)

    def leave_group(self, group):
        self._groups.remove(group)
Once your application moves beyond the prototype stage, you might expect to have to write code to manually load objects from the database or perhaps some other kind of persistent object store. On the other hand, if you are using SQLAlchemy, you would just define your tables:
user_table = Table(
    'tf_user', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False))

group_table = Table(
    'tf_group', metadata,
    Column('id', Integer, primary_key=True),
    Column('group_name', Unicode(16), unique=True, nullable=False))

permission_table = Table(
    'tf_permission', metadata,
    Column('id', Integer, primary_key=True),
    Column('permission_name', Unicode(16), unique=True, nullable=False))

user_group = Table(
    'user_group', metadata,
    Column('user_id', None, ForeignKey('tf_user.id'), primary_key=True),
    Column('group_id', None, ForeignKey('tf_group.id'), primary_key=True))

group_permission = Table(
    'group_permission', metadata,
    Column('group_id', None, ForeignKey('tf_group.id'), primary_key=True),
    Column('permission_id', None, ForeignKey('tf_permission.id'), primary_key=True))
and your mappers:
mapper(User, user_table, properties=dict(
    _groups=relation(Group, secondary=user_group, backref='_users')))
mapper(Group, group_table, properties=dict(
    _permissions=relation(Permission, secondary=group_permission,
                          backref='_groups')))
mapper(Permission, permission_table)
and you’re done. No modification of your objects is required—they are still simply new-style (derived from the object class) Python classes, and they still have whatever methods you have defined, as well as a few attributes added by SQLAlchemy (described in the sidebar “Instrumentation on Mapped Classes”). Your old methods join_group, leave_group, etc. still work, even without modifying the class code. This means that you can modify mapped “collection” properties (properties modeling 1:N or M:N relationships) with regular list operations, and SQLAlchemy will track your changes and flush them to the database automatically.
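SQLAlchemy's actual instrumentation is considerably more sophisticated, but the core idea (a collection that records its own mutations so they can be flushed later) can be sketched in plain Python; every name here is invented for illustration and does not match SQLAlchemy's internals:

```python
class TrackedList(list):
    """A list that records appends and removes, sketching how an ORM
    could discover which rows to INSERT or DELETE at flush time."""

    def __init__(self, *args):
        list.__init__(self, *args)
        self.pending = []   # invented attribute; SQLAlchemy's internals differ

    def append(self, item):
        list.append(self, item)
        self.pending.append(('insert', item))

    def remove(self, item):
        list.remove(self, item)
        self.pending.append(('delete', item))

groups = TrackedList()
groups.append('admins')    # ordinary list operations...
groups.append('editors')
groups.remove('admins')    # ...but every change is remembered
```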
SQLAlchemy also allows you the full expressiveness of SQL, including compound (multicolumn) primary keys and foreign keys, indexes, access to stored procedures, the ability to “reflect” your tables from the database into your application, and even the ability to specify cascading updates and deletes on your foreign key relationships and value constraints on your data.
SQLAlchemy consists of several components, including the aforementioned database-independent SQL expression language and the object-relational mapper. In order to enable these components, SQLAlchemy also provides an Engine class, which manages connection pools and SQL dialects; a MetaData class, which manages your table information; and a flexible type system for mapping SQL types to Python types.
The beginning of any SQLAlchemy application is the Engine. The engine manages the SQLAlchemy connection pool and the database-independent SQL dialect layer. In our previous examples, the engine was created implicitly when the MetaData was created:
metadata = MetaData('sqlite://')
engine = metadata.bind
It is also possible to create an engine manually, using the SQLAlchemy function create_engine():
engine=create_engine('sqlite://')
This engine can later be bound to a MetaData object just by setting the bind attribute on the MetaData:
metadata.bind = engine
The engine can also be used in SQL statements such as table creation if the MetaData is unbound (not connected to a particular engine):
user_table.create(bind=engine)
The engine can be used to execute queries directly on the database via dynamic SQL:
for row in engine.execute("select user_name from tf_user"):
    print 'user name: %s' % row['user_name']
Most of the time, you will be using the higher-level facilities of the SQL expression language and ORM components of SQLAlchemy, but it’s nice to know that you can always easily drop down all the way to raw SQL if you need to.
Thus far, we have glossed over the use of database connections. In fact, all of our examples up to this point have used SQLAlchemy’s powerful connection pooling subsystem. In order to execute queries against a database, a connection is required, and the establishment of a new connection is typically an expensive operation, involving a network connection, authentication of the user, and any database session setup required. To amortize the costs, the typical solution is to maintain a pool of database connections that are used over and over again in the application.
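A toy version of such a pool takes only a few lines. This sketch uses invented names, is far simpler than SQLAlchemy's real pool, and offers no thread safety beyond what Queue itself provides, but it shows the reuse idea:

```python
import sqlite3
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

class SimpleConnectionPool(object):
    """Keep opened connections around and hand them out again,
    instead of paying the connection setup cost on every query."""

    def __init__(self, create_connection, max_size=5):
        self._create = create_connection    # the expensive factory
        self._idle = queue.Queue(max_size)

    def get(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            return self._create()           # pool empty: connect anew

    def put(self, conn):
        try:
            self._idle.put_nowait(conn)     # back to the pool for reuse
        except queue.Full:
            conn.close()                    # pool full: really close it

pool = SimpleConnectionPool(lambda: sqlite3.connect(':memory:'))
conn = pool.get()
pool.put(conn)
```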
The Engine object in SQLAlchemy is responsible for managing a pool of low-level DB-API connections. In fact, the engine and the low-level connection objects obey a Connectable protocol, allowing you to execute dynamic SQL queries either directly against a connection, or against the engine (in which case the engine will automatically allocate a connection for the query).
In another instance of making simple things simple and complex things possible, SQLAlchemy does The Right Thing most of the time with connections, and allows you to override its strategy when required. SQLAlchemy’s default strategy is to acquire a connection for each SQL statement and, when that connection is no longer used (when its result set is closed or garbage-collected), to return it to the pool. If you would like to manually manage your connections, you can also do that via the connect() method on the engine object:
engine = create_engine('sqlite://')
connection = engine.connect()
result = connection.execute("select user_name from tf_user")
for row in result:
    print 'user name: %s' % row['user_name']
result.close()
SQLAlchemy has another strategy for connection pooling that has some performance benefits in many cases: the “thread-local” strategy. In the thread-local strategy, a connection that is currently in use by a thread will be reused for other statements within that thread. This can reduce database server load, which is especially important when you could have several applications accessing the database simultaneously. If you want to use the thread-local strategy, simply create the Engine object and set the strategy to threadlocal:
engine = create_engine('sqlite://', strategy='threadlocal')
Although SQL is a standardized language, many database vendors either do not fully implement it or simply create extensions to the standard. The dialect object attempts to manage the idiosyncrasies of each supported SQL dialect, as well as manage the low-level DB-API modules implementing the connection.
The dialect is mostly used as a transparent layer for your application programming. The main exception to this rule is when you want to access a data type that is supported only for particular database servers. For instance, MySQL has BigInteger and Enum types. To use these types, you must import them directly from the appropriate module in the sqlalchemy.databases package:
from sqlalchemy.databases.mysql import MSEnum, MSBigInteger

user_table = Table('tf_user', meta,
    Column('id', MSBigInteger),
    Column('honorific', MSEnum('Mr', 'Mrs', 'Ms', 'Miss', 'Dr', 'Prof')))
The MetaData object in SQLAlchemy is used to collect and organize information about your table layout (i.e., your database schema). We alluded to MetaData management before in describing how to create tables. A MetaData object must be created before any tables are defined, and each table must be associated with a MetaData object. MetaData objects can be created “bound” or “unbound,” based on whether they are associated with an engine. The following is an example of the different ways you can create MetaData objects:
# Create an unbound MetaData
unbound_meta = MetaData()

# Create an Engine and bind the MetaData to it
db1 = create_engine('sqlite://')
unbound_meta.bind = db1

# Create an engine and then a bound MetaData
db2 = create_engine('sqlite:///test1.db')
bound_meta1 = MetaData(db2)

# Create a bound MetaData with an implicitly created engine
bound_meta2 = MetaData('sqlite:///test2.db')
Although tables can be defined against unbound MetaData, it is often more convenient to eventually bind the metadata to an engine, as this allows the MetaData and the Table objects defined for it to access the database directly:
# Create a bound MetaData
meta = MetaData('sqlite://')

# Define a couple of tables
user_table = Table(
    'tf_user', meta,
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False))

group_table = Table(
    'tf_group', meta,
    Column('id', Integer, primary_key=True),
    Column('group_name', Unicode(16), unique=True, nullable=False))

# Create all the tables in the (empty) database
meta.create_all()

# Select all the groups from the tf_group table
result_set = group_table.select().execute()
As mentioned previously, you can also reflect your schema by setting the autoload parameter to True in your Table creation. Reflection, however, requires a database connection to function properly. (SQLAlchemy must query the database to determine the structure of the tables.) Binding the MetaData to an engine is a convenient way to provide this connection. Note, however, that you are never required to bind the MetaData object; any operation that you can perform with a bound MetaData or a table defined on it can also be performed by passing the engine or connection to the individual method. This might be useful if you wish to use the same MetaData object for multiple distinct database engines:
meta = MetaData()
engine1 = create_engine('sqlite:///test1.db')
engine2 = create_engine('sqlite:///test2.db')

# Use the engine parameter to load tables from the first engine
user_table = Table(
    'tf_user', meta, autoload=True, autoload_with=engine1)
group_table = Table(
    'tf_group', meta, autoload=True, autoload_with=engine1)
permission_table = Table(
    'tf_permission', meta, autoload=True, autoload_with=engine1)
user_group_table = Table(
    'user_group', meta, autoload=True, autoload_with=engine1)
group_permission_table = Table(
    'group_permission', meta, autoload=True, autoload_with=engine1)

# Create the tables in the second engine
meta.create_all(engine2)

# Select some data
result_set = engine1.execute(user_table.select())
In many cases, SQLAlchemy can map SQL types to Python types in a straightforward way. To do this, SQLAlchemy provides a set of TypeEngine-derived classes in the sqlalchemy.types module that convert SQL data to Python data. TypeEngine subclasses are used to define the MetaData for tables.
Sometimes, in keeping with the SQLAlchemy philosophy of letting your objects be objects, you may find that the provided TypeEngine classes do not express all of the data types you wish to store in your database. In this case, you can write a custom TypeEngine that converts data being saved to the database to a database-native type, and converts data being loaded from the database to a Python-native type. Suppose, for instance, that we wished to have a column that stored images from the Python Imaging Library (PIL). In this case, we might use the following TypeEngine definition:
class ImageType(sqlalchemy.types.Binary):

    def convert_bind_param(self, value, engine):
        sfp = StringIO()
        value.save(sfp, 'JPEG')
        return sfp.getvalue()

    def convert_result_value(self, value, engine):
        sfp = StringIO(value)
        image = PIL.Image.open(sfp)
        return image
Once we have defined ImageType, we can use that type in our table definitions, and the corresponding PIL image will be automatically created when we select from the database, or serialized when we insert or update the database.
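Stripped of the SQLAlchemy plumbing, the two hooks are just a pair of inverse serializers. As a PIL-free sketch, here is the same shape of conversion for a hypothetical column that stores a Python set of tags as a comma-joined string (both function names echo the hooks above but stand alone here):

```python
def convert_bind_param(value):
    # Python -> database: flatten the set to a stable string form
    return ','.join(sorted(value))

def convert_result_value(value):
    # database -> Python: rebuild the set from the stored string
    return set(value.split(',')) if value else set()

tags = set(['admin', 'staff'])
stored = convert_bind_param(tags)        # what would be written to the column
restored = convert_result_value(stored)  # what a SELECT would hand back
```

Whatever the payload (images, sets, pickled objects), the contract is the same: the two conversions must round-trip cleanly.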
SQLAlchemy’s SQL expression language provides an API to execute queries and updates against your tables, all from Python, and all in a database-independent way (managed by the SQLAlchemy-provided Dialect). For instance, the following expression:
select([user_table.c.user_name, user_table.c.password],
       whereclause=user_table.c.user_name=='rick')
would yield the following SQL code:
SELECT tf_user.user_name, tf_user.password
FROM tf_user
WHERE tf_user.user_name = ?
Notice how the SQL generated uses a question mark for the user name value. This is known as a “bind parameter.” When the query is run, SQLAlchemy will send the query string (with bind parameters) and the actual variables (in this case, the string "rick") to the database engine. Using the SQLAlchemy SQL-generation layer has several advantages over hand-generating SQL strings:
- Security
Application data (including user-generated data) is safely escaped via bind parameters, making SQL injection-style attacks extremely difficult.
- Performance
The likelihood of reusing a particular query string (from the database server’s perspective) is increased. For instance, if we wanted to select another user from the table, the SQL generated would be identical, and a different bind parameter would be sent. This allows the database server in some cases to reuse its execution plan from the first query for the second, increasing performance.
- Portability
Although SQL is a standardized language, different database servers implement different parts of the standard, and to different degrees of faithfulness. SQLAlchemy provides you a way to write database-independent SQL in Python without tying you to a particular database server. With a little bit of planning, the same SQLAlchemy-based application can run on SQLite, Oracle, DB2, PostgreSQL, or any other SQLAlchemy-supported database without code changes.
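The security point is easy to demonstrate with the stdlib sqlite3 driver (schema and data invented for illustration): splicing user input into the SQL string lets a crafted value rewrite the query, while a bind parameter treats the same value as inert data.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE tf_user (user_name TEXT, password TEXT)")
conn.execute("INSERT INTO tf_user VALUES ('rick', 'parrot')")

evil = "' OR '1'='1"

# String interpolation: the input becomes part of the SQL itself
unsafe_rows = conn.execute(
    "SELECT * FROM tf_user WHERE user_name = '%s'" % evil).fetchall()

# Bind parameter: the value is sent separately and never parsed as SQL
safe_rows = conn.execute(
    "SELECT * FROM tf_user WHERE user_name = ?", (evil,)).fetchall()

print(len(unsafe_rows))  # 1 -- the injected OR matched every row
print(len(safe_rows))    # 0 -- no user is literally named "' OR '1'='1"
```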
Most of the time, you will be using the SQL expression language by creating expressions involving the attributes of the table.c object. This is a special attribute that is added to Tables you have defined in the metadata, as well as to any objects you have mapped to tables or other selectables. The “.c” objects represent database columns, and they can be combined via a rich set of operators:
# Select all users with a username starting with 'r' who were
# created before June 1, 2007
q = user_table.select(
    user_table.c.user_name.like('r%')
    & (user_table.c.created < datetime(2007,6,1)))

# Alternate syntax to do the same thing
q = user_table.select(and_(
    user_table.c.user_name.like('r%'),
    user_table.c.created < datetime(2007,6,1)))
You can also use mapped classes in the same way:
q = session.query(User)
q = q.filter(User.c.user_name.like('r%')
             & (User.c.created > datetime(2007,6,1)))
Of course, you aren’t required to use the SQL expression language; you can always insert custom SQL instead:
q = user_table.select("""tf_user.user_name LIKE 'r%'""")
You can also use SQL functions in your queries by using the SQLAlchemy-supplied func object:
q = select([Permission.c.permission_name,
            func.count(user_group.c.user_id)],
           and_(Permission.c.id==group_permission.c.permission_id,
                Group.c.id==group_permission.c.group_id,
                Group.c.id==user_group.c.group_id),
           group_by=[Permission.c.permission_name],
           distinct=True)
Although you can do a lot with the Engine, MetaData, TypeEngine, and SQL expression language, the true power of SQLAlchemy is found in its ORM. SQLAlchemy’s ORM provides a convenient, unobtrusive way to add database persistence to your Python objects without requiring you to design your objects around the database, or the database around the objects. To accomplish this, SQLAlchemy uses the data mapper pattern. In this pattern, you can define your tables (or other selectables, such as joins) in one module, your classes in another, and the mappers between them in yet another module.
SQLAlchemy provides a great deal of flexibility in mapping tables, as well as a sensible set of default mappings. Suppose that we defined the following tables, classes, and mappers:
user_table = Table(
    'tf_user', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('email_address', Unicode(255), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False),
    Column('first_name', Unicode(255), default=''),
    Column('last_name', Unicode(255), default=''),
    Column('created', DateTime, default=datetime.now))

group_table = Table(
    'tf_group', metadata,
    Column('id', Integer, primary_key=True),
    Column('group_name', Unicode(16), unique=True, nullable=False))

user_group = Table(
    'user_group', metadata,
    Column('user_id', None, ForeignKey('tf_user.id'), primary_key=True),
    Column('group_id', None, ForeignKey('tf_group.id'), primary_key=True))

class User(object):
    pass

class Group(object):
    pass

mapper(User, user_table)
mapper(Group, group_table)
Here, the mapper would create properties on the User class for the columns of the table: id, user_name, email_address, password, first_name, last_name, and created. On the Group class, the id and group_name properties would be defined. The mapper, however, has a great deal more flexibility. If we wished to store only a hash of the user’s password in the database, rather than the actual plaintext password, we might modify the User class and mapper to the following:
import sha

class User(object):
    def _get_password(self):
        return self._password
    def _set_password(self, value):
        self._password = sha.new(value).hexdigest()
    password = property(_get_password, _set_password)
    def password_matches(self, password):
        return sha.new(password).hexdigest() == self._password

mapper(User, user_table, properties=dict(
    _password=user_table.c.password))
By providing an application-level override for the password property, we can ensure that only hashed passwords are ever stored in the database. By telling the mapper to map user_table.c.password to the protected property _password, we prevent SQLAlchemy from providing the default mapping for the password column.
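To see the property behavior in isolation, here is a stand-alone sketch of the same hashing idiom with the mapper left out, using hashlib.sha1 in place of the long-deprecated sha module (the class body is otherwise the same shape as above):

```python
import hashlib

class User(object):
    def _get_password(self):
        return self._password
    def _set_password(self, value):
        # only the hash ever reaches the stored attribute
        self._password = hashlib.sha1(value.encode()).hexdigest()
    password = property(_get_password, _set_password)
    def password_matches(self, password):
        return hashlib.sha1(password.encode()).hexdigest() == self._password

u = User()
u.password = 'parrot'                # setter stores the digest, not 'parrot'
print(len(u.password))               # 40 (hex digest length)
print(u.password_matches('parrot'))  # True
print(u.password_matches('wrong'))   # False
```

Reading `u.password` back yields the 40-character hex digest, never the plaintext, which is exactly why the mapper must be pointed at the protected `_password` attribute.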
Perhaps the most powerful feature of the ORM is the ability to use
regular Python data structures to model relationships between tables. In
the preceding user/group example, we can modify the user mapper a bit
more to provide the User
class with a
groups
property, and the Group
class with a users
property:
mapper(User, user_table, properties=dict(
    _password=user_table.c.password,
    groups=relation(Group, secondary=user_group, backref='users')))
Now we can access all the groups that a user is a member of by
simply accessing the groups
property. We can also add
a user to a group by either appending the user to the group’s
users
property, or appending the
group to the user’s groups
property:
# user1's "groups" property will automatically be updated
group1.users.append(user1)

# group2's "users" property will automatically be updated
user2.groups.append(group2)
The ORM uses a Session object to keep track of objects loaded from the database and the changes made to them. Sessions are used to persist objects created by the application, and they provide a query interface to retrieve objects from the database. Rather than executing the database code to synchronize your objects with your tables every time an object is modified, the Session simply tracks all changes until its flush() method is called, at which point all the changes are sent to the database in a single unit of work.
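The unit-of-work idea can be sketched in a few lines of framework-free Python. ToySession and its pretend SQL strings are illustrative only, not SQLAlchemy's implementation; the point is just that save() records intent and flush() emits everything in one batch:

```python
class ToySession(object):
    def __init__(self):
        self._pending = []          # objects saved but not yet flushed
        self.executed = []          # stand-in for SQL sent to the database

    def save(self, obj):
        self._pending.append(obj)   # just track the object; no SQL yet

    def flush(self):
        # one "unit of work": emit all pending changes together
        for obj in self._pending:
            self.executed.append("INSERT ... %r" % (obj,))
        self._pending = []

s = ToySession()
s.save({'user_name': 'rick'})
s.save({'user_name': 'jenn'})
print(len(s.executed))  # 0 -- nothing sent to the database yet
s.flush()
print(len(s.executed))  # 2 -- both inserts sent in one batch
```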
A Session class is created using the sessionmaker() function, and a Session object is created by instantiating the class returned from sessionmaker(). Although you can instantiate the Session object directly, the sessionmaker() function is a convenient way to fix the parameters that will be passed to the Session's constructor, rather than repeating them wherever a Session is instantiated.
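In this parameter-fixing role, sessionmaker() behaves much like the standard library's functools.partial. Here is a toy analogy (ToySession and its arguments are invented for illustration; they are not SQLAlchemy's Session signature):

```python
from functools import partial

class ToySession(object):
    def __init__(self, bind=None, autoflush=True):
        self.bind = bind
        self.autoflush = autoflush

# Analogous to Session = sessionmaker(bind=..., autoflush=False):
# the constructor arguments are fixed once, up front.
Session = partial(ToySession, bind='my-engine', autoflush=False)

s1 = Session()   # no need to repeat the parameters...
s2 = Session()   # ...at every instantiation site
print(s1.bind, s1.autoflush)  # my-engine False
```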
To insert objects into the database, we simply need to save them to the session:
Session = sessionmaker()
session = Session()

u = User()
u.user_name = 'rick'
u.password = 'foo'
u.email_address = 'rick@pyatl.org'

session.save(u)   # tell SQLAlchemy to track the object
session.flush()   # actually perform the insert
To retrieve objects from the database, we need to first obtain a query object from the session and then use its methods to specify which objects we retrieve:
q = session.query(User)

# retrieve by primary key
user = q.get(1)

# retrieve one object by property
user = q.get_by(user_name='rick')

# retrieve multiple objects
users = list(q.filter_by(first_name=None))

# retrieve multiple objects using the SQL expression language
users = list(q.filter(User.c.first_name==None))
Note that the filter_by() method takes keyword arguments whose names match the mapped properties. This is often a useful shortcut because you avoid having to type out "User.c." over and over; it is less flexible, however, than the filter() method, which can take arbitrary SQL expressions as its selection criteria. One powerful feature of SQLAlchemy is the ability of the filter_by() method to automatically search your joined tables for a matching column:
# Retrieve all users in a group named 'admin'
users = list(q.filter_by(group_name='admin'))
SQLAlchemy will automatically search tables that have foreign key relationships to the queried object for columns that satisfy the keyword arguments. This can be very powerful, but it can also sometimes find the wrong column, particularly if you are querying on a common column name such as name. In that case, you can manually specify the joins SQLAlchemy will perform in the query via the join() method:
q = session.query(User)
q = q.join('groups')   # use the mapped property name for joins
q = q.filter(Group.c.group_name=='admin')
users = list(q)
You can even specify a "join chain" by using a list of properties as the argument to join():
q = session.query(User)

# groups is a property of a User; permissions is a property of a Group
q = q.join(['groups', 'permissions'])
q = q.filter(Permission.c.permission_name=='admin')
users = list(q)
The power of SQLAlchemy to construct complex queries becomes clear when we compare the previous code to the SQL generated:
SELECT tf_user.first_name AS tf_user_first_name,
       tf_user.last_name AS tf_user_last_name,
       tf_user.created AS tf_user_created,
       tf_user.user_name AS tf_user_user_name,
       tf_user.password AS tf_user_password,
       tf_user.email_address AS tf_user_email_address,
       tf_user.id AS tf_user_id
FROM tf_user
JOIN user_group ON tf_user.id = user_group.user_id
JOIN tf_group ON tf_group.id = user_group.group_id
JOIN group_permission ON tf_group.id = group_permission.group_id
JOIN tf_permission ON tf_permission.id = group_permission.permission_id
WHERE tf_permission.permission_name = ?
ORDER BY tf_user.oid