SQLAlchemy is a Python library created by Mike Bayer to provide a high-level, Pythonic (idiomatically Python) interface to relational databases such as Oracle, DB2, MySQL, PostgreSQL, and SQLite. SQLAlchemy attempts to be unobtrusive to your Python code, allowing you to map plain old Python objects (POPOs) to database tables without substantially changing your existing Python code. SQLAlchemy includes a database server-independent SQL expression language and an object-relational mapper (ORM) that lets you use SQL to persist your application objects automatically. This chapter will introduce you to SQLAlchemy, illustrating some of its more powerful features. Later chapters will provide more depth for the topics covered here.
If you have used lower-level database interfaces with Python, such as the DB-API, you may be used to writing code such as the following to save your objects to the database:
sql = "INSERT INTO user(user_name, password) VALUES (%s, %s)"
cursor = conn.cursor()
cursor.execute(sql, ('rick', 'parrot'))
Although this code gets the job done, it is verbose, error-prone, and tedious to write. Using string manipulation to build up a query as done here can lead to various logical errors and vulnerabilities such as opening your application up to SQL injection attacks. Generating the string to be executed by your database server verbatim also ties your code to the particular DB-API driver you are currently using, making migration to a different database server difficult. For instance, if we wished to migrate the previous example to the Oracle DB-API driver, we would need to write:
sql = "INSERT INTO user(user_name, password) VALUES (:1, :2)"
cursor = conn.cursor()
cursor.execute(sql, 'rick', 'parrot')
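For comparison, the "qmark" paramstyle used by Python's built-in sqlite3 driver is yet a third variation. This stdlib-only sketch (the table definition is invented for illustration) shows the same insert written for that driver:

```python
import sqlite3

# sqlite3 uses the "qmark" paramstyle: ? placeholders, values in a tuple
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE user (user_name TEXT, password TEXT)")

sql = "INSERT INTO user(user_name, password) VALUES (?, ?)"
cursor = conn.cursor()
cursor.execute(sql, ('rick', 'parrot'))
```

Three drivers, three placeholder syntaxes, for one logical statement.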
In the SQLAlchemy SQL expression language, you could write the following instead:
statement = user_table.insert(user_name='rick', password='parrot')
statement.execute()
To migrate this code to Oracle, you would write, well, exactly the same thing.
SQLAlchemy also allows you to write SQL queries using a Pythonic expression-builder. For instance, to retrieve all the users created in 2007, you would write:
statement = user_table.select(and_(
    user_table.c.created >= date(2007,1,1),
    user_table.c.created < date(2008,1,1)))
result = statement.execute()
In order to use the SQL expression language, you need to provide SQLAlchemy with information about your database schema. For instance, if you are using the user table mentioned previously, your schema definition might be the following:
metadata = MetaData('sqlite://')  # use an in-memory SQLite database
user_table = Table(
    'tf_user', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('email_address', Unicode(255), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False),
    Column('first_name', Unicode(255), default=''),
    Column('last_name', Unicode(255), default=''),
    Column('created', DateTime, default=datetime.now))
If you would rather use an existing database schema definition, you still need to tell SQLAlchemy which tables you have, but SQLAlchemy can reflect the tables using the database server’s introspection capabilities. In this case, the schema definition reduces to the following:
users_table = Table('users', metadata, autoload=True)
Although the SQLAlchemy SQL expression language is quite powerful, it can still be tedious to manually specify the queries and updates necessary to work with your tables. To help with this problem, SQLAlchemy provides an ORM to automatically populate your Python objects from the database and to update the database based on changes to your Python objects. Using the ORM is as simple as writing your classes, defining your tables, and mapping your tables to your classes. In the case of the user table, you could perform a simple mapping via the following code:
class User(object):
    pass

mapper(User, user_table)
Notice that there is nothing particularly special about the User class defined here. It is used to create “plain old Python objects,” or POPOs. All the magic of SQLAlchemy is performed by the mapper.

Although the class definition just shown is empty, you may define your own methods and attributes on a mapped class. The mapper will create attributes corresponding to the column names in the mapped table, as well as some private attributes used by SQLAlchemy internally. Once your table is mapped, you can use a Session object to populate your objects based on data in the user table and flush any changes you make to mapped objects to the database:
>>> Session = sessionmaker()
>>> session = Session()
>>>
>>> # Insert a user into the database
... u = User()
>>> u.user_name='rick'
>>> u.email_address='rick@foo.com'
>>> u.password='parrot'
>>> session.save(u)
>>>
>>> # Flush all changes to the session out to the database
... session.flush()
>>>
>>> query = session.query(User)
>>> # List all users
... list(query)
[<__main__.User object at 0x2abb96dae3d0>]
>>>
>>> # Get a particular user by primary key
... query.get(1)
<__main__.User object at 0x2abb96dae3d0>
>>>
>>> # Get a particular user by some other column
... query.get_by(user_name='rick')
<__main__.User object at 0x2abb96dae3d0>
>>>
>>> u = query.get_by(user_name='rick')
>>> u.password = 'foo'
>>> session.flush()
>>> query.get(1).password
'foo'
As you can see, SQLAlchemy makes persisting your objects simple and concise. You can also customize and extend the set of properties created by SQLAlchemy, allowing your objects to model, for instance, a many-to-many relationship with simple Python lists.
Although a SQL database is a powerful and flexible modeling tool, it is not always a good match for the object-oriented programming style. SQL is good for some things, and object-oriented programming is good for others. This is sometimes referred to as the object/relational “impedance mismatch,” and it is a problem that SQLAlchemy tries to address in the ORM. To illustrate the object/relational impedance mismatch, let’s first look at how we might model a system in SQL, and then how we might model it in an object-oriented way.
SQL databases provide a powerful means for modeling data and allowing for arbitrary queries of that data. The model underlying SQL is the relational model. In the relational model, modeled items (entities) can have various attributes, and are related to other entities via relationships. These relationships can be one-to-one, one-to-many, many-to-many, or complex, multientity relationships. The SQL expression of the entity is the table, and relationships are expressed as foreign key constraints, possibly with the use of an auxiliary “join” table. For example, suppose we have a user permission system that has users who may belong to one or more groups. Groups may have one or more permissions. Our SQL to model such a system might be something like the following:
CREATE TABLE tf_user (
    id INTEGER NOT NULL,
    user_name VARCHAR(16) NOT NULL,
    email_address VARCHAR(255) NOT NULL,
    password VARCHAR(40) NOT NULL,
    first_name VARCHAR(255),
    last_name VARCHAR(255),
    created TIMESTAMP,
    PRIMARY KEY (id),
    UNIQUE (user_name),
    UNIQUE (email_address));

CREATE TABLE tf_group (
    id INTEGER NOT NULL,
    group_name VARCHAR(16) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE (group_name));

CREATE TABLE tf_permission (
    id INTEGER NOT NULL,
    permission_name VARCHAR(16) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE (permission_name));

-- Relate the user and group tables
CREATE TABLE user_group (
    user_id INTEGER,
    group_id INTEGER,
    PRIMARY KEY(user_id, group_id),
    FOREIGN KEY(user_id) REFERENCES tf_user (id),
    FOREIGN KEY(group_id) REFERENCES tf_group (id));

-- Relate the group and permission tables
CREATE TABLE group_permission (
    group_id INTEGER,
    permission_id INTEGER,
    PRIMARY KEY(group_id, permission_id),
    FOREIGN KEY(group_id) REFERENCES tf_group (id),
    FOREIGN KEY(permission_id) REFERENCES tf_permission (id));
Notice the two auxiliary tables used to provide many-to-many joins between users and groups, and between groups and permissions. Once we have this schema in place, a common scenario is to check whether a particular user has a particular permission. In SQL, we might write:
SELECT COUNT(*)
FROM tf_user, tf_permission, user_group, group_permission
WHERE tf_user.user_name='rick'
  AND tf_user.id=user_group.user_id
  AND user_group.group_id = group_permission.group_id
  AND group_permission.permission_id = tf_permission.id
  AND permission_name='admin';
In a single statement, we join the three entities—user, group, and permission—together to determine whether the user “rick” has the “admin” permission.
In the object-oriented world, we would probably model the system quite differently. We would still have users, groups, and permissions, but they would probably have an ownership relationship between them:
class User(object):
    groups=[]

class Group(object):
    users=[]
    permissions=[]

class Permission(object):
    groups=[]
Suppose we wanted to print out a summary of all of a given user’s groups and permissions, something an object-oriented style would do quite well. We might write something like the following:
print 'Summary for %s' % user.user_name
for g in user.groups:
    print '  Member of group %s' % g.group_name
    for p in g.permissions:
        print '  ... which has permission %s' % p.permission_name
On the other hand, if we wanted to determine whether a user has a particular permission, we would need to do something like the following:
def user_has_permission(user, permission_name):
    for g in user.groups:
        for p in g.permissions:
            if p.permission_name == permission_name:
                return True
    return False
In this case, we needed to write a nested loop, examining every group the user is a member of to see if that group had a particular permission. SQLAlchemy lets you use object-oriented programming where appropriate (such as checking for a user’s permission to do something) and relational programming where appropriate (such as printing a summary of groups and permissions). In SQLAlchemy, we could print the summary information exactly as shown, and we could detect membership in a group with a much simpler query. First, we need to create mappings between our tables and our objects, telling SQLAlchemy a little bit about the many-to-many joins:
mapper(User, user_table, properties=dict(
    groups=relation(Group, secondary=user_group, backref='users')))
mapper(Group, group_table, properties=dict(
    permissions=relation(Permission, secondary=group_permission,
                         backref='groups')))
mapper(Permission, permission_table)
Now, our model plus the magic of the SQLAlchemy ORM allows us to detect whether the given user is an administrator:
q = session.query(Permission)
rick_is_admin = q.count_by(permission_name='admin',
                           user_name='rick')
SQLAlchemy was able to look at our mappers, determine how to join the tables, and use the relational model to generate a single call to the database. The SQL generated by SQLAlchemy is actually quite similar to what we would have written ourselves:
SELECT count(tf_permission.id)
FROM tf_permission, tf_user, group_permission, tf_group, user_group
WHERE (tf_user.user_name = ?
       AND ((tf_permission.id = group_permission.permission_id
             AND tf_group.id = group_permission.group_id)
            AND (tf_group.id = user_group.group_id
                 AND tf_user.id = user_group.user_id)))
  AND (tf_permission.permission_name = ?)
SQLAlchemy’s real power comes from its ability to bridge the object/relational divide; it allows you to use whichever model is appropriate to your task at hand. Aggregation is another example of using SQLAlchemy’s relational model rather than the object-oriented model. Suppose we wanted a count of how many users had each permission type. In the traditional object-oriented world, we would probably loop over each permission, then over each group, and finally count the users in the group (without forgetting to remove duplicates!). This leads to something like this:
for p in permissions:
    users = set()
    for g in p.groups:
        for u in g.users:
            users.add(u)
    print 'Permission %s has %d users' % (p.permission_name, len(users))
In SQLAlchemy, we can drop into the SQL expression language to create the following query:
q = select([Permission.c.permission_name,
            func.count(user_group.c.user_id)],
           and_(Permission.c.id==group_permission.c.permission_id,
                Group.c.id==group_permission.c.group_id,
                Group.c.id==user_group.c.group_id),
           group_by=[Permission.c.permission_name],
           distinct=True)
for permission_name, num_users in q.execute():
    print 'Permission %s has %d users' % (permission_name, num_users)
Although the query is a little longer in this case, we are doing all of the work in the database, allowing us to reduce the data transferred and potentially increase performance substantially due to reduced round-trips to the database. The important thing to note is that SQLAlchemy makes “simple things simple, and complex things possible.”
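The round-trip savings are easy to demonstrate with nothing but the standard library's sqlite3 module (the table and data here are invented for illustration): one GROUP BY statement replaces the nested Python loops, and only the summary rows cross the wire.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE user_group (user_id INTEGER, group_id INTEGER);
    INSERT INTO user_group VALUES (1, 10), (2, 10), (2, 11), (3, 11);
""")

# The database does the grouping, counting, and de-duplication;
# Python only sees one row per group.
rows = conn.execute(
    "SELECT group_id, COUNT(DISTINCT user_id) "
    "FROM user_group GROUP BY group_id").fetchall()

for group_id, num_users in rows:
    print('Group %d has %d users' % (group_id, num_users))
```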
SQLAlchemy was created with the goal of letting your objects be objects, and your tables be tables. The SQLAlchemy home page puts it this way:
SQLAlchemy Philosophy
SQL databases behave less and less like object collections the more size and performance start to matter; object collections behave less and less like tables and rows the more abstraction starts to matter. SQLAlchemy aims to accommodate both of these principles.
—From http://www.sqlalchemy.org
Using the object mapper pattern (where plain Python objects are mapped to SQL tables via a mapper object, rather than requiring persistent objects to be derived from some Persistable class) achieves much of this separation of concerns. There has also been a concerted effort in SQLAlchemy development to expose the full power of SQL, should you wish to use it.
In SQLAlchemy, your objects are POPOs until you tell SQLAlchemy about them. This means that it is entirely possible to “bolt on” persistence to an existing object model by mapping the classes to tables. For instance, consider an application that uses users, groups, and permissions, as shown. You might prototype your application with the following class definitions:
class User(object):

    def __init__(self, user_name=None, password=None, groups=None):
        if groups is None:
            groups = []
        self.user_name = user_name
        self.password = password
        self._groups = groups

    def join_group(self, group):
        self._groups.append(group)

    def leave_group(self, group):
        self._groups.remove(group)

class Group(object):

    def __init__(self, group_name=None, users=None, permissions=None):
        if users is None:
            users = []
        if permissions is None:
            permissions = []
        self.group_name = group_name
        self._users = users
        self._permissions = permissions

    def add_user(self, user):
        self._users.append(user)

    def del_user(self, user):
        self._users.remove(user)

    def add_permission(self, permission):
        self._permissions.append(permission)

    def del_permission(self, permission):
        self._permissions.remove(permission)

class Permission(object):

    def __init__(self, permission_name=None, groups=None):
        if groups is None:
            groups = []
        self.permission_name = permission_name
        self._groups = groups

    def join_group(self, group):
        self._groups.append(group)

    def leave_group(self, group):
        self._groups.remove(group)
Once your application moves beyond the prototype stage, you might expect to have to write code to manually load objects from the database or perhaps some other kind of persistent object store. On the other hand, if you are using SQLAlchemy, you would just define your tables:
user_table = Table(
    'tf_user', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False))

group_table = Table(
    'tf_group', metadata,
    Column('id', Integer, primary_key=True),
    Column('group_name', Unicode(16), unique=True, nullable=False))

permission_table = Table(
    'tf_permission', metadata,
    Column('id', Integer, primary_key=True),
    Column('permission_name', Unicode(16), unique=True, nullable=False))

user_group = Table(
    'user_group', metadata,
    Column('user_id', None, ForeignKey('tf_user.id'), primary_key=True),
    Column('group_id', None, ForeignKey('tf_group.id'), primary_key=True))

group_permission = Table(
    'group_permission', metadata,
    Column('group_id', None, ForeignKey('tf_group.id'), primary_key=True),
    Column('permission_id', None, ForeignKey('tf_permission.id'), primary_key=True))
and your mappers:
mapper(User, user_table, properties=dict(
    _groups=relation(Group, secondary=user_group, backref='_users')))
mapper(Group, group_table, properties=dict(
    _permissions=relation(Permission, secondary=group_permission,
                          backref='_groups')))
mapper(Permission, permission_table)
and you’re done. No modification of your objects is required—they are still simply new-style (derived from the object class) Python classes, and they still have whatever methods you have defined, as well as a few attributes added by SQLAlchemy (described in the sidebar “Instrumentation on Mapped Classes”). Your old methods join_group, leave_group, etc. still work, even without modifying the class code. This means that you can modify mapped “collection” properties (properties modeling 1:N or M:N relationships) with regular list operations, and SQLAlchemy will track your changes and flush them to the database automatically.
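SQLAlchemy's actual instrumentation is considerably more sophisticated, but the core idea (a collection that records its own mutations so they can be flushed later) can be sketched in plain Python; every name here is invented for illustration and does not match SQLAlchemy's internals:

```python
class TrackedList(list):
    """A list that records appends and removes, sketching how an ORM
    could discover which rows to INSERT or DELETE at flush time."""

    def __init__(self, *args):
        list.__init__(self, *args)
        self.pending = []   # invented attribute; SQLAlchemy's internals differ

    def append(self, item):
        list.append(self, item)
        self.pending.append(('insert', item))

    def remove(self, item):
        list.remove(self, item)
        self.pending.append(('delete', item))

groups = TrackedList()
groups.append('admins')    # ordinary list operations...
groups.append('editors')
groups.remove('admins')    # ...but every change is remembered
```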
SQLAlchemy also allows you the full expressiveness of SQL, including compound (multicolumn) primary keys and foreign keys, indexes, access to stored procedures, the ability to “reflect” your tables from the database into your application, and even the ability to specify cascading updates and deletes on your foreign key relationships and value constraints on your data.
SQLAlchemy consists of several components, including the aforementioned database-independent SQL expression language and the object-relational mapper. In order to enable these components, SQLAlchemy also provides an Engine class, which manages connection pools and SQL dialects; a MetaData class, which manages your table information; and a flexible type system for mapping SQL types to Python types.
The beginning of any SQLAlchemy application is the Engine. The engine manages the SQLAlchemy connection pool and the database-independent SQL dialect layer. In our previous examples, the engine was created implicitly when the MetaData was created:
metadata = MetaData('sqlite://')
engine = metadata.bind
It is also possible to create an engine manually, using the SQLAlchemy function create_engine():
engine=create_engine('sqlite://')
This engine can later be bound to a MetaData object just by setting the bind attribute on the MetaData:
metadata.bind = engine
The engine can also be used in SQL statements such as table creation if the MetaData is unbound (not connected to a particular engine):
user_table.create(bind=engine)
The engine can be used to execute queries directly on the database via dynamic SQL:
for row in engine.execute("select user_name from tf_user"):
    print 'user name: %s' % row['user_name']
Most of the time, you will be using the higher-level facilities of the SQL expression language and ORM components of SQLAlchemy, but it’s nice to know that you can always easily drop down all the way to raw SQL if you need to.
Thus far, we have glossed over the use of database connections. In fact, all of our examples up to this point have used SQLAlchemy’s powerful connection pooling subsystem. In order to execute queries against a database, a connection is required, and the establishment of a new connection is typically an expensive operation, involving a network connection, authentication of the user, and any database session setup required. To amortize the costs, the typical solution is to maintain a pool of database connections that are used over and over again in the application.
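A toy version of such a pool takes only a few lines. This sketch uses invented names, is far simpler than SQLAlchemy's real pool, and offers no thread safety beyond what Queue itself provides, but it shows the reuse idea:

```python
import sqlite3
try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

class SimpleConnectionPool(object):
    """Keep opened connections around and hand them out again,
    instead of paying the connection setup cost on every query."""

    def __init__(self, create_connection, max_size=5):
        self._create = create_connection    # the expensive factory
        self._idle = queue.Queue(max_size)

    def get(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            return self._create()           # pool empty: connect anew

    def put(self, conn):
        try:
            self._idle.put_nowait(conn)     # back to the pool for reuse
        except queue.Full:
            conn.close()                    # pool full: really close it

pool = SimpleConnectionPool(lambda: sqlite3.connect(':memory:'))
conn = pool.get()
pool.put(conn)
```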
The Engine object in SQLAlchemy is responsible for managing a pool of low-level DB-API connections. In fact, the engine and the low-level connection objects obey a Connectable protocol, allowing you to execute dynamic SQL queries either directly against a connection, or against the engine (in which case the engine will automatically allocate a connection for the query).
In another instance of making simple things simple and complex things possible, SQLAlchemy does The Right Thing most of the time with connections, and allows you to override its strategy when required. SQLAlchemy’s default strategy is to acquire a connection for each SQL statement and, when that connection is no longer used (when its result set is closed or garbage-collected), to return it to the pool. If you would like to manually manage your connections, you can also do that via the connect() method on the engine object:
engine = create_engine('sqlite://')
connection = engine.connect()
result = connection.execute("select user_name from tf_user")
for row in result:
    print 'user name: %s' % row['user_name']
result.close()
SQLAlchemy has another strategy for connection pooling that has some performance benefits in many cases: the “thread-local” strategy. In the thread-local strategy, a connection that is currently in use by a thread will be reused for other statements within that thread. This can reduce database server load, which is especially important when you could have several applications accessing the database simultaneously. If you want to use the thread-local strategy, simply create the Engine object and set the strategy to threadlocal:
engine = create_engine('sqlite://', strategy='threadlocal')
Although SQL is a standardized language, many database vendors either do not fully implement it or simply create extensions to the standard. The dialect object attempts to manage the idiosyncrasies of each supported SQL dialect, as well as manage the low-level DB-API modules implementing the connection.
The dialect is mostly used as a transparent layer for your application programming. The main exception to this rule is when you want to access a data type that is supported only for particular database servers. For instance, MySQL has BigInteger and Enum types. To use these types, you must import them directly from the appropriate module in the sqlalchemy.databases package:
from sqlalchemy.databases.mysql import MSEnum, MSBigInteger

user_table = Table('tf_user', meta,
    Column('id', MSBigInteger),
    Column('honorific', MSEnum('Mr', 'Mrs', 'Ms', 'Miss', 'Dr', 'Prof')))
The MetaData object in SQLAlchemy is used to collect and organize information about your table layout (i.e., your database schema). We alluded to MetaData management before in describing how to create tables. A MetaData object must be created before any tables are defined, and each table must be associated with a MetaData object. MetaData objects can be created “bound” or “unbound,” based on whether they are associated with an engine. The following is an example of the different ways you can create MetaData objects:
# Create an unbound MetaData
unbound_meta = MetaData()

# Create an Engine and bind the MetaData to it
db1 = create_engine('sqlite://')
unbound_meta.bind = db1

# Create an engine and then a bound MetaData
db2 = create_engine('sqlite:///test1.db')
bound_meta1 = MetaData(db2)

# Create a bound MetaData with an implicitly created engine
bound_meta2 = MetaData('sqlite:///test2.db')
Although tables can be defined against unbound MetaData, it is often more convenient to eventually bind the metadata to an engine, as this allows the MetaData and the Table objects defined for it to access the database directly:
# Create a bound MetaData
meta = MetaData('sqlite://')

# Define a couple of tables
user_table = Table(
    'tf_user', meta,
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False))

group_table = Table(
    'tf_group', meta,
    Column('id', Integer, primary_key=True),
    Column('group_name', Unicode(16), unique=True, nullable=False))

# Create all the tables in the (empty) database
meta.create_all()

# Select all the groups from the tf_group table
result_set = group_table.select().execute()
As mentioned previously, you can also reflect your schema by setting the autoload parameter to True in your Table creation. Reflection, however, requires a database connection to function properly. (SQLAlchemy must query the database to determine the structure of the tables.) Binding the MetaData to an engine is a convenient way to provide this connection. Note, however, that you are never required to bind the MetaData object; any operation that you can perform with a bound MetaData or a table defined on it can also be performed by passing the engine or connection to the individual method. This might be useful if you wish to use the same MetaData object for multiple distinct database engines:
meta = MetaData()
engine1 = create_engine('sqlite:///test1.db')
engine2 = create_engine('sqlite:///test2.db')

# Use the engine parameter to load tables from the first engine
user_table = Table(
    'tf_user', meta, autoload=True, autoload_with=engine1)
group_table = Table(
    'tf_group', meta, autoload=True, autoload_with=engine1)
permission_table = Table(
    'tf_permission', meta, autoload=True, autoload_with=engine1)
user_group_table = Table(
    'user_group', meta, autoload=True, autoload_with=engine1)
group_permission_table = Table(
    'group_permission', meta, autoload=True, autoload_with=engine1)

# Create the tables in the second engine
meta.create_all(engine2)

# Select some data
result_set = engine1.execute(user_table.select())
In many cases, SQLAlchemy can map SQL types to Python types in a straightforward way. To do this, SQLAlchemy provides a set of TypeEngine-derived classes in the sqlalchemy.types module that convert SQL data to Python data. TypeEngine subclasses are used to define the MetaData for tables.
Sometimes, in keeping with the SQLAlchemy philosophy of letting your objects be objects, you may find that the provided TypeEngine classes do not express all of the data types you wish to store in your database. In this case, you can write a custom TypeEngine that converts data being saved to the database to a database-native type, and converts data being loaded from the database to a Python-native type. Suppose, for instance, that we wished to have a column that stored images from the Python Imaging Library (PIL). In this case, we might use the following TypeEngine definition:
class ImageType(sqlalchemy.types.Binary):

    def convert_bind_param(self, value, engine):
        sfp = StringIO()
        value.save(sfp, 'JPEG')
        return sfp.getvalue()

    def convert_result_value(self, value, engine):
        sfp = StringIO(value)
        image = PIL.Image.open(sfp)
        return image
Once we have defined ImageType, we can use that type in our table definitions, and the corresponding PIL image will be automatically created when we select from the database, or serialized when we insert or update the database.
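Stripped of the SQLAlchemy plumbing, the two hooks are just a pair of inverse serializers. As a PIL-free sketch, here is the same shape of conversion for a hypothetical column that stores a Python set of tags as a comma-joined string (both function names echo the hooks above but stand alone here):

```python
def convert_bind_param(value):
    # Python -> database: flatten the set to a stable string form
    return ','.join(sorted(value))

def convert_result_value(value):
    # database -> Python: rebuild the set from the stored string
    return set(value.split(',')) if value else set()

tags = set(['admin', 'staff'])
stored = convert_bind_param(tags)        # what would be written to the column
restored = convert_result_value(stored)  # what a SELECT would hand back
```

Whatever the payload (images, sets, pickled objects), the contract is the same: the two conversions must round-trip cleanly.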
SQLAlchemy’s SQL expression language provides an API to execute queries and updates against your tables, all from Python, and all in a database-independent way (managed by the SQLAlchemy-provided Dialect). For instance, the following expression:
select([user_table.c.user_name, user_table.c.password],
       whereclause=user_table.c.user_name=='rick')
would yield the following SQL code:
SELECT tf_user.user_name, tf_user.password
FROM tf_user
WHERE tf_user.user_name = ?
Notice how the SQL generated uses a question mark for the user name value. This is known as a “bind parameter.” When the query is run, SQLAlchemy will send the query string (with bind parameters) and the actual variables (in this case, the string "rick") to the database engine. Using the SQLAlchemy SQL-generation layer has several advantages over hand-generating SQL strings:
- Security
Application data (including user-generated data) is safely escaped via bind parameters, making SQL injection-style attacks extremely difficult.
- Performance
The likelihood of reusing a particular query string (from the database server’s perspective) is increased. For instance, if we wanted to select another user from the table, the SQL generated would be identical, and a different bind parameter would be sent. This allows the database server in some cases to reuse its execution plan from the first query for the second, increasing performance.
- Portability
Although SQL is a standardized language, different database servers implement different parts of the standard, and to different degrees of faithfulness. SQLAlchemy provides you a way to write database-independent SQL in Python without tying you to a particular database server. With a little bit of planning, the same SQLAlchemy-based application can run on SQLite, Oracle, DB2, PostgreSQL, or any other SQLAlchemy-supported database without code changes.
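The security point is easy to demonstrate with the stdlib sqlite3 driver (schema and data invented for illustration): splicing user input into the SQL string lets a crafted value rewrite the query, while a bind parameter treats the same value as inert data.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE tf_user (user_name TEXT, password TEXT)")
conn.execute("INSERT INTO tf_user VALUES ('rick', 'parrot')")

evil = "' OR '1'='1"

# String interpolation: the input becomes part of the SQL itself
unsafe_rows = conn.execute(
    "SELECT * FROM tf_user WHERE user_name = '%s'" % evil).fetchall()

# Bind parameter: the value is sent separately and never parsed as SQL
safe_rows = conn.execute(
    "SELECT * FROM tf_user WHERE user_name = ?", (evil,)).fetchall()

print(len(unsafe_rows))  # 1 -- the injected OR matched every row
print(len(safe_rows))    # 0 -- no user is literally named "' OR '1'='1"
```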
Most of the time, you will be using the SQL expression language by creating expressions involving the attributes of the table.c object. This is a special attribute that is added to Tables you have defined in the metadata, as well as to any objects you have mapped to tables or other selectables. The “.c” objects represent database columns, and they can be combined via a rich set of operators:
# Select all users with a username starting with 'r' who were
# created before June 1, 2007
q = user_table.select(
    user_table.c.user_name.like('r%')
    & (user_table.c.created < datetime(2007,6,1)))

# Alternate syntax to do the same thing
q = user_table.select(and_(
    user_table.c.user_name.like('r%'),
    user_table.c.created < datetime(2007,6,1)))
You can also use mapped classes in the same way:
q = session.query(User)
q = q.filter(User.c.user_name.like('r%')
             & (User.c.created > datetime(2007,6,1)))
Of course, you aren’t required to use the SQL expression language; you can always insert custom SQL instead:
q = user_table.select("""tf_user.user_name LIKE 'r%'""")
You can also use SQL functions in your queries by using the SQLAlchemy-supplied func object:
q = select([Permission.c.permission_name,
            func.count(user_group.c.user_id)],
           and_(Permission.c.id==group_permission.c.permission_id,
                Group.c.id==group_permission.c.group_id,
                Group.c.id==user_group.c.group_id),
           group_by=[Permission.c.permission_name],
           distinct=True)
Although you can do a lot with the Engine, MetaData, TypeEngine, and SQL expression language, the true power of SQLAlchemy is found in its ORM. SQLAlchemy’s ORM provides a convenient, unobtrusive way to add database persistence to your Python objects without requiring you to design your objects around the database, or the database around the objects. To accomplish this, SQLAlchemy uses the data mapper pattern. In this pattern, you can define your tables (or other selectables, such as joins) in one module, your classes in another, and the mappers between them in yet another module.
SQLAlchemy provides a great deal of flexibility in mapping tables, as well as a sensible set of default mappings. Suppose that we defined the following tables, classes, and mappers:
user_table = Table(
    'tf_user', metadata,
    Column('id', Integer, primary_key=True),
    Column('user_name', Unicode(16), unique=True, nullable=False),
    Column('email_address', Unicode(255), unique=True, nullable=False),
    Column('password', Unicode(40), nullable=False),
    Column('first_name', Unicode(255), default=''),
    Column('last_name', Unicode(255), default=''),
    Column('created', DateTime, default=datetime.now))

group_table = Table(
    'tf_group', metadata,
    Column('id', Integer, primary_key=True),
    Column('group_name', Unicode(16), unique=True, nullable=False))

user_group = Table(
    'user_group', metadata,
    Column('user_id', None, ForeignKey('tf_user.id'), primary_key=True),
    Column('group_id', None, ForeignKey('tf_group.id'), primary_key=True))

class User(object):
    pass

class Group(object):
    pass

mapper(User, user_table)
mapper(Group, group_table)
Here, the mapper would create properties on the User class for the columns of the table: id, user_name, email_address, password, first_name, last_name, and created. On the Group class, the id and group_name properties would be defined. The mapper, however, has a great deal more flexibility. If we wished to store only a hash of the user’s password in the database, rather than the actual plaintext password, we might modify the User class and mapper to the following:
import sha

class User(object):
    def _get_password(self):
        return self._password
    def _set_password(self, value):
        self._password = sha.new(value).hexdigest()
    password = property(_get_password, _set_password)
    def password_matches(self, password):
        return sha.new(password).hexdigest() == self._password

mapper(User, user_table, properties=dict(
    _password=user_table.c.password))
By providing an application-level override for the password property, we can ensure that only hashed passwords are ever stored in the database. By telling the mapper to map user_table.c.password to the protected property _password, we prevent SQLAlchemy from providing the default mapping for the password column.
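To see the property behavior in isolation, here is a stand-alone sketch of the same hashing idiom with the mapper left out, using hashlib.sha1 in place of the long-deprecated sha module (the class body is otherwise the same shape as above):

```python
import hashlib

class User(object):
    def _get_password(self):
        return self._password
    def _set_password(self, value):
        # only the hash ever reaches the stored attribute
        self._password = hashlib.sha1(value.encode()).hexdigest()
    password = property(_get_password, _set_password)
    def password_matches(self, password):
        return hashlib.sha1(password.encode()).hexdigest() == self._password

u = User()
u.password = 'parrot'                # setter stores the digest, not 'parrot'
print(len(u.password))               # 40 (hex digest length)
print(u.password_matches('parrot'))  # True
print(u.password_matches('wrong'))   # False
```

Reading `u.password` back yields the 40-character hex digest, never the plaintext, which is exactly why the mapper must be pointed at the protected `_password` attribute.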
Perhaps the most powerful feature of the ORM is the ability to use
regular Python data structures to model relationships between tables. In
the preceding user/group example, we can modify the user mapper a bit
more to provide the User
class with a
groups
property, and the Group
class with a users
property:
mapper(User, user_table, properties=dict(
    _password=user_table.c.password,
    groups=relation(Group, secondary=user_group, backref='users')))
Now we can access all the groups that a user is a member of by
simply accessing the groups
property. We can also add
a user to a group by either appending the user to the group’s
users
property, or appending the
group to the user’s groups
property:
# user1's "groups" property will automatically be updated
group1.users.append(user1)

# group2's "users" property will automatically be updated
user2.groups.append(group2)
The ORM uses a Session object to keep track of objects loaded from the database and the changes made to them. Sessions are used to persist objects created by the application, and they provide a query interface to retrieve objects from the database. Rather than executing the database code to synchronize your objects with your tables every time an object is modified, the Session simply tracks all changes until its flush() method is called, at which point all the changes are sent to the database in a single unit of work.
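The unit-of-work idea can be sketched in a few lines of framework-free Python. ToySession and its pretend SQL strings are illustrative only, not SQLAlchemy's implementation; the point is just that save() records intent and flush() emits everything in one batch:

```python
class ToySession(object):
    def __init__(self):
        self._pending = []          # objects saved but not yet flushed
        self.executed = []          # stand-in for SQL sent to the database

    def save(self, obj):
        self._pending.append(obj)   # just track the object; no SQL yet

    def flush(self):
        # one "unit of work": emit all pending changes together
        for obj in self._pending:
            self.executed.append("INSERT ... %r" % (obj,))
        self._pending = []

s = ToySession()
s.save({'user_name': 'rick'})
s.save({'user_name': 'jenn'})
print(len(s.executed))  # 0 -- nothing sent to the database yet
s.flush()
print(len(s.executed))  # 2 -- both inserts sent in one batch
```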
A Session class is created using the sessionmaker() function, and a Session object is created by instantiating the class returned from sessionmaker(). Although you can instantiate the Session object directly, the sessionmaker() function is a convenient way to fix the parameters that will be passed to the Session's constructor, rather than repeating them wherever a Session is instantiated.
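In this parameter-fixing role, sessionmaker() behaves much like the standard library's functools.partial. Here is a toy analogy (ToySession and its arguments are invented for illustration; they are not SQLAlchemy's Session signature):

```python
from functools import partial

class ToySession(object):
    def __init__(self, bind=None, autoflush=True):
        self.bind = bind
        self.autoflush = autoflush

# Analogous to Session = sessionmaker(bind=..., autoflush=False):
# the constructor arguments are fixed once, up front.
Session = partial(ToySession, bind='my-engine', autoflush=False)

s1 = Session()   # no need to repeat the parameters...
s2 = Session()   # ...at every instantiation site
print(s1.bind, s1.autoflush)  # my-engine False
```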
To insert objects into the database, we simply need to save them to the session:
Session = sessionmaker()
session = Session()

u = User()
u.user_name = 'rick'
u.password = 'foo'
u.email_address = 'rick@pyatl.org'

session.save(u)   # tell SQLAlchemy to track the object
session.flush()   # actually perform the insert
To retrieve objects from the database, we need to first obtain a query object from the session and then use its methods to specify which objects we retrieve:
q = session.query(User)

# retrieve by primary key
user = q.get(1)

# retrieve one object by property
user = q.get_by(user_name='rick')

# retrieve multiple objects
users = list(q.filter_by(first_name=None))

# retrieve multiple objects using the SQL expression language
users = list(q.filter(User.c.first_name==None))
Note that the filter_by() method takes keyword arguments whose names match the mapped properties. This is often a useful shortcut because you avoid having to type out "User.c." over and over; it is less flexible, however, than the filter() method, which can take arbitrary SQL expressions as its selection criteria. One powerful feature of SQLAlchemy is the ability of the filter_by() method to automatically search your joined tables for a matching column:
# Retrieve all users in a group named 'admin'
users = list(q.filter_by(group_name='admin'))
SQLAlchemy will automatically search tables that have foreign key relationships to the queried object for columns that satisfy the keyword arguments. This can be very powerful, but it can also sometimes find the wrong column, particularly if you are querying on a common column name such as name. In that case, you can manually specify the joins SQLAlchemy will perform in the query via the join() method:
q = session.query(User)
q = q.join('groups')   # use the mapped property name for joins
q = q.filter(Group.c.group_name=='admin')
users = list(q)
You can even specify a "join chain" by using a list of properties as the argument to join():
q = session.query(User)

# groups is a property of a User; permissions is a property of a Group
q = q.join(['groups', 'permissions'])
q = q.filter(Permission.c.permission_name=='admin')
users = list(q)
The power of SQLAlchemy to construct complex queries becomes clear when we compare the previous code to the SQL generated:
SELECT tf_user.first_name AS tf_user_first_name,
       tf_user.last_name AS tf_user_last_name,
       tf_user.created AS tf_user_created,
       tf_user.user_name AS tf_user_user_name,
       tf_user.password AS tf_user_password,
       tf_user.email_address AS tf_user_email_address,
       tf_user.id AS tf_user_id
FROM tf_user
JOIN user_group ON tf_user.id = user_group.user_id
JOIN tf_group ON tf_group.id = user_group.group_id
JOIN group_permission ON tf_group.id = group_permission.group_id
JOIN tf_permission ON tf_permission.id = group_permission.permission_id
WHERE tf_permission.permission_name = ?
ORDER BY tf_user.oid