Preface

Now and again you have a go at something outside your area of expertise. It could be car maintenance and gardening. Maybe you can do a bit of both, but you are not a professional in either one of them. This means that a trivial but insurmountable problem can hold you up; there’s a bolt that you can’t get to with your spanner or a tree root that you just can’t shift. When you give up and hire an expert you are impressed. The mechanic’s got 20 spanners and he knows which one will do the job; he also knows how to use his tools. The professional gardener isn’t scratching his head because he’s stuck; he knows half a dozen different ways to get that tree root out. He’s just trying to decide which one will require him to expend the least effort.

If you are that mechanic who’s already got 20 spanners for every job, the hacks in this book will add a few more to your collection. We know that getting the bolt off is just the start of the job; it’s once that bolt comes off that your work actually begins. We’re hoping that some of the tricks in this book will give you more tools for your toolkit, and the confidence to apply databases and SQL to new, interesting, and challenging problems.

If you are not an expert yet, and just an amateur getting started, proceed with caution. Some of the techniques in here can lead to trouble if you don’t know what you are doing. This book is not overly concerned with good database design and sticking to the rules; we’re hoping you know all those things already! This book is about getting the job done. However, even an amateur will get a lot out of this book, as it is packed with a range of hacks ideal for showing off how SQL can be used to solve both easy and challenging problems.

You might have learned SQL from practical problems that you had to solve. As a result, when you approach a new problem you tend to see it in terms of these old problems and their solutions. This can lead to all of your solutions looking very similar, with a particular style and approach, whereas a slightly different approach may have produced a more efficient or more easily understood solution. This book is all about suggesting new approaches to problems, and highlighting styles of problem solving which you may not be totally familiar with. The result should be a better understanding of the breadth of SQL, and how different approaches to a problem can result in simple and elegant solutions.

Why SQL Hacks?

The term hacking has a bad reputation in the press. They use it to refer to people who break into systems or wreak havoc with computers as their weapon. Among people who write code, though, the term hack refers to a “quick-and-dirty” solution to a problem, or a clever way to get something done. And the term hacker is taken very much as a compliment, referring to someone as being creative, having the technical chops to get things done. The Hacks series is an attempt to reclaim the word, document the good ways people are hacking, and pass the hacker ethic of creative participation on to the uninitiated. Seeing how others approach systems and problems is often the quickest way to learn about a new technology.

This book is a collection of 100 different hacks. Each hack involves a specific problem that you may have already seen before, but perhaps tackled in a way you wouldn’t have considered. The hacks range from solving simple, everyday problems, all the way to tackling complex data processing scenarios. Each hack may concentrate on a particular scenario, but you should be able to adapt them to a wide range of problems specific to your own challenges. Some of these hacks will leave you thinking, “I guess that’s one way to do it; thanks, but no thanks.” However, we hope that most will make you say, “Wow...I didn’t know SQL could do that.”

You should also be questioning the balance between SQL and your programming language. With a bit more understanding of SQL you can do more processing at the database, and as a result have less traffic between the database and your application. Nine times out of ten this approach is going to be faster and better. It’s all about letting your program do the things it’s good at, and letting the database do the things it’s good at.

How to Use This Book

You can read this book from cover to cover if you like, but each hack stands on its own, so feel free to browse and jump to the different sections that interest you most. If there’s a prerequisite you need to know about, a cross-reference will guide you to the right hack.

Almost all of the hacks work with any SQL-based system, and where there is an exception we’ve tried to show you as clearly as possible. There are still a number of minor but irritating differences between the major platforms; we’ve tried to explain that clearly but without obscuring the point and without repeating the entire solution for each platform. You may find the following information useful when trying to make your queries run:

String concatenation

The SQL standard dictates that 'foo' || 'bar' is the right way to concatenate strings. Both Oracle and PostgreSQL implement the standard. SQL Server reacts to this with a syntax error and MySQL treats || as the logical OR operator. SQL Server uses + to concatenate strings and MySQL uses the CONCAT function. We have chosen to use CONCAT on the grounds that there is less chance of confusion.

Dates

We have used the ISO standard date format for date literals, as in DATE '2007-05-20'. Users of SQL Server and DB2 can simply ignore the word DATE. We discuss some of the problems with date compatibility in “Convert Strings to Dates” [Hack #19].

Column aliases

The PostgreSQL system insists that where column aliases are used, they are preceded by the AS keyword, and we have found this to be a useful convention in all database systems. The SQL standard permits this style but does not mandate its use. Oracle, SQL Server, and MySQL adopt this, as well as the more relaxed policy of the SQL standard. The following should work everywhere:

SELECT 2*foo AS twice, 3*foo AS thrice FROM bar
Derived table aliases

SQL Server requires that derived tables be given an alias. Where the alias is redundant you will notice a trailing t in the examples. This ensures that the example works with SQL Server. So, for instance:

SELECT SUM(foo)
  FROM (SELECT foo FROM bar UNION SELECT -foo FROM bar) t
JOIN syntax

We have used the “ ANSI JOIN” syntax throughout. We have used the phrase LEFT OUTER JOIN in preference to LEFT JOIN. But when using an inner join we have not specified this explicitly with INNER JOIN, and instead simply use the keyword JOIN.

Capitalization

SQL statements are case insensitive. We shout out the SQL keywords as SELECT and FROM, and use camel case for the table names and column names that we have used:

SELECT camelCase FROM fooBar
Column names

We have in general used the contraction whn for columns with a DATE type. In many examples, the column name when, date, or day would be easier on the eye, but these are keywords and may not be used without decoration in some systems.

Semicolons

Where an SQL statement is given alone we do not show a statement separator. More commonly our examples are pasted from the MySQL command line (so you can see the output from running the queries), and in that case, the semicolon is included. The command-line clients used with other database engines use different statement separators (for example, Microsoft SQL Server’s sqlcmd uses GO on a line by itself). For more information, see “Run SQL from the Command Line” [Hack #1].

CAST

We have not used CAST as liberally as we might, as it has a tendency to make the queries much harder to read. Where an explicit CAST can be avoided we have done so. Users of Oracle will sometimes have to insert an appropriate CAST.

SQL Conventions

You will find a few references to the SQL standards. We have tried to be pragmatic and deal with each database system as they have been implemented, rather than as the language was defined.

Where it is impossible to phrase a statement that is acceptable to all of the systems—MySQL, SQL Server, Oracle, and PostgreSQL—we have used a form that is acceptable to at least two of the four. As MySQL is a relative newcomer, its designers have been able to build in compatibility with many of its competitors. For that reason, MySQL is usually one of the two systems that will accept the statement unchanged. That explains why most of the examples use the MySQL command-line utility.

The MySQL examples are based around the version 5.0 release. However, many examples will work with the 4.2 release. Note that some hacks involve features such as referential integrity and transaction isolation, and these are implemented only for InnoDB tables and not for MyISAM tables. You can set up your MySQL server so that InnoDB is the default, or you can include the phrase engine=InnoDB following the CREATE TABLE statement:

CREATE TABLE test(i INTEGER) engine=InnoDB;

We’ve used the term SQL Server to mean Microsoft SQL Server 2005. The SQL Server 2000 version is good enough for all but those examples that use the RANK() function.

PostgreSQL and Oracle users should have no problem using this book. Most of the hacks will run unchanged on both systems. Oracle has so many additional features that we have not had space to make a note every time we show a hack that can be improved using some Oracle-specific mechanism.

There is plenty here for the Access user, but we have not explicitly mentioned where a variation is required for Access users. In particular, Access users should use the function Date() for CURRENT_DATE, and note that 'foo' & 'bar' is the Access way to concatenate a pair of strings.

How This Book Is Organized

The book is divided into several chapters, organized by subject:

Chapter 1, SQL Fundamentals

This is a gentle introduction to running SQL from the command line and programs. It also touches on simple SQL constructs. Even if you are already comfortable with SQL, you may find the flexibility of the SQL shown to be surprising and instructive.

Chapter 2, Joins, Unions, and Views

The hacks in this chapter concentrate on ways to use more than one table in your SQL. Different strategies are examined and discussed. If you find yourself using subqueries more than JOIN, you may also find the methods for converting subqueries to JOINs helpful.

Chapter 3, Text Handling

SQL allows you to query text information in a variety of ways, and this chapter contains a number of hacks focused on efficient and effective text querying. In particular, there are hacks for solving anagrams and extracting substrings, as well as general wildcard searches.

Chapter 4, Date Handling

Dates in SQL can be tricky, especially when dealing with ranges and repeating patterns. For instance, suppose you want to calculate the second Tuesday of each month, or look for trends based on the day of the week. Both calculations are discussed, as well as other hacks involving date processing and report generation techniques.

Chapter 5, Number Crunching

Numbers are everywhere in a database. You might be tempted to extract raw numbers from the database and do any required processing in an external program. To do so would be to miss out on an opportunity to increase the overall performance of your applications. SQL has excellent computational capabilities, and using them can make your systems easier to understand and debug. This chapter contains a host of hacks for handling numbers, from report generation to complex spatial calculations.

Chapter 6, Online Applications

Networking and the Internet have made database systems applicable to a whole range of new applications and opportunities. Databases can help drive web sites, be directly controlled from a browser, and help close the gap between client and data. This chapter looks at a variety of hacks for using database systems in web-based activities.

Chapter 7, Organizing Data

Hacks in this chapter are concerned with how data can be represented in a database, how data can be managed as it is stored and retrieved, and how errors in information can be detected, managed, and repaired.

Chapter 8, Storing Small Amounts of Data

It is useful to parameterize queries using variables, treating the queries as functions and plugging in variables as needed. You can use standard SQL to provide parameterization, including variable scoping on a per-user or per-application basis. This chapter also looks at queries without tables, and support for inline tables.

Chapter 9, Locking and Performance

Getting the best performance out of your database system can be a full-time job. This chapter examines some common issues, and presents a number of hacks on isolation levels, locking, query partitioning, and result set management, all aimed at improving query performance and minimizing delays.

Chapter 10, Reporting

Report writing is a key part of many business systems. SQL queries for report generation require a different approach than those used for real-time querying. This can allow you to trade query performance for readability and maintainability. If a report query is executed only once per year, runtime performance is less important than ease of maintenance. This chapter includes a number of hacks for summarizing, processing, and analyzing report data.

Chapter 11, Users and Administration

This chapter presents a few useful hacks on user management, both in your applications and in your database systems. It also looks at the common pitfalls in packaging a database-enabled application for easy installation.

Chapter 12, Wider Access

Database systems are designed to share data. Allowing wider access to data brings with it additional problems. The hacks in this chapter look at how to manage a diverse range of users who have SQL-level access to your databases.

Conventions

The following is a list of the typographical conventions used in this book:

Italics

Used for emphasis and new terms where they are defined, and to indicate utilities, URLs, filenames, filename extensions, and directory/folder names. For example, a path in the filesystem will appear as /usr/local or C:\Users.

Constant width

Used to show code examples, the contents of files, console output, as well as the names of variables, commands, and other code excerpts.

Constant width bold

Used to highlight portions of code, either for emphasis or to indicate text that should be typed by the user.

Constant width italic

Used in code examples and tables to show sample text to be replaced with your own values.

Gray text

Used to indicate a cross-reference within the text.

You should pay special attention to notes set apart from the text with the following icons:

Tip

This is a tip, suggestion, or general note. It contains useful supplementary information about the topic at hand.

Warning

This is a warning or note of caution, often indicating that your money or your privacy might be at risk.

The thermometer icons, found next to each hack, indicate the relative complexity of the hack:

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “SQL Hacks by Andrew Cumming and Gordon Russell. Copyright 2007 O’Reilly Media, Inc., 978-0-596-52799-0.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

How to Contact Us

We have tested and verified the information in this book to the best of our ability, but you may find that features have changed (or even that we have made mistakes!). As a reader of this book, you can help us to improve future editions by sending us your feedback. Please let us know about any errors, inaccuracies, bugs, misleading or confusing statements, and typos that you find anywhere in this book.

Please also let us know what we can do to make this book more useful to you. We take your comments seriously and will try to incorporate reasonable suggestions into future editions. You can write to us at:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the U.S. or Canada)
707-829-0515 (international/local)
707-829-0104 (fax)

To ask technical questions or to comment on the book, send email to:

bookquestions@oreilly.com

The web site for SQL Hacks lists examples, errata, and plans for future editions. You can find this page at:

http://www.oreilly.com/catalog/sqlhks

For more information about this book and others, see the O’Reilly web site:

http://www.oreilly.com

Got a Hack?

To explore Hacks books online or to contribute a hack for future titles, visit:

http://hacks.oreilly.com

Safari Enabled

When you see a Safari® Enabled icon on the cover of your favorite technology book, that means the book is available online through the O’Reilly Network Safari Bookshelf.

Safari offers a solution that’s better than e-books. It’s a virtual library that lets you easily search thousands of top tech books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at http://safari.oreilly.com.

Get SQL Hacks now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.