Preface

Ma, sendo l’intento mio scrivere cosa utile a chi la intende, mi è parso più conveniente andare drieto alla verità effettuale della cosa, che alla immaginazione di essa.

But, it being my intention to write a thing which shall be useful to him who apprehends it, it appears to me more appropriate to follow up the real truth of a matter than the imagination of it.

—Niccolò Machiavelli

Il Principe, XV

THERE IS A STORY BEHIND THIS BOOK. I HAD HARDLY FINISHED THE ART OF SQL, WHICH WASN’T ON sale yet, when my then editor, Jonathan Gennick, raised the idea of writing a book about SQL refactoring. SQL, I knew. But I had never heard about refactoring. I Googled the word. In a famous play by Molière, a wealthy but little-educated man who takes lessons in his mature years marvels when he discovers that he has been speaking “prose” for all his life. Like Monsieur Jourdain, I discovered that I had been refactoring SQL code for years without even knowing it—performance analysis for my customers led quite naturally to improving code through small, incremental changes that didn’t alter program behavior. It is one thing to try to design a database as best as you can, and to lay out an architecture and programs that access this database efficiently. It is another matter to try to get the best performance from systems that were not necessarily well designed from the start, or which have grown out of control over the years but that you have to live with. And there was something appealing in the idea of presenting SQL from a point of view that is so often mine in my professional life.

The last thing you want to do when you are done with a book is to start writing another one. But the idea had caught my fancy. I discussed it with a number of friends, one of whom is one of the most redoubtable SQL specialists I know. This friend burst into righteous indignation against buzzwords. For once, I begged to differ with him. It is true that the idea first popularized by Martin Fowler[1] of improving code by small, almost insignificant, localized changes may look like a fad—the stuff that fills reports by corporate consultants who have just graduated from university. But for me, the true significance of refactoring lies in the fact that code that has made it to production is no longer considered sacred, and in the recognition that a lot of mediocre systems could, with a little effort, do much better. Refactoring is also the acknowledgment that the fault for unsatisfactory performance is in ourselves, not in our stars—and this is quite a revelation in the corporate world.

I have seen too many sites where IT managers had an almost tragic attitude toward performance, people who felt crushed by fate and were putting their last hope into “tuning.” If the efforts of database and system administrators failed, the only remaining option in their view was to sign and send the purchase order for more powerful machines. I have read too many audit reports by self-styled database experts who, after reformatting the output of system utilities, concluded that a few parameters should be bumped up and that more memory should be added. To be fair, some of these reports mentioned that a couple of terrible queries “should be tuned,” without being much more explicit than pasting execution plans as appendixes.

I haven’t touched database parameters for years (the technical teams of my customers are usually competent). But I have improved many programs, fearlessly digging into them, and I have tried as much as I could to work with developers, rather than stay in my ivory tower and prescribe from far above. I have mostly met people who were eager to learn and understand, who needed little encouragement when put on the right tracks, who enjoyed developing their SQL skills, and who soon began to set performance targets for themselves.

When the passing of time wiped from my memory the pains of book writing, I took the plunge and began to write again, with the intent to expand the ideas I usually try to transmit when I work with developers. Database accesses are probably one of the areas where there is the most to gain by improving the code. My purpose in writing this book has been to give not recipes, but a framework to try to improve the less-than-ideal SQL applications that surround us without rewriting them from scratch (in spite of a very strong temptation sometimes).

Why Refactor?

Most applications bump, sooner or later, into performance issues. In the best of cases, the success of some old and venerable application has led it to handle, over time, volumes of data for which it had never been designed, and the old programs need to be given a new lease on life until a replacement application is rolled out in production. In the worst of cases, performance tests conducted before switching to production may reveal a dismal failure to meet service-level requirements. Somewhere in between, data volume increases, new functionalities, software upgrades, or configuration changes sometimes reveal flaws that had so far remained hidden, and backtracking isn’t always an option. All of those cases share extremely tight deadlines to improve performance, and high pressure levels.

The first rescue expedition is usually mounted by system engineers and database administrators who are asked to perform the magical parameter dance. Unless some very big mistake has been overlooked (it happens), database and system tuning often improves performance only marginally.

At this point, the traditional next step has long been to throw more hardware at the application. This is a very costly option, because the price of hardware will probably be compounded by the higher cost of software licenses. It will interrupt business operations. It requires planning. Worryingly, there is no real guarantee of return on investment. More than one massive hardware upgrade has failed to live up to expectations. It may seem counterintuitive, but there are horror stories of massive hardware upgrades that actually led to performance degradation. There are cases when adding more processors to a machine simply increased contention among competing processes.

The concept of refactoring introduces a much-needed intermediate stage between tuning and massive hardware injection. Martin Fowler’s seminal book on the topic focuses on object technologies. But the context of databases is significantly different from the context of application programs written in an object or procedural language, and the differences bring some particular twists to refactoring efforts. For instance:

Small changes are not always what they appear to be

Due to the declarative nature of SQL, a small change to the code often brings a massive upheaval in what the SQL engine executes, which leads to massive performance changes—for better or for worse.

Testing the validity of a change may be difficult

If it is reasonably easy to check that a value returned by a function is the same in all cases before and after a code change, it is a different matter to check that the contents of a large table are still the same after a major update statement rewrite.

The context is often critical

Database applications may work satisfactorily for years before problems emerge; it’s often when volumes or loads cross some thresholds, or when a software upgrade changes the behavior of the optimizer, that performance suddenly becomes unacceptable. Performance improvement work on database applications usually takes place in a crisis.

Database applications are therefore a difficult ground for refactoring, but at the same time the endeavor can also be, and often is, highly rewarding.

Refactoring Database Accesses

Database specialists have long known that the most effective way to improve performance is, once indexing has been checked, to review and tweak the database access patterns. In spite of the ostensibly declarative nature of SQL, this language is infamous for the sometimes amazing difference in execution time between alternative writings of functionally identical statements.

There is, however, more to database access refactoring than the unitary rewriting of problem queries, which is where most people stop. For instance, the slow but continuous enrichment of the SQL language over the years sometimes enables developers to write efficient statements that replace in a single stroke what could formerly be performed only by a complex procedure with multiple statements. New mechanisms built into the database engine may allow you to do things differently and more efficiently than in the past. Reviewing old programs in the light of new features can often lead to substantial performance improvements.

It would really be a brave new world if the only reason behind refactoring was the desire to rejuvenate old applications by taking advantage of new features. A sound approach to database applications can also work wonders on what I’ll tactfully call less-than-optimal code.

Changing part of the logic of an application may seem contradictory to the stated goal of keeping changes small. In fact, your understanding of what small and incremental mean depends a lot on your mileage; when you go to an unknown place for the very first time, the road always seems much longer than when you return to this place, now familiar, for the umpteenth time.

What Can We Expect from Refactoring?

It is important to understand that two factors broadly control the possible benefits of refactoring (this being the real world, they are conflicting factors):

  • First, the benefits of refactoring are directly linked to the original application: if the quality of the code is poor, there are great odds that spectacular improvement is within reach. If the code were optimal, there would be—barring the introduction of new features—no opportunity for refactoring, and that would be the end of the story. It’s exactly like with companies: only the badly managed ones can be spectacularly turned around.

  • Second, when the database design is really bad, refactoring cannot do much. Making things slightly less bad has never led to satisfactory results. Refactoring is an evolutionary process. In the particular case of databases, if there is no trace of initial intelligent design, even an intelligent evolution will not manage to make the application fit for survival. It will collapse and become extinct.

It is unlikely that the great Latin poet, Horace, had refactoring in mind when he wrote about aurea mediocritas, the golden mediocrity, but it truly is mediocre applications for which we can have the best hopes. They are in ample supply, because much too often “the first way that everyone agrees will functionally work becomes the design,” as wrote a reviewer for this book, Roy Owens.

How This Book Is Organized

This book tries to take a realistic and honest view of the improvement of applications with a strong SQL component, and to define a rational framework for tactical maneuvers. The exercise of refactoring is often performed as a frantic quest for quick wins and spectacular improvements that will prevent budget cuts and keep heads firmly attached to shoulders. It’s precisely in times of general panic that keeping a cool head and taking a methodical approach matter most. Let’s state upfront that miracles, by definition, are the preserve of a few very gifted individuals, and they usually apply to worthier causes than your application (whatever you may think of it). But the reasoned and systematic application of sound principles may nevertheless have impressive results. This book tries to help you define different tactics, as well as assess the feasibility of different solutions and the risks attached to different interpretations of the word incremental.

Very often, refactoring an SQL application follows the reverse order of development: you start with easy things and slowly walk back, cutting deeper and deeper, until you reach the point where it hurts or you have attained a self-imposed limit. I have tried to follow the same order in this book, which is organized as follows:

Chapter 1, Assessment

Can be considered as the prologue and is concerned with assessing the situation. Refactoring is usually associated with times when resources are scarce and need to be allocated carefully. There is no margin for error or for improving the wrong target. This chapter will guide you in trying to assess first whether there is any hope in refactoring, and second what kind of hope you can reasonably have.

The next two chapters deal with the dream of every manager: quick wins. I discuss in these chapters the changes that take place primarily on the database side, as opposed to the application program. Sometimes you can even apply some of those changes to “canned applications” for which you don’t have access to the code.

Chapter 2, Sanity Checks

Deals with points that must be controlled by priority—in particular, indexing review.

Chapter 3, User Functions and Views

Explains how user-written functions and an exuberant use of views can sometimes bring an application to its knees, and how you can try to minimize their impact on performance.

In the next three chapters, I deal with changes that you can make to the application proper.

Chapter 4, Testing Framework

Shows how to set up a proper testing framework. When modifying code it is critical to ensure that we still get the same results, as any modification—however small—can introduce bugs; there is no such thing as a totally risk-free change. I’ll discuss tactics for comparing before and after versions of a program.

Chapter 5, Statement Refactoring

Discusses in depth the proper approach to writing different SQL statements. Optimizers rewrite suboptimal statements. That is, this is what they are supposed to do. But the cleverest optimizer can only try to make the best out of an existing situation. I’ll show you how to analyze and rewrite SQL statements so as to turn the optimizer into your friend, not your foe.

Chapter 6, Task Refactoring

Goes further in Chapter 5’s discussion, explaining how changing the operational mode—and in particular, getting rid of row-based processing—can take us to the next level. Most often, rewriting individual statements results in only a small fraction of potential improvements. Bolder moves, such as coalescing several statements or replacing iterative, procedural statements with sweeping SQL statements, often lead to awe-inspiring gains. These gains demand good SQL skills, and an SQL mindset that is very different from both the traditional procedural mindset and the object-oriented mindset. I’ll go through a number of examples.

If you are still unsatisfied with performance at this stage, your last hope is in the next chapter.

Chapter 7, Refactoring Flows and Databases

Returns to the database and discusses changes that are more fundamental. First I’ll discuss how you can improve performance by altering flows and introducing parallelism, and I’ll show the new issues—such as data consistency, contention, and locking—that you have to take into account when parallelizing processes. Then I’ll discuss changes that you sometimes can bring, physically and logically, to the database structure as a last resort, to try to gain extra performance points.

And to conclude the book:

Chapter 8, How It Works: Refactoring in Practice

Provides a kind of summary of the whole book as an extended checklist. In this chapter I describe, with references to previous chapters, what goes through my mind and what I do whenever I have to deal with the performance issues of a database application. This was a difficult exercise for me, because sometimes experience (and gut instinct acquired through that experience) suggests shortcuts that are not really the conscious product of a clear, logical analysis. But I hope it will serve as a useful reference.

Appendix A, Scripts and Sample Programs, and Appendix B, Tools

Describe scripts, sample programs, and tools that are available for download from O’Reilly’s website for this book, http://www.oreilly.com/catalog/9780596514976.

Audience

This book is written for IT professionals, developers, project managers, maintenance teams, database administrators, and tuning specialists who may be involved in the rescue operation of an application with a strong database component.

Assumptions This Book Makes

This book assumes a good working knowledge of SQL, and of course, some comfort with at least one programming language.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates emphasis, new terms, URLs, filenames, and file extensions.

Constant width

Indicates computer coding in a broad sense. This includes commands, options, variables, attributes, keys, requests, functions, methods, types, classes, modules, properties, parameters, values, objects, events, event handlers, XML and XHTML tags, macros, and keywords. It also indicates identifiers such as table and column names, and is used for code samples and command output.

Constant width bold

Indicates emphasis in code samples.

Constant width italic

Shows text that should be replaced with user-supplied values.

Using Code Examples

This book is here to help you get your job done. In general, you may use the code in this book in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Refactoring SQL Applications by Stéphane Faroult with Pascal L’Hermite. Copyright 2008 Stéphane Faroult and Pascal L’Hermite, 978-0-596-51497-6.”

If you feel your use of code examples falls outside fair use or the permission given here, feel free to contact us at .

Comments and Questions

Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at:

http://www.oreilly.com/catalog/9780596514976

To comment or ask technical questions about this book, send email to:

For more information about our books, conferences, Resource Centers, and the O’Reilly Network, see our web site at:

http://www.oreilly.com

Safari® Books Online

When you see a Safari® Books Online icon on the cover of your favorite technology book, that means the book is available online through the O’Reilly Network Safari Bookshelf.

Safari offers a solution that’s better than e-books. It’s a virtual library that lets you easily search thousands of top tech books, cut and paste code samples, download chapters, and find quick answers when you need the most accurate, current information. Try it for free at http://safaribooksonline.com.

Acknowledgments

A book is always the result of the work of far more people than those who have their names on the cover. First I want to thank Pascal L’Hermite whose Oracle and SQL Server knowledge was extremely valuable as I wrote this book. In a technical book, writing is only the visible part of the endeavor. Setting up test environments, devising example programs, porting them to various products, and sometimes trying ideas that in the end will lead nowhere are all tasks that take a lot of time. There is much paddling below the float line, and there are many efforts that appear only as casual references and faint shadows in the finished book. Without Pascal’s help, this book would have taken even longer to write.

Every project needs a coordinator, and Mary Treseler, my editor, played this role on the O’Reilly side. Mary selected a very fine team of reviewers, several of them authors. First among them was Brand Hunt, who was the development editor for this book. My hearty thanks go to Brand, who helped me give this book its final shape, but also to Dwayne King, particularly for his attention both to prose and to code samples. David Noor, Roy Owens, and Michael Blaha were also very helpful. I also want to thank two expert long-time friends, Philippe Bertolino and Cyril Thankappan, who carefully reviewed my first drafts as well.

Besides correcting some mistakes, all of these reviewers contributed remarks or clarifications that found their way into the final product, and made it better.

When the work is over for the author and the reviewers, it just starts for many O’Reilly people: under the leadership of the production editor, copyediting, book designing, cover designing, turning my lousy figures into something more compatible with the O’Reilly standards, indexing—all of these tasks helped to give this book its final appearance. All of my most sincere thanks to Rachel Monaghan, Audrey Doyle, Mark Paglietti, Karen Montgomery, Marcia Friedman, Rob Romano, and Lucie Haskins.



[1] Fowler, M. et al. Refactoring: Improving the Design of Existing Code. Boston: Addison-Wesley Professional.

Get Refactoring SQL Applications now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.