Chapter 1. The Basics

In this chapter, we’ll cover the basics of getting started with PostgreSQL. This includes where to get binaries and drivers, what’s new and exciting in the latest 9.2 release, common administration tools, PostgreSQL nomenclature, and where to turn for help.

Where to Get PostgreSQL

Years ago, if you wanted PostgreSQL, you had to compile it from source. Thankfully, those days are gone. Granted, you can still compile should you so choose, but most users nowadays get their PostgreSQL with a prepackaged installer. A few clicks or keystrokes, and you’re on your way in 10 minutes or less.

If you’re installing PostgreSQL for the first time and have no existing database to upgrade, you should always install the latest stable release version for your OS. http://www.postgresql.org/download maintains a listing of places where you can download PostgreSQL binaries. In Installation Guides and Distributions, you’ll find installation guides and some other additional custom distributions that people we’ve talked to seem to like.

Notable PostgreSQL Forks

The fact that PostgreSQL has MIT/BSD style licensing makes it a great candidate for forking. Various groups have done exactly that over the years. Some have contributed their changes. Netezza, a popular database choice for data warehousing workloads, in its inception was a PostgreSQL fork. GreenPlum, used for data warehousing and analyzing petabytes of information, was a spinoff of Bizgres, which was a community-driven spinoff of PostgreSQL focused on Big Data. PostgreSQL Advanced Plus by EnterpriseDb is a fork of the PostgreSQL codebase—it adds Oracle syntax and compatibility features to woo Oracle users. EnterpriseDb does provide funding to the PostgreSQL community, and for this we’re grateful.

All the aforementioned are proprietary, closed source forks. tPostgres and Postgres-XC are two budding forks that we find interesting with open source licensing. tPostgres braches off PostgreSQL 9.2 and targets Microsoft SQL Server users. For instance, with tPostgres, you can write functions using T-SQL. Postgres-XC is a cluster server providing write-scalable, synchronous multi-master replication. What makes Postgres-XC special is that it supports distributed processing and replication. It is now at version 1.0.

Administration Tools

There are three popular tools for managing PostgreSQL and these are supported by PostgreSQL core developers; they tend to stay in synch with PostgreSQL versions. In addition, there are plenty of commercial offerings as well.

psql

psql is a command-line interface for writing queries and managing PostgreSQL. It comes packaged with some nice extras, such as an import and export commands for delimited files, and a reporting feature that can generate HTML output. psql has been around since the beginning of PostgreSQL and is a favorite of hardcore PostgreSQL users. Newer converts who are more comfortable with GUI tools tend to favor pgAdmin.

pgAdmin

This is the widely used, free, graphical administration tool for PostgreSQL. You can download it separately from PostgreSQL. pgAdmin runs on the desktop and can connect to multiple PostgreSQL servers regardless of version or OS. Even if you have your database server on a window-less Unix-based server, install pgAdmin and you’ll find yourself armed with a fantastic GUI. pgAdmin is pictured in Figure 1-1.

pgAdmin

Figure 1-1. pgAdmin

Some installers, such as those offered by EnterpriseDB, package pgAdmin with the database server install. If you’re unfamiliar with PostgreSQL, you should definitely start with pgAdmin. You’ll get a great overview and gain an appreciation of the richness of PostgreSQL just by exploring all the database objects in the main interface. If you’re coming from SQL Server and used Management Studio, you’ll feel right at home.

PHPPgAdmin

PHPPgAdmin Tool

Figure 1-2. PHPPgAdmin Tool

PHPPgAdmin, pictured in Figure 1-2, is a free, web-based administration tool patterned after the popular PHPMyAdmin for MySQL. PostgreSQL has many more kinds of database objects than MySQL, as such PHPPgAdmin is a step up from PHPMyAdmin with additions to manage schemas, procedural languages, casts, operators, and so on. If you’ve used PHPMyAdmin, you’ll find PHPPgAdmin to be nearly identical.

What’s New in Latest Versions of PostgreSQL?

The upgrade process gets simpler with each new version. There’s no reason not to always keep in step with the latest version. PostgreSQL is the fastest growing database technology today. Major versions come out almost annually. Each new version adds enhancements to ease of use, stability, security, performance, and avant-garde features. The lesson here? Always upgrade, and do so often.

Why Upgrade?

If you’re using PostgreSQL 8.2 or below: upgrade now! Enough said.

If you’re using PostgreSQL 8.3: upgrade soon! 8.3 will be reaching end-of-life in early 2013. Details about PostgreSQL EOL policy can be found here: PostgreSQL Release Support Policy. EOL is not a place you want to be. New security updates and fixes to serious bugs will no longer be available. You’ll need to hire specialized PostgreSQL core consultants to patch problems or to implement workarounds—probably not a cheap proposition, assuming you can even locate someone to begin with.

Regardless of which version you are using, you should always try to run the latest micro-versions for your version. An upgrade from say 8.4.8 to 8.4.11 requires just binary file replacement, which can be generally done with a quick restart after installing the upgrade. Only bug fixes are introduced in micro-versions, so there’s little cause for concern and can in fact save you grief.

What to Look for in PostgreSQL 9.2

At time of writing, PostgreSQL 9.1 is the latest stable release, and 9.2 is waiting in the wings to strut its stuff. All of the anticipated features in 9.2 are already set in stone and available in the 9.2 beta release. The following list discusses the most notable features:

  • Index-only scans. If you need to retrieve only columns that are already a part of an index, PostgreSQL will skip the need to go to the table. You’ll see significant speed improvement in these queries as well as aggregates such as COUNT(*).

  • Sorting improvements that improve in-memory sort operations by as much as 20%.

  • Improvements in prepared statements. A prepared statement is now parsed, analyzed, and rewritten, but not necessarily planned. It can also produce custom saved plans of a given prepared statement which are dependent on argument inputs. This reduces the chance that a prepared statement will perform worse than an equivalent ad-hoc query.

  • Cascading streaming replication supports streaming from a slave to another slave.

  • SP-GiST, another advance in GiST index technology using space filling trees. This should have great impact on the various extensions that rely on GiST for speed.

  • ALTER TABLE IF EXISTS syntax for making changes to tables.

  • Many new variants of ALTER TABLE ALTER TYPE commands that used to require whole table rewrites and rebuild of indexes. (More details are available at More Alter Table Alter Types.)

  • Even more pg_dump and pg_restore options. (Read our article at 9.2 pg_dump Enhancements.)

  • plv8js is a new language handler that allows you to create functions in JavaScript.

  • JSON built-in data type and companion functions row_to_json(), array_to_json(). This should be a welcome addition for web developers writing AJAX applications.

  • New range type class of types where a pair of values in data type forms a range, eliminating the need to cludge range-like functionality.

  • Allow SQL functions to reference arguments by name instead of by number.

PostgreSQL 9.1 Improvements

PostgreSQL 9.1 introduced enterprise features, making it an even more viable alternative to the likes of Microsoft SQL Server and Oracle:

  • More built-in replication features including synchronous replication.

  • Extensions management using the new CREATE EXTENSION, ALTER EXTENSION. Extensions make installing and removing add-ons a breeze.

  • ANSI-compliant foreign data wrappers for querying disparate data sources.

  • Writeable common table expressions (CTE). The syntactical convenience of CTEs now works for UPDATE and INSERT queries.

  • Unlogged tables speeds up queries against tables where logging is unnecessary.

  • Triggers on views. In prior versions, to make views updatable you used DO INSTEAD rules, which only supported SQL for programming logic. Triggers can be written in most procedural languages—except SQL—and opens the door for more complex abstraction using views.

  • KNN GiST adds improvement to popular extensions like full-text search, trigram (for fuzzy search and case insensitive search), and PostGIS.

Database Drivers

If you are using or plan to use PostgreSQL, chances are that you’re not going to use it in a vacuum. To have it interact with other applications, you’re going to need database drivers. PostgreSQL enjoys a generous number of freely available database drivers that can be used in many programming languages. In addition, there are various commercial organizations that provide drivers with extra bells and whistles at modest prices. Below, we’ve listed a few popular, open source ones:

  • PHP is a common language used to develop web applications, and most PHP distributions come packaged with at least one PostgreSQL driver. There is the older pgsql and the newer pdo_pgsql. You may need to enable them in your php.ini or do a yum install, but they are usually already there.

  • Java. If you are doing Java development, there are always updated versions of JDBC that support the latest PostgreSQL, which you can download from http://jdbc.postgresql.org.

  • For .NET. (Microsoft or Mono) you can use the Npgsql driver, which has source and binary versions for .NET Frameworks 3.5 and above, and Mono.NET.

  • If you need to connect from MS Access or some other Windows Office productivity software, download ODBC drivers from http://www.postgresql.org/ftp/odbc/versions/msi. The link includes both 32-bit and 64-bit ODBC drivers.

  • LibreOffice/OpenOffice. LibreOffice 3.5 (and above) comes packaged with a native PostgreSQL driver. For OpenOffice and older versions of LibreOffice, you can use a PostgreSQL JDBC driver or the SDBC driver. You can find details about connecting to these on our article OO Base and PostgreSQL.

  • Python is a beautiful language and has support for PostgreSQL via various Python database drivers; at the moment, Psycopg is the most popular.

  • Ruby. You can connect to PostgreSQL via rubypg.

  • Perl. You’ll find PostgreSQL connectivity support via DBI and the DBD:Pg driver or pure Perl DBD:PgPP driver from CPAN.

Server and Database Objects

So you installed PostgreSQL and open up pgAdmin. You expand the server tree. Before you is a bewildering array of database objects, some familiar and some completely foreign. PostgreSQL has more database objects than probably any other database, and that’s without considering add-ons. You’ll probably never touch many of these objects, but if you dream up a new functionality that you wish PostgreSQL would offer, more likely than not, it’s already implemented using one of those esoteric objects that you’ve been ignoring. This book is not even going to attempt to describe all that you’ll find in a PostgreSQL install. With PostgreSQL churning out features at breakneck speed, we can’t imagine any book that could possibly itemize all that PostgreSQL has to offer. We’ll now discuss the most commonly used database objects:

server service

The PostgreSQL server service is often just called a PostgreSQL server, or daemon. You can have more than one a physical server as long as they listen on different ports or IPs and have different places to store their respective data.

database

Each PostgreSQL server houses many databases.

table

Table are the workhorses of any database. What is unique about PostgreSQL tables is the inheritance support and the fact that every table automatically begets an accompanying custom data type. Tables can inherit from other tables and querying can bring up child records from child tables.

schema

Schemas are part of the ANSI-SQL standards, so you’ll see them in other databases. Schemas are the logical containers of tables and other objects. Each database can have multiple schemas.

tablespace

Tablespace is the physical location where data is stored. PostgreSQL allows tablespaces to be independently managed, which means you can easily move databases to different drives with just a few commands.

view

Most relational databases have views for abstracting queries. In PostgreSQL, you can also have views that can be updated.

function

Functions in PostgreSQL can return scalar value or sets of records. Aggregates are functions used with SQL constructs such as GROUP BY to summarize data. Most of the time, they return scalars but in PostgreSQL they can return composite objects.

operator

These are symbolic functions that have backing of a function. In PostgreSQL, you can define your own.

cast

Casts allow you to convert from one data type to another. They are supported by functions that actually perform the conversion. What is rare about PostgreSQL that you won’t find with many other databases is that you can create your own casts and thus change the default behavior of casting. Casting can be implicit or explicit. Implicit casts are automatic and usually will expand from a more specific to a more generic type. When an implicit cast is not offered, you must cast explicitly.

sequence

Sequence is what controls auto-incrementation in table definitions. They are usually automatically created when you define a serial column. Because they are objects in their own right, you could have multiple serial columns use the same sequence object, effectively achieveing uniqueness not only within the column but across them.

trigger

Found in many databases, triggers detect data change events and can react before or after the actual data is changed. PostgreSQL 9.0 introduced some special twists to this with the WHEN clause. PostgreSQL 9.1 added the extra feature of making triggers available for views.

foreign data wrappers

Foreign data wrappers allow you to query a remote data source whether that data source be another relational database server, flat file, a NoSQL database, a web service or even an application platform like SalesForce. They are found in SQL Server as linked tables, but PostgreSQL implementation follows the SQL/Management of External Data (MED) standard, and is open to connect to any kind of data source.

row/record

Rows and records generally mean the same thing. In PostgreSQL, rows can be treated independently from their respective tables. This distinction becomes apparent and useful when you write functions or use the row constructor in SQL.

extension

This is a new feature introduced in 9.1 that packages a set of functions, types, casts, indexes, and so forth into a single unit for maintainability. It is similar in concept to Oracle packages and is primarily used to deploy add-ons.

Where to Get Help

There will come a day when you need additional help. Since that day always arrives earlier than expected, we want to point you to some resources now rather than later. Our favorite is the lively mailing list network specifically designed for helping new and old users with technical issues. First, visit PostgreSQL Help Mailing Lists. If you are new to PostgreSQL, the best newsgroup to start with is PGSQL-General Mailing List. Finally, if you run into what appears to be a bug in PostgreSQL, report it at PostgreSQL Bug Reporting.

Get PostgreSQL: Up and Running now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.