Chapter 4. Date Handling

Hacks 1923

SQL is capable of handling just about any date calculation. The hacks in this chapter show how to get dates into your database, and how to get weekly, monthly, and quarterly reports out with a minimum of human intervention.

In many of the hacks described here, the reports are generated using the current date; however, it is usually a simple matter to use a user-specified parameter instead [Hack #58].

There are inconsistencies among the main database vendors regarding dates. For most of the hacks in this chapter, we used MySQL as the base example and we’ve shown the variations for SQL Server, Access, Oracle, and PostgreSQL.

Tip

You should be aware that the database system might be running on a system having a different time zone than the system your applications run on (perhaps your web server is in New York and your database server is in Chicago). To minimize clock and time zone discrepancies, you should use CURRENT_TIMESTAMP to generate times whenever possible.

Convert Strings to Dates

The SQL standard includes a complete set of rules which govern how dates should be represented and manipulated. Each vendor implementation of SQL has a variation of these rules.

The SQL standard has a DATE type for days and a TIMESTAMP type to represent a date and time. Examples of literals are DATE '2006-05-20' and TIMESTAMP '2006-06-18 10:09:05'. The ISO format used in both examples (the year followed by the month followed by the day) has the advantage of sorting correctly even when it’s represented as a string data type. It is also visibly different from both the American convention that puts the month first, and the European style that puts the day first.

Oracle, PostgreSQL, and MySQL adhere to the SQL standard for representing dates and timestamps, but Microsoft’s SQL Server and Access use a slightly different approach. SQL Server and Access will accept a date literal such as '2006-06-08', but they cannot handle the DATE prefix.

The DATE type does not exist in SQL Server; you should use the DATETIME type to represent both a date and a moment in time. SQL Server uses the term TIMESTAMP for an entirely different purpose.

Convert Your Dates

Suppose you have dates in user-supplied input in this format—6/18/2006—and you need to create date literals for an INSERT statement such as this one: DATE '2006-06-18'. Here’s how you can accomplish this in Perl:

foreach ('6/18/2006', '12/13/2006'){
  if (/(\d+)\/(\d+)\/(\d\d\d\d)/){ # Capture date parts into $1, $2, $3
    my $m = substr("0$1", -2); # Left-pad with zeros if needed
    my $d = substr("0$2", -2);
    my $y = $3;
    $sql = "INSERT INTO d VALUES (DATE '$y-$m-$d')";
    print "$sql\n";
  } else {
    warn "Could not parse date: $!";
  }
}

Warning

Note that we in-lined the user-supplied values directly into the INSERT statement. In theory, this would have opened us up to an SQL injection attack [Hack #48]. However, the input is fully sanitized in that the regular expression guarantees that $y, $m, and $d contain only digits (\d matches any one character between 0 and 9).

The output from this code is ready for use in MySQL, Oracle, PostgreSQL, or another engine that uses the SQL standard:

INSERT INTO d VALUES (DATE '2006-06-18');
INSERT INTO d VALUES (DATE '2006-12-13');

For Microsoft SQL Server, you need only drop the word DATE.

Table 4-1 shows some common variations of date formats.

Table 4-1. Finding a common date format
EngineDATE ’2006-06-01’’2006-6-1’’1 JUN 2006’
MySQLOKOKError
SQL ServerErrorOKOK
OracleOKErrorOK
PostgreSQLOKOKOK
DB2ErrorOKError
MimerOKErrorError
StandardOKErrorError

No single format works with every engine; you can’t do better than satisfy any two of the three most popular platforms (SQL Server, MySQL, and Oracle).

You also cannot publish even the simplest SQL data in a format that everyone can read. For a nasty solution you can publish dates, but you must capitalize the word date in an odd way—for example, DaTe '2006-06-01'. SQL Server and DB2 users must do a case-sensitive search and replace to remove the string DaTe, but users of other engines can just slurp the file into their engines directly. The advantage of using an unusual form of capitalization is that the SQL engines don’t care, but the string DaTe is unlikely to occur in any other part of the file, so it’s really easy to pick up with a conversion script (you also could pipe or redirect your SQL to a one-liner, such as perl -pe 's/DaTe//g').

Warning

If MySQL comes across a date format it doesn’t recognize (such as '1 JUN 2006'), it accepts it without raising an error and puts in the value DATE '0000-00-00'. However, if you check your warnings, you’ll see that something went wrong:

mysql> insert into d values ('1 JUN 2006');Query OK, 1 row affected, 1 warning (0.13 sec)mysql> show warnings\G*************** 1. row ***************  Level: Warning   Code: 1265Message: Data truncated for column 'd' at row 11 row in set (0.00 sec)

If you are reading data in from another system you may be able to pass date strings in their original format and do the parsing in SQL. The Oracle example shows the general technique, but you’ll need to use different SQL functions for your database. We’ll show you those after the Oracle example.

Parse Dates with Oracle

Oracle has a neat function called TO_DATE, which allows you to specify the pattern used in your input string:

INSERT INTO d VALUES (TO_DATE('1 Jun 2006', 'dd Mon yyyy'))

You can specify a wide range of formats that include “filler” characters other than a space.

Using this technique, you could write a simple Perl script, for example. If Perl has read a string such as '1 Jun 2006' into the variable $v, you could generate the SQL as:

my $sql = "INSERT INTO d VALUES (TO_DATE('$v', 'dd Mon yyyy'))";

If your dates are coming from an untrusted source, you should still check the pattern to guard against SQL injection attacks:

if ($v !~ /^\d+ \w\w\w \d\d\d\d$/) {
 warn "Injection attack.";
}

If you were using XSLT, you might want to use code such as this:

<stylesheet xmlns="http://www.w3.org/1999/XSL/Transform">
  <template match="foo">
    INSERT INTO dd VALUES (
      TO_DATE('<value-of select='@bar'/>',
             ,'dd Mon yyyy'))
  </template>
</stylesheet>

That sheet would take care of input such as <foo bar='1 Jun 2006'/>.

Parse Dates with MySQL

MySQL has a similar function, called STR_TO_DATE. This works with the format strings in MySQL format:

INSERT INTO d VALUES (STR_TO_DATE('1 Jun 2006', '%d %b %Y'));

%b represents the abbreviated month name, %d is the day of the month, and %Y is a four-digit year.

Parse Dates with SQL Server

If your input format is a fixed size (with leading zeros), combine the SUBSTRING function to build the string. Convert a string such as '06/18/2006' into a date:

INSERT INTO d
  SELECT SUBSTRING(x,7,4)+'-'+
         SUBSTRING(x,1,2)+'-'+
         SUBSTRING(x,4,2)
    FROM (SELECT '06/18/2006' AS x) y;
               

Uncover Trends in Your Data

Statistics gathered daily could contain both daily cycles and weekly trends. This can lead to chaotic-looking graphs when activity is plotted day by day. You can improve your graphs easily using SQL.

Look at Figure 4-1, which shows a chart of the raw figures for the number of page views for a web site per day, over the course of one year. These figures come from Webalizer, the web log analysis program (http://www.mrunix.net/webalizer). It is difficult to see the trends because the weekly cycle overwhelms the daily detail, and obscures the long-term trend.

Page views per day over one year
Figure 4-1. Page views per day over one year

To understand the data, you need to separate the effect of the weekly cycle from the table. You can see the weekly cycle by taking the average for Monday, the average for Tuesday, and so forth. In Figure 4-2, Monday to Sunday are numbered 0 to 6.

Average page views by day of week
Figure 4-2. Average page views by day of week

Notice that the value of the Sunday column (column 6) is less than half the value of the midweek columns. This is helping to cause the zigzag pattern in the original graph. If you view the data averaged per week (see Figure 4-3) rather than per day, it is easier to see the long-term trend.

Smoothed data — page views averaged for one week
Figure 4-3. Smoothed data — page views averaged for one week

Tip

The graphs and charts shown here come from Excel. Many spreadsheet applications, including Excel, have tools for importing directly from databases and producing a variety of graphical reports.

Before you can isolate these trends, you must turn the dates into integers to more easily put them into the appropriate buckets. You can pick an arbitrary date and start counting from there. Table 4-2 shows some source data.

Table 4-2. Page views by date
whnpages
2005-06-1113368
2005-06-128129
2005-06-1344043
... 

In Table 4-3, I’ve chosen the first day of the millennium, Monday, January 1, 2001, as day zero. Every date must be converted into the number of days since then. The mechanism for converting to integers is different on different engines. In MySQL, you can create this view using the TO_DAYS function:

CREATE VIEW webalizer2 AS
  SELECT TO_DAYS(whn)-TO_DAYS(DATE '2001-01-01') whn, pages
    FROM webalizer;
Table 4-3. Converting dates to integers
whnpages
162213368
16238129
162444043
... 

With dates now represented by integers, you can perform arithmetic on them. Taking the modulus 7 value gives you the day of the week. Because 2001-01-01 was a Monday, you will get 0 on every seventh day from then. Tuesday will give you 1, Wednesday 2, and so on, with Sunday having the value 6.

Modular Arithmetic

Look at the values for whn%7 and FLOOR(whn/7). You can see that day number 1,622 (counting from 2001-01-01) is day number 5 of week number 231:

mysql> SELECT whn, whn%7, whn/7, FLOOR(whn/7)
    ->   FROM webalizer2;
+------+-------+----------+--------------+
| whn  | whn%7 | whn/7    | FLOOR(whn/7) |
+------+-------+----------+--------------+
| 1622 |     5 | 231.7143 |          231 |
| 1623 |     6 | 231.8571 |          231 |
| 1624 |     0 | 232.0000 |          232 |
| 1625 |     1 | 232.1429 |          232 |
| 1626 |     2 | 232.2857 |          232 |
| 1627 |     3 | 232.4286 |          232 |
| 1628 |     4 | 232.5714 |          232 |
| 1629 |     5 | 232.7143 |          232 |
| 1630 |     6 | 232.8571 |          232 |
| 1631 |     0 | 233.0000 |          233 |
| 1632 |     1 | 233.1429 |          233 |
...

You need to GROUP BY the whn%7 column to see the weekly cycle and GROUP BY the FLOOR(whn/7) column to see the trend.

To look at the intra-week pattern shown back in Figure 4-2, you take the average with GROUP BY whn%7:

mysql> SELECT whn%7, AVG(pages)
    ->  FROM webalizer2 GROUP BY whn%7;
+-------+------------+
| whn%7 | AVG(pages) |
+-------+------------+
|     0 | 21391.6731 |
|     1 | 23695.1538 |
|     2 | 23026.2308 |
|     3 | 24002.8077 |
|     4 | 19773.9808 |
|     5 | 10353.5472 |
|     6 | 10173.9423 |
+-------+------------+

To smooth out the data over the whole year, as shown in Figure 4-3, you can divide by 7 and take the integer value using the FLOOR function:

mysql> SELECT FLOOR(whn/7), AVG(pages)
    ->  FROM webalizer2 GROUP BY FLOOR(whn/7);
+--------------+------------+
| FLOOR(whn/7) | AVG(pages) |
+--------------+------------+
|          231 | 10748.5000 |
|          232 | 23987.8571 |
|          233 | 19321.1429 |
|          234 | 15347.0000 |
...

The value for the first week is artificially low—by chance, it includes two on only two days, and they are on weekends. Something similar might happen at the end of the interval, so it is safest to exclude any week that does not have seven entries. The HAVING clause will take care of that:

mysql> SELECT FLOOR(whn/7), AVG(pages)
    ->  FROM webalizer2 GROUP BY FLOOR(whn/7)
    ->  HAVING COUNT(*)=7;
+--------------+------------+
| FLOOR(whn/7) | AVG(pages) |
+--------------+------------+
|          232 | 23987.8571 |
|          233 | 19321.1429 |
|          234 | 15347.0000 |
...

This will work fine with MySQL and PostgreSQL, but you need to make a few alterations for SQL Server, Access, and Oracle.

SQL Server

Here’s how to create the view that represents dates as integers:

CREATE VIEW webalizer2 AS
  SELECT CONVERT(INT,whn-'2001-01-01') whn, pages
    FROM webalizer

The SELECT statements shown earlier will run unmodified.

Access

In Access, you can use Int(whn - #2001-01-01#) to extract the number of days since January 1, 2001:

SELECT Int(whn - #2001-01-01#), pages
  FROM webalizer

Also, MOD is an infix operator used in place of %:

SELECT whn MOD 7, AVG(pages)
  FROM webalizer2 GROUP BY whn MOD 7;

Oracle

Here’s how to create the view that represents dates as integers:

CREATE VIEW webalizer2 AS
  SELECT whn-DATE '2001-01-01' whn, pages
    FROM webalizer;

In Oracle, the module function is MOD, so you’d need to use that rather than whn%7:

SELECT MOD(whn,7), AVG(pages)
  FROM webalizer2 GROUP BY MOD(whn,7);

Report on Any Date Criteria

A report may depend on ranges of dates that can be tricky to calculate. Monthly totals are pretty straightforward; but how about current month, last month, and year to date?

To report performance indicators you need to generate values for specific time periods. Business analysts commonly are interested in the current month compared to the preceding month, or the corresponding period in the preceding year. You can do all of this in SQL.

In the examples that follow, the original data is in a table, t. This table records individual incidents of paperclip usage. Every row contains the date (whn) and the number of paperclips used (v):

mysql> SELECT * FROM t;
+------------+------+
| whn        | v    |
+------------+------+
| 2006-01-07 |   53 |
| 2006-01-13 |   46 |
| 2006-01-18 |   99 |
| 2006-01-19 |   15 |
| 2006-01-26 |    9 |
...

Monthly Totals

If you want to see monthly totals, you must include the year and the month in the GROUP BY expression:

mysql> SELECT YEAR(whn), MONTH(whn), COUNT(v), SUM(v)
    ->   FROM t
    ->  GROUP BY YEAR(whn),MONTH(whn);
+-----------+------------+----------+--------+
| YEAR(whn) | MONTH(whn) | COUNT(v) | SUM(v) |
+-----------+------------+----------+--------+
|      2006 |          1 |        7 |    348 |
|      2006 |          2 |        5 |    329 |
|      2006 |          3 |       10 |    585 |
|      2006 |          4 |        8 |    293 |
|      2006 |          5 |        7 |    413 |
|      2006 |          6 |        8 |    465 |
|      2006 |          7 |        6 |    206 |
|      2006 |          8 |        9 |    456 |
|      2006 |          9 |        4 |    217 |
|      2006 |         10 |       10 |    401 |
|      2006 |         11 |        9 |    540 |
|      2006 |         12 |        7 |    402 |
|      2007 |          1 |        2 |    139 |
|      2007 |          2 |       13 |    800 |
|      2007 |          3 |       14 |    674 |
|      2007 |          4 |        6 |    456 |
|      2007 |          5 |        4 |    171 |
+-----------+------------+----------+--------+

Tip

In MySQL and PostgreSQL, you can implicitly cast a date to a string and you can use that to extract the year and month. For example:

SELECT SUBSTRING(whn,1,7), COUNT(v), SUM(v) GROUP BY SUBSTRING(whn,1,7)

You can combine the year and month into a single number if you want. If you multiply the year by 100 and add the month you can be certain that each month will be distinct and sortable. Also, the resulting number is human readable and is suitable for processing as a string; you can easily turn it back into a date [Hack #19]:

mysql> SELECT 100*YEAR(whn)+MONTH(whn), COUNT(v), SUM(v)
    ->   FROM t
    ->  GROUP BY 100*YEAR(whn)+MONTH(whn);
+--------------------------+----------+--------+
| 100*YEAR(whn)+MONTH(whn) | COUNT(v) | SUM(v) |
+--------------------------+----------+--------+
|                   200601 |        7 |    348 |
|                   200602 |        5 |    329 |
|                   200603 |       10 |    585 |
|                   200604 |        8 |    293 |
|                   200605 |        7 |    413 |
|                   200606 |        8 |    465 |
|                   200607 |        6 |    206 |
|                   200608 |        9 |    456 |
|                   200609 |        4 |    217 |
|                   200610 |       10 |    401 |
|                   200611 |        9 |    540 |
|                   200612 |        7 |    402 |
|                   200701 |        2 |    139 |
|                   200702 |       13 |    800 |
|                   200703 |       14 |    674 |
|                   200704 |        6 |    456 |
|                   200705 |        4 |    171 |
+--------------------------+----------+--------+

Tip

MySQL, Access, and SQL Server support the nonstandard functions MONTH and YEAR; Oracle does not.

Oracle and MySQL support the SQL92 function EXTRACT, as in EXTRACT(MONTH FROM whn). Oracle also has the TO_CHAR function, so it allows TO_CHAR(whn, 'yyyymm'), for example.

Current Month

If you want to see the data for the current month you can test both month and year in the WHERE clause:

mysql> SELECT * FROM t
    ->  WHERE MONTH(whn)=MONTH(CURRENT_DATE)
    ->    AND  YEAR(whn)=YEAR(CURRENT_DATE)
    ->  ORDER BY whn;
+------------+------+
| whn        | v    |
+------------+------+
| 2006-06-07 |   96 |
| 2006-06-11 |    4 |
| 2006-06-12 |   78 |
| 2006-06-12 |   36 |
| 2006-06-17 |   57 |
| 2006-06-29 |   74 |
| 2006-06-29 |   94 |
| 2006-06-30 |   26 |
+------------+------+

If you want to see the data for the preceding month do not change MONTH(whn)=MONTH(CURRENT_DATE) to MONTH(whn)=MONTH(CURRENT_DATE)-1. If you do that you will get data from the wrong year when you run this query in January. Instead, you need to subtract one month from CURRENT_DATE. It is a little neater if you do the date calculation in a nested SELECT:

mysql> SELECT * FROM t,
    ->   (SELECT CURRENT_DATE - INTERVAL 1 MONTH lastMnth) p
    ->     WHERE MONTH(whn)=MONTH(lastMnth)
    ->       AND  YEAR(whn)=YEAR(lastMnth);
+------------+------+------------+
| whn        | v    | lastMnth   |
+------------+------+------------+
| 2006-05-04 |   43 | 2006-05-23 |
| 2006-05-06 |   55 | 2006-05-23 |
| 2006-05-08 |   89 | 2006-05-23 |
| 2006-05-15 |   87 | 2006-05-23 |
| 2006-05-22 |   90 | 2006-05-23 |
| 2006-05-29 |   22 | 2006-05-23 |
| 2006-05-30 |   27 | 2006-05-23 |
+------------+------+------------+

SQL Server

In SQL Server, you should use the DATEADD function. You can specify the interval as m for month and the number of months as -1:

SELECT * FROM t,
  (SELECT DATEADD(m,-1,GETDATE()) lastMnth) p
  WHERE MONTH(whn)=MONTH(lastMnth)
    AND  YEAR(whn)=YEAR(lastMnth);

Oracle

You need to use the dual table in a subselect in Oracle. Also, you can use the TO_CHAR function to match the year and month in a single function:

SELECT * FROM t,
  (SELECT CURRENT_DATE - INTERVAL '1' MONTH lastMnth FROM dual)
  WHERE TO_CHAR(whn,'yyyymm')=TO_CHAR(lastMnth,'yyyymm');

Year-to-Date Totals

To calculate year-to-date totals you must make sure that the year matches the current date and that the records occur on or before the current date:

mysql> SELECT COUNT(v), SUM(v) FROM t
    ->   WHERE whn <= CURRENT_DATE
    ->     AND YEAR(whn)=YEAR(CURRENT_DATE);
+----------+--------+
| COUNT(v) | SUM(v) |
+----------+--------+
|       42 |   2239 |
+----------+--------+

Fiscal year to date

Suppose you are reporting over a 365-day period, but your year does not start on January 1. This is the case with reports over a fiscal year or tax year.

For instance, say that your fiscal year starts on April 6. Calculating which dates are in the current fiscal year is rather complicated; the easiest thing to do is to work with the number of days between January 1 and April 6. You can get SQL to do the calculation as follows:

mysql> select DATEDIFF(DATE '2006-04-06',DATE '2006-01-01');
+-----------------------------------------------+
| DATEDIFF(DATE '2006-04-06',DATE '2006-01-01') |
+-----------------------------------------------+
|                                            95 |
+-----------------------------------------------+

In SQL Server, the DATEDIFF function needs another parameter. You use 'd' to indicate that you want the result as the number of days: DATEDIFF('d', '2006-04-06','2006-01-01').

In Oracle, you can simply subtract dates to get the number of days between them as an integer: DATE '2006-04-06' - DATE '2006-01-01'.

Once you have this offset you can determine the relevant fiscal year by subtracting this from both the date to be tested and the current date. This means that you don’t have to worry about the different cases. In this example, March 29, 2006 is in fiscal year 2005, but April 20, 2006 is in fiscal year 2006:

mysql> SELECT whn,
    ->        YEAR(whn - INTERVAL '95' DAY)          whnFiscalYear,
    ->        YEAR(CURRENT_DATE - INTERVAL '95' DAY) currentFiscalYear
    ->   FROM t
    ->  WHERE whn IN (DATE '2006-03-29', DATE '2006-04-20');
+------------+---------------+-------------------+
| whn        | whnFiscalYear | currentFiscalYear |
+------------+---------------+-------------------+
| 2006-03-29 |          2005 |              2006 |
| 2006-04-20 |          2006 |              2006 |
+------------+---------------+-------------------+

You can then use this as a condition to ensure that you are reporting on only the current fiscal year:

mysql> SELECT MIN(whn),MAX(whn), COUNT(v), SUM(v) FROM t
    ->    WHERE whn <= CURRENT_DATE
    ->      AND YEAR(whn - INTERVAL '95' DAY)=
    ->          YEAR(CURRENT_DATE - INTERVAL '95' DAY);
+------------+------------+----------+--------+
| MIN(whn)   | MAX(whn)   | COUNT(v) | SUM(v) |
+------------+------------+----------+--------+
| 2006-04-09 | 2006-06-17 |       28 |   1443 |
+------------+------------+----------+--------+

The minimum and maximum relevant dates are included in the output. This is a complicated expression and you might want to check by hand that the MIN(whn) value shown matches the first record following 2006-04-06 and that the MAX(whn) value is the last record to the current date.

In SQL Server, you can invoke the DATEADD function: DATEADD('d', whn, -95).

Perhaps your fiscal year is not a fixed number of days relative to January 1. In that case, you really have no alternative than to record the start-of-year dates in a table.

Suppose the taxYear table was created with the following format:

mysql> SELECT * FROM taxYear;
+------------+
| strt       |
+------------+
| 2005-04-06 |
| 2006-04-06 |
| 2007-04-07 |
+------------+

You can perform the same calculation as performed earlier:

mysql> SELECT MIN(whn), MAX(whn), COUNT(v), SUM(v)
    ->   FROM t,
    ->     (SELECT MAX(strt) txStrt FROM taxYear 
    ->       WHERE strt < CURRENT_DATE) tx
    ->  WHERE whn >= txStrt AND whn <= CURRENT_DATE;
+------------+------------+----------+--------+
| MIN(whn)   | MAX(whn)   | COUNT(v) | SUM(v) |
+------------+------------+----------+--------+
| 2006-04-09 | 2006-06-17 |       28 |   1443 |
+------------+------------+----------+--------+

Generate Quarterly Reports

A quarterly report aggregates three months’ worth of figures. SQL has all the functions you need to get this aggregation.

Suppose you have figures that you need to report on by quarter. The source of your data is just a list of dates and values, as shown in Table 4-4.

Table 4-4. The sale table
whnamount
2005-01-062
2005-03-148
2005-04-024

In a quarterly report, you need to SUM all the figures relating to January, February, and March into Q1. So the first two rows of Table 4-4 contribute to the Q1 total for 2005. The 2005-04-02 row occurred in April, so you should add it to the Q2 total for 2005.

You can use the MONTH function to extract the month as a number, with January, February, and March appearing as 1, 2, and 3.

If you also group on the year, each quarter of your input will correspond to exactly one cell in the output grid:

mysql> SELECT YEAR(whn) AS yr
    ->    ,SUM(CASE WHEN MONTH(whn) IN (1,2,3)    THEN amount END) AS Q1
    ->    ,SUM(CASE WHEN MONTH(whn) IN (4,5,6)    THEN amount END) AS Q2
    ->    ,SUM(CASE WHEN MONTH(whn) IN (7,8,9)    THEN amount END) AS Q3
    ->    ,SUM(CASE WHEN MONTH(whn) IN (10,11,12) THEN amount END) AS Q4
    -> FROM sale
    -> GROUP BY YEAR(whn);
+------+------+------+------+------+
| yr   | Q1   | Q2   | Q3   | Q4   |
+------+------+------+------+------+
| 2005 |   10 |   40 |   80 |  660 |
| 2006 |   30 |   20 | NULL | NULL |
+------+------+------+------+------+

Unfortunately, the YEAR and MONTH functions are not implemented in Oracle. However, the SQL standard EXTRACT function works just as well:

SQL> SELECT EXTRACT(YEAR FROM whn) AS yr
  2     ,SUM(CASE WHEN EXTRACT(MONTH FROM whn) IN (1,2,3)
  3        THEN amount END) AS Q1
  4     ,SUM(CASE WHEN EXTRACT(MONTH FROM whn) IN (4,5,6)
  5        THEN amount END) AS Q2
  6     ,SUM(CASE WHEN EXTRACT(MONTH FROM whn) IN (7,8,9)
  7        THEN amount END) AS Q3
  8     ,SUM(CASE WHEN EXTRACT(MONTH FROM whn) IN (10,11,12)
  9        THEN amount END) AS Q4
 10   FROM sale
 11  GROUP BY EXTRACT(YEAR FROM whn);

        YR         Q1         Q2         Q3         Q4
---------- ---------- ---------- ---------- ----------
      2005         10         40         80        660
      2006         30         20

Hacking the Hack

You may want to pivot the rows and columns of the report. In standard SQL, you have to apply some math:

mysql> SELECT FLOOR((EXTRACT(MONTH FROM whn)-1)/3)+1 AS Quarter
    ->   ,SUM(CASE WHEN EXTRACT(YEAR,whn)=2005 THEN amount END) AS Y2005
    ->   ,SUM(CASE WHEN EXTRACT(YEAR,whn)=2006 THEN amount END) AS Y2006
    -> FROM sale
    -> GROUP BY FLOOR((EXTRACT(MONTH FROM whn)-1)/3)+1;
+---------+-------+-------+
| Quarter | Y2005 | Y2006 |
+---------+-------+-------+
|       1 |    10 |    30 |
|       2 |    40 |    20 |
|       3 |    80 |  NULL |
|       4 |   660 |  NULL |
+---------+-------+-------+

The expression FLOOR((MONTH(whn)-1)/3)+1 calculates the quarter for the input date whn. You can see how it works if you look at the calculation one step at a time:

mysql> SELECT whn, EXTRACT(MONTH FROM whn)                'Month',
    ->             EXTRACT(MONTH FROM whn)-1              'Subtract 1',
    ->             (EXTRACT(MONTH FROM whn)-1)/3          'Divide by 3',
    ->             FLOOR((EXTRACT(MONTH FROM whn)-1)/3)   'Ignore Fraction',
    ->             FLOOR((EXTRACT(MONTH FROM whn)-1)/3)+1 'Add 1'
    -> FROM sale
    -> WHERE YEAR(whn)=2006;
+------------+-------+------------+-------------+-----------------+-------+
| whn        | Month | Subtract 1 | Divide by 3 | Ignore Fraction | Add 1 |
+------------+-------+------------+-------------+-----------------+-------+
| 2006-01-01 |     1 |          0 |      0.0000 |               0 |     1 |
| 2006-02-01 |     2 |          1 |      0.3333 |               0 |     1 |
| 2006-03-01 |     3 |          2 |      0.6667 |               0 |     1 |
| 2006-04-01 |     4 |          3 |      1.0000 |               1 |     2 |
| 2006-05-01 |     5 |          4 |      1.3333 |               1 |     2 |
+------------+-------+------------+-------------+-----------------+-------+

Each vendor has a function to extract the QUARTER and the YEAR from a date. In MySQL, these functions are QUARTER and YEAR:

mysql> SELECT QUARTER(whn)
    ->   ,SUM(CASE WHEN YEAR(whn)=2005 THEN amount END) AS Y2005
    ->   ,SUM(CASE WHEN YEAR(whn)=2006 THEN amount END) AS Y2006
    -> FROM sale
    -> GROUP BY QUARTER(whn);
+--------------+-------+-------+
| QUARTER(whn) | Y2005 | Y2006 |
+--------------+-------+-------+
|            1 |    10 |    30 |
|            2 |    40 |    20 |
|            3 |    80 |  NULL |
|            4 |   660 |  NULL |
+--------------+-------+-------+

The trick is to GROUP BY the quarter and use the CASE statement to extract only one year in each column. There are some database-specific variations to keep in mind:

  • MySQL uses QUARTER(whn) and YEAR(whn) as shown.

  • In SQL Server, you can use DATEPART(QUARTER,whn) and YEAR(whn).

  • In Oracle, you can use TO_CHAR(whn,'Q') and TO_CHAR(whn,'YYYY') for the quarter and year.

  • In PostgreSQL, you can use EXTRACT(QUARTER FROM whn) and EXTRACT(YEAR FROM whn).

  • In Access, you can use DatePart("q", whn) and YEAR(whn).

Second Tuesday of the Month

You can find “floating” calendar dates, such as the second Tuesday of the month, with modular arithmetic and brute force reasoning.

The formula to calculate the second Tuesday of the month depends on the day of the week of the first day of the month. But which month? Obviously, all you need to know are the year and month, but it’s easier if you start with a date: the date of the first day of that month. For testing purposes, use the following table:

CREATE TABLE monthdates(monthdate DATE NOT NULL PRIMARY KEY);
INSERT INTO monthdates(monthdate) VALUES (DATE '2007-04-01');
INSERT INTO monthdates(monthdate) VALUES (DATE '2007-05-01');
INSERT INTO monthdates(monthdate) VALUES (DATE '2007-06-01');

So, given a date, the first day of some month, what date is the second Tuesday of that month? The process for obtaining the solution begins by calculating the day of the week for the first day of the month.

Day-of-Week Function

Although standard SQL does not provide a function to give the day of the week for any date, most database systems do. Table 4-5 shows some of the functions that can accomplish this.

Table 4-5. Day-of-the-week functions
DatabaseFunction
MySQL DAYOFWEEK(date)
Oracle TO_CHAR(date,'D')
SQL Server DATEPART(DW,date)
PostgreSQL EXTRACT(dow FROM date)
Access DatePart("w", date)

The essence of these functions is that they will return a number between 0 and 6, or between 1 and 7. Sometimes 0 (or 1) is Sunday and 6 (or 7) is Saturday, and sometimes 0 (or 1) is Monday and 6 (or 7) is Sunday.

Warning

Make sure you know how your database system is set up, because some database systems have local settings that affect the weekday number returned, such as NLS_TERRITORY in Oracle and DATEFIRST in SQL Server. MySQL offers WEEKDAY(date), which returns 1 (Monday) through 7 (Sunday), as well as DAYOFWEEK(date), which returns 1 (Sunday) through 7 (Saturday).

The following formula uses the range 1 (Sunday) through 7 (Saturday). If your database system has no easy way to produce this range, but you can produce some other similar range, then you can alter the formula easily once you understand how it works.

The Formula

Converting the first day of the month into the second Tuesday of the month simply involves manipulating the day of the week of the first day with an arithmetic formula. Before you see the formula, you should review what happens when the first day of the month falls on each day of the week, from Sunday through Saturday.

If the first day of the month is:

  1. A Sunday, the third is a Tuesday, so the tenth is the second Tuesday.

  2. A Monday, the second is a Tuesday, so the ninth is the second Tuesday.

  3. A Tuesday, the eighth is the second Tuesday.

  4. A Wednesday, the seventh is the next Tuesday, so the fourteenth is the second Tuesday.

  5. A Thursday, the sixth is the next Tuesday, so the thirteenth is the second Tuesday.

  6. A Friday, the fifth is the next Tuesday, so the twelfth is the second Tuesday.

  7. A Saturday, the fourth is the next Tuesday, so the eleventh is the second Tuesday.

This exhausts all possibilities. So the challenge now is simply to reduce these facts into a formula. With the aid of an underappreciated technological methodology called brute force, you can verify the correctness of the following manipulation of the day of the week of the first day of the month, as shown in Table 4-6.

Table 4-6. Demonstration calculation
A1stBwkdayC10–BDC mod 7ED+7
sun1929
mon2818
tue3707
wed46613
thu55512
fri64411
sat73310

The first column (A) is the day of the week of the first day of the month, and the second column is the numerical equivalent of this, using the range 1 (Sunday) through 7 (Saturday).

The important data in Table 4-6 is in the last column, which is the number of days to add to the date of the first day of the month.

So in a nutshell, the formula is:

  1. Find B, the day of the week of the first day of the month, using:

  2. 1=Sunday ... 7=Saturday.
  3. Subtract this number from 10 to get C:

  4. With Sunday=1 ... Saturday=7, Tuesday would be 3.
  5. The number 3 – B is the offset (relative to the first of the month) for a Tuesday, but it might be in the current month or the previous month; 10 – B is also a Tuesday, and so are 17 – B and 24 – B.

  6. You should choose to subtract from 10 because you want C to be positive for all inputs. This is because you need D to be positive in the next step, but a negative value for C would result in a negative value for D. This is because –1 % 7 gives –1 on most systems.

  7. Divide by 7 and keep the remainder to get D.

  8. D is also the offset for a Tuesday, and D is in the range 0 to 6. Every day in the first week has an offset between 0 and 6. So D is the first Tuesday of the month.

  9. Add 7 to get E.

  10. That takes the range of E from 7 to 13. Every day in the second week has an offset in the range 7–13.

  11. Take the result and add that number of days to the date of the first day of the month.

In practical terms, to implement this formula you will need to use the specific date and arithmetic functions of your database system. Here are some examples.

MySQL

In MySQL:

SELECT monthdate    AS first_day_of_month
     , DATE_ADD(monthdate 
              , INTERVAL
                ( ( 10 - DAYOFWEEK(monthdate) ) % 7 ) + 7
                DAY
              )     AS second_tuesday_of_month
  FROM monthdates

Oracle

In Oracle:

SELECT monthdate    AS first_day_of_month  
     , monthdate
          + MOD( ( 10 – TO_CHAR(monthdate,'d') ), 7 ) + 7 
                    AS second_tuesday_of_month
  FROM monthdates

SQL Server

With SQL Server:

SELECT monthdate    AS first_day_of_month
     , DATEADD(day
              , ( ( 10 - DATEPART(dw,monthdate) ) % 7 ) + 7
              , monthdate 
              )     AS second_tuesday_of_month
  FROM monthdates

PostgreSQL

PostgreSQL gives 0 for Sunday, so you must add 1. Also, the output from EXTRACT is a floating-point number, so you must CAST it before you attempt modular arithmetic:

SELECT monthdate    AS first_day_of_month
     , monthdate +
          ((10 - CAST(EXTRACT(dow FROM monthdate) + 1 AS INT)) % 7) + 7
                   AS second_tuesday_of_month
  FROM monthdates

Here are the results:

first_day_of_month  second_tuesday_of_month
   2007-04-01          2007-04-10
   2007-05-01          2007-05-08
   2007-06-01          2007-06-12
                  
                  
                  

Hacking the Hack: The Last Thursday of the Month

You can use a similar technique to calculate the last Thursday of the month. Just find the first Thursday of next month and subtract seven days.

The formula for the offset for the first Thursday of the month beginning with monthdate is:

(12-DAYOFWEEK(monthdate) ) % 7

Subtract from 12 because Thursday is represented by 5 and 5 + 7 = 12.

The query to get the first day of next month is:

mysql> SELECT monthdate AS first_day_of_month
    ->       ,DATE_ADD(monthdate,INTERVAL 1 MONTH)
    ->                  AS first_day_of_next_month
    ->   FROM monthdates;
+--------------------+-------------------------+
| first_day_of_month | first_day_of_next_month |
+--------------------+-------------------------+
| 2007-04-01         | 2007-05-01              |
| 2007-05-01         | 2007-06-01              |
| 2007-06-01         | 2007-07-01              |
+--------------------+-------------------------+

You can use this result to find the first Thursday of next month and subtract 7 to get the last Thursday of this month:

mysql> SELECT first_day_of_month
    ->       ,DATE_ADD(first_day_of_next_month
    ->                ,INTERVAL
    ->                  ((12 
               - DAYOFWEEK(first_day_of_next_month)) % 7)
    ->                       - 7 DAY) AS last_thursday_of_month
    ->   FROM
    ->        (SELECT monthdate AS first_day_of_month
    ->               ,DATE_ADD(monthdate,INTERVAL 1 MONTH)
    ->                          AS first_day_of_next_month
    ->           FROM monthdates) t;
+--------------------+------------------------+
| first_day_of_month | last_thursday_of_month |
+--------------------+------------------------+
| 2007-04-01         | 2007-04-26             |
| 2007-05-01         | 2007-05-31             |
| 2007-06-01         | 2007-06-28             |
+--------------------+------------------------+

Get SQL Hacks now with the O’Reilly learning platform.

O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.