Fun with generate_series

A few months ago I attended the PostgreSQL conference in Portland, OR. There were a lot of talks ranging from hard-core stuff like Neil Conway‘s talk about the internals of query execution, to random fun stuff like David Fetter‘s discussion of procedural languages, including LOLCODE.

gulakuno.jpg

During their talks, a few people mentioned a handy function called generate_series. It took me a while to discover how useful this function really is. I thought I’d post an example. Here goes…

Let’s say that you have a table with sales information:

postgres=# select * from sales order by date;
    date    | sales_person | part_number
------------+--------------+-------------
 2008-05-05 | Glenn        |           1
 2008-05-05 | Shahaf       |           1
 2008-05-06 | Mike         |           1
 2008-05-06 | Mike         |           2
 2008-05-08 | Glenn        |           1
 2008-05-08 | Shahaf       |           1
 2008-05-08 | Mike         |           2
 2008-05-09 | Mike         |           1
 2008-05-09 | Glenn        |           1
(9 rows)

You might want to get an idea of how many sales happened on each day. You could try to do it with a query like this:

postgres=# select date, count(*) from sales group by date order by date;
    date    | count
------------+-------
 2008-05-05 |     2
 2008-05-06 |     2
 2008-05-08 |     3
 2008-05-09 |     2
(4 rows)

This basically works, but it hides one important fact — no sales happened on May 7.

To fix this, we can use generate_series. When you run this function normally, it just generates a series of numbers:

postgres=# select generate_series(0,4);
generate_series
-----------------
0
1
2
3
4
(5 rows)

However, you can easily change it to generate a series of dates:

postgres=# select generate_series(0,4) + date '2008-05-05' as date;
date
------------
2008-05-05
2008-05-06
2008-05-07
2008-05-08
2008-05-09
(5 rows)

Once you have this, you can now join against the sales table to generate the report:

postgres=# select series.date, count(sales.date)
from (select generate_series(0,4) + date '2008-05-05' as date) as series
left outer join sales on series.date=sales.date group by series.date
order by series.date;
    date    | count
------------+-------
 2008-05-05 |     2
 2008-05-06 |     2
 2008-05-07 |     0
 2008-05-08 |     3
 2008-05-09 |     2
(5 rows)

The trick here is to do a left-outer join betwee the date sequence and the sales, and to count the rows that have a non-null sales date.

This is by no means the only use of generate_series, it’s just the most recent use I found. If you know of other ways to use this function, or if you know of other handy functions, drop a note below.

Image credits: Electric Vehicle Guide

Discussion

  • Jason

    Have you guys thought of using OLAP? The entire purpose of an OLAP cube is to rapidly analyze data just as you are demostrating here.

    I have worked with Oracle, Hyperion, and Microsoft OLAP and nearly every data intensive company is moving in this direction because of the performance and flexibility of working with cubes.

  • John R

    Golly! I was looking for a function exactly like generate_series and couldn’t find it… and now I notice that it’s in the official PostgreSQL documentation! ARRGH! Thanks for your tip.

  • Aaron

    Thanks! This is just what I’m looking for

  • http://www.blah.com Sean

    Anybody that finds this may also be interested that since version 8.4 you can also call generate_series with date/timestamps and an interval to generate the series.

  • http://www.discount-nike-dunk-shoes.com nike dunk shoes

    Hhe article's content rich variety which make us move for our mood after reading this article. surprise, here you will find what you want! Recently, I found some wedsites which commodity is colorful of fashion.
    http://www.scarf8.net

  • Pingback: Como criar listas sequenciais no PostgreSQL com a função generate_series « tiagopassos.com