reporting - Gary Sieling

Postgres doesn’t have a built-in pivot table feature, but they are relatively easy to construct.

Let’s say you have a table of users, and a table of events, and want to make a pivot table that shows how often each event occurred.

create table users (name varchar);

insert into users ('gary');
insert into users ('melissa');
insert into users ('ella');
insert into users ('chloe');

create table events (user_name varchar, event varchar);
insert 
into events (user_name, event) 
values ('gary', 'event_1');

insert 
into events (user_name, event) 
values ('gary', 'event_1');

insert 
into events (user_name, event) 
values ('gary', 'event_2');

insert 
into events (user_name, event) 
values ('melissa', 'event_2');

insert 
into events (user_name, event) 
values ('ella', 'event_1');

The pivot table output should look like this:

user     event_1  event 2
gary     2        1
melissa  0        2
ella     1        0
ziti     0        0

You can get counts easily enough, but the “0” records will drop out, which is a bit of a pain:

select 
  user_name, 
  event, 
  count(*) event_count
from events
group by 1, 2

What we really want is a list of all combinations of users and events, combined with their counts. Normally you don’t do things like this, but this can be obtained with a cartesian join.

You’ll want to make sure to itemize the names (don’t use select *) – this will help with the next step.

select users.name, event_list.event
from users
join ( 
  select distinct event 
  from events
) as event_list on 1=1

The above query gives us the following data:

gary	event_2
gary	event_1
melissa	event_2
melissa	event_1
ella	event_2
ella	event_1
chloe	event_2
chloe	event_1

If you have a static list of events, you can also join to it directly:

select users.name, event_list.event
from users
join ( 
  select unnest(array['event_1', 'event_2']) event
) as event_list on 1=1

Now that we have this, we can join it back to the count data:

select 
  users.name, 
  event_list.event, 
  coalesce(event_count, 0) event_count
from users
join (
   select distinct event 
   from events
  ) as event_list on 1=1
left join (
  select 
    user_name, 
    event, 
    count(*) event_count
  from events
  group by 1, 2
) counts on counts.user_name = users.name
        and counts.event = event_list.event

This new dataset is much better, as it includes every possible cell that we would want in the pivot table:

name    event     event_count
gary	event_1	  2
gary	event_2	  1
melissa	event_1	  0
melissa	event_2	  1
ella	event_1	  1
ella	event_2	  0
chloe	event_1	  0
chloe	event_2	  0

To get these all into one row, we can wrap this in array aggregation. Note that the values in the arrays need to be sorted by the event type, or else each record will get counts in a different order.

select 
  name, 
  array_agg(event order by event) events, 
  array_agg(event_count order by event) counts
from (
  ...
) aggregated_date
group by name

This returns:

name    events                  counts
chloe	{"event 1","event 2"}	{0,0}
ella	{"event 1","event 2"}	{1,0}
gary	{"event 1","event 2"}	{2,1}
melissa	{"event 1","event 2"}	{0,1}

Now getting pivot table data is easy through array indexing. Note that Postgres arrays start at 1 (!), and obviously this solution won’t help you if you have an indeterminite number of events.

select 
  name, 
  counts[1] as event_1, 
  counts[2] as event_2
from (
  ...
) pivot_data

name    event_1   event_2
chloe	0 	  0
ella	1	  0
gary	2	  1
melissa	0	  1

If you’re following along at home, here is the final result:

 -- Create pivot table
select
  name, 
  counts[1] as event_1, 
  counts[2] as event_2
from (
  -- Combine all the counts into one row per user
  select
    name, 
    array_agg(event order by event) events, 
    array_agg(event_count order by event) counts
  from (
    -- Create counts of "0" for events that haven't occurred
    select 
      users.name, 
      event_list.event, 
      coalesce(event_count, 0) event_count
    from users
    join (
       -- Compute the full list of events
       select distinct event 
       from events
    ) as event_list on 1 = 1
    left join (
      -- Compute how often each event happened per user
      select 
        user_name, 
        event, 
        count(*) as event_count
      from events
      group by 1, 2
    ) counts
       on counts.user_name = users.name
       and counts.event = event_list.event
  ) aggregated_date
  group by name
) pivot_data
order by name

Tag: reporting

Postgres Pivot Tables