Postgres doesn’t have a built-in pivot table feature, but they are relatively easy to construct.
Let’s say you have a table of users, and a table of events, and want to make a pivot table that shows how often each event occurred.
create table users (name varchar);
insert into users ('gary');
insert into users ('melissa');
insert into users ('ella');
insert into users ('chloe');
create table events (user_name varchar, event varchar);
insert
into events (user_name, event)
values ('gary', 'event_1');
insert
into events (user_name, event)
values ('gary', 'event_1');
insert
into events (user_name, event)
values ('gary', 'event_2');
insert
into events (user_name, event)
values ('melissa', 'event_2');
insert
into events (user_name, event)
values ('ella', 'event_1');
The pivot table output should look like this:
user event_1 event 2 gary 2 1 melissa 0 2 ella 1 0 ziti 0 0
You can get counts easily enough, but the “0” records will drop out, which is a bit of a pain:
select
user_name,
event,
count(*) event_count
from events
group by 1, 2
What we really want is a list of all combinations of users and events, combined with their counts. Normally you don’t do things like this, but this can be obtained with a cartesian join.
You’ll want to make sure to itemize the names (don’t use select *) – this will help with the next step.
select users.name, event_list.event
from users
join (
select distinct event
from events
) as event_list on 1=1
The above query gives us the following data:
gary event_2 gary event_1 melissa event_2 melissa event_1 ella event_2 ella event_1 chloe event_2 chloe event_1
If you have a static list of events, you can also join to it directly:
select users.name, event_list.event
from users
join (
select unnest(array['event_1', 'event_2']) event
) as event_list on 1=1
Now that we have this, we can join it back to the count data:
select
users.name,
event_list.event,
coalesce(event_count, 0) event_count
from users
join (
select distinct event
from events
) as event_list on 1=1
left join (
select
user_name,
event,
count(*) event_count
from events
group by 1, 2
) counts on counts.user_name = users.name
and counts.event = event_list.event
This new dataset is much better, as it includes every possible cell that we would want in the pivot table:
name event event_count gary event_1 2 gary event_2 1 melissa event_1 0 melissa event_2 1 ella event_1 1 ella event_2 0 chloe event_1 0 chloe event_2 0
To get these all into one row, we can wrap this in array aggregation. Note that the values in the arrays need to be sorted by the event type, or else each record will get counts in a different order.
select
name,
array_agg(event order by event) events,
array_agg(event_count order by event) counts
from (
...
) aggregated_date
group by name
This returns:
name events counts chloe {"event 1","event 2"} {0,0} ella {"event 1","event 2"} {1,0} gary {"event 1","event 2"} {2,1} melissa {"event 1","event 2"} {0,1}
Now getting pivot table data is easy through array indexing. Note that Postgres arrays start at 1 (!), and obviously this solution won’t help you if you have an indeterminite number of events.
select
name,
counts[1] as event_1,
counts[2] as event_2
from (
...
) pivot_data
name event_1 event_2 chloe 0 0 ella 1 0 gary 2 1 melissa 0 1
If you’re following along at home, here is the final result:
-- Create pivot table
select
name,
counts[1] as event_1,
counts[2] as event_2
from (
-- Combine all the counts into one row per user
select
name,
array_agg(event order by event) events,
array_agg(event_count order by event) counts
from (
-- Create counts of "0" for events that haven't occurred
select
users.name,
event_list.event,
coalesce(event_count, 0) event_count
from users
join (
-- Compute the full list of events
select distinct event
from events
) as event_list on 1 = 1
left join (
-- Compute how often each event happened per user
select
user_name,
event,
count(*) as event_count
from events
group by 1, 2
) counts
on counts.user_name = users.name
and counts.event = event_list.event
) aggregated_date
group by name
) pivot_data
order by name
Since 9.1 version you can use tablefunc extension http://www.postgresql.org/docs/9.1/static/tablefunc.html
CREATE EXTENSION tablefunc;
SELECT * from crosstab(‘ select distinct users.name,events.event,count(event) OVER (PARTITION BY event,name) from users left join events on name=user_name order by 1,2 ‘ ) AS ct(user_name varchar, event_1 int8, event_2 int8);
user_name | event_1 | event_2
———–+———+———
chloe | 0 |
ella | 1 |
gary | 2 | 1
melissa | 1 |
ziti | 0 |
(5 rows)
I did see that, but I like this method because you an option in a production environment that doesn’t require changing anything.
How about the canonical way:
SELECT
name,
sum(case when event=’event_1′ then 1 else 0 end) as event_1,
sum(case when event=’event_2′ then 1 else 0 end) as event_2
FROM users LEFT JOIN events on (name=user_name)
GROUP BY 1
@Daniel I’ve used this approach with success. Note that in this case the number of columns is known (ie. event_1, event_2) where the article’s approach will handle an unknown number (ie. event_1 .. event_n).
Also, if using your approach note that since 9.4 you can use `COUNT(*) FILTER (WHERE event=’event_1′) AS event_1` which has given me a better plan and performance in most cases.
You can do:
SELECT name,
count(1) FILTER (WHERE event=’event_1′) AS event_1,
count(1) FILTER (WHERE event=’event_2′) AS event_2,
FROM users LEFT JOIN events ON name = user_name
GROUP BY 1
@Ariel – that’s neat, I like that technique as well
Check out 9.5’s GROUPING SETs, ROLLUP and CUBE which make this even easier
https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#GROUPING_SETS.2C_CUBE_and_ROLLUP
Very good,
In case i need sum / add?
for example:
user | month_1 | month 2
—————————————
gary | 2.00 | 1.00
melissa | 0 | 2.00
ella | 1.50 | 0
ziti | 2.50 | 0
Thanks for shared.