Greetings,
I have a table of ID's and activities like the following and I need to pivot the data as shown below into a View
ID Activity
1 Swimming
2 Skating
1 Gymnastics
3 Swimming
The Desired state is as follows:
ID Swimming Skating Gymnastics
1 1 0 0
2 0 1 0
3 1 0 0
There are 150,000 entries in the table with 40 different activities. I have tried using pivot and can get an aggregated value of ID's by activity but that is not what I am looking for. I could create a temp table and update from the base table but that is not a very elegant solution. Can Pivot be used for this purpose or should I look at other options? Thanks
November 21, 2023 at 9:52 pm
Probably a bit messy, but what about using a CASE statement? Something along the lines of:
SELECT ID
, CASE WHEN EXISTS (SELECT 1
FROM <table> AS inner
WHERE outer.ID = inner.ID AND Activity = 'Swimming')
THEN 1
ELSE 0
...
FROM <table> AS outer
Will result in 40 different CASE statements, but avoids the PIVOT. Mind you, I think that PIVOT should work as long as you don't have duplicate rows. I am not certain what issue you had with aggregated ID, but if the problem was due to duplicate rows, you could use a subquery and put a distinct on the query so you don't have any duplicate rows before you do the pivot.
I am also a bit confused about the "expected" results as I see that ID 1 has both Swimming and Gymnastics, but your expected results ONLY lists swimming. Just curious how you determine if swimming OR gymnastics should be marked? If it is based on the previous rows, how do you determine what is a "previous" row? Are you capturing any datetime information on the data? If it is a mistake in the expected results, then could you elaborate on why an aggregate on ID is problematic? To me that feels like the correct use, but if not, then I can think of alternate options, but it depends on what your results should look like...
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
November 21, 2023 at 9:53 pm
Yes, it can be done with PIVOT. You'll need to create an artificial column to aggregate on. AND you should investigate cross tabs.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
November 21, 2023 at 9:55 pm
@drew.Allen - is there a reason you need an artificial column to aggregate on? Couldn't you use a COUNT on the ID, assuming there are no duplicate rows and if there are duplicate rows, couldn't you DISTINCT the results first and then COUNT on the ID?
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
November 21, 2023 at 10:19 pm
I think it's more likely that he's misstating what he wants, because he doesn't have enough knowledge, rather accurately stating that the solution should not use an aggregate. Assuming that aggregates are allowed produces a MUCH simpler and more efficient query.
SELECT ID
, MAX(CASE WHEN Activity = 'Swimming')
THEN 1
ELSE 0
END) AS Swimming
...
FROM <table>
GROUP BY ID
Even in the unlikely event that he doesn't want an aggregate, it's probably going to be more efficient to use windowed functions rather than subqueries.
SELECT ID
, MAX(CASE WHEN Activity = 'Swimming')
THEN 1
ELSE 0
END) OVER(PARTITION BY ID) AS Swimming
...
FROM <table>
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
November 21, 2023 at 10:24 pm
@Drew.Allen - is there a reason you need an artificial column to aggregate on? Couldn't you use a COUNT on the ID, assuming there are no duplicate rows and if there are duplicate rows, couldn't you DISTINCT the results first and then COUNT on the ID?
The definition for PIVOT includes the following
<aggregation function>(<column being aggregated>)
Note that it says COLUMN being aggregated. It does not say EXPRESSION being aggregated. Part of reason for using a cross tab rather than a pivot, is that you have much more flexibility in the syntax.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
November 21, 2023 at 10:58 pm
Thanks for the feedback everyone, you were correct about the expected results, there should be a 1 for both swimming and gymnastics for id 1. I'll explore these suggestions and get back to you. At the end of the day, I want a list of distinct id's in Column 1 and a column for every activity with a 0 or 1 value based on whether a row with that activity exists for that ID.
November 21, 2023 at 11:32 pm
Here's an example with an artificial column. I did try COUNT(ID), but it said invalid column.
DROP TABLE IF EXISTS dbo.Activities;
CREATE TABLE dbo.Activities
( ID INT,
Activity VARCHAR(100),
);
INSERT dbo.Activities (ID, Activity)
VALUES (1,'Swimming'), (2,'Skating'),(1, 'Gymnastics'), (3, 'Swimming');
WITH Act AS
( SELECT ID, Activity,
CAST(1 AS INT) AS CountFlag
FROM dbo.Activities
)
SELECT ID, ISNULL([Swimming],0) AS 'Swimming', ISNULL([Skating],0) AS 'Skating', ISNULL([Gymnastics],0) AS 'Gymnastics'
FROM
(
SELECT ID, Activity , CountFlag
FROM Act
) AS x
PIVOT
(
MAX(CountFlag)
FOR Activity IN ([Swimming], [Skating], [Gymnastics])
) AS x;
November 21, 2023 at 11:54 pm
why doesn't pivot work for you?
2 ways of doing it with pivot below
drop table if exists #data
select *
into #data
from (values (1, 'Swimming')
, (2, 'Skating')
, (1, 'Gymnastics')
, (3, 'Swimming')
) t(ID, Activity)
select piv.ID
, coalesce(piv.Swimming, 0) as Swimming
, coalesce(piv.Skating, 0) as Skating
, coalesce(piv.Gymnastics, 0) as Gymnastics
from (select *
from #data
cross apply (select 1 as Cnt) t2
) t
PIVOT (max(t.Cnt) for activity IN ([Swimming], [Skating], [Gymnastics])
) as piv
order by piv.id
/*
or if list of activities not known use dynamic sql
*/
declare @sql nvarchar(max)
declare @basesql nvarchar(max)
declare @pivselect nvarchar(max)
declare @pivcolumns nvarchar(max)
set @basesql =
'select piv.ID
--pivselect
from (select *
from #data
cross apply (select 1 as Cnt) t2
) t
PIVOT (max(t.Cnt) for activity IN
(--pivcolumns
)
) as piv
order by piv.id
'
select @pivcolumns = STRING_AGG(quotename(t.activity), ',')
, @pivselect = STRING_AGG(', coalesce(piv.' + quotename(t.activity) + ', 0) as ' + QUOTENAME(t.activity), '')
within GROUP (ORDER by t.activity)
from (select distinct activity
from #data
) t
select @pivcolumns, @pivselect
set @sql = replace(replace(@basesql, '--pivselect', @pivselect), '--pivcolumns', @pivcolumns)
exec sys.sp_executesql @sql
Just a suggestion... Keep It Super Simple and flexible at the same time.
Here's the test table (not part of the solution):
DROP TABLE IF EXISTS #TestTable;
GO
SELECT *
INTO #TestTable
FROM (VALUES
(1,'Swimming')
,(2,'Skating')
,(1,'Gymnastics')
,(3,'Swimming')
)v(ID, Activity)
;
This is a simple CROSSTAB. It allows for simple totals for rows and columns without a lot of additional fanfare. It's usually faster than PIVOT and also lends itself very well to dynamic SQL (not shown here):
--===== Simple CROSSTABLE method includes row and column totals.
SELECT ID = IIF(GROUPING_ID(ID)=0,CONVERT(VARCHAR(10),ID),'Total')
,Swimming = SUM(IIF(Activity='Swimming' ,1,0))
,Skating = SUM(IIF(Activity='Skating' ,1,0))
,Gymnastics = SUM(IIF(Activity='Gymnastics' ,1,0))
,Total = COUNT(*) --This does the row totals
FROM #TestTable
GROUP BY ID WITH ROLLUP --This does the column totals
ORDER BY GROUPING_ID(ID), ID
;
And here are the results... Notice in the code above that we didn't need even 1 ISNULL() or COALESCE() to get the missing zeros to display:
Here are some references on this old "Black Arts" method of pivoting:
This is how it works and a few reasons why it's better than Pivot... (Note... really old article used CASE instead of IIF)
https://www.sqlservercentral.com/contributions
This is how to create the code dynamically:
https://www.sqlservercentral.com/articles/cross-tabs-and-pivots-part-2-dynamic-cross-tabs
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply