June 22, 2009 at 10:21 am
Hi
I'm using SQL Server 2005 and have a table with telephone nr in. When I link it back to main table with the customer details I get duplicate values for the same client seeing that the client can have more than 1 telephone nr.
The table is as follows and I would like to transpose just the first 3 telephone numbers for each client.
table :
Client NrOccuranceTelephone
A110115551234
A120115554321
A130115559999
A210115558888
A310115550000
A320115551111
Becomes
Client NrTelephone 1Telephone 2Telephone 3
A1011555123401155543210115559999
A20115558888NULLNULL
A301155500000115551111NULL
I see that the Pivot uses a aggregation function everytime so I assume that it will not work in this case?
Any suggestion are most welcome
Thx
June 22, 2009 at 10:40 am
Thanks I found this on this site, works perfectly
SELECT
ACCT_DEBTOR,
max(LandLineNumber1) as LandLineNumber1,
max(LandLineNumber2) as LandLineNumber2,
max(LandLineNumber3) as LandLineNumber3
FROM (
SELECT
ACCT_DEBTOR,
OCCURRENCE,
(Case WHEN OCCURRENCE=1 THEN LANDLINE_CONTACT_NO ELSE null END) AS LandLineNumber1,
(Case WHEN OCCURRENCE=2 THEN LANDLINE_CONTACT_NO ELSE null END) AS LandLineNumber2,
(Case WHEN OCCURRENCE=3 THEN LANDLINE_CONTACT_NO ELSE null END) AS LandLineNumber3
FROM Post_File082_Landline_No
) AS LandlineNumbers
June 22, 2009 at 2:16 pm
You don't need the subselect:
SELECT
ACCT_DEBTOR,
MAX(Case WHEN OCCURRENCE=1 THEN LANDLINE_CONTACT_NO ELSE null END) AS LandLineNumber1,
MAX(Case WHEN OCCURRENCE=2 THEN LANDLINE_CONTACT_NO ELSE null END) AS LandLineNumber2,
MAX(Case WHEN OCCURRENCE=3 THEN LANDLINE_CONTACT_NO ELSE null END) AS LandLineNumber3
FROM Post_File082_Landline_No
GROUP BY ACCT_DEBTOR
June 23, 2009 at 1:40 am
Tested it without the sub select, and works perfectly. Thanks!
June 23, 2009 at 6:49 am
Also you can achieve the same using PIVOT
e.g.
SELECT ACCT_DEBTOR,
[1] AS [LandLineNumber1],
[2] AS [LandLineNumber2],
[3] AS [LandLineNumber3]
FROM (SELECT ACCT_DEBTOR,OCCURRENCE,LANDLINE_CONTACT_NO FROM Post_File082_Landline_No) a
PIVOT(MAX(LANDLINE_CONTACT_NO) FOR OCCURRENCE IN ([1],[2],[3])) p
Far away is close at hand in the images of elsewhere.
Anon.
June 23, 2009 at 8:13 am
HI, yes, my original post was to get the value without using a aggregation, but I see that the solution which i'm using includes it in any case! I didnt think it thru, seeing that the pivot would work in this case as well.
I'll test the pivot as well, seeing that I could create an index on the view which I'm using the query in now, and to return any data takes about 90 seconds or so. I'll test the pivot and give some feedback on the performance as well, but I dont want to spend to much effort on the testing of the data as this is only temp data.
Many thanks
June 23, 2009 at 8:28 am
Pivot table solution on 9.5 mil records returned : 134 seconds
and with first solution 132 seconds.
June 23, 2009 at 10:00 am
Thank you for posting the results!
Since they are almost equal I'm wondering what the execution plans would look like...
Would you mind posting the actual execution plan for both options (saved as .sqlplan and attached as zip file) or check it yourself whether they are any different?
Maybe one of the performance gurus comes across and might find some additional tuning options...
June 24, 2009 at 1:16 am
Attached is the SQL Plan. I could never really understand the plans but hopefully someone will.
the only thing which I can add is that the group by query ranges from 116 seconds to 242 seconds depending on the indexes used on it, but the pivot with an index on OCCURRENCE only completed in 704 seconds but for the rest between 131 seconds and 208.
June 24, 2009 at 1:36 pm
I'm by far no expert on execution plans but to me it looks like there is a missing index on column ACCT_DEBTOR that causes the major time of the non-pivot query for sorting. So I would use an index on the two columns ACCT_DEBTOR, OCCURRENCE with column LANDLINE_CONTACT_NO included. I would add rsp. include the other two columns to avoid a bookmark lookup that may drive the query analyzer not to use this index at all.
If my interpretation is correct than SQL Server would pick the index, having all the data already in the right order eliminating the sort. The index should also help the pivot.
Running over 2 minutes for those 12mill rows seems a little bit slow (depending on the hardware, of course).
If you have a test system, you could verify how the index will affect the query and your overall performance (including disc space since the index will probably need more than 100Mb...).
June 24, 2009 at 3:48 pm
An actual execution plan would be much more useful than estimated plans (we could see the run-time distribution of data across the threads for example).
In the posted plans, the PIVOT plan uses a hash aggregate whereas the non-PIVOT chooses a sort followed by a stream aggregate. This may be optimal, but it would be interesting to add OPTION (HASH GROUP) to the non-PIVOT query, to see a closer apples-to-apples comparison.
I guess we should also check that DBCC DROPCLEANBUFFERS was run before each test to ensure a level playing field (cold cache).
It's all a bit academic without appropriate indexes, or statistics at least.
Interesting though.
Paul
June 24, 2009 at 4:16 pm
Paul White (6/24/2009)
An actual execution plan would be much more useful than estimated plans (we could see the runtime distribution of data across the threads for example)....
It's all a bit academic without appropriate indexes, or statistics at least.
Interesting though.
Paul
Thanx for looking into it!
What indexes would you consider as "appropriate"?
As far as I can see it doesn't really make sense to compare the two performance wise as long as both aren't "tuned".
The next steps I was thinking of getting towards comparable results was:
1) getting the index issue fixed
2) getting actual execution plans using indexes defined above (for both, non-pivot and pivot)
3) If execution plans do look ok start comparing apples and apples and see where we end at.
What I've seen when looking at the exec plans was that there seems to be room for significant performance improvement for the non-pivot solution whereas I couldn't really see a way to get the pivot faster.
Making long story short:
This thread made me wondering about the benefit of using PIVOT at all (knowing that I've recommended it quite often in the past though... :unsure:)
June 24, 2009 at 7:25 pm
lmu92 (6/24/2009)
What indexes would you consider as "appropriate"?
A nonclustered index on ACCT_DEBTOR with OCCURRENCE and LANDLINE_CONTACT_NO as included columns, I think.
(I would also be naturally inclined to create a clustered index on something (and then drop it again if it isn't needed), just to get rid of any nastiness like forwarded records or fragmentation. Precise indexing choices depend on information we don't have.)
lmu92 (6/24/2009)
As far as I can see it doesn't really make sense to compare the two performance wise as long as both aren't "tuned".The next steps I was thinking of getting towards comparable results was:
1) getting the index issue fixed
2) getting actual execution plans using indexes defined above (for both, non-pivot and pivot)
3) If execution plans do look ok start comparing apples and apples and see where we end at.
Sounds great.
lmu92 (6/24/2009)
What I've seen when looking at the exec plans was that there seems to be room for significant performance improvement for the non-pivot solution whereas I couldn't really see a way to get the pivot faster.
The PIVOT will benefit from the nice new index. The QO may choose a Stream Aggregate with that index, but I would test both grouping options - unless sorted results are required the hash group normally performs better on large result sets.
lmu92 (6/24/2009)
This thread made me wondering about the benefit of using PIVOT at all (knowing that I've recommended it quite often in the past though... :unsure:)
Apparently some people find it more readable (!)
I'm not one of those people - and I rarely use PIVOT for probably the same reasons you have in mind.
Paul
June 25, 2009 at 7:16 am
Hi guys, thought I would paste some of the performance figures.
I used the following statements each time I ran 1 of the test
CHECKPOINT;
DBCC DROPCLEANBUFFERS;
CREATE NONCLUSTERED INDEX Test1 ON Post_File082_Landline_No .......
RUN TEST
DROP INDEX
Sorry about the attachment JPG, but could not get the table correctly pasted in here
June 25, 2009 at 2:54 pm
Cool stuff Chris. Of course it leaves me wondered about the hash aggregate versus stream aggregate thing but hey.
So PIVOT can be slightly more efficient - if you can be bothered to fight the syntax, and if the query you need suits it.
😎
Viewing 15 posts - 1 through 15 (of 33 total)
You must be logged in to reply to this topic. Login to reply