August 20, 2015 at 3:05 pm
Hi all,
Can anyone please help with grouping islands of data? I am revisiting an issue we wound up using a recursive CTE to solve previously because we weren't sure how to do it any other way. But now it's just way too slow. I had to cancel it after 30 minutes when trying to run a day's worth of data (~200k records).
Here's some sample data:
CREATE TABLE #t (ID int, filenm int, docnum int, pg char(1));
GO
INSERT INTO #t
SELECT 140675885,072915107,1580,'D' UNION ALL
SELECT 140675886,072915107,11,'D' UNION ALL
SELECT 140675887,072915107,1580,'D' UNION ALL
SELECT 140675888,072915107,109,'D' UNION ALL
SELECT 140675889,072915107,1511,'D' UNION ALL
SELECT 140675890,072915107,221,'D' UNION ALL
SELECT 140675891,072915107,221,'F' UNION ALL
SELECT 140675892,072915107,46,'D' UNION ALL
SELECT 140675893,072915107,20,'D' UNION ALL
SELECT 140675894,072915107,20,'F' UNION ALL
SELECT 140675895,072915107,20,'F' UNION ALL
SELECT 140675896,072915107,20,'F' UNION ALL
SELECT 140675897,072915107,20,'F' UNION ALL
SELECT 140675898,072915107,20,'F' UNION ALL
SELECT 140675899,072915107,14,'D' UNION ALL
SELECT 140675900,072915107,14,'F' UNION ALL
SELECT 140675901,072915107,14,'F' UNION ALL
SELECT 140675887,072915205,1580,'D' UNION ALL
SELECT 140675887,072915205,1580,'F' UNION ALL
SELECT 140675887,072915205,1580,'D'
;
Essentially a 'pg' of D is the start of a new group or island, and any Fs following that D for the given filenm and docnum in ID order are part of that group.
I have tried Row_number and Dense_rank. Dense_rank gets sort of close, but the fact that an island can include both D and F records really has me stumped. I have a feeling this should be easy, but I am just not seeing it.
Attached is an example of how the sample data should be grouped.
Let me know if there's anything I can clarify or provide additional information about and thank you in advance!
August 20, 2015 at 3:58 pm
Are there supposed to be duplicates in the data? It seems that this row is repeated:
SELECT 140675887, 072915205, 1580, 'D'
Cheers!
EDIT: Fixed a typo.
August 21, 2015 at 6:51 am
Apologies! No, that's an oversight on my part. Here is the corrected sample data and attached example result set:
SELECT 140675885,072915107,1580,'D' UNION ALL
SELECT 140675886,072915107,11,'D' UNION ALL
SELECT 140675887,072915107,1580,'D' UNION ALL
SELECT 140675888,072915107,109,'D' UNION ALL
SELECT 140675889,072915107,1511,'D' UNION ALL
SELECT 140675890,072915107,221,'D' UNION ALL
SELECT 140675891,072915107,221,'F' UNION ALL
SELECT 140675892,072915107,46,'D' UNION ALL
SELECT 140675893,072915107,20,'D' UNION ALL
SELECT 140675894,072915107,20,'F' UNION ALL
SELECT 140675895,072915107,20,'F' UNION ALL
SELECT 140675896,072915107,20,'F' UNION ALL
SELECT 140675897,072915107,20,'F' UNION ALL
SELECT 140675898,072915107,20,'F' UNION ALL
SELECT 140675899,072915107,14,'D' UNION ALL
SELECT 140675900,072915107,14,'F' UNION ALL
SELECT 140675901,072915107,14,'F' UNION ALL
SELECT 140675902,072915205,1580,'D' UNION ALL
SELECT 140675903,072915205,1580,'F' UNION ALL
SELECT 140675904,072915205,1580,'D'
Thank you for noting that!
August 21, 2015 at 7:19 am
jessica.green2312 (8/21/2015)
Apologies! No, that's an oversight on my part. Here is the corrected sample data and attached example result set:SELECT 140675885,072915107,1580,'D' UNION ALL
SELECT 140675886,072915107,11,'D' UNION ALL
SELECT 140675887,072915107,1580,'D' UNION ALL
SELECT 140675888,072915107,109,'D' UNION ALL
SELECT 140675889,072915107,1511,'D' UNION ALL
SELECT 140675890,072915107,221,'D' UNION ALL
SELECT 140675891,072915107,221,'F' UNION ALL
SELECT 140675892,072915107,46,'D' UNION ALL
SELECT 140675893,072915107,20,'D' UNION ALL
SELECT 140675894,072915107,20,'F' UNION ALL
SELECT 140675895,072915107,20,'F' UNION ALL
SELECT 140675896,072915107,20,'F' UNION ALL
SELECT 140675897,072915107,20,'F' UNION ALL
SELECT 140675898,072915107,20,'F' UNION ALL
SELECT 140675899,072915107,14,'D' UNION ALL
SELECT 140675900,072915107,14,'F' UNION ALL
SELECT 140675901,072915107,14,'F' UNION ALL
SELECT 140675902,072915205,1580,'D' UNION ALL
SELECT 140675903,072915205,1580,'F' UNION ALL
SELECT 140675904,072915205,1580,'D'
Thank you for noting that!
do you only have "D" and "F" or are there other variants?
anyways...try this
SELECT
ID
, filenm
, docnum
, pg
, CASE
WHEN pg = 'F' THEN
ROW_NUMBER() OVER (ORDER BY id) - ROW_NUMBER() OVER (PARTITION BY pg ORDER BY id)
ELSE
ROW_NUMBER() OVER (PARTITION BY pg ORDER BY id)
END AS jls
FROM #t
ORDER BY
id;
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 21, 2015 at 9:52 am
Yes, we only ever have Ds and Fs. This is perfect - thank you so much!!!
August 21, 2015 at 9:56 am
jessica.green2312 (8/21/2015)
Yes, we only ever have Ds and Fs. This is perfect - thank you so much!!!
you are welcome....how long did it take to run?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 21, 2015 at 12:28 pm
It hardly takes any time at all. The craziness that runs immediately before takes just shy of 2 minutes and when I add this afterwards it was still finished at ~1:56. 🙂
August 23, 2015 at 12:46 pm
jessica.green2312 (8/21/2015)
It hardly takes any time at all. The craziness that runs immediately before takes just shy of 2 minutes and when I add this afterwards it was still finished at ~1:56. 🙂
To do what? Return it all to the screen? Also, what is the clustered index on the table?
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply