October 20, 2008 at 10:04 am
The A's only is a very easy fix. The second part should have been handled by the variable check, but it's not reacting the way I thought it would. Looking into that now.
October 20, 2008 at 10:47 am
Heh. You know when I saw the title of this thread, I thought that it was another editorial about the Financial Crisis. 🙂
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
October 20, 2008 at 10:54 am
rbarryyoung (10/20/2008)
Heh. You know when I saw the title of this thread, I thought that it was another editorial about the Financial Crisis. 🙂
heh heh..sorry bout that, couldn't think of a good title for this one.
October 20, 2008 at 11:54 am
Ok, I just discovered something interesting. The reason it wasn't working before is because the @LastDupe variable inside that CASE statement does not get properly evaluated for the subselect. It will correctly insert into the table if you try to add it later in the update (or earlier), but inside of that subquery, it never gets evaluated. Hard coding a value works fine, but a variable won't work at all. I tried it 10 different ways, and every time it was ignored. That might help explain why it runs as well as it does.
[EDIT]
I had another solution posted here, but I tested it against some of your alternative test data(I did in fact just get lucky with the small data set 😉 ), and it is wrong as well, so at this point I'm scrapping that one too.
October 20, 2008 at 2:10 pm
Garadin (10/20/2008)
Ok, I just discovered something interesting. The reason it wasn't working before is because the @LastDupe variable inside that CASE statement does not get properly evaluated for the subselect. It will correctly insert into the table if you try to add it later in the update (or earlier), but inside of that subquery, it never gets evaluated. Hard coding a value works fine, but a variable won't work at all. I tried it 10 different ways, and every time it was ignored. That might help explain why it runs as well as it does.With that in mind, this gets a bit more complicated, as I have to now do it in multiple steps. I'm not quite as happy with this solution as I was before, but it should still be a *huge* improvement over your current method (assuming the new part actually works, and I didn't just get lucky with this small result set)....
It almost got the result that I'm expecting but when I tried it on this kind of pattern, the result was lacking..
INSERT #TempTable(RowID,RowKey,Locale,Code)
SELECT 15,'1','zh','A' UNION ALL
SELECT 16,'1','zh','A' UNION ALL
SELECT 20,'2','en','C' UNION ALL
SELECT 21,'3','de','C' UNION ALL
SELECT 22,'4','de','A' UNION ALL
SELECT 23,'2','en','A' UNION ALL
SELECT 24,'4','de','A' UNION ALL
SELECT 25,'4','de','A' UNION ALL
SELECT 30,'1','en','C' UNION ALL
SELECT 31,'1','en','C'
The code you supplied outputted this (which is lacking):
RowIDRowKeyLocaleCodeGroupIDDupe
161zhA11
202enC20
224deA30
244deA31
301enC40
The correct output should be:
RowIDRowKeyLocaleCode
161zhA
202enC
224deA
244deA
254deA
301enC
25 4deA - This should be included since in the second range/batch of A's it is a duplicate on 244deA
Thanks for your time and help.
P.S. The logic on how to get the result is on the attached file (demo.gif).
October 20, 2008 at 2:23 pm
My apologies for all the semi correct attempts, had I taken the time to read all the posts I glossed over from a couple days ago, I'd have seen most of the errors I made in this. I will take a crack at fixing the last piece of this later tonight if you haven't gotten an answer by then. (Got a couple projects I need to finish before the end of the day)
October 20, 2008 at 5:30 pm
Garadin (10/20/2008)
My apologies for all the semi correct attempts, had I taken the time to read all the posts I glossed over from a couple days ago, I'd have seen most of the errors I made in this. I will take a crack at fixing the last piece of this later tonight if you haven't gotten an answer by then. (Got a couple projects I need to finish before the end of the day)
Thank you very much for your time Garadin, I really appreciate your help.
October 21, 2008 at 6:40 am
/Quote
SELECT 14,'1','en','A' UNION ALL --/second(2) batch of A
SELECT 15,'3','en','A' UNION ALL --/second(2) batch of A
SELECT 16,'3','de','A' UNION ALL --/second(2) batch of A
SELECT 18,'1','en','A' UNION ALL --/second(2) batch of A <- dup on RowID 14 (checked)
SELECT 19,'3','de','A' UNION ALL --/second(2) batch of A <- I don't need to include this even if there is dup on RowID 16 since it is already covered in the range from RowID 14-18
SELECT 20,'1','en','C' UNION ALL
SELECT 21,'1','en','C'
Quote/
I am a bit confused why you would not want to identify RowId = 19 as a duplicate of RowId =16. Is this simply because you want to run this process iteratively i.e
first identify RowID 18 as a Duplicate and delete
thenn in next cycle identify 19 as dupe and delete??
October 21, 2008 at 7:10 am
AnzioBake (10/21/2008)
/QuoteSELECT 14,'1','en','A' UNION ALL --/second(2) batch of A
SELECT 15,'3','en','A' UNION ALL --/second(2) batch of A
SELECT 16,'3','de','A' UNION ALL --/second(2) batch of A
SELECT 18,'1','en','A' UNION ALL --/second(2) batch of A <- dup on RowID 14 (checked)
SELECT 19,'3','de','A' UNION ALL --/second(2) batch of A <- I don't need to include this even if there is dup on RowID 16 since it is already covered in the range from RowID 14-18
SELECT 20,'1','en','C' UNION ALL
SELECT 21,'1','en','C'
Quote/
I am a bit confused why you would not want to identify RowId = 19 as a duplicate of RowId =16. Is this simply because you want to run this process iteratively i.e
first identify RowID 18 as a Duplicate and delete
thenn in next cycle identify 19 as dupe and delete??
Yes that would be like that of some sort...since RowID 18 is a first occurrence found as duplicate (dupe from RowID 14), the next time I search for another duplicate will start on RowID 18 onwards..
October 21, 2008 at 8:26 am
Dan,
Not to rain on your quest here, but I don't think you've actually provided a complete statement of the problem. I know that it's very clear to YOU what result you want, but all we have is a picture, and some limited descriptions. What we do NOT have is a step by step algorithm for deciding that any given row is to be included. If you can write out a step by step completely specific procedure, that for ANY given row of input, decides whether or not that row is included, you'll be 99.999% of the way to a solution. I've read your diagram several times, and am still not clear on the exact set of rules to follow. I'm pretty sure we understand about grouping the sets of A's together, and that's fine, but once one is inside a group of A's, what's next? The algorithm needs to be able to take into account ALL possible input combinations that the rest of a given group of A's can have.
Steve
(aka smunson)
:):):)
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
October 21, 2008 at 10:10 am
smunson (10/21/2008)
Dan,Not to rain on your quest here, but I don't think you've actually provided a complete statement of the problem. I know that it's very clear to YOU what result you want, but all we have is a picture, and some limited descriptions. What we do NOT have is a step by step algorithm for deciding that any given row is to be included. If you can write out a step by step completely specific procedure, that for ANY given row of input, decides whether or not that row is included, you'll be 99.999% of the way to a solution. I've read your diagram several times, and am still not clear on the exact set of rules to follow. I'm pretty sure we understand about grouping the sets of A's together, and that's fine, but once one is inside a group of A's, what's next? The algorithm needs to be able to take into account ALL possible input combinations that the rest of a given group of A's can have.
Steve
(aka smunson)
:):):)
Thank you for pointing this out Steve and sorry for that one. Here is another attachment (Demo2.JPG) to explain how to get the result inside group of A. I hope this will make everything clear.
Thank you for your help.
October 21, 2008 at 8:28 pm
I swear I've done this problem on this forum before... :hehe:
Dan, in the GIF file example, please tell me why RowID 10 is NOT considered to be a dupe of RowID 3, but you include RowID 23 as a dupe of RowID 21?
Is it just because the rowkey changed to 2 and then went back to 1 between 10 and 3 but did not between 23 and 21?
(and, yeah, you have an error in the instructions on that nice graphic you made. There is no RowID 13... you meant RowID 23.). 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
October 22, 2008 at 1:49 am
On my Desktop, because you mentioned performance, I created a table with 1 Million random records that a tweaked to ensure that I got Duplicates and series of data
I have series ranging from 1- 35 records and multiple Duplicates.
The result required is:
the first Row of a series
the first duplicate row
followed by the duplicate of a row after the previous duplicate row.
My solution is a bit long but executes in under three (3) minutes.
Here is s sample of the longer series and the results set. If Correct I will post my Solution
Data:
RowIDRowKeyLocaleCode
45691deA
45701enA
45711enA
45721enA
45731zhA
45741enA
45751enA
45761enA
45771enA
45781deA
45791enA
45801zhA
45811deA
45821enA
45831enA
45841enA
45851enA
45861deA
45871enA
45881zhA
45891enA
45901enA
45911zhA
45921enA
45931enA
45941deA
45951deA
45961zhA
45971enA
45981deA
45991zhA
46001zhA
46011deC
Results:
RowIDRowKeyLocaleCode
45691deA
45711enA
45741enA
45761enA
45791enA
45831enA
45851enA
45891enA
45921enA
45951deA
45991zhA
46011deC
October 22, 2008 at 2:12 am
I had some trouble logging on so read the other posts after my reply
I had to make a change but now I am not sure if the result set is correct (see below)
4574 appears as a duplicate of 4572 which is a duplicate of 4571 -4570
RowIDRowKeyLocaleCode
45691deA
45711enA
45721enA
45741enA
45751enA
45761enA
45771enA
45791enA
45821enA
45831enA
45841enA
45851enA
45871enA
45891enA
45901enA
45921enA
45931enA
45951deA
45981deA
46001zhA
46011deC
October 22, 2008 at 5:48 am
Jeff Moden (10/21/2008)
I swear I've done this problem on this forum before... :hehe:Dan, in the GIF file example, please tell me why RowID 10 is NOT considered to be a dupe of RowID 3, but you include RowID 23 as a dupe of RowID 21?
Is it just because the rowkey changed to 2 and then went back to 1 between 10 and 3 but did not between 23 and 21?
Jeff, RowID 10 is NOT considered to be a dupe of RowID 3 because from RowID 3 onwards, iteratively the first dupe found is at RowID 5 (In this case I already have the range from RowID 3-5 which has unique data within it and need not to consider those data within that certain range to find another one) and therefore I should start looking again for another dupe from RowID 5 onwards.
Jeff Moden (10/21/2008)
(and, yeah, you have an error in the instructions on that nice graphic you made. There is no RowID 13... you meant RowID 23.). :)[
Oh yeah, damn haven't notice that...yes, I mean RowID 23 and NOT RowID 13.
Thanks Jeff.
Viewing 15 posts - 16 through 30 (of 115 total)
You must be logged in to reply to this topic. Login to reply