Need help on performance and integrity

Question

Need help on performance and integrity

Viewing 15 posts - 16 through 30 (of 115 total)

You must be logged in to reply to this topic. Login to reply

Garadin One Orange Chip Points: 29613 More actions · Answer 1

The A's only is a very easy fix. The second part should have been handled by the variable check, but it's not reacting the way I thought it would. Looking into that now.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

RBarryYoung SSC Guru Points: 143329 More actions · Answer 2

Heh. You know when I saw the title of this thread, I thought that it was another editorial about the Financial Crisis. 🙂

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 3

rbarryyoung (10/20/2008)
Heh. You know when I saw the title of this thread, I thought that it was another editorial about the Financial Crisis. 🙂

heh heh..sorry bout that, couldn't think of a good title for this one.

Garadin One Orange Chip Points: 29613 More actions · Answer 4

Ok, I just discovered something interesting. The reason it wasn't working before is because the @LastDupe variable inside that CASE statement does not get properly evaluated for the subselect. It will correctly insert into the table if you try to add it later in the update (or earlier), but inside of that subquery, it never gets evaluated. Hard coding a value works fine, but a variable won't work at all. I tried it 10 different ways, and every time it was ignored. That might help explain why it runs as well as it does.

[EDIT]

I had another solution posted here, but I tested it against some of your alternative test data(I did in fact just get lucky with the small data set 😉 ), and it is wrong as well, so at this point I'm scrapping that one too.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 5

Garadin (10/20/2008)
Ok, I just discovered something interesting. The reason it wasn't working before is because the @LastDupe variable inside that CASE statement does not get properly evaluated for the subselect. It will correctly insert into the table if you try to add it later in the update (or earlier), but inside of that subquery, it never gets evaluated. Hard coding a value works fine, but a variable won't work at all. I tried it 10 different ways, and every time it was ignored. That might help explain why it runs as well as it does.
With that in mind, this gets a bit more complicated, as I have to now do it in multiple steps. I'm not quite as happy with this solution as I was before, but it should still be a *huge* improvement over your current method (assuming the new part actually works, and I didn't just get lucky with this small result set)....

It almost got the result that I'm expecting but when I tried it on this kind of pattern, the result was lacking..

INSERT #TempTable(RowID,RowKey,Locale,Code)

SELECT 15,'1','zh','A' UNION ALL

SELECT 16,'1','zh','A' UNION ALL

SELECT 20,'2','en','C' UNION ALL

SELECT 21,'3','de','C' UNION ALL

SELECT 22,'4','de','A' UNION ALL

SELECT 23,'2','en','A' UNION ALL

SELECT 24,'4','de','A' UNION ALL

SELECT 25,'4','de','A' UNION ALL

SELECT 30,'1','en','C' UNION ALL

SELECT 31,'1','en','C'

The code you supplied outputted this (which is lacking):

RowIDRowKeyLocaleCodeGroupIDDupe

161zhA11

202enC20

224deA30

244deA31

301enC40

The correct output should be:

RowIDRowKeyLocaleCode

161zhA

202enC

224deA

244deA

254deA

301enC

25 4deA - This should be included since in the second range/batch of A's it is a duplicate on 244deA

Thanks for your time and help.

P.S. The logic on how to get the result is on the attached file (demo.gif).

Garadin One Orange Chip Points: 29613 More actions · Answer 6

My apologies for all the semi correct attempts, had I taken the time to read all the posts I glossed over from a couple days ago, I'd have seen most of the errors I made in this. I will take a crack at fixing the last piece of this later tonight if you haven't gotten an answer by then. (Got a couple projects I need to finish before the end of the day)

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 7

Garadin (10/20/2008)
My apologies for all the semi correct attempts, had I taken the time to read all the posts I glossed over from a couple days ago, I'd have seen most of the errors I made in this. I will take a crack at fixing the last piece of this later tonight if you haven't gotten an answer by then. (Got a couple projects I need to finish before the end of the day)

Thank you very much for your time Garadin, I really appreciate your help.

AnzioBake SSCarpal Tunnel Points: 4134 More actions · Answer 8

/Quote

SELECT 14,'1','en','A' UNION ALL --/second(2) batch of A

SELECT 15,'3','en','A' UNION ALL --/second(2) batch of A

SELECT 16,'3','de','A' UNION ALL --/second(2) batch of A

SELECT 18,'1','en','A' UNION ALL --/second(2) batch of A <- dup on RowID 14 (checked)

SELECT 19,'3','de','A' UNION ALL --/second(2) batch of A <- I don't need to include this even if there is dup on RowID 16 since it is already covered in the range from RowID 14-18

SELECT 20,'1','en','C' UNION ALL

SELECT 21,'1','en','C'

Quote/

I am a bit confused why you would not want to identify RowId = 19 as a duplicate of RowId =16. Is this simply because you want to run this process iteratively i.e

first identify RowID 18 as a Duplicate and delete

thenn in next cycle identify 19 as dupe and delete??

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 9

AnzioBake (10/21/2008)
/Quote
SELECT 14,'1','en','A' UNION ALL --/second(2) batch of A
SELECT 15,'3','en','A' UNION ALL --/second(2) batch of A
SELECT 16,'3','de','A' UNION ALL --/second(2) batch of A
SELECT 18,'1','en','A' UNION ALL --/second(2) batch of A <- dup on RowID 14 (checked)
SELECT 19,'3','de','A' UNION ALL --/second(2) batch of A <- I don't need to include this even if there is dup on RowID 16 since it is already covered in the range from RowID 14-18
SELECT 20,'1','en','C' UNION ALL
SELECT 21,'1','en','C'
Quote/
I am a bit confused why you would not want to identify RowId = 19 as a duplicate of RowId =16. Is this simply because you want to run this process iteratively i.e
first identify RowID 18 as a Duplicate and delete
thenn in next cycle identify 19 as dupe and delete??

Yes that would be like that of some sort...since RowID 18 is a first occurrence found as duplicate (dupe from RowID 14), the next time I search for another duplicate will start on RowID 18 onwards..

sgmunson SSC Guru Points: 110640 More actions · Answer 10

Dan,

Not to rain on your quest here, but I don't think you've actually provided a complete statement of the problem. I know that it's very clear to YOU what result you want, but all we have is a picture, and some limited descriptions. What we do NOT have is a step by step algorithm for deciding that any given row is to be included. If you can write out a step by step completely specific procedure, that for ANY given row of input, decides whether or not that row is included, you'll be 99.999% of the way to a solution. I've read your diagram several times, and am still not clear on the exact set of rules to follow. I'm pretty sure we understand about grouping the sets of A's together, and that's fine, but once one is inside a group of A's, what's next? The algorithm needs to be able to take into account ALL possible input combinations that the rest of a given group of A's can have.

Steve

(aka smunson)

:):):)

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 11

smunson (10/21/2008)
Dan,
Not to rain on your quest here, but I don't think you've actually provided a complete statement of the problem. I know that it's very clear to YOU what result you want, but all we have is a picture, and some limited descriptions. What we do NOT have is a step by step algorithm for deciding that any given row is to be included. If you can write out a step by step completely specific procedure, that for ANY given row of input, decides whether or not that row is included, you'll be 99.999% of the way to a solution. I've read your diagram several times, and am still not clear on the exact set of rules to follow. I'm pretty sure we understand about grouping the sets of A's together, and that's fine, but once one is inside a group of A's, what's next? The algorithm needs to be able to take into account ALL possible input combinations that the rest of a given group of A's can have.
Steve
(aka smunson)
:):):)

Thank you for pointing this out Steve and sorry for that one. Here is another attachment (Demo2.JPG) to explain how to get the result inside group of A. I hope this will make everything clear.

Thank you for your help.

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 12

I swear I've done this problem on this forum before... :hehe:

Dan, in the GIF file example, please tell me why RowID 10 is NOT considered to be a dupe of RowID 3, but you include RowID 23 as a dupe of RowID 21?

Is it just because the rowkey changed to 2 and then went back to 1 between 10 and 3 but did not between 23 and 21?

(and, yeah, you have an error in the instructions on that nice graphic you made. There is no RowID 13... you meant RowID 23.). 🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

AnzioBake SSCarpal Tunnel Points: 4134 More actions · Answer 13

On my Desktop, because you mentioned performance, I created a table with 1 Million random records that a tweaked to ensure that I got Duplicates and series of data

I have series ranging from 1- 35 records and multiple Duplicates.

The result required is:

the first Row of a series

the first duplicate row

followed by the duplicate of a row after the previous duplicate row.

My solution is a bit long but executes in under three (3) minutes.

Here is s sample of the longer series and the results set. If Correct I will post my Solution

Data:

RowIDRowKeyLocaleCode

45691deA

45701enA

45711enA

45721enA

45731zhA

45741enA

45751enA

45761enA

45771enA

45781deA

45791enA

45801zhA

45811deA

45821enA

45831enA

45841enA

45851enA

45861deA

45871enA

45881zhA

45891enA

45901enA

45911zhA

45921enA

45931enA

45941deA

45951deA

45961zhA

45971enA

45981deA

45991zhA

46001zhA

46011deC

Results:

RowIDRowKeyLocaleCode

45691deA

45711enA

45741enA

45761enA

45791enA

45831enA

45851enA

45891enA

45921enA

45951deA

45991zhA

46011deC

AnzioBake SSCarpal Tunnel Points: 4134 More actions · Answer 14

I had some trouble logging on so read the other posts after my reply

I had to make a change but now I am not sure if the result set is correct (see below)

4574 appears as a duplicate of 4572 which is a duplicate of 4571 -4570

RowIDRowKeyLocaleCode

45691deA

45711enA

45721enA

45741enA

45751enA

45761enA

45771enA

45791enA

45821enA

45831enA

45841enA

45851enA

45871enA

45891enA

45901enA

45921enA

45931enA

45951deA

45981deA

46001zhA

46011deC

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 15

Jeff Moden (10/21/2008)
I swear I've done this problem on this forum before... :hehe:
Dan, in the GIF file example, please tell me why RowID 10 is NOT considered to be a dupe of RowID 3, but you include RowID 23 as a dupe of RowID 21?
Is it just because the rowkey changed to 2 and then went back to 1 between 10 and 3 but did not between 23 and 21?

Jeff, RowID 10 is NOT considered to be a dupe of RowID 3 because from RowID 3 onwards, iteratively the first dupe found is at RowID 5 (In this case I already have the range from RowID 3-5 which has unique data within it and need not to consider those data within that certain range to find another one) and therefore I should start looking again for another dupe from RowID 5 onwards.

Jeff Moden (10/21/2008)
(and, yeah, you have an error in the instructions on that nice graphic you made. There is no RowID 13... you meant RowID 23.). :)[

Oh yeah, damn haven't notice that...yes, I mean RowID 23 and NOT RowID 13.

Thanks Jeff.