Need help on performance and integrity

Question

Need help on performance and integrity

Viewing 15 posts - 31 through 45 (of 115 total)

You must be logged in to reply to this topic. Login to reply

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 1

AnzioBake (10/22/2008)
I had some trouble logging on so read the other posts after my reply
I had to make a change but now I am not sure if the result set is correct (see below)
4574 appears as a duplicate of 4572 which is a duplicate of 4571 -4570
RowIDRowKeyLocaleCode
45691deA
45711enA
45721enA
45741enA
45751enA
45761enA
45771enA
45791enA
45821enA
45831enA
45841enA
45851enA
45871enA
45891enA
45901enA
45921enA
45931enA
45951deA
45981deA
46001zhA
46011deC

Hi AnzioBake, yes you've got the CORRECT output on this second post of yours.

Thank you for your help.

Garadin One Orange Chip Points: 29613 More actions · Answer 2

Sorry for not getting back with you Dan. Lost the primary hard drive in my dev machine yesterday, so it was an all day affair getting the thing back up and running. Think I'm just about there, although the resolution / color balance on the monitors now seems all screwed up. In any case, looks like AnzioBake has discovered a solution.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

sgmunson SSC Guru Points: 110640 More actions · Answer 3

I tried to post this late in the day, yesterday, but was having trouble with the website:

Dan - I believe AnzioBake has now likely found the solution, but even he still had a question about exactly what constituted a duplicate: something for which you've never provided a complete and exact statement or definition that describes it in sufficiently excruciating detail AND handles all possible input.

I'm pretty sure that's why it's taking so long for someone to provide a solution. Also, you've never stated one word about WHY you need to do this. If we knew the ultimate objective, we'd have a far better chance at helping you get there.

Steve

(aka smunson)

:):):)

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 4

Garadin (10/22/2008)
Sorry for not getting back with you Dan. Lost the primary hard drive in my dev machine yesterday, so it was an all day affair getting the thing back up and running. Think I'm just about there, although the resolution / color balance on the monitors now seems all screwed up. In any case, looks like AnzioBake has discovered a solution.

No problem Seth, just waiting on what the solution AnzioBake does though but I think he got it already. But if you think you have time to fix the solution you have then that would be great. I really appreciate your help guys. Thank you for that.

RBarryYoung SSC Guru Points: 143329 More actions · Answer 5

AnzioBake (10/22/2008)
My solution is a bit long but executes in under three (3) minutes.
Here is s sample of the longer series and the results set. If Correct I will post my Solution...

Anzio, it is generally good form to always post your code here, as this gives others the chance to review it, check it and even test it. If you think that it might be too long for a post then just include it as an attachment. 🙂

[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung [font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]

AnzioBake SSCarpal Tunnel Points: 4134 More actions · Answer 6

I seem to have had some internet issues at work (where I was posting from earlier ) I will post the code as an attachment in the morning. Sorry for the delay, but I ended up in a meeting that took whole afternoon and left immediately.

Garadin One Orange Chip Points: 29613 More actions · Answer 7

Dan Segalles (10/22/2008)
But if you think you have time to fix the solution you have then that would be great.

Unfortunately, I've hit a dead wall with the method I was trying. I've tried it every which way I can think of (without completely ditching that method and using something different), and I get outdone by the query optimizer every time. It seems that it simply will not recursively run that subquery for each row. While that's really cool for the fact that it won't let me create RBAR(and has solved several other issues)... it's not very cool in the sense that it makes this one not work at all. Grabbing duplicates per group is easy. Grabbing them whilst ignoring other rows that happened before the last one, as well as including 'patches' of duplicates until the next change seems to add too much logic for this solution to handle.

Seth Phelabaum

Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

AnzioBake SSCarpal Tunnel Points: 4134 More actions · Answer 8

This is my solution. Having a problem at work connecting to website.

--Select * From #TempTable

IF OBJECT_ID ('TempDb..#Seq','U') IS NOT NULL

BEGIN

DROP TABLE #Seq

END

CREATE TABLE #Seq

(

RowID int PRIMARY KEY CLUSTERED,

Code char(1)

,NextSeqStartRowID int

)

IF OBJECT_ID ('TempDb..#Dupe','U') IS NOT NULL

BEGIN

DROP TABLE #Dupe

END

CREATE TABLE #Dupe (

SeqStartRowID int

, Code Char(1)

, NextSeqStartRowID int

, FirstRowID int

, DuplicateRowID int

, Primary Key Clustered( SeqStartRowID , Code , NextSeqStartRowID , FirstRowID )

)

IF OBJECT_ID ('TempDb..#Result','U') IS NOT NULL

BEGIN

DROP TABLE #Result

END

CREATE TABLE #Result (

RowID int PRIMARY KEY CLUSTERED,

RowKey varchar(1000),

Locale varchar(10),

Code char(1)

)

DECLARE @OptRows int; -- variable which records the optimum number of rows to scan

DECLARE@RowCnt int

SET @OptRows = 100;

----Find All the induvidual sequences -------------------

Insert into #Seq

Select

RowID = Min( RowiD)

, Code

, NextSeqStartRowID = isNUll( NextSeqStartRowID, Max(RowID)+1 )

From(

Select

a.RowID

, a.Code

, NextSeqStartRowID = Min( b.RowID )

From #TempTable as a

Inner Join #TempTable as b

on b.rowID > a.RowID

And b.RowID <= a.RowID +@OptRows

And b.Code <> a.Code

Group by

a.RowID

, a.Code

)as c

Group by

Code

, NextSeqStartRowID

-----------------------------------------------------------

----Identify all Duplicates in sequences--------------------

Insert into #Dupe

Select

SeqStartRowID = s.rowID

, Code = s.Code

, s.NextSeqStartRowID

, FirstRowID = t.RowID

, DuplicateRowID = Min( x.RowId )

From #Seq as s

Left Join #TempTable as t

on t.rowID >= s.RowID

And t.RowID < s.NextSeqStartRowID

Inner JOin #TempTable as x

on x.RowId >= s.RowID

And x.RowID < s.NextSeqStartRowID

And x.RowID > t.rowID

And x.Locale = t.locale

And x.rowKey = t.RowKey

Group by

s.rowID

, s.Code

, s.NextSeqStartRowID

, t.RowID

Order by

s.RowID

, t.rowID

---------------------------------------------------------

----Constuct the result set-------------------------------

Delete from #Result

Insert into #Result

Select d.*

From #Seq as s

Inner join #TempTable as d

on d.rowid = s.rowid

Set @RowCnt = 1

While @Rowcnt > 0 Begin

Insert into #Result

Select d.*

From

(

Select

S.LstRes

, RowId = Min( d.DuplicateRowId )

From(

----Find Last RowID in Result Set

Select

s.RowID

, s.Code

, s.NextSeqStartRowID

, LstRes = Max( r.RowID )

From #Seq as s

Inner join #Result as r

on r.RowId >= s.rowid

And r.RowID < s.NextSeqStartRowId

Group by

s.RowID

, s.Code

, s.NextSeqStartRowID

--------------------------------

) as s

Inner join #Dupe as d

on d.SeqStartRowID = s.rowid

And d.FirstRowID >= s.LstRes ----Can be cahnged to > if you do not want to copare to the last duplicate row

Group by

LstRes

) as x

Inner join #TempTable as d

on d.RowID = x.RowID

Select @RowCnt = @@RowCount

End

----------------------------------------------------

Select * From #Result

AnzioBake SSCarpal Tunnel Points: 4134 More actions · Answer 9

It has been 24 hours and no response. I just wanna confirm that the solution works and performs adequately.

Dan Segalles SSC-Addicted Points: 412 More actions · Answer 10

AnzioBake (10/23/2008)
It has been 24 hours and no response. I just wanna confirm that the solution works and performs adequately.

Hello Anzio, I apologize for not getting back to you right away since I have to take care of some emergency matters. Anyway, I tested your code during the weekend for integrity but it wasn't giving me the correct output for some scenarios.

I simply test it with this kind of scenario:

INSERT #TempTable

SELECT 1,'1','en','A' UNION ALL

SELECT 2,'1','en','A' UNION ALL

SELECT 5,'1','en','C' UNION ALL

SELECT 7,'1','en','A' UNION ALL

SELECT 8,'1','zh','A' UNION ALL

SELECT 10,'1','en','A' UNION ALL

SELECT 11,'1','zh','A' UNION ALL

SELECT 20,'1','en','D' UNION ALL

SELECT 21,'1','en','D'

GO

Which should output:

RowIDRowKeyLocaleCode

21enA

51enC

71enA

101enA

201enD

Your script outputted this:

RowIDRowKeyLocaleCode

11enA

21enA

51enC

71enA

101enA

Thank for your help Anzio.

AnzioBake SSCarpal Tunnel Points: 4134 More actions · Answer 11

No problem with the delay. I just wanted to know if it has solved your problem...I will check the scenario where it gives the incorrect answer

AnzioBake SSCarpal Tunnel Points: 4134 More actions · Answer 12

Sorry for posting the whole script again. some minor changes required but the code below gives the required result.

I changed the script to

1. Take All codes into Account

2. Only look at duplicates for code = A and

3. Remove the first occurance of code = A from the result set.

Question

Do you want to remove the first occurance of Code ='A' or the first record? You can adjust the delete record script based on your answer.

IF OBJECT_ID ('TempDB..#TempTable','U') IS NOT NULL

BEGIN

DROP TABLE #TempTable

END

CREATE TABLE #TempTable

(

RowID int PRIMARY KEY CLUSTERED,

RowKey varchar(1000),

Locale varchar(10),

Code char(1)

)

GO

INSERT #TempTable

SELECT 1,'1','en','A' UNION ALL

SELECT 2,'1','en','A' UNION ALL

SELECT 5,'1','en','C' UNION ALL

SELECT 7,'1','en','A' UNION ALL

SELECT 8,'1','zh','A' UNION ALL

SELECT 10,'1','en','A' UNION ALL

SELECT 11,'1','zh','A' UNION ALL

SELECT 20,'1','en','D' UNION ALL

SELECT 21,'1','en','D'

GO

/*

Which should output:

RowID RowKey Locale Code

2 1 en A

5 1 en C

7 1 en A

10 1 en A

20 1 en D

Your script outputted this:

RowID RowKey Locale Code

1 1 en A

2 1 en A

5 1 en C

7 1 en A

10 1 en A

*/

--Select * From #TempTable

IF OBJECT_ID ('TempDb..#Seq','U') IS NOT NULL

BEGIN

DROP TABLE #Seq

END

CREATE TABLE #Seq

(

RowID int PRIMARY KEY CLUSTERED,

Code char(1)

,NextSeqStartRowID int

)

IF OBJECT_ID ('TempDb..#Dupe','U') IS NOT NULL

BEGIN

DROP TABLE #Dupe

END

CREATE TABLE #Dupe (

SeqStartRowID int

, Code Char(1)

, NextSeqStartRowID int

, FirstRowID int

, DuplicateRowID int

, Primary Key Clustered( SeqStartRowID , Code , NextSeqStartRowID , FirstRowID )

)

IF OBJECT_ID ('TempDb..#Result','U') IS NOT NULL

BEGIN

DROP TABLE #Result

END

CREATE TABLE #Result (

RowID int PRIMARY KEY CLUSTERED,

RowKey varchar(1000),

Locale varchar(10),

Code char(1)

)

DECLARE @OptRows int; -- variable which records the optimum number of rows to scan

DECLARE@RowCnt int

SET @OptRows = 100;

----Find All the induvidual sequences -------------------

Insert into #Seq

Select

RowID = Min( RowiD)

, Code

, NextSeqStartRowID = isNUll( NextSeqStartRowID, Max(RowID)+1 )

From(

Select

a.RowID

, a.Code

, NextSeqStartRowID = Min( b.RowID )

From #TempTable as a

Left Join #TempTable as b

on b.rowID > a.RowID

And b.RowID <= a.RowID +@OptRows

And b.Code <> a.Code

Group by

a.RowID

, a.Code

)as c

Group by

Code

, NextSeqStartRowID

-----------------------------------------------------------

----Identify all Duplicates in sequences--------------------

Insert into #Dupe

Select

SeqStartRowID = s.rowID

, Code = s.Code

, s.NextSeqStartRowID

, FirstRowID = t.RowID

, DuplicateRowID = Min( x.RowId )

From #Seq as s

Left Join #TempTable as t

on t.rowID >= s.RowID

And t.RowID < s.NextSeqStartRowID

Inner JOin #TempTable as x

on x.RowId >= s.RowID

And x.RowID < s.NextSeqStartRowID

And x.RowID > t.rowID

And x.Locale = t.locale

And x.rowKey = t.RowKey

Group by

s.rowID

, s.Code

, s.NextSeqStartRowID

, t.RowID

Order by

s.RowID

, t.rowID

---------------------------------------------------------

----Construct the result set-------------------------------

Delete from #Result

Insert into #Result

Select d.*

From #Seq as s

Inner join #TempTable as d

on d.rowid = s.rowid

Set @RowCnt = 1

While @Rowcnt > 0 Begin

Insert into #Result

Select d.*

From

(

Select

S.LstRes

, RowId = Min( d.DuplicateRowId )

From(

----Find Last RowID in Result Set

Select

s.RowID

, s.Code

, s.NextSeqStartRowID

, LstRes = Max( r.RowID )

From #Seq as s

Inner join #Result as r

on r.RowId >= s.rowid

And r.RowID < s.NextSeqStartRowId

Group by

s.RowID

, s.Code

, s.NextSeqStartRowID

--------------------------------

) as s

Inner join #Dupe as d

on d.SeqStartRowID = s.rowid

And d.FirstRowID >= s.LstRes ----Can be changed to > if you do not want to compare to the last duplicate row

And d.Code = 'A'--Only interested in Duplicates of A

Group by

LstRes

) as x

Inner join #TempTable as d

on d.RowID = x.RowID

Select @RowCnt = @@RowCount

End

----------------------------------------------------

--Remove First Record of the sequence for Code = 'A'---------

Delete from tgt

From #Result as tgt

Inner Join(

Select Min(RowID) as RowID

From #Seq

Where Code = 'A'

)as seq

on seq.RowID = Tgt.RowID

------------------------------

Select * From #Result

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 13

Also see

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=112721

http://www.sqlservercentral.com/Forums/Topic562564-338-1.aspx

http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=3829716&SiteID=1

http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=3928920&SiteID=17

N 56°04'39.16"
E 12°55'05.25"

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 14

Peso (10/27/2008)
Also see
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=112721
http://www.sqlservercentral.com/Forums/Topic562564-338-1.aspx
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=3829716&SiteID=1
http://forums.microsoft.com/TechNet/ShowPost.aspx?PostID=3928920&SiteID=17

Heh... but the question remains... has anyone done it correctly, yet?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 15

For records {1, 2} OP wants the second record with ID 2 because there is a duplicate record.

For records {7, 8, 10, 11} OP wants the first record with ID 7 because there are duplicate records.

Why first record for first batch and second record for second batch?

What are the business rules for this?

I know the business rule is to remove the first duplicate of every batch, but applying that rule on second batch would lead to return record with ID 8, not ID 7 as OP writes.

The reason I posted the many links is that in every link there are different descriptions of the business rules. I don't know which is correct.

N 56°04'39.16"
E 12°55'05.25"