Need help on performance and integrity

  • I continue to think that a complete statement of the problem coming from the OP has yet to occur. There is no written description of the EXACT process to follow for any given record within a group of A's. I'm not even sure the OP knows what he wants.

Dan - PLEASE provide a COMPLETE description of EXACTLY how to handle ANY GIVEN RECORD within a set of A's, one that will give the CORRECT RESULT regardless of the input and TAKES INTO ACCOUNT ALL POSSIBLE INPUTS. The description needs to be written down, and it needs to leave absolutely NO ROOM for doubt about what the words mean. I'm thoroughly convinced at this point that this problem will not be solvable until we understand what you want in the EXACT SAME way that YOU understand it. We need to be able to read the description and come away without any questions. So far, every description you've provided has left a number of questions open, and some of the behavior you're asking for could only come from an inconsistent set of rules, or from some other rule that you haven't posted yet. Make NO assumptions, ok?

    Steve

    (aka smunson)

    :):):)


  • Or... the even more elusive question... what on earth the end goal of this thing really is. Part of the reason we are unable to come up with any good set based approach is because I don't think anyone (with the possible exception of yourself) really understands what you're trying to do here.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • This thread is still going !!?!! I'm sorry to have missed all the fun, but I'm feeling lucky, so I'll try again. If the results are correct, then someone else can make it go faster. From what I've read, there are only two things to identify:

    1. The rows where the Code column "breaks" to a new code, which doesn't include the first row in the table.

    2. Any duplicates (same Key, Locale and Code) within the range between breaks, OTHER than the first row in each set of duplicates.

    So anyway, here goes.

    ============================================

CREATE TABLE #TempTable
( SeqID int identity(1,1) primary key
 ,RowID int
 ,RowKey varchar(1000)
 ,Locale varchar(10)
 ,Code char(1)
)

INSERT INTO #TempTable
SELECT 1 as rowID,'1','en','A' UNION ALL
SELECT 2,'1','en','A' UNION ALL
SELECT 5,'1','en','C' UNION ALL
SELECT 7,'1','en','A' UNION ALL
SELECT 8,'1','zh','A' UNION ALL
SELECT 10,'1','en','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 20,'1','en','D' UNION ALL
SELECT 21,'1','en','D'
order by rowID -- be sure to order by rowID

select * from #temptable

;with cte1 as -- identify breaks
(select t1.code, t1.rowid
 from #temptable t1
 left join #temptable t2 on t1.seqID = t2.seqID + 1
 where t1.code <> t2.code
)
,cte1a as -- include first row for ranging purposes
(select top 1 code, rowID
 from #temptable
 union all
 select * from cte1)
,cte2 as -- identify ranges for duplicates
(select c1.code, c1.rowid as minRow, isnull(min(c1a.rowID) - 1, c1.rowid + 1) as maxRow
 from cte1a c1
 left join cte1a c1a on c1a.code <> c1.code and c1a.rowID > c1.rowID
 -- where c1.code = 'A'
 group by c1.code, c1.rowID
)
,cte3 as -- identify duplicates
(select c.minrow, max(rowid) as dupID, t.rowKey, t.locale, c.code
 from #temptable t
 cross join cte2 c
 where t.rowid >= minRow and t.rowID <= maxRow
 group by c.minRow, t.rowKey, t.locale, c.code
 having count(*) > 1
)
,cte4 as -- combine break rows and duplicate rows
(select dupID as rowID
 from cte3
 union all
 select rowID
 from cte1
)
select distinct c.rowID, t.rowkey, t.locale, t.code
from cte4 c
join #temptable t on c.rowID = t.rowID
order by c.rowID

drop table #temptable

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • bhovious (10/27/2008)


    1. The rows where the Code column "breaks" to a new code, which doesn't include the first row in the table.

    2. Any duplicates (same Key, Locale and Code) within the range between breaks, OTHER than the first row in each set of duplicates.

Unfortunately, it's considerably more complicated than that. He wants it done iteratively, meaning you can't just look for duplicates; you have to go through it line by line, taking into account where the last duplicate was within the code group, and restarting the duplicate match search from that one. However, if there is a copy of a dupe immediately following the exact same dupe, that one is counted as well, but not if there is a different key/locale between them. Beyond that, there are discrepancies over whether the first or second dupe in an iterative series is kept. That's only my current understanding of it, and I don't think it quite encompasses everything. I've tried to do this using a clustered index update (i.e. the running total technique) about 30 different ways now, but it simply will not work with any local variable inside of the update statement. Nor will it look at the last row updated. Beyond that, with the row-by-row logic being applied, I simply do not see any set-based logic working out here.

My only thought for optimizing this currently goes towards breaking up your huge table of data into smaller chunks with a second outer loop (with each chunk also taking 100 rows of the next chunk to satisfy your @optrows for scanning), which lets you perform this operation on smaller data sets in hopes of improving the speed of the scans. This may just as easily hurt the performance as help it, but it's something to try; a rough sketch follows.
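Something like this (dbo.ProcessChunk is a hypothetical wrapper for whatever your current per-chunk logic is, and the 100-row overlap stands in for your @optrows scan window):

DECLARE @ChunkSize INT,
        @Overlap   INT,
        @StartID   INT,
        @EndID     INT,
        @MaxID     INT

SELECT @ChunkSize = 10000,      -- rows per chunk: tune to taste
       @Overlap   = 100,        -- also take 100 rows of the next chunk
       @StartID   = MIN(RowID),
       @MaxID     = MAX(RowID)
FROM #TempTable

WHILE @StartID <= @MaxID
BEGIN
    SET @EndID = @StartID + @ChunkSize + @Overlap - 1

    -- Each pass covers one chunk plus the first @Overlap rows of the
    -- next chunk, so the duplicate scans can see across the boundary.
    EXEC dbo.ProcessChunk @FromID = @StartID, @ToID = @EndID

    SET @StartID = @StartID + @ChunkSize
END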

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • Hey Seth,

Could you give me one example of the iterative situation you describe? I didn't see that in all of the examples I looked at. If you could even point me to the example data he used, I'd be grateful. (Is this what he means back on page 1 when he says row 19 is already covered by the range 14-18?)

    Bob

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Basically, this:

ID  RowKey  Locale  Code
1   1       EN      A
2   2       EN      A
3   2       DU      A
4   1       EN      A
5   2       DU      A

    You see that Rows 3 and 5 are duplicates. However, because Row 4 is a duplicate of Row 1, the matching starts over at row 4. There is no duplicate of row 5 that is greater than row 4. The proper output in this scenario would be 1 and 4. (as far as I understand it)
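Here's that example in the thread's usual sample-data format, for anyone who wants to test against it (the #Example table is made up for the demo; expected output per my reading above is 1 and 4):

CREATE TABLE #Example
( ID INT PRIMARY KEY
 ,RowKey VARCHAR(10)
 ,Locale VARCHAR(10)
 ,Code CHAR(1)
)

INSERT #Example (ID, RowKey, Locale, Code)
SELECT 1,'1','EN','A' UNION ALL
SELECT 2,'2','EN','A' UNION ALL
SELECT 3,'2','DU','A' UNION ALL
SELECT 4,'1','EN','A' UNION ALL
SELECT 5,'2','DU','A'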

    Here's where it gets really hairy.

    If you have this:

ID  RowKey  Locale  Code
1   1       EN      A
2   2       EN      A
3   2       DU      A
4   1       EN      A
5   1       EN      A
6   1       EN      A
7   2       DU      A
8   1       EN      A
9   2       DU      A

    My understanding is that he wants 1,4,5,6,9 returned. 5 and 6 get added in because it's a "group" of duplicates. 7 still gets skipped because there is nothing above the last dupe (6) which matches it, and 8 gets skipped because the "group" of 1 EN A has now been broken. 9 gets returned because it is a duplicate of 7 which was the next candidate after the last dupe of 6.
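And the second example the same way, reusing the same #Example table (expected output per my reading: 1, 4, 5, 6, 9):

TRUNCATE TABLE #Example

INSERT #Example (ID, RowKey, Locale, Code)
SELECT 1,'1','EN','A' UNION ALL
SELECT 2,'2','EN','A' UNION ALL
SELECT 3,'2','DU','A' UNION ALL
SELECT 4,'1','EN','A' UNION ALL
SELECT 5,'1','EN','A' UNION ALL
SELECT 6,'1','EN','A' UNION ALL
SELECT 7,'2','DU','A' UNION ALL
SELECT 8,'1','EN','A' UNION ALL
SELECT 9,'2','DU','A'

DROP TABLE #Example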

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • bhovious (10/27/2008)


    Hey Seth,

Could you give me one example of the iterative situation you describe? I didn't see that in all of the examples I looked at. If you could even point me to the example data he used, I'd be grateful. (Is this what he means back on page 1 when he says row 19 is already covered by the range 14-18?)

    Bob

    Seth's solution is actually iterative. He goes through each row in the table in order and flags duplicates. Take a look at his second post on this thread.

Read this article for a further explanation of how this approach works: http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
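For a minimal illustration of the technique the article describes (a running total via a clustered index update), here is a standalone sketch; the #Totals table and its columns are made up for the demo:

CREATE TABLE #Totals
( ID INT PRIMARY KEY CLUSTERED
 ,Amount INT
 ,RunningTotal INT NULL
)

INSERT #Totals (ID, Amount)
SELECT 1, 10 UNION ALL
SELECT 2, 20 UNION ALL
SELECT 3, 5

DECLARE @RunningTotal INT
SET @RunningTotal = 0

-- The variable carries state from row to row while the UPDATE scans
-- the clustered index in order; each row's RunningTotal is set to the
-- previous row's total plus its own Amount.
UPDATE t
SET @RunningTotal = RunningTotal = @RunningTotal + Amount
FROM #Totals t WITH (TABLOCKX, INDEX(1))
OPTION (MAXDOP 1)

SELECT ID, Amount, RunningTotal FROM #Totals ORDER BY ID

DROP TABLE #Totals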

  • ggraber (10/27/2008)


    bhovious (10/27/2008)


    Hey Seth,

Could you give me one example of the iterative situation you describe? I didn't see that in all of the examples I looked at. If you could even point me to the example data he used, I'd be grateful. (Is this what he means back on page 1 when he says row 19 is already covered by the range 14-18?)

    Bob

    Seth's solution is actually iterative. He goes through each row in the table in order and flags duplicates. Take a look at his second post on this thread.

Read this article for a further explanation of how this approach works: http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    Or more aptly in this case, doesn't work. Because I need to start the searching over after the last row that matched the criteria, it completely breaks that method. I would dearly love to see someone get that to work, but no matter how much I change the thing around, the optimizer will not recursively:

    A. Look for the highest row that has been flagged as a duplicate or

    B. Use a local variable inside the search.

I've written that statement in a dozen different ways that were logically correct and should have solved it... but the problem is, the optimizer converts them all to derived tables that it generates before the update statement runs, based either on the local variable still being 1 or on the max duplicated row number in the table still being NULL. It refuses to recursively run anything that takes into account the last record that was a duplicate and searches from there. It's actually driving me nuts.
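To make the failure mode concrete, here is a hypothetical reduction of the pattern (not any of my actual attempts; the #Scan table and columns are made up): the EXISTS search depends on the local variable, and the lookup gets built as a derived table before the scan starts, so @LastDupRow stays frozen at its starting value for every row instead of advancing to the last flagged duplicate.

CREATE TABLE #Scan
( RowID INT PRIMARY KEY CLUSTERED
 ,RowKey VARCHAR(1000)
 ,Locale VARCHAR(10)
 ,DupOfPrev INT NULL
)

INSERT #Scan (RowID, RowKey, Locale)
SELECT 1,'1','en' UNION ALL
SELECT 2,'2','en' UNION ALL
SELECT 3,'1','en'

DECLARE @LastDupRow INT
SET @LastDupRow = 0

-- Intended: flag a row when a matching row exists after the last
-- duplicate found, and advance @LastDupRow to it. Actual (per the
-- behavior described above): the EXISTS is evaluated against
-- @LastDupRow's initial value throughout the scan.
UPDATE t
SET @LastDupRow = DupOfPrev =
        CASE WHEN EXISTS (SELECT 1 FROM #Scan prev
                          WHERE prev.RowID > @LastDupRow
                            AND prev.RowID < t.RowID
                            AND prev.RowKey = t.RowKey
                            AND prev.Locale = t.Locale)
             THEN t.RowID ELSE @LastDupRow END
FROM #Scan t WITH (INDEX(1))
OPTION (MAXDOP 1)

DROP TABLE #Scan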

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • smunson (10/27/2008)


    I continue to think that a complete statement of the problem coming from the OP has yet to occur. There is no written description of the EXACT process to follow for any given record within a group of A's. I'm not even sure the OP knows what he wants.

Dan - PLEASE provide a COMPLETE description of EXACTLY how to handle ANY GIVEN RECORD within a set of A's, one that will give the CORRECT RESULT regardless of the input and TAKES INTO ACCOUNT ALL POSSIBLE INPUTS. The description needs to be written down, and it needs to leave absolutely NO ROOM for doubt about what the words mean. I'm thoroughly convinced at this point that this problem will not be solvable until we understand what you want in the EXACT SAME way that YOU understand it. We need to be able to read the description and come away without any questions. So far, every description you've provided has left a number of questions open, and some of the behavior you're asking for could only come from an inconsistent set of rules, or from some other rule that you haven't posted yet. Make NO assumptions, ok?

    Steve

    (aka smunson)

    :):):)

    Hi Steve,

I thought the images I provided were clear enough to explain my goal here, but I guess I'm wrong. I just need two things:

A. Get the RowIDs of the interleaving rows by Code, meaning the rows where the Code column "breaks" to a new code, which doesn't include the first row in the table, as bhovious said.

B. Look into every group of Code 'A' and get the interleaving rows of duplicates within that group, where each search starts from the previous duplicate found. This separates the duplicates so that within every resulting range of RowIDs in an 'A' group, no duplicates are present.

Let's say, for example, I have this data inside Code 'A':

SELECT 1 as RowID,'1' as RowKey,'en' as Locale,'A' as Code UNION ALL
SELECT 2,'1','zh','A' UNION ALL
SELECT 3,'1','zh','A' UNION ALL
SELECT 4,'2','en','A' UNION ALL
SELECT 5,'2','en','A' UNION ALL
SELECT 10,'1','zh','A' UNION ALL
SELECT 11,'1','zh','A'

The result of this would be RowIDs 3, 5, and 11.

Notice that each range in the result has unique data inside it; the duplicates were separated. The ranges produced were RowIDs 1-2, 3-4, 5-10, and 11, and no range contains a duplicate, since the data was already divided/separated at each one. That's the reason I want to locate the duplicates iteratively and always start from the previous duplicate found to find the next one.

After that, I need to combine the results of A & B.
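To be clear, part A alone can already be done set-based, for example the way bhovious's cte1 does it above (a sketch, reusing the SeqID identity column from his #TempTable script):

-- Part A alone: a row is a "break" when its code differs from the code
-- of the row immediately before it in RowID order (SeqID is the
-- identity column that follows that order).
SELECT t1.RowID, t1.Code
FROM #TempTable t1
JOIN #TempTable t2 ON t1.SeqID = t2.SeqID + 1
WHERE t1.Code <> t2.Code

B is the part that resists a set-based form, because every duplicate found moves the starting point of the next search.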

  • What if there were an additional row in your example?

    SELECT 6,'2','en','A' UNION ALL

    Include 5? Include 6? Include 5 and 6 both?

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • bhovious (10/27/2008)


    What if there were an additional row in your example?

    SELECT 6,'2','en','A' UNION ALL

    Include 5? Include 6? Include 5 and 6 both?

It should include both 5 & 6, since 5 is a dupe of 4 and 6 is a dupe of 5. Always start looking for the next duplicate from the previous duplicate found.

  • Hi Dan

If you check my last post you should be getting the correct results, but you need to change the last query, before viewing the results, to the one below:

INSERT #TempTable
SELECT 0,'1','en','C' UNION ALL
SELECT 1,'1','en','A' UNION ALL
SELECT 2,'1','en','A' UNION ALL
SELECT 5,'1','en','C' UNION ALL
SELECT 7,'1','en','A' UNION ALL
SELECT 8,'1','zh','A' UNION ALL
SELECT 10,'1','en','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 12,'1','zh','A' UNION ALL
SELECT 13,'1','zh','A' UNION ALL
SELECT 14,'1','zh','A' UNION ALL
SELECT 20,'1','en','D' UNION ALL
SELECT 21,'1','en','D'
GO

--Remove First Record in the table---------
Delete from tgt
From #Result as tgt
Inner Join (
    Select Min(RowID) as RowID
    From #Seq
) as seq
    on seq.RowID = tgt.RowID
------------------------------

Select * From #Result

  • The above change produces the following results

RowID  RowKey  Locale  Code
1      1       en      A
2      1       en      A
5      1       en      C
7      1       en      A
10     1       en      A
12     1       zh      A
13     1       zh      A
14     1       zh      A
20     1       en      D

Note: the first row in the table (RowID = 0) has been removed, and rows 12, 13, and 14 are all in the result set as required.

    Am I still missing something in the solution?

  • Thanks for the reply, I believe I understand now. Regardless of the "why", this was an interesting problem to look at. If Anziobake's solution still doesn't get it right, I'll give it another shot this evening.

    It seems like a set-based solution should be possible, but no guarantees about the speed.

The final set should include the breaks from the Code column plus a subset of the "A" duplicates. If I'm on the right track, the definition of that subset eliminates any duplicate that was matched to a rowID less than the rowID of any other duplicate. Although this seems iterative, I still think someone can get there. I'm posting this thought because work has gotten busy and I may not be able to follow up tonight. Good hunting, Anzio and Garadin!!

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Ok, so I've given up on the old method and come up with this. Not set based :crying:. Hopefully I didn't completely bugger up my cursor syntax; I had to look it up several times. Dan, if you could, let us know how both of these solutions work, time- and consistency-wise.

    Warning to sensitive eyes, RBAR follows :hehe:.

USE tempdb;
SET NOCOUNT ON;
GO

IF OBJECT_ID ('tempdb..#TempTable','U') IS NOT NULL
BEGIN
    DROP TABLE #TempTable
END

IF OBJECT_ID ('tempdb..#Deleted','U') IS NOT NULL
BEGIN
    DROP TABLE #Deleted
END

CREATE TABLE #TempTable
(
    RowID   INT PRIMARY KEY CLUSTERED,
    RowKey  VARCHAR(1000),
    Locale  VARCHAR(10),
    Code    CHAR(1),
    GroupID INT NULL
)
GO

INSERT #TempTable(RowID, RowKey, Locale, Code)
SELECT 1,'1','en','C' UNION ALL
SELECT 2,'1','de','C' UNION ALL
SELECT 3,'1','zh','A' UNION ALL
SELECT 4,'2','en','A' UNION ALL
SELECT 5,'2','en','A' UNION ALL
SELECT 10,'1','zh','A' UNION ALL
SELECT 11,'1','zh','A' UNION ALL
SELECT 15,'1','zh','C' UNION ALL
SELECT 16,'1','de','C' UNION ALL
SELECT 20,'1','en','A' UNION ALL
SELECT 21,'3','en','A' UNION ALL
SELECT 22,'3','de','A' UNION ALL
SELECT 23,'3','en','A' UNION ALL
SELECT 24,'3','de','A' UNION ALL
SELECT 30,'1','en','C' UNION ALL
SELECT 31,'1','en','C'

DECLARE @GroupID     INT,
        @PrevCode    CHAR(1),
        @RowID       INT,
        @Locale      VARCHAR(10),
        @RowKey      VARCHAR(1000),
        @PrevGroupID INT

SET @PrevCode = (SELECT Code FROM #TempTable WHERE RowID = 1)
SET @GroupID  = 1

------------------------------------------------------------
-- Breaks your data out into groups so it can be scanned   --
------------------------------------------------------------
UPDATE #TempTable
SET @GroupID  = CASE WHEN Code <> @PrevCode THEN @GroupID + 1 ELSE @GroupID END,
    GroupID   = @GroupID,
    @PrevCode = Code
FROM #TempTable WITH (TABLOCKX, INDEX(1)) -- scan the clustered index in order
OPTION (MAXDOP 1)
------------------------------------------------------------

DECLARE @CurRows TABLE (
    RowKey VARCHAR(1000),
    Locale VARCHAR(10))

CREATE TABLE #Deleted(
    RowID INT)

DECLARE TTCursor CURSOR FORWARD_ONLY FOR
SELECT RowID, GroupID, RowKey, Locale
FROM #TempTable
WHERE Code = 'A'
ORDER BY RowID

OPEN TTCursor
FETCH NEXT FROM TTCursor INTO @RowID, @GroupID, @RowKey, @Locale

WHILE @@FETCH_STATUS = 0
BEGIN
    IF @GroupID <> @PrevGroupID
        DELETE FROM @CurRows        -- new code group: clear the tracked rows

    IF EXISTS (SELECT * FROM @CurRows WHERE RowKey = @RowKey AND Locale = @Locale)
    BEGIN
        INSERT INTO #Deleted(RowID) -- flag this row as a duplicate
        VALUES(@RowID)

        DELETE FROM @CurRows        -- delete all the rows from the tracking table,
                                    -- so matching restarts from this duplicate
    END

    INSERT INTO @CurRows(RowKey, Locale) -- repopulate the table with the current row
    VALUES(@RowKey, @Locale)

    SET @PrevGroupID = @GroupID
    FETCH NEXT FROM TTCursor INTO @RowID, @GroupID, @RowKey, @Locale
END

CLOSE TTCursor
DEALLOCATE TTCursor

SELECT * FROM #Deleted -- See just the ones to be deleted from the cursor.

SELECT * FROM #TempTable T -- See all rows matching criteria.
    LEFT JOIN #Deleted D ON T.RowID = D.RowID
    LEFT JOIN (SELECT GroupID, MIN(RowID) FirstGroupRow
               FROM #TempTable
               WHERE GroupID > 1
               GROUP BY GroupID) T2 ON T.RowID = T2.FirstGroupRow
WHERE D.RowID IS NOT NULL OR T2.FirstGroupRow IS NOT NULL

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation
