Need help on performance and integrity

  • Compromise Solution:

    The following code uses set-based logic to get the set of all valid AND invalid duplicates within each 'A' range. It populates the @worktable table variable with all of the dups and the rowid of the previous row with the same key/locale/code (the [preMax] column.)

    The compromise is to then update this table using a CASE statement that tests the value of the [premax] colunmn against the value of the @lastDup variable and resets the variable. This update may run faster than the While loop. Dan, please test it out for performance. I have no objections to the table variable being converted to a temp table if you take this to production.

    (Yes, Anzio, under the covers it's all loops. We're talking EXPLICIT loops like Steve/smunson said. The difference is that in while loops, the inserts/select queries are running multiple times, which is usually less efficient than running one query.)

    ------------------------------------------------------------------------------------------------------------------

    declare @worktable table (ARange int, rowID int primary key, rowKey int, locale char(2), code char(1), preMin int, preMax int,lastDup int)

    declare @example int

    declare @lastDup int

    set @example = 3

    drop table #temptable

    CREATE TABLE #TempTable

    (RowID int primary key

    ,RowKey varchar(1000)

    ,Locale varchar(10)

    ,Code char(1)

    )

    IF @example = 1

    begin

    INSERT INTO #TempTable

    SELECT 1 as rowID,'1','en','A' UNION ALL

    SELECT 2,'1','en','A' UNION ALL

    SELECT 5,'1','en','C' UNION ALL

    SELECT 7,'1','en','A' UNION ALL

    SELECT 8,'1','zh','A' UNION ALL

    SELECT 9,'1','en','A' UNION ALL -- test

    SELECT 10,'1','en','A' UNION ALL

    SELECT 11,'1','zh','A' UNION ALL

    SELECT 20,'1','en','D' UNION ALL

    SELECT 21,'1','en','D'

    order by rowID

    end

    IF @example = 2-- A's only

    begin

    INSERT INTO #TempTable

    SELECT 1 as RowID,'1' as RowKey,'en' as Locale,'A' as Code UNION ALL

    SELECT 2,'1','zh','A' UNION ALL

    SELECT 3,'1','zh','A' UNION ALL

    SELECT 4,'2','en','A' UNION ALL

    SELECT 5,'2','en','A' UNION ALL

    SELECT 10,'1','zh','A' UNION ALL

    SELECT 11,'1','zh','A'

    order by rowID

    end

    IF @example = 3

    begin

    insert into #tempTable

    SELECT 1 as rowID,'1','en','A' UNION ALL

    SELECT 2,'2','en','A' UNION ALL

    SELECT 3,'3','en','A' UNION ALL

    SELECT 4,'1','en','A' UNION ALL

    SELECT 5,'2','en','A' UNION ALL

    SELECT 6,'2','en','A' UNION ALL

    SELECT 13,'3','en','A' UNION ALL

    SELECT 14,'1','en','A' UNION ALL

    SELECT 15,'2','en','A' UNION ALL

    SELECT 23,'3','en','A' UNION ALL

    SELECT 24,'1','en','A' UNION ALL

    --SELECT 30,'2','en','A' UNION ALL

    SELECT 31,'2','en','A'

    order by rowID

    end

    select * from #temptable

    ;with cte1 as -- identify breaks

    (select t1.RowID,t1.code

    from #temptable t1

    where t1.code <> (select top 1 code from #temptable where RowID < t1.rowID order by rowID desc)

    )

    ,cte1a as -- include first and last row for ranging purposes

    (select t1.RowID, t1.code

    from #temptable t1

    where RowID = (select min(RowID) from #temptable t2)

    union all

    select * from cte1

    union all

    select max(RowID)+1, '~'

    from #temptable t1

    )

    ,cte2 as -- identify ranges

    (select c1.code,c1.RowID as ARange,isnull(min(c1a.RowID),max(c1.RowID)) as maxRow

    from cte1a c1

    left join cte1a c1a on c1a.code <> c1.code and c1a.RowID > c1.RowID

    where c1.code = 'A'

    group by c1.code,c1.RowID

    )

    ,cte3 as -- associate individual rows with their ranges

    (select ARange,t.*

    from #tempTable t

    join cte2 c on t.rowID between ARange and MaxRow

    where t.code = 'A'

    )

    ,cte4 as -- identify "raw" duplicates within ranges

    (select c1.*,min(c2.rowID) as preMin,max(c2.rowID) as preMax

    from cte3 c1

    join cte3 c2 on c2.ARange = c1.ARange

    and c2.rowKey = c1.rowKey

    and c2.locale = c1.locale

    and c2.code = c1.code

    and c2.rowID < c1.rowID

    group by c1.ARange, c1.rowID,c1.rowKey,c1.locale,c1.code

    )

    -- populate worktable with all dups

    insert into @workTable

    select Arange, RowId,RowKey,Locale,Code,preMin,PreMax, null as validDup

    from cte4 c

    -- flag valid dups

    set @lastDup = 0

    update @worktable

    set @lastDup = lastDup = case when(preMax) >= @lastdup then rowID

    else @lastDup

    end

    select rowID,rowKey,locale,code

    from @worktable

    where rowID = lastDup

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Example 1 in your "compromise solution" provides a result set consisting of RowID's 2, 9, & 10. Isn't RowID 5 supposed to be included?

    Also, we STILL don't have an answer from Dan Segalles about exactly what real-world problem this query is supposed to solve.... Dan ???

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • No, if you look at the example 1 data in the compromise solution, 5 is code "C". Change it to "A" and it will show up.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Row 5 should be included, as it's a code change. He doesn't want *dupes* of C, but if it's a code change, it is still included.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • GIMME A BREAK HERE !! :w00t:

    I was focused on showing the compromise solution to the duplicates only, since no one had any problems identifying the breaks. How about this?

    Also, you guys missed that rows 7 and 20 were also breaks, so nyah.....:cool:

    ------------------------------------------------------------------------------------------------------------------

    declare @worktable table (ARange int, rowID int primary key, rowKey int, locale char(2), code char(1), preMin int, preMax int,lastDup int)

    declare @breaks table (RowID int primary key,RowKey int, Locale char(2), code char(1))

    declare @example int

    declare @lastDup int

    set @example = 1

    drop table #temptable

    CREATE TABLE #TempTable

    (RowID int primary key

    ,RowKey varchar(1000)

    ,Locale varchar(10)

    ,Code char(1)

    )

    IF @example = 1

    begin

    INSERT INTO #TempTable

    SELECT 1 as rowID,'1','en','A' UNION ALL

    SELECT 2,'1','en','A' UNION ALL

    SELECT 5,'1','en','C' UNION ALL

    SELECT 7,'1','en','A' UNION ALL

    SELECT 8,'1','zh','A' UNION ALL

    SELECT 9,'1','en','A' UNION ALL -- test

    SELECT 10,'1','en','A' UNION ALL

    SELECT 11,'1','zh','A' UNION ALL

    SELECT 20,'1','en','D' UNION ALL

    SELECT 21,'1','en','D'

    order by rowID

    end

    select * from #temptable

    insert into @breaks

    select t1.RowID,t1.rowKey,t1.locale,t1.code

    from #temptable t1

    where t1.code <> (select top 1 code from #temptable where RowID < t1.rowID order by rowID desc)

    ;with cte1a as -- include first and last row for ranging purposes

    (select t1.RowID, t1.code

    from #temptable t1

    where RowID = (select min(RowID) from #temptable t2)

    union all

    select rowId,code from @breaks

    union all

    select max(RowID)+1, '~'

    from #temptable t1

    )

    ,cte2 as -- identify ranges

    (select c1.code,c1.RowID as ARange,isnull(min(c1a.RowID),max(c1.RowID)) as maxRow

    from cte1a c1

    left join cte1a c1a on c1a.code <> c1.code and c1a.RowID > c1.RowID

    where c1.code = 'A'

    group by c1.code,c1.RowID

    )

    ,cte3 as -- associate individual rows with their ranges

    (select ARange,t.*

    from #tempTable t

    join cte2 c on t.rowID between ARange and MaxRow

    where t.code = 'A'

    )

    ,cte4 as -- identify "raw" duplicates within ranges

    (select c1.*,min(c2.rowID) as preMin,max(c2.rowID) as preMax

    from cte3 c1

    join cte3 c2 on c2.ARange = c1.ARange

    and c2.rowKey = c1.rowKey

    and c2.locale = c1.locale

    and c2.code = c1.code

    and c2.rowID < c1.rowID

    group by c1.ARange, c1.rowID,c1.rowKey,c1.locale,c1.code

    )

    -- populate worktable with all dups

    insert into @workTable

    select Arange, RowId,RowKey,Locale,Code,preMin,PreMax, null as validDup

    from cte4 c

    -- flag valid dups

    set @lastDup = 0

    update @worktable

    set @lastDup = lastDup = case when(preMax) >= @lastdup then rowID

    else @lastDup

    end

    select rowID,rowKey,locale,code

    from @worktable

    where rowID = lastDup

    union all

    select * from @breaks

    order by rowID

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • GIMME A BREAK HERE !!

    I was focused on showing the compromise solution to the duplicates only, since no one had any problems identifying the breaks. How about this?

    Also, you guys missed that rows 7 and 20 were also breaks, so nyah.....

    NO BREAKS FOR YOU!

    Heh, I didn't really look too much at it, I've been slammed today, this is one of the only threads I've even semi-checked, I just glanced at 5 when Steve pointed it out.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Garadin (10/30/2008)

    NO BREAKS FOR YOU!

    Tough crowd today....

    Hey! I got the quote to work!

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Dan, any updates for us on the performance and integrity of any of the proposed solutions? I believe there are at least 3 right now (Mine, anzio's and bhovious'... I can't remember if Steve withdrew his or not) waiting on testing.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • I've been lurking on this topic, and I'd be interested in what the final outcome is also.

  • I'm still waiting for a complete and unambiguous description of the problem, as well as a good description of exactly what real-world problem the solution to the query problem is supposed to solve. I have yet to see either one. I've seen some things get close on the description, but in every case, details are either missing or are glossed over sufficiently to leave enough doubt and gaps to drive an 18-wheeler through. At this point, I suspect the OP has the solution only because one responder just happened to get it right, and is just no longer interested in this any more. Apparently, that's nothing new around here...

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Did you see my breakdown of the logic involved? I think it's pretty unambiguous, and covers all scenarios. The logic behind it is actually not as complicated as it initially seemed. Now, as to the second part... I haven't the slightest still ;).

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • How true....

  • Unfortunately, your example from several pages back conflicts with a statement by the OP that no range contains a duplicate, and your example starts out with one, unless he meant that a range has to start at row 2, or that such is an exception. Also, you did a great job describing the reason why any given row failed to qualify to be marked, but your description does NOT specify exactly how to determine that a given row is a duplicate - it says only go back to the last duplicate. Unfortunately, that leaves a host of options not covered, such as what if there is no previous duplicate in that set of A's? I might well be making this more complex than it really is, and now having reviewed your description, I think I actually know what the desired result is, but there are all kinds of problems one can run into that aren't explicitly covered. One needs to know how to handle each and every possibility.

    Just a thought, but why would a set of data that starts with rows 1 thru 3 as duplicates, having only rows 2 and 3 be marked, and yet, if row 4 were C and then rows 5 thru 7 were 'A' dupes, all 3 of those dupes would be marked? I'm still trying to concoct ANY kind of scenario where that kind of lack of consistency is an essential part of anything other than some bizarre logic puzzle. At this point, I can't even remember if you have to identify a break between a C and a D, and your example didn't specify on that either.

    Steve

    (aka smunson)

    :):):)

    Garadin (11/3/2008)


    Did you see my breakdown of the logic involved? I think it's pretty unambiguous, and covers all scenarios. The logic behind it is actually not as complicated as it initially seemed. Now, as to the second part... I haven't the slightest still ;).

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • smunson (11/3/2008)


    Unfortunately, your example from several pages back conflicts with a statement by the OP that no range contains a duplicate, and your example starts out with one, unless he meant that a range has to start at row 2, or that such is an exception. Also, you did a great job describing the reason why any given row failed to qualify to be marked, but your description does NOT specify exactly how to determine that a given row is a duplicate - it says only go back to the last duplicate. Unfortunately, that leaves a host of options not covered, such as what if there is no previous duplicate in that set of A's? I might well be making this more complex than it really is, and now having reviewed your description, I think I actually know what the desired result is, but there are all kinds of problems one can run into that aren't explicitly covered. One needs to know how to handle each and every possibility.

    Just a thought, but why would a set of data that starts with rows 1 thru 3 as duplicates, having only rows 2 and 3 be marked, and yet, if row 4 were C and then rows 5 thru 7 were 'A' dupes, all 3 of those dupes would be marked? I'm still trying to concoct ANY kind of scenario where that kind of lack of consistency is an essential part of anything other than some bizarre logic puzzle. At this point, I can't even remember if you have to identify a break between a C and a D, and your example didn't specify on that either.

    Steve

    (aka smunson)

    :):):)

    Garadin (11/3/2008)


    Did you see my breakdown of the logic involved? I think it's pretty unambiguous, and covers all scenarios. The logic behind it is actually not as complicated as it initially seemed. Now, as to the second part... I haven't the slightest still ;).

    While I think I could answer all this and continue to explain it, it's all somewhat of an exercise in futility as I'm not even sure the OP is still with us and I think the bigger issue you want to comprehend is how this could be useful in any practical application... which all my explanations of logic would nothing to clarify. :hehe:

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Seth,

    I'm glad you came to that conclusion. I gave up my attempt long ago because it was becoming abundantly clear that the OP was just not going to "give it up" as to what this was all about, and he also just kept trying to explain without ever really being clear about what the final objective was (pardon my politics, but that reminds me of a certain pair of prominent liberal democrats). Given he needed to change his desired result several times, it just seemed that we were doing all the work and he was doing the absolute minimum necessary to get help. I try not to get too involved in that kind of effort.

    Steve

    (aka smunson)

    :):):)

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

Viewing 15 posts - 91 through 105 (of 115 total)

You must be logged in to reply to this topic. Login to reply