Need help on performance and integrity

  • smunson (11/3/2008)



    Just a thought, but why would a set of data that starts with rows 1 thru 3 as duplicates, having only rows 2 and 3 be marked, and yet, if row 4 were C and then rows 5 thru 7 were 'A' dupes, all 3 of those dupes would be marked? I'm still trying to concoct ANY kind of scenario where that kind of lack of consistency is an essential part of anything other than some bizarre logic puzzle. At this point, I can't even remember if you have to identify a break between a C and a D, and your example didn't specify on that either.

    Steve, 5 would be marked as a break from C back to A, not as a dup.

    I was surprised to see the sudden burst of traffic on this. It's very tempting to unsubscribe here, but I'm going to stick around a while longer hoping to hear solid confirmation that one of the solutions was correct. I don't think we'll ever get a solid explanation of the "why" behind this pretzel logic. However, we could solve the issue of performance among using the million-row generation.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • Heh... I wanna know who ever said that "set based" meant "everything done in one query... 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • HI.

    I too, would Love to know why you would do this, but I had more fun writing the solution that performs reasonably well on a million rows. Maybe it is just a "problem" with sites like these....there are people on here because they SQL is the bigger part of their daily jobs and then...

    there are those who have a problem.

    I get the impression the OP was using non-DB logic to solve a problem and got stuck in that, and therefore did not know how to explain the overall problem, just the step he got stuck at.

    my 0.02 ZAR

  • HI.

    I too, would Love to know why you would do this, but I had more fun writing the solution that performs reasonably well on a million rows. Maybe it is just a "problem" with sites like these....there are people on here because they SQL is the bigger part of their daily jobs and then...

    there are those who have a problem.

    I get the impression the OP was using non-DB logic to solve a problem and got stuck in that, and therefore did not know how to explain the overall problem, just the step he got stuck at.

    BTW Jeff, where you able to use the random data generator?

    my 0.02 ZAR

  • AnzioBake (11/3/2008)


    BTW Jeff, where you able to use the random data generator?

    Yep... made a couple of changes to it to make it run just a little faster... give it a try...

    SELECT TOP 1000000

    IDENTITY(INT,1,1) AS RowID,

    ABS(CHECKSUM(NEWID()))%2+1 AS RowKey,

    CASE WHEN ABS(CHECKSUM(NEWID()))%30 < 16 THEN 'EN'

    WHEN ABS(CHECKSUM(NEWID()))%30 > 24 THEN 'ZH'

    ELSE 'DE'

    END AS Locale,

    CASE WHEN ABS(CHECKSUM(NEWID()))%10 < 7 THEN 'A'

    ELSE 'C'

    END AS Code

    INTO dbo.SSCData

    FROM Master.dbo.SysComments sc1

    CROSS JOIN Master.dbo.SysComments sc2

    Sorry I haven't spent much time on this post, yet... I'll be back. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Nice....That is an interesting way to get a random number i.e. newID combined with checksum()....

    I like it

  • You can thank Matt Miller for that little gem. I used to use RAND(NEWID), but CHECKSUM is quite a bit faster. Either way, don't need a loop because NEWID does provide a random number. You just use CHECKSUM or RAND to do a conversion. CHECKSUM just happens to return an INT which makes it scream.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Just beware that CHECKSUM can return a negative value.

    Use ABS(CHECKSUM(NEWID())) to get positive random values.


    N 56°04'39.16"
    E 12°55'05.25"

  • Heh... thanks, Peter... I put that in my code but forgot to mention it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • If anyone does do any testing of these methods, please post the results to this thread. I'm curious as to the outcome of this. Otherwise, I'll try to get something together this weekend.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Hey Seth,

    Work got busy so I had to leave this behind. Please let us know if you pursue it. I don't expect my cte-on-cte solution to be the winner, but the update approach should run quicker than the while loop.

    Let me ask a question about set-based solutions based on a comment Jeff made earlier. Although doing everything in one query may not be the fastest solution to a problem, isn't it theoretically possible to use a single query (with subqueries and/or ctes) to solve a problem that has a set-based solution?

    Bob

    P.S. Thanks to all for ABS(CHECKSUM(NEWID())). That one went immediately to the utility belt.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

Viewing 11 posts - 106 through 115 (of 115 total)

You must be logged in to reply to this topic. Login to reply