Finding Sequences (Need help ASAP)

  • Paul White NZ (10/13/2010)


    Craig Farrell (10/8/2010)


    Craig,

    You can't use joins in a Quirky Update.

    Paul

Can't, or shouldn't? I did get the correct results here when I tested the code, with no errors. What are the possible pitfalls of using the join, besides having clustered index sorting problems?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Sorry Paul, that was a bit abrupt and you deserve better than that; your explanations are always concise and instructive.

    I would agree in general that quirky update should not be joined under normal circumstances, but in this case it's a self join with only a single clustered index in control and the second join is a scan by nature. By using the self-join against a completely controlled object, the #tmp, the ordering doesn't falter as you might expect from other methods.

    I've played with this technique for creating data islands (to steal Jeff's term, it's better than mine) in 2k and 2k5 (I admit to having little practical experience in 2k8 besides goofing off at home), and have never had an issue as long as I control the entire table environment locally. If I needed to do more selective row computations, or external tables, I would usually include more columns in the #tmp to pre-select the data before igniting the quirky update.

    Now, I haven't done the deep root level analysis this would deserve as a dissertation, but my understanding is that the scan, and thus the merge join off the clustered scan, will not modify the ordering because the clustered scan from the primary input won't alter. However, if I had left in a second index with a flipped ordering and it decided to use that instead, I'd have been in trouble.
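
    To make the discussion concrete, here is a hypothetical sketch of the kind of self-joined variant being described. All table and column names are invented for illustration; this is not a recommendation, since it deliberately breaks the no-join rule under discussion:

    -- HYPOTHETICAL sketch only: a quirky-update variant that self-joins a
    -- fully controlled #tmp table to number data islands. Joining is risky;
    -- the standard method forbids it.
    CREATE TABLE #tmp
    (
        SeqNo      INT NOT NULL PRIMARY KEY CLUSTERED, -- the ONLY index: 1..N, no gaps
        PartNumber INT NOT NULL,
        IslandNo   INT NULL
    );

    DECLARE @Island INT = 0;

    UPDATE  t
    SET     @Island  = CASE WHEN p.SeqNo IS NULL              -- first row
                              OR t.PartNumber <> p.PartNumber + 1 -- gap found
                            THEN @Island + 1                  -- start a new island
                            ELSE @Island                      -- same island
                       END,
            IslandNo = @Island
    FROM    #tmp AS t
    LEFT JOIN #tmp AS p                -- self-join to the previous row
            ON p.SeqNo = t.SeqNo - 1
    OPTION  (MAXDOP 1);                -- a serial plan is essential to the technique

    The point being debated is that the optimizer is free to choose a plan (or a second index) that breaks the clustered-scan ordering the technique silently depends on.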

    But, Jeff's come up with something I haven't thought of before and haven't had time to puzzle through and grok it, and it seems more efficient for creating data islands. I may have to switch my methodology. 🙂


    - Craig Farrell


  • Craig Farrell (10/14/2010)


    Sorry Paul, that was a bit abrupt and you deserve better than that; your explanations are always concise and instructive.

    No need to apologise. My reply was shorter than usual because I was pressed for time and just scanned this thread while responding to another of Craig's threads.

    The Quirky Update has a number of rules (as you know) and one of the main ones is that there cannot be a join operation. It is frequently possible to achieve the desired 'running total' results while breaking one or more QU 'rules', but I think it puts Jeff in an awkward position: Someone else may see your code, adapt it for their own use, and then blame the QU when it all blows up horribly next week.

    I think it's only fair that if one uses the QU method, one should stick to the rules. If you do stray from those, or otherwise modify the method, it might be best not to call it Quirky Update, or at least put a very clear disclaimer on it.

    My personal preference is not to use QU, but if I did, I would use the 'safety mechanism' I suggested (see the discussion on Jeff's QU article) to ensure that if the QU assumptions do not hold, a runtime error occurs and no data is changed.
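
    The linked post has the exact code; purely as an illustration, the general shape of such a safety mechanism might look like this (column and table names invented):

    -- Sketch of the general idea only; see the linked discussion for the real code.
    -- SeqNo is assumed to be the clustered key, numbered 1..N with no gaps.
    DECLARE @Expected INT = 0,
            @Total    INT = 0;

    UPDATE  t
    SET     @Expected    = CASE WHEN SeqNo = @Expected + 1
                                THEN @Expected + 1
                                ELSE 1/0   -- rows arrived out of order: force a
                           END,            -- runtime error so no data is changed
            @Total       = @Total + Amount,
            RunningTotal = @Total
    FROM    #Work AS t WITH (TABLOCKX)
    OPTION  (MAXDOP 1);

    If the engine ever processes a row out of clustered-key order, the CASE falls through to a divide-by-zero, the statement raises an error, and the whole update rolls back.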

    Paul

  • Paul White NZ (10/14/2010)


    It is frequently possible to achieve the desired 'running total' results while breaking one or more QU 'rules', but I think it puts Jeff in an awkward position: Someone else may see your code, adapt it for their own use, and then blame the QU when it all blows up horribly next week.

    I think it's only fair that if one uses the QU method, one should stick to the rules. If you do stray from those, or otherwise modify the method, it might be best not to call it Quirky Update, or at least put a very clear disclaimer on it.

    Part of the issue, I guess, is that I don't see the QU as being Jeff's method, as I was using it long before I ever saw his article, so I'm not sure how that puts Jeff in an awkward position. Mind, I didn't know it under the name "Quirky Update", and that's a simple thing to fix by not using that descriptive name again so Jeff doesn't get associated with some of my wilder, not-so-safe-if-not-done-just-so, coding mechanics. 🙂

    I still do a lot of coding back against 2k, due to politics, cheap managers, and any other number of reasons, so I still use the serial update (Another name I know it as. :w00t: ) a lot when there's no other easy outs besides while or cursor loops.


    - Craig Farrell


  • Craig Farrell (10/14/2010)


    Part of the issue, I guess, is that I don't see the QU as being Jeff's method, as I was using it long before I ever saw his article, so I'm not sure how that puts Jeff in an awkward position. Mind, I didn't know it under the name "Quirky Update", and that's a simple thing to fix by not using that descriptive name again so Jeff doesn't get associated with some of my wilder, not-so-safe-if-not-done-just-so, coding mechanics. 🙂

    Ah well that's different then - sorry, I didn't know the history there.

    I still do a lot of coding back against 2k, due to politics, cheap managers, and any other number of reasons, so I still use the serial update (Another name I know it as. :w00t: ) a lot when there's no other easy outs besides while or cursor loops.

    I see. Yes SQL Server 2000 is a bit tricky - no SQLCLR (my favourite running-totals tool) and Hugo Kornelis' fine Set-Based Iteration method uses a ranking function, so that's out too.

    For 2005 onward, consider adding the 'safety code' I mentioned - you can find it here: http://www.sqlservercentral.com/Forums/FindPost980118.aspx

    One more interesting thing about the 'serial update': if you do a nested loops join to the inserted or deleted pseudo-tables (inside a trigger, obviously) you'll find the running-total runs exactly backwards. There are other instances like this where side-effects of engine implementation details can trip the method up when a join is involved. Just FYI.

  • Paul White NZ (10/14/2010)


    For 2005 onward, consider adding the 'safety code' I mentioned - you can find it here: http://www.sqlservercentral.com/Forums/FindPost980118.aspx

    Well, far be it from me to turn down a damned good idea. 🙂 I'll do my best to incorporate that going forward. It's a nice safety test to make sure nothing goes sideways.

    One more interesting thing about the 'serial update': if you do a nested loops join to the inserted or deleted pseudo-tables (inside a trigger, obviously) you'll find the running-total runs exactly backwards. There are other instances like this where side-effects of engine implementation details can trip the method up when a join is involved. Just FYI.

    I've never actually needed to do that in a trigger, so I've never run into that. That's... odd. Do you know if it still occurs if you force a merge join instead between the inserted/deleted, or is that only when you're running against exterior tables to the trigger?


    - Craig Farrell


  • Craig Farrell (10/14/2010)


    I've never actually needed to do that in a trigger, so I've never run into that. That's... odd. Do you know if it still occurs if you force a merge join instead between the inserted/deleted, or is that only when you're running against exterior tables to the trigger?

    As I recall, it was a join to the 'external' table. It was a while ago so I can't pretend to remember all the details.

    BTW I'm finishing up a SQLCLR solution to the problem on this thread (original and Jeff's test rig), which is astonishingly fast. I'll post it up later today for anyone that's interested.

  • Is your range static (i.e. %%%%%%%%%%0001-%%%%%%%%%%003), or will it change on some condition?

  • rishabh.upkr (10/14/2010)


    Is your range static (i.e. %%%%%%%%%%0001-%%%%%%%%%%003), or will it change on some condition?

    Can you explain what you're talking about just a bit more?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Craig Farrell (10/14/2010)


    Paul White NZ (10/14/2010)


    For 2005 onward, consider adding the 'safety code' I mentioned - you can find it here: http://www.sqlservercentral.com/Forums/FindPost980118.aspx

    Well, far be it from me to turn down a damned good idea. 🙂 I'll do my best to incorporate that going forward. It's a nice safety test to make sure nothing goes sideways.

    It's not only a damned fine safety test that's more efficient than any other way of checking that I've seen, it also forces runs that might otherwise fail into running, because of the requirements it places on the clustered index. It's a bit like a super index hint without the slowdown of using one of the two clustered index hints (Index 0 or 1).

    --Jeff Moden



  • Craig Farrell (10/14/2010)


    Paul White NZ (10/14/2010)


    It is frequently possible to achieve the desired 'running total' results while breaking one or more QU 'rules', but I think it puts Jeff in an awkward position: Someone else may see your code, adapt it for their own use, and then blame the QU when it all blows up horribly next week.

    I think it's only fair that if one uses the QU method, one should stick to the rules. If you do stray from those, or otherwise modify the method, it might be best not to call it Quirky Update, or at least put a very clear disclaimer on it.

    Part of the issue, I guess, is that I don't see the QU as being Jeff's method, as I was using it long before I ever saw his article, so I'm not sure how that puts Jeff in an awkward position. Mind, I didn't know it under the name "Quirky Update", and that's a simple thing to fix by not using that descriptive name again so Jeff doesn't get associated with some of my wilder, not-so-safe-if-not-done-just-so, coding mechanics. 🙂

    I still do a lot of coding back against 2k, due to politics, cheap managers, and any other number of reasons, so I still use the serial update (Another name I know it as. :w00t: ) a lot when there's no other easy outs besides while or cursor loops.

    Heh, heh.... Man, I love it and I'm right there with you. Like I said in the article, the QU isn't my idea. People have been using it in Sybase since before SQL Server was a gleam in MS's eye. I just ended up writing about it in a very pro QU way and that's why I take a lot of heat about it.

    Can I take it that some of your "wilder, not-so-safe-if-not-done-just-so, coding mechanics" still haven't caused you a problem with the QU? It's amazing how well things work when you're trying to make something work instead of writing code to make something break. 😀

    Mind, I didn't know it under the name "Quirky Update"...

    Me either. I didn't coin the phrase "Quirky Update". Robyn Page did that on the "Simple Talk" forum. I used to just call it a "running total update" but it confused people when I used it to describe a process that wasn't actually a running total so I started using the term "Quirky Update".

    --Jeff Moden



  • CELKO (10/14/2010)


    Jeff Moden (10/13/2010)


    Very cool, Joe. I'm thinking that you didn't test it...

    Msg 207, Level 16, State 1, Line 18
    Invalid column name 'part_id'.
    Msg 245, Level 16, State 1, Line 2
    Conversion failed when converting the varchar value 'NULL' to data type int.

    ... but it runs quite fast. The other problem is that your code misses the 50,000 to 60,000 single entry because you only check the first column of the data, but I think there might be a workaround for that.

    Here is my sample data set. It runs fine for me:

    CREATE TABLE Gaps (part_id VARCHAR(20));

    INSERT INTO Gaps
    VALUES
        ('PPCI000000001'), ('PPCI000000002'), ('PPCI000000003'), ('PPCI000000005'),
        ('PPCI000000010'), ('PPCI000000011'), ('PPCI000000012'), ('PPCI000000013'),
        ('PPCI000000014'), ('PPCI000000015'), ('PPCI000000016'), ('PPCI000000017'),
        ('PPCI000000100'), ('PPCI000000101'), ('PPCI000000102'), ('PPCI000000103'),
        ('PPCI000000104'), ('PPCI000000105'), ('PPCI000000106'), ('PPCI000000167'),
        ('PPCI000000197'),
        ('XXX000000001'),  ('XXX000000002'),  ('XXX000000003'),  ('XXX000000005'),
        ('XXX000000010'),  ('XXX000000011'),  ('XXX000000012'),  ('XXX000000013'),
        ('XXX000000014'),  ('XXX000000015'),  ('XXX000000016'),  ('XXX000000017'),
        ('XXX000000100'),  ('XXX000000101'),  ('XXX000000102'),  ('XXX000000103'),
        ('XXX000000104'),  ('XXX000000105'),  ('XXX000000106'),  ('XXX000000167'),
        ('XXX000000197');

    WITH Splits AS
    (SELECT part_id, -- nest REPLACEs inside each other to strip the digits
            REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
                REPLACE(part_id, '0', ''),
                '1', ''), '2', ''), '3', ''), '4', ''), '5', ''),
                '6', ''), '7', ''), '8', ''), '9', '') AS alpha_prefix,
            CAST(REPLACE(REPLACE(part_id, 'XXX', ''), 'PPCI', '') AS INTEGER) AS num_postfix
       FROM Gaps),
    Sequenced AS
    (SELECT part_id, alpha_prefix, num_postfix,
            num_postfix
            - ROW_NUMBER() OVER (PARTITION BY alpha_prefix ORDER BY num_postfix)
              AS seq_in_part_grp
       FROM Splits)
    SELECT alpha_prefix, MIN(num_postfix) AS run_start, MAX(num_postfix) AS run_end
      FROM Sequenced
     GROUP BY alpha_prefix, seq_in_part_grp
     ORDER BY alpha_prefix, seq_in_part_grp;

    Yep.... runs fine if you have the right column names. The original test data folks provided didn't use the same column names you did and you didn't provide any test data until now.

    Your example also doesn't solve the original problem, although I think the way it does solve the problem it solves is very, very cool.

    Thanks for the feedback, Joe.

    --Jeff Moden



  • The SQLCLR test results are in:

    Craig's test data (267,150 rows): 360ms

    Jeff's test data (889,892 rows): 1386ms

    Test rig and csv data for both runs attached, along with C# source code.

    This SQLCLR solution uses Adam Machanic's Query Parallelizer engine, which you can find here: http://sqlblog.com/files/folders/beta/entry29021.aspx

    The solution is 2008-only in this version.

    Paul

  • Paul White NZ (10/17/2010)


    The SQLCLR test results are in:

    Craig's test data (267,150 rows): 360ms

    Jeff's test data (889,892 rows): 1386ms

    Test rig and csv data for both runs attached, along with C# source code.

    This SQLCLR solution uses Adam Machanic's Query Parallelizer engine, which you can find here: http://sqlblog.com/files/folders/beta/entry29021.aspx

    The solution is 2008-only in this version.

    Paul

    Paul, you must be slipping... those times seem a bit slow for your CLR code. :-D :w00t: :-P

    What makes this CLR code 2008 specific?

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (10/17/2010)


    Paul, you must be slipping... those times seem a bit slow for your clr code...

    To put the timings in context, the T-SQL solutions ran for about 30-45 seconds on the same hardware. The major part of the time taken by the SQLCLR code is actually taken by T-SQL: reading the data from the source tables 😛

    WayneS (10/17/2010)


    What makes this CLR code 2008 specific?

    It's a side effect of my decision to code the test rig to run in tempdb. If anyone has a burning need to try it out on 2005, please let me know, and I'll see what I can do. The client assembly will also need to be catalogued as UNSAFE for 2005 though, which isn't ideal, but hey who's still using 2005 anyway :laugh:
