Finding Sequences (Need help ASAP)

  • Craig Farrell (10/6/2010)


    WayneS (10/6/2010)


    craig-404139 (10/6/2010)


    Question: when a prefix changes, do the numbers restart at 1? Or do they continue?

    (I've got some real nifty, high-speed code (thanks to Jeff!) to quickly identify gaps - but it's designed to work on numbers. Will take some mods to work with the prefix, but should be simple.)

    Hey Wayne, if you can modify to match the Pattern/NumPiece code above I'd be curious to see that result. The Pattern splitter is pretty zippy and will return a group/numeric resultset.

    Okay... enough teasing. But, considering that the OP mentioned that all codes for the same prefix will be the same length, it would be a better test to follow that rule.

    if object_id('tempdb..#GapTest') IS NOT NULL DROP TABLE #GapTest;

    -- make a temp table so that we can use an index...

    CREATE TABLE #GapTest (

    Prefix varchar(20),

    Suffix int,

    PRIMARY KEY CLUSTERED(prefix,suffix));

    -- put the data into the temp table, separating the prefix and suffix

    INSERT INTO #GapTest

    SELECT Prefix = left(TextID, PatIndex('%[0-9]%', TextID)-1),

    Suffix = convert(int, SubString(TextID, PatIndex('%[0-9]%', TextID), 20))

    FROM #TestHarness;

    -- get the results

    SELECT Prefix,

    GapStart = (SELECT ISNULL(MAX(lo.Suffix),0)+1

    FROM #GapTest lo

    WHERE lo.Prefix = hi.Prefix

    AND lo.Suffix < hi.Suffix),

    GapEnd = hi.Suffix - 1

    FROM #GapTest hi

    WHERE hi.Suffix NOT IN (SELECT Suffix + 1

    FROM #GapTest g1

    WHERE g1.Prefix = hi.Prefix);

    @Craig Ferrell.... 2.4 seconds!

    @jeff - when you read this... THANKS!

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • I have tweaked Craig's test data generator so that the number of digits is fixed for a given prefix. In my tests below, generated just over 2 million rows. Note that I've added a primary key index to the TextID column (which I believe is the case with the actual data).

    CREATE TABLE #TestHarness (

    TextID VARCHAR(200) NOT NULL PRIMARY KEY CLUSTERED

    )

    --TRUNCATE TABLE #TestHarness

    INSERT INTO #TestHarness(TextID)

    SELECT PREFIX.Code + REPLACE(STR(T.N, PREFIX.Digits), ' ', '0')

    FROM Tally T

    CROSS JOIN (

    SELECT REPLICATE(CHAR(64 + T2.N), T1.N), (T1.N + T2.N) % 4 + 6

    FROM Tally T1 CROSS JOIN Tally T2

    WHERE T1.N BETWEEN 1 AND 6

    AND T2.N BETWEEN 1 AND 10

    ) PREFIX(Code, Digits)

    WHERE (T.N < 50000)

    AND NOT (T.N % 5 = 0 OR T.N % 7 = 0)

    Using this test data, I compared my query below with Wayne's query. I realise that the two queries are not strictly comparable since mine returns contiguous sequences, whereas Wayne's return gaps, but the OP seemed to be interested in both.

    ;WITH cteParse(prefix, num, rn) AS (

    SELECT

    SUBSTRING(TextID, 1, PATINDEX('%[0-9]%', TextID) - 1),

    CAST(SUBSTRING(TextID, PATINDEX('%[0-9]%', TextID), 10) AS int),

    ROW_NUMBER() OVER (ORDER BY TextID)

    FROM #TestHarness

    )

    SELECT

    prefix, MIN(num), MAX(num)

    FROM cteParse

    GROUP BY prefix, num - rn

    ORDER BY prefix, num - rn

    On my test server, my query completed in 11 seconds, whereas Wayne's query completed in about 22 seconds (14 seconds to build temporary table, 8 seconds for the final query). [EDIT: With about 260,000 rows of test data, the execution times are similar for the two queries on my server: about 3 seconds.]

    Below is a variation of the above query that returns the start and end IDs of contiguous sequences, and is almost as fast as the query above.

    ;WITH cteParse(id, prefix, num, ndig, rn) AS (

    SELECT

    TextID,

    SUBSTRING(TextID, 1, PATINDEX('%[0-9]%', TextID) - 1),

    CAST(SUBSTRING(TextID, PATINDEX('%[0-9]%', TextID), 10) AS int),

    LEN(TextID) - PATINDEX('%[0-9]%', TextID) + 1,

    ROW_NUMBER() OVER (ORDER BY TextID)

    FROM #TestHarness

    )

    SELECT

    prefix + RIGHT(REPLICATE('0', ndig) + CAST(MIN(num) AS varchar(10)), ndig) AS [From],

    CASE WHEN (COUNT(*) > 1) THEN

    prefix + RIGHT(REPLICATE('0', ndig) + CAST(MAX(num) AS varchar(10)), ndig)

    ELSE '' END AS [To]

    FROM cteParse

    GROUP BY prefix, num - rn, ndig

    ORDER BY prefix, num - rn

  • Andrew - I'll test out your modifications tonight. Interesting the large difference in time between our systems.

    Thanks for making the change to use the same # of characters for a given prefix.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • So... without me running all the code presented so far, what has been decided? Does the numbering restart for each prefix or not?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (10/7/2010)


    So... without me running all the code presented so far, what has been decided? Does the numbering restart for each prefix or not?

    Haven't heard back from the OP yet on the restarting.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (10/7/2010)


    Jeff Moden (10/7/2010)


    So... without me running all the code presented so far, what has been decided? Does the numbering restart for each prefix or not?

    k from the OP yet on the restarting.

    It does not. each prefix identifies a unique 'set' of received data. They have no relation to each other , other than they are documents for the same case.

    Does that answer your question?

  • Thanks for the help guys, but I am 100% lost AND my requirements just changed. FML. I will update soon with new test data (if I don't pill my hair out first)

  • Ok. Because of this time crunch and my frustration with this, let's try this a different way. I need to compare the controlnumber to the endno and find the sequences. (Realize I didn't have correct information in my original post, because I didn't know at that time)

    The controlnumber and endno are dealing with ranges of documents. So for example line one

    PPCI000000040 is the starting number and PPCI000000044 is the ending number. Meaning that is a 4 page document. PPCI000000045 is starting and PPCI000000048 is ending, meaning that is a 3 page doc. Etc...Since we are loooking for missing documents I need to compare the endno to the controlnumber and return the range if the last endno is not one less than the next control number.

    Ideally I need to see something like this:

    PPCI000000040 - PPCI000000053

    PPCI000000057 - PPCI000000102

    PPCI000000115 - PPCI000000197

    etc...

    Let's forget about the variable range text prefix for now and just work with this data (I'm getting wayyyy to confused with the previous code) If possible please write this to apply to my sample dataset.

    Please let me know if you need clarification, I tried to explain it the best way possible and provide enough dummy data.

    --Create test table

    create table gaps2

    (controlnumber varchar(255)

    ,endno varchar(255))

    Code removed per request.

  • craig-404139 (10/6/2010)


    Please skip to post Post #1000624 for updated requirements.

    It's actually post # 1000643. Also, if you click the post number, you will get a URL that you can then paste in so that it's clickable. Just be sure to enclose in in [ url ] and [ / url ] tags. (remove all spaces between the brackets!) Like this: http://www.sqlservercentral.com/Forums/FindPost1000643.aspx. Just quote my message to see how it's done.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (10/7/2010)


    craig-404139 (10/6/2010)


    Please skip to post Post #1000624 for updated requirements.

    It's actually post # 1000643. Also, if you click the post number, you will get a URL that you can then paste in so that it's clickable. Just be sure to enclose in in [ url ] and [ / url ] tags. (remove all spaces between the brackets!) Like this: http://www.sqlservercentral.com/Forums/FindPost1000643.aspx. Just quote my message to see how it's done.

    Thanks for the info. I edited my original post. This just isn't my morning.

  • Here's some code.

    Note that I kept in the prefix checking... I need to strip it out anyway, so I just kept it in.

    -- need to add some columns to the table.

    ALTER TABLE gaps2 ADD Prefix varchar(255),

    GroupNo int;

    -- change the controlnumber to not null and put a primary key on it.

    ALTER TABLE gaps2 ALTER COLUMN controlnumber varchar(255) NOT NULL;

    ALTER TABLE gaps2 ADD PRIMARY KEY CLUSTERED (controlnumber);

    -- declare and initialize some variables for the update statement.

    declare @Prefix varchar(255),

    @GroupNo int,

    @ControlNumber varchar(255),

    @LastNumber int;

    set @GroupNo = 0;

    set @Prefix = '';

    set @ControlNumber = '';

    set @LastNumber = 0;

    -- This form of the UPDATE statement has some particular rules.

    -- See Jeff Moden's article at http://www.sqlservercentral.com/articles/T-SQL/68467/

    -- for a complete discussion of how this works, and all of the rules for utilizing it.

    UPDATE t1

    SET @GroupNo = GroupNo = CASE WHEN @Prefix <> LEFT(controlnumber, PATINDEX('%[0-9]%', controlnumber)-1) THEN 1

    -- new prefix - start over

    WHEN CONVERT(INT, SUBSTRING(controlnumber, PATINDEX('%[0-9]%', controlnumber), 255)) = @LastNumber + 1 THEN @GroupNo

    -- continues the chain - use same group number

    ELSE @GroupNo + 1 -- increment the group number

    END,

    @Prefix = Prefix = LEFT(controlnumber, PATINDEX('%[0-9]%', controlnumber)-1), -- store the prefix so we can use it later

    @LastNumber = CONVERT(INT, SUBSTRING(endno, PATINDEX('%[0-9]%', endno), 255)), -- get what the last number is, to compare to the next number

    @ControlNumber = controlnumber -- anchor column

    FROM gaps2 t1 WITH (TABLOCKX) -- prevent others from using the table

    OPTION (MAXDOP 1); -- prevent parallelism

    SELECT Prefix, GroupNo, MIN(controlnumber), max(endno)

    FROM gaps2

    GROUP BY Prefix, GroupNo

    ORDER BY Prefix, GroupNo;

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • Wow, impressive. Can you explain what the code is doing? I've never seen things like :

    @GroupNo = GroupNo = case....

    @Prefix = Prefix = LEFT.....

    Looks like you are setting the variable, but why the extra words?

    Also the update statement is just updating variables, correct? No actual table data? I'll review the link you posted as well.

    I'll need to remove the table lock and option(maxdrop 1) options, is that going to create a problem?

    This is a huge help man, I appreciate it.

    ****edit***

    I see you added columns to the table, Can this be done without adding columns? I can't modify my production table(s). Also explains why the are like that. It's inserting into the newly created table. Can I just remove that bit of coding?

    @GroupNo = GroupNo = case....

    @Prefix = Prefix = LEF

  • craig-404139 (10/7/2010)


    Wow, impressive. Can you explain what the code is doing? I've never seen things like :

    @GroupNo = GroupNo = case....

    @Prefix = Prefix = LEFT.....

    Looks like you are setting the variable, but why the extra words?

    Also the update statement is just updating variables, correct? No actual table data? I'll review the link you posted as well.

    I'll need to remove the table lock and option(maxdrop 1) options, is that going to create a problem?

    This is a huge help man, I appreciate it.

    Well, like I commented in the code:

    -- This form of the UPDATE statement has some particular rules.

    -- See Jeff Moden's article at http://www.sqlservercentral.com/articles/T-SQL/68467/

    -- for a complete discussion of how this works, and all of the rules for utilizing it.

    If you remove the tablockx, option (maxdop 1), the CLUSTERED index or the anchor column, then DON'T USE THIS!!!!!!! Those are REQUIRED for this to work properly. Definitely review that link.

    The update statement is updating BOTH variables, and table data. It's doing the update in the order of the clustered index. By updating a value into a variable, I can utilize the variable in the next row.

    FYI, this has two nicknames: "quirky update" and "pseudo cursor". Once you understand what is going on, you'll understand both of these.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • craig-404139 (10/7/2010)


    I see you added columns to the table, Can this be done without adding columns? I can't modify my production table(s). Also explains why the are like that. It's inserting into the newly created table. Can I just remove that bit of coding?

    @GroupNo = GroupNo = case....

    @Prefix = Prefix = LEF

    Then just select the data into a #temp table, i.e. #gaps2, and change all the references of gaps2 to #gaps2.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • WayneS (10/7/2010)


    craig-404139 (10/7/2010)


    I see you added columns to the table, Can this be done without adding columns? I can't modify my production table(s). Also explains why the are like that. It's inserting into the newly created table. Can I just remove that bit of coding?

    @GroupNo = GroupNo = case....

    @Prefix = Prefix = LEF

    Then just select the data into a #temp table, i.e. #gaps2, and change all the references of gaps2 to #gaps2.

    Will do. I am playing with it all now. Also reading that article you linked.

Viewing 15 posts - 16 through 30 (of 91 total)

You must be logged in to reply to this topic. Login to reply