Finding Sequences (Need help ASAP)

  • Craig Farrell (10/8/2010)


    I had assumed, since 2k5 and 2k don't behave, but figured it was worth the question. I'm sure Wayne and the others will be able to get you a direct solution off the actual data. I'll just step back and watch the show unless we can do some form of CSV importing, but I'm not sure how much bandwidth you have to try to get that exported.

    lemme see if I can put something together. The more help the better. I am completely lost/frustrated with this whole thing.

  • CSV file attached.

    First column = id

    Second column = controlnumber

    Third column = endno

  • craig-404139 (10/8/2010)


    CSV file attached.

    First column = id

    Second column = controlnumber

    Third column = endno

    That worked. Alright, so we're looking at taking two unique values, figuring out the 'fill', and splitting off the pattern.

    @Wayne: You're right. With no numerics ever in the alpha code, yours runs faster. Against this test data my method is 7 seconds, yours is 5. Mind that's with a data return and using select vs. select. I like to keep things as easy as possible.

    @Craig-4: I should have a solution soonish. I'm getting null results from endno for some reason and I'm trying to track down the data burp.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (10/8/2010)


    craig-404139 (10/8/2010)


    CSV file attached.

    First column = id

    Second column = controlnumber

    Third column = endno

    That worked. Alright, so we're looking at taking two unique values, figuring out the 'fill', and splitting off the pattern.

    @Wayne: You're right. With no numerics ever in the alpha code, yours runs faster. Against this test data my method is 7 seconds, yours is 5. Mind that's with a data return and using select vs. select. I like to keep things as easy as possible.

    @Craig-4: I should have a solution soonish. I'm getting null results from endno for some reason and I'm trying to track down the data burp.

    I know about the 13 null rows. Sorry I forgot to remove them. You can discard those rows. It's an error in how I put this together.

  • Found them, the sample rows:

    2048205PPCI000737005NULL

    2048206PPCI000737009NULL

    2048207PPCI000999798NULL

    2048208PPCI001346238NULL

    2048209PPCI001350958NULL

    2048210PPCI001462905NULL

    2048211PPCI001620877NULL

    2048212PPCI001620882NULL

    2048213PPCI001620891NULL

    2048214PPCI001620899NULL

    2048215PPCI001620909NULL

    2048216PPCI001620957NULL

    2048217PPCI001620962NULL

    They appear to be single page docs. Is this expected in the data?


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (10/8/2010)


    Found them, the sample rows:

    2048205PPCI000737005NULL

    2048206PPCI000737009NULL

    2048207PPCI000999798NULL

    2048208PPCI001346238NULL

    2048209PPCI001350958NULL

    2048210PPCI001462905NULL

    2048211PPCI001620877NULL

    2048212PPCI001620882NULL

    2048213PPCI001620891NULL

    2048214PPCI001620899NULL

    2048215PPCI001620909NULL

    2048216PPCI001620957NULL

    2048217PPCI001620962NULL

    They appear to be single page docs. Is this expected in the data?

    Technically yes. But you can discard those rows or update the control = endno. Either way.

  • I think i'm on to something. The below code works if I strip out the PPCI prefix. (Used a update/replace statement on the test table) Runs in a few seconds. Now to modify it to work with the prefixes.....

    select d.controlnumber as start

    ,(select min(a.endno) as id

    from document as a

    left outer join document as b

    on a.endno = b.controlnumber - 1

    where b.controlnumber is null

    and a.endno >= d.endno

    ) as ending

    from document as d

    left outer join document as d1

    on d1.endno = d.controlnumber - 1

    where d1.endno is null

    order by d.controlnumber

    These are my initial results with the prefixes stripped out.

    000000001 - 000737004

    000737006- 000737008

    000737010- 000999797

    000999799- 001346237

    001346239- 001350957

    001350959- 001462904

    001462906- 001620876

    001620878- 001620881

    001620883- 001620890

    001620892- 001620898

    001620900- 001620908

    001620910- 001620956

    001620958- 001620961

    001620963- 002903970

  • This will sound strange but... are you sure you have gaps?

    Run this code:

    CREATE TABLE #MidPoint

    (RowID INT,

    Pattern VARCHAR(100),

    Startnum BIGINT,

    EndNum BIGINT

    )

    INSERT INTO #Midpoint

    select

    ROW_NUMBER() OVER (ORDER BY

    left(controlnumber, PatIndex('%[0-9]%', controlnumber)-1),

    convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20))

    ) AS RowNum,

    left(controlnumber, PatIndex('%[0-9]%', controlnumber)-1) AS Pattern,

    convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20)) AS StartNum

    ,ISNULL( convert(int, SubString(endno, PatIndex('%[0-9]%', endno), 20))

    , convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20))) --StartNum if null

    AS EndNum

    from

    gaps

    order by

    controlnumber

    SELECT

    p.*, p2.EndNum AS PreviousEndNum,

    CASE WHEN (p.startNum = p2.endnum + 1) THEN 'Filled' ELSE 'Gapped' END

    FROM

    #Midpoint AS p

    LEFT JOIN-- Link to the prior row

    #Midpoint AS p2

    ONp.RowID = p2.RowID + 1

    --WHEREp.StartNum <> p2.EndNum + 1

    order by

    p.RowID


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Did you delete the 'null' endno rows? If so I ran your code against mine and it didnt find any. try running my code against yours and see what happens? You'll see the gaps.

    **edit**

    Looking at the 13 null rows and comparing my results, I see there are no gaps had I left them in. try removing them and if our codes are the same, we should see the same results.

    **double edit**

    My ideal output is returned from my code. You code produces alot of extra fields. Any quick way to cut it down? See my previous post for an example or run my code.

  • These are my results with the null rows removed:

    Pattern GapStart GapEnd

    -------- -------------------- --------------------

    PPCI 737005 737005

    PPCI 737009 737009

    PPCI 999798 999798

    PPCI 1346238 1346238

    PPCI 1350958 1350958

    PPCI 1462905 1462905

    PPCI 1620877 1620877

    PPCI 1620882 1620882

    PPCI 1620891 1620891

    PPCI 1620899 1620899

    PPCI 1620909 1620909

    PPCI 1620957 1620957

    PPCI 1620962 1620962

    The code modification in question (I'm not nearly done with this, just a validation step confused me):

    SELECT

    CAST( p.Pattern AS VARCHAR(8)) AS Pattern,

    p2.EndNum +1 AS GapStart,

    p.StartNum -1 AS GapEnd

    --p.*, p2.EndNum AS PreviousEndNum,

    --CASE WHEN (p.startNum = p2.endnum + 1) THEN 'Filled' ELSE 'Gapped' END

    FROM

    #Midpoint AS p

    LEFT JOIN-- Link to the prior row

    #Midpoint AS p2

    ONp.RowID = p2.RowID + 1

    WHEREp.StartNum <> p2.EndNum + 1

    order by

    p.RowID

    EDIT:

    I just did a spot check on your results, and they're inaccurate. For example, you return this:

    000737006 - 000737008

    RowID 73230 has startnum 737006 through 737008

    You're picking up some false positives with your code.

    2nd edit: your row Id is 1972826


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Craig Farrell (10/8/2010)


    These are my results with the null rows removed:

    Pattern GapStart GapEnd

    -------- -------------------- --------------------

    PPCI 737005 737005

    PPCI 737009 737009

    PPCI 999798 999798

    PPCI 1346238 1346238

    PPCI 1350958 1350958

    PPCI 1462905 1462905

    PPCI 1620877 1620877

    PPCI 1620882 1620882

    PPCI 1620891 1620891

    PPCI 1620899 1620899

    PPCI 1620909 1620909

    PPCI 1620957 1620957

    PPCI 1620962 1620962

    The code modification in question (I'm not nearly done with this, just a validation step confused me):

    SELECT

    CAST( p.Pattern AS VARCHAR(8)) AS Pattern,

    p2.EndNum +1 AS GapStart,

    p.StartNum -1 AS GapEnd

    --p.*, p2.EndNum AS PreviousEndNum,

    --CASE WHEN (p.startNum = p2.endnum + 1) THEN 'Filled' ELSE 'Gapped' END

    FROM

    #Midpoint AS p

    LEFT JOIN-- Link to the prior row

    #Midpoint AS p2

    ONp.RowID = p2.RowID + 1

    WHEREp.StartNum <> p2.EndNum + 1

    order by

    p.RowID

    I see the confusion now. Your code is returning the actual gaps. My code is returning the range of continuous numbers. In the end i'll need to identify the gaps as your code does in addition to the actual ranges. So it does help.

    How can I add the logic to remove the prefix to my existing code? Any ideas?

  • Craig if you were local i'd buy you a beer. We killed 2 birds with one stone. I needed to identify the ranges of sequential numbers (my code) and the ranges of the gaps (your code).

    Now I just need to tweak my code to dynamically remove the prefix like yours does.

  • craig-404139 (10/8/2010)


    Craig if you were local i'd buy you a beer. We killed 2 birds with one stone. I needed to identify the ranges of sequential numbers (my code) and the ranges of the gaps (your code).

    Now I just need to tweak my code to dynamically remove the prefix like yours does.

    Easy enough, gimme 2 minutes here and I'll get you both result sets. 🙂 Btw, the method of removing the pattern portion (PPCI) is the whole substring(charindex()) bit.

    I'm almost there. I just goofed something when I tried to include fake 0 rows to catch the 1 through x bits if we start at say, 90. Should be easy enough to do a double result set from the same code.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • This should do it:

    DROP TABLE #gaps

    DROP TABLE #Contiguous

    DROP TABLE #MidPoint

    CREATE TABLE #MidPoint

    (RowID INT,

    PrevRowID INT,

    Pattern VARCHAR(100),

    Startnum BIGINT,

    EndNum BIGINT,

    Series INT

    )

    INSERT INTO #Midpoint

    ( RowID, Pattern, StartNum, EndNum)

    select

    ROW_NUMBER() OVER (ORDER BY

    left(controlnumber, PatIndex('%[0-9]%', controlnumber)-1),

    convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20))

    ) AS RowNum,

    left(controlnumber, PatIndex('%[0-9]%', controlnumber)-1) AS Pattern,

    convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20)) AS StartNum

    ,

    --ISNULL(

    convert(int, SubString(endno, PatIndex('%[0-9]%', endno), 20))

    --, convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20))) --StartNum if null

    AS EndNum

    from

    (SELECT

    *

    FROM

    gaps

    UNION ALL -- Create a dummy 0 record for each pattern to catch 1 through x gaps

    SELECT

    NULL AS id,

    Pattern + REPLICATE('0', NumPlaces) AS controlNumber,

    Pattern + REPLICATE('0', NumPlaces) AS endno

    FROM

    (SELECT DISTINCT

    left(controlnumber, PatIndex('%[0-9]%', controlnumber)-1) AS Pattern,

    LEN( controlnumber) - LEN( left(controlnumber, PatIndex('%[0-9]%', controlnumber))) AS NumPlaces

    FROM

    gaps

    ) AS PatZero

    ) as drv

    where

    endno is not null

    order by

    controlnumber

    UPDATE #Midpoint SET PrevRowID = RowID -1

    CREATE CLUSTERED INDEX idx_Midpoint1 ON #MidPoint (RowID)

    --CREATE NONCLUSTERED INDEX idx_Midpoint2 ON #MidPoint( PrevRowID) INCLUDE ( EndNum)

    SELECT

    CAST( p.Pattern AS VARCHAR(8)) AS Pattern,

    p2.EndNum +1 AS GapStart,

    p.StartNum -1 AS GapEnd

    --p.*, p2.EndNum AS PreviousEndNum,

    --CASE WHEN (p.startNum = p2.endnum + 1) THEN 'Filled' ELSE 'Gapped' END

    INTO

    #gaps

    FROM

    #Midpoint AS p

    LEFT JOIN-- Link to the prior row

    #Midpoint AS p2

    ONp.PrevRowID = p2.RowID

    WHEREp.StartNum <> p2.EndNum + 1

    order by

    p.RowID

    --Quirky Update here. It SHOULD be ran with OPTION( MAXDOP 1). Only needs it for this query.

    DECLARE @Series INT

    SET @Series = 1

    UPDATE p

    SET

    @Series = Series = CASE WHEN p.StartNum <> p2.EndNum + 1 THEN @Series + 1 ELSE @Series END

    FROM

    #Midpoint AS p

    LEFT JOIN-- Link to the prior row

    #Midpoint AS p2

    ONp.PrevRowID = p2.RowID

    -- ORDER BY p.RowID

    -- The clustered index deals with this for us.

    OPTION( MAXDOP 1)

    SELECT

    Pattern,

    Series,

    MIN( StartNum) AS BeginSeries,

    Max( EndNum) AS EndSeries

    INTO

    #Contiguous

    FROM

    #MidPoint

    GROUP BY

    Pattern,

    Series

    select * from #gaps order by gapstart

    select * from #Contiguous

    ORDER by

    pattern, series, beginseries


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • I am getting weird results from the #contiguous table. It looks like:

    PPCI412903540

    PPCI7102903235

    PPCI1932903562

    PPCI2122853875

    PPCI1022903952

    PPCI131402902866

    PPCI8592903560

    PPCI1452903559

    PPCI5492903970

    PPCI11452900794

    PPCI632903558

    PPCI12542903556

    PPCI33962903208

    PPCI902903539

Viewing 15 posts - 46 through 60 (of 91 total)

You must be logged in to reply to this topic. Login to reply