Finding Sequences (Need help ASAP)

Question

Finding Sequences (Need help ASAP)

Viewing 15 posts - 46 through 60 (of 91 total)

You must be logged in to reply to this topic. Login to reply

craig-404139 Hall of Fame Points: 3274 More actions · Answer 1

Craig Farrell (10/8/2010)
I had assumed, since 2k5 and 2k don't behave, but figured it was worth the question. I'm sure Wayne and the others will be able to get you a direct solution off the actual data. I'll just step back and watch the show unless we can do some form of CSV importing, but I'm not sure how much bandwidth you have to try to get that exported.

lemme see if I can put something together. The more help the better. I am completely lost/frustrated with this whole thing.

craig-404139 Hall of Fame Points: 3274 More actions · Answer 2

CSV file attached.

First column = id

Second column = controlnumber

Third column = endno

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 3

craig-404139 (10/8/2010)
CSV file attached.
First column = id
Second column = controlnumber
Third column = endno

That worked. Alright, so we're looking at taking two unique values, figuring out the 'fill', and splitting off the pattern.

@Wayne: You're right. With no numerics ever in the alpha code, yours runs faster. Against this test data my method is 7 seconds, yours is 5. Mind that's with a data return and using select vs. select. I like to keep things as easy as possible.

@Craig-4: I should have a solution soonish. I'm getting null results from endno for some reason and I'm trying to track down the data burp.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

craig-404139 Hall of Fame Points: 3274 More actions · Answer 4

Craig Farrell (10/8/2010)
craig-404139 (10/8/2010)
CSV file attached.
First column = id
Second column = controlnumber
Third column = endno
That worked. Alright, so we're looking at taking two unique values, figuring out the 'fill', and splitting off the pattern.
@Wayne: You're right. With no numerics ever in the alpha code, yours runs faster. Against this test data my method is 7 seconds, yours is 5. Mind that's with a data return and using select vs. select. I like to keep things as easy as possible.
@Craig-4: I should have a solution soonish. I'm getting null results from endno for some reason and I'm trying to track down the data burp.

I know about the 13 null rows. Sorry I forgot to remove them. You can discard those rows. It's an error in how I put this together.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 5

Found them, the sample rows:

2048205PPCI000737005NULL

2048206PPCI000737009NULL

2048207PPCI000999798NULL

2048208PPCI001346238NULL

2048209PPCI001350958NULL

2048210PPCI001462905NULL

2048211PPCI001620877NULL

2048212PPCI001620882NULL

2048213PPCI001620891NULL

2048214PPCI001620899NULL

2048215PPCI001620909NULL

2048216PPCI001620957NULL

2048217PPCI001620962NULL

They appear to be single page docs. Is this expected in the data?

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

craig-404139 Hall of Fame Points: 3274 More actions · Answer 6

Craig Farrell (10/8/2010)
Found them, the sample rows:
2048205PPCI000737005NULL
2048206PPCI000737009NULL
2048207PPCI000999798NULL
2048208PPCI001346238NULL
2048209PPCI001350958NULL
2048210PPCI001462905NULL
2048211PPCI001620877NULL
2048212PPCI001620882NULL
2048213PPCI001620891NULL
2048214PPCI001620899NULL
2048215PPCI001620909NULL
2048216PPCI001620957NULL
2048217PPCI001620962NULL
They appear to be single page docs. Is this expected in the data?

Technically yes. But you can discard those rows or update the control = endno. Either way.

craig-404139 Hall of Fame Points: 3274 More actions · Answer 7

I think i'm on to something. The below code works if I strip out the PPCI prefix. (Used a update/replace statement on the test table) Runs in a few seconds. Now to modify it to work with the prefixes.....

select d.controlnumber as start

,(select min(a.endno) as id

from document as a

left outer join document as b

on a.endno = b.controlnumber - 1

where b.controlnumber is null

and a.endno >= d.endno

) as ending

from document as d

left outer join document as d1

on d1.endno = d.controlnumber - 1

where d1.endno is null

order by d.controlnumber

These are my initial results with the prefixes stripped out.

000000001 - 000737004

000737006- 000737008

000737010- 000999797

000999799- 001346237

001346239- 001350957

001350959- 001462904

001462906- 001620876

001620878- 001620881

001620883- 001620890

001620892- 001620898

001620900- 001620908

001620910- 001620956

001620958- 001620961

001620963- 002903970

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 8

This will sound strange but... are you sure you have gaps?

Run this code:

CREATE TABLE #MidPoint

(RowID INT,

Pattern VARCHAR(100),

Startnum BIGINT,

EndNum BIGINT

)

INSERT INTO #Midpoint

select

ROW_NUMBER() OVER (ORDER BY

left(controlnumber, PatIndex('%[0-9]%', controlnumber)-1),

convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20))

) AS RowNum,

left(controlnumber, PatIndex('%[0-9]%', controlnumber)-1) AS Pattern,

convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20)) AS StartNum

,ISNULL( convert(int, SubString(endno, PatIndex('%[0-9]%', endno), 20))

, convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20))) --StartNum if null

AS EndNum

from

gaps

order by

controlnumber

SELECT

p.*, p2.EndNum AS PreviousEndNum,

CASE WHEN (p.startNum = p2.endnum + 1) THEN 'Filled' ELSE 'Gapped' END

FROM

#Midpoint AS p

LEFT JOIN-- Link to the prior row

#Midpoint AS p2

ONp.RowID = p2.RowID + 1

--WHEREp.StartNum <> p2.EndNum + 1

order by

p.RowID

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

craig-404139 Hall of Fame Points: 3274 More actions · Answer 9

Did you delete the 'null' endno rows? If so I ran your code against mine and it didnt find any. try running my code against yours and see what happens? You'll see the gaps.

**edit**

Looking at the 13 null rows and comparing my results, I see there are no gaps had I left them in. try removing them and if our codes are the same, we should see the same results.

**double edit**

My ideal output is returned from my code. You code produces alot of extra fields. Any quick way to cut it down? See my previous post for an example or run my code.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 10

These are my results with the null rows removed:

Pattern GapStart GapEnd

-------- -------------------- --------------------

PPCI 737005 737005

PPCI 737009 737009

PPCI 999798 999798

PPCI 1346238 1346238

PPCI 1350958 1350958

PPCI 1462905 1462905

PPCI 1620877 1620877

PPCI 1620882 1620882

PPCI 1620891 1620891

PPCI 1620899 1620899

PPCI 1620909 1620909

PPCI 1620957 1620957

PPCI 1620962 1620962

The code modification in question (I'm not nearly done with this, just a validation step confused me):

SELECT

CAST( p.Pattern AS VARCHAR(8)) AS Pattern,

p2.EndNum +1 AS GapStart,

p.StartNum -1 AS GapEnd

--p.*, p2.EndNum AS PreviousEndNum,

--CASE WHEN (p.startNum = p2.endnum + 1) THEN 'Filled' ELSE 'Gapped' END

FROM

#Midpoint AS p

LEFT JOIN-- Link to the prior row

#Midpoint AS p2

ONp.RowID = p2.RowID + 1

WHEREp.StartNum <> p2.EndNum + 1

order by

p.RowID

EDIT:

I just did a spot check on your results, and they're inaccurate. For example, you return this:

000737006 - 000737008

RowID 73230 has startnum 737006 through 737008

You're picking up some false positives with your code.

2nd edit: your row Id is 1972826

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

craig-404139 Hall of Fame Points: 3274 More actions · Answer 11

Craig Farrell (10/8/2010)
These are my results with the null rows removed:
Pattern GapStart GapEnd
-------- -------------------- --------------------
PPCI 737005 737005
PPCI 737009 737009
PPCI 999798 999798
PPCI 1346238 1346238
PPCI 1350958 1350958
PPCI 1462905 1462905
PPCI 1620877 1620877
PPCI 1620882 1620882
PPCI 1620891 1620891
PPCI 1620899 1620899
PPCI 1620909 1620909
PPCI 1620957 1620957
PPCI 1620962 1620962
The code modification in question (I'm not nearly done with this, just a validation step confused me):
SELECT
CAST( p.Pattern AS VARCHAR(8)) AS Pattern,
p2.EndNum +1 AS GapStart,
p.StartNum -1 AS GapEnd
--p.*, p2.EndNum AS PreviousEndNum,
--CASE WHEN (p.startNum = p2.endnum + 1) THEN 'Filled' ELSE 'Gapped' END
FROM
#Midpoint AS p
LEFT JOIN-- Link to the prior row
#Midpoint AS p2
ONp.RowID = p2.RowID + 1
WHEREp.StartNum <> p2.EndNum + 1
order by
p.RowID

I see the confusion now. Your code is returning the actual gaps. My code is returning the range of continuous numbers. In the end i'll need to identify the gaps as your code does in addition to the actual ranges. So it does help.

How can I add the logic to remove the prefix to my existing code? Any ideas?

craig-404139 Hall of Fame Points: 3274 More actions · Answer 12

Craig if you were local i'd buy you a beer. We killed 2 birds with one stone. I needed to identify the ranges of sequential numbers (my code) and the ranges of the gaps (your code).

Now I just need to tweak my code to dynamically remove the prefix like yours does.

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 13

craig-404139 (10/8/2010)
Craig if you were local i'd buy you a beer. We killed 2 birds with one stone. I needed to identify the ranges of sequential numbers (my code) and the ranges of the gaps (your code).
Now I just need to tweak my code to dynamically remove the prefix like yours does.

Easy enough, gimme 2 minutes here and I'll get you both result sets. 🙂 Btw, the method of removing the pattern portion (PPCI) is the whole substring(charindex()) bit.

I'm almost there. I just goofed something when I tried to include fake 0 rows to catch the 1 through x bits if we start at say, 90. Should be easy enough to do a double result set from the same code.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 14

This should do it:

DROP TABLE #gaps

DROP TABLE #Contiguous

DROP TABLE #MidPoint

CREATE TABLE #MidPoint

(RowID INT,

PrevRowID INT,

Pattern VARCHAR(100),

Startnum BIGINT,

EndNum BIGINT,

Series INT

)

INSERT INTO #Midpoint

( RowID, Pattern, StartNum, EndNum)

select

ROW_NUMBER() OVER (ORDER BY

left(controlnumber, PatIndex('%[0-9]%', controlnumber)-1),

convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20))

) AS RowNum,

left(controlnumber, PatIndex('%[0-9]%', controlnumber)-1) AS Pattern,

convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20)) AS StartNum

,

--ISNULL(

convert(int, SubString(endno, PatIndex('%[0-9]%', endno), 20))

--, convert(int, SubString(controlnumber, PatIndex('%[0-9]%', controlnumber), 20))) --StartNum if null

AS EndNum

from

(SELECT

*

FROM

gaps

UNION ALL -- Create a dummy 0 record for each pattern to catch 1 through x gaps

SELECT

NULL AS id,

Pattern + REPLICATE('0', NumPlaces) AS controlNumber,

Pattern + REPLICATE('0', NumPlaces) AS endno

FROM

(SELECT DISTINCT

left(controlnumber, PatIndex('%[0-9]%', controlnumber)-1) AS Pattern,

LEN( controlnumber) - LEN( left(controlnumber, PatIndex('%[0-9]%', controlnumber))) AS NumPlaces

FROM

gaps

) AS PatZero

) as drv

where

endno is not null

order by

controlnumber

UPDATE #Midpoint SET PrevRowID = RowID -1

CREATE CLUSTERED INDEX idx_Midpoint1 ON #MidPoint (RowID)

--CREATE NONCLUSTERED INDEX idx_Midpoint2 ON #MidPoint( PrevRowID) INCLUDE ( EndNum)

SELECT

CAST( p.Pattern AS VARCHAR(8)) AS Pattern,

p2.EndNum +1 AS GapStart,

p.StartNum -1 AS GapEnd

--p.*, p2.EndNum AS PreviousEndNum,

--CASE WHEN (p.startNum = p2.endnum + 1) THEN 'Filled' ELSE 'Gapped' END

INTO

#gaps

FROM

#Midpoint AS p

LEFT JOIN-- Link to the prior row

#Midpoint AS p2

ONp.PrevRowID = p2.RowID

WHEREp.StartNum <> p2.EndNum + 1

order by

p.RowID

--Quirky Update here. It SHOULD be ran with OPTION( MAXDOP 1). Only needs it for this query.

DECLARE @Series INT

SET @Series = 1

UPDATE p

SET

@Series = Series = CASE WHEN p.StartNum <> p2.EndNum + 1 THEN @Series + 1 ELSE @Series END

FROM

#Midpoint AS p

LEFT JOIN-- Link to the prior row

#Midpoint AS p2

ONp.PrevRowID = p2.RowID

-- ORDER BY p.RowID

-- The clustered index deals with this for us.

OPTION( MAXDOP 1)

SELECT

Pattern,

Series,

MIN( StartNum) AS BeginSeries,

Max( EndNum) AS EndSeries

INTO

#Contiguous

FROM

#MidPoint

GROUP BY

Pattern,

Series

select * from #gaps order by gapstart

select * from #Contiguous

ORDER by

pattern, series, beginseries

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

craig-404139 Hall of Fame Points: 3274 More actions · Answer 15

I am getting weird results from the #contiguous table. It looks like:

PPCI412903540

PPCI7102903235

PPCI1932903562

PPCI2122853875

PPCI1022903952

PPCI131402902866

PPCI8592903560

PPCI1452903559

PPCI5492903970

PPCI11452900794

PPCI632903558

PPCI12542903556

PPCI33962903208

PPCI902903539