Finding Sequences (Need help ASAP)

Question

Finding Sequences (Need help ASAP)

Viewing 15 posts - 16 through 30 (of 91 total)

You must be logged in to reply to this topic. Login to reply

WayneS SSC Guru Points: 95460 More actions · Answer 1

Craig Farrell (10/6/2010)

WayneS (10/6/2010)
craig-404139 (10/6/2010)
Question: when a prefix changes, do the numbers restart at 1? Or do they continue?
(I've got some real nifty, high-speed code (thanks to Jeff!) to quickly identify gaps - but it's designed to work on numbers. Will take some mods to work with the prefix, but should be simple.)
Hey Wayne, if you can modify to match the Pattern/NumPiece code above I'd be curious to see that result. The Pattern splitter is pretty zippy and will return a group/numeric resultset.

Okay... enough teasing. But, considering that the OP mentioned that all codes for the same prefix will be the same length, it would be a better test to follow that rule.

if object_id('tempdb..#GapTest') IS NOT NULL DROP TABLE #GapTest;

-- make a temp table so that we can use an index...

CREATE TABLE #GapTest (

Prefix varchar(20),

Suffix int,

PRIMARY KEY CLUSTERED(prefix,suffix));

-- put the data into the temp table, separating the prefix and suffix

INSERT INTO #GapTest

SELECT Prefix = left(TextID, PatIndex('%[0-9]%', TextID)-1),

Suffix = convert(int, SubString(TextID, PatIndex('%[0-9]%', TextID), 20))

FROM #TestHarness;

-- get the results

SELECT Prefix,

GapStart = (SELECT ISNULL(MAX(lo.Suffix),0)+1

FROM #GapTest lo

WHERE lo.Prefix = hi.Prefix

AND lo.Suffix < hi.Suffix),

GapEnd = hi.Suffix - 1

FROM #GapTest hi

WHERE hi.Suffix NOT IN (SELECT Suffix + 1

FROM #GapTest g1

WHERE g1.Prefix = hi.Prefix);

@Craig Ferrell.... 2.4 seconds!

@jeff - when you read this... THANKS!

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

andrewd.smith SSCrazy Eights Points: 8444 More actions · Answer 2

I have tweaked Craig's test data generator so that the number of digits is fixed for a given prefix. In my tests below, generated just over 2 million rows. Note that I've added a primary key index to the TextID column (which I believe is the case with the actual data).

CREATE TABLE #TestHarness (

TextID VARCHAR(200) NOT NULL PRIMARY KEY CLUSTERED

)

--TRUNCATE TABLE #TestHarness

INSERT INTO #TestHarness(TextID)

SELECT PREFIX.Code + REPLACE(STR(T.N, PREFIX.Digits), ' ', '0')

FROM Tally T

CROSS JOIN (

SELECT REPLICATE(CHAR(64 + T2.N), T1.N), (T1.N + T2.N) % 4 + 6

FROM Tally T1 CROSS JOIN Tally T2

WHERE T1.N BETWEEN 1 AND 6

AND T2.N BETWEEN 1 AND 10

) PREFIX(Code, Digits)

WHERE (T.N < 50000)

AND NOT (T.N % 5 = 0 OR T.N % 7 = 0)

Using this test data, I compared my query below with Wayne's query. I realise that the two queries are not strictly comparable since mine returns contiguous sequences, whereas Wayne's return gaps, but the OP seemed to be interested in both.

;WITH cteParse(prefix, num, rn) AS (

SELECT

SUBSTRING(TextID, 1, PATINDEX('%[0-9]%', TextID) - 1),

CAST(SUBSTRING(TextID, PATINDEX('%[0-9]%', TextID), 10) AS int),

ROW_NUMBER() OVER (ORDER BY TextID)

FROM #TestHarness

)

SELECT

prefix, MIN(num), MAX(num)

FROM cteParse

GROUP BY prefix, num - rn

ORDER BY prefix, num - rn

On my test server, my query completed in 11 seconds, whereas Wayne's query completed in about 22 seconds (14 seconds to build temporary table, 8 seconds for the final query). [EDIT: With about 260,000 rows of test data, the execution times are similar for the two queries on my server: about 3 seconds.]

Below is a variation of the above query that returns the start and end IDs of contiguous sequences, and is almost as fast as the query above.

;WITH cteParse(id, prefix, num, ndig, rn) AS (

SELECT

TextID,

SUBSTRING(TextID, 1, PATINDEX('%[0-9]%', TextID) - 1),

CAST(SUBSTRING(TextID, PATINDEX('%[0-9]%', TextID), 10) AS int),

LEN(TextID) - PATINDEX('%[0-9]%', TextID) + 1,

ROW_NUMBER() OVER (ORDER BY TextID)

FROM #TestHarness

)

SELECT

prefix + RIGHT(REPLICATE('0', ndig) + CAST(MIN(num) AS varchar(10)), ndig) AS [From],

CASE WHEN (COUNT(*) > 1) THEN

prefix + RIGHT(REPLICATE('0', ndig) + CAST(MAX(num) AS varchar(10)), ndig)

ELSE '' END AS [To]

FROM cteParse

GROUP BY prefix, num - rn, ndig

ORDER BY prefix, num - rn

WayneS SSC Guru Points: 95460 More actions · Answer 3

Andrew - I'll test out your modifications tonight. Interesting the large difference in time between our systems.

Thanks for making the change to use the same # of characters for a given prefix.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

Jeff Moden SSC Guru Points: 1004403 More actions · Answer 4

So... without me running all the code presented so far, what has been decided? Does the numbering restart for each prefix or not?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

WayneS SSC Guru Points: 95460 More actions · Answer 5

Jeff Moden (10/7/2010)
So... without me running all the code presented so far, what has been decided? Does the numbering restart for each prefix or not?

Haven't heard back from the OP yet on the restarting.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

craig-404139 Hall of Fame Points: 3274 More actions · Answer 6

WayneS (10/7/2010)
Jeff Moden (10/7/2010)
So... without me running all the code presented so far, what has been decided? Does the numbering restart for each prefix or not?
k from the OP yet on the restarting.

It does not. each prefix identifies a unique 'set' of received data. They have no relation to each other , other than they are documents for the same case.

Does that answer your question?

craig-404139 Hall of Fame Points: 3274 More actions · Answer 7

Thanks for the help guys, but I am 100% lost AND my requirements just changed. FML. I will update soon with new test data (if I don't pill my hair out first)

craig-404139 Hall of Fame Points: 3274 More actions · Answer 8

Ok. Because of this time crunch and my frustration with this, let's try this a different way. I need to compare the controlnumber to the endno and find the sequences. (Realize I didn't have correct information in my original post, because I didn't know at that time)

The controlnumber and endno are dealing with ranges of documents. So for example line one

PPCI000000040 is the starting number and PPCI000000044 is the ending number. Meaning that is a 4 page document. PPCI000000045 is starting and PPCI000000048 is ending, meaning that is a 3 page doc. Etc...Since we are loooking for missing documents I need to compare the endno to the controlnumber and return the range if the last endno is not one less than the next control number.

Ideally I need to see something like this:

PPCI000000040 - PPCI000000053

PPCI000000057 - PPCI000000102

PPCI000000115 - PPCI000000197

etc...

Let's forget about the variable range text prefix for now and just work with this data (I'm getting wayyyy to confused with the previous code) If possible please write this to apply to my sample dataset.

Please let me know if you need clarification, I tried to explain it the best way possible and provide enough dummy data.

--Create test table

create table gaps2

(controlnumber varchar(255)

,endno varchar(255))

Code removed per request.

WayneS SSC Guru Points: 95460 More actions · Answer 9

craig-404139 (10/6/2010)
Please skip to post Post #1000624 for updated requirements.

It's actually post # 1000643. Also, if you click the post number, you will get a URL that you can then paste in so that it's clickable. Just be sure to enclose in in [ url ] and [ / url ] tags. (remove all spaces between the brackets!) Like this: http://www.sqlservercentral.com/Forums/FindPost1000643.aspx. Just quote my message to see how it's done.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

craig-404139 Hall of Fame Points: 3274 More actions · Answer 10

WayneS (10/7/2010)
craig-404139 (10/6/2010)
Please skip to post Post #1000624 for updated requirements.
It's actually post # 1000643. Also, if you click the post number, you will get a URL that you can then paste in so that it's clickable. Just be sure to enclose in in [ url ] and [ / url ] tags. (remove all spaces between the brackets!) Like this: http://www.sqlservercentral.com/Forums/FindPost1000643.aspx. Just quote my message to see how it's done.

Thanks for the info. I edited my original post. This just isn't my morning.

WayneS SSC Guru Points: 95460 More actions · Answer 11

Here's some code.

Note that I kept in the prefix checking... I need to strip it out anyway, so I just kept it in.

-- need to add some columns to the table.

ALTER TABLE gaps2 ADD Prefix varchar(255),

GroupNo int;

-- change the controlnumber to not null and put a primary key on it.

ALTER TABLE gaps2 ALTER COLUMN controlnumber varchar(255) NOT NULL;

ALTER TABLE gaps2 ADD PRIMARY KEY CLUSTERED (controlnumber);

-- declare and initialize some variables for the update statement.

declare @Prefix varchar(255),

@GroupNo int,

@ControlNumber varchar(255),

@LastNumber int;

set @GroupNo = 0;

set @Prefix = '';

set @ControlNumber = '';

set @LastNumber = 0;

-- This form of the UPDATE statement has some particular rules.

-- See Jeff Moden's article at http://www.sqlservercentral.com/articles/T-SQL/68467/

-- for a complete discussion of how this works, and all of the rules for utilizing it.

UPDATE t1

SET @GroupNo = GroupNo = CASE WHEN @Prefix <> LEFT(controlnumber, PATINDEX('%[0-9]%', controlnumber)-1) THEN 1

-- new prefix - start over

WHEN CONVERT(INT, SUBSTRING(controlnumber, PATINDEX('%[0-9]%', controlnumber), 255)) = @LastNumber + 1 THEN @GroupNo

-- continues the chain - use same group number

ELSE @GroupNo + 1 -- increment the group number

END,

@Prefix = Prefix = LEFT(controlnumber, PATINDEX('%[0-9]%', controlnumber)-1), -- store the prefix so we can use it later

@LastNumber = CONVERT(INT, SUBSTRING(endno, PATINDEX('%[0-9]%', endno), 255)), -- get what the last number is, to compare to the next number

@ControlNumber = controlnumber -- anchor column

FROM gaps2 t1 WITH (TABLOCKX) -- prevent others from using the table

OPTION (MAXDOP 1); -- prevent parallelism

SELECT Prefix, GroupNo, MIN(controlnumber), max(endno)

FROM gaps2

GROUP BY Prefix, GroupNo

ORDER BY Prefix, GroupNo;

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

craig-404139 Hall of Fame Points: 3274 More actions · Answer 12

Wow, impressive. Can you explain what the code is doing? I've never seen things like :

@GroupNo = GroupNo = case....

@Prefix = Prefix = LEFT.....

Looks like you are setting the variable, but why the extra words?

Also the update statement is just updating variables, correct? No actual table data? I'll review the link you posted as well.

I'll need to remove the table lock and option(maxdrop 1) options, is that going to create a problem?

This is a huge help man, I appreciate it.

****edit***

I see you added columns to the table, Can this be done without adding columns? I can't modify my production table(s). Also explains why the are like that. It's inserting into the newly created table. Can I just remove that bit of coding?

@GroupNo = GroupNo = case....

@Prefix = Prefix = LEF

WayneS SSC Guru Points: 95460 More actions · Answer 13

craig-404139 (10/7/2010)
Wow, impressive. Can you explain what the code is doing? I've never seen things like :
@GroupNo = GroupNo = case....
@Prefix = Prefix = LEFT.....
Looks like you are setting the variable, but why the extra words?
Also the update statement is just updating variables, correct? No actual table data? I'll review the link you posted as well.
I'll need to remove the table lock and option(maxdrop 1) options, is that going to create a problem?
This is a huge help man, I appreciate it.

Well, like I commented in the code:

-- This form of the UPDATE statement has some particular rules.

-- See Jeff Moden's article at http://www.sqlservercentral.com/articles/T-SQL/68467/

-- for a complete discussion of how this works, and all of the rules for utilizing it.

If you remove the tablockx, option (maxdop 1), the CLUSTERED index or the anchor column, then DON'T USE THIS!!!!!!! Those are REQUIRED for this to work properly. Definitely review that link.

The update statement is updating BOTH variables, and table data. It's doing the update in the order of the clustered index. By updating a value into a variable, I can utilize the variable in the next row.

FYI, this has two nicknames: "quirky update" and "pseudo cursor". Once you understand what is going on, you'll understand both of these.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

WayneS SSC Guru Points: 95460 More actions · Answer 14

craig-404139 (10/7/2010)
I see you added columns to the table, Can this be done without adding columns? I can't modify my production table(s). Also explains why the are like that. It's inserting into the newly created table. Can I just remove that bit of coding?
@GroupNo = GroupNo = case....
@Prefix = Prefix = LEF

Then just select the data into a #temp table, i.e. #gaps2, and change all the references of gaps2 to #gaps2.

Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes

If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!

Links:
For better assistance in answering your questions
Performance Problems
Common date/time routines
Understanding and Using APPLY Part 1 & Part 2

craig-404139 Hall of Fame Points: 3274 More actions · Answer 15

WayneS (10/7/2010)
craig-404139 (10/7/2010)
I see you added columns to the table, Can this be done without adding columns? I can't modify my production table(s). Also explains why the are like that. It's inserting into the newly created table. Can I just remove that bit of coding?
@GroupNo = GroupNo = case....
@Prefix = Prefix = LEF
Then just select the data into a #temp table, i.e. #gaps2, and change all the references of gaps2 to #gaps2.

Will do. I am playing with it all now. Also reading that article you linked.