October 6, 2010 at 7:07 pm
Craig Farrell (10/6/2010)
WayneS (10/6/2010)
craig-404139 (10/6/2010)
Question: when a prefix changes, do the numbers restart at 1? Or do they continue?(I've got some real nifty, high-speed code (thanks to Jeff!) to quickly identify gaps - but it's designed to work on numbers. Will take some mods to work with the prefix, but should be simple.)
Hey Wayne, if you can modify to match the Pattern/NumPiece code above I'd be curious to see that result. The Pattern splitter is pretty zippy and will return a group/numeric resultset.
Okay... enough teasing. But, considering that the OP mentioned that all codes for the same prefix will be the same length, it would be a better test to follow that rule.
if object_id('tempdb..#GapTest') IS NOT NULL DROP TABLE #GapTest;
-- make a temp table so that we can use an index...
CREATE TABLE #GapTest (
Prefix varchar(20),
Suffix int,
PRIMARY KEY CLUSTERED(prefix,suffix));
-- put the data into the temp table, separating the prefix and suffix
INSERT INTO #GapTest
SELECT Prefix = left(TextID, PatIndex('%[0-9]%', TextID)-1),
Suffix = convert(int, SubString(TextID, PatIndex('%[0-9]%', TextID), 20))
FROM #TestHarness;
-- get the results
SELECT Prefix,
GapStart = (SELECT ISNULL(MAX(lo.Suffix),0)+1
FROM #GapTest lo
WHERE lo.Prefix = hi.Prefix
AND lo.Suffix < hi.Suffix),
GapEnd = hi.Suffix - 1
FROM #GapTest hi
WHERE hi.Suffix NOT IN (SELECT Suffix + 1
FROM #GapTest g1
WHERE g1.Prefix = hi.Prefix);
@Craig Ferrell.... 2.4 seconds!
@jeff - when you read this... THANKS!
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 7, 2010 at 6:35 am
I have tweaked Craig's test data generator so that the number of digits is fixed for a given prefix. In my tests below, generated just over 2 million rows. Note that I've added a primary key index to the TextID column (which I believe is the case with the actual data).
CREATE TABLE #TestHarness (
TextID VARCHAR(200) NOT NULL PRIMARY KEY CLUSTERED
)
--TRUNCATE TABLE #TestHarness
INSERT INTO #TestHarness(TextID)
SELECT PREFIX.Code + REPLACE(STR(T.N, PREFIX.Digits), ' ', '0')
FROM Tally T
CROSS JOIN (
SELECT REPLICATE(CHAR(64 + T2.N), T1.N), (T1.N + T2.N) % 4 + 6
FROM Tally T1 CROSS JOIN Tally T2
WHERE T1.N BETWEEN 1 AND 6
AND T2.N BETWEEN 1 AND 10
) PREFIX(Code, Digits)
WHERE (T.N < 50000)
AND NOT (T.N % 5 = 0 OR T.N % 7 = 0)
Using this test data, I compared my query below with Wayne's query. I realise that the two queries are not strictly comparable since mine returns contiguous sequences, whereas Wayne's return gaps, but the OP seemed to be interested in both.
;WITH cteParse(prefix, num, rn) AS (
SELECT
SUBSTRING(TextID, 1, PATINDEX('%[0-9]%', TextID) - 1),
CAST(SUBSTRING(TextID, PATINDEX('%[0-9]%', TextID), 10) AS int),
ROW_NUMBER() OVER (ORDER BY TextID)
FROM #TestHarness
)
SELECT
prefix, MIN(num), MAX(num)
FROM cteParse
GROUP BY prefix, num - rn
ORDER BY prefix, num - rn
On my test server, my query completed in 11 seconds, whereas Wayne's query completed in about 22 seconds (14 seconds to build temporary table, 8 seconds for the final query). [EDIT: With about 260,000 rows of test data, the execution times are similar for the two queries on my server: about 3 seconds.]
Below is a variation of the above query that returns the start and end IDs of contiguous sequences, and is almost as fast as the query above.
;WITH cteParse(id, prefix, num, ndig, rn) AS (
SELECT
TextID,
SUBSTRING(TextID, 1, PATINDEX('%[0-9]%', TextID) - 1),
CAST(SUBSTRING(TextID, PATINDEX('%[0-9]%', TextID), 10) AS int),
LEN(TextID) - PATINDEX('%[0-9]%', TextID) + 1,
ROW_NUMBER() OVER (ORDER BY TextID)
FROM #TestHarness
)
SELECT
prefix + RIGHT(REPLICATE('0', ndig) + CAST(MIN(num) AS varchar(10)), ndig) AS [From],
CASE WHEN (COUNT(*) > 1) THEN
prefix + RIGHT(REPLICATE('0', ndig) + CAST(MAX(num) AS varchar(10)), ndig)
ELSE '' END AS [To]
FROM cteParse
GROUP BY prefix, num - rn, ndig
ORDER BY prefix, num - rn
October 7, 2010 at 7:04 am
Andrew - I'll test out your modifications tonight. Interesting the large difference in time between our systems.
Thanks for making the change to use the same # of characters for a given prefix.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 7, 2010 at 7:39 am
So... without me running all the code presented so far, what has been decided? Does the numbering restart for each prefix or not?
--Jeff Moden
Change is inevitable... Change for the better is not.
October 7, 2010 at 7:41 am
Jeff Moden (10/7/2010)
So... without me running all the code presented so far, what has been decided? Does the numbering restart for each prefix or not?
Haven't heard back from the OP yet on the restarting.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 7, 2010 at 8:26 am
WayneS (10/7/2010)
Jeff Moden (10/7/2010)
So... without me running all the code presented so far, what has been decided? Does the numbering restart for each prefix or not?k from the OP yet on the restarting.
It does not. each prefix identifies a unique 'set' of received data. They have no relation to each other , other than they are documents for the same case.
Does that answer your question?
October 7, 2010 at 9:24 am
Thanks for the help guys, but I am 100% lost AND my requirements just changed. FML. I will update soon with new test data (if I don't pill my hair out first)
October 7, 2010 at 9:43 am
Ok. Because of this time crunch and my frustration with this, let's try this a different way. I need to compare the controlnumber to the endno and find the sequences. (Realize I didn't have correct information in my original post, because I didn't know at that time)
The controlnumber and endno are dealing with ranges of documents. So for example line one
PPCI000000040 is the starting number and PPCI000000044 is the ending number. Meaning that is a 4 page document. PPCI000000045 is starting and PPCI000000048 is ending, meaning that is a 3 page doc. Etc...Since we are loooking for missing documents I need to compare the endno to the controlnumber and return the range if the last endno is not one less than the next control number.
Ideally I need to see something like this:
PPCI000000040 - PPCI000000053
PPCI000000057 - PPCI000000102
PPCI000000115 - PPCI000000197
etc...
Let's forget about the variable range text prefix for now and just work with this data (I'm getting wayyyy to confused with the previous code) If possible please write this to apply to my sample dataset.
Please let me know if you need clarification, I tried to explain it the best way possible and provide enough dummy data.
--Create test table
create table gaps2
(controlnumber varchar(255)
,endno varchar(255))
Code removed per request.
October 7, 2010 at 10:13 am
craig-404139 (10/6/2010)
Please skip to post Post #1000624 for updated requirements.
It's actually post # 1000643. Also, if you click the post number, you will get a URL that you can then paste in so that it's clickable. Just be sure to enclose in in [ url ] and [ / url ] tags. (remove all spaces between the brackets!) Like this: http://www.sqlservercentral.com/Forums/FindPost1000643.aspx. Just quote my message to see how it's done.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 7, 2010 at 10:20 am
WayneS (10/7/2010)
craig-404139 (10/6/2010)
Please skip to post Post #1000624 for updated requirements.It's actually post # 1000643. Also, if you click the post number, you will get a URL that you can then paste in so that it's clickable. Just be sure to enclose in in [ url ] and [ / url ] tags. (remove all spaces between the brackets!) Like this: http://www.sqlservercentral.com/Forums/FindPost1000643.aspx. Just quote my message to see how it's done.
Thanks for the info. I edited my original post. This just isn't my morning.
October 7, 2010 at 10:54 am
Here's some code.
Note that I kept in the prefix checking... I need to strip it out anyway, so I just kept it in.
-- need to add some columns to the table.
ALTER TABLE gaps2 ADD Prefix varchar(255),
GroupNo int;
-- change the controlnumber to not null and put a primary key on it.
ALTER TABLE gaps2 ALTER COLUMN controlnumber varchar(255) NOT NULL;
ALTER TABLE gaps2 ADD PRIMARY KEY CLUSTERED (controlnumber);
-- declare and initialize some variables for the update statement.
declare @Prefix varchar(255),
@GroupNo int,
@ControlNumber varchar(255),
@LastNumber int;
set @GroupNo = 0;
set @Prefix = '';
set @ControlNumber = '';
set @LastNumber = 0;
-- This form of the UPDATE statement has some particular rules.
-- See Jeff Moden's article at http://www.sqlservercentral.com/articles/T-SQL/68467/
-- for a complete discussion of how this works, and all of the rules for utilizing it.
UPDATE t1
SET @GroupNo = GroupNo = CASE WHEN @Prefix <> LEFT(controlnumber, PATINDEX('%[0-9]%', controlnumber)-1) THEN 1
-- new prefix - start over
WHEN CONVERT(INT, SUBSTRING(controlnumber, PATINDEX('%[0-9]%', controlnumber), 255)) = @LastNumber + 1 THEN @GroupNo
-- continues the chain - use same group number
ELSE @GroupNo + 1 -- increment the group number
END,
@Prefix = Prefix = LEFT(controlnumber, PATINDEX('%[0-9]%', controlnumber)-1), -- store the prefix so we can use it later
@LastNumber = CONVERT(INT, SUBSTRING(endno, PATINDEX('%[0-9]%', endno), 255)), -- get what the last number is, to compare to the next number
@ControlNumber = controlnumber -- anchor column
FROM gaps2 t1 WITH (TABLOCKX) -- prevent others from using the table
OPTION (MAXDOP 1); -- prevent parallelism
SELECT Prefix, GroupNo, MIN(controlnumber), max(endno)
FROM gaps2
GROUP BY Prefix, GroupNo
ORDER BY Prefix, GroupNo;
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 7, 2010 at 11:13 am
Wow, impressive. Can you explain what the code is doing? I've never seen things like :
@GroupNo = GroupNo = case....
@Prefix = Prefix = LEFT.....
Looks like you are setting the variable, but why the extra words?
Also the update statement is just updating variables, correct? No actual table data? I'll review the link you posted as well.
I'll need to remove the table lock and option(maxdrop 1) options, is that going to create a problem?
This is a huge help man, I appreciate it.
****edit***
I see you added columns to the table, Can this be done without adding columns? I can't modify my production table(s). Also explains why the are like that. It's inserting into the newly created table. Can I just remove that bit of coding?
@GroupNo = GroupNo = case....
@Prefix = Prefix = LEF
October 7, 2010 at 11:18 am
craig-404139 (10/7/2010)
Wow, impressive. Can you explain what the code is doing? I've never seen things like :@GroupNo = GroupNo = case....
@Prefix = Prefix = LEFT.....
Looks like you are setting the variable, but why the extra words?
Also the update statement is just updating variables, correct? No actual table data? I'll review the link you posted as well.
I'll need to remove the table lock and option(maxdrop 1) options, is that going to create a problem?
This is a huge help man, I appreciate it.
Well, like I commented in the code:
-- This form of the UPDATE statement has some particular rules.
-- See Jeff Moden's article at http://www.sqlservercentral.com/articles/T-SQL/68467/
-- for a complete discussion of how this works, and all of the rules for utilizing it.
If you remove the tablockx, option (maxdop 1), the CLUSTERED index or the anchor column, then DON'T USE THIS!!!!!!! Those are REQUIRED for this to work properly. Definitely review that link.
The update statement is updating BOTH variables, and table data. It's doing the update in the order of the clustered index. By updating a value into a variable, I can utilize the variable in the next row.
FYI, this has two nicknames: "quirky update" and "pseudo cursor". Once you understand what is going on, you'll understand both of these.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 7, 2010 at 11:36 am
craig-404139 (10/7/2010)
I see you added columns to the table, Can this be done without adding columns? I can't modify my production table(s). Also explains why the are like that. It's inserting into the newly created table. Can I just remove that bit of coding?@GroupNo = GroupNo = case....
@Prefix = Prefix = LEF
Then just select the data into a #temp table, i.e. #gaps2, and change all the references of gaps2 to #gaps2.
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 7, 2010 at 11:38 am
WayneS (10/7/2010)
craig-404139 (10/7/2010)
I see you added columns to the table, Can this be done without adding columns? I can't modify my production table(s). Also explains why the are like that. It's inserting into the newly created table. Can I just remove that bit of coding?@GroupNo = GroupNo = case....
@Prefix = Prefix = LEF
Then just select the data into a #temp table, i.e. #gaps2, and change all the references of gaps2 to #gaps2.
Will do. I am playing with it all now. Also reading that article you linked.
Viewing 15 posts - 16 through 30 (of 91 total)
You must be logged in to reply to this topic. Login to reply