February 21, 2014 at 7:22 am
I need to be able to identify breaks in a sequence so I can evaluate the data more correctly. In the sample I have given I need to be able to identify the break in sequence at 69397576, ideally I would set that as a D. My query also needs to recognize that the 3 sequences following 69397576 are sequential and would belong to that set. so the out come would look like this.
id file_name page_follow
693975631555557564_22222221114014810D
693975641555557564_22222221114014810F
693975651555557564_22222221114014810F
693975661555557564_22222221114014810F
693975671555557564_22222221114014810F
693975681555557564_22222221114014810F
693975691555557564_22222221114014810F
693975761555557564_22222221114014810D
693975771555557564_22222221114014810F
693975781555557564_22222221114014810F
693975791555557564_22222221114014810F
here is some test data.
create table test1 (id INT
, [file_name] VARCHAR(100)
, page_follow CHAR(1));
go
insert test1
values ('69397563','1555557564_22222221114014810','D')
,('69397564','1555557564_22222221114014810','F')
,('69397565','1555557564_22222221114014810','F')
,('69397566','1555557564_22222221114014810','F')
,('69397567','1555557564_22222221114014810','F')
,('69397568','1555557564_22222221114014810','F')
,('69397569','1555557564_22222221114014810','F')
,('69397576','1555557564_22222221114014810','F')
,('69397577','1555557564_22222221114014810','F')
,('69397578','1555557564_22222221114014810','F')
,('69397579','1555557564_22222221114014810','F');
Any advise would be greatly appreciated. I am looking for a set based solution preferably. Thanks!
February 21, 2014 at 7:39 am
Great job posting ddl and sample data. You stated the desired results but it is exactly the same as your sample data. What exactly are looking to do here?
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
February 21, 2014 at 7:46 am
Maybe something like this?
SELECT fileName
, RecID
, page_follow
, rn
, PrevID
, RecID-PrevID AS GapSize
FROM
(SELECT [file_name] AS fileName
, [id] AS RecID
, page_follow
, ROW_NUMBER() OVER (ORDER BY [id]) AS rn
, MAX([id]) OVER (PARTITION BY [file_name]
ORDER BY [id]
ROWS BETWEEN 1 PRECEDING
AND 1 PRECEDING) AS PrevID
FROM test1) x
WHERE x.RecID-x.PrevID>1;
If you do this a lot, you should definitely buy Itzik Ben-Gan's book MS SQL Server 2012 High-Performance T-SQL Using Window Functions. Kathi Kellenberger was right -- worth the read!
February 21, 2014 at 7:55 am
The difference between the sample and actual was a D for the id 69397576, which marks a break in the sequence.
February 21, 2014 at 7:58 am
This one is not so bad since you are looking for a break in an integer sequence.
Just outer join to the next record (id+1) and check for the null.
This will also call the last record a gap:
create table #test1 (id INT
, [file_name] VARCHAR(100)
, page_follow CHAR(1));
go
insert #test1
values ('69397563','1555557564_22222221114014810','D')
,('69397564','1555557564_22222221114014810','F')
,('69397565','1555557564_22222221114014810','F')
,('69397566','1555557564_22222221114014810','F')
,('69397567','1555557564_22222221114014810','F')
,('69397568','1555557564_22222221114014810','F')
,('69397569','1555557564_22222221114014810','F')
,('69397576','1555557564_22222221114014810','F')
,('69397577','1555557564_22222221114014810','F')
,('69397578','1555557564_22222221114014810','F')
,('69397579','1555557564_22222221114014810','F');
UPDATE t1
SET page_follow = 'D'
FROM #test1 t1
LEFT JOIN #test1 t2
ON t1.id+1 = t2.id
WHERE t2.id IS NULL
SELECT * FROM #test1
drop table #test1
replace the update with this to have the last row NOT be a gap:
UPDATE t1
SET page_follow = 'D'
FROM #test1 t1
LEFT JOIN #test1 t2
ON t1.id+1 = t2.id
CROSS APPLY (SELECT MAX(id) id FROM #test1) t3
WHERE t2.id IS NULL
AND t1.id<>t3.id
February 21, 2014 at 8:07 am
Here are a couple of options that you could try.
SELECT t.id,
t.[FILE_NAME],
CASE WHEN NOT EXISTS( SELECT 1 FROM test1 x WHERE t.id = x.id + 1)
THEN 'D' ELSE 'F' END page_follow
FROM test1 t
--Example of query
--SELECT *
--FROM test1 t
--WHERE NOT EXISTS(
--SELECT 1
--FROM test1 x
--WHERE t.id = x.id + 1)
UPDATE t SET
page_follow = 'D'
FROM test1 t
WHERE NOT EXISTS(
SELECT 1
FROM test1 x
WHERE t.id = x.id + 1)
SELECT *
FROM test1
February 21, 2014 at 8:14 am
That is perfect! Thank you so much for your help.
February 21, 2014 at 8:40 am
One more question, What if I wanted throw out, or mark the original set 69397563-69397569, which is the D plus all F until the sequence break? Thoughts?
February 21, 2014 at 8:54 am
That is usually solved like this:
WITH CTE AS(
SELECT id, id - ROW_NUMBER() OVER (ORDER BY id) grouper
FROM #test1
)
SELECT MIN(id) min_id, MAX(id) max_id
FROM CTE
GROUP BY grouper
ORDER BY min_id
I just noticed that you're on SQL 2012 and you might want to check the following article as well. https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-gaps-and-islands-in-sequences/
February 22, 2014 at 4:50 pm
I recommend a web search for: Itzik Ben-Gan Gaps Islands. There are some amazingly-efficient solutions for this now.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply