December 15, 2005 at 2:23 pm
I'm pulling data from a legacy system to SQL Server. Let's say I have a table with one column named Manifest and it is populated from this data. In our legacy system, this is a sequential number. So I should have sequential numbers in my table.
I'm having a problem with missing data. To track it down, I want to display the Manifest numbers that are missing in the sequence. If the data in my column is, for example:
423704
423703
423702
423701
423699
423698
423697
423696
423695
Then I want the results of the query to be:
423700
Is this doable? TIA......Cindy
December 15, 2005 at 2:36 pm
from table a left outer join table b on a.key = b.key -1 where b.key is null
gets you last row BEFORE a break
reverse to get last row AFTER a break in sequence and merge these two to get empty ranges
December 15, 2005 at 2:46 pm
Create a number table with numbers 1 thru whatever
then perform a left join
Select *
From MyTable
right Join numbers on mytable.pk = number
where mytable.pk is null
and number < (select max(mytable.pk) from mytable)
This will create a number table with 1 millon sequential numbers
CREATE TABLE Numbers(Number INT NOT NULL PRIMARY KEY)
GO
INSERT Numbers
SELECT HundredThousand * 100000 + Tenthousand * 10000 + thousand * 1000 + Hundred * 100 + Ten * 10 + Unit + 1
FROM (
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) Units (Unit)
CROSS JOIN
(
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) Tens(Ten)
CROSS JOIN
(
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) Hundreds(Hundred)
CROSS JOIN
(
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) thousands(thousand)
CROSS JOIN
(
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) Tenthousands(Tenthousand)
CROSS JOIN
(
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) Hundredthousands(Hundredthousand)
GO
December 15, 2005 at 2:53 pm
Great idea. And thanks for the sql code.
December 16, 2005 at 5:44 am
This will create a number table with 1 millon sequential numbers
CREATE TABLE Numbers(Number INT NOT NULL PRIMARY KEY)
GO
INSERT Numbers
SELECT HundredThousand * 100000 + Tenthousand * 10000 + thousand * 1000 + Hundred * 100 + Ten * 10 + Unit + 1
FROM (
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) Units (Unit)
CROSS JOIN
(
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) Tens(Ten)
CROSS JOIN
(
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) Hundreds(Hundred)
CROSS JOIN
(
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) thousands(thousand)
CROSS JOIN
(
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) Tenthousands(Tenthousand)
CROSS JOIN
(
SELECT 0 UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
SELECT 9) Hundredthousands(Hundredthousand)
GO
Why that complicated?
SELECT TOP 1000000 Num = IDENTITY(INT, 1, 1)
INTO #Numbers
FROM Northwind..sysobjects a1
CROSS JOIN Northwind..sysobjects a2
CROSS JOIN Northwind..sysobjects a3
--
Frank Kalis
Microsoft SQL Server MVP
Webmaster: http://www.insidesql.org/blogs
My blog: http://www.insidesql.org/blogs/frankkalis/[/url]
December 16, 2005 at 9:02 am
Just for fun, here's another variation on the first answer:
SELECT
Key+1
FROM
MyTable
WHERE
--The next # is missing
Key+1 NOT IN (SELECT Key FROM MyTable)
AND
--We haven't reached the max of values
Key+1 < (SELECT Max(Key) FROM MyTable)
December 18, 2005 at 11:04 am
Ok, just for fun, the following code will report ranges of both present and missing ID's. Why do it this way? Because if you have a huge number of missing ID's in a given range (ie. ID's 500,000 through 700,000 are missing), who wants a list of over 200,000 ID's? Another advantage of doing it this way is you don't need to build a huge temporary numbers table if the MaxID in the table being examined is huge (picky ol' DBA's get really twisted when you run TempDB out of space creating a numbers table with a couple of hundred million entries in it ) and it run's almost as fast (nothing runs faster than comparing to a numbers table, though).
To summarize the code below...
Lemme know wat'cha think... sorry it'll be so hard to figure out what the code is doing ... my documentation skills aren't what they used to be...
--===== Setup for speed and to prevent blocking
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
--=============================================================================
-- Create an experimental table to simulate the table being examined
--=============================================================================
--===== If the experimental temp table exists, drop it so we can do repeat runs
-- if we want to "play".
IF OBJECT_ID('TempDB..#yourtable') IS NOT NULL
DROP TABLE #yourtable
--===== Create the experimental temp table and populate with IDs on the fly
-- This always works because SYSCOLUMNS always has at least 256 entries
-- even in a new database and 256*256*256 > 1,000,000
SELECT TOP 1000000 ID = IDENTITY(INT, 1, 1)
INTO #yourtable
FROM dbo.SYSCOLUMNS sc1,
dbo.SYSCOLUMNS sc2,
dbo.SYSCOLUMNS sc3
--===== Like any good table, our experimental table needs a Primary Key
ALTER TABLE #yourtable
ADD PRIMARY KEY CLUSTERED (ID)
--===== Ok, we have an experimental table with a million IDs from 1 to 1,000,000.
-- Let's remove some of the rows and see if we can find if they're missing
-- This deletes a handful of individual rows and 3 very small ranges
DELETE #yourtable
WHERE ID IN (1,2,3,250,251,2000,4000,4002,4004,900001,900002,900003)
-- This deletes a "monster" range just to see how it's handled.
DELETE #yourtable
WHERE ID BETWEEN 500000 AND 700000
--===== At this point, the experiment is ready to rock
--=============================================================================
-- Create a report of all ranges of ID's, big or small, from 1 to the max
-- ID present in the table. Status of each range will be 'Missing' or
-- 'Present' and the MinID and MaxID of each range will also be reported.
-- This is done without loops and without a Tally (numbers) table. The
-- reason for doing it without a numbers table is because if the max ID
-- for the table is very large (hundreds of millions), it's not a good
-- idea to put a table that large in TempDB (although some do) and the
-- DBA's frequently deny table creation privs to make a working table in
-- the DB. The temp tables used below should stay fairly small unless
-- someone did something really stupid like deleting every other row.
-- AND, for this million row example, it only takes about 7 seconds to
-- run on my little ol' desktop PC. It should run MUCH faster on a server
-- quality machine.
--=============================================================================
--===== If the working temp tables exist, drop them
-- Conditionally drop the table that keeps track of "MinIDs" in a range
IF OBJECT_ID('TempDB..#MyMins') IS NOT NULL
DROP TABLE #MyMins
-- Conditionally drop the table that keeps track of "MaxIDs" in a range
IF OBJECT_ID('TempDB..#MyMaxes') IS NOT NULL
DROP TABLE #MyMaxes
-- Conditionally drop the table that holds the final report on ranges of IDs
IF OBJECT_ID('TempDB..#Report') IS NOT NULL
DROP TABLE #Report
--===== Find all of the "MinIDs" for ranges that are present and put them in a table
SELECT IDENTITY(INT,1,1) AS RowNum,
CAST(ID AS INT) AS ID --Cast is necessary because ID is an IDENTITY column
INTO #MyMins
FROM #yourtable
WHERE ID NOT IN (SELECT t1.ID
FROM #yourtable t1,
#yourtable t2
WHERE t1.ID=t2.ID+1)
ORDER BY ID --This is necessary so we can match up with the MaxIDs
--===== Find all of the "MaxIDs" for ranges that are present and put them in a table
SELECT IDENTITY(INT,1,1) AS RowNum,
CAST(ID AS INT) AS ID --Cast is necessary because ID is an IDENTITY column
INTO #MyMaxes
FROM #yourtable
WHERE ID NOT IN (SELECT t1.ID
FROM #yourtable t1,
#yourtable t2
WHERE t1.ID+1=t2.ID)
ORDER BY ID --This is necessary so we can match up with the MinIDs
--===== Insert all of the "Present" ranges into the report table using the Min and Max
-- tables to derive the ranges by matching on the RowNum of each table. This
-- step creates the Report table on the fly and the RowNum is very necessary for
-- deriving the missing ID's in the steps that follow.
SELECT IDENTITY(INT,1,1) AS RowNum,
m.ID AS MinID,
x.ID AS MaxID,
'Present' AS Status
INTO #Report
FROM #MyMins m,
#MyMaxes x
WHERE m.RowNum = x.RowNum
--===== Now, derive and insert all of the missing ranges of ID's save, perhaps, one.
INSERT INTO #Report (MinID,MaxID,Status)
SELECT r1.MaxID+1 AS MinID,
r2.MinID-1 AS MaxID,
'Missing' AS Status
FROM #Report r1,
#Report r2
WHERE r1.RowNum+1 = r2.RowNum
--===== If ID #1 is missing, we wouldn't have caught that so far. So let's check
-- now and add it's missing range if it's missing
IF (SELECT MIN(MinID) FROM #Report)>1
INSERT INTO #Report (MinID,MaxID,Status)
SELECT 1 AS MinID,
MIN(MinID)-1 AS MaxID,
'Missing' AS Status
FROM #Report
--===== All set... display the report (could be used as a result set, as well)
SELECT MinID,MaxID,Status
FROM #REPORT
ORDER BY MinID
Now, WOULD SOMEONE GIVE ME AN EXAMPLE AS TO WHY THIS EVER NEEDS TO BE DONE? I could see doing it maybe as a check for transmitted data but not much more. And the old saw about a legacy system never having a row with a sequential column deleted just doesn't sit quite right with me. I agree it's not a good idea to physically delete records but, it happens and no one should care about reusing the ID when there are so many to choose from. So, why would anyone need to do this (find missing ID's)? Surely you don't intend to reuse them
--Jeff Moden
Change is inevitable... Change for the better is not.
December 20, 2005 at 2:23 am
Still just for fun I think the following might be a bit simpler (I use Jeff's code to create #yourtable)...
IF OBJECT_ID('TempDB..#MyMins') IS NOT NULL
DROP TABLE #MyMins
-- Conditionally drop the table that keeps track of "MaxIDs" in a range
IF OBJECT_ID('TempDB..#MyMaxes') IS NOT NULL
DROP TABLE #MyMaxes
create table #MyMins(RowNum int identity(1,1), MinID int)
create table #MyMaxes(RowNum int identity(1,1), MaxID int)
insert #MyMins (MinID)
select t1.ID
from #yourtable t1 left join #yourtable t2
on t1.ID = t2.ID + 1
where t2.ID is null
order by t1.ID
insert #MyMaxes (MaxID)
select t1.ID
from #yourtable t1 left join #yourtable t2
on t1.ID = t2.ID - 1
where t2.ID is null
order by t1.ID
select t1.MinID as PresentFrom, t2.MaxID as PresentTo from #MyMins t1 inner join #MyMaxes t2 on
t1.RowNum = t2.RowNum
select isnull(t2.MaxID, 0) + 1 as MissingFrom, isnull(t1.MinID, 1000001) - 1 as MissingTo from #MyMins t1 full outer join #MyMaxes t2 on
t1.RowNum = t2.RowNum + 1
where t2.MaxID != 1000000
December 20, 2005 at 9:36 am
This is way too much fun.
I have an unnatural aversion to temp tables and an equally unnatural propensity for clever querying.
SELECT MinRange = (SELECT ISNULL(MAX(subA.TheKey),0)+1 FROM MyValues subA WHERE subA.TheKey < A.TheKey), MaxRange = TheKey - 1 FROM MyValues A WHERE A.TheKey - 1 NOT IN (SELECT TheKey FROM MyValues)
December 20, 2005 at 9:48 am
You're right...that is very clever. I'm glad you guys are having fun with this. Meanwhile, I used the "create a table of Numbers" approach and am quite pleased with the results.
Thanks for all your help.........cindy
December 20, 2005 at 5:18 pm
Very cool, jratwork... I have a similar un-natural aversion to correlated subqueries and I had my doubts when I saw the "<" in the correlated subquery... but I just ran this against the million row temp table I made... it ran in less than 3 seconds and correctly found all of the missing ranges. Like I said, Very cool.
There is, however, a fly in the ointment... if ID #1 is present, the first row returned is always
MinRange MaxRange
-------- --------
1 0
I added "the fix" to the last line of the code...
SELECT MinRange = (SELECT ISNULL(MAX(suba.ID),0)+1
FROM #yourtable suba
WHERE suba.ID < a.ID),
MaxRange = ID - 1
FROM #yourtable a
WHERE a.ID - 1 NOT IN (SELECT ID FROM #yourtable)
AND a.ID - 1 > 0
--Jeff Moden
Change is inevitable... Change for the better is not.
December 21, 2005 at 6:00 am
I had exactly the same doubts when I saw the statement, but I haven't found anything faster... I give up, you beat me... this time...
December 21, 2005 at 8:36 am
I am humbled by your veteran approval.
I should have caught that one off scenario but I had focused on dealing with a missing id of 1. Thank you.
Until next time.
JR
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply