August 5, 2008 at 3:39 pm
Very cool article, my hat is off to you. It's been a long time, but I've used a technique like your recovered key space before. I had an initial problem with your first thing about the two phone numbers as I would have put a constraint on the field so that there could only be two records per employee, which would have complicated matters slightly.
We need to get Steve to add a Pingback because I'm definitely linking to this article from my blog.
Oh, and you might find the Four Yorkshiremen, if memory serves, is actually (technically) NOT Monty Python: it was actually first on At Last, The 1946 Show. However, that show was written by Cleese and Chapman with occasional appearances of other soon-to-be Pythons such as Idle and I think Palin. Unfortunately most of that series was lost, I think there are only five episodes out on a two DVD set. Even though the video is really bad, it's still worth watching. I was overjoyed to find them available.
-----
[font="Arial"]Knowledge is of two kinds. We know a subject ourselves or we know where we can find information upon it. --Samuel Johnson[/font]
August 5, 2008 at 4:11 pm
Wayne West (8/5/2008)
Very cool article, my hat is off to you.
Thanks, Wayne.
Oh, and you might find the Four Yorkshiremen, if memory serves, is actually (technically) NOT Monty Python: it was actually first on At Last, The 1946 Show. However, that show was written by Cleese and Chapman with occasional appearances of other soon-to-be Pythons such as Idle and I think Palin. Unfortunately most of that series was lost, I think there are only five episodes out on a two DVD set. Even though the video is really bad, it's still worth watching. I was overjoyed to find them available.
Huh, now that's another thing I learned today!
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 5, 2008 at 4:48 pm
Such a great article. I really like how you took the time to explain things. I came onto the computer scene when issues like that were still talked about as recent history, but not worried about. I never worried about every last bit anyway.
The only thing that was missing for me (because I wanted to try out one of your queries) was the sample data for the main table and you provided that in an earlier comment. So, now you know you helped out at least 2 people. 😉
Finally, I think humor is very important, even in technical articles. So, I wanted to give you public cudos for my favorite line in your article: "Now one byte may not seem like a lot today, but in the 70's an entire family of programmers could live in the space provided by a single byte."
August 5, 2008 at 5:04 pm
Thanks for the feedback JJ. I think that (appropiate) humor is an important part of work and life too. Glad you liked the joke!
😀
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 5, 2008 at 5:24 pm
Andy DBA (8/5/2008)
Also, the concept of storing a tally table would be a pretty hard sell. "So you're suggesting we have a permanent table of nothing but consectutive numbers? REALLY? Well, how high should we count? I say we use long ints for this table of yours and we'll take the storage cost out of your salary."
11,000 is my magic number... covers VARCHAR(8000) and a smidge more than 30 years of days for mortgages. For anything else in 2k5, a Cross Join CTE does well up to 121 million.
Anyway, as reluctant as I am to suggest this, (please don't hate me, Jeff) if one assumes duplicate rows are grouped together, a RBAR solution to this interesting problem is likely what would have been adopted back then. Of course you can forget about concurrency if you're RBAR grinding through GBs of data.
Well, maybe just a little... 😛
Please don't get me wrong. I totally understand and promote set based solutions over RBAR and have used tally tables to great advantage. It's just interesting to see hybrid solutions combining 70s constraints, goals, and techiques with modern ones. We certainly have it better than we used to, but we're doing a lot more now, too.
I absolutely agree... some of the best code I've ever seen has come about by folks "messing around" with code and hybrid solutions. For example, the fastest "missing sequence" code I've ever seen uses a couple of correlated sub-queries which is a form of RBAR... but the constraints it used make it all nasty fast. 🙂
--Jeff Moden
Change is inevitable... Change for the better is not.
August 5, 2008 at 6:12 pm
Jeff Moden (8/5/2008)
...For example, the fastest "missing sequence" code I've ever seen uses a couple of correlated sub-queries which is a form of RBAR... but the constraints it used make it all nasty fast. 🙂
I'm sure I've seen this floating around here somewhere, but can't find it atm. Link please?
Regards,
Jacob
August 5, 2008 at 6:46 pm
Jacob Luebbers (8/5/2008)
Jeff Moden (8/5/2008)
...For example, the fastest "missing sequence" code I've ever seen uses a couple of correlated sub-queries which is a form of RBAR... but the constraints it used make it all nasty fast. 🙂I'm sure I've seen this floating around here somewhere, but can't find it atm. Link please?
Regards,
Jacob
No link necessary... here's the test data... as usual, comments in the code explain what's going on...
--===== Setup for speed and to prevent blocking
SET NOCOUNT ON
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
--=============================================================================
-- Create an experimental table to simulate the table being examined
-- Again... 6 million rows...
--=============================================================================
--===== Create the experimental temp table and populate with Serial #'s on the fly
-- This works because SysColumns always has at least 4000 entries
-- even in a new database and 4000*4000 = 16,000,000
SELECT TOP 6000000 SerialNumber = IDENTITY(INT, 1, 1)
INTO #JbmTest
FROM Master.dbo.SysColumns sc1,
Master.dbo.SysColumns sc2
-- --===== Like any good table, our experimental table needs a Primary Key
ALTER TABLE #JbmTest
ADD PRIMARY KEY CLUSTERED (SerialNumber)
-- This deletes a "monster" range just to see how it's handled.
DELETE #JbmTest
WHERE SerialNumber BETWEEN 5000000 AND 5500000
-- This deletes every third row in the first 1000 rows
DELETE #JbmTest
WHERE SerialNumber %3 = 0
AND SerialNumber <= 1000
... and here's the little pearl that does the trick so nicely... if you think 10 seconds is a long time, remember that there's SIX MILLION rows in the test data. 😉
--=============================================================================
-- Test the code
--=============================================================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
--===== Calculated Gaps
SELECT GapStart = (SELECT ISNULL(MAX(b.SerialNumber),0)+1
FROM #JbmTest b
WHERE b.SerialNumber < a.SerialNumber),
GapEnd = SerialNumber - 1
FROM #JbmTest a
WHERE a.SerialNumber - 1 NOT IN (SELECT SerialNumber FROM #JbmTest)
AND a.SerialNumber - 1 > 0
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
--Jeff Moden
Change is inevitable... Change for the better is not.
August 5, 2008 at 7:48 pm
Nice - just ran a quick comparison (dual Xeon 2.8, 6GB RAM, SQL Server 2005 Ent. SP2) - your method vs Alex Kozac's original (http://msdn.microsoft.com/en-us/library/aa175780.aspx) vs Goce Smilevski's UDF-based improvement (http://www.sqlservercentral.com/articles/Advanced+Querying/anefficientsetbasedsolutionforislandsandgaps/1619/[/url]):
--...SNIP... Jeff's test table setup, changed to use a permanent instead of temp table
-- Goce's UDFs
create FUNCTION dbo.udfFindGapStarts()
RETURNS @res_tbl TABLE (
row_num int identity(1, 1) NOT NULL PRIMARY KEY,
gap_start int NOT NULL)
AS
BEGIN
INSERT INTO @res_tbl (gap_start)
SELECT SerialNumber + 1
FROM JbmTest AS g1
WHERE NOT EXISTS (SELECT SerialNumber FROM JbmTest AS g2
WHERE (g2.SerialNumber = g1.SerialNumber + 1))
ORDER BY SerialNumber + 1
OPTION (MERGE JOIN)
RETURN
END
create FUNCTION dbo.udfFindGapEnds()
RETURNS @res_tbl TABLE (
row_num int identity(0, 1) NOT NULL PRIMARY KEY,
gap_end int NOT NULL)
AS
BEGIN
INSERT INTO @res_tbl (gap_end)
SELECT SerialNumber - 1
FROM JbmTest AS g1
WHERE NOT EXISTS (SELECT SerialNumber FROM JbmTest AS g2
WHERE (g2.SerialNumber = g1.SerialNumber - 1))
ORDER BY SerialNumber - 1
OPTION (MERGE JOIN)
RETURN
END
--=============================================================================
-- Test the code
--=============================================================================
SET STATISTICS IO ON
SET STATISTICS TIME ON
-- GAPS - Jeff Moden
SELECT GapStart = (SELECT ISNULL(MAX(b.SerialNumber),0)+1
FROM JbmTest b
WHERE b.SerialNumber < a.SerialNumber),
GapEnd = SerialNumber - 1
FROM JbmTest a
WHERE a.SerialNumber - 1 NOT IN (SELECT SerialNumber FROM JbmTest)
AND a.SerialNumber - 1 > 0
-- GAPS - Goce Smilevski
SELECT
t1.gap_start, t2.gap_end
FROM
dbo.udfFindGapStarts() AS t1
INNER JOIN dbo.udfFindGapEnds() AS t2
ON (t2.row_num = t1.row_num)
OPTION
(MERGE JOIN)
-- GAPS - Alexander Kozak
SELECT
t1.col1 as startOfGroup,
MIN(t2.col1) AS endOfGroup
FROM
(SELECT
col1 = SerialNumber+1
FROM JbmTest tbl1
WHERE NOT EXISTS(SELECT * FROM JbmTest tbl2 WHERE tbl2.SerialNumber = tbl1.SerialNumber + 1)
AND SerialNumber <> (SELECT MAX(SerialNumber) FROM JbmTest)
) t1
INNER JOIN
(SELECT
col1 = SerialNumber-1
FROM JbmTest tbl1
WHERE NOT EXISTS (SELECT * FROM JbmTest tbl2 WHERE tbl1.SerialNumber = tbl2.SerialNumber + 1)
AND SerialNumber <> (SELECT MIN(SerialNumber) FROM JbmTest)
) t2 ON t1.col1 <= t2.col1
GROUP BY t1.col1
SET STATISTICS IO OFF
SET STATISTICS TIME OFF
Results:
Jeff Moden: 7 secs, 18743 logical reads
Goce Smilevski: 11 secs, 36822 logical reads
Alex Kozac: killed it after 20 mins, no result
Regards,
Jacob
August 5, 2008 at 8:06 pm
Very cool, Jacob! Thanks for taking the time to test it and other methods, as well! 🙂
I wish I could take the credit for the code, but I can't. I got the code off of some website about a million years ago and wish I could find it again so I could give the lad a hefty pat on the back.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 6, 2008 at 12:18 am
Great article.....
August 6, 2008 at 9:46 am
create table #tab (lname varchar(5), age smallint, sex char(1))
insert into #tab (lname, age, sex) select 'ABC', 24, 'M'
insert into #tab (lname, age, sex) select 'ABC', 24, 'M'
insert into #tab (lname, age, sex) select 'LMN', 27, 'M'
insert into #tab (lname, age, sex) select 'LMN', 27, 'M'
insert into #tab (lname, age, sex) select 'LMN', 27, 'M'
insert into #tab (lname, age, sex) select 'PQRS', 25, 'F'
insert into #tab (lname, age, sex) select 'XYZ', 24, 'M'
insert into #tab (lname, age, sex) select 'XYZ', 25, 'M'
select * from #tab
set rowcount 0
while exists(select lname, count(*) from #tab group by lname having count(*) > 1)
begin
set rowcount 1
delete from #tab where lname in (select lname from #tab group by lname having count(*) > 1)
set rowcount 0
end
select * from #tab
August 6, 2008 at 10:03 am
Bill; one of the rules is "No Loops". "WHILE..." is a loop.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
August 6, 2008 at 10:06 am
Admittedly, I didn't comb through the rules. I just threw together the code to perform the "impossible delete". 🙂
August 6, 2008 at 10:07 am
rbarryyoung (8/6/2008)
Bill; one of the rules is "No Loops". "WHILE..." is a loop.
Another [implicit] rule is that the solution should return the desired result. While I don't understand the purpose driving the original design problem, this "solution" fails to deliver correct results.
Andrew
--Andrew
August 6, 2008 at 10:11 am
You're right, sorry. I misread the article to mean ditch the dupes.
My bad, in a hurry while trying to do actual work. 🙂
Viewing 15 posts - 61 through 75 (of 156 total)
You must be logged in to reply to this topic. Login to reply