November 29, 2007 at 3:45 pm
I am trying to get a list of random names using something similar to the following query:
DECLARE @FirstNameCount int
SELECT @FirstNameCount = count(*) FROM SampleData..First_Names
SELECT TOP 1000
( SELECT FirstName
FROM SampleData..First_Names
WHERE RecNo = 1 + ABS(CHECKSUM(NEWID())) % @FirstNameCount
) AS FirstName
FROM sysobjects d1, sysobjects d2 -- (any large tables will do)
This works fine in SQL2005 but in SQL2000 I get the same name for all rows returned. The difference in the Execution plan is that SQL 2000 is doing a Stream Aggregate, Assert and Table spool.
StmtText ---------------------------------------------------------------
|--Top(1000)
|--Compute Scalar(DEFINE:([Expr1007]=[Expr1007]))
|--Nested Loops(Left Outer Join)
|--Nested Loops(Inner Join)
| |--Index Scan(OBJECT:([Paceart_Database].[dbo].[sysobjects].[ncsysobjects2] AS [d2]))
| |--Index Scan(OBJECT:([Paceart_Database].[dbo].[sysobjects].[ncsysobjects2] AS [d1]))
|--Table Spool
|--Assert(WHERE:(If ([Expr1006]>1) then 0 else NULL))
|--Stream Aggregate(DEFINE:([Expr1006]=Count(*), [Expr1007]=ANY([FIRST_NAMES].[FirstName])))
|--Clustered Index Seek(OBJECT:([SampleData].[dbo].[FIRST_NAMES].[PK_FIRST_NAMES]), SEEK:([FIRST_NAMES].[RecNo]=1+abs(checksum(newid()))%[@FirstNameCount]) ORDERED FORWARD)
Why is this? Can I do anything to force a plan like SQL 2005?
Thanks in advance for any help.
November 29, 2007 at 3:57 pm
I may be misunderstanding your request, but why not just use NEWID()
SELECT TOP 1000 FirstName
FROM SampleData..First_Names
ORDER BY NEWID()
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgNovember 29, 2007 at 4:14 pm
Sorry. I tried to simply the post and may have confused it more.
This is part of a script that inserts data into one of our tables. So I am selecting other data along with the first name. The select for the first name is a subquery and needs to return a single random name from the FIRST_NAMES table but it has to change for each row in the outer select.
November 29, 2007 at 4:43 pm
So do you need this to run in 2000 AND 2005?
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgNovember 29, 2007 at 4:57 pm
Ideally yes. It currently works on 2005. I do need it to work on SQL 2000 but it does not necessarily need to be the same code. Just the same results.
November 29, 2007 at 5:00 pm
Well if you're select a bunch of data and only the first name needs to be random for EACH record inserted. then change my query to read "TOP 1".
Otherwise, you'll need to be more specific on your needs.
______________________________________________________________________
Personal Motto: Why push the envelope when you can just open it?
If you follow the direction given HERE[/url] you'll likely increase the number and quality of responses you get to your question.
Jason L. SelburgNovember 30, 2007 at 4:36 pm
Here is some sample code and expected results. When I ran this sample code it worked fine in both SQL 2000 and 2005 but when I use my real tables I get a get a table spool on the first name and the same name returned for each row. I am not sure what is causing the different execution plan.
DROP TABLE [dbo].[FIRST_NAMES]
CREATE TABLE [dbo].[FIRST_NAMES] (
[RecNo] [int] NOT NULL,
[FirstName] [varchar] (100) NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [PK_FirstNameRecNo] ON [dbo].[FIRST_NAMES]([RecNo]) ON [PRIMARY]
GO
INSERT INTO FIRST_NAMES (RecNo, FirstName) VALUES (1,'Jim')
INSERT INTO FIRST_NAMES (RecNo, FirstName) VALUES (2,'John')
INSERT INTO FIRST_NAMES (RecNo, FirstName) VALUES (3,'Jeff')
CREATE TABLE [dbo].[LAST_NAMES] (
[RecNo] [int] NOT NULL,
[LastName] [varchar] (100) NULL
) ON [PRIMARY]
GO
CREATE UNIQUE CLUSTERED INDEX [PK_LastNameRecNo] ON [dbo].[LAST_NAMES]([RecNo]) ON [PRIMARY]
GO
INSERT INTO LAST_NAMES (RecNo, LastName) VALUES (1,'Smith')
INSERT INTO LAST_NAMES (RecNo, LastName) VALUES (2,'Jone')
INSERT INTO LAST_NAMES (RecNo, LastName) VALUES (3,'Anderson')
DECLARE @FirstNameCount int
DECLARE @LastNameCount int
SELECT @FirstNameCount = count(*) FROM First_Names
SELECT @LastNameCount = count(*) FROM Last_Names
SELECT TOP 10
( SELECT FirstName
FROM First_Names
WHERE RecNo = 1 + ABS(CHECKSUM(NEWID())) % 3 --@FirstNameCount
) AS FirstName,
( SELECT LastName
FROM Last_Names
WHERE RecNo = 1 + ABS(CHECKSUM(NEWID())) % 3 --@LastNameCount
) AS LastName
FROM sysobjects d1, sysobjects d2
Results:
FirstName LastName
----------- ---------
Jeff Anderson
John Anderson
Jim Jone
John Jone
Jeff Smith
John Anderson
Jim Jone
John Smith
John Anderson
Jim Anderson
Incorrect Result:
FirstName LastName
----------- ---------
Jeff Anderson
Jeff Anderson
Jeff Anderson
Jeff Anderson
Jeff Anderson
Jeff Anderson
Jeff Anderson
Jeff Anderson
Jeff Anderson
Jeff Anderson
November 30, 2007 at 8:01 pm
Try this instead...
--DROP TABLE dbo.FirstName,dbo.LastName
--===== Create a couple of test tables
CREATE TABLE dbo.FirstName
(
FirstName VARCHAR(100) NOT NULL,
CONSTRAINT PK_FirstName_FirstName
PRIMARY KEY CLUSTERED (FirstName)
)
CREATE TABLE dbo.LastName
(
LastName VARCHAR(100) NOT NULL,
CONSTRAINT PK_LastName_LastName
PRIMARY KEY CLUSTERED (LastName)
)
--===== Populate the test tables with test data
INSERT INTO dbo.FirstName
(FirstName)
SELECT 'Jim' UNION ALL
SELECT 'Bob' UNION ALL
SELECT 'John' UNION ALL
SELECT 'Joe' UNION ALL
SELECT 'Josh' UNION ALL
SELECT 'Matt' UNION ALL
SELECT 'Serqiy' UNION ALL
SELECT 'Steve' UNION ALL
SELECT 'Tony'
INSERT INTO dbo.LastName
(LastName)
SELECT 'Smith' UNION ALL
SELECT 'Jones' UNION ALL
SELECT 'Moden' UNION ALL
SELECT 'Anderson' UNION ALL
SELECT 'Miller' UNION ALL
SELECT 'Macy' UNION ALL
SELECT 'Bobson'
--===== This creates the randomized list of 10 first and last names
-- Run this as often as you like... each run will be different
SELECT TOP 10
fn.FirstName, ln.LastName
FROM dbo.FirstName fn
CROSS JOIN
dbo.LastName ln
ORDER BY NEWID()
--Jeff Moden
Change is inevitable... Change for the better is not.
December 3, 2007 at 9:48 am
Thanks for the feedback.
That would work but the performance is terrible (I killed it after 10 minutes). A cross join between my first names and last names produces over 450,000,000 records which then have to be sorted.
The main question I had with the post is "Why am I getting the different execution plans on SQL 2000?" It is processing the subquery once and returning the same value for each row in the outer query on SQL 2000 but in 2005 it correctly executes the subquery for each row in the outer query.
SQL 2005 plan:
StmtText
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|--Top(TOP EXPRESSION:((10)))
|--Compute Scalar(DEFINE:([SampleData].[dbo].[LAST_NAMES].[LastName]=[SampleData].[dbo].[LAST_NAMES].[LastName]))
|--Nested Loops(Left Outer Join)
|--Nested Loops(Inner Join)
| |--Concatenation
| | |--Index Scan(OBJECT:([mssqlsystemresource].[sys].[sysobjrdb].[ncl02f040]))
| | |--Filter(WHERE:(has_access('CO',[master].[sys].[sysschobjs].[id])=(1)))
| | |--Clustered Index Scan(OBJECT:([master].[sys].[sysschobjs].[clst]), WHERE:([master].[sys].[sysschobjs].[nsclass]=(0) AND [master].[sys].[sysschobjs].[pclass]=(1)))
| |--Row Count Spool
| |--Concatenation
| |--Filter(WHERE:(has_access('CO',[master].[sys].[sysschobjs].[id])=(1)))
| | |--Clustered Index Scan(OBJECT:([master].[sys].[sysschobjs].[clst]), WHERE:([master].[sys].[sysschobjs].[nsclass]=(0) AND [master].[sys].[sysschobjs].[pclass]=(1)))
| |--Index Scan(OBJECT:([mssqlsystemresource].[sys].[sysobjrdb].[ncl02f040]))
|--Clustered Index Seek(OBJECT:([SampleData].[dbo].[LAST_NAMES].[PK_LastNameRecno]), SEEK:([SampleData].[dbo].[LAST_NAMES].[RecNo]=(1)+abs(checksum(newid()))%[@LastNameCount]) ORDERED FORWARD)
SQL 2000 Plan:
|--Top(10)
|--Compute Scalar(DEFINE:([Expr1007]=[Expr1007]))
|--Nested Loops(Left Outer Join)
|--Nested Loops(Inner Join)
| |--Index Scan(OBJECT:([Paceart_Database].[dbo].[sysobjects].[ncsysobjects2] AS [d1]))
| |--Row Count Spool
| |--Index Scan(OBJECT:([Paceart_Database].[dbo].[sysobjects].[ncsysobjects2] AS [d2]))
|--Table Spool
|--Assert(WHERE:(If ([Expr1006]>1) then 0 else NULL))
|--Stream Aggregate(DEFINE:([Expr1006]=Count(*), [Expr1007]=ANY([LAST_NAMES].[LastName])))
|--Clustered Index Seek(OBJECT:([SampleData].[dbo].[LAST_NAMES].[PK_LAST_NAMES]), SEEK:([LAST_NAMES].[RecNo]=1+abs(checksum(newid()))%[@LastNameCount]) ORDERED FORWARD)
December 3, 2007 at 11:14 am
Ther query appears to work fine in SQL 2000 until the Last_Names table get more than about 10,000 rows. Then the optimizer changes the execution plan and the query no longer returns the same results.
December 3, 2007 at 1:09 pm
Try this instead - no cross join. On my machine - 1M records inserted finish in 10 seconds.
declare @g datetime
select @g=getdate()
drop table randomFullname
drop table firstname
drop table lastName
drop table fullRandname
create table randomFullname (rid int identity(1,1) Primary Key, fnID int,lnid int)
create table FirstName (fnID int identity(1,1) Primary Key, FName varchar(100))
create table lastName (lnID int identity(1,1) Primary Key, lName varchar(100))
insert Firstname (fname)
select 'Matt' union all
select 'Jeff' union all
select 'Jason' union all
select 'Mark' union all
select 'Andrew' union all
select 'Bob' union all
select 'Joe' union all
select 'Steve' union all
select 'Kelly' union all
select 'Beverly' union all
select 'Janet' union all
select 'Angela' union all
select 'Brandie' union all
select 'Josephine'
insert Lastname (lname)
select 'Miller' union all
select 'Moden' union all
select 'Smith' union all
select 'Jones' union all
select 'Tarvin' union all
select 'ODoul' union all
select 'Dupont' union all
select 'Kelley' union all
select 'Taylor' union all
select 'Barton'
declare @fncount int
declare @lncount int
select @fncount=count(*) from firstname
select @lncount=count(*) from lastname
insert randomfullname (fnid,lnid)
select top 1000000
cast(rand(checksum(newid()))*(@fncount-1) as int)+1,
cast(rand(checksum(newid()))*(@lncount-1) as int)+1
from Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2
create index ix_rfn1 on randomFullname(fnid,lnid)
select datediff(ms, @g, getdate())
select rid,lname,fname
into fullRandname
from randomFullname r
inner join firstname fn on r.fnid=fn.fnid
inner join lastname ln on r.lnid=ln.lnid
--just to know how long it took
select datediff(ms, @g, getdate())
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 3, 2007 at 2:52 pm
Thanks Matt.
That will work.
I would still like to understand why the optimizer chooses to do a table spool on my query and change how the query functions when the Last Names source table exceeds a certain row count.
December 3, 2007 at 3:11 pm
As I recall, table spools occur when temporary operations get to a point where the operation happening in mem is spooled to disk so that it doesn't continuously need to be re-evaluated. In your case - it's likely that that cartesian product got past a certain size, and the optimizer thought it would be the best way.
Every time I've seen them - they were a sign that something wasn't joined.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
December 3, 2007 at 5:20 pm
Heh... You weren't supposed to remove the TOP... thought you only wanted ten random names at a time... and I sure didn't know you had a list of over 20,000 names... thought you were setting up some sort of small test.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 4, 2007 at 11:26 am
Sorry Jeff, I do appreciate your feedback. Sometimes trying to simplify the post can confuse the issue. I am taking data from US census on names and street names and randomly piecing it together to get realistic sample data for some performance testing. It all worked great on SQL 2005 but the SQL 2000 optimizer kept getting in my way.
Thanks again for your help.
Mark
Viewing 15 posts - 1 through 15 (of 18 total)
You must be logged in to reply to this topic. Login to reply