April 18, 2010 at 4:14 pm
Stefan_G (4/18/2010)
What is your recovery model ?
Right now, I'm testing with the FULL recovery model. I'll follow that up with the SIMPLE model but most folks don't run SIMPLE recovery on databases they actually care about.
I'll post the DB creation statement with all the options as soon as I'm done testing.
Speaking of testing... the log file on the first run settled in right after my previous post... CPU has dropped to near zero and the hard disk is going nuts... I believe it's in the middle of a big ol' implicit "COMMIT". 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
April 18, 2010 at 4:29 pm
Jeff Moden (4/18/2010)
Stefan_G (4/18/2010)
What is your recovery model ?
Right now, I'm testing with the FULL recovery model.
But if you are running with FULL, SELECT INTO is fully logged, so inserting a billion rows should create lots of log regardless of the method used to generate the data.
Or am I missing something here ?
I'll follow that up with the SIMPLE model but most folks don't run SIMPLE recovery on databases they actually care about.
Well, I work mostly in data warehousing, and we almost always run our databases in SIMPLE mode. We have no need for point in time recovery and we process very large amounts of data each night. I guess I am not like most folks 😉
April 18, 2010 at 5:01 pm
Stefan_G (4/18/2010)
What is your recovery model ?
I have a sample of what I ran here.
I used a triple cross join to create the results. I didn't post that query back when I did my testing. My recovery model was simple. The SSC article I referenced in that post is here.
Granted, the post is not dedicated to this triple cross join and its results, but it was the baseline for the post.
The triple join only created a single-column numbers table.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
April 18, 2010 at 5:22 pm
I used a triple cross join to create the results. I didn't post that query back when I did my testing. My recovery model was simple.
Well, on my machine I can run this query with no log usage at all (SIMPLE mode):
select n=cast(n as int)
into temp
from (
select top (1000000000) n=row_number() over(order by (select null))
from sys.system_columns c1, sys.system_columns c2, sys.system_columns c3
) t
Doesn't this query work for you ?
Maybe there was something special with the query you ran when you were writing that article.
April 18, 2010 at 5:43 pm
BWAA-HAAA.... I forgot to make room. 😛
Msg 9002, Level 17, State 4, Line 3
The transaction log for database 'TestLogGrowth' is full. To find out why space in the log cannot be reused, see the log_reuse_wait_desc column in sys.databases
No wonder it didn't make it to 40 gig... it was doing a ROLLBACK. 😛
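For anyone who runs into error 9002 the same way, the column the message points to can be checked directly. A quick sketch (substitute your own database name):

```sql
-- Show why log space cannot be reused for the test database
SELECT name, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'TestLogGrowth';
```

A value like ACTIVE_TRANSACTION would line up with the big implicit transaction (and subsequent ROLLBACK) described above.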
--Jeff Moden
Change is inevitable... Change for the better is not.
April 18, 2010 at 6:45 pm
Stefan_G (4/18/2010)
I used a triple cross join to create the results. I didn't post that query back when I did my testing. My recovery model was simple.
Well, on my machine I can run this query with no log usage at all (SIMPLE mode):
select n=cast(n as int)
into temp
from (
select top (1000000000) n=row_number() over(order by (select null))
from sys.system_columns c1, sys.system_columns c2, sys.system_columns c3
) t
Doesn't this query work for you ?
Maybe there was something special with the query you ran when you were writing that article.
The query is:
select top (1000000000) number=identity(int,1,1)
Into Numbers
from sys.system_columns c1, sys.system_columns c2, sys.system_columns c3
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
April 18, 2010 at 6:48 pm
CirquedeSQLeil (4/18/2010)
Stefan_G (4/18/2010)
I used a triple cross join to create the results. I didn't post that query back when I did my testing. My recovery model was simple.
Well, on my machine I can run this query with no log usage at all (SIMPLE mode):
select n=cast(n as int)
into temp
from (
select top (1000000000) n=row_number() over(order by (select null))
from sys.system_columns c1, sys.system_columns c2, sys.system_columns c3
) t
Doesn't this query work for you ?
Maybe there was something special with the query you ran when you were writing that article.
The query is:
select top (1000000000) number=identity(int,1,1)
Into Numbers
from sys.system_columns c1, sys.system_columns c2, sys.system_columns c3
Heh... and THAT would be the problem. IE. IT DEPENDS!
I'll be back in a couple of minutes with my findings...
--Jeff Moden
Change is inevitable... Change for the better is not.
April 18, 2010 at 7:14 pm
With a bit of egg on my face (a whole omelette, actually), it would appear that IDENTITY is, in fact, the culprit and not the 3 way CROSS JOINs. Here's the test DB I used...
--===== Created new database for testing
USE [master]
GO
DROP DATABASE TestLogGrowth --Be real careful with this line
GO
/****** Object: Database [TestLogGrowth] Script Date: 04/18/2010 16:46:01 ******/
CREATE DATABASE [TestLogGrowth] ON PRIMARY
( NAME = N'TestLogGrowth', FILENAME = N'D:\SQL_Data\TestLogGrowth.mdf' , SIZE = 102400KB , MAXSIZE = UNLIMITED, FILEGROWTH = 102400KB )
LOG ON
( NAME = N'TestLogGrowth_log', FILENAME = N'D:\SQL_Log\TestLogGrowth_log.ldf' , SIZE = 102400KB , MAXSIZE = 2048GB , FILEGROWTH = 102400KB )
GO
EXEC dbo.sp_dbcmptlevel @dbname=N'TestLogGrowth', @new_cmptlevel=90
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [TestLogGrowth].[dbo].[sp_fulltext_database] @action = 'disable'
end
ALTER DATABASE [TestLogGrowth] SET ANSI_NULL_DEFAULT OFF
ALTER DATABASE [TestLogGrowth] SET ANSI_NULLS OFF
ALTER DATABASE [TestLogGrowth] SET ANSI_PADDING OFF
ALTER DATABASE [TestLogGrowth] SET ANSI_WARNINGS OFF
ALTER DATABASE [TestLogGrowth] SET ARITHABORT OFF
ALTER DATABASE [TestLogGrowth] SET AUTO_CLOSE OFF
ALTER DATABASE [TestLogGrowth] SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE [TestLogGrowth] SET AUTO_SHRINK OFF
ALTER DATABASE [TestLogGrowth] SET AUTO_UPDATE_STATISTICS ON
ALTER DATABASE [TestLogGrowth] SET CURSOR_CLOSE_ON_COMMIT OFF
ALTER DATABASE [TestLogGrowth] SET CURSOR_DEFAULT GLOBAL
ALTER DATABASE [TestLogGrowth] SET CONCAT_NULL_YIELDS_NULL OFF
ALTER DATABASE [TestLogGrowth] SET NUMERIC_ROUNDABORT OFF
ALTER DATABASE [TestLogGrowth] SET QUOTED_IDENTIFIER OFF
ALTER DATABASE [TestLogGrowth] SET RECURSIVE_TRIGGERS OFF
ALTER DATABASE [TestLogGrowth] SET ENABLE_BROKER
ALTER DATABASE [TestLogGrowth] SET AUTO_UPDATE_STATISTICS_ASYNC OFF
ALTER DATABASE [TestLogGrowth] SET DATE_CORRELATION_OPTIMIZATION OFF
ALTER DATABASE [TestLogGrowth] SET TRUSTWORTHY OFF
ALTER DATABASE [TestLogGrowth] SET ALLOW_SNAPSHOT_ISOLATION OFF
ALTER DATABASE [TestLogGrowth] SET PARAMETERIZATION SIMPLE
ALTER DATABASE [TestLogGrowth] SET READ_WRITE
ALTER DATABASE [TestLogGrowth] SET RECOVERY SIMPLE --LOOKEE HERE!!!!
ALTER DATABASE [TestLogGrowth] SET MULTI_USER
ALTER DATABASE [TestLogGrowth] SET PAGE_VERIFY CHECKSUM
ALTER DATABASE [TestLogGrowth] SET DB_CHAINING OFF
USE [TestLogGrowth]
GO
Here's the 3 way CROSS JOIN IDENTITY code... the comments tell the story...
--===== This grows the log to over 40GB in the FULL recovery mode.
-- Even when using just a two-way Cross Join, it still causes
-- explosive growth whether in the FULL or SIMPLE recovery mode,
-- although the "damage" is a lot more limited with a two-way
-- Cross Join because of the limited row count.
-- The important point here is that a lot of people still have SQL
-- Server 2000, so they need to be really careful when using this
-- method. The workaround is, of course, to have a permanent Tally
-- table in SQL Server 2000 or earlier. Since it appears that IDENTITY
-- is the culprit here, the only workaround would appear to be one of
-- the old-fashioned "powers of 10" Cross Joins in a FROM clause.
USE TestLogGrowth
SELECT TOP (1000000000)
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM sys.System_Columns c1, sys.System_Columns c2, sys.System_Columns c3
GO
And here's the 3 way CROSS JOIN ROW_NUMBER code. Again, the comments tell the story...
--===== This method causes no log growth anywhere when in the SIMPLE recovery
-- mode and will only grow the ldf file as much as it grows the mdf file
-- when in the FULL recovery mode. Of course, this method just isn't
-- available to SQL Server 2000 users.
SELECT TOP (1000000000)
CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS N
INTO dbo.Tally
FROM sys.System_Columns c1, sys.System_Columns c2, sys.System_Columns c3
GO
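To compare the log behavior of the two methods on your own machine, the classic check is DBCC SQLPERF. A sketch (run it before and after each SELECT/INTO above):

```sql
-- Report log file size and percent used for every database on the instance
DBCC SQLPERF(LOGSPACE);
```

Watching the "Log Space Used (%)" figure for TestLogGrowth during each run makes the difference between the IDENTITY and ROW_NUMBER versions obvious.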
--Jeff Moden
Change is inevitable... Change for the better is not.
April 18, 2010 at 8:23 pm
Well, there we have it. Thanks for pointing it all out, Stefan, and thanks for testing, Jeff.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
April 18, 2010 at 8:43 pm
As a glimmer of hope for the folks still running SQL Server 2000, the following doesn't cause any unnecessary log growth, either...
DECLARE @Max INT
SET @Max = 100000000
SELECT t.N
INTO dbo.BigTally
FROM (
SELECT t2.N*10000 + t1.N + 1 AS N
FROM dbo.Tally t1,
dbo.Tally t2
WHERE t2.N <= @Max/10000
AND t1.N < 10000
) t
WHERE t.N <= @Max
The Tally table that I use (as of a couple of months ago) has values from 0 to 11,000. The limit of 10,000 in the above code is to make the math a bit easier for a maximum of 100,000,000. Considering that SQL Server 2000 doesn't have VARCHAR(MAX), I'm hoping that's enough scalability for most folks. Of course, a 0 to 100,000 row Tally table would allow for number generation up to 10,000,000,000 which is more than the INT datatype can handle.
Here's the code that proves the code above works correctly... it does take a while to run on 100,000,000 rows, though...
--===== Check for the obvious
SELECT MIN(N),MAX(N),COUNT(*) FROM dbo.BigTally
--===== Check for dupes
SELECT N, COUNT(*)
FROM dbo.BigTally
GROUP BY N
HAVING COUNT(*) > 1
--Jeff Moden
Change is inevitable... Change for the better is not.
April 18, 2010 at 9:39 pm
Jeff Moden (4/18/2010)
With a bit of egg on my face (a whole omelette, actually), it would appear that IDENTITY is, in fact, the culprit and not the 3 way CROSS JOINs. Here's the test DB I used...
--===== Created new database for testing
USE [master]
GO
DROP DATABASE TestLogGrowth --Be real careful with this line
GO
/****** Object: Database [TestLogGrowth] Script Date: 04/18/2010 16:46:01 ******/
CREATE DATABASE [TestLogGrowth] ON PRIMARY
( NAME = N'TestLogGrowth', FILENAME = N'D:\SQL_Data\TestLogGrowth.mdf' , SIZE = 102400KB , MAXSIZE = UNLIMITED, FILEGROWTH = 102400KB )
LOG ON
( NAME = N'TestLogGrowth_log', FILENAME = N'D:\SQL_Log\TestLogGrowth_log.ldf' , SIZE = 102400KB , MAXSIZE = 2048GB , FILEGROWTH = 102400KB )
GO
EXEC dbo.sp_dbcmptlevel @dbname=N'TestLogGrowth', @new_cmptlevel=90
GO
IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [TestLogGrowth].[dbo].[sp_fulltext_database] @action = 'disable'
end
ALTER DATABASE [TestLogGrowth] SET ANSI_NULL_DEFAULT OFF
ALTER DATABASE [TestLogGrowth] SET ANSI_NULLS OFF
ALTER DATABASE [TestLogGrowth] SET ANSI_PADDING OFF
ALTER DATABASE [TestLogGrowth] SET ANSI_WARNINGS OFF
ALTER DATABASE [TestLogGrowth] SET ARITHABORT OFF
ALTER DATABASE [TestLogGrowth] SET AUTO_CLOSE OFF
ALTER DATABASE [TestLogGrowth] SET AUTO_CREATE_STATISTICS ON
ALTER DATABASE [TestLogGrowth] SET AUTO_SHRINK OFF
ALTER DATABASE [TestLogGrowth] SET AUTO_UPDATE_STATISTICS ON
ALTER DATABASE [TestLogGrowth] SET CURSOR_CLOSE_ON_COMMIT OFF
ALTER DATABASE [TestLogGrowth] SET CURSOR_DEFAULT GLOBAL
ALTER DATABASE [TestLogGrowth] SET CONCAT_NULL_YIELDS_NULL OFF
ALTER DATABASE [TestLogGrowth] SET NUMERIC_ROUNDABORT OFF
ALTER DATABASE [TestLogGrowth] SET QUOTED_IDENTIFIER OFF
ALTER DATABASE [TestLogGrowth] SET RECURSIVE_TRIGGERS OFF
ALTER DATABASE [TestLogGrowth] SET ENABLE_BROKER
ALTER DATABASE [TestLogGrowth] SET AUTO_UPDATE_STATISTICS_ASYNC OFF
ALTER DATABASE [TestLogGrowth] SET DATE_CORRELATION_OPTIMIZATION OFF
ALTER DATABASE [TestLogGrowth] SET TRUSTWORTHY OFF
ALTER DATABASE [TestLogGrowth] SET ALLOW_SNAPSHOT_ISOLATION OFF
ALTER DATABASE [TestLogGrowth] SET PARAMETERIZATION SIMPLE
ALTER DATABASE [TestLogGrowth] SET READ_WRITE
ALTER DATABASE [TestLogGrowth] SET RECOVERY SIMPLE --LOOKEE HERE!!!!
ALTER DATABASE [TestLogGrowth] SET MULTI_USER
ALTER DATABASE [TestLogGrowth] SET PAGE_VERIFY CHECKSUM
ALTER DATABASE [TestLogGrowth] SET DB_CHAINING OFF
USE [TestLogGrowth]
GO
Here's the 3 way CROSS JOIN IDENTITY code... the comments tell the story...
--===== This grows the log to over 40GB in the FULL recovery mode.
-- Even when using just a two-way Cross Join, it still causes
-- explosive growth whether in the FULL or SIMPLE recovery mode,
-- although the "damage" is a lot more limited with a two-way
-- Cross Join because of the limited row count.
-- The important point here is that a lot of people still have SQL
-- Server 2000, so they need to be really careful when using this
-- method. The workaround is, of course, to have a permanent Tally
-- table in SQL Server 2000 or earlier. Since it appears that IDENTITY
-- is the culprit here, the only workaround would appear to be one of
-- the old-fashioned "powers of 10" Cross Joins in a FROM clause.
USE TestLogGrowth
SELECT TOP (1000000000)
IDENTITY(INT,1,1) AS N
INTO dbo.Tally
FROM sys.System_Columns c1, sys.System_Columns c2, sys.System_Columns c3
GO
And here's the 3 way CROSS JOIN ROW_NUMBER code. Again, the comments tell the story...
--===== This method causes no log growth anywhere when in the SIMPLE recovery
-- mode and will only grow the ldf file as much as it grows the mdf file
-- when in the FULL recovery mode. Of course, this method just isn't
-- available to SQL Server 2000 users.
SELECT TOP (1000000000)
CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT) AS N
INTO dbo.Tally
FROM sys.System_Columns c1, sys.System_Columns c2, sys.System_Columns c3
GO
One additional piece: in order for the "no growth" to happen, you also need to make sure to use the
ORDER BY (SELECT NULL).
It took me some time to see what was different between my case and Stefan's, and that was it. Using a physical column in the ORDER BY (even the UCI column) would force the materialization.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
April 19, 2010 at 2:50 am
Jeff Moden (4/18/2010)
With a bit of egg on my face (a whole omelette, actually), it would appear that IDENTITY is, in fact, the culprit and not the 3 way CROSS JOINs.
One additional piece: in order for the "no growth" to happen, you also need to make sure to use the
ORDER BY (SELECT NULL).
Excellent! Now we know what the problem was. We also know that for SQL 2005 and above we should use something like this to generate numbers.
I have already created the following function in my Common database:
-- Generate numbers between @start and @end
create function [dbo].[GetRange](@start int, @end int) returns table as
return
select top (@end-@start+1) (row_number() over(order by (select null)))-1+@start as n
from sys.system_columns c1
cross join sys.system_columns c2
cross join sys.system_columns c3
That function can be used as a dynamic Tally table with excellent performance.
It easily covers the whole range of an int.
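For anyone who hasn't used an inline TVF as a dynamic Tally table before, usage is just a SELECT against the function. A sketch (assuming the function was created as posted above):

```sql
-- Generate the numbers 1 through 10
SELECT n FROM dbo.GetRange(1, 10);

-- A typical application: one row per day for March 2010
SELECT DATEADD(dd, n, '20100301') AS TheDate
FROM dbo.GetRange(0, 30);
```

Because it's an inline TVF, the optimizer expands it into the calling query just like a view, so there's no function-call overhead per row.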
Thank you Jeff and Matt for helping to clear this up.
/SG
April 19, 2010 at 4:57 am
Jeff Moden (4/18/2010)
Right now, I'm testing with the FULL recovery model. I'll follow that up with the SIMPLE model but most folks don't run SIMPLE recovery on databases they actually care about.
This has got to be one of those very few times I disagree with Jeff.
There are a lot of cases where simple recovery model is appropriate. Almost every case where a database is effectively read only and is not manipulated regularly by a DBA should have simple recovery model. That includes all cases where the only updates are periodic upgrades to the data shipped by the original provider (which is a lot of databases; although that field employs a tiny minority of DBAs and other DB people it provides a lot of databases).
In addition, some apps which ship with MSDE or SQLS Express embedded use simple recovery model (including some which probably should be using FULL, but the supplier doesn't provide the user with access to do backups let alone to change the recovery model). So although the user may actually care about the data he is forced to use the simple recovery model (or break the terms of his license and hack into the thing).
Tom
April 19, 2010 at 7:57 am
I personally prefer this version (modified code from the previous post) as it has zero I/O and is quicker on my machine by 13 seconds for 1,000,000,000 rows.
create function [dbo].[GetRange](@start int, @end int) returns table as
return
select top (@end-@start+1) (row_number() over(order by (select null)))-1+@start as n
FROM
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) A(A) ,
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) B(A) ,
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) C(A) ,
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) D(A) ,
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) E(A) ,
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) F(A) ,
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) G(A) ,
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) H(A) ,
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) I(A) ,
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) J(A) ,
(SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0 union all SELECT 0) K(A)
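One way to reproduce the I/O and timing comparison between the two versions yourself (a sketch; it assumes you've created both variants, perhaps under different names, and compares them on a row count your machine can handle):

```sql
-- Show elapsed/CPU time and logical reads for the statement below
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT COUNT_BIG(*) FROM dbo.GetRange(1, 10000000);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

The sys.system_columns version shows logical reads against the catalog views in the messages tab; the UNION ALL constant-scan version should show none.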
MM
April 19, 2010 at 8:18 am
Tom.Thomson (4/19/2010)
Jeff Moden (4/18/2010)
Right now, I'm testing with the FULL recovery model. I'll follow that up with the SIMPLE model but most folks don't run SIMPLE recovery on databases they actually care about.
This has got to be one of those very few times I disagree with Jeff.
There are a lot of cases where simple recovery model is appropriate. Almost every case where a database is effectively read only and is not manipulated regularly by a DBA should have simple recovery model. That includes all cases where the only updates are periodic upgrades to the data shipped by the original provider (which is a lot of databases; although that field employs a tiny minority of DBAs and other DB people it provides a lot of databases).
In addition, some apps which ship with MSDE or SQLS Express embedded use simple recovery model (including some which probably should be using FULL, but the supplier doesn't provide the user with access to do backups let alone to change the recovery model). So although the user may actually care about the data he is forced to use the simple recovery model (or break the terms of his license and hack into the thing).
I didn't mean for it to sound harsh on my part. Let me ask, though, simply because I lead a life sheltered by large batch jobs where only staging tables are in a DB in the SIMPLE recovery mode... would you say that most production databases that folks build use the SIMPLE recovery model or the BULK_LOGGED/FULL recovery models?
--Jeff Moden
Change is inevitable... Change for the better is not.