January 12, 2015 at 7:30 pm
I am trying to load data in batches of 10K. The SELECT statement at the bottom is taking about 60% of the cost; is there a better way to do this? I should be able to control how many records I want in the table.
DECLARE @BatchSize INT = 10000
WHILE 1 = 1
BEGIN
INSERT INTO dbo.PtActs
SELECT TOP(@BatchSize) NEWID( ) ,
a.HlFK ,
CAST(LEFT(CAST(ABS(CAST(CAST(NEWID() as BINARY(10)) as int)) as varchar(max)) + '00000000',9) as varchar(max)) ,
dateadd(day,rand(checksum(newid()))*(1+datediff(day, a.AdmitDate, a.DOB)),a.AdmitDate) Admitdate,
--a.DOB + cast(right(CHECKSUM(a.PIPK),2)as int) AdmitDate ,
b.DDate ,
a.RDate ,
a.LastName ,
b.FirstName ,
a.MiddleName ,
b.Suffix ,
a.Addr1 ,
b.Addr2 ,
b.City ,
a.State ,
a.Zip ,
a.County ,
b.Phone1 ,
a.Phone2 ,
b.DOB ,
a.SN ,
a.Gender ,
a.DOD ,
a.MNum ,
b.MedNum ,
a.TotalCharges ,
a.CurrentBalance ,
b.CurrentStatus ,
a.CreateDate ,
a.UpdateDate ,
b.StatusDate ,
a.ClFK ,
a.DeChecksum ,
b.DemDate ,
b.TrgUpdateDate ,
a.MoNum ,
b.UpFK ,
a.Fde ,
a.Tats ,
b.TAdjuts ,
b.MHC ,
a.HFK
FROM dbo.PtActs a
join dbo.PtActs c
on a.PtIPK=c.PtIPK
and a.HlFK=c.HlFK
CROSS JOIN dbo.PtActs b
IF (select COUNT(*) from PtActs with(nolock)) > 10050000 BREAK
END
January 12, 2015 at 8:14 pm
A TOP without an ORDER BY means you could see data duplication or missing data. This is because you're not guaranteed to get the results back in any particular order, so a TOP operation could see different data from one execution to the next. Adding an ORDER BY is the one thing I'd do right away.
Other than that, it's hard to know what's happening. Your basic query doesn't look overly crazy or complex and there's not an obvious tuning point. Can you post the execution plan?
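For example, something along these lines. This is only a sketch of the kind of change being suggested; PtIPK is assumed here to be a usable ordering key, and any deterministic column (or set of columns) would do:
-- Sketch: pair the TOP with an ORDER BY so each iteration works from a predictable row order
SELECT TOP (@BatchSize)
    NEWID(),
    a.HlFK
    -- ...remaining columns as in the original query...
FROM dbo.PtActs a
ORDER BY a.PtIPK;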
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 12, 2015 at 9:24 pm
curious_sqldba (1/12/2015)
I am trying to load data in batches of 10K. The SELECT statement at the bottom is taking about 60% of the cost; is there a better way to do this? I should be able to control how many records I want in the table.
DECLARE @BatchSize INT = 10000
WHILE 1 = 1
BEGIN
INSERT INTO dbo.PtActs
SELECT TOP(@BatchSize) NEWID( ) ,
a.HlFK ,
CAST(LEFT(CAST(ABS(CAST(CAST(NEWID() as BINARY(10)) as int)) as varchar(max)) + '00000000',9) as varchar(max)) ,
dateadd(day,rand(checksum(newid()))*(1+datediff(day, a.AdmitDate, a.DOB)),a.AdmitDate) Admitdate,
--a.DOB + cast(right(CHECKSUM(a.PIPK),2)as int) AdmitDate ,
b.DDate ,
a.RDate ,
a.LastName ,
b.FirstName ,
a.MiddleName ,
b.Suffix ,
a.Addr1 ,
b.Addr2 ,
b.City ,
a.State ,
a.Zip ,
a.County ,
b.Phone1 ,
a.Phone2 ,
b.DOB ,
a.SN ,
a.Gender ,
a.DOD ,
a.MNum ,
b.MedNum ,
a.TotalCharges ,
a.CurrentBalance ,
b.CurrentStatus ,
a.CreateDate ,
a.UpdateDate ,
b.StatusDate ,
a.ClFK ,
a.DeChecksum ,
b.DemDate ,
b.TrgUpdateDate ,
a.MoNum ,
b.UpFK ,
a.Fde ,
a.Tats ,
b.TAdjuts ,
b.MHC ,
a.HFK
FROM dbo.PtActs a
join dbo.PtActs c
on a.PtIPK=c.PtIPK
and a.HlFK=c.HlFK
CROSS JOIN dbo.PtActs b
IF (select COUNT(*) from PtActs with(nolock)) > 10050000 BREAK
END
WHY do you have a CROSS JOIN of the PtActs table? You're not just loading data... you're generating it, and it looks to be seriously out of control. You also have a self-join of the PtActs table that appears (looking at the perfectly equal join criteria) to be doing nothing but using twice the resources of simply selecting from the table once. Worse than that, you're then turning around and inserting all that generated data back into the original table in a loop.
You say you're "loading data", but what is it that you really want to do? What is the actual task? If it's to generate more than 10 million rows of mostly duplicated data, you've succeeded, albeit in a bit of a slow fashion. 😉
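A quick way to check that reasoning (a sketch only, assuming (PtIPK, HlFK) is unique in PtActs, which is what the equality-only join criteria suggest): if the two counts below match, the join to alias c isn't filtering anything and is only doubling the reads.
-- Sketch: compare a plain count with a count through the self-join
SELECT COUNT(*) FROM dbo.PtActs a;

SELECT COUNT(*)
FROM dbo.PtActs a
JOIN dbo.PtActs c
  ON a.PtIPK = c.PtIPK
 AND a.HlFK = c.HlFK;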
--Jeff Moden
Change is inevitable... Change for the better is not.
January 13, 2015 at 8:36 am
Jeff Moden (1/12/2015)
curious_sqldba (1/12/2015)
I am trying to load data in batches of 10K. The SELECT statement at the bottom is taking about 60% of the cost; is there a better way to do this? I should be able to control how many records I want in the table.
DECLARE @BatchSize INT = 10000
WHILE 1 = 1
BEGIN
INSERT INTO dbo.PtActs
SELECT TOP(@BatchSize) NEWID( ) ,
a.HlFK ,
CAST(LEFT(CAST(ABS(CAST(CAST(NEWID() as BINARY(10)) as int)) as varchar(max)) + '00000000',9) as varchar(max)) ,
dateadd(day,rand(checksum(newid()))*(1+datediff(day, a.AdmitDate, a.DOB)),a.AdmitDate) Admitdate,
--a.DOB + cast(right(CHECKSUM(a.PIPK),2)as int) AdmitDate ,
b.DDate ,
a.RDate ,
a.LastName ,
b.FirstName ,
a.MiddleName ,
b.Suffix ,
a.Addr1 ,
b.Addr2 ,
b.City ,
a.State ,
a.Zip ,
a.County ,
b.Phone1 ,
a.Phone2 ,
b.DOB ,
a.SN ,
a.Gender ,
a.DOD ,
a.MNum ,
b.MedNum ,
a.TotalCharges ,
a.CurrentBalance ,
b.CurrentStatus ,
a.CreateDate ,
a.UpdateDate ,
b.StatusDate ,
a.ClFK ,
a.DeChecksum ,
b.DemDate ,
b.TrgUpdateDate ,
a.MoNum ,
b.UpFK ,
a.Fde ,
a.Tats ,
b.TAdjuts ,
b.MHC ,
a.HFK
FROM dbo.PtActs a
join dbo.PtActs c
on a.PtIPK=c.PtIPK
and a.HlFK=c.HlFK
CROSS JOIN dbo.PtActs b
IF (select COUNT(*) from PtActs with(nolock)) > 10050000 BREAK
END
WHY do you have a CROSS JOIN of the PtActs table? You're not just loading data... you're generating it, and it looks to be seriously out of control. You also have a self-join of the PtActs table that appears (looking at the perfectly equal join criteria) to be doing nothing but using twice the resources of simply selecting from the table once. Worse than that, you're then turning around and inserting all that generated data back into the original table in a loop.
You say you're "loading data", but what is it that you really want to do? What is the actual task? If it's to generate more than 10 million rows of mostly duplicated data, you've succeeded, albeit in a bit of a slow fashion. 😉
That is exactly what I am trying to do: load up some dummy data, hopefully in the fastest way possible.
January 13, 2015 at 10:45 am
You don't need to re-count the rows every time.
DECLARE @BatchSize int
DECLARE @RowCount int
DECLARE @TableRowCount int
DECLARE @TableRowLimit int
SET @BatchSize = 10000
SELECT @TableRowCount = COUNT(*)
FROM PtActs WITH (NOLOCK)
SET @TableRowLimit = 10050000
WHILE 1 = 1
BEGIN
INSERT INTO dbo.PtActs
SELECT TOP(@BatchSize) NEWID( ) ,
a.HlFK ,
CAST(LEFT(CAST(ABS(CAST(CAST(NEWID() as BINARY(10)) as int)) as varchar(max)) + '00000000',9) as varchar(max)) ,
dateadd(day,rand(checksum(newid()))*(1+datediff(day, a.AdmitDate, a.DOB)),a.AdmitDate) Admitdate,
--a.DOB + cast(right(CHECKSUM(a.PIPK),2)as int) AdmitDate ,
b.DDate ,
a.RDate ,
a.LastName ,
b.FirstName ,
a.MiddleName ,
b.Suffix ,
a.Addr1 ,
b.Addr2 ,
b.City ,
a.State ,
a.Zip ,
a.County ,
b.Phone1 ,
a.Phone2 ,
b.DOB ,
a.SN ,
a.Gender ,
a.DOD ,
a.MNum ,
b.MedNum ,
a.TotalCharges ,
a.CurrentBalance ,
b.CurrentStatus ,
a.CreateDate ,
a.UpdateDate ,
b.StatusDate ,
a.ClFK ,
a.DeChecksum ,
b.DemDate ,
b.TrgUpdateDate ,
a.MoNum ,
b.UpFK ,
a.Fde ,
a.Tats ,
b.TAdjuts ,
b.MHC ,
a.HFK
FROM dbo.PtActs a
join dbo.PtActs c
on a.PtIPK=c.PtIPK
and a.HlFK=c.HlFK
SET @RowCount = @@ROWCOUNT
SET @TableRowCount = @TableRowCount + @RowCount
IF @TableRowCount >= @TableRowLimit
BREAK
END
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 13, 2015 at 1:47 pm
ScottPletcher (1/13/2015)
You don't need to re-count the rows every time.
DECLARE @BatchSize int
DECLARE @RowCount int
DECLARE @TableRowCount int
DECLARE @TableRowLimit int
SET @BatchSize = 10000
SELECT @TableRowCount = COUNT(*)
FROM PtActs WITH (NOLOCK)
SET @TableRowLimit = 10050000
WHILE 1 = 1
BEGIN
INSERT INTO dbo.PtActs
SELECT TOP(@BatchSize) NEWID( ) ,
a.HlFK ,
CAST(LEFT(CAST(ABS(CAST(CAST(NEWID() as BINARY(10)) as int)) as varchar(max)) + '00000000',9) as varchar(max)) ,
dateadd(day,rand(checksum(newid()))*(1+datediff(day, a.AdmitDate, a.DOB)),a.AdmitDate) Admitdate,
--a.DOB + cast(right(CHECKSUM(a.PIPK),2)as int) AdmitDate ,
b.DDate ,
a.RDate ,
a.LastName ,
b.FirstName ,
a.MiddleName ,
b.Suffix ,
a.Addr1 ,
b.Addr2 ,
b.City ,
a.State ,
a.Zip ,
a.County ,
b.Phone1 ,
a.Phone2 ,
b.DOB ,
a.SN ,
a.Gender ,
a.DOD ,
a.MNum ,
b.MedNum ,
a.TotalCharges ,
a.CurrentBalance ,
b.CurrentStatus ,
a.CreateDate ,
a.UpdateDate ,
b.StatusDate ,
a.ClFK ,
a.DeChecksum ,
b.DemDate ,
b.TrgUpdateDate ,
a.MoNum ,
b.UpFK ,
a.Fde ,
a.Tats ,
b.TAdjuts ,
b.MHC ,
a.HFK
FROM dbo.PtActs a
join dbo.PtActs c
on a.PtIPK=c.PtIPK
and a.HlFK=c.HlFK
SET @RowCount = @@ROWCOUNT
SET @TableRowCount = @TableRowCount + @RowCount
IF @TableRowCount >= @TableRowLimit
BREAK
END
Tried it; it took the same time, and the COUNT(*) was 70% of the cost. Thanks
January 13, 2015 at 1:59 pm
curious_sqldba (1/13/2015)
ScottPletcher (1/13/2015)
You don't need to re-count the rows every time.
DECLARE @BatchSize int
DECLARE @RowCount int
DECLARE @TableRowCount int
DECLARE @TableRowLimit int
SET @BatchSize = 10000
SELECT @TableRowCount = COUNT(*)
FROM PtActs WITH (NOLOCK)
SET @TableRowLimit = 10050000
WHILE 1 = 1
BEGIN
INSERT INTO dbo.PtActs
SELECT TOP(@BatchSize) NEWID( ) ,
a.HlFK ,
CAST(LEFT(CAST(ABS(CAST(CAST(NEWID() as BINARY(10)) as int)) as varchar(max)) + '00000000',9) as varchar(max)) ,
dateadd(day,rand(checksum(newid()))*(1+datediff(day, a.AdmitDate, a.DOB)),a.AdmitDate) Admitdate,
--a.DOB + cast(right(CHECKSUM(a.PIPK),2)as int) AdmitDate ,
b.DDate ,
a.RDate ,
a.LastName ,
b.FirstName ,
a.MiddleName ,
b.Suffix ,
a.Addr1 ,
b.Addr2 ,
b.City ,
a.State ,
a.Zip ,
a.County ,
b.Phone1 ,
a.Phone2 ,
b.DOB ,
a.SN ,
a.Gender ,
a.DOD ,
a.MNum ,
b.MedNum ,
a.TotalCharges ,
a.CurrentBalance ,
b.CurrentStatus ,
a.CreateDate ,
a.UpdateDate ,
b.StatusDate ,
a.ClFK ,
a.DeChecksum ,
b.DemDate ,
b.TrgUpdateDate ,
a.MoNum ,
b.UpFK ,
a.Fde ,
a.Tats ,
b.TAdjuts ,
b.MHC ,
a.HFK
FROM dbo.PtActs a
join dbo.PtActs c
on a.PtIPK=c.PtIPK
and a.HlFK=c.HlFK
SET @RowCount = @@ROWCOUNT
SET @TableRowCount = @TableRowCount + @RowCount
IF @TableRowCount >= @TableRowLimit
BREAK
END
Tried it; it took the same time, and the COUNT(*) was 70% of the cost. Thanks
Odd. I removed the CROSS JOIN and the repeated counts of the table, so the overall time should not be the same barring some exceptional activity, or perhaps slowness in writing the log records. Is the db in the SIMPLE recovery model? If not, did you pre-allocate log space for all the rows to be INSERTed?
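For instance, something like this. A sketch only; the database name and logical log file name below are placeholders, not taken from the original post, and the size is just an example value:
-- Check the recovery model of the target database
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = N'YourDatabase';

-- Pre-grow the log file so the insert loop isn't stalled by repeated autogrowth
ALTER DATABASE YourDatabase
MODIFY FILE (NAME = YourDatabase_log, SIZE = 20GB);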
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
January 13, 2015 at 2:53 pm
ScottPletcher (1/13/2015)
curious_sqldba (1/13/2015)
ScottPletcher (1/13/2015)
You don't need to re-count the rows every time.
DECLARE @BatchSize int
DECLARE @RowCount int
DECLARE @TableRowCount int
DECLARE @TableRowLimit int
SET @BatchSize = 10000
SELECT @TableRowCount = COUNT(*)
FROM PtActs WITH (NOLOCK)
SET @TableRowLimit = 10050000
WHILE 1 = 1
BEGIN
INSERT INTO dbo.PtActs
SELECT TOP(@BatchSize) NEWID( ) ,
a.HlFK ,
CAST(LEFT(CAST(ABS(CAST(CAST(NEWID() as BINARY(10)) as int)) as varchar(max)) + '00000000',9) as varchar(max)) ,
dateadd(day,rand(checksum(newid()))*(1+datediff(day, a.AdmitDate, a.DOB)),a.AdmitDate) Admitdate,
--a.DOB + cast(right(CHECKSUM(a.PIPK),2)as int) AdmitDate ,
b.DDate ,
a.RDate ,
a.LastName ,
b.FirstName ,
a.MiddleName ,
b.Suffix ,
a.Addr1 ,
b.Addr2 ,
b.City ,
a.State ,
a.Zip ,
a.County ,
b.Phone1 ,
a.Phone2 ,
b.DOB ,
a.SN ,
a.Gender ,
a.DOD ,
a.MNum ,
b.MedNum ,
a.TotalCharges ,
a.CurrentBalance ,
b.CurrentStatus ,
a.CreateDate ,
a.UpdateDate ,
b.StatusDate ,
a.ClFK ,
a.DeChecksum ,
b.DemDate ,
b.TrgUpdateDate ,
a.MoNum ,
b.UpFK ,
a.Fde ,
a.Tats ,
b.TAdjuts ,
b.MHC ,
a.HFK
FROM dbo.PtActs a
join dbo.PtActs c
on a.PtIPK=c.PtIPK
and a.HlFK=c.HlFK
SET @RowCount = @@ROWCOUNT
SET @TableRowCount = @TableRowCount + @RowCount
IF @TableRowCount >= @TableRowLimit
BREAK
END
Tried it; it took the same time, and the COUNT(*) was 70% of the cost. Thanks
Odd. I removed the CROSS JOIN and the repeated counts of the table, so the overall time should not be the same barring some exceptional activity, or perhaps slowness in writing the log records. Is the db in the SIMPLE recovery model? If not, did you pre-allocate log space for all the rows to be INSERTed?
I do need that CROSS JOIN because I want to have scrambled data... that's how I tested your query.
January 14, 2015 at 12:40 am
Count the rows added to the table. How many seed rows (rows you start with) do you have?
For better assistance in answering your questions, please read this.
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop / Jeff Moden
January 14, 2015 at 11:56 am
Quick thought: unless you have an under-performing/resource-starved system, 10,000 sounds like a very small number for an insert batch size. Any particular reason for this number? I cannot see it matching anything in terms of rows/pages/allocation units or such.
😎
There is a way of achieving this more efficiently by querying the sys.partitions view instead of running a COUNT; here is a quick example.
USE tempdb;
GO
SET NOCOUNT ON;
IF OBJECT_ID(N'dbo.TBL_BATCH_INSERT') IS NOT NULL DROP TABLE dbo.TBL_BATCH_INSERT;
CREATE TABLE dbo.TBL_BATCH_INSERT
(
BTCIN_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_BATCH_INSERT_BTCIN_ID PRIMARY KEY CLUSTERED
,BTCIN_NUM BIGINT NOT NULL
);
DECLARE @BatchSize INT = 10000;
DECLARE @END_COUNT INT = 10050000;
DECLARE @TSIZE INT = 0;
WHILE @TSIZE < @END_COUNT
BEGIN
;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
, NUMS(N) AS (SELECT TOP (@BatchSize) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8,T T9)
INSERT INTO dbo.TBL_BATCH_INSERT(BTCIN_NUM)
SELECT CHECKSUM(NEWID())
FROM NUMS NM
SELECT
@TSIZE = SP.rows
FROM sys.partitions SP
WHERE SP.object_id = OBJECT_ID(N'dbo.TBL_BATCH_INSERT')
AND SP.index_id = 1;
END
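As a side note, once a tally-style row source like the NUMS CTE above is available, the WHILE loop itself becomes optional. Here is a sketch of a single set-based insert against the same test table, reusing the 10,050,000 target from the thread (assumptions: the table and target are as above, and the 100 million generated rows from eight cross-joined tally levels give enough headroom):
-- Sketch: reach the target row count in one set-based insert instead of a loop
DECLARE @TargetRows INT = 10050000;
DECLARE @Remaining INT;

SELECT @Remaining = @TargetRows - SP.rows
FROM sys.partitions SP
WHERE SP.object_id = OBJECT_ID(N'dbo.TBL_BATCH_INSERT')
AND SP.index_id = 1;

IF @Remaining > 0
BEGIN
WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
, NUMS(N) AS (SELECT TOP (@Remaining) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N FROM T T1,T T2,T T3,T T4,T T5,T T6,T T7,T T8)
INSERT INTO dbo.TBL_BATCH_INSERT(BTCIN_NUM)
SELECT CHECKSUM(NEWID())
FROM NUMS NM;
END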
January 14, 2015 at 1:18 pm
curious_sqldba (1/13/2015)
Tried it; it took the same time, and the COUNT(*) was 70% of the cost. Thanks
cost != time
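One way to compare the variants on actual elapsed time rather than estimated plan cost (a sketch; wrap whichever insert batch is being tested):
-- Report actual CPU/elapsed time and I/O instead of relying on estimated cost percentages
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- run the insert batch being tested here

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;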