Is there a better way to write this query?

  • I am trying to load data in batches of 10K. The SELECT statement at the bottom is taking about 60% of the cost; is there a better way to do this? I should be able to control how many records I want in the table.

    DECLARE @BatchSize INT = 10000

    WHILE 1 = 1
    BEGIN
        INSERT INTO dbo.PtActs
        SELECT TOP(@BatchSize)
            NEWID(),
            a.HlFK,
            CAST(LEFT(CAST(ABS(CAST(CAST(NEWID() AS BINARY(10)) AS int)) AS varchar(max)) + '00000000', 9) AS varchar(max)),
            DATEADD(day, RAND(CHECKSUM(NEWID())) * (1 + DATEDIFF(day, a.AdmitDate, a.DOB)), a.AdmitDate) AS AdmitDate,
            --a.DOB + CAST(RIGHT(CHECKSUM(a.PIPK), 2) AS int) AdmitDate,
            b.DDate,
            a.RDate,
            a.LastName,
            b.FirstName,
            a.MiddleName,
            b.Suffix,
            a.Addr1,
            b.Addr2,
            b.City,
            a.State,
            a.Zip,
            a.County,
            b.Phone1,
            a.Phone2,
            b.DOB,
            a.SN,
            a.Gender,
            a.DOD,
            a.MNum,
            b.MedNum,
            a.TotalCharges,
            a.CurrentBalance,
            b.CurrentStatus,
            a.CreateDate,
            a.UpdateDate,
            b.StatusDate,
            a.ClFK,
            a.DeChecksum,
            b.DemDate,
            b.TrgUpdateDate,
            a.MoNum,
            b.UpFK,
            a.Fde,
            a.Tats,
            b.TAdjuts,
            b.MHC,
            a.HFK
        FROM dbo.PtActs a
        JOIN dbo.PtActs c
            ON a.PtIPK = c.PtIPK
            AND a.HlFK = c.HlFK
        CROSS JOIN dbo.PtActs b

        IF (SELECT COUNT(*) FROM PtActs WITH (NOLOCK)) > 10050000 BREAK
    END

  • A TOP without an ORDER BY means you could see data duplication or missing data. This is because you're not guaranteed to get the results back in any particular order, so a TOP operation could see different data from one execution to the next. That's the one thing I'd do.
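
    By way of illustration, a minimal sketch of that point, with names borrowed from the posted query (PtIPK is only assumed to be a unique key of dbo.PtActs, based on the self-join criteria):

    -- Sketch: TOP with an explicit ORDER BY so each batch is repeatable.
    DECLARE @BatchSize INT = 10000;

    SELECT TOP (@BatchSize)
        a.HlFK,
        a.LastName,
        a.AdmitDate
    FROM dbo.PtActs a
    ORDER BY a.PtIPK;   -- without this, TOP may return a different set of rows on each execution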

    Other than that, it's hard to know what's happening. Your basic query doesn't look overly crazy or complex and there's not an obvious tuning point. Can you post the execution plan?

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • curious_sqldba (1/12/2015)


    I am trying to load data in batches of 10K. The SELECT statement at the bottom is taking about 60% of the cost; is there a better way to do this? I should be able to control how many records I want in the table.


    WHY do you have a CROSS JOIN of the PtActs table? You're not just loading data... you're generating it and it looks to be seriously out of control. You also have a self join of the PtActs table that appears (looking at the perfectly equal criteria) to be doing nothing but taking twice the amount of resources as just selecting from the table once. Worse than that, you're then turning around and inserting all that generated data back into the original table in a loop.
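
    To put numbers behind the CROSS JOIN and self-join points above, here is a hedged sketch of how to check them; it assumes PtIPK is unique, which the perfectly equal join criteria suggest but the post does not confirm:

    -- If dbo.PtActs holds R rows and PtIPK is unique, then
    --   dbo.PtActs a JOIN dbo.PtActs c ON a.PtIPK = c.PtIPK AND a.HlFK = c.HlFK  -> still R rows
    --   ... CROSS JOIN dbo.PtActs b                                              -> R * R rows
    -- and R grows with every INSERT, so the source feeding TOP(@BatchSize) grows quadratically.
    SELECT COUNT_BIG(*) AS BaseRows
    FROM dbo.PtActs;

    SELECT COUNT_BIG(*) AS SelfJoinRows   -- expected to equal BaseRows if PtIPK is unique
    FROM dbo.PtActs a
    JOIN dbo.PtActs c
        ON a.PtIPK = c.PtIPK
        AND a.HlFK = c.HlFK;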

    You say you're "loading data" but what is it that you really want to do? What is the actual task? If it's to generate more than 10 million rows of mostly duplicated data, you've succeeded, albeit in a bit of a slow fashion. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (1/12/2015)


    You say you're "loading data" but what is it that you really want to do? What is the actual task? If it's to generate more than 10 million rows of mostly duplicated data, you've succeeded, albeit in a bit of a slow fashion. 😉

    That is exactly what I am trying to do: load up some dummy data, hopefully in the fastest way possible.
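
    For reference, a rough, hedged sketch (not a drop-in replacement) of the kind of single-statement load that usually turns out fastest for this: one set-based INSERT with a TABLOCK hint, which avoids the loop and can qualify for minimal logging depending on the recovery model and the target's indexes. The column list here is abbreviated and the first column name is a placeholder; the real statement would name every column, exactly as in the posted SELECT list.

    DECLARE @RowsNeeded INT = 10050000;

    INSERT INTO dbo.PtActs WITH (TABLOCK)   -- TABLOCK helps the insert qualify for minimal logging
        (PtActId, HlFK, FirstName)          -- PtActId is a placeholder name for the NEWID() column
    SELECT TOP (@RowsNeeded)
        NEWID(),
        a.HlFK,
        b.FirstName
    FROM dbo.PtActs a
    CROSS JOIN dbo.PtActs b;                -- any row source with at least @RowsNeeded rows will do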

  • You don't need to re-count the rows every time.

    DECLARE @BatchSize int
    DECLARE @RowCount int
    DECLARE @TableRowCount int
    DECLARE @TableRowLimit int

    SET @BatchSize = 10000

    SELECT @TableRowCount = COUNT(*)
    FROM PtActs WITH (NOLOCK)

    SET @TableRowLimit = 10050000

    WHILE 1 = 1
    BEGIN
        INSERT INTO dbo.PtActs
        SELECT TOP(@BatchSize)
            NEWID(),
            a.HlFK,
            CAST(LEFT(CAST(ABS(CAST(CAST(NEWID() AS BINARY(10)) AS int)) AS varchar(max)) + '00000000', 9) AS varchar(max)),
            DATEADD(day, RAND(CHECKSUM(NEWID())) * (1 + DATEDIFF(day, a.AdmitDate, a.DOB)), a.AdmitDate) AS AdmitDate,
            --a.DOB + CAST(RIGHT(CHECKSUM(a.PIPK), 2) AS int) AdmitDate,
            b.DDate,
            a.RDate,
            a.LastName,
            b.FirstName,
            a.MiddleName,
            b.Suffix,
            a.Addr1,
            b.Addr2,
            b.City,
            a.State,
            a.Zip,
            a.County,
            b.Phone1,
            a.Phone2,
            b.DOB,
            a.SN,
            a.Gender,
            a.DOD,
            a.MNum,
            b.MedNum,
            a.TotalCharges,
            a.CurrentBalance,
            b.CurrentStatus,
            a.CreateDate,
            a.UpdateDate,
            b.StatusDate,
            a.ClFK,
            a.DeChecksum,
            b.DemDate,
            b.TrgUpdateDate,
            a.MoNum,
            b.UpFK,
            a.Fde,
            a.Tats,
            b.TAdjuts,
            b.MHC,
            a.HFK
        FROM dbo.PtActs a
        -- self-join aliased as b (it was c in the original) so the b.* columns above
        -- still resolve now that the CROSS JOIN has been removed
        JOIN dbo.PtActs b
            ON a.PtIPK = b.PtIPK
            AND a.HlFK = b.HlFK

        SET @RowCount = @@ROWCOUNT
        SET @TableRowCount = @TableRowCount + @RowCount

        IF @TableRowCount >= @TableRowLimit
            BREAK
    END

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (1/13/2015)


    You don't need to re-count the rows every time.


    Tried it; it took the same time, and the COUNT(*) was 70% of the cost. Thanks.

  • curious_sqldba (1/13/2015)


    Tried it; it took the same time, and the COUNT(*) was 70% of the cost. Thanks.

    Odd. I removed the CROSS JOIN and the multiple counts from the table. The overall time should not be the same barring some exceptional activity or perhaps slowness in writing the log records. Is the db in SIMPLE recovery model? If not, did you pre-allocate log space for all rows to be INSERTed?
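
    For anyone following along, a minimal sketch of how to check and prepare for that; 'YourDb' and 'YourDb_log' are placeholders (the real logical log file name comes from sys.database_files), and the 20GB figure is only an example:

    -- Check the recovery model of the current database
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = DB_NAME();

    -- Pre-grow the log so the batched INSERTs don't pay for repeated autogrowth
    ALTER DATABASE YourDb
    MODIFY FILE (NAME = YourDb_log, SIZE = 20GB);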

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (1/13/2015)


    Odd. I removed the CROSS JOIN and the multiple counts from the table. The overall time should not be the same barring some exceptional activity or perhaps slowness in writing the log records. Is the db in SIMPLE recovery model? If not, did you pre-allocate log space for all rows to be INSERTed?

    I do need that cross join because I want the data scrambled... that's how I tested your query.
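
    If scrambling is the only reason for the CROSS JOIN, a hedged sketch of an alternative is to shuffle the table twice and pair rows by row number instead of multiplying the table by itself (column list trimmed; the same idea extends to the full list):

    -- Each CTE shuffles dbo.PtActs independently, so joining on the row number pairs
    -- every row of "a" with a random row of "b", giving scrambled data without the
    -- R * R row explosion of a CROSS JOIN.
    WITH a AS (SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn FROM dbo.PtActs),
         b AS (SELECT *, ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn FROM dbo.PtActs)
    SELECT TOP (10000)
        a.HlFK,
        a.LastName,
        b.FirstName,
        b.City
    FROM a
    JOIN b
        ON a.rn = b.rn;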

  • Count the rows added to the table. How many seed rows (rows you start with) do you have?


    [font="Arial"]Low-hanging fruit picker and defender of the moggies[/font]

    For better assistance in answering your questions, please read this[/url].


    Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

    Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

  • Quick thought: unless you have an under-performing or resource-starved system, 10,000 sounds like a very small number for an insert batch size. Any particular reason for this number? I can't see it matching anything in terms of rows, pages, allocation units or such.

    😎

    There is a way of achieving this more efficiently by querying the sys.partitions view instead of counting the rows; here is a quick example.

    USE tempdb;
    GO
    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.TBL_BATCH_INSERT') IS NOT NULL DROP TABLE dbo.TBL_BATCH_INSERT;

    CREATE TABLE dbo.TBL_BATCH_INSERT
    (
        BTCIN_ID INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_DBO_TBL_BATCH_INSERT_BTCIN_ID PRIMARY KEY CLUSTERED
       ,BTCIN_NUM BIGINT NOT NULL
    );

    DECLARE @BatchSize INT = 10000;
    DECLARE @END_COUNT INT = 10050000;
    DECLARE @TSIZE     INT = 0;

    WHILE @TSIZE < @END_COUNT
    BEGIN
        ;WITH T(N) AS (SELECT N FROM (VALUES (NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL)) AS X(N))
        , NUMS(N) AS (SELECT TOP (@BatchSize) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
                      FROM T T1, T T2, T T3, T T4, T T5, T T6, T T7, T T8, T T9)
        INSERT INTO dbo.TBL_BATCH_INSERT (BTCIN_NUM)
        SELECT CHECKSUM(NEWID())
        FROM NUMS NM;

        SELECT @TSIZE = SP.rows
        FROM sys.partitions SP
        WHERE SP.object_id = OBJECT_ID(N'dbo.TBL_BATCH_INSERT')
          AND SP.index_id = 1;
    END

  • curious_sqldba (1/13/2015)


    Tried it; it took the same time, and the COUNT(*) was 70% of the cost. Thanks.

    cost != time


    Alex Suprun
