Insert Faster way

Question

Insert Faster way

Viewing 8 posts - 16 through 23 (of 23 total)

You must be logged in to reply to this topic. Login to reply

RGP SSCarpal Tunnel Points: 4381 More actions · Answer 1

Jeff Moden - Saturday, January 5, 2019 9:05 AM
RGP - Saturday, January 5, 2019 3:37 AM
Jeff Moden - Friday, January 4, 2019 11:55 PM
p.p.s. BTW, what's the data type of that JOB_INFO column?
Thanks Jeff, & I really appreciate your time on this, I'll test this out soon ...[JOB_INFO] [nvarchar](400) NULL, for your reference DDL of table is:
.[JOB]( [JOB_ID] [bigint] IDENTITY(1,1) NOT NULL, [JOB_DEFINITION_ID] [bigint] NOT NULL, [PROCESS_ID] [uniqueidentifier] NULL, [PARENT_JOB_ID] [bigint] NULL, [PREVIOUS_BATCH_ID] [uniqueidentifier] NULL, [FROM_ID] [nvarchar](400) NOT NULL, [FROM_SOURCE_ID] [bigint] NULL, [FROM_SESSION_ID] [bigint] NULL, [TO_ID] [nvarchar](400) NULL, [TO_SOURCE_ID] [bigint] NULL, [TO_SESSION_ID] [bigint] NULL, [JOB_STATUS] [bigint] NOT NULL, [JOB_INFO] [nvarchar](400) NULL, [LOCKED_BY] [uniqueidentifier] NULL, [UDATE] [datetime] NOT NULL, [CDATE] [datetime] NOT NULL ) ON [PRIMARY]
GO
ALTER TABLE [dbo].[JOB] ADD DEFAULT (getdate()) FOR [UDATE] GO
ALTER TABLE [dbo].[JOB] ADD DEFAULT (getdate()) FOR [CDATE] GO
"]
Ok... thanks. Even more concerning than the Job_Info column is the fact that the FROM_ID and TO_ID columns are defined as NVARCHAR(400) and not just for copy performance. It's unusual for "ID" columns to have such a wide variable width and if they really are used as "IDs" in joins and the like, I can see there being some real performance issues in many other areas.
Shifting gears back to the original problem, obviously the data will take longer to transfer than my 24 second example simply because the byte size of most columns in the real table is at least double that which I tested with. Because of the NVARCHAR(400) columns, I wouldn't be surprised if the copy took up to 4 minutes or so. Of course, that's a total swag on my part but just want to set expectations.

For the recommended query, I got a message like:
Msg 544, Level 16, State 1, Line 15
Cannot insert explicit value for identity column in table 'JOB' when IDENTITY_INSERT is set to OFF.

Do you think I need to enable the Indentity_Insert ON option ?

Query :
--=====================================================================================================================
--  We're ready for the faster way to do inserts from the source table.
--  This to 00:00:24 (hh:mi:ss) on my laptop. Yeah... you're reading that correctly... 24 seconds.
--=====================================================================================================================
--===== Ensure that we're in the Bulk-Logged Recovery Model.
-- You'll need to change the name of the database here.
ALTER DATABASE [Test] SET RECOVERY BULK_LOGGED WITH NO_WAIT
;
--===== You said the target table would be empty. Let's make sure that's true although it's not necessary. You can
-- still get "Minimal Logging" even with rows in the table and a Clustered Index (only) since 2008.
   TRUNCATE TABLE Test.dbo.JOB
;
--===== Now, it's simple and fast and "Minimally Logged" when we transfer data to the target table.
INSERT INTO Test.dbo.JOB WITH (TABLOCK) --Required for "Minimal Logging"
   (
    JOB_ID, JOB_DEFINITION_ID, PROCESS_ID, PARENT_JOB_ID, PREVIOUS_BATCH_ID, FROM_ID, FROM_SOURCE_ID
   , FROM_SESSION_ID, TO_ID, TO_SOURCE_ID, TO_SESSION_ID, JOB_STATUS, JOB_INFO, LOCKED_BY, UDATE, CDATE
   )
SELECT JOB_ID
   ,JOB_DEFINITION_ID
   ,PROCESS_ID
   ,PARENT_JOB_ID
   ,PREVIOUS_BATCH_ID
   ,FROM_ID
   ,FROM_SOURCE_ID
   ,FROM_SESSION_ID
   ,TO_ID
   ,TO_SOURCE_ID
   ,TO_SESSION_ID
   ,JOB_STATUS
   ,JOB_INFO
   ,LOCKED_BY
   ,UDATE
   ,CDATE
FROM Admin.dbo.Job
ORDER BY JOB_ID --Because the Clustered Indexes are identical, this isn't necessary. It's just a safeguard for "Minimal Logging".
OPTION (RECOMPILE) --Helps guarantee "Minimal Logging"
;
--===== If your database started out in the FULL Recovery Model, change it back to that.
-- You'll need to change the name of the database here.
ALTER DATABASE [Test] SET RECOVERY FULL WITH NO_WAIT
;

RGP SSCarpal Tunnel Points: 4381 More actions · Answer 2

RGP - Monday, January 7, 2019 8:26 AM
Jeff Moden - Saturday, January 5, 2019 9:05 AM
RGP - Saturday, January 5, 2019 3:37 AM
Jeff Moden - Friday, January 4, 2019 11:55 PM
p.p.s. BTW, what's the data type of that JOB_INFO  column?
Thanks Jeff, & I really appreciate your time on this, I'll test this out soon ...[JOB_INFO] [nvarchar](400) NULL, for your reference DDL of table is:
.[JOB](     [JOB_ID] [bigint] IDENTITY(1,1) NOT NULL,     [JOB_DEFINITION_ID] [bigint] NOT NULL,     [PROCESS_ID] [uniqueidentifier] NULL,     [PARENT_JOB_ID] [bigint] NULL,     [PREVIOUS_BATCH_ID] [uniqueidentifier] NULL,     [FROM_ID] [nvarchar](400) NOT NULL,     [FROM_SOURCE_ID] [bigint] NULL,     [FROM_SESSION_ID] [bigint] NULL,     [TO_ID] [nvarchar](400) NULL,     [TO_SOURCE_ID] [bigint] NULL,     [TO_SESSION_ID] [bigint] NULL,     [JOB_STATUS] [bigint] NOT NULL,     [JOB_INFO] [nvarchar](400) NULL,     [LOCKED_BY] [uniqueidentifier] NULL,     [UDATE] [datetime] NOT NULL,     [CDATE] [datetime] NOT NULL ) ON [PRIMARY]
GO
ALTER TABLE [dbo].[JOB] ADD DEFAULT (getdate()) FOR [UDATE] GO
ALTER TABLE [dbo].[JOB] ADD DEFAULT (getdate()) FOR [CDATE] GO
"]
Ok... thanks. Even more concerning than the Job_Info column is the fact that the FROM_ID and TO_ID columns are defined as NVARCHAR(400) and not just for copy performance. It's unusual for "ID" columns to have such a wide variable width and if they really are used as "IDs" in joins and the like, I can see there being some real performance issues in many other areas.
Shifting gears back to the original problem, obviously the data will take longer to transfer than my 24 second example simply because the byte size of most columns in the real table is at least double that which I tested with. Because of the NVARCHAR(400) columns, I wouldn't be surprised if the copy took up to 4 minutes or so. Of course, that's a total swag on my part but just want to set expectations.
For the recommended query, I got a message like:
Msg 544, Level 16, State 1, Line 15
Cannot insert explicit value for identity column in table 'JOB' when IDENTITY_INSERT is set to OFF.
Do you think I need to enable the Indentity_Insert ON option ?
Query :
--=====================================================================================================================
--  We're ready for the faster way to do inserts from the source table.
--  This to 00:00:24 (hh:mi:ss) on my laptop. Yeah... you're reading that correctly... 24 seconds.
--=====================================================================================================================
--===== Ensure that we're in the Bulk-Logged Recovery Model.
-- You'll need to change the name of the database here.
ALTER DATABASE [Test] SET RECOVERY BULK_LOGGED WITH NO_WAIT
;
--===== You said the target table would be empty. Let's make sure that's true although it's not necessary. You can
-- still get "Minimal Logging" even with rows in the table and a Clustered Index (only) since 2008.
   TRUNCATE TABLE Test.dbo.JOB
;
--===== Now, it's simple and fast and "Minimally Logged" when we transfer data to the target table.
INSERT INTO Test.dbo.JOB WITH (TABLOCK) --Required for "Minimal Logging"
   (
    JOB_ID, JOB_DEFINITION_ID, PROCESS_ID, PARENT_JOB_ID, PREVIOUS_BATCH_ID, FROM_ID, FROM_SOURCE_ID
   , FROM_SESSION_ID, TO_ID, TO_SOURCE_ID, TO_SESSION_ID, JOB_STATUS, JOB_INFO, LOCKED_BY, UDATE, CDATE
   )
SELECT JOB_ID
   ,JOB_DEFINITION_ID
   ,PROCESS_ID
   ,PARENT_JOB_ID
   ,PREVIOUS_BATCH_ID
   ,FROM_ID
   ,FROM_SOURCE_ID
   ,FROM_SESSION_ID
   ,TO_ID
   ,TO_SOURCE_ID
   ,TO_SESSION_ID
   ,JOB_STATUS
   ,JOB_INFO
   ,LOCKED_BY
   ,UDATE
   ,CDATE
FROM Admin.dbo.Job
ORDER BY JOB_ID --Because the Clustered Indexes are identical, this isn't necessary. It's just a safeguard for "Minimal Logging".
OPTION (RECOMPILE) --Helps guarantee "Minimal Logging"
;
--===== If your database started out in the FULL Recovery Model, change it back to that.
-- You'll need to change the name of the database here.
ALTER DATABASE [Test] SET RECOVERY FULL WITH NO_WAIT
;

Perfectly completed the insert in 5 mins!! ... I enabled set identity_insert ON...
Thank you Jeff for your continuous support in helping resolve this and making my Insert faster 🙂

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 3

RGP - Monday, January 7, 2019 8:58 AM
Perfectly completed the insert in 5 mins!! ... I enabled set identity_insert ON...
Thank you Jeff for your continuous support in helping resolve this and making my Insert faster 🙂

How long was it taking before you rewrote it?

RGP SSCarpal Tunnel Points: 4381 More actions · Answer 4

Jonathan AC Roberts - Monday, January 7, 2019 9:47 AM
RGP - Monday, January 7, 2019 8:58 AM
Perfectly completed the insert in 5 mins!! ... I enabled set identity_insert ON...
Thank you Jeff for your continuous support in helping resolve this and making my Insert faster 🙂
How long was it taking before you rewrote it?

It was taking around 15 mins

Jonathan AC Roberts SSCoach Points: 18408 More actions · Answer 5

RGP - Monday, January 7, 2019 10:21 AM
Jonathan AC Roberts - Monday, January 7, 2019 9:47 AM
RGP - Monday, January 7, 2019 8:58 AM
Perfectly completed the insert in 5 mins!! ... I enabled set identity_insert ON...
Thank you Jeff for your continuous support in helping resolve this and making my Insert faster 🙂
How long was it taking before you rewrote it?
It was taking around 15 mins

That's great, a 3 times faster improvement.

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 6

RGP - Monday, January 7, 2019 10:21 AM
Jonathan AC Roberts - Monday, January 7, 2019 9:47 AM
RGP - Monday, January 7, 2019 8:58 AM
Perfectly completed the insert in 5 mins!! ... I enabled set identity_insert ON...
Thank you Jeff for your continuous support in helping resolve this and making my Insert faster 🙂
How long was it taking before you rewrote it?
It was taking around 15 mins

If there were no IDENTITY column, it would be faster still because there would be no whole table sort in TempDB.

You could still have the data from the identity column but, since it's a copy of a table, you don't need the identity property on the target table.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

p.s. You can also keep the IDENTITY column without a sort by loading up an identical staging table that has no IDENTITY column and then SWITCH it into the table that does have the IDENTITY column, provide that the target table is empty when you do the SWITCH or the target "table" is a partition of a larger table. That will avoid the penalty of the sort in TempDB.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

RGP SSCarpal Tunnel Points: 4381 More actions · Answer 8

RGP

SSCarpal Tunnel

Points: 4381

January 7, 2019 at 12:18 pm

#2018211

Thanks, that’s a good suggestion to try out!