January 19, 2024 at 9:14 am
Assume I have a table with a clustered datetime-index.
Assume I'm going to insert lots of data. The data to be inserted is not in any specific order.
Would the INSERT go faster if the data to be inserted is sorted by the datetime column?
I am just guessing here but inserting random data here would increase the chances of alot more pagesplits and such instead of the data already would be sorted?
January 19, 2024 at 2:50 pm
If I remember right, the BEST way to do the inserts is by inserting at the END of the index. So if your data is ordered by the clustered index, that'll give you the best performance. Another performance trick I've found is that if you disable the non clustered indexes and then rebuild them after the insert, you can get a performance boost. NOT always, but sometimes.
BUT My opinion - test it. If you don't know for sure, try it out.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 19, 2024 at 3:18 pm
Assume I have a table with a clustered datetime-index. Assume I'm going to insert lots of data. The data to be inserted is not in any specific order. Would the INSERT go faster if the data to be inserted is sorted by the datetime column?
I am just guessing here but inserting random data here would increase the chances of alot more pagesplits and such instead of the data already would be sorted?
The correct answer is "It Depends". If you do index maintenance correctly (NOT ever using REORGANIZE or the 5/30 supposed "Best Practice"), the use of random values can prevent page-splits in a lot of ways that most people can't even dream of because of all the misinformation that's out there for more than 2 decades. It also depends on whether or not you're inserting batches of data or single rows, etc, etc.
If it's a new or freshly truncated table that you'll be inserting into and it has only a clustered index, then you can actually get some serious performance and super low log file usage with no page-splits by following the requirements for "Minimal Logging", but it only works on the first insert to the new/freshly truncated table regardless of batch size.
Also, will there be any "updates" after the inserts? An "Ever Increasing" keyed index can be a real sore-spot for page splits there where a "random keyed" index will nicely absorb those.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 19, 2024 at 3:47 pm
You could/should test it, here is an example of a test:
set statistics io, time off
set nocount on
go
drop table if exists #a;
create table #a(Id int, Date date, col1 varchar(50),col2 varchar(50));
go
drop table if exists #b;
create table #b(Id int, Date date, col1 varchar(50),col2 varchar(50));
create clustered index ix_#b_1 on #b(date, col1);
go
drop table if exists #c;
create table #c(Id int, Date date, col1 varchar(50),col2 varchar(50));
create clustered index ix_#c_1 on #c(date, col1);
go
insert into #a
SELECT
Id = value,
RandomDate = DATEADD(day, (ABS(CHECKSUM(NEWID())) % 65530), 0),
RandomString = CAST(NEWID() AS NVARCHAR(255)),
RandomString2 = CAST(NEWID() AS NVARCHAR(255))
FROM generate_series(1, 10000000, 1);
go
set statistics io, time on;
set nocount off;
go
print '****************************** Insert in any order'
insert into #b
SELECT *
FROM #a;
go
print '****************************** Insert in order'
insert into #c
SELECT *
FROM #b
order by date, col1;
set statistics io, time off
Results:
****************************** Insert in any order
SQL Server Execution Times:
CPU time = 35313 ms, elapsed time = 18088 ms.
****************************** Insert in order
SQL Server Execution Times:
CPU time = 6312 ms, elapsed time = 6311 ms.
So in that example on my machine it is about 3 times faster to insert in the order of the clustered index
January 20, 2024 at 3:17 pm
Be careful when testing. In this case, the OP hasn't defined what he means by "Assume I'm going to insert lots of data". Does that mean for just one, multi-row insert or a whole lot of single row inserts or what?
We need for the OP to identify that and also answer the questions in my previous post above.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 20, 2024 at 3:47 pm
Most likely scenario is probably a script with lots of INSERTs.
January 20, 2024 at 4:26 pm
Most likely scenario is probably a script with lots of INSERTs.
Well you can test that too. I edited the original script I provided to do multiple inserts:
set statistics io, time off
set nocount on
go
drop table if exists #a;
create table #a(Id int, Date date, col1 varchar(50),col2 varchar(50));
go
drop table if exists #b;
create table #b(Id int, Date date, col1 varchar(50),col2 varchar(50));
create clustered index ix_#b_1 on #b(date, col1);
go
drop table if exists #c;
create table #c(Id int, Date date, col1 varchar(50),col2 varchar(50));
create clustered index ix_#c_1 on #c(date, col1);
go
insert into #a
SELECT Id = value,
RandomDate = DATEADD(day, (ABS(CHECKSUM(NEWID())) % 65530), 0),
RandomString = CAST(NEWID() AS NVARCHAR(255)),
RandomString2 = CAST(NEWID() AS NVARCHAR(255))
FROM generate_series(1, 100000, 1);
go
declare @id int, @date as date, @r1 as nvarchar(255), @r2 as nvarchar(255);
declare @startdate datetime2(7) = GETDATE();
declare cursor_a cursor for select * from #a
open cursor_a
fetch next from cursor_a into @id, @date, @r1, @r2;
while @@FETCH_STATUS = 0 begin
insert into #b values (@id,@date,@r1, @r2);
fetch next from cursor_a into @id, @date, @r1, @r2;
end
close cursor_a;
deallocate cursor_a;
print concat('Insert in any order ', datediff(ms,@startdate, getdate()), ' ms');
go
declare @id int, @date as date, @r1 as nvarchar(255), @r2 as nvarchar(255);
declare @startdate datetime2(7) = GETDATE();
declare cursor_b cursor for select * from #b order by date, col1;
open cursor_b
fetch next from cursor_b into @id, @date, @r1, @r2;
while @@FETCH_STATUS = 0 begin
insert into #c values (@id,@date,@r1, @r2);
fetch next from cursor_b into @id, @date, @r1, @r2;
end
close cursor_b;
deallocate cursor_b;
print concat('Insert in order ', datediff(ms,@startdate, getdate()), ' ms');
Results:
Insert in any order 5510 ms
Insert in order 5276 ms
So in this case it's not that significant which method you choose as most of the processing is running the lines of code.
January 20, 2024 at 5:37 pm
if it is a script you are generating from another system I advise you to do blocks of inserts each inserting a good chunk of rows using VALUES as per example below
insert into TableName with (tablock) -- may help
(Column1
, Column2
)
select Id
, SomeName
from (values
(1, 'A')
, (2, 'B')
, (3, 'C')
) x (Id, SomeName)
January 20, 2024 at 5:42 pm
Well you can test that too.
My point is... the OP hasn't identified the scenario accurately enough to even try to determine which kind of testing needs to be done.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 20, 2024 at 6:50 pm
January 22, 2024 at 4:57 am
Jeff Moden wrote:Jonathan AC Roberts wrote:Well you can test that too.
My point is... the OP hasn't identified the scenario accurately enough to even try to determine which kind of testing needs to be done.
The OP identified:
oRBIT wrote:Most likely scenario is probably a script with lots of INSERTs.
Lordy. My apologies. I'd missed that reply. I'll go back and look again. Thanks, Jonathan.
Still... does that actually mean just a row or two with a lot of INSERTs or is he calling rows "Inserts" and there'd only be one?
Guess I'll go with a lot of executions of an INSERT statement.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply