September 22, 2022 at 8:42 am
Hi everyone,
I can provide more information to understand my case better if its missing from above and if required I can provide the sql queries I am executing.
My question is basically how can I improve this timing. Index scan is being used when selecting data for inserting because it is supposed to carry all data to permanent tables. I am not using where or any other clause.
Number of rows in each file is upto 100,000,0.
September 22, 2022 at 9:14 am
Are you inserting from the temporary tables to the same 'permanent' table in each case? Or are there different targets?
100,000,0 is not a number – please clarify.
I'd suggest doing the initial inserts into permanent 'staging' tables, rather than temporary tables. TRUNCATE the tables before doing the initial inserts and call dedicated stored procedures to move the data from the staging tables to the permanent table.
As you are inserting all data, indexes will slow things down rather than speed them up.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 22, 2022 at 9:25 am
Are you inserting from the temporary tables to the same 'permanent' table in each case? Or are there different targets?
100,000,0 is not a number – please clarify.
I'd suggest doing the initial inserts into permanent 'staging' tables, rather than temporary tables. TRUNCATE the tables before doing the initial inserts and call dedicated stored procedures to move the data from the staging tables to the permanent table.
As you are inserting all data, indexes will slow things down rather than speed them up.
Are you inserting from the temporary tables to the same 'permanent' table in each case? Or are there different targets?
Yes same table.
Like I have a table named "Individual". I have created the exact same temporary table with same columns named "Individual_TEMP".
First all the new rows goes into TEMP table, there I change the things I want to and then move the data to Individual using
"Insert into individual <column names>
select <column names>
from Individual_TEMP"
And then I use "delete from Individual_TEMP" to clear the TEMP table before loading the new file.
I'd suggest doing the initial inserts into permanent 'staging' tables, rather than temporary tables. TRUNCATE the tables before doing the initial inserts and call dedicated stored procedures to move the data from the staging tables to the permanent table.
These are staging tables, I was just using the term temporary as data is moved to them temporary.
Can you please tell me which dedicated sps you are talking about?
As you are inserting all data, indexes will slow things down rather than speed them up.
If I delete all the indexes from permanent tables and create them after all files data is uploaded, will that help?
September 22, 2022 at 9:44 am
Referring to staging tables as temporary tables is confusing!
For each of the staging tables, create a stored procedure. This stored proc should take care of 'changing the things you want' and inserting to the target table.
The pattern I would use would be as follows:
You should perform testing on disabling indexes before inserting and then re-enabling them. I would expect an improvement in performance by doing this. If you do disable indexes, you may also find that running part (3) in parallel gives you performance gains.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 22, 2022 at 10:15 am
Referring to staging tables as temporary tables is confusing!
For each of the staging tables, create a stored procedure. This stored proc should take care of 'changing the things you want' and inserting to the target table.
The pattern I would use would be as follows:
- TRUNCATE all staging tables involved in the job. (TRUNCATE is significantly faster than DELETE.)
- Import from XML files to staging tables. Importing several files in parallel is likely to give you a performance lift, though testing is required to be sure.
- Call your stored procedures in sequence.
You should perform testing on disabling indexes before inserting and then re-enabling them. I would expect an improvement in performance by doing this. If you do disable indexes, you may also find that running part (3) in parallel gives you performance gains.
Ok I will all the things you mentioned and report back.
One more question though. When we assign primary key to a column it creates clustered index by default on that column but I know that in future I will be using a different column or columns to select or update my data.
So is it normal to delete the primary key clustered index to create your own clustered index on your desired columns?
September 22, 2022 at 11:03 am
One more question though. When we assign primary key to a column it creates clustered index by default on that column but I know that in future I will be using a different column or columns to select or update my data.
So is it normal to delete the primary key clustered index to create your own clustered index on your desired columns?
When you create a PK, you have the option of making it clustered or nonclustered.
Without seeing some DDL, I cannot advise on the suitability of creating a clustered PK on multiple columns.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
September 22, 2022 at 11:06 am
One more question though. When we assign primary key to a column it creates clustered index by default on that column but I know that in future I will be using a different column or columns to select or update my data.
So is it normal to delete the primary key clustered index to create your own clustered index on your desired columns?
no its not normal to delete the PK - what is normal is to define the PK as non clustered in the first place!! as per example below.
create table xx (f1 int, f2 int)
alter table xx add constraint pk_xx primary key (f1) non clustered
create clustered index CI_xx on xx (f2)
September 23, 2022 at 7:55 am
This was removed by the editor as SPAM
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply