July 28, 2009 at 12:08 pm
I have a table with 107 million rows in SQL 2005. This table grows by around 400,000 rows every day. I insert data from a flat file through SSIS (an automatic process that runs every morning).
My SSIS process does:
- Read the flat file (it's about 35 MB)
- Transform the data
- Insert the data into a bulk (staging) table (see the sketch after this list)
- Run a stored procedure that joins the bulk table to other tables and inserts the result into the destination table (the one with 107 million rows)
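For context, a minimal sketch of what the staging-table load step could look like if it were written in T-SQL instead of inside the SSIS data flow; the file path, delimiters, and header row below are assumptions for illustration, not details from the actual package:

-- Hypothetical staging load; the path, delimiters and header row are assumptions
BULK INSERT dbo.BulkTable
FROM 'D:\Feeds\daily_extract.txt'
WITH
(
    FIELDTERMINATOR = ',',   -- adjust to the real flat-file delimiter
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,     -- skip the header row if the file has one
    TABLOCK                  -- table lock so the load can be minimally logged
);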
To improve insert performance on that table, I'm planning to partition it.
My question is:
What is the best partitioning scheme to apply in this case, given that I insert data every day?
Should I also improve something in my SSIS process?
Regards,
July 28, 2009 at 1:04 pm
A couple questions for you:
1. Does your table have a clustered index on it? If so, what column/datatype?
2. Why load the data twice, once into a staging table, once into your destination table? Did you do any benchmarking between this method and performing your lookups/joins within SSIS prior to loading the data?
3. What type of DB is this, OLTP, OLAP?
July 28, 2009 at 1:12 pm
A couple questions for you:
1. Does your table have a clustered index on it? If so, what column/datatype?
2. Why load the data twice, once into a staging table, once into your destination table? Did you do any benchmarking between this method and performing your lookups/joins within SSIS prior to loading the data?
3. What type of DB is this, OLTP, OLAP?
Hi,
1. My destination table has a clustered index on (Id, date)
2. I would love to do it in one step. The stored procedure joins some tables and the staging table to get the columns I need to insert into the destination table; I don't do too much transformation from the flat file. I just don't know how to do it in SSIS.
3. OLAP
July 28, 2009 at 1:28 pm
Is this import the only way data is getting inserted into your DB? How long does it take to import your 400,000 rows?
July 28, 2009 at 1:37 pm
Tables are partitioned for just two possible reasons: 1) to help with purging, and/or 2) to help with querying.
If a partitioning strategy does not match at least one of the reasons mentioned above, the strategy is wrong, and if deployed it would just add overhead and cause additional problems.
When facing the decision of whether to partition a table, ask yourself...
a) Is there a purging or archiving strategy that partitioning could help?
b) Do most queries share a common predicate that would perform better with partitioning?
Once you have those answers you can design and evaluate a partitioning strategy, not before.
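If, for example, the answer to (a) is a date-based purge or the answer to (b) is that most queries filter on the load date, a sketch of the setup could look like the following; the function, scheme, boundary values, and index names here are assumptions for illustration, not a recommendation for this particular table:

-- Hypothetical monthly range partitioning on the load date (all names and boundaries are assumptions)
CREATE PARTITION FUNCTION pfLoadDate (datetime)
AS RANGE RIGHT FOR VALUES ('2009-06-01', '2009-07-01', '2009-08-01');

CREATE PARTITION SCHEME psLoadDate
AS PARTITION pfLoadDate ALL TO ([PRIMARY]);

-- Building the clustered index on the partition scheme partitions the table itself;
-- the partitioning column (Date) is included in the key
CREATE CLUSTERED INDEX IX_TableDestination_ID_Date
ON dbo.TableDestination (ID, Date)
ON psLoadDate (Date);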
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
July 28, 2009 at 1:38 pm
This is the way I want to do it now...
Right now we run the bulk insert from a stored procedure.
It takes around 120 seconds because the table is not partitioned yet (I guess).
July 28, 2009 at 1:44 pm
Thank you, Paul.
I need to query and load data faster. I have read about using partitioned tables to load data into a table that has millions of records.
What do you think?
July 28, 2009 at 1:46 pm
MTY (7/28/2009)
It takes around 120 seconds because the table is not partitioned yet (I guess).
So you have a daily ETL process that runs for 120 seconds?
Imagine you work on it for a week or two and get your process to perform twice as fast; then it will run in 60 seconds.
Your net gain is 60 seconds out of the 86,400 seconds you have on that day... is it worth it?
On the other hand, such a partitioning strategy would help neither purging/archiving nor querying, so I wouldn't consider it.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge, but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions in a non-production test environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
July 28, 2009 at 1:58 pm
What do you think, John?
July 28, 2009 at 2:05 pm
PaulB (7/28/2009)
Tables are partitioned for just two possible reasons: 1) to help with purging, and/or 2) to help with querying. If a partitioning strategy does not match at least one of the reasons mentioned above, the strategy is wrong, and if deployed it would just add overhead and cause additional problems.
When facing the decision of whether to partition a table, ask yourself...
a) Is there a purging or archiving strategy that partitioning could help?
b) Do most queries share a common predicate that would perform better with partitioning?
Once you have those answers you can design and evaluate a partitioning strategy, not before.
I'd have to disagree. Using partitioning to help with purge performance takes advantage of partition switching, which can be just as advantageous when importing data. Switching data into a table, versus inserting it, can perform much better, just as switching data out of a table typically performs better than deleting it.
The challenge you run into when using partition switching with data imports is that there need to be some strict constraints built into your table that ensure there is always an empty partition for the imported data to land in prior to the switch. This means you should either have all imports for the day/hour/minute (or whatever your partition grain is) happen in one process, or you need to make your import process smart enough to diagnose and manage the partitions. This is very doable, but it takes a good bit of work along with a solid understanding of the under-the-hood partitioning functionality.
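As a rough sketch of that switch-in pattern (the table, partition function, and constraint names below are made up, the staging table is reduced to two columns for brevity, and a real version would have to match the target table's columns, indexes, and filegroup exactly):

-- Hypothetical daily-grain switch-in; assumes a partition function pfDailyLoad with daily boundaries
-- 1. Staging table constrained so every row falls inside one (empty) target partition
CREATE TABLE dbo.TableDestination_Stage
(
    ID   int      NOT NULL,
    Date datetime NOT NULL,
    CONSTRAINT CK_Stage_Date CHECK (Date >= '2009-07-28' AND Date < '2009-07-29')
);

-- 2. Bulk load the day's rows into dbo.TableDestination_Stage here (SSIS fast load or BULK INSERT)

-- 3. Switch the loaded rows into the matching partition; this is a metadata-only operation
ALTER TABLE dbo.TableDestination_Stage
SWITCH TO dbo.TableDestination
PARTITION $PARTITION.pfDailyLoad('2009-07-28');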
Your net gain is 60 seconds out of the 86,400 seconds you have on that day... is it worth it?
If this makes the difference in a SLA or delivery time commitment, absolutely!
Is the 120 seconds a problem for your business?
July 28, 2009 at 2:13 pm
Actually it's for my boss. He doesn't want to run a bulk insert with SQL. He prefers to run a dtexec package to use the PC's resources instead of SQL Server's. I see this as the same thing. What do you think?
July 28, 2009 at 2:28 pm
If you are staging the data in a table and then using a stored procedure to 'link' those rows to related data and insert them into your destination table, then you are putting the processing load in the database engine.
The only way to minimize the load on the DB engine is to revamp your SSIS package to perform all of the data processing within the SSIS service and insert the rows once into the DB (which happens via bulk insert if you use the fast-load option anyway).
So back to your admission that you did this in a stored procedure because you did not know how to do it in SSIS. I can give you some SSIS pointers if you can give me a good example of what you are currently doing with your data after loading it into your staging table.
July 28, 2009 at 2:45 pm
Thank you so much, John. This is my stored procedure:
Let me know if that makes sense and how I can do it in my SSIS...
-- Insert IDs from the staging (bulk) table that are not already in Table1
INSERT Table1 (ID)
SELECT b.ID
FROM BulkTable b WITH (NOLOCK)                    -- staging table
LEFT OUTER JOIN Table1 a WITH (NOLOCK) ON a.ID = b.ID
WHERE a.ID IS NULL

-- Same anti-join pattern for the (ID, ID2) rows
INSERT Table1 (ID, ID2)
SELECT a.ID, 0
FROM Table1 a WITH (NOLOCK)
LEFT OUTER JOIN Table1 at WITH (NOLOCK) ON at.ID = a.ID
WHERE at.ID IS NULL

-- Insert the data into the destination table
INSERT TableDestination
(
      ID
    , Date
    , Hub
    , Date2
    , State
    , ST_ID
    , M_ID
    , Machine
    , Date3
    , Date4
)
SELECT a.ID
    , ab.Date
    , ab.Hub
    , Date2
    , CONVERT(int, [State])
    , CONVERT(int, ST_ID)
    , mso.M_ID
    , Machine
    , CONVERT(datetime, ab.Date3)
    , CONVERT(datetime, ab.Date4 + ' ' + ab.Time4)
FROM Table1 a WITH (NOLOCK)
JOIN BulkTable ab WITH (TABLOCK) ON a.ID = ab.ID  -- staging table
JOIN dbo.MSO mso WITH (TABLOCK) ON ab.MSO = mso.Name

-- Clear the staging table for the next load
TRUNCATE TABLE BulkTable
July 28, 2009 at 2:59 pm
What is the deal with table1? Is this a temp table, table variable? What are you trying to accomplish with it?
July 28, 2009 at 3:07 pm
What is the deal with table1? Is this a temp table, table variable? What are you trying to accomplish with it?
Table1 is a physical table. I need to know what data isn't there by comparing it with the new BulkTable, and then insert that information into Table1 (I have to do this first before continuing with the destination table).
Then I do the same process comparing Table1 with another table (it's a one-to-many relationship).
Finally, to insert into the destination table, I take the info from Table1 joined to the BulkTable.
Does that make sense?
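In case it helps, here is a minimal sketch of that "insert only the missing IDs" step written with NOT EXISTS instead of the LEFT JOIN ... IS NULL form used in the procedure above (same hypothetical table names, purely illustrative); both forms should return the same rows:

-- Insert IDs that exist in the staging table but not yet in Table1 (anti-join written with NOT EXISTS)
INSERT dbo.Table1 (ID)
SELECT DISTINCT b.ID
FROM dbo.BulkTable AS b
WHERE NOT EXISTS
(
    SELECT 1
    FROM dbo.Table1 AS a
    WHERE a.ID = b.ID
);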