October 29, 2007 at 4:34 am
Peter Gadsby (10/26/2007)
It seems that index dropping/creation is causing an issue
Do you need to drop all the indexes and re-create them? Have you run through a test scenario with at least the clustered indexes in place?
Peter Gadsby (10/26/2007)
Hi, the amount of data increased by a very small amount. The initial part of the ETL does transfer data over the network from Oracle to SQL Server, but this part of the process has not increased in time. The processing that has increased is the stored procedures (called from DTS) that are running on the SQL Server end.
I'm thinking it is probably an I/O bottleneck, as there are only 4 disks in RAID-5 and the memory/CPU utilisation doesn't seem very high when the process is running.
The machine has 2 processors hyperthreaded, and I don't see much parallel processing when doing sp_who2, which I find a little strange.
Inefficient indexing is quite likely the problem here. Best advice is to pull apart each stored proc and ensure it has appropriate, efficient indexes to support the read/write activity that each statement performs.
As a couple of posters have already alluded to, lack of appropriate indexes is a design flaw that I see on a daily basis.
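For example (the table, column and index names here are just made up for illustration), if one of the procs runs an update like this over and over, a narrow supporting index turns a full table scan into a seek:

DECLARE @batch_id int
SET @batch_id = 42

UPDATE dbo.TRANSFORM_TABLE
SET    status = 'PROCESSED'
WHERE  batch_id = @batch_id

-- Without an index on batch_id, every execution scans the whole table.
CREATE NONCLUSTERED INDEX IX_TRANSFORM_batch_id
    ON dbo.TRANSFORM_TABLE (batch_id)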
--------------------
Colt 45 - the original point and click interface
October 29, 2007 at 4:36 am
Also, how big are the databases involved in the ETL process?
Anything much over the server's 2GB of RAM and you'll definitely have an I/O bottleneck. Another design issue that the architect should have planned for in the hardware specifications.
--------------------
Colt 45 - the original point and click interface
October 29, 2007 at 4:38 am
Peter Gadsby (10/29/2007)
Update process:
1. Drop indexes on another TRANSFORM table
2. Create new indexes on the other TRANSFORM table
3. Add indexes on TRANSFORM_TABLE
4. Perform a number of updates from the other TRANSFORM table.
I'm not sure why you're dropping indexes just to recreate them. Or am I reading that wrong?
Have you tried, as a comparison, removing all the drop index/create index from the process and running it?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 29, 2007 at 4:51 am
Hi Gail,
The indexes are dropped because a different set of indexes is then created. This was done so that the process wouldn't fail on index creation if the indexes already existed.
The process could have checked for the existence of the indexes before doing this, which would improve performance, but that still doesn't solve the actual problem, which is that the process has got significantly slower.
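On SQL 2000 that existence check would look something like this (index name invented for the example):

IF NOT EXISTS (SELECT 1
               FROM   sysindexes
               WHERE  id = OBJECT_ID('dbo.TRANSFORM_TABLE')
               AND    name = 'IX_TRANSFORM_batch_id')
    CREATE NONCLUSTERED INDEX IX_TRANSFORM_batch_id
        ON dbo.TRANSFORM_TABLE (batch_id)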
The database in total is about 100GB, by the way.
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
October 29, 2007 at 5:25 am
The database in total is about 100GB by the way
OK, that right there is going to give you a very large bottleneck.
I'm betting your Page Life Expectancy is less than 100, or thereabouts, as SQL is constantly shuffling data out to disk to make room for other data.
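You can check it yourself - on SQL Server 2000 the counter is exposed through master.dbo.sysperfinfo:

-- Page Life Expectancy: seconds a page is expected to stay in the buffer pool
SELECT cntr_value AS page_life_expectancy_secs
FROM   master.dbo.sysperfinfo
WHERE  object_name LIKE '%Buffer Manager%'
AND    counter_name = 'Page life expectancy'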
How much data is updated during each ETL run?
--------------------
Colt 45 - the original point and click interface
October 29, 2007 at 5:31 am
The ETL performs (generally) a full refresh of data, so NO indexes are updated.
The indexes are dropped prior to the insert, and after the insert the indexes are created. A set of updates is then performed.
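In outline the load looks something like this (table, column and index names are simplified for the post):

-- 1. Drop the indexes so the insert only writes the base table.
DROP INDEX TRANSFORM_TABLE.IX_TRANSFORM_batch_id

-- 2. Full refresh from the data staged over from Oracle.
TRUNCATE TABLE dbo.TRANSFORM_TABLE
INSERT INTO dbo.TRANSFORM_TABLE (batch_id, status)
SELECT batch_id, status
FROM   dbo.STAGE_TABLE

-- 3. Recreate the indexes in one pass over the loaded data.
CREATE NONCLUSTERED INDEX IX_TRANSFORM_batch_id
    ON dbo.TRANSFORM_TABLE (batch_id)

-- 4. The set of updates then runs against the indexed table.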
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
October 29, 2007 at 5:39 am
Peter Gadsby (10/29/2007)
The amount of data stored in the other TRANSFORM table is around 0.8 million rows. Any ideas?
What kind of data do you load?
How often do you perform this process?
If I assume it's a daily update, then 0.8 million records gives us about 10 records per second around the clock.
I would split it into smaller chunks. It's usually faster to process 20 chunks of 40k rows than the whole set of 800k rows.
And processing 600 records per minute should not be a problem at all - it should take less than a second to run.
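Something like this - on SQL 2000, SET ROWCOUNT is the way to batch it (table and column names are just examples):

SET ROWCOUNT 40000           -- cap every statement at 40k rows

WHILE 1 = 1
BEGIN
    UPDATE dbo.TRANSFORM_TABLE
    SET    status = 'PROCESSED'
    WHERE  status = 'PENDING'

    IF @@ROWCOUNT = 0 BREAK  -- nothing left to update
END

SET ROWCOUNT 0               -- always reset afterwards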
Make a note - when you're creating indexes, SQL Server needs some disk space for them. How fragmented are the other tables in your database?
Because you're performing a lot of updates, I'm pretty sure they're terribly fragmented.
It makes creating indexes sloooow.
And you know, if someone asked me, I'd say I consider the idea of regularly rebuilding indexes on an 800k-row table very stupid. At least rebuilding the clustered index - for sure. You can make some better use of that metal box with some silicon inside.
_____________
Code for TallyGenerator
October 29, 2007 at 5:44 am
Peter Gadsby (10/29/2007)
The ETL performs (generally) a full refresh of data
As I said - bad design.
Incremental updates are ALWAYS more effective.
"Full refresh" is something what should never exist in database.
_____________
Code for TallyGenerator
October 29, 2007 at 5:46 am
I don't understand why you think the indexes will be fragmented... The updates do not touch any columns that are indexed.
The process carefully creates the indexes in a sequence so that the updates don't touch the indexes.
With regards to it being stupid to remove indexes... When you insert a million rows it makes sense not to have any indexes present, as the insert process would otherwise have to write every row into both the table and the indexes.
Also, I have over 15 years' experience of working with data warehouse systems, which includes working on very large data warehouses (circa 15TB), so I do know a little bit about how to process a large amount of data.....
You still miss the basic point, which is that with no code changes and a minimal amount of data growth the process is taking a lot longer.
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
October 29, 2007 at 6:30 am
I read through this, but forgive me if I missed someone else mentioning either of these.
I have heard nothing about the performance of the ETL server, just the database server. If the DB server is idle but slow, are you sure that it is the problem? Has that code changed? I thought I read that it has not.
Next, I heard mention of some changes to the stored procedures. SET NOCOUNT ON is still at the beginning of the stored procedures, right?
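Just so we're talking about the same thing, I mean the line right at the top of each proc (proc name and body made up here):

CREATE PROCEDURE dbo.usp_TransformStep
AS
SET NOCOUNT ON   -- stops the "n rows affected" message being sent
                 -- back to DTS after every statement

UPDATE dbo.TRANSFORM_TABLE
SET    status = 'PROCESSED'
WHERE  status = 'PENDING'
GO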
Lastly, a 100GB DB with 2GB of RAM and 2 processors. How could you not have it pegged and swapping with full index rebuilds? Sorry, something isn't right here. Something has to have changed in the process, but nothing has been said about what could have changed... Any DB settings? Upgrades, patches, anything?
October 29, 2007 at 7:19 am
Hi Bob,
Thanks for your post... Just to avoid confusion, the process is running within DTS (also on the same box). All DTS is doing is extracting the data from the source system (Oracle) and executing the stored procedures.
I don't understand the question 'How could you not have it pegged and swapping with full index rebuilds' - can you let me know what you mean by that?
As I said, nothing regarding the code has changed. That is why I thought it might be some kind of hardware issue.
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
October 29, 2007 at 7:29 am
With such a significant database size the amount of data movement has to be very large, and recreating the indexes will be very resource intensive. With only 2GB of RAM it seems like that would be your primary bottleneck, along with the hard disk the swap file is on, since the index itself will most likely take a large portion of memory while being processed. That said, I have done this sort of thing before with 1GB of RAM (1 processor) and a 70GB database; it just takes time because the hardware is limited. If you can budget for it, I would personally double, if not quadruple, the memory.
However, a dumb question: did you confirm that auto shrink is off, and are you pre-growing the files to limit the need for the system to stop, calculate, and grow, possibly multiple times?
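i.e. something along these lines (database and logical file names are made up):

-- Make sure the database can't shrink behind your back:
ALTER DATABASE MyETLDatabase SET AUTO_SHRINK OFF

-- Pre-grow the data file before the load so the engine doesn't stop
-- mid-insert to auto-grow (size is just an example):
ALTER DATABASE MyETLDatabase
MODIFY FILE (NAME = MyETLDatabase_Data, SIZE = 120000MB)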
October 29, 2007 at 7:32 am
Oh, BTW, it is common for folks to do data loads by dropping indexes first, and I got the impression this was a full reload. However, if I am wrong, consider the number of changes to be made: unless it is a significant percentage of the database contents, you might not want to drop and recreate. As well, if you have a clustered index, make sure you add it first and then the non-clustered ones, since if you do it the other way around all the non-clustered indexes will be rebuilt again once the clustered index is created.
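For example (names made up), this order builds each index exactly once:

-- Clustered index first: non-clustered indexes carry the clustering
-- key, so building them before the clustered index means they would
-- all be rebuilt when the clustered index is created.
CREATE CLUSTERED INDEX IX_TRANSFORM_id
    ON dbo.TRANSFORM_TABLE (transform_id)

CREATE NONCLUSTERED INDEX IX_TRANSFORM_batch_id
    ON dbo.TRANSFORM_TABLE (batch_id)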
October 29, 2007 at 7:38 am
Hi Antares686,
Thanks for your post.... I have run performance monitoring during the load. CPU does not get maxed out,
although the disk queue lengths are fairly large (50-146). The machine has 6 disks in a RAID-5 configuration.
I haven't been able to see if the memory is being used to the maximum, as the SQL Server instance is allocated most of the memory. Do you know which performance stat I should be using for memory used?
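The closest I have found so far (SQL 2000) is the Memory Manager counters in sysperfinfo, though I'm not sure they're the right ones to watch:

-- Total = what SQL Server is using now, Target = what it wants (both in KB)
SELECT counter_name, cntr_value
FROM   master.dbo.sysperfinfo
WHERE  object_name LIKE '%Memory Manager%'
AND    counter_name LIKE '%Server Memory%'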
Auto shrink is indeed off, and the database(s) are sized correctly and are currently 50% free.
As the disks are local as opposed to SAN attached I have also checked the fragmentation of the disks and they seem fine.
I have read elsewhere that increasing the number of disk spindles and changing the RAID configuration to RAID 0+1 will increase I/O throughput, so I am going to give that a go on another server to see if this improves things.
The other machine also has 8GB of RAM, so I will be changing the SQL Server 2000 edition from Standard to Enterprise and enabling AWE.
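My understanding is that enabling AWE goes something like this (it needs the 'Lock Pages in Memory' right for the service account and a restart to take effect, and the max memory value below is just an example):

EXEC sp_configure 'show advanced options', 1
RECONFIGURE

EXEC sp_configure 'awe enabled', 1
RECONFIGURE

-- With AWE on, cap SQL Server explicitly, e.g. 6GB of the 8GB (value in MB):
EXEC sp_configure 'max server memory', 6144
RECONFIGURE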
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
October 29, 2007 at 7:40 am
Most of the tables don't have clustered indexes, as they are interim tables and are only used for the ETL processing. For the ones that do have clustered indexes, the indexes are dropped and the clustered index is added back first.
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk