October 26, 2007 at 4:12 am
Hi all,
I have a SQL Server 2000 standard edition instance running on Windows 2000 SP4 with 2Gb of memory. The server has local disks (RAID-5) 6 disks.
The SQL Server 2000 version is :- 8.00.2187 SP4 Standard Edition
The machine is running a small data warehouse solution. Over the last few months the ETL process has gradually got slower and slower.
We run weekly optimisation and integrity checks and also re-indexing and statistics updates.
Over the last few weeks the load has run significantly slower. I have monitored the memory and CPU usage each very low. I have also monitored the I/O using the Disk Queue Length performance monitor I am seeing this value between 1-5 which I think is fairly low.
The machine is a Dell 2850 I have run the Dell diagnostic checks on the RAID controller and the disk subsystem.
Does anyone have any thoughts on what I should look at next.
Thanks
Pete
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
October 26, 2007 at 7:18 am
I would run profiler over the time that the ETL process runs. Capture the SQLBatchCompleted event (under T-SQL) and the RPC completed event (under stored procedures)
That will give you a breakdown of what statements are run against the database and how long they take. that should give you a good indication of which commands are the problem.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 26, 2007 at 7:27 am
hi thanks for that GilaMonster..
I have run SQL Profiler and identified the areas that have been running slowly. It seems that indexing dropping/creation is causing an issue. But overall it seems each query is running significantly slower than before.
To give you an idea of the timings the ETL used to take around 4-5 hours to complete, it is now taking 13 hours!!!! Even though no code has changed
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
October 26, 2007 at 8:04 am
Has the amount of data been transfered increased? Has the size of the tables increased?
Is this ETC across networks? If so, is the network handling the load?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 26, 2007 at 9:55 am
Hi,
The amount of data increased a very small amount, the inital part of the ETL does transfer data over the network from Oracle to SQL Server, but this part of the process has not increased in time. The processing that has increased are the stored procedures (called from DTS) that are running on the SQL Server end.
I'm thinking it is probably an I/O bottleneck as there are only 4 disks at RAID-5 as the memory/CPU utilisation doesn't seem very high when the process is running.
The machine has 2 processors hyperthreaded and I don't see much parellel processing when doing sp_who2 which I find a little strange.
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
October 26, 2007 at 10:07 am
Just a thought here but how does the data insert within the confines of the tables layout. Consider your indexes and your primary key and how the data would go into each item, make sure you have proper fill factor and padding within the index to support changes if they would occur throughtout the structure.
Also, a common reason for delays like this end up being page splits/allocation and database growth. What is your database growth rate for database and log each.
It could be due to the RAID configuration which 5 has a lot of read/writes for dealing with Parity but without changing the hardware consider what you might have at the root first.
October 26, 2007 at 10:09 am
There's a fairly well-known issue around SQL and hyperthreading. I know it sounds strage, but see if you can get the hyperthreading disabled and see if the ETC improves at all. (you can search for Slava Oks and hyperthreading to find the article)
All I can suggest is to take the stored procs that seem to be taking the longest, and, if you can, run them in QA and have a look at the execution plans. See if there's anything obviously wrong. (table scans, high cost aggregates/sorts). Post the queries here along with the text showplan if you're unsure
Check the transfers/sec to get a measure of IO performance, the page splits/sec, cache hit ratio and maybe locks and lock waits/sec.
Have you checked for blocking? Is any data file/log file perhaps autogrowing during the ETL?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 26, 2007 at 10:12 am
There definitily isn't any locking, because during the night this is the only process that is running and each stored procedure is run sequentially.
With regards to the performance monitoring I have run the Disk queue length one which although the peek was about 45 didn't seem that bad. The buffer hit cache was between 98% --> 99% roughly so that is ok.
I haven't checked transfers/sec and page splits/sec what would you expect the transfer/sec rate to be? and what should the page splits/sec be?
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
October 26, 2007 at 10:20 am
What transfers/sec is good depends on the hardware you have. Diffrent disks and disk configs have different max capacities. Maybe someone else will be able give some figures.
Page splits/sec should be very low. The lower the better.
I know you said you have index rebuild and stats update jobs, but just check the fragmentation and stats dates of some of the tables involved. Make sure they are what you would expect.
Sorry, is not much help I know. Is hard without been able to poke around myself.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 28, 2007 at 4:51 pm
Issues like "process has gradually got slower and slower" have nothing to do with disks, memory, hyperthreading or anything like that.
It's bad design issue.
It was preprogrammed by you System Architect.
Even if you don't have one - somebody made those architectural decisions which are bringing you this trouble.
You may spend big bucks on flash new server, upgrade network, apply some fine tuning - it will buy you another couple of month, may be half a year.
But your system is doomed. Doomed by its design.
What you really need to do is to hire a professional System Architect and redesign your system before it's stopped working completely.
And this time do smart thing: populate your test (development) database with at least 5 years volume of data and test your application in real world environment.
_____________
Code for TallyGenerator
October 29, 2007 at 12:23 am
Sorry, but that is tosh...... If a CREATE INDEX takes 10 minutes previously and then takes over an hour now, that is NOT bad design that is an issue with one or more of either memory , CPU or I/O subsystems...... Thanks for you comment but it doesn't really help me!
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
October 29, 2007 at 3:52 am
Peter Gadsby (10/29/2007)
Sorry, but that is tosh...... If a CREATE INDEX takes 10 minutes previously and then takes over an hour now, that is NOT bad design that is an issue with one or more of either memory , CPU or I/O subsystems...... Thanks for you comment but it doesn't really help me!
Probably amount of data is changed.
Don't you think?
You system is not designed to handle big amounts of data.
Sorry, but that's the fact.
If you have to recreate indexes on the whole table time after time - it's a mistake in design.
Just first one you exposed.
_____________
Code for TallyGenerator
October 29, 2007 at 3:58 am
Peter, can you describe the ETL process, step by step, along with the durations and aprox data rows affected for each step. Without some idea what's happening, we're shooting in the dark, blindfolded.
It may very well be data volumns increaasing beyond expected. You may be running into hardware botlenecks. It may be bad design. Is very hard to tell at this point.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 29, 2007 at 3:59 am
I think the design of the DB itself is a big FLAW. Each & every time if you go back & try to create indexes that may cause the reflections on the performances of the DB. Its always better to take a script of the schema objects once the design is finailised. Then to go for normalizing the DB will do the vital factor. So, better have the existing schema as a back up script file, discuss with your ARCHITECT on the areas where the loop holes are identified by you, try to arrest them (taking some inputs from MICROSOFT BEST PRACTICES) and finish off the rest of the stuff.
THIS WILL ARREST THE PROBLEM.
October 29, 2007 at 4:27 am
Hi,
In simple terms the process does the following:-
1. Extract data from source system into RAW schema.
2. Load data into a transform table(s)
3. Load anomalies into an anomaly table (moving errors and leaving warnings)
4. Load data into FACT / Dimension tables.
It is a bit complex to describe the whole of the ETL. But one stored procedure that performs the above does the following:-
Insert process
1. Truncate TRANSFORM table
2. Drop indexes TRANSFORM Table
3. Insert into TRANSFORM table from RAW table
Update process
1. Drop indexes on another TRANSFORM table
2. Create new indexes on other TRANSFORM Table
3. Add indexes on TRANSFORM_TABLE
4. Perform a number of updates from other TRANSFORM table.
The insert process works fine, the update process just dropping and creating the indexes on the other TRANSFORM table takes around an hour to complete, and previously would take far less time.
The amount of data stored in the other TRANSFORM table is around 0.8Million rows.
Any ideas?
Peter Gadsby
Business Intelligence Consultant
www.hgconsult.co.uk
Viewing 15 posts - 1 through 15 (of 74 total)
You must be logged in to reply to this topic. Login to reply