February 23, 2006 at 2:28 pm
Enable the AWE and see if it do any good.
February 23, 2006 at 3:03 pm
Yes, we have seen this also, in another larger system we develop, but in this instance, the client is not keen on changing the parallelism setting as they're worried how it might affect their other databases on the server.
I will ask them again if they can change it, or we specify in our stored procedures to only use the one processor for its execution.
February 23, 2006 at 3:31 pm
I worry sometimes when I see posts which advocate disabling parallelism. Having worked with boxes with up to 16 processors I have never been keen to make changes. In most performance tuning if parallelism is an issue you'll find it on a two way box ( or one proc with HT ) it does not just appear when you get over 4 procs or so. The only step I'd recommend is setting the option to the number of physical procs , assuming you have HT giving virtual procs.
I spend much of my time tuning production systems and in almost all cases "problems" with parallelism are a problem with inadequate indexing or poorly constructed ( or over complex - e.g 16 table joins ) tsql.
Turning down the procs can affect many of your maint tasks, and to be blunt why have a multi proc box if you're going to set everything to use 1 proc ?
The only other advice is that Hyperthreading is properly supported with w2003 so if you're running w2k I suggest an upgrade. Having been the " it runs fine on dev" route many times I've found the following :- inadequate disk subsystem ( especially sans ) forgetting the prod box has multiple databases / logs . another busy database sharing log drives which causes waits .. run a check on waits to find out if you have disk, network, rpc, log problems. Memory differences - few users on a dev box - loads on a prod box.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
February 23, 2006 at 3:51 pm
We've narrowed it down to being a database/server issue. Not network or front end app related.
While this database runs fine on the development server, it also runs fine at all our other client sites, one of which also have a Quad Xeon processor server, although they wouldn't have many other databases on there. And not sure on the number of connections at this stage, but they have around 20 users. Our problem case here, it can't cope with as few as 6 users. But it's definitelly the database/server that is the issue, as we've ran one of the stored procedures that is used often during the day, using Query Analyser and found that it took 9 seconds to execute, (which is totally unacceptable). Getting a number of users to run this at once creates long delays and eventually timeouts.
I will compare the execution plans for this stored proc as a test case between the two servers next.
February 27, 2006 at 1:18 pm
what about the disks? do both servers have the same raid configuration? raid controller? number of physical disks? RPM's?
how about phyiscal fragmentation? how full are the disks on prod vs dev?
February 28, 2006 at 11:55 am
For what it is worth:
We recently finished merging data from another company into ours. We are running SQL 2000 SP3 (IA64), 4 CPUs. Our main db is 125 GB. After reindexing, performance went in the can. Virtually everything was going parallel (CXPACKET waittypes all over).
We selected a key query, did an update statistics full scan on the tables involved and query performance was back to where it was prior to the data merger. Comparisons with a test environment helped us zero in on the index statistics being the culprit here. The lesson that I learned was to make certain that the sampling on updates statistics is high enough to get results that the optimzer can use.
March 21, 2006 at 8:58 pm
Got the execution plan from the production server. Turns out quite different, on what are identical databases on both servers. Took a backup form one and restored to the other.
They key differences are that on the production server, we're seeing Nested Loops for the joins and a couple of table spools and index spools. On the development server we're seeing Hash Matches instead and no Table/Index spools. On the production server, there is also a huge sort happening on one of the linked tables before it's joined with another which is 20% of the cost. I don't see that on the execution plan on the development server.
March 22, 2006 at 1:46 am
cool! So are the versions and sp's identical for sql server, o/s, mdac on both servers? Are you using the same client to run the query against both servers?
Are your server config settings the same on both servers?
I'd suggest updating stats on both databases and running your test again.
Out of interest do you have table variables within your query?
Having done lots of tuning, and in fact my current role is specifically db tuning, if there is a difference across servers for a plan then there is likely some major factor at play. Things I've found thta impact like this are :-
version of sql server, table variables, statistics, user defined functions ( yup I don't know quite why - but I was asked to look at a long running data import which effectively did a select * from a view for a DW load - this ran significantly quicker on the dev box then the prod box. We upgraded the prod disk subsystem to make it better/equal the dev disk subsystem as the two didn't match ( one was also san based ) this improved matters slightly - the issue in the end was an in-line function in the select, when I removed that the import ran in a few seconds instead of several hours - why it didn't quite work that way on the dev box I don't know )
If you're seeing a different plan it's highly unlikely to be hardware.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
March 22, 2006 at 2:51 pm
I have experience the same type of performance issues on production servers having more processors than the development and test servers. I believe the issue is related to parallelism/hyperthreading. Try adding the folowing statement on the end (after the WHERE clause) of your query to indicate the nuber of processors to use when processing the query.
OPTION (MAXDOP 1)
It worked for us, I hope it helps.
March 23, 2006 at 1:53 am
aaarrghhhh!!! If it's paralellism then it will show clearly in the plan(s).
Why buy a smp box and then restrict the procs ?
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
March 23, 2006 at 3:41 pm
Thanks for all the suggestions guys. Every bit helps as it kicks me off investigating various avenues.
The SQL Server configuration is the same between the two, basic standard SQL server install.
I have sent a couple suggestions to the client to try updating the statistics and run the stored proc again and send me the execution plan. Waiting on a response.
I don't believe it's parallelism, as we've tried that option and made no difference. Only thing that made a difference was when we put the FORCE ORDER option in. Other than that, the stored proc is pretty basic, with a couple of joins to a couple of tables and a view.
March 24, 2006 at 1:52 am
a view - well there's a thing - this well may be your problem - try replacing it with a function - or put the actual tables in the proc as a test - or use a derived table in place of the view.
The reason I say this is that I had some bad procs and the problem was a self joined view that was joined in the procedures. The self join did two entire table scans regardless of selection when joined this way. Try using profiler to check the execution plan and stats.
You might want to check that the numbers of rows in the underlying tables within the view are the same in all cases.
I know views are loved but sometimes they are a real pain in the butt !!!
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
March 27, 2006 at 1:52 pm
This my be too simple but does the box have on access or in memory virus scanner present and running? I place my bet there. Check services for other differences as well.
March 27, 2006 at 2:06 pm
Virus scanner is key. That can severely affect performance. Might also want to check and drop all indexes and recreate from scratch.
On the parallelism thing, you can restrict an individual query and the overall system can still run quicker. Work will be parcelled out to other CPUs for other connections, it's just that one connection can't use all CPUs. Be sure that hyperthreading is turned off. Been quite a few reports on SQL2K that it causes issues. If it's off, perhaps try it on and see.
April 9, 2006 at 6:22 pm
Ok, I think we've resolved it. After coming full circle with the server, configuration and statistics related things, we went through the stored procedure that was highlighting the performance difference between the development and production server again.
We were fortunate to be able to replicate the problem on a production server on one of our desktops in the office with MSDE, and the stored procedure that we focused on was returning results in 7 seconds. Upon close inspection, we found a key index missing on a field that was used in a join on the query within the stored procedure. After this index was added, the stored procedure returned in half a second.
So it seems that even though the same database runs well on one server, and slow on another, does not necessarily mean that it's the other servers configuration that is the problem, but it may be a problem with the database that your development server may not have highlighted by means of a different execution plan. That still amazes me, how one server can create a plan that works, despite the missing index, and another server creates a plan that doesn't work very well at all.
Thanks everyone for your input/suggestions.
Viewing 15 posts - 16 through 30 (of 31 total)
You must be logged in to reply to this topic. Login to reply