November 17, 2008 at 8:55 am
Hello All,
I have been trying to improve the performance of some very convoluted SQL processes. Unfortunately, I am not in a position to upgrade to 2005. (I implemented some CLR and it flew, but I'm not allowed to use that.)
The process uses a lot of regular expressions and functions within where clauses (oh, bother).
I have tuned a lot of the code, but the problem I have is that when I run the process, the run times vary widely.
I am using the same db, with the same code, on the same server. I clear out the cache each time.
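(In case it matters, by clearing the cache I mean something along these lines before each run, i.e. the standard DBCC commands:)
CHECKPOINT               -- flush dirty pages so the buffer drop below is complete
DBCC DROPCLEANBUFFERS    -- start each run with a cold data cache
DBCC FREEPROCCACHE       -- throw away all compiled plans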
Could it be that, since sp_OACreate etc. does its processing in another space, there is some other factor at work here?
Nothing else is running on the server at the same time.
Any ideas?
Thanks
November 19, 2008 at 11:29 am
Well, you gave us no design docs, table or code scripts, data metrics, etc., so we are extremely limited in our ability to help you.
Having said that, I would venture that your use of 'stuff' such as regex in WHERE clauses is preventing the optimizer from accurately estimating how many rows will be affected, so it sometimes gets an optimal plan and sometimes a very suboptimal one (both by accident). There are a few things you can do, but you may well be limited in your ability to make things run faster. BTW, it won't matter what RDBMS you are using in this instance either, so don't think that Oracle or MySQL, etc. will solve your problems here. 🙂
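To illustrate what I mean (a contrived sketch; the table, column, and function names are invented):
-- The optimizer can use statistics on the column to estimate rows:
SELECT * FROM dbo.Orders WHERE Status = 'A'
-- Burying the column inside a function makes the predicate opaque:
-- the optimizer has to guess the row count, and the function runs per row.
SELECT * FROM dbo.Orders WHERE dbo.SomeRegexFunc( Status ) = 1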
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 20, 2008 at 2:37 am
Sorry, I didn't mean to be so vague; I just didn't consider that the actual code would be relevant to the varying speeds (I can be a bit dim at times). I was thinking it was some sort of marshaling problem.
The problem is probably closer to home.
Your explanation makes perfect sense. The functions in the WHERE clauses are numerous, and are based on string-valued truth tables. Here is an example of one:
CREATE FUNCTION dbo.func_And (
    @p1 tinyint,
    @p2 tinyint
) RETURNS tinyint
AS
BEGIN
    DECLARE @retVal tinyint
    DECLARE @truthTable char(42)
    -- Flattened 6x7 truth table: @p1 selects the row, @p2 the column
    SET @truthTable = '123451022222203233330423444052345501234560'
    -- Pick out the single result character and convert it to a number
    SET @retVal = CAST( SUBSTRING( @truthTable, (((@p1 - 1) * 7) + @p2), 1 ) AS tinyint )
    IF @retVal = 0 -- Should never occur...
        SET @retVal = 3 -- ...so return "Error".
    RETURN @retVal
END
and it is used, in combination with others, like so:
WHERE ( dbo.func_And( T0.T4, dbo.func_And( dbo.func_Not( T0.T137 ), T1.T1262 ) ) ) = 1
There are thousands of these queries, and each has a variable number of these functions.
So you are right to say that the optimiser will not be able to estimate the number of rows, or optimise in any way.
I suppose, now that you have pointed me in the right direction, the next step is to capture the plans it is actually using and see if they vary a lot over different runs?
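Something like this, I assume (SQL 2000 syntax), wrapped around the slow statement:
SET STATISTICS PROFILE ON
GO
-- run the problem batch here
GO
SET STATISTICS PROFILE OFF
GO
That should return the actual plan steps and row counts alongside the results, so I can compare them across runs.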
Anyway, many thanks for your reply, and again sorry if I am not explicit enough.
November 20, 2008 at 7:13 am
Yep, I was correct. This will get even worse if you introduce joins into the mix, because the optimizer may 'guess' that 1 row will be returned (quite a common guess, actually) and then do nested loop joins (which are most efficient for small numbers of rows). Now, when the function actually hits many (thousands? millions?) of rows, you can pack a lunch while the server gets crushed looping through all those IOs.
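A contrived sketch of that failure mode (table and column names invented):
SELECT T0.KeyCol
FROM dbo.BigTableA T0
JOIN dbo.BigTableB T1 ON T1.KeyCol = T0.KeyCol
WHERE dbo.func_And( T0.T4, T1.T1262 ) = 1
-- The optimizer cannot see into func_And, so it may estimate that ~1 row
-- qualifies and pick nested loops; if thousands or millions of rows
-- actually qualify, each one pays a loop iteration and its IOs.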
If there weren't so many of those functions, I would suggest that you try out creating some persisted computed columns, perhaps with indexes on them if the data distribution is appropriate. That could be significantly faster for data access (but possibly cause delays on DML operations).
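A rough sketch of the idea (names invented; note this only works for expressions whose columns all live in one table, and on SQL 2000 the column cannot be marked PERSISTED, so you would index it to materialize the values, which in turn requires the function to be deterministic and created WITH SCHEMABINDING):
-- Precompute the function result once per row instead of once per query:
ALTER TABLE dbo.T0 ADD AndFlag AS dbo.func_And( T4, dbo.func_Not( T137 ) )
-- Index it so a filter on AndFlag = 1 can seek instead of scanning:
CREATE INDEX IX_T0_AndFlag ON dbo.T0 ( AndFlag )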
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service