April 30, 2009 at 10:39 am
Hi,
We have a Comments table with a Date field, and I need to determine the latest comment date per comment type for any give customer. I've written a function that basically does a MAX on the comment date per customer and type, but it runs very slow.
The Comment table is basically this:
IDNUMERIC (18) IDENTITY (1, 1) NOT NULL,
TypeIDNUMERIC (18) NOT NULL,
CustomerID NUMERIC (18) NOT NULL,
CommentText NVARCHAR (MAX) DEFAULT ('') NOT NULL,
CommentDatedatetime
I have a function ufnLastCommentDate that gets Max(CommentDate) based on CustomerID and TypeID.
And my query to get the date per Customer is this:
selectCustomerID,
dbo.ufnLastCommentDate(CustomerID,1) -- Querying Comment TypeID of 1
fromCustomers
We have almost 250,000 comments of various types and almost 50,000 customers. When I run the query above it takes
I've been working with the indexes and I've tried almost every combination I can think of, but nothing really speeds it up. The query above takes almost 10 minutes to run for all 50,000 customers.
Any suggestions? Thanks,
Sam Alex
April 30, 2009 at 10:51 am
Any chance you could post some more code? like what your function does?
Check out the following for what I'm tlaking about...
http://www.sqlservercentral.com/articles/Best+Practices/61537/[/url]
But I'd think just a simple select should get you what you want...
Select CustomerID , TypeID, Max(date)
FROM Comments
WHERE CustomerID = 'avalue' AND TypeID = 'Another Value'
GROUP BY CustomerID , TypeID
Edit: forgot the hyperlink
April 30, 2009 at 10:52 am
have a look at your execution plan and this will give you a clue where the bottlenecks are and help you pick the correct index.
Is there a reason you are using a function to find the max date ? rather than use Group by?
April 30, 2009 at 12:19 pm
Udfs have an unpleasnat tendency to perform rather poorly. One reason being that they run once for each row of the resultset so if you have 50000 customers, that function is running 50000 times.
See - http://sqlinthewild.co.za/index.php/2009/04/29/functions-io-statistics-and-the-execution-plan/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
April 30, 2009 at 12:22 pm
Hi Everyone,
I didn't post the entire process because our eventual goal is to return a list of all Customer information from the Customer table and various other tables which is about 30+ fields, and one field needs to be the last comment date. It's speedy until I add the piece to pull in the latest comment date so I pulled that piece out to post a question on since the rest is running fine. Also due to company policy I can not paste code verbatim from our application into the forum which is why I tried to paste in the pertinent snippets hoping it would be enough for those more experienced then I in this area to make some suggestions.
Here's the function that gets the date:
alter FUNCTION [dbo].[ufnMaxCommentDate]
(
@inCustomerID numeric(18,0),
@inCommentTypeint
)
RETURNS datetime
AS
BEGIN
DECLARE @outDate datetime
select
@outDate = Max(Comment.CommentDate)
From
Comment
Where
CustomerID = @inCustomerID AND
CommentType = @inCommentType
RETURN @outDate
END
And here's the Create Table code to create the comment table:
CREATE TABLE [dbo].[Comment] (
[CommentID] NUMERIC (18) IDENTITY (1, 1) NOT NULL,
[CommentType] NUMERIC (18) NOT NULL,
[CustomerID] NUMERIC (18) NOT NULL,
[CommentText] VARCHAR (MAX) DEFAULT ('') NOT NULL,
[CommentDate] DATETIME DEFAULT (getdate()) NOT NULL,
PRIMARY KEY CLUSTERED ([CommentID] ASC) WITH (ALLOW_PAGE_LOCKS = ON, ALLOW_ROW_LOCKS = ON, PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF, STATISTICS_NORECOMPUTE = OFF)
)
CREATE NONCLUSTERED INDEX [IX_Comment]
ON [dbo].[Comment]
([CommentDate] ASC, [CommentType] ASC, [CommentText] ASC)
ANd here's the final SQL to bring it all together:
SELECTCustomerID,
dbo.ufnMaxCommentDate2(CustomerID, 1)
FROM Customer
The Comment table has 215425 records (all Type 1) and Customer table has 43077 records. The reason I'm not using a Group By is because the eventual query will have 30+ records crossing multiple tables, and it'll be really ugly doing a group by with all these fields. If that's the most efficient way of doing it I'll do it, but I'd assume if indexing was setup properly on the Comment table then getting the Max value on Comment Date shouldn't be a huge impact on speed.
I also did check the Execution Plan, but it just showed the Clustered Index Seek taking 100% of the process. I guess as you guys mentioned it's just way too many records for a function to handle efficiently so a Group By might be the best option.
THanks --
Sam Alex
April 30, 2009 at 12:32 pm
SELECT CustomerID,
MaxCommentDate
FROM Customer INNER JOIN
(select Max(Comment.CommentDate) AS MaxCommentDate, CustomerID
From Comment
Where CommentType = 1
GROUP BY CustomerID) AS Comments ON Customer.CustomerID = Comments.CustomerID
This should perform well regardless of the other tables and because the group by is in the subquery, it's not going to get messy as you add more columns. More importantly, SQL should be able to run that subquery just once, rather than the 40000 times it would have been doing with the UDF.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply