November 10, 2005 at 7:59 am
My company gets http logs imported to SQL for processing. Table definition (partial) is
log_user varchar(10)
ip varchar(50)
log_date datetime
In about 60% of the rows, log_user column is null. I then get the log_user where the ip and log_date match. Here is the update query
UPDATE http_log_process
SET log_user = b.log_user
FROM http_log_process INNER JOIN http_log_process b ON
CONVERT(varchar(12), http_log_process.log_date, 101) = CONVERT(varchar(12), b.log_date, 101)
AND http_log_process.ip = b.ip
WHERE http_log_process.log_user IS NULL and b.log_user IS NOT NULL
AND http_log_process.log_date IS NOT NULL
This query takes about 2 minutes for the update. The table has 20K rows and 10K rows are updated. Can this query be fine-tuned ?
November 10, 2005 at 8:23 am
Well, I doubt it.
I'm guessing that you're getting scans all the way, because you have functions on the dates and also negations in the wehere clause, all things that invalidates indexusage. But... since the volume updated is 50% of the data, I sincerely doubt that any indexes would be used anyway.
My guess is that this update is happening as fast as the server can muster, which makes the bottleneck the disksystem, and not the query.
Though it would be interesting to see what the current plan looks like.. Are you getting tablescans, or are any indexes involved? Also, are there any triggers on the table?
/Kenneth
November 10, 2005 at 8:27 am
First, make sure you have indexes in ip and log_date
then instead of two converts:
CONVERT(varchar(12), http_log_process.log_date, 101) = CONVERT(varchar(12), b.log_date, 101)
you can use
datediff(day,http_log_process.log_date,b.log_date) = 0
hth
* Noel
November 10, 2005 at 8:36 am
The current plan does not show any table scans (there was a Hash Match/Inner Join taking the most processing time). Also no triggers on the table.
I also tried having a computed column to the date only (without the time) and did not have any performance gain.
I have indexes on ip and log_date. Changing the CONVERT to Datediff brought the update to under 25 seconds.
Thanks guys for your input.
November 10, 2005 at 8:48 am
You wouldn't see any boost in performance in the computed column solution unless it was indexed as well...
November 10, 2005 at 9:20 am
Classic issue of needing indexing to speed up selecting process, but at the same time not wanting indexing as this may slow down the update if you are changing a field with an index on.
Is there an index on the log_user field?
November 10, 2005 at 10:03 am
Yes, there is an index on log_user.
I got an error when creating index on the computed column
alter table http_log_process add log_date_only AS
CONVERT(datetime,CONVERT(varchar(12), log_date, 101) )
CREATE INDEX http_log_process_idx6 ON http_log_process ( log_date_only)
Cannot create index because the key column 'log_date_only' is non-deterministic or imprecise.
But I was able to create the index if the computed column was
alter table http_log_process add log_date_only AS
CONVERT(varchar(12), log_date, 101).
November 10, 2005 at 10:09 am
try:
alter table http_log_process add log_date_only AS
dateadd(d,0,datediff(day,0,log_date))
* Noel
November 10, 2005 at 10:32 am
I was able to create the computed column with dateadd(d,0,datediff(day,0,log_date)) but still got the same error while creating the index on the computed column.
And the update query was taking over 5 minutes (killed the update ) when using the computed column. When I use the
datediff(day,http_log_process.log_date,b.log_date) = 0
the update was done in 25 seconds.
November 10, 2005 at 10:50 am
Check the books online. You need to convert the data to the ainsi complient format (112 I think). That will make the column deterministic so you can index it.
November 10, 2005 at 12:47 pm
Actually I forgot that datediff is NOT deterministic even though dateadd is
The convert is only deterministic with datetime data type only when style is specified. Because you are getting rid of the time, further conversion may not be necessary.
try:
alter table http_log_process add log_date_only AS
CONVERT(char(8), log_date, 112) -- this gives yyyymmdd
* Noel
November 11, 2005 at 10:57 am
Whoa - this update was done in 1 sec.
Thanks Joe - I made one small change though
UPDATE http_log_process
SET log_user
= (SELECT MAX(log_user)
FROM http_log_process AS B
WHERE http_log_process.log_date = B.log_date
AND http_log_process.ip = B.ip AND b.log_user IS NOT NULL)
WHERE log_user IS NULL;
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply