September 22, 2005 at 11:21 am
Hi I run a daily job that looks up country codes based on IP. In one table are IP address ranges as "longs" and the matching country code. In another table I have log entries and dotted IP's. New rows are imported daily and I need to update them with country codes.
My strategy right now has been to run a daily job that populates a column in the logData rows with the country country code. My proc has two steps. First convert the dotted IP's to serial IP's. Then do a join update.
When this job runs it ends up blocking selects against the log table. The first statement uses a UDF to convert the dotted IP's to serial IP's. The second statement does the country code look ups.
Which proc do you think is causing the blocking?
Is there a better way to do this?
(The wmsLog table has about 15million rows)
PROC:
update dbo.wmsLog set longIP = dbo.IPStringToNumber(dbo.wmsLog.cip)
where dbo.wmsLog.longIP =0
and [datetime] > dateadd(d,-5,getdate())
UPDATE dbo.wmsLog
SET dbo.wmsLog.countrya2 = dbo.GeoIP.a2
FROM dbo.wmsLog INNER JOIN dbo.GeoIP ON dbo.wmsLog.longIP BETWEEN dbo.GeoIP.startLongIP AND dbo.GeoIP.endLongIP
where dbo.wmsLog.countrya2='' and not dbo.wmsLog.longIP =0
and [datetime] > dateadd(d,-5,getdate())
Dave
Trainmark.com IT Training B2B Marketplace
(Jobs for IT Instructors)
September 22, 2005 at 12:45 pm
At which part there are blocking selects? section 1 (the conversion) or section 2 (assigning the language) of the code?
What is the code behind dbo.IPStringToNumber?
-------------------------------
Declare @DateInPast datetime
set @DateInPast=dateadd(d,-5,getdate())
update dbo.wmsLog set longIP = dbo.IPStringToNumber(dbo.wmsLog.cip)
/*converts 10.0.50.1 to long 010 000 050 001 ?*/
where dbo.wmsLog.longIP =0 /*long ip not set*/
and [datetime] > @DateInPast /*recent entry*/
UPDATE dbo.wmsLog /* logtable*/
SET dbo.wmsLog.countrya2 = dbo.GeoIP.a2 /*assign country code*/
FROM dbo.wmsLog wmsLog /*logtable*/
INNER JOIN dbo.GeoIP GeoIP /*contains IP ranges by country*/
ON wmsLog.longIP BETWEEN GeoIP.startLongIP AND GeoIP.endLongIP
/*Must be in a GEO IP-range used by a country*/
where wmsLog.countrya2='' /*country not filled in*/
and not wmsLog.longIP =0 /*must have an valid IP*/
/*is >0 as valid as not =0? this for a possible index*/
and [datetime] > @DateInPast /*recent entry*/
September 22, 2005 at 3:41 pm
When I do a sp_who2 it shows the proc as blocking, not sure which statement in the proc is doing it. I guess I can split it into two and check it out.
Here is the function that I got somewhere on the net:
CREATE FUNCTION dbo.IPStringToNumber( @vcIPAddress varchar(15))
/**************************************************************************
DESCRIPTION: Returns Numeric IP, otherwise returns null
PARAMETERS:
@vcIPAddress- The string containing a valid IP
RETURNS:IP converted to bigint or null if not a valid IP
USAGE: SELECT dbo.IPStringToNumber( '10.255.255.255')
AUTHOR:Karen Gayda
DATE: 06/11/2003
MODIFICATION HISTORY:
WHODATEDESCRIPTION
----------------------------------------------------------------
***************************************************************************/
RETURNS bigint
AS
BEGIN
DECLARE
@biOctetA bigint,
@biOctetBbigint,
@biOctetCbigint,
@biOctetDbigint,
@biIP bigint
DECLARE @tblArray TABLE
(
OctetIDsmallint, --Array index
Octetbigint --Array element contents
)
--split the IP string and insert each octet into a table row
INSERT INTO @tblArray
SELECT ElementID, Convert(bigint,Element) FROM dbo.Split(@vcIPAddress, '.')
--check that there are four octets and that they are within valid ranges
IF (SELECT COUNT(*) FROM @tblArray WHERE Octet BETWEEN 0 AND 255) = 4
BEGIN
SET @biOctetA = (SELECT (Octet * 256 * 256 * 256) FROM @tblArray WHERE OctetID = 1)
SET @biOctetB = (SELECT (Octet * 256 * 256 ) FROM @tblArray WHERE OctetID = 2)
SET @biOctetC = (SELECT (Octet * 256 ) FROM @tblArray WHERE OctetID = 3)
SET @biOctetD = (SELECT (Octet) FROM @tblArray WHERE OctetID = 4)
SET @biIP = @biOctetA + @biOctetB + @biOctetC + @biOctetD
END
RETURN(@biIP)
END
Dave
Trainmark.com IT Training B2B Marketplace
(Jobs for IT Instructors)
September 23, 2005 at 7:19 am
why don't you play with that... seems to be considerably faster :
CREATE FUNCTION dbo.IPStringToNumber_Set( @vcIPAddress varchar(15))
/**************************************************************************
DESCRIPTION: Returns Numeric IP, otherwise returns null
PARAMETERS:
@vcIPAddress - The string containing a valid IP
RETURNS: IP converted to bigint or null if not a valid IP
USAGE: SELECT dbo.IPStringToNumber( '10.255.255.255')
AUTHOR: Karen Gayda
DATE: 06/11/2003
MODIFICATION HISTORY:
WHO DATE DESCRIPTION
--- ---------- ---------------------------------------------------
RGR'us - 2005/09/23 - Removed the inserts/select for performance, set the answer right away
***************************************************************************/
RETURNS bigint
AS
BEGIN
DECLARE
@biOctetA bigint,
@biOctetB bigint,
@biOctetC bigint,
@biOctetD bigint,
@tiCount tinyint,
@biIP bigint
--split the IP string and insert each octet into a table row
SELECT
@biOctetA = dtValues.biOctetA
, @biOctetB = dtValues.biOctetB
, @biOctetC = dtValues.biOctetC
, @biOctetD = dtValues.biOctetD
, @tiCount = dtValues.Total
FROM
(
SELECT
MAX(CASE WHEN ElementID = 1 THEN Convert(bigint,Element) * 256 * 256 * 256 END) AS biOctetA,
MAX(CASE WHEN ElementID = 2 THEN Convert(bigint,Element) * 256 * 256 END) AS biOctetB,
MAX(CASE WHEN ElementID = 3 THEN Convert(bigint,Element) * 256 END) AS biOctetC,
MAX(CASE WHEN ElementID = 4 THEN Convert(bigint,Element) END) AS biOctetD,
MAX(ElementID) AS Total
FROM dbo.Split(@vcIPAddress, '.')
) dtValues
--check that there are four octets and that they are within valid ranges
SET @biIP = NULL
IF @tiCount = 4
BEGIN
SET @biIP = @biOctetA + @biOctetB + @biOctetC + @biOctetD
END
RETURN(@biIP)
END
GO
Select dbo.IPStringToNumber_Set('255.123.234.104') AS IP_SET
GO
September 23, 2005 at 7:24 am
Not sure if it's faster that way, but you could test it :
SELECT
@biIP = CASE WHEN @tiCount = 4 THEN biOctetA + biOctetB + biOctetC + biOctetD ELSE NULL END
FROM
...
RETURN (@biIP )
You can also try this split version to see if it works faster than the procedural one (don't expect much of a difference over a small string... but times 1 million records it might make one)
IF Object_id('fnSplit_Set') > 0
DROP FUNCTION dbo.fnSplit_Set
GO
IF Object_id('Numbers') > 0
DROP TABLE dbo.Numbers
GO
CREATE TABLE dbo.Numbers (PkNumber int identity(1,1), dude bit null, CONSTRAINT Pk_Number PRIMARY KEY CLUSTERED (PkNumber))
GO
INSERT INTO dbo.Numbers (dude)
SELECT NULL FROM (SELECT TOP 100 NULL AS A FROM master.dbo.spt_values) dt100 cross join (SELECT TOP 80 null AS A FROM master.dbo.spt_values) dt80
GO
ALTER TABLE dbo.Numbers
DROP COLUMN dude
GO
--Select min(PkNumber) as MinA, max(PkNumber) as MaxA from dbo.Numbers
--1, 8000
GO
CREATE FUNCTION [dbo].[fnSplit_Set] (@IDS as varchar(8000), @vcDelimiter varchar(3))
RETURNS TABLE
WITH SCHEMABINDING
AS
Return
Select dtSplitted.EachID, dtSplitted.Rank from (
SELECT SUBSTRING(@vcDelimiter + @IDs + @vcDelimiter, N.PkNumber + len(@vcDelimiter),
CHARINDEX(@vcDelimiter, @vcDelimiter + @IDs + @vcDelimiter, N.PkNumber + len(@vcDelimiter)) - N.PkNumber - len(@vcDelimiter)) as EachID
, (LEN(SUBSTRING(@IDs, 1, N.PkNumber)) - LEN (REPLACE (SUBSTRING(@IDs, 1, N.PkNumber), ',', ''))) + 1 AS Rank
FROM dbo.Numbers N
WHERE SUBSTRING(@vcDelimiter + @IDs + @vcDelimiter, N.PkNumber, len(@vcDelimiter)) = @vcDelimiter
AND PkNumber 0
GO
DECLARE @Ids as varchar(8000)
SET @IDS = ''
Select @IDS = @IDS + CAST(id as varchar(10)) + ',' from (Select TOP 10 id from dbo.SysObjects order by NEWID()) dtIds
SET @IDS = left(@IDS, ABS(len(@IDS) - 1))
PRINT @IDS
Select O.id, dtIDS.EachId, O.name, dtIDS.Rank from dbo.SysObjects O inner join (Select CAST(EachID as INT) AS EachID, Rank from dbo.fnSplit_Set (@IDS, ',')) dtIDS on O.id = dtIDS.EachID order by O.Name
Select O.id, dtIDS.EachId, O.name, dtIDS.Rank from dbo.SysObjects O inner join (Select CAST(EachID as INT) AS EachID, Rank from dbo.fnSplit_Set (@IDS, ',')) dtIDS on O.id = dtIDS.EachID order by dtIDS.Rank
--Dont't ever forget to cast the eachid column to the same datatype as in the join to avoid table scans.
September 28, 2005 at 9:58 pm
I am working a number of angles on this. I just realized that although I have 11 indexes on the main table in question I don't have one suitable for the joined update statement. Adding an index for the longIP field should help. Currently the two statements together take an hour to run which is far longer then should be neccessary.
Second I am trying to get my ADO connection object to do dirty reads. This is primarily a reporting application and 98% of the reports could care less about country code. So I would think that if I set the connection object to do dirty reads (adXactReadUncommitted) then shouldn't it ignore any locks that the update statements are placing?
btw: from what I have seen so far it does not appear to be the function itself that is the bottleneck, or rather the alternatives don't give me a large enough performance increase. I want this proc to run in minutes at the most if I can manage it.
Thanks all and any adivce still welcome.
Dave
Trainmark.com IT Training B2B Marketplace
(Jobs for IT Instructors)
September 29, 2005 at 7:02 am
Can you post the full execution plan? Maybe we'll see something you missed.
September 29, 2005 at 10:02 am
OK I will. I have the execution plan on my monitor right now but it's the graphical one. I'll choose the text version and run it again later tonight when server traffic is low.
For now it is the update via join that consumes 98% of the execution time of 1:09:00. 60% of the cost is a clustered index seek on the geoIP table. my geoIP table is [country],[startIP],[endIP] and the where clause is a "between" the start and end. The innerJoin nested loop consumes another 26% so those two together account for 85% percent of the time.
Hard for me to see how I can do better then the
index seek thats taking 60% of the time. Maybe I can drop countries that are likly to appear as "all others" on the report.
I'll post the full plan tonight. Thanks.
Dave
Trainmark.com IT Training B2B Marketplace
(Jobs for IT Instructors)
September 30, 2005 at 7:23 am
Just a taught... maybe it would be faster to run the clustered index seek if the startIP and endIP columns were at the start of the index instead of a char column (assuming country is in text).
September 30, 2005 at 8:39 am
Sorry, sloppy on my part when describing the fields. The clustered index is [start],[end],[country] already I just wrote it wrong in the post.
However, I just took a look at the index being used on the other table and that can be improved. There is no index with dbo.wmsLog.longIP. In the where claus of the join it is used as "not dbo.wmsLog.longIP = 0". Might be a small gain there.
Im not sure if an additional index for that column is worth it though because it would add time to the first statement updating the index after each update in the first statement. I'll try it and measure the difference.
I might just resign myself to running this job at 2am when the server is very idle.
thx
Dave
Trainmark.com IT Training B2B Marketplace
(Jobs for IT Instructors)
September 30, 2005 at 8:45 am
Maybe you could speed this up if you added an indexed calculated field on the long ip (the ones you calculate on the fly)... maybe that could give you some more options... and at least flush one big update statement.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply