March 23, 2015 at 9:40 am
Two tables:
GeoIpLocation (Id, City, Country, Latitude, Longitude; PK: Id; 2m rows)
GeoIpBlock (IpFrom, IpTo, GeoIpLocation_Id; PK: IdFrom, IdTo; 100k rows)
Simple concept: get location for given IP. So:
SELECTTOP 1
GeoIpLocation.Country,
GeoIpLocation.City
FROMGeoIPLocation
INNER JOIN GeoIPBlock
ON GeoIPBlock.GeoIpLocation_Id = GeoIPLocation.Id
WHERE@nIpNumber BETWEEN GeoIPBlock.IpFrom AND GeoIPBlock.IpTo
Result: disaster.
The between operator uses the index on PK on GeoIpBlock, which results on half the table for first part of between and half the table for the second part of between. A bit better query is with FORCESCAN on GeoIPBlock, but it still runs slow, as it scans a lot of records.
Question: Is there a better way to index this kind of data?
March 23, 2015 at 11:02 am
It would be good to post the table and current index creation scripts.
This may not be the final solution but consider working on the query in two parts.
SELECT TOP 1 @ID = GeoIpLocation_Id FROM GeoIPBlock WHERE @nIpNumber BETWEEN GeoIPBlock.IpFrom AND GeoIPBlock.IpTo
SELECTTOP 1
GeoIpLocation.Country,
GeoIpLocation.City
FROMGeoIPLocation
WHERE GeoIPLocation.Id = @ID
The first part looks like it will result in a table scan but if you alter it and there is an index on IPfrom thenyou may be able to get closer
WHERE GeoIPBlock.IpFrom >= @nIpNumber AND GeoIPBlock.IpTo <= @nIpNumber
Try create a non clustered index on IPFrom and IpTo and enure the ID column is in that index. That way it should see to the IPFrom, scan until it hits the first row meeting the IPTo and grab the ID, all in the one index structure.
CREATE NONCLUSTERED INDEX IPFromTo ON GeoIPBlock(IpFrom, IPTo, ID)
The second part of the query should be quick as it's looking for one row based on @ID.
March 24, 2015 at 3:39 am
Have you tried index on IpTo ?
March 24, 2015 at 3:50 am
We really need to see the DDL for the table and its indexes.
Also, can you explain what you mean by this:
..which results on half the table for first part of between and half the table for the second part of between
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
March 24, 2015 at 4:15 am
Please post at least the execution plan so we can see how the optimizer is resolving the query based on your structures.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 24, 2015 at 4:27 am
Ok, I am sorry; here are the execution plans. The first one is without any hints, the second is with FORCESCAN on GeoIpBlock. FORCE ORDER hint produces the same plan, btw.
The first query takes on average about 400ms, the second one about 200 ms.
March 24, 2015 at 4:33 am
Actual rows and Est rows are very different. Are your statistics up to date?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
March 24, 2015 at 4:55 am
Phil Parkin (3/24/2015)
Actual rows and Est rows are very different. Are your statistics up to date?
Agreed. We don't have index definitions yet either:
SELECT TOP 1
l.fkTblCountry,
l.sCity
FROM tblGeoIPLocation l
INNER JOIN tblGeoIPBlock b
ON b.fkTblGeoIPLocation = l.pkTblGeoIPLocation
WHERE @nIpNumber BETWEEN b.nIpFrom AND b.nIpTo
Try these indexes -
tblGeoIPLocation: index on pkTblGeoIPLocation include fkTblCountry, sCity
tblGeoIPBlock: index on nIpFrom, nIpTo, fkTblGeoIPLocation
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 24, 2015 at 5:47 am
Phil Parkin (3/24/2015)
Actual rows and Est rows are very different. Are your statistics up to date?
Yeah, estimated 1, actual 87000, that's a slight disparity. No wonder it assumed a scan would work well. It only thought it was returning a single row. I suspect you have auto_updatestats disabled. You might want to turn that on and get some statistics maintenance in place.
Without the statistics, the optimizer will make very bad choices. Also, the fact that the plan shows a runtime value, but not a compile time value for @nlpNumber, I suspect that's a local variable, not a parameter. Once you have statistics, you may be better off with a parameter so that the optimizer can use specific values to retrieve the data.
As to your index, I'd make it a compound key on nIpFrom and nIpTo with an INCLUDE of [Alzasoft].[dbo].[tblGeoIPLocation].pkTblGeoIPLocation, [Alzasoft].[dbo].[tblGeoIPLocation].fkTblCountry, [Alzasoft].[dbo].[tblGeoIPLocation].sCity. At least as an experiment to test things out. You may get index intersection on this, so two indexes, one for nIpFrom and one for nIpTo might work better. Testing is your friend.
I didn't look at the plans with hints. You're going after hints when the statistics and the structure is off. That's not how hints should be used. They should be the last thing tried when we've exhausted all the standard approaches.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 24, 2015 at 6:08 am
Hey Grant, in the Plan1 execution plan there's a NLIJ with "Expr1004" in the Outer References list. Any idea what the "Expr1004" is for and where it was constructed?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 24, 2015 at 6:25 am
Auto update for statistics is not disabled; they are even manually updated in a nightly maintenance job, however, the query does not want to use the PK index. I guess it has something to do with a fact that it's not a <column> BETWEEN <value1> AND <value2>, but it's <value> BETWEEN <column1> AND <column2>.
March 24, 2015 at 6:48 am
ChrisM@Work (3/24/2015)
Hey Grant, in the Plan1 execution plan there's a NLIJ with "Expr1004" in the Outer References list. Any idea what the "Expr1004" is for and where it was constructed?
I'm not sure. It doesn't appear elsewhere in the plan. I searched the XML.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 24, 2015 at 6:53 am
Jakub.Janda (3/24/2015)
Auto update for statistics is not disabled; they are even manually updated in a nightly maintenance job, however, the query does not want to use the PK index. I guess it has something to do with a fact that it's not a <column> BETWEEN <value1> AND <value2>, but it's <value> BETWEEN <column1> AND <column2>.
Looking at the plan again, you'll note that the BETWEEN operation is being satisfied by the [Alzasoft].[dbo].[tblGeoIPBlock].[IX_tblGeoIPBlock_Location] index. The problem is the scan of the other table, tblGeoIPLocation. And that's because the statistics are radically off there. The estimated number of rows is 1. It's scanning 87000. That is the problem. Not the BETWEEN operation or the other index. That part of your structure is working fine. Try UPDATE STATISTICS WITH FULL SCAN on that primary key. See if that changes things. Your statistics are WAY off in there.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 24, 2015 at 7:22 am
Here is an alternative to the between approach but it depends on what the data looks like in GepIPBlock.
If GepIPBlock.IpFrom and IPTo represent range of ip's or subnets like 123.123.123.1 to 123.123.123.255 then you may be able to take a @nIPNumber like 123.123.123.54, strip the .54 off it and search GEOIpBlock where GepIPBlock.IpFrom LIKE '123.123.123.*' which should use a index seek.
If GeopIPBlock has multiple rows fro the same subnet 123.123.123.??? then you could strip one character off the @nIPNumber at a time until you hit the row. This will be a small number of seeks but may be quicker than a scan.
March 24, 2015 at 7:49 am
As per previous post, This test reduces reads to 3 Logical reads.....
DROP TABLE TEST
GO
CREATE TABLE TEST (ID INT IDENTITY(1,1), IPFROM VARCHAR(120), IPTO VARCHAR(120))
GO
INSERT INTO TEST(IPFROM) VALUES (1)
GO -- Add one row to test table
INSERT INTO TEST(IPFROM) SELECT IPFROM FROM TEST
GO 15 -- Add loads more rows
SELECT COUNT(1) FROM TEST
-- CONVERT Identity Column to something that looks like an IP Range
UPDATE TEST SET IPFROM = (ID*100)+100000000000, IPTO = (ID*100)+100000000099
UPDATE TEST SET IPFROM = SUBSTRING(Ipfrom,1,3)+'.'+SUBSTRING(ipfrom,4,3)+'.'+SUBSTRING(ipfrom,7,3)+'.'+SUBSTRING(ipfrom,10,3)
UPDATE TEST SET IPTO = SUBSTRING(Ipto,1,3)+'.'+SUBSTRING(ipto,4,3)+'.'+SUBSTRING(ipto,7,3)+'.'+SUBSTRING(ipto,10,3)
SELECT TOP 1000 * FROM TEST ORDER BY ID DESC
CREATE NONCLUSTERED INDEX TEST_IPFROM ON TEST(ipfrom, ipto, id)
set statistics io on
select * from test where '100.003.276.001' BETWEEN ipfrom and ipto
select * from test where ipfrom LIKE '100.003.276.%' AND '100.003.276.001' BETWEEN ipfrom and ipto
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply