January 10, 2007 at 8:00 am
Hi,
I have a query which becomes slower as the number of items in the temp table increases.
Here's my code:
CREATE table #regions (
ipFloat float null,
countryId int null)

insert into #regions (ipFloat) select distinct top 100 ipFloat from tbl_logs

-- This is where the code gets really slow!
update #regions set countryId = (select countryId from tbl_ip2countries where ipFloat between ipFrom and ipTo)

select * from #regions
drop table #regions
What it does is take an IP address converted to a float number and look up its associated countryId in a lookup table. The tbl_ip2countries table has more than 2,000,000 records. There's a clustered index on the primary key, which is ipTo and ipFrom (both float).
If I take only 10 IPs (select distinct top 10 ipFloat from tbl_logs), the query executes in less than 1 second.
If I take up to 100 IPs, it takes more than 45 seconds. In this case, there's an index spool (eager spool) that consumes 83% of the cost. By the way, in the execution plan, the arrow from the clustered index shows about 2,000,000 actual rows, but the arrow after the eager spool shows more than 80,000,000 actual rows! Is this normal? There's a filter after that which reduces the number of rows to only 100...
If I take 1000 IPs, it takes several minutes.
Is there a way to get rid of the eager spool or to reduce those 80,000,000 actual rows? I don't understand what's going on here and why this is so slow for a few hundred rows.
Any idea?
Thanks
Stephane
January 10, 2007 at 8:24 am
Can you edit your post so that it's more readable... I'm not sure you'll get much attention from anyone with that question.
January 10, 2007 at 8:32 am
Thanks! I didn't notice... I copied and pasted from Word?!
January 10, 2007 at 8:42 am
That's what I thought. Thanks for the quick fix.
January 10, 2007 at 8:43 am
Can you show us some sample data and the required output from the update query... I'm sure there's a way to avoid the spool altogether.
January 10, 2007 at 9:00 am
I just did a new test using a cursor this time.
CREATE table #regions (
ipFloat float null,
countryId int null)
insert into #regions (ipFloat) select distinct top 500 ipFloat from tbl_logs
declare @i float, @id int
DECLARE cSE CURSOR READ_ONLY
FOR select ipFLoat from #regions
OPEN cSE
FETCH NEXT FROM cSE INTO @i
WHILE @@FETCH_STATUS = 0
BEGIN
select @id = countryId from tbl_ip2countries where @i between ipFrom and ipTo
update #regions set countryId = @id where ipFloat = @i
FETCH NEXT FROM cSE INTO @i
END
CLOSE cSE
DEALLOCATE cSE
select * from #regions
drop table #regions
The results are quite surprising! For 100 IPs, it took only 6 seconds instead of minutes.
But I'm sure there's a way to get this done quickly without using a cursor. I just can't figure it out.
Thanks for any idea,
Stephane
January 10, 2007 at 9:00 am
Stephane
Your syntax doesn't look quite right. I don't see how your update statement can work. How about this (not tested, obviously):
update r
set countryId = c.countryId
from #regions r join tbl_ip2countries c
on r.ipFloat = c.ipFloat
where r.ipFloat between c.ipFrom and c.ipTo
John
January 10, 2007 at 9:04 am
The problem with that update is that I don't have the exact IP converted to a float in tbl_ip2countries. I mean, I only have a range, so Canada is assigned to 123.123.123.0 to 123.123.123.256. So I can't join on r.ipFloat = c.ipFloat.
Thanks!
January 10, 2007 at 9:08 am
I have IPs converted to floats like this:
1362268423
3476946705
1179659265
1179726082
3234155789
3254497794
3489515534
3567080704
1390476300
1390490892
1390832915
1415938309
1456668932
3258127372
3569568532
3573210393
I have to check in a lookup table (containing more than 2M rows) set up like this:
ipFrom ipTo countryId
---------------------- ---------------------- -----------
0 33996343 95
33996344 33996351 219
33996352 50331647 95
50331648 50331903 220
50331904 50332159 220
50332160 50332671 220
50332672 50332927 220
50332928 50333695 220
50333696 50333951 220
50333952 50334719 220
So the #regions would look like this after:
ipFloat countryId
---------------------- -----------
1362268423 70
3476946705 38
1179659265 38
1179726082 38
3234155789 38
3254497794 70
3489515534 38
3567080704 199
1390476300 70
1390490892 70
thanks
Stephane
January 10, 2007 at 9:29 am
Stephane
I still don't understand. The sample data you gave shows ipFrom and ipTo having 8 digits, and ipFloat having 9. So how can ipFloat ever be between ipFrom and ipTo? Also, how do you convert an IP address into this format? For instance, to use the Canada example, what do 123.123.123.0 and 123.123.123.256 become? (By the way, the latter isn't a valid IP address. I assume this is a typo?)
John
January 10, 2007 at 9:38 am
Hi,
Thanks for your fast answer!
The IP is converted this way: for example, if the IP address is 161.132.13.1, then the IP number is 2709785857.
IP Number, X = 161 x (256*256*256) + 132 x (256*256) + 13 x (256) + 1 = 2709785857.
You can have more info here: http://www.ip2location.com/docs/IP2Location_IP_Country_Region_City_Specification.pdf
So it can have 8 or 9 digits.
Sorry for the 256, it's obviously 255!
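For example, something like this (untested, just a sketch with the four octets hard-coded) would compute the IP number in T-SQL:
declare @o1 int, @o2 int, @o3 int, @o4 int
-- the four octets of 161.132.13.1
select @o1 = 161, @o2 = 132, @o3 = 13, @o4 = 1
-- cast to float first so the multiplication doesn't overflow int
select cast(@o1 as float) * 16777216   -- 256*256*256
     + @o2 * 65536                     -- 256*256
     + @o3 * 256
     + @o4 as ipNumber                 -- should give 2709785857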
Thanks
Stephane
January 10, 2007 at 10:02 am
Stephane
OK, that clears up the IP conversion algorithm. I still don't think that your sample data from tbl_ip2countries and from #regions tally up with each other, but let's not worry about that for now.
Try this:
update r
set countryId = c.countryId
from #regions r join tbl_ip2countries c
on r.ipFloat between c.ipFrom and c.ipTo
Not sure how it would perform, but much better than a cursor, I should think. You could experiment with using the >= and <= operators instead of BETWEEN to see if that goes any faster.
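For example, something like this (again, not tested):
update r
set countryId = c.countryId
from #regions r join tbl_ip2countries c
on r.ipFloat >= c.ipFrom and r.ipFloat <= c.ipTo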
John
January 10, 2007 at 10:20 am
Hmm...
That's really weird: for 10 IPs, it takes 7-8 seconds. For 100, it takes about a minute, which is slower than my problem at the beginning.
BUT!
If I try with 40 or 50 IPs, it executes in a flash: 1 second!
The difference is that the plan uses a table spool (lazy spool) in the first case but only a clustered index seek in the second case.
What the ...???
By the way, in the prod environment, it's going to be up to 1000 IPs.
Stephane
January 10, 2007 at 10:34 am
I really don't get it...
10, 20 or 30 IPs use the lazy spool, so it takes several seconds.
40, 50 or 60 IPs use the clustered index seek, so it executes in a flash.
70 or more also use the lazy spool, so it takes more than a minute.
Is there a way to force the use of the index instead of the costly table spool?
By the way, it's a clustered index SCAN with the table spool instead of a clustered index seek. What's the difference?
Thanks
Stephane
January 10, 2007 at 10:38 am
SCAN: reads all the rows in the table or index.
SEEK: navigates the index (like a binary search) to find only the required rows.
Could you reindex and/or update the statistics on the table?
If that doesn't help, you could always use index and join hints (as a last resort).
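For example, something along these lines (an untested sketch, using the table and column names from earlier in this thread):
-- refresh statistics on the lookup table
UPDATE STATISTICS tbl_ip2countries

-- last resort: force a nested loops join so each IP probes the clustered index
update r
set countryId = c.countryId
from #regions r inner loop join tbl_ip2countries c
on r.ipFloat between c.ipFrom and c.ipTo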