April 2, 2010 at 5:42 pm
Excellent! I'm now getting a duration of 1 ms with only 3 reads!!
Could you explain what's going on here to make it so much faster?
Thanks!
April 2, 2010 at 5:43 pm
cgreathouse (4/2/2010)
Excellent! I'm now getting a duration of 1 ms with only 3 reads!! Could you explain what's going on here to make it so much faster?
Thanks!
Which script?
The original or the one that Lutz provided?
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
April 2, 2010 at 5:48 pm
this one
WHERE LOW <= 982827279
AND LOW > 982827279 -1000
AND HIGH >= 982827279
April 2, 2010 at 5:57 pm
It uses better search arguments and also limits the result set. The better search arguments allow better index use. If you look at your graphical execution plan, you will probably see that the query is now using an index.
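For example, with a supporting index along these lines (the index name and shape here are purely illustrative, since we haven't seen your actual indexes), the narrow range on LOW becomes a seek and HIGH is just a residual filter on the few rows that come back:
-- hypothetical index, adjust to your real schema
CREATE INDEX IX_EffectiveItems_Low ON dbo.EffectiveItems (LOW) INCLUDE (HIGH);
SELECT *
FROM dbo.EffectiveItems
WHERE LOW <= 982827279
AND LOW > 982827279 - 1000 -- narrow seek range on LOW
AND HIGH >= 982827279      -- residual filter on the rows the seek returns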
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
April 3, 2010 at 3:26 am
cgreathouse (4/2/2010)
Excellent! I'm now getting a duration of 1 ms with only 3 reads!! Could you explain what's going on here to make it so much faster?
Thanks!
....
this one
WHERE LOW <= 982827279
AND LOW > 982827279 -1000
AND HIGH >= 982827279
It is important to verify whether the range value I used (a guess) will always be larger than the HIGH-LOW differences you actually have and expect in the future. I don't know how much those can vary, or what the actual range is... If required, you might need to store that range value as a parameter in a separate table and keep it up to date with an AFTER INSERT/UPDATE/DELETE trigger.
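Something along these lines, as an untested sketch (the table, trigger and column names are just placeholders):
-- one-row parameter table holding the widest HIGH-LOW spread seen so far
CREATE TABLE dbo.EffectiveItemsRange (MaxSpan int NOT NULL);
INSERT INTO dbo.EffectiveItemsRange (MaxSpan)
SELECT MAX(HIGH - LOW) FROM dbo.EffectiveItems;
GO
-- keep the stored spread up to date when the base table changes
CREATE TRIGGER trg_EffectiveItems_Range
ON dbo.EffectiveItems
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.EffectiveItemsRange
    SET MaxSpan = (SELECT MAX(HIGH - LOW) FROM dbo.EffectiveItems);
END;
GO
-- the query then uses the stored spread instead of the hard-coded -1000
DECLARE @span int = (SELECT MaxSpan FROM dbo.EffectiveItemsRange);
SELECT *
FROM dbo.EffectiveItems
WHERE LOW <= 982827279
AND LOW > 982827279 - @span
AND HIGH >= 982827279;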
April 7, 2010 at 1:13 am
Very nice, Lutz. I always have trouble with this one myself.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
May 30, 2010 at 8:22 pm
Well, this was working very well for a while (almost 2 months!). However, I hit the potential problem mentioned earlier about needing to adjust the range.
Just a quick recap on the query...
SELECT *
FROM EffectiveItems
WHERE LOW <= 982827279
AND LOW > 982827279 -1000
AND HIGH >= 982827279
The -1000 works the majority of the time. However, I hit this one just recently (using a value of 986879999)...
SELECT *
FROM EffectiveItems
WHERE LOW <= 986879999
AND LOW > 986879999-1000
AND HIGH >= 986879999
This wasn't returning any results. It turns out I had to change the 1000 to 1130001 to get results back. So now querying for the value 986879999 works and is still fast (~3 reads and 1 ms). However, if I go back and query for 982827279, the performance is horrible (~45,000 reads and ~2 seconds).
Any ideas on how to get better performance that works in both cases? As I mentioned before this table has a decent number of rows (~11 million) and for the most part is read only (updated once every 3 months).
Thanks!
May 31, 2010 at 8:02 am
cgreathouse (5/30/2010)
Well, this was working very well for a while (almost 2 months!). However, I hit the potential problem mentioned earlier about needing to adjust the range. Just a quick recap on the query...
SELECT *
FROM EffectiveItems
WHERE LOW <= 982827279
AND LOW > 982827279 -1000
AND HIGH >= 982827279
The -1000 works for a majority of the time. However, I hit this one just recently (using a value of 986879999)...
SELECT *
FROM EffectiveItems
WHERE LOW <= 986879999
AND LOW > 986879999-1000
AND HIGH >= 986879999
This wasn't returning any results. It turns out I had to change the 1000 to 1130001 to get results returned. So now querying for value 986879999 works and is still fast (~3 reads and 1 ms). However, if I go back and query for 982827279 the performance is horrible (~45,000 reads and ~2 seconds).
Any ideas on how to get better performance that works in both cases? As I mentioned before this table has a decent number of rows (~11 million) and for the most part is read only (updated once every 3 months).
Thanks!
How many rows are returned for each of these?
Have you tried OPTION (RECOMPILE)?
I would also set MAXDOP 1 as an option; queries such as this should not be parallelized, I think (assuming the new query is being parallelized).
Did you update all statistics on the table with a full scan? I would do this after each load period.
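For example, something like this (the hints just get appended to the statement, and the FULLSCAN can take a while on 11 million rows):
-- refresh statistics after each quarterly load
UPDATE STATISTICS dbo.EffectiveItems WITH FULLSCAN;
-- query hints appended directly to the ad-hoc statement
SELECT *
FROM EffectiveItems
WHERE LOW <= 982827279
AND LOW > 982827279 - 1000
AND HIGH >= 982827279
OPTION (RECOMPILE, MAXDOP 1);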
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
May 31, 2010 at 9:29 am
Only 1 or 2 rows are going to be returned. In the actual query there is also an additional column I'm filtering on. Here's the query I originally started with...
SELECT *
FROM EffectiveItems
WHERE 982827279 between LOW and HIGH
and 20100505 between [BEGIN] and [END]
Looking at the execution plan, I was getting a clustered index seek, but it was still taking a long time to complete.
After posting here and trying out some of the suggestions, the query then looked like...
SELECT *
FROM EffectiveItems
WHERE LOW <= 982827279
AND LOW > 982827279 -1000
AND HIGH >= 982827279
AND 20100505 between [BEGIN] and [END]
This was working well until I hit a value (e.g. 986879999) where no rows were being returned. That was fixed once I changed "-1000" to "-1130001". However, the performance went downhill.
I haven't tried the OPTION (RECOMPILE) because this isn't in a stored procedure.
The MAXDOP 1 option didn't do much. It still takes about 64,000 reads and 1.3 secs.
It seems like there's got to be a way to speed this up. The combination of the LOW/HIGH/BEGIN/END columns is very selective; only 1 or 2 rows are returned. I created a clustered index on the LOW/HIGH/BEGIN/END columns expecting that to help speed things up, because they're very selective when combined. However, it's still taking about 1.5 secs and 65,000 reads. I can also see that the clustered index is being used and it's doing an index seek.
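For reference, the clustered index I created looks roughly like this (the name is just what I happened to pick):
CREATE CLUSTERED INDEX CIX_EffectiveItems_Ranges
ON dbo.EffectiveItems (LOW, HIGH, [BEGIN], [END]);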
Any other suggestions?
Thanks for the help!
June 12, 2010 at 11:36 am
First, I think we should make it clear why this query is so hard for SQL Server.
The original query looks like this:
SELECT *
FROM EffectiveItems
WHERE 982827279 between LOW and HIGH
and 20100505 between [BEGIN] and [END]
This is equivalent to
SELECT *
FROM EffectiveItems
WHERE LOW <= 982827279
AND HIGH >= 982827279
AND [BEGIN] <= 20100505
AND [END] >= 20100505
This is a query on the combination of four separate columns.
It is impossible to use more than one index for a query like this.
If you have an index on low, high, begin, end, SQL Server will only use the index to find all the rows with low <= 982827279; it will then scan all those rows to find the ones that satisfy the rest of the conditions.
So, depending on the value you are searching for, SQL Server might have to scan all the rows in the table to find the very few rows that satisfy all the parameters.
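In other words, even with a composite index covering all four columns, only the leading condition defines the seek range; everything else is evaluated as a residual predicate on every row in that range (illustration only, the index name is made up):
CREATE INDEX ix_all ON dbo.EffectiveItems (LOW, HIGH, [BEGIN], [END]);
SELECT *
FROM dbo.EffectiveItems
WHERE LOW <= 982827279     -- seek predicate: defines the range of index rows to read
AND HIGH >= 982827279      -- residual: checked row by row
AND [BEGIN] <= 20100505    -- residual
AND [END] >= 20100505      -- residual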
One way to speed up a query like this is to picture it as a geometric problem. Imagine that you plot a point for each item in a diagram, using low as the x-coordinate and high as the y-coordinate. All the items will now be represented as a large number of points positioned in a fairly narrow band above the line where y = x.
To find the items where low <= q and high >= q you have to look in the region above and to the left of the point (q,q).
To do this efficiently you can divide the total set of items into a grid of squares. If you have 10 million points, you can divide them into about 3000 squares with about 3000 points in each square.
You can then perform the query by first finding the set of squares that touch the wanted area, and then scanning the points in those squares to find the points that actually satisfy the condition.
This is very efficient because there is such a small number of squares and such a small number of points in each square.
The following script demonstrates the method:
-- let us play somewhere safe
use tempdb
-- Create some test data
-- I assume a uniform random distribution of values for low
-- I assume a cubic distribution of values for high-low (this gives many rows with a small difference, but some rows with a large difference)
-- The maximum value of high-low = 1 million
-- all values are between 0 and 100 million
if object_id('dbo.items') is not null drop table dbo.items
select identity(int, 1,1) as id, low, cast(low+d*d*d as int) as high
into dbo.items
from (
select top 10000000 ABS(CHECKSUM(newid()))%99000000 as low,(ABS(CHECKSUM(newid()))%100000)*0.001+1 as d
from sys.all_columns c1,sys.all_columns c2
) t
-------------------------------------------------------------------
-- Now preprocess the test data to increase query performance
-------------------------------------------------------------------
-- calculate a table of grid squares
-- the idea here is to create a grid so the number of used squares and the maximum number of items in a square is
-- of the same order of magnitude
-- adjust @d to achieve this balance with your data
declare @d int = 100000
if object_id('dbo.grid') is not null drop table dbo.grid
select low*100000+high as grid, low*@d as low, high*@d+@d as high, rows
into dbo.grid
from (
select low/@d as low, high/@d as high, count(*) as rows
from dbo.items
group by low/@d, high/@d
) t
-- display the resulting number of grid rows and items per grid
select count(*) as gridrows from dbo.grid
select top 10 * from dbo.grid order by rows desc
-- Add a calculated column with the grid number to the original table
declare @sql as varchar(max)
set @sql='alter table dbo.items add grid as low/@d*100000+high/@d'
set @sql=replace(@sql,'@d', cast(@d as varchar(10)))
exec (@sql)
-- Add a non-clustered covering index to help query performance
create index ix1 on dbo.items(grid,low) include(id,high)
create unique index ix1 on dbo.grid(grid) include (low,high)
-- Add another covering index to test performance without using the grid column
create index ix2 on dbo.items(low) include(id,high)
go
----------------------------------------------------------
-- preprocessing is now finished - test query performance
----------------------------------------------------------
set statistics time on
set statistics io on
dbcc dropcleanbuffers
declare @q as int = 50000000 -- the value to query for
select id,low,high
from dbo.items
where grid in (
-- find all grid squares that include the desired value
select grid from dbo.grid
where low<=@q and high>@q
)
and low <= @q and high >= @q
and (id % 1000) = 0 -- just to avoid getting too big result set
--Table 'items'. Scan count 78, logical reads 622, physical reads 3, read-ahead reads 168, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Table 'grid'. Scan count 1, logical reads 41, physical reads 2, read-ahead reads 39, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times:
-- CPU time = 0 ms, elapsed time = 18 ms.
go
-- perform a query without using the grid just to compare
dbcc dropcleanbuffers
declare @q as int = 50000000 -- the value to query for
select id,low,high
from dbo.items
where low <= @q and high >= @q
and (id % 1000) = 0 -- just to avoid getting too big result set
--Table 'items'. Scan count 3, logical reads 11483, physical reads 165, read-ahead reads 11418, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
-- SQL Server Execution Times:
-- CPU time = 857 ms, elapsed time = 2646 ms.
go
As you can see, the performance difference between the grid method and a normal query is quite dramatic.
June 22, 2010 at 5:46 am
Just curious: Did you test my last suggestion? How did it work for you?
June 22, 2010 at 6:50 am
I haven't tried it yet. I'll let you know once I have.
Thanks for the idea!
June 22, 2010 at 9:29 pm
OK, I gave this a try on my data and it works pretty well. With a table of about 3 million rows, the query is executing in about 2 ms. However, this only works when I query on just the low & high columns. I also have begin & end columns that need to be referenced. Once I add those into the query, my response times go back to ~500 ms.
Just to recap, here's what the table being queried looks like...
CREATE TABLE [dbo].[EffectiveItems](
[ItemID] [int] NOT NULL, -- foreign key
[BEGIN] [int] NOT NULL,
[END] [int] NOT NULL,
[LOW] [int] NOT NULL,
[HIGH] [int] NOT NULL,
[Val1] [int] NOT NULL,
[Val2] [int] NULL,
[Val3] [int] NOT NULL,
[Val4] [int] NULL
)
The query I need to run looks something like this...
select ItemID, Val1, Val2, Val3, Val4
from EffectiveItems
where 982827279 between low and high
and 20090103 between [begin] and [end]
Any suggestions on how to get this query to use your technique?
Thanks!
June 23, 2010 at 12:17 am
This should work just fine:
-- Add a non-clustered covering index to help query performance
create index ix1 on dbo.EffectiveItems(grid, low)
include(ItemID, high, [begin], [end], Val1, Val2, Val3, Val4)
select ItemID, Val1, Val2, Val3, Val4
from dbo.EffectiveItems
where grid in (
-- find all grid squares that include the desired value
select grid from dbo.grid
where low <= 982827279 and high > 982827279
)
and 982827279 between low and high
and 20090103 between [begin] and [end]
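This assumes the grid column and the dbo.grid lookup table have already been built for EffectiveItems, the same way as in my earlier test script; something like this, with @d adjusted to suit your data:
-- build the grid lookup table from EffectiveItems (same idea as in the demo script)
declare @d int = 100000
if object_id('dbo.grid') is not null drop table dbo.grid
select low*100000+high as grid, low*@d as low, high*@d+@d as high, rows
into dbo.grid
from (
select low/@d as low, high/@d as high, count(*) as rows
from dbo.EffectiveItems
group by low/@d, high/@d
) t
-- add the matching computed grid column to EffectiveItems
declare @sql varchar(max)
set @sql = 'alter table dbo.EffectiveItems add grid as low/@d*100000+high/@d'
set @sql = replace(@sql, '@d', cast(@d as varchar(10)))
exec (@sql)
create unique index ix1 on dbo.grid(grid) include (low, high)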
If the above code is still too slow, please post the actual execution plan.
Also, please post the result of this:
-- display the resulting number of grid rows and items per grid
select count(*) as gridrows from dbo.grid
select top 10 * from dbo.grid order by rows desc
June 23, 2010 at 8:11 am
Wow, that did it. The query now takes about 1ms to complete. Nice!!
Here are the results that you asked for...
gridrows
1810
grid       low        high       rows
840408404  840400000  840500000  37696
840908409  840900000  841000000  37366
0          0          100000     37162
992009920  992000000  992100000  33111
841108411  841100000  841200000  31619
980009800  980000000  980100000  28291
844008440  844000000  844100000  28048
840708407  840700000  840800000  27147
840608406  840600000  840700000  26154
980509805  980500000  980600000  25176
Thanks for the help!!