January 6, 2014 at 6:38 pm
Hello, all you smart people out there. I have a performance problem on SQL Express that I hope you can help me improve.
In a nutshell, I have a single table with a geography column representing points on a map (there are other columns too), and I need to relate each point in the table to the other points in the same table based on the distance between them (and other criteria).
For example, take a random point in the table. I have to find all the other points within the table that are within a specified distance. Now, take that example and multiply it by all the other points within the same table.
It is not a specific point that I am looking for, it is the distance relationship that I need and I am trying to represent this graphically.
The process is not so bad when you are talking only hundreds of points, but when you bring in thousands, it takes a long time to process.
I have followed the advice of this TechNet article (Tuning Spatial Point Data Queries), even to the point of changing to the spherical earth model, SRID 104001. However, the self-join using STDistance is still slow even with these recommendations (seconds and minutes vs. milliseconds).
My goal is to try to get the processing down to milliseconds so that I can build a fast histogram using a scale of distances. I have developed a test database with scripts to recreate the DB with random data. Although the scripts are long, I can share them with you if need be.
This might just be a lost cause and I will have to settle for queries that take hours to complete but I am hoping someone out there might have a trick or two for me to try.
Thanks in advance,
Mike
p.s. I am adding a zip with the Create & Load scripts and some scripts I have been using to test performance.
January 6, 2014 at 10:54 pm
Have you looked at the execution plans for the queries? Is it taking advantage of indexes you have in place? If not, can you adjust the indexes, add new ones, whatever. Without the knowledge of what's happening with the query directly, that's about all I can suggest.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 7, 2014 at 5:18 am
Grant Fritchey (1/6/2014)
Have you looked at the execution plans for the queries? Is it taking advantage of indexes you have in place? If not, can you adjust the indexes, add new ones, whatever. Without the knowledge of what's happening with the query directly, that's about all I can suggest.
Hello Grant, thanks for being the first responder. Yes, I have checked the execution plans, and the indexes that I have in place are being used. In addition, I consult the dynamic management views regularly to help identify improvement opportunities. Keep in mind, I am running this DB on SQL Server 2012 Express, so unfortunately I do not have access to SQL Server Profiler or the Database Engine Tuning Advisor.
January 7, 2014 at 6:02 am
So, you're not getting table scans, but the performance is still really bad? It may just be your process then.
Also, even though it's Express, you can use Extended Events or trace events, just without the GUI. And never, ever trust the Tuning Advisor; it never works well with things like spatial indexes anyway.
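As a rough sketch of what that looks like without the GUI (the session name, one-second threshold, and file path below are just examples, not recommendations):

```sql
-- Sketch: capture statements over 1 second with Extended Events, all in T-SQL.
-- Works on Express; read the .xel file later with sys.fn_xe_file_target_read_file.
CREATE EVENT SESSION [SlowSpatialQueries] ON SERVER
ADD EVENT sqlserver.sql_statement_completed (
    ACTION (sqlserver.sql_text)
    WHERE duration > 1000000  -- duration is in microseconds
)
ADD TARGET package0.event_file (SET filename = N'C:\Temp\SlowSpatialQueries.xel');

ALTER EVENT SESSION [SlowSpatialQueries] ON SERVER STATE = START;
```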
January 7, 2014 at 8:55 am
Grant Fritchey (1/7/2014)
So, you're not getting table scans, but the performance is still really bad? It may just be your process then.
Here is the base of my process. This query, although relatively fast, still takes 4 seconds to process 1,000 rows. This example uses just one distance criterion. When I run it against a scale of distances, the time adds up, not to mention that it slows as the distance criterion increases in size. I have attached a pic of the execution plan; I hope it comes through okay.
SELECT BatchID, Distance, Shape, Shape.STNumGeometries() AS Clusters
FROM (
    SELECT BatchID, 0.25 AS Distance, geography::UnionAggregate(Cluster) AS Shape
    FROM (
        SELECT BatchID, PointID, geography::ConvexHullAggregate(Location) AS Cluster
        FROM (
            SELECT A.BatchID AS BatchID,
                   A.PointID AS PointID,
                   B.LatLon  AS Location
            FROM dbo.Points AS A
            INNER JOIN dbo.Points AS B
                ON A.BatchID = B.BatchID
            WHERE A.LatLon.STDistance(B.LatLon) <= (0.25 * 1609.344 / 6378137.0)
              AND A.BatchID = 1
        ) AS Clusters
        GROUP BY BatchID, PointID
    ) AS Shapes
    GROUP BY BatchID
) AS Results;
January 7, 2014 at 9:15 am
At first sight this seems to be a good candidate for a Common Table Expression: http://msdn.microsoft.com/en-us/library/ms175972.aspx . Maybe worth a try.
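Just as a sketch (reusing the names from the query above; I haven't verified whether it changes the plan at all), the nested derived tables could be restructured as CTEs like this:

```sql
-- Sketch: the same logic as the posted query, restructured as CTEs.
WITH Pairs AS (
    -- all point pairs in batch 1 within the distance threshold
    SELECT A.BatchID, A.PointID, B.LatLon AS Location
    FROM dbo.Points AS A
    INNER JOIN dbo.Points AS B
        ON A.BatchID = B.BatchID
    WHERE A.LatLon.STDistance(B.LatLon) <= (0.25 * 1609.344 / 6378137.0)
      AND A.BatchID = 1
),
Hulls AS (
    -- one convex hull per point, over its neighbors
    SELECT BatchID, PointID, geography::ConvexHullAggregate(Location) AS Cluster
    FROM Pairs
    GROUP BY BatchID, PointID
),
Shapes AS (
    -- merge the hulls into one shape per batch
    SELECT BatchID, geography::UnionAggregate(Cluster) AS Shape
    FROM Hulls
    GROUP BY BatchID
)
SELECT BatchID, 0.25 AS Distance, Shape, Shape.STNumGeometries() AS Clusters
FROM Shapes;
```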
January 7, 2014 at 9:47 am
radek.hruby (1/7/2014)
At first sight this seems to be a good candidate for a Common Table Expression: http://msdn.microsoft.com/en-us/library/ms175972.aspx . Maybe worth a try.
I admit that CTEs are not my strong suit, but I am not sure I see where one would help with performance.
January 7, 2014 at 9:58 am
I presume that a CTE might help with tempdb utilization, since tempdb may be used to temporarily store subquery results. I'm looking at the execution plan and wondering what those exclamation marks on the index seeks mean. The cost of those seeks is pretty low, but still.
Next thing: why do you compute this value when it is a constant? (0.25 * 1609.344 / 6378137.0) = 0.00006308048886. Again, the cost seems to be zero, but why?
I'm not extremely experienced in query tuning, so please take my questions/recommendations as a sort of "where to look" rather than "do this". Would you be able to share the example database somewhere? It's an interesting area and I'd like to do some testing, if possible.
January 7, 2014 at 10:30 am
radek.hruby (1/7/2014)
I'm looking at the execution plan and wondering what those exclamation marks on the index seeks mean.
The exclamation marks are a warning that there are no statistics on the geography column (I am assuming that is normal).
Next thing: why do you compute this value when it is a constant? (0.25 * 1609.344 / 6378137.0) = 0.00006308048886. Again, the cost seems to be zero, but why?
When I converted over to using the spherical earth model, I had to change the numbers I use for distance. The 0.25 represents the distance in miles, the 1609.344 the meters in a mile, and the 6378137.0 the earth's radius in meters. I find it easier to read and reuse than calculating it out to a constant everywhere. If I want to adjust to a half mile, I only have to change the number that is familiar.
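For what it's worth, a variable would keep that readability while computing the expression once. Just a sketch; note that a variable (as opposed to a literal) can change the optimizer's cardinality estimate for the predicate, so it would be worth comparing plans:

```sql
-- Sketch: @Miles is the only number to change when adjusting the distance.
DECLARE @Miles float = 0.25;
-- miles -> meters -> radians on the unit sphere (earth radius 6378137.0 m)
DECLARE @Threshold float = @Miles * 1609.344 / 6378137.0;

SELECT A.PointID AS FromPoint, B.PointID AS ToPoint
FROM dbo.Points AS A
INNER JOIN dbo.Points AS B
    ON A.BatchID = B.BatchID
WHERE A.LatLon.STDistance(B.LatLon) <= @Threshold
  AND A.BatchID = 1;
```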
I am still learning too and I take no offense from the inquiries. I am living with the process now, although when some queries take hours to run it starts to become unbearable. I have been tinkering in my test database trying to come up with alternate solutions that yield the same results. I will have to clean it up a bit and then decide how to post it all.
Mike
January 7, 2014 at 11:00 am
I'm overseas currently. I'm going to have to get back to you on the details here.
January 7, 2014 at 11:25 am
I added a zip file to my first post that has what you need to recreate my testing DB.
Mike
January 8, 2014 at 1:49 am
Are you on the latest service pack and cumulative update? I still don't have time to really dig into this, but thinking about the missing statistics, I'm wondering if you're hitting a bug. You should have statistics. It's not right that they're not there. Do you have AUTO_CREATE_STATISTICS or AUTO_UPDATE_STATISTICS disabled?
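To check, something like this against sys.databases should show the current settings (just a sketch; run it from the database in question):

```sql
-- Sketch: are auto create/update statistics enabled for this database?
SELECT name,
       is_auto_create_stats_on,
       is_auto_update_stats_on
FROM sys.databases
WHERE name = DB_NAME();
```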
January 8, 2014 at 7:55 am
I'm wondering if it would help to let SQL Server know somehow that [Lat] and [Lon] are actually related to LatLon (geography)?
Maybe something like computed persisted columns:
...,
Lat AS ([LatLon].[Lat]) PERSISTED NOT NULL,
Lon AS ([LatLon].[Long]) PERSISTED NOT NULL
Then it would be beneficial to use them in the primary key.
I have a feeling that the query optimizer doesn't find this relation when defined as:
Lat [float] NOT NULL,
Lon [float] NOT NULL
I'm not sure though.
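Spelled out as a full table definition, the idea might look something like this (the table name, constraint name, and column set here are hypothetical, just to make the fragment above concrete):

```sql
-- Sketch: Lat/Lon persisted from the geography column, then used in the PK.
-- Persisting is required before float computed columns can appear in a key.
CREATE TABLE dbo.PointsAlt (
    BatchID int       NOT NULL,
    PointID int       NOT NULL,
    LatLon  geography NOT NULL,
    Lat AS ([LatLon].[Lat])  PERSISTED NOT NULL,
    Lon AS ([LatLon].[Long]) PERSISTED NOT NULL,
    CONSTRAINT PK_PointsAlt PRIMARY KEY (BatchID, PointID, Lat, Lon)
);
```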
January 8, 2014 at 1:55 pm
radek.hruby (1/8/2014)
I'm wondering if it would help to let SQL Server know somehow that [Lat] and [Lon] are actually related to LatLon (geography)?
Maybe something like computed persisted columns:
...,
Lat AS ([LatLon].[Lat]) PERSISTED NOT NULL,
Lon AS ([LatLon].[Long]) PERSISTED NOT NULL
Then it would be beneficial to use them in the primary key.
I have a feeling that the query optimizer doesn't find this relation when defined as:
Lat [float] NOT NULL,
Lon [float] NOT NULL
I'm not sure though.
I am not sure either. The tuning guide that I mentioned in my first post suggested that Lat & Long be included in the primary key, but it didn't say that computed versions were required. I will have to test it out and see if there is a difference.
January 8, 2014 at 2:11 pm
Grant Fritchey (1/8/2014)
Are you on the latest service pack and cumulative update? I still don't have time to really dig into this, but thinking about the missing statistics, I'm wondering if you're hitting a bug. You should have statistics. It's not right that they're not there. Do you have AUTO_CREATE_STATISTICS or AUTO_UPDATE_STATISTICS disabled?
I thought I had installed the latest version but it appears that I do not have SP1. I will have to update and see if that makes a difference.
Thanks.