February 10, 2004 at 12:09 pm
I am running the code below. The table "tblGU" has approximately 188k records in it. I have three indexes created that I feel are valid for this query. I have a single index for the site_id field, another index for the init_site_id field, and one other index for the site_id and init_site_id fields combined. This query takes approximately 3 seconds to execute, which I feel is way too long. The query ends up inserting 63k rows into @tblGU as it sits below. When viewing the execution plan, I see it is using the index that contains both fields (site_id and init_site_id). However, when I add more fields to the insert and select statement, it uses the two seperate indexes (the one solely for site_id, and the one solely for init_site_id). Any help would be appreciated. Hopefully any insight I learn here will help me modify some othe indexes and queries in our application.
Thanks, Shawn..
-- Set site_id to whatever site we wish to grab the stats for
SET @site_id = 1
DECLARE @tblGU table
(
uid int
)
INSERT INTO @tblGU (uid)
SELECT uid
FROM dbo.tblGU
WHERE site_id = @site_id
OR init_site_id = @site_id
February 10, 2004 at 1:48 pm
Because of the OR, you'll probably find a UNION faster, as it can seek (instead of scan) on each of the two indexed columns. 3 seconds for inserting 63,000 rows seems pretty fast to me, though. With the query as written, you could certainly speed things up by using covering indexes, but then you write that you "add more fields to the insert."
--Jonathan
February 10, 2004 at 1:57 pm
That is correct. I simplified the query for posting just to make it easier to read. There are about a dozen more fields that really get inserted into that tabla variable. I will try the union and see where that brings us.
February 10, 2004 at 2:01 pm
Also, the query takes 8 seconds to run when I add the rest of the fields to it. Huge difference.
The UNION did nothing at all for performance -- exactly the same. If this were the only query running for a particular web page, 8 seconds MAY be acceptable, however, it is one of many, which total take more than 30 seconds to execute, which just isn't acceptable. Seems crazy that it takes so long to run these queries with relatively minimal data compared to other companies who have tables with 10s of millions of records.
February 10, 2004 at 2:10 pm
So with a UNION it's still scanning the indexes, rather than seeking? The selectivity of your values may be so low that the optimizer won't seek. A result set of 63,000 rows shouldn't be described as "minimal" in any case; how is such a large result set of any use in a web page, anyway?
--Jonathan
February 10, 2004 at 2:14 pm
It is now doing an Index Seek, and no Bookmark lookups anymore.
The web page doesn't use all 63k in its output. What this stored procedure does insert all of the rows it needs to generate statistics into this table variable, which about 10 queries below it in the same stored procedure use to generate statistical data (aggregate funtions mostly on different sub-sets of this data). The reason this was done instead of querying the real table for all 10+ queries was because it was quicker. I did not want to create 10+ cover indexes on this table. Or is that not a bad idea?
February 10, 2004 at 2:27 pm
"An index seek"? It should be performing two index seeks or scans with the union statement, and that should be faster than using an OR, which will only scan. Your statement should be something like this:
INSERT INTO @tblGU (uid)
SELECT uid
FROM dbo.tblGU
WHERE site_id = @site_id
UNION
SELECT uid
FROM dbo.tblGU
WHERE init_site_id = @site_id
If you are aggregating against one column, try making that column the table variable's index. If not, you might be better off with an indexed temporary table rather than a table variable, as that will use statistics.
--Jonathan
February 10, 2004 at 2:48 pm
That is exactly what my query looks like, with the exception I am using "UNION ALL". In any case, it is doing 2 Index Seeks. Sorry for the misunderstanding saying it was doing only one.
I thought about switching it over to a temporary table which I can create indexes on earlier, just wasn't sure if this was a good idea, as I've heard that temporary tables aren't great to use, and was always told to use the table data type instead as it is done completely in memory. Will have to try out an indexed temporary table and see how that performs.
February 10, 2004 at 2:55 pm
Are you sure UNION isn't faster than using the OR? If the optimizer uses two index seeks, that's got to be a different execution plan from using an OR...
UNION ALL should be faster still, but, unless Init_Site_Id is never equal to Site_Id, you won't get the same result set as you would by using OR.
--Jonathan
February 10, 2004 at 2:59 pm
Even changing it to UNION, it still takes 2-3 seconds to execute with only the UID field, the same as with the OR.
February 10, 2004 at 3:12 pm
> I thought about switching it over to a temporary table which I can create indexes on earlier, just wasn't sure if
> this was a good idea, as I've heard that temporary tables aren't great to use, and was always told to use the
> table data type instead as it is done completely in memory. Will have to try out an indexed temporary table and
> see how that performs.
Table variables are not necessarily in memory only, and chances of this are slim with 63,000 rows. If you are indeed selecting from this table with multiple predicates or aggregates, a temporary table could well be faster. The INSERT may be slower (due to logging), but the subsequent SELECTS should be faster with the proper indexes.
--Jonathan
February 11, 2004 at 3:26 am
Several years ago I worked with ORACLE database, and I learned one thing: indexes are not always the best solution. Depending on the selectivity of indexed columns, there are cases when a full table scan is a better option than using the index. I think this problem is a good example, for two reasons:
1. The query returns approximately 1/3 of the rows in the tblGU table (although there are 2 separate queryes).
2. There are other columns to retrieve, and that requires additional lookup into the data pages of the table, once the index entry is located.
I tested a little bit with a table with 250K records, having 3-4 distinct values for site_id and init_site_id. The optimizer always choose the table scan access method to get the result rows, instead of using the the index. So, my oppinion is that you should:
- update statistics on the table and indexes,
- defragement table and indexes,
- consider the possibility of avoiding the use of indexes and execute the query using full table scan,
- consider different approaches for different values, depending on their distribution (frequen values should use full table scan, whilerare values should use undexes).
There is a INDEX hint, to force use of some index. Unfortunatelly, I didn't find a hint to AVOID use of an index. If MS SQL Server does not provide such a hint, then it seems that MS did not learn the lesson from ORACLE completelly: when you put a switch to force use of something, always provide some way to avoid use of that. For example, in ORACLE, if you want to avoid use of an index on some column, the column name in WHERE condition is replaced with some expression that does not change the meaning of the query, such as 0 + column for number columns or ' ' + column for character columns.
At the end, you might end up without a solution that will speed up that query. I also think 63K rows for 3 secons is a good execution time. My test resulted in almost same execution time both for index scan (forced by a hint) and full table scan. 3 seconds are 10% of those 30 seconds, so maybe you should check those statistical queryes, too.
I hope this will be at least of some help,
Regards,
Goce Smilevski.
February 11, 2004 at 7:49 am
Thanks so much for all of your replies.
I am going to try some of your suggestions, Goce, and see what happens.
February 11, 2004 at 8:06 am
SQL Server's query optimizer will of course use a table scan when selectivity warrants. If seeks can be performed on indexes, the cardinality would have to be extremely high, likely higher than with your data. Just using OR does cause a table scan in this case, of course, and you have not quantified the difference found between OR and UNION (and I do not accept that they are "exactly the same").
One can force a table scan in SQL Server with the well-documented WITH (INDEX(0)) hint. I strongly suggest you not incorporate hints like this; it will certainly cause poor performance as your search values change.
I'm still intrigued by your assumption that inserting 63,000 rows should take considerably less than 3 seconds. What is this based on?
--Jonathan
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply