March 17, 2010 at 1:43 pm
i have a table with 170million rows (tableA),
and a second table which have about 500,000 rows(tableB).
i want to compare a certain column from tableA (type bigint)
to check if a value of it exist in tableB.
i tried to do this using inner join, making a select with IN expression, but it runs for hours.
any idea how i can improve te result?
(the column from tableA dosent have an index, and the column from tableB is defined as primary key)
Thanks
Peleg
March 17, 2010 at 2:03 pm
You should try "WHERE EXISTS" instead of "IN".
Also, INTERSECT and EXCEPT might be better if you can't use EXISTS.
March 17, 2010 at 2:48 pm
i used the Exists and it seems to help,
the problem is that i split the compare into bulk of 1 million rows each times(which take 1.18 min), but when i try to make a query
with a parameter (@minid), the time increases :
select distinct ReasonRef from Con_RecipientToMessages (nolock)
where Con_RecipientToMessages.Id>=@minid and Con_RecipientToMessages.Id<= @minid+1000000
and exists (
select ReasonRef from #tblReasonList(nolock) where Con_RecipientToMessages.ReasonRef=#tblReasonList.ReasonRef
)
March 17, 2010 at 3:04 pm
Any reason why you cannot also create an index on table a, then drop it when you are done?
March 17, 2010 at 3:40 pm
i will check it,
but what can i do with the query where i use a variable?
March 17, 2010 at 3:48 pm
If you create the index, you may not need to break it up into smaller chunks. Creating an index on the id field will still improve response time, even with the parameters. Also, you may look at using a join instead of a subquery, esp when you have the index in place.
March 18, 2010 at 4:15 am
i added the index and it improved the query time,
i have 2 questions :
1)when i added an index - can i check how long it will take or approximatly how much was done?
2)when i am using parameters in the query it still slower then working with the values themselfs - any idea how can i improve it?
thanks
peleg
March 18, 2010 at 4:30 am
peleg k (3/18/2010)
1)when i added an index - can i check how long it will take or approximatly how much was done?
How long it will take - no.
How much was done - depends what you mean. How much what?
2)when i am using parameters in the query it still slower then working with the values themselfs - any idea how can i improve it?
Yes. Add the OPTION (RECOMPILE) hint as shown:
-- Something like this index should probably exist anyway
CREATE NONCLUSTERED INDEX nc1
ON dbo.Con_RecipientToMessages (Id)
INCLUDE (ReasonRef)
WITH (
FILLFACTOR = 100,
MAXDOP = 1,
ONLINE = OFF,
SORT_IN_TEMPDB = ON
);
-- Optimal index for the temporary table
-- Add UNIQUE constraint if it is valid
CREATE NONCLUSTERED INDEX nc1
ON #tblReasonList (ReasonRef)
WITH (
FILLFACTOR = 100,
MAXDOP = 1,
ONLINE = OFF,
SORT_IN_TEMPDB = ON
);
SELECT DISTINCT
CRTM.ReasonRef
FROM dbo.Con_RecipientToMessages CRTM
WHERE CRTM.Id BETWEEN @minid AND @minid + 1000000
AND EXISTS
(
SELECT *
FROM #tblReasonList RL
WHERE RL.ReasonRef = CRTM.ReasonRef
)
OPTION (RECOMPILE);
March 18, 2010 at 4:31 am
Could you post the actual execution plans with and without use of the parameters?
Edit: May be redundant after seeing Paul's post 🙂
March 18, 2010 at 4:36 am
HowardW (3/18/2010)
Could you post the actual execution plans with and without use of the parameters?
No worries Howard, it is a good point. It is always nice to see execution plans, but I am thinking that this example is sufficiently simple that we can assume it is a parameter sniffing issue, and OPTION (RECOMPILE) seems a good and simple fix here.
@peleg: If you want to know more about parameter sniffing, see Batch Compilation, Recompilation, and Plan Caching Issues In SQL Server 2005.
March 18, 2010 at 5:17 am
i will add the execution plan later beacuse i am in the middle of the task
but just to update you : when adding the OPTION (RECOMPILE); -> it made the job run up to
4 minutes, instead of 60-120 seconds
March 18, 2010 at 5:20 am
peleg k (3/18/2010)
i will add the execution plan later beacuse i am in the middle of the taskbut just to update you : when adding the OPTION (RECOMPILE); -> it made the job run up to
4 minutes, instead of 60-120 seconds
Make sure it is the plan from an actual execution please - an estimated plan omits many of the helpful details.
Are the statistics up to date on the large table?
March 19, 2010 at 2:11 am
i attached both execution plans, he first is with variables, and the second with numbers.
i went over them and didnt see any diffrence?!?
how do i check the last statistics update?
March 19, 2010 at 2:17 am
peleg k (3/19/2010)
i attached both execution plans, he first is with variables, and the second with numbers.i went over them and didnt see any diffrence?!?
Those are estimated plans.
March 19, 2010 at 3:05 am
i hope this time it ok
Viewing 15 posts - 1 through 15 (of 20 total)
You must be logged in to reply to this topic. Login to reply