July 18, 2016 at 2:21 am
Hi,
I have the following query
select dp.FarmerCropId,
am.ActivityId,
am.ActivityName,
cfm.customformid,
getutcdate() talendate,
aadt.AttributeDataTypeId,
aadt.DataTypeId,
aadt.SequenaceNo,
aadt.groupid,
datatypename,
AttributeId,
AttributeName,
fcaa.reading
from ActivityMaster am
join (select activityid,
isactive,
AttributID,
AttributeDataTypeId,
DataTypeId,
SequenaceNo,
groupid from dbo.activityattributedatatype) aadt
on aadt.activityid = am.activityid
join CustomFormMapping cfm
on cfm.activityid=am.activityid
AND 1=(Case when 1 = 0 then 1 When 1 = 1 and cfm.isactive = 1 then 1 END )
join (select FarmerCropId,varietyid from etl.plot ) dp
on cfm.CropTypeId = dp.VarietyId
left join (select FarmerCropId,
ActivityID,
FarmerCropActivityID,
CompletionDate,
LastModifiedDate,isactive from pactivity) fca
on dp.farmercropid = fca.farmercropid
and fca.activityid = am.activityid
AND 1=(Case when 1 = 0 then 1 When 1 = 1 and fca.isactive = 1 then 1 END )
join vw_datatype dt
on aadt.datatypeid = dt.datatypeid
join activityattributesmaster aam
on aadt.AttributID = aam.AttributeID
AND 1=(Case when 1 = 0 then 1 When 1 = 1 and aam.isactive = 1 then 1 END )
left join (select attributedatatypeid,farmercropactivityid from pactivityattributes) fcaa
on
fcaa.attributedatatypeid = aadt.attributedatatypeid
and fcaa.farmercropactivityid = fca.farmercropactivityid
where aadt.isactive = 1 and
am.iscustomform = 1
The last left join with farmercropactivityattributes table iss the problem. There is a composite non clustered index on attributedatatypeid,farmercropactivityid of the farmercropactivityattributes table. On viewing the execution plan it shows there is 66% cost on the same.
The table has a clustered index as well.
There is a decent performance until joined with farmercropactivityattributes table. But, once I join this table, the performance goes down. I cant use a covering index because the Reading column in farmercropactivityattributes table is a nvarchar(max) column.
the query takes 3 minutes to run.
I am testing this solo, no other users are accessing these tables.
Any help in improvising this would be of super help
thanks
Rathi
July 18, 2016 at 2:36 am
Table definitions, index definitions and execution plan please.
What exactly is AND 1=(Case when 1 = 0 then 1 When 1 = 1 and fca.isactive = 1 then 1 END )
supposed to do? It can be simplified as just AND fca.isactive = 1
, which is likely to be a lot easier for the optimiser to handle.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 18, 2016 at 2:37 am
Rathi
It may not help performance, but it would simplify your query if you removed those subqueries and joined directly to the tables, and if you get rid of the CASE expressions and change the whole line to [font="Courier New"]AND xxx.isactive = 1[/font].
If performance is indeed the same after you've done that, please post the actual execution plan and the definition of any views that are used in the query.
John
July 18, 2016 at 3:43 am
Hi,
The query is
select dp.FarmerCropId,
am.ActivityId,
am.ActivityName,
cfm.customformid,
getutcdate() talendate,
aadt.AttributeDataTypeId,
aadt.DataTypeId,
aadt.SequenaceNo,
aadt.groupid,
datatypename,
AttributeId,
AttributeName,
fcaa.reading
from dbo.ActivityMaster am
join (select activityid,
isactive,
AttributID,
AttributeDataTypeId,
DataTypeId,
SequenaceNo,
groupid from dbo.activityattributedatatype) aadt
on aadt.activityid = am.activityid
join dbo.CustomFormMapping cfm
on cfm.activityid=am.activityid
AND 1=(Case when @parm_Active = 0 then 1 When @parm_Active = 1 and cfm.isactive = 1 then 1 END )
join (select FarmerCropId,varietyid from etl.plot ) dp
on cfm.CropTypeId = dp.VarietyId
left join (select FarmerCropId,
ActivityID,
FarmerCropActivityID,
CompletionDate,
LastModifiedDate,isactive from dbo.farmercropactivity) fca
on dp.farmercropid = fca.farmercropid
and fca.activityid = am.activityid
AND 1=(Case when @parm_Active = 0 then 1 When @parm_Active = 1 and fca.isactive = 1 then 1 END )
join dbo.vw_datatype dt
on aadt.datatypeid = dt.datatypeid
join dbo.activityattributesmaster aam
on aadt.AttributID = aam.AttributeID
AND 1=(Case when @parm_Active = 0 then 1 When @parm_Active = 1 and aam.isactive = 1 then 1 END )
left join (select attributedatatypeid,farmercropactivityid,reading from dbo.farmercropactivityattributes) fcaa
on
fcaa.attributedatatypeid = aadt.attributedatatypeid
and fcaa.farmercropactivityid = fca.farmercropactivityid
where aadt.isactive = 1 and
am.iscustomform = 1
Sorry the case expressions are for this purpose
case when @parm_active = 1 and isactive = 1 then 1 when @parm_active = 0 and isactive = 0 then 1 end
So when @parm_active = 1 it picks data that has isactive = 1 and when 0 picks isactive = 0 rows.
The business requirement is to fetch isactive = 0 or isactive = 1 rows. So, I used the case statement to accomodate both of them
The subqueries like 'select attributedatatypeid,farmercropactivityid from pactivityattributes',
'select FarmerCropId,varietyid from etl.plot ) dp
on cfm.CropTypeId = dp.VarietyId
left join (select FarmerCropId,
ActivityID,
FarmerCropActivityID,
CompletionDate,
LastModifiedDate,isactive from dbo.farmercropactivity'
are used, so that the index that contains attributedatatypeid,farmercropactivityid is used and table need not be accessed
The index definitions are as follows
create nonclustered INDEX IDX_ActivityMasteriscustom on ActivityMaster(iscustomform);
primary key on activitymaster(activityid)
create nonclustered INDEX IDX_plotvariety on etl.plot(VarietyId);
create index idx_varietyfarmercropid on etl.plot(varietyid,FarmerCropId)
create clustered index idx_farmercropid on etl.plot(FarmerCropId)
create nonclustered index idx_comp activityattributedatatype(ActivityID, IsActive, AttributID, AttributeDataTypeID, DataTypeID, SequenaceNo, GroupID)
primary key on activityattributedatatype(AttributeDataTypeID)
create nonclustered index idx_activityid on activityattributedatatype(ActivityID)
create nonclustered index idx_fcaa_attrid on farmercropactivityattributes(attributedatatypeid)
create nonclustered index idx_fcaa_activityid on farmercropactivityattributes(farmercropactivityid)
create nonclustered index idx_comp_attrreading on farmercropactivityattributes(AttributeDataTypeID, FarmerCropActivityID)
primary key farmercropactivityattributes(FarmerCropActivityAttributeID)
create clustered INDEX IDX_etlplotuser on etl.plot_user(farmercropid);
The view vw_datatype is a simple select on a table that contains only 4 rows---
DataTypeID
DataTypeName
DataTypeDesc
LastModifiedDate
IsActive
The actual execution plan is attached on this reply
July 18, 2016 at 4:01 am
It does help if you post the actual query...
Add OPTION(RECOMPILE) to the end if the query and test again. The case statements as they are ensure that the optimiser cannot generate a good plan for all cases, so it will have to generate a sub-optimal plan
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 18, 2016 at 4:38 am
Would [font="Courier New"]AND xxx.isactive = @parm_active[/font] not do it?
Are you saying that if you put the table names in instead of the subqueries, you get a different, less efficient execution plan?
You've posted in the SQL Server 2005 forum. Is that the actual version you're using? If not, which version, and what compatibility level is your database?
John
July 18, 2016 at 5:26 am
Ahhh.. apologies, posted in a hurry and didn't see the forum version. I am using sql server 2012
I have posted the query in previous reply.
The removal of case statement and just putting in isactive = 1 in the where clause gives the result in less than 45 seconds.
But, if I have to accomodate both @parm_active = 1 and @parm_active = 0,how do I accomplish this??
My intention is to use a single query for both the cases. And that's why the usage of 'case'
The OPTION (RECOMPILE) is very good. It is indeed making my query run very fast
Also, I am creating the etl.plot table ,populating and creating indexes on it just before I run this query. So, would it help if I ran UPDATE STATISTICS etl.plot table???
PFA the new plan (plan without case statement) for reference. This runs in 45 seconds and the previous with case ran for 3 minutes
Thanks
Rathi
July 18, 2016 at 5:31 am
rathimittha.mb (7/18/2016)
My intention is to use a single query for both the cases. And that's why the usage of 'case'
'Generic' queries such as that are typically slow. I think at this point your choices are to split it into two queries or use the recompile hint (adds compilation overhead on each execution)
The problem with generic queries is that usually the optimiser has to generate sub-optimal query plans to accommodate the multiple alternative ways the query will run.
Also, I am creating the etl.plot table ,populating and creating indexes on it just before I run this query. So, would it help if I ran UPDATE STATISTICS etl.plot table???
Not in the slightest.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 18, 2016 at 5:45 am
The OPTION RECOMPILE with CASE still generates a sub optimal plan. So I guess I need to consider writing into 2 separate queries
Also, under what conditions would I use UPDATE STATISTICS ?? I assumed it was right to use in the current situation of mine
Also,
lets say I have a query, select c1,c2,c3 from table. And I have a index on c1,c2,c3.
So when I use this in the join , should I join as
tab join (select c1,c2,c3 from table) on .....
OR
tab join table on....
My understanding is that the former ensures usage of index and picks only the columns that is required hence reducing the work of the optimizer
Thanks
Rathi
July 18, 2016 at 5:57 am
rathimittha.mb (7/18/2016)
Also, under what conditions would I use UPDATE STATISTICS ?? I assumed it was right to use in the current situation of mine
When the statistics are out of date due to data changes, which definitely won't be the case when the indexes were created (and hence the index statistics were created) in the previous step
lets say I have a query, select c1,c2,c3 from table. And I have a index on c1,c2,c3.
So when I use this in the join , should I join as
tab join (select c1,c2,c3 from table) on .....
OR
tab join table on....
The latter.
The first is just unnecessary complexity, more typing, harder to read, no advantage
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 18, 2016 at 6:00 am
Okay.
But, many a places on the internet suggest that only the required columns from a table must be retrieved and not all of them. If the select requires only 5 columns and there are 50 columns in the table. Wouldn't it be better to just select the required 5 and then join with any other table??
Thanks
Rathi
July 18, 2016 at 6:04 am
rathimittha.mb (7/18/2016)
Wouldn't it be better to just select the required 5 and then join with any other table??
No. It's better to write your query as simply as you can.
SELECT t1.col1, t1.col2, t2.col1, t2.col2, t2.col3, t2.col4, t2.col5
FROM t1 inner join t2 on ...
or
SELECT t1.col1, t1.col2, t2.col1, t2.col2, t2.col3, t2.col4, t2.col5
FROM (SELECT col1, col2 from t1) t1
INNER JOIN (SELECT col1, col2, col3, col4, col5 FROM t2) t2 ON ...
They do exactly the same thing, in exactly the same way, and both only retrieve the required columns from the tables. Which is easier to read and understand?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 18, 2016 at 6:15 am
Okay
But some places on the internet suggest that only the required columns must be taken and then joined to other tables. Say I require 5 columns from a 50 column table, wouldn't just selecting those 5 columns be more efficient??
Thanks
Rathi
July 18, 2016 at 6:24 am
Sorry about the same question being posted twice, the page did not refresh and I accidentally posted again.
I now get it.
Thanks a lot for the great help GilaMonster 🙂
Regards
Rathi
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply