December 8, 2008 at 1:32 am
Hi,
i have query as shown below
SELECT
'D_Account' AS SourceName
, 'Dimension' AS SourceType
, Fact.Account_MemberId_Label AS MissingLabel
, Fact.Datasource_MemberId_Label AS DataSource
, Fact.Scenario_MemberId_Label AS Scenario
, Sum(Fact.[Value]) AS [Value]
FROM dbo.COI_OSB_WINDOWS_FactData Fact
WHERE NOT EXISTS(SELECT D.Label FROM dbo.D_Account D
WHERE Fact.Account_MemberId_Label = D.Label)
AND Fact.Account_MemberId_Label != 'None'
GROUP BY Fact.Account_MemberId_Label, Fact.Datasource_MemberId_Label, Fact.Scenario_MemberId_Label
UNION
SELECT
'D_Allocation' AS SourceName
, 'Dimension' AS SourceType
, Fact.Allocation_MemberId_Label AS MissingLabel
, Fact.Datasource_MemberId_Label AS DataSource
, Fact.Scenario_MemberId_Label AS Scenario
, Sum(Fact.[Value]) AS [Value]
FROM dbo.COI_OSB_WINDOWS_FactData Fact
WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D
WHERE Fact.Allocation_MemberId_Label = D.Label)
AND Fact.Allocation_MemberId_Label != 'None'
GROUP BY Fact.Allocation_MemberId_Label, Fact.Datasource_MemberId_Label, Fact.Scenario_MemberId_Label
this union is jus for 2 tables, but i run a query where there are around 24 such tables
i just want to optimize this query as it take long time...
kindly help me out in optimizing it.
Regards
Imtiaz
December 8, 2008 at 1:39 am
Maybe I'm missing something, but the two queries unioned look the same.
Why do you need to union 24 results sets? There may be a better way.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 8, 2008 at 1:40 am
UNION ALL is faster than UNION. So, if u allow any duplicate data, then use UNION ALL.
Also u may try to tune each query individually, like
use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...
instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..
Also grouping at the last of UNIONing 24 queries, not each individual queries.
December 8, 2008 at 2:10 am
Hi Gail,
The inner query contains different tables
December 8, 2008 at 2:12 am
Hi Arup,
wel can we do something instead of NOT EXIST
Thanks for the grouping trick
December 8, 2008 at 2:30 am
EACH GROUPING HAS DIFFERENT COLUMN IN IT
December 8, 2008 at 2:37 am
mohd.imtiaz (12/8/2008)
EACH GROUPING HAS DIFFERENT COLUMN IN IT
No need to shout.
Since there's no way that the resultsets can overlap (there's a literal string that's different in each one), use UNION ALL instead of union. Union means concatenate, sort resulting rows and eliminate duplicates. Union all just concatenates rows.
It may still be slow though. What indexes do you have on the fact tables?
Can you post the execution plan of one of the queries within the union? (saved as a .sqlplan file, zipped and attached)
Also u may try to tune each query individually, like
use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...
instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..
There's no difference between those two. Exists does not do any data retrieval and ignores columns specified.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 8, 2008 at 3:06 am
Why dont you make use of a variable/temporary table to put all your data into and then retrieve the data using select *......
I assure you its more easy and better in performance compared to all that union your making use of...:)
December 8, 2008 at 4:08 am
Please don't type in all caps. It's the online equivalent of shouting at someone.
From what I can see from the exec plan, you have no indexes at all on a table with 6 million rows. That's going to be slow no matter what. You needs some effective indexes if this is going to run any faster than it is now.
Can you please post the table definition (create statement) and the definitions of the indexes on that table?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 8, 2008 at 6:40 am
You've got a scan going against 6 million rows and returning nothing. That's a problem.
I'd suggest checking the indexes on COI_AppDb. Also, the <> construct can cause problems with index usage.
That's before you get to the UNION. Gail already made the best suggestion of all there. Since each query is already returning a unique result set, use UNION ALL.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 10, 2008 at 12:08 am
mohd.imtiaz (12/9/2008)
Hi Gail,here is the table defination
Index definitions?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 10, 2008 at 12:21 am
HI Gail,
Want 2 know something.
"Also u may try to tune each query individually, like
use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...
instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..
There's no difference between those two. Exists does not do any data retrieval and ignores columns specified."
I have a notion than "SELECT D.Label" will take more system resources than "SELECT 1". And since EXISTS works on boolean value, hence no difference between the two but the performance is better for "SELECT 1"
I have actually tested it and SELECT 1 gives lesser duration.
December 10, 2008 at 12:25 am
HI Imtiaz,
EXISTS is the fastest operator to check existence. It works on boolean value and whenever finds any TRUE it terminates. Also, it is not taking the NULL values.
December 10, 2008 at 12:51 am
arup_kc (12/10/2008)
I have a notion than "SELECT D.Label" will take more system resources than "SELECT 1". And since EXISTS works on boolean value, hence no difference between the two but the performance is better for "SELECT 1"
Exists ignored columns passed to it. There's no difference in the way it executes if a column is passed, or if a constant is passed. During the parsing phase, all columns are stripped out of an exists so by the time it gets to the optimiser, there's no columns listed.
I have actually tested it and SELECT 1 gives lesser duration.
Post your test?
In the meantime, here's mine. I ran each one several times, the cpu times of all three vary from around 290 ms to about 422 ms and elapsed time from 1400 to 1600 ms.
Setup first
Create table #TestingExists1 (
id int identity primary key,
somestring char(2)
)
insert into #TestingExists1 (somestring)
select top 1000000 CHAR(97+ FLOOR(RAND(a.number * 5412)*26)) + CHAR(97+ FLOOR(RAND(b.number * 3462)*26))
from master..spt_values a cross join master..spt_values b
where a.name is null and b.name is null
select COUNT(distinct somestring) from #TestingExists1 -- 676 distinct values
select distinct top 100 somestring
into #testingExists2
from #TestingExists1
go
And testing. Done on SQL 2008 Developer edition RTM x64
set statistics io on
go
set statistics time on
go
-- Exists with a column
select * from #TestingExists1 where exists (select somestring from #testingExists2 where #testingExists2.somestring = #TestingExists1.somestring) option (maxdop 1)
/*
Table '#testingExists2'. Scan count 1, logical reads 2
Table '#TestingExists1'. Scan count 3, logical reads 1860
Table 'Worktable'. Scan count 0, logical reads 0
SQL Server Execution Times:
CPU time = 328 ms, elapsed time = 1437 ms.
*/
-- exists with *
select * from #TestingExists1 where exists (select * from #testingExists2 where #testingExists2.somestring = #TestingExists1.somestring) option (maxdop 1)
/*
Table 'Worktable'. Scan count 0, logical reads 0
Table '#TestingExists1'. Scan count 1, logical reads 1860, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#testingExists2'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 359 ms, elapsed time = 1426 ms.
*/
-- exists with 1
select * from #TestingExists1 where exists (select 1 from #testingExists2 where #testingExists2.somestring = #TestingExists1.somestring) option (maxdop 1)
/*
Table 'Worktable'. Scan count 0, logical reads 0
Table '#TestingExists1'. Scan count 1, logical reads 1860, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#testingExists2'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 344 ms, elapsed time = 1433 ms.
*/
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 10, 2008 at 12:56 am
arup_kc (12/10/2008)
HI Gail,Want 2 know something.
"Also u may try to tune each query individually, like
use WHERE NOT EXISTS(SELECT 1 FROM D_Allocation D...
instead of WHERE NOT EXISTS(SELECT D.Label FROM D_Allocation D..
There's no difference between those two. Exists does not do any data retrieval and ignores columns specified."
I have a notion than "SELECT D.Label" will take more system resources than "SELECT 1". And since EXISTS works on boolean value, hence no difference between the two but the performance is better for "SELECT 1"
I have actually tested it and SELECT 1 gives lesser duration.
If you check the query plan for all versions (using a constant such as 1, using a star (*) or using a column name), you'll find out that it has the exact same query plan. Since the query is being performed the exact same way each time, and each time it will pass to the client the same amount of data, I don't see how come one version is faster then the other. Can you explain how did you test it?
Adi
--------------------------------------------------------------
To know how to ask questions and increase the chances of getting asnwers:
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply