June 5, 2008 at 3:01 am
Hi,
I have a table with appr. 10 million records.(history data)
While trying to retrieve records with a max query it takes around 4 minutes (appr. 20 lakh records)
I have included indexes ; clustered index on the data but so far no partioning has been done
Please let me know Will retieving data be faster if partioning is done?
And also steps for the same ...
Thanks!
June 5, 2008 at 3:14 am
Hi
Is your query using the clustered index?
Does your query have a where clause? If it does and you partition your data according to the condition in the where clause AND your query does not need to access more than 1 partition then i think partition might be helpfull. Whew:D
Other opinions most welcome..
"Keep Trying"
June 5, 2008 at 3:27 am
Yes teh indexes are used in the where clause, a glimpse of the query. Highlighted are the indexes used:
Createddatetime uses clustered index and sid uses non clustered
SELECT * FROM Ex A WHERE
( A.CreatedDateTime in (select max(CS.CreatedDateTime) from Ex CS where CS.conId = A.ConId group by Conid))
AND ( A.SId in (Select Max(CS.SId) from EX CS where CS.ConId = A.ConId and CS.CreatedDateTime = A.CreatedDateTime group by ConId))
June 5, 2008 at 6:38 am
There's "used" and there's "used"
Do the indexes get scanned or is there a seek operation against them? If scanned, how many rows are reported as being scanned?
Also, the code snippet you showed was using IN clauses with select statements. These should be converted into JOIN statements. They will perform better.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 5, 2008 at 7:00 am
Am using 'IN' since its a self join
June 5, 2008 at 7:17 am
Not a problem. Just alias the table. Self-join or not, you're looking at the equivalent of a cursor.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
June 5, 2008 at 11:28 am
jananik (6/5/2008)
Yes teh indexes are used in the where clause, a glimpse of the query. Highlighted are the indexes used:Createddatetime uses clustered index and sid uses non clustered
SELECT * FROM Ex A WHERE
( A.CreatedDateTime in (select max(CS.CreatedDateTime) from Ex CS where CS.conId = A.ConId group by Conid))
AND ( A.SId in (Select Max(CS.SId) from EX CS where CS.ConId = A.ConId and CS.CreatedDateTime = A.CreatedDateTime group by ConId))
you may want to try:
With CTE
AS
( SELECT col1,.... -- Use column names not "*"
ROW_NUMBER() OVER ( Partition by CondId, order by createdDateTime DESC, Sid DESC) rn
)
SELECT col1,.... -- Use column names not "*"
FROM CTE
where rn = 1
* Noel
June 5, 2008 at 11:47 am
Actually, the only thing that I see wrong with your query is that you are using GROUP BY in your subSelects when you do not need to.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
June 5, 2008 at 3:34 pm
Try this, see if it doesn't speed up the query a bit:
;with
CTE_CreateTime (ConID, Created) as
(select conid, max(createddatetime)
from Ex
group by conid),
CTE_SID (ConID, Created, [SID]) as
(select ex.conid, ex.createddatetime, max([sid])
from Ex
inner join CTE_CreateTime cte1
on ex.conid = cte1.conid
and ex.createddatetime = cte1.created
group by ex.conid, ex.createddatetime)
SELECT a.*
FROM Ex A
inner join CTE_SID
on A.conid = cte_sid.conid
and a.createddt = cte_sid.created
and a.[sid] = cte_sid.[sid]
I recommend replacing the "select *" with a specific list of columns. Even if it's the whole table, it's a good idea, for various reasons.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 5, 2008 at 7:49 pm
GSquared (6/5/2008)
Try this, see if it doesn't speed up the query a bit:
;with
CTE_CreateTime (ConID, Created) as
(select conid, max(createddatetime)
from Ex
group by conid),
CTE_SID (ConID, Created, [SID]) as
(select ex.conid, ex.createddatetime, max([sid])
from Ex
inner join CTE_CreateTime cte1
on ex.conid = cte1.conid
and ex.createddatetime = cte1.created
group by ex.conid, ex.createddatetime)
SELECT a.*
FROM Ex A
inner join CTE_SID
on A.conid = cte_sid.conid
and a.createddt = cte_sid.created
and a.[sid] = cte_sid.[sid]
Actually, this gives almost identical performance and query plans as the "No Group By" version:
SELECT *
FROM Ex A
WHERE A.CreatedDateTime in (select max(CS.CreatedDateTime) from Ex CS
where CS.conId = A.ConId
--group by Conid --dont need these
)
AND A.SId in (Select Max(CS.SId) from EX CS
where CS.ConId = A.ConId
and CS.CreatedDateTime = A.CreatedDateTime
--group by ConId --dont need these
)
The "No Group By" version is about 00.01% faster according to the query plans because it eliminates an insignificant Sort step. Both are about 10% faster than the original.
To do any better, we really need to see your table DDL and your query plan.
Both are about
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
June 12, 2008 at 3:47 pm
Hi Everyone,
Just to confuse the issue, given that you are working with historical (and therefore I assume relatively static) data, it sounds like you may have a good case for a Data Warehouse solution and might wish to concider the use of Analysis Services as a more robust approach to working with your data.
Cheers,
John
June 12, 2008 at 7:21 pm
jananik (6/5/2008)
While trying to retrieve records with a max query it takes around 4 minutes (appr. 20 lakh records)
That's 2,000,000 rows that you're returning to the screen... don't measure how long that takes... measure how long it takes to insert into a table with no indexes using SELECT/INTO... see how long that takes... better measure would just be to select into variables that throw the data away... that would be a closer measure of how the server is going to handle things
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply