Can I move sorting and avoid Order By with Clustered Index on an empty table?

  • hi list;

    1) Is it cheaper to impose order on a result if a clustered index is created on an empty table instead of ordering the result at the end of a select statement?

    2) does a clustered index on an empty table move the ordering work to inserting instead of retrieval?

    3) How would i test each assertion to see for myself?

    e.g. for lets say 1,000,000 rows, is

    Create table #t1(c1 char(5), c2 datetime, c3, datetime)

    create index idx_1 on #t1(c1, c2, c3)

    insert #t1...1,000,000 rows

    faster than

    insert #t1...1,000,000 rows

    select * from #t1

    order by c1, c2, c3

    i have a stored procedure that uses a huge unindexed table to conform a data set. Then it runs a bunch of select statements on segments of that table, all with order by, and i can see by the execution plan that the order bys are expensive, so i thought i could take the ordering hit up front while filling the conforming table, then do the selects without the order by, hoping the clustered index implicity returns the result in order?

    good? no good?

    the whole thing is dog slow, so this was my first attempt at speeding it up

    thanks very much

    drew

  • You can't depend on a clustered index to return data in a specific order without the order by. With the order by and the index, SQL will be able to avoid the sorts in some cases.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • drew.georgopulos (8/10/2011)


    hi list;

    Um, hi pen?

    1) Is it cheaper to impose order on a result if a clustered index is created on an empty table instead of ordering the result at the end of a select statement?

    Same cost if it's done each time. You can get some gains by having the clustered in place and then forcing the same order if the optimizer likes the idea, but without the order by you can never guarantee that order in the result set.

    2) does a clustered index on an empty table move the ordering work to inserting instead of retrieval?

    Nope, now it's done on both sides, once on inbound and once on outbound, if you want to guarantee the order on outbound. However, as mentioned, the optimizer may attempt to make use of the existing index if it thinks it will cost less then a sort at the tail of the process.

    3) How would i test each assertion to see for myself?

    A lot of benchmarking code to try all the different ways.

    e.g. for lets say 1,000,000 rows, is

    Create table #t1(c1 char(5), c2 datetime, c3, datetime)

    create index idx_1 on #t1(c1, c2, c3)

    insert #t1...1,000,000 rows

    faster than

    insert #t1...1,000,000 rows

    select * from #t1

    order by c1, c2, c3

    If the entire purpose of the temp table is merely to allow for the order by, this is not going to speed things up except under extreme circumstances which I'm currently not sure how to recreate. You've decided to write to another database before manipulating the data, instead of working with the existing dataset already prepared in memory. If the #temp is built for other reasons, yes, a clustered index might help on the final delivery, but it will be more expensive going into the table unless you're careful about the order it goes into it in.

    i have a stored procedure that uses a huge unindexed table to conform a data set. Then it runs a bunch of select statements on segments of that table, all with order by, and i can see by the execution plan that the order bys are expensive, so i thought i could take the ordering hit up front while filling the conforming table, then do the selects without the order by, hoping the clustered index implicity returns the result in order?

    Why are the selects ran with order by? Does this proc deliver multiple resultsets to a front end? That's really the only place that can handle multiple outbound rowsets. However, if the order by's are needed, you won't be able to avoid them, but because they're used multiple times, you may get some benefit from the table being pre-sorted to your preference. The best I can offer here is TEST. It will depend on the optimizer's decisions if it wants to use your clustered on the #tmp.

    good? no good?

    the whole thing is dog slow, so this was my first attempt at speeding it up

    thanks very much

    drew

    If you'd like further optimization help, may I recommend you take a look at the second link down on the left side of my signature, it will walk you through what we'd need to assist you in optimizing the procedure.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • thanks very much

    i was trying to avoid the overhead i thought i saw in the execution plan, but something you said piqued my curiosity:

    could it be possible to coerce the result into the clustered index's order by using an index hint instead of the order by?

    i will have to see if that creates a different plan

  • Thanks very much, i will take a look at the resource you mention.

    Performance is not acceptable.

    I appreciate your help and will be back.

    drew

  • Please post the actua execution plan(s) and we'll have a look.

  • there's a lot of funk in the sprocs, so i uploaded them too.

    im sure the examples' submission could be improved, but i hope they are adequate to illustrate the issue.

    thanks very much for asking.

    drew

  • drew.georgopulos (8/10/2011)


    there's a lot of funk in the sprocs, so i uploaded them too.

    im sure the examples' submission could be improved, but i hope they are adequate to illustrate the issue.

    thanks very much for asking.

    drew

    Have you tried running all of that in the txt file you uploaded? I just took a look and I'm not even going to touch it. It looks like a jumble of text with no real resemblence of code, even though it looks like there may be code in there.

    If you are going to post code it really needs to formatted so as to be readable by more than a machine. I'd like to be able to review the code before I try running it.

  • forgive me, i apologize.

    the attachment is cleaned up and will run in tempdb

    i still have to generate some tables and sample data.

    thank you for your time

    drew

  • drew.georgopulos (8/10/2011)


    thanks very much

    i was trying to avoid the overhead i thought i saw in the execution plan, but something you said piqued my curiosity:

    could it be possible to coerce the result into the clustered index's order by using an index hint instead of the order by?

    i will have to see if that creates a different plan

    No order by, no guarantee of order. Bottom line. End of story.

    If the optimiser sees an opportunity to avoid the sort by relying on the index order, it will. You still have to specify the order by or the order you get back may change due to data changes and plan change

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • drew.georgopulos (8/10/2011)


    ...

    the whole thing is dog slow...drew

    The whole thing is dog slow because it's coded naively. Here's a suggestion for you. Post each stored procedure separately. This will encourage folks to load the tables and experiment with your code. I've had a quick play with the script you posted earlier and it comes out at something like 1000 lines already - I'm guessing it's around 3-4000. It's hard to see where statements begin and end. Much easier with separate sprocs / files.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Lynn Pettis (8/10/2011)


    drew.georgopulos (8/10/2011)


    there's a lot of funk in the sprocs, so i uploaded them too.

    im sure the examples' submission could be improved, but i hope they are adequate to illustrate the issue.

    thanks very much for asking.

    drew

    Have you tried running all of that in the txt file you uploaded? I just took a look and I'm not even going to touch it. It looks like a jumble of text with no real resemblence of code, even though it looks like there may be code in there.

    If you are going to post code it really needs to formatted so as to be readable by more than a machine. I'd like to be able to review the code before I try running it.

    Guys you must SAVE the text files locally. Otherwise your browser will render as html and destroy the formatting.

    It's actually better formatted than most queries we get here.

    Nice work!

  • I loaded up the plan in sql sentry plan explorer (free) to figure out the biggest queries and it doesn't look real easy (99%+ of each query is wasted on the insert at the end of the plan). So unless you have 1000 index on that table I don't see a 2 minute fix in here.

    Top 3 queries in the order of the picture :

    select 'PR' as claim_type

    ,' ' as TOS1

    ,cast(ph.CASUBNO+ph.capersno as varchar(20)) as member_id

    ,aasex

    ,ds_age_band

    ,ph.CAPRIMDATE as fdos

    ,ph.CATHRUDATE as ldos

    ,cast(ph.CACLAIM as varchar(20)) as claim_id

    ,pd.CBLINE as claim_line

    ,ph.CAPROVIDER as provider_id

    ,ph.CAPLACE as pos

    ,cast(pd.CBNET/100 as money) as paid_amt

    ,cast(pd.cbbilled/100 as money) as billed_amt

    ,cast(pd.cballowed/100 as money) as allowed_amt

    ,cast(pd.cbcopay/100 as money) as copay_amt

    ,cast(replace(ph.CADX1,'.','') as varchar(5)) as dx1

    ,cast(replace(ph.CADX2,'.','') as varchar(5)) as dx2

    ,cast(replace(ph.CADX3,'.','') as varchar(5)) as dx3

    ,cast(replace(ph.CADX4,'.','') as varchar(5)) as dx4

    ,cast(replace(ph.CADX5,'.','') as varchar(5)) as dx5

    ,cast(replace(ph.CADX6,'.','') as varchar(5)) as dx6

    ,cast(replace(ph.CADX7,'.','') as varchar(5)) as dx7

    ,cast(replace(ph.CADX8,'.','') as varchar(5)) as dx8

    ,null as dx9

    ,pd.CBPROCCODE as proc1

    ,null as proc2

    ,null as proc3

    ,null as proc4

    ,null as proc5

    ,null as proc6

    ,null as RevCode1

    ,null as RevCode2

    ,null as RevCode3

    ,null as RevCode4

    ,null as RevCode5

    ,null as RevCode6

    ,null as RevCode7

    ,null as RevCode8

    ,null as RevCode9

    ,null as RevCode10

    ,null as billtype

    ,null as drg

    ,null as mdc

    ,null as ndc

    ,null as dayssupply

    ,null as tradename

    ,null as dx1_ccs

    ,pd.CBCLAIMSTAT as claim_status

    from dbo.JUTILHM0_DAT ph

    inner join

    dbo.JUTILDM0_DAT pd

    on ph.CACLAIM = pd.CBCLAIM

    --add the ds_age_band grouper based on age in months at the time of the claim; DATEDIFF(mm,aadob, caprimdate)

    inner join JMEMBRM0_DAT b on ph.cASUBNO=b.AASUBNO and ph.cAPERSNO=b.AAPERSNO

    inner join HH_RDSM.dbo.tblage_dim c on total_months = DATEDIFF(mm,aadob, CAPRIMDATE )

    left join HH_RDSM.dbo.RevCode r

    on r.ClaimNo = ph.CACLAIM

    where cast(left(ph.CAPRIMDATE,6) as int) between @start_dt and @end_dt

    insert into encounters_conformed

    select 'RX' as claim_type

    ,' ' as TOS1

    ,cast(r.memberid as varchar(20)) as member_id

    ,aasex

    ,ds_age_band

    ,convert(varchar(8),r.filled,112) as fdos

    ,null as ldos

    ,cast(r.claimno as varchar(20)) as claim_id

    ,' 001' as claim_line

    ,null as provider_id

    ,null as pos

    ,cast(r.AmtPaid/100 as money) as paid_amt

    ,cast(r.[total claim charge]/100 as money) as billed_amt

    ,null as allowed_amt

    ,cast(r.[co-payment amount]/100 as money) as copay_amt

    ,null as dx1

    ,null as dx2

    ,null as dx3

    ,null as dx4

    ,null as dx5

    ,null as dx6

    ,null as dx7

    ,null as dx8

    ,null as dx9

    ,null as proc1

    ,null as proc2

    ,null as proc3

    ,null as proc4

    ,null as proc5

    ,null as proc6

    ,null as RevCode1

    ,null as RevCode2

    ,null as RevCode3

    ,null as RevCode4

    ,null as RevCode5

    ,null as RevCode6

    ,null as RevCode7

    ,null as RevCode8

    ,null as RevCode9

    ,null as RevCode10

    ,null as billtype

    ,null as drg

    ,null as mdc

    ,ndc

    ,dayssupply

    ,drug as tradename

    ,null as dx1_ccs

    ,null as claim_status

    from dbo.pharmacyclaims r

    --add the ds_age_band grouper based on age in months at the time of the claim; DATEDIFF(mm,aadob, caprimdate)

    inner join JMEMBRM0_DAT b on r.SUBNO=b.AASUBNO and r.PERSNO=b.AAPERSNO

    inner join HH_RDSM.dbo.tblage_dim c on total_months = DATEDIFF(mm,aadob, filled)

    where cast(left(convert(varchar(8),r.filled,112),6) as int) between @start_dt and @end_dt

    insert into encounters_conformed

    select 'BH' as claim_type

    ,' ' as TOS1

    ,cast(b.memberid as varchar(20)) as member_id

    ,aasex

    ,ds_age_band

    ,convert(varchar(8),b.fromdate,112) as fdos

    ,convert(varchar(8),b.todate,112) as ldos

    ,cast(b.claimid as varchar(20)) as claim_id

    ,b.cllineid as claim_line

    ,b.providerid as provider_id

    ,b.pos

    ,cast(p.pmtamt as money) as paid_amt

    ,null as billed_amt

    ,null as allowed_amt

    ,null as copay_amt

    ,cast(replace(b.diag1,'.','') as varchar(5)) as dx1

    ,cast(replace(b.diag2,'.','') as varchar(5)) as dx2

    ,cast(replace(b.diag3,'.','') as varchar(5)) as dx3

    ,null as dx4

    ,null as dx5

    ,null as dx6

    ,null as dx7

    ,null as dx8

    ,null as dx9

    ,b.proc_code as proc1

    ,null as proc2

    ,null as proc3

    ,null as proc4

    ,null as proc5

    ,null as proc6

    ,null as RevCode1

    ,null as RevCode2

    ,null as RevCode3

    ,null as RevCode4

    ,null as RevCode5

    ,null as RevCode6

    ,null as RevCode7

    ,null as RevCode8

    ,null as RevCode9

    ,null as RevCode10

    ,null as billtype

    ,null as drg

    ,null as mdc

    ,null as ndc

    ,null as dayssupply

    ,null as tradename

    ,null as dx1_ccs

    ,null as claim_status

    from dbo.beacon_claims b

    inner join

    dbo.beacon_pmt p

    on b.cllineid = p.cllineid

    inner join JMEMBRM0_DAT c on b.memberid=cast(c.AASUBNO+c.aapersno as char(9))

    inner join HH_RDSM.dbo.tblage_dim d on total_months = DATEDIFF(mm,aadob, CONVERT(varchar(8),fromdate,112))

    where cast(left(convert(varchar(8),b.fromdate,112),6) as int) between @start_dt and @end_dt

  • Doubt it's the insert portions of the queries. Probably mostly caused by these:

    where cast(left(ph.CAPRIMDATE,6) as int) between @start_dt and @end_dt

    where cast(left(convert(varchar(8),r.filled,112),6) as int) between @start_dt and @end_dt

    where cast(left(convert(varchar(8),b.fromdate,112),6) as int) between @start_dt and @end_dt

    Those are all non-SARGable, so index/table scans. Since there are several million rows involved, that's going to be expensive.

    What are the values for @start_dt and @end_dt and what's the rule for how the date filters should work?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • No argument there Gail, but I'm just calling it as I see it. The filter has some power (33% of the table left) but I've never seen the insert part of the plan go from 99% to 10% or even 50%.

    I'm also worried by the apparent complete lack of custered index (and possibly index at all).

    Out for a meeting.

Viewing 15 posts - 1 through 15 (of 37 total)

You must be logged in to reply to this topic. Login to reply