March 4, 2013 at 4:53 pm
I have a table with 6 million rows. The table has around 200 columns. The data can be sliced and diced by the user in a number of ways.
Are there any tips for improving query performance other than adding indexes?
Currently some of the queries take around 40 seconds.
thanks
KK
March 4, 2013 at 8:20 pm
Here are a few suggestions.
1. Make sure the queries are actually SARGable where an INDEX SEEK followed by an internal scan can be accomplished.
2. Possibly, normalize the table into smaller tables. Of course, this is highly data dependent. For example, call detail rows will frequently have 200 different pieces of information that are just about impossible to normalize.
3. Use indexed views. Of course, this can take a fair bit of space but allows you to virtually have more than 1 clustered index for some breakneck speed.
There are other suggestions such as doing pre-aggregation if a pivot or crosstab is involved but those are the big 3 that quickly came to mind.
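To illustrate point 1, here is a minimal sketch (with hypothetical table and column names): wrapping a column in a function hides it from any index, forcing a scan, while an equivalent range predicate on the bare column allows an INDEX SEEK.

```sql
-- Non-SARGable: the function on the column defeats the index, forcing a scan
SELECT COUNT(*)
FROM dbo.SomeTable
WHERE YEAR(SomeDate) = 2012;

-- SARGable: the bare column with a half-open date range can use an INDEX SEEK
SELECT COUNT(*)
FROM dbo.SomeTable
WHERE SomeDate >= '20120101'
  AND SomeDate <  '20130101';
```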
--Jeff Moden
Change is inevitable... Change for the better is not.
March 5, 2013 at 11:52 am
I have the following questions:
1. What does SARGable mean?
2. Wouldn't normalization decrease performance, since it would involve joining of multiple tables?
3. Would partitioning the table help?
Thanks
March 5, 2013 at 12:40 pm
kk1173 (3/5/2013)
I have the following questions:
1. What does SARGable mean?
2. Wouldn't normalization decrease performance, since it would involve joining of multiple tables?
3. Would partitioning the table help?
Thanks
1. Pretty much what I said. It means the criteria were written in such a manner as to allow INDEX SEEKs to happen if the correct index is available. This generally means that you have no columns within a formula (especially, but certainly not limited to, aggregates), and also that the criteria are tight enough to return only what is necessary. That last part helps defeat accidental many-to-many joins.
2. Correctly done, normalization can actually help improve performance because it makes the rows in the tables much more narrow. Skinny rows mean more rows per page. More rows per page means fewer reads (logical or otherwise). Fewer reads frequently results in faster speeds even in the presence of joins. Notice I never used the word "always" anywhere in that. Like anything else, "It Depends" on proper design and indexing.
3. No. Partitioning doesn't necessarily help performance, and usually doesn't help at all, especially when otherwise performance-challenged code is present. Correct partitioning's biggest advantage is that you don't have to rebuild huge indexes on single partitioned tables during maintenance periods. There are some huge disadvantages to partitioning if you're using things like an IDENTITY column as the PK on the table. Partitioning also makes it easier to add new data and archive old data. Even then, the best reason to partition is still to reduce index maintenance times.
--Jeff Moden
March 5, 2013 at 12:42 pm
kk1173 (3/5/2013)
I have the following questions:
1. What does SARGable mean?
SARGable means Search Argument-able. http://en.wikipedia.org/wiki/Sargable
2. Wouldn't normalization decrease performance, since it would involve joining of multiple tables?
That is why he said possibly.
3. Would partitioning the table help?
Possibly.
The real challenge here is that your question is sufficiently vague that nobody can really tell you how to improve performance. We can't see your screen, and we aren't familiar with your data structures or data. In my experience, when a table has 200 columns there are some structural challenges that need to be dealt with. It also sounds from your description like users are doing a "catch-all" search through this table. Without knowing the structures and what you are trying to do, the best anybody can do is take a shot in the dark.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Moden's splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
March 5, 2013 at 12:59 pm
These are some really good suggestions. I will try these and let you know if I see any performance gains or have any other questions.
March 5, 2013 at 1:21 pm
To give you an idea of the type of queries being run:
SELECT minormkt
     , SUM(admits)           AS 'Total Admits'
     , SUM([days])           AS 'Total Days'
     , SUM(OppAdm)           AS 'Opp Adm'
     , SUM(OppTotDays)       AS 'Opp Tot Days'
     , SUM(paid_amt)         AS 'Paid Amt'
     , SUM(bill_amt)         AS 'Bill Amt'
     , SUM(ReadmitOpportNet) AS 'Readmit Oppor Net'
     , SUM(readmit_expected) AS 'Readmit Expected'
     , SUM(Readmit_Factor)   AS 'Readmit Factor'
     , SUM(ER)               AS 'Total ER'
     , ISNULL(SUM(ER) / NULLIF(SUM(Admits), 0), 0)                 AS '% ER'
     , ISNULL(SUM([Days]) / NULLIF(SUM(Admits), 0), 0)             AS 'LOS'
     , ISNULL(SUM(ExpDays) / NULLIF(SUM(Admits), 0), 0)            AS 'Expected LOS'
     , ISNULL(SUM(OppTotDays) / NULLIF(SUM(OppAdm), 0), 0)         AS 'Opport Admit LOS'
     , ISNULL(SUM(OppExpDays) / NULLIF(SUM(OppAdm), 0), 0)         AS 'Opport Admit Expected LOS'
     , ISNULL(SUM(Readmit_Los) / NULLIF(SUM(Readmit30), 0), 0)     AS 'ReadmitLOS'
     , ISNULL(SUM(Readmit_Allowed) / NULLIF(SUM(Readmit30), 0), 0) AS 'ReadmitAllowedPerCase'
     , SUM(ReadmitOpportNet) * ISNULL(SUM(Readmit_Allowed) / NULLIF(SUM(Readmit30), 0), 0) AS 'ReadmitOpportNetSavings'
     , ISNULL(SUM(OppDaysPos) / NULLIF(SUM(Admits), 0), 0)         AS 'Opport Days/Admit'
FROM vw_FactData a
WHERE 1 = 1
  AND region IN (SELECT Param FROM dbo.uf_AppendString('WEST', 'CENTRAL'))
  AND [plan] IN (SELECT Param FROM dbo.uf_AppendString('MCR', 'COM'))
  AND YEAR(adm_dt) IN (SELECT Param FROM dbo.uf_AppendString('2011', '2012'))
GROUP BY minormkt
ORDER BY a.minormkt
vw_FactData is a standard view that joins the base table (6 million rows) with some lookup tables.
I will remove the function in the WHERE clause to make it SARGable.
Any other suggestions?
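For the year filter specifically, one SARGable rewrite (a sketch only, assuming the years ultimately come from the same parameters) is to turn the YEAR(adm_dt) test into a date range on the raw column, which an index on adm_dt can seek into:

```sql
-- Instead of: year(adm_dt) IN (SELECT Param FROM dbo.uf_AppendString('2011','2012'))
-- a contiguous pair of years collapses to one half-open range on the bare column:
AND adm_dt >= '20110101'
AND adm_dt <  '20130101'
```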
March 5, 2013 at 1:25 pm
I assume these queries are being generated "on the fly".
In addition, the scalar function YEAR(adm_dt) is also going to render that predicate non-SARGable.
It is impossible to offer much real advice here because we still can't see what is going on. What are the definitions of the base tables and the view? What are the indexes on the base tables? Is this an indexed view? Etc.
March 5, 2013 at 1:31 pm
Yes, there is a stored procedure that builds dynamic SQL. The query above is just a PRINT of the SQL that gets executed.
If you look at the query, there are SUM functions on various fields and also formulas like ISNULL(sum([Days])/NULLIF(sum(Admits),0),0).
I know it's very hard to tell without looking at the data structure, but is there a better way of doing this aggregation rather than 'on the fly'?
Thanks
KK
March 5, 2013 at 1:37 pm
kk1173 (3/5/2013)
Yes, there is a stored procedure that build dynamic sql. This query above is just a PRINT of the sql that gets executed. If you look at the query, there are SUM functions on various fields and also formulas like ISNULL(sum([Days])/NULLIF(sum(Admits),0),0).
I know its very hard to tell without looking at the data structure, but is there a better way of doing this aggregation rather than 'on the fly'?
Thanks
KK
Maybe.
Nobody can offer suggestions for improvement without first knowing the scope of the issue. It is not that it is difficult; it is impossible.
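On the pre-aggregation idea: if the grouping columns are known in advance, an indexed view can materialize the sums so reports read a small aggregate instead of 6 million rows. The following is a minimal sketch only, not a drop-in solution: the base-table name dbo.tbl_fact is an assumption, only a few of the SUMmed columns are shown, a year bucket would also be needed to support the date filter, and an indexed view must be schema-bound, must use COUNT_BIG(*) with GROUP BY, and cannot contain the OUTER JOINs that vw_FactData uses.

```sql
CREATE VIEW dbo.vw_FactPreAgg
WITH SCHEMABINDING                              -- required for an indexed view
AS
SELECT minormkt
     , [plan]
     , SUM(ISNULL(admits, 0))   AS TotalAdmits  -- SUM must be over a non-nullable expression
     , SUM(ISNULL([days], 0))   AS TotalDays
     , SUM(ISNULL(paid_amt, 0)) AS PaidAmt
     , COUNT_BIG(*)             AS RowCnt       -- required when the view has GROUP BY
FROM dbo.tbl_fact
GROUP BY minormkt, [plan];
GO
-- The unique clustered index is what actually materializes the aggregate
CREATE UNIQUE CLUSTERED INDEX IX_vw_FactPreAgg
ON dbo.vw_FactPreAgg (minormkt, [plan]);
```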
March 6, 2013 at 7:36 am
kk1173 (3/5/2013)
To give you an idea of the type of queries being run: [query snipped]
The FROM vw_FactData is a standard view that further joins the base table (6 mill rows) with some look up tables.
I will remove the function in the WHERE clause to make it SARGable.
Any other suggestions?
Hold on a minute. The function may or may not be a performance problem here. The same goes for the view; it may actually be the only problem here. As suggested though, it's impossible to make a recommendation without more info.
Post the code for the view and the function. Also, post the information requested in the article at the second link in my signature line below.
--Jeff Moden
March 6, 2013 at 8:15 am
Jeff Moden (3/6/2013)
kk1173 (3/5/2013) [quoted query snipped]
Hold on a minute. The function may or may not be a performance problem here. Same goes with the view. It may actually be the only problem here. As suggested though, it's impossible to make a recommendation without more info.
Post the code for the view and the function. Also, post the information requested in the article at the second link in my signature line below.
The UDFs are a problem if for no other reason than that the optimizer cannot know what is coming out of them; thus it will get inaccurate estimates on the IN clauses and possibly a disastrously bad query plan.
First thing I would do is either hard-code the proper values in the query without function calls or shred UDF outputs into temp tables (NOT table variables) and reference them in the main query.
I agree that the view needs to be evaluated too.
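The temp-table shredding suggested above might look like this sketch (uf_AppendString and the column names are taken from the posted query; unlike a table variable, the temp tables give the optimizer real statistics to estimate row counts from):

```sql
-- Shred the UDF outputs into temp tables once, up front
SELECT Param INTO #Regions FROM dbo.uf_AppendString('WEST', 'CENTRAL');
SELECT Param INTO #Plans   FROM dbo.uf_AppendString('MCR', 'COM');

-- Reference the temp tables in the main query instead of the UDF calls
SELECT minormkt
     , SUM(admits) AS 'Total Admits'   -- remaining aggregate columns omitted for brevity
FROM vw_FactData a
WHERE region IN (SELECT Param FROM #Regions)
  AND [plan] IN (SELECT Param FROM #Plans)
GROUP BY minormkt
ORDER BY a.minormkt;

DROP TABLE #Regions, #Plans;
```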
OP: consider getting a professional tuner on board to assist. I have seen people hunt and peck on forums for days (or even weeks) trying to get a good query when a pro could fix it up in a matter of minutes or hours.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
March 6, 2013 at 11:35 am
I am sorry for not posting the performance problem in the recommended way. Please see the detailed information below.
-- This is the main table; it contains around 6 million rows.
-- Case_ID is a unique key on the table, and so is the identity key ID.
-- The table contains all possible information, and users pulling reports can slice and dice data in different ways;
-- e.g. a user might need to pull a report that aggregates various columns by minor mkt, plan,
-- entity, hospital, etc.
CREATE TABLE [dbo].[tbl_fact](
[ID] [int] IDENTITY(1,1) NOT NULL,
[clm_prov] [varchar](20) NULL,
[Mem_No] [varchar](20) NULL,
[emp_grp] [varchar](20) NULL,
[clm_prov_mpin] [varchar](12) NULL,
[entity] [varchar](14) NULL,
[dis_dt] [datetime] NULL,
[adm_Dt] [datetime] NULL,
[Readmit_AprDrg] [varchar](10) NULL,
[Readmit_Ms_Drg] [varchar](10) NULL,
[Readmit_Los] [int] NULL,
[Readmit_Bill] [money] NULL,
[Readmit_Paid] [money] NULL,
[Readmit_Allowed] [money] NULL,
[Readmit_PK] [varchar](25) NULL,
[Readmit_Case_Id] [varchar](50) NULL,
[Readmit_diag1] [varchar](10) NULL,
[Readmit_proc1] [varchar](10) NULL,
[apr_drg] [varchar](3) NULL,
[Days] [int] NULL,
[Paid_Amt] [money] NULL,
[Amt_Allowed_Case] [money] NULL,
[bill_amt] [money] NULL,
[New_Prov_Key] [varchar](25) NULL,
[Case_Id] [varchar](50) NULL,
[Drg] [varchar](3) NULL,
[DIS_STAT] [varchar](10) NULL,
[MKT] [varchar](10) NULL,
[M_DOB] [datetime] NULL,
[M_SEX] [char](1) NULL,
[PMGDEC] [varchar](6) NULL,
[Cob_Amt] [money] NULL,
[APR_MDC] [varchar](2) NULL,
[APR_SOI] [char](1) NULL,
[APR_ROM] [char](1) NULL,
[APR_WT] [int] NULL,
[Admits] [int] NULL,
[DRG_WT] [int] NULL,
[Prov_Key] [varchar](25) NULL,
[paid_dt] [datetime] NULL,
[m_zip] [varchar](5) NULL,
[m_fname] [varchar](40) NULL,
[m_lname] [varchar](40) NULL,
[m_mi] [char](1) NULL,
[clm_prov_tax_id] [varchar](9) NULL,
[Funding] [varchar](2) NULL,
[Low_Perdiems] [char](1) NULL,
[Readdays] [int] NULL,
[Readmit] [int] NULL,
[Transfer_Out] [int] NULL,
[Readdays_SameMdc] [int] NULL,
[readdays2] [int] NULL,
[Transfer_In] [int] NULL,
[CMS_CONTRACT_ID] [varchar](5) NULL,
[cms_pbp_id] [varchar](3) NULL,
[cms_geo_code] [varchar](5) NULL,
[MBR_SYS_ID] [int] NULL,
[ELIG_AMT] [money] NULL,
[COB_Flg] [char](1) NULL,
[PROV_PRTCP_STS_CD] [char](1) NULL,
[SRC_SYS_GRP_CD] [char](1) NULL,
[source] [char](4) NULL,
[INDV_SYS_ID] [int] NULL,
[UHC_Dup] [int] NULL,
[PRVPARST] [varchar](10) NULL,
[plan1] [varchar](10) NULL,
[Readmit_Uid] [varchar](20) NULL,
[UID_v3] [varchar](20) NULL,
[Uid_v1] [varchar](20) NULL,
[uid_f] [char](1) NULL,
[mbr_age] [int] NULL,
[age_group] [int] NULL,
[uid] [varchar](20) NULL,
[UID_MPIN] [varchar](20) NULL,
[UID_Tax_ID] [varchar](9) NULL,
[Priorty_Flag] [int] NULL,
[ICMR_Days] [int] NULL,
[ICMR_Admits] [int] NULL,
[CDU_Days] [int] NULL,
[CDU_Admits] [int] NULL,
[med_prime] [char](1) NULL,
[plan_old] [char](3) NULL,
[prod_old] [char](3) NULL,
[plan] [char](3) NULL,
[prod] [char](3) NULL,
[mednecfacadmits] [int] NULL,
[ExpDays] [int] NULL,
[Outlier] [char](1) NULL,
[OppDays] [int] NULL,
[OppDaysPos] [int] NULL,
[OppTotDays] [int] NULL,
[OppAdm] [int] NULL,
[OppExpDays] [int] NULL,
[OneDayStay] [int] NULL,
[Par_Flag] [varchar](10) NULL,
[Readmit30] [int] NULL,
[Readmit90] [int] NULL,
[Readmit_MDC30] [int] NULL,
[Readmit_MDC] [int] NULL,
[Prev_Admit30] [int] NULL,
[Readmit30_PBC] [int] NULL,
[Rdmt_Prob_Include_FG_PBC] [char](2) NULL,
[MS_Drg_Avg_Wt_PBC] [varchar](20) NULL,
[MS_Drg_Rdmt_30_Avg_PBC] [varchar](20) NULL,
[readmit_factor] [int] NULL,
[year] [int] NULL,
[nurse_id] [varchar](20) NULL,
[N2P2P] [char](1) NULL,
[tat] [int] NULL,
[authorization_no] [varchar](20) NULL,
[Hosp_Assigned_Type] [varchar](10) NULL,
[ahrq_chronic_flag] [int] NULL,
[readmit30_unplanned] [int] NULL,
[HospState] [char](2) NULL,
[pmgname] [varchar](100) NULL,
[Mcg_Obs_Count] [int] NULL,
[Mcg_Obs_Oneday_Count] [int] NULL,
[Mcg_Opsurg_Count] [int] NULL,
[Mcg_Opsurg_Oneday_Count] [int] NULL,
[Mcg_Oneday_count] [int] NULL,
[Mcg_Oneday] [char](1) NULL,
[Short_Stay_1] [int] NULL,
[Short_Stay_2] [int] NULL,
[Short_Stay_3] [int] NULL,
[volume] [char](1) NULL,
[drg_key] [varchar](50) NULL,
[drg_hosp] [varchar](10) NULL,
[Hosp_System] [varchar](60) NULL,
[MinorMkt] [varchar](25) NULL,
[age_group_rev] [int] NULL,
[readmit_expected] [int] NULL,
[ReadmitOpportNet] [int] NULL,
[dual_snp_migrate] [char](1) NULL,
[Factor_Rate_Adj] [varchar](10) NULL,
[Phy_UID] [varchar](30) NULL,
[Phy_Prvname_UID] [varchar](60) NULL,
[Phy_PrvTaxID_UID] [varchar](9) NULL,
[Phy_PrvSpecCD_UID] [varchar](30) NULL,
[UID_Spec_Flg] [int] NULL,
[phy_amt_paid] [money] NULL,
[phy_amt_bill] [money] NULL,
[phy_amt_allowed] [money] NULL,
[phy_rvu] [int] NULL,
[uid_primdoc_flg] [int] NULL,
[Phy_Tot_Paid] [money] NULL,
[Phy_Tot_Bill] [money] NULL,
[phy_tot_allowed] [money] NULL,
[phy_tot_rvu] [int] NULL,
[PHY_PROV_MPIN] [int] NULL,
[Phy_PrvMPIN_Uid] [varchar](10) NULL
) ON [PRIMARY]
GO
-- These are the indexes created on the table
-- Queries can be run to pull data by Year/Entity/UID/Plan etc or it can be grouped by these fields
CREATE NONCLUSTERED INDEX IX_year
ON tbl_fact ([Year]);
go
CREATE NONCLUSTERED INDEX IX_Entity
ON tbl_fact (Entity);
GO
CREATE NONCLUSTERED INDEX IX_UIDV3
ON tbl_fact (UID_V3);
GO
CREATE NONCLUSTERED INDEX IX_Plan
ON tbl_fact ([Plan]);
GO
CREATE NONCLUSTERED INDEX IX_Product
ON tbl_fact (Prod);
GO
CREATE NONCLUSTERED INDEX IX_MinorMkt
ON tbl_fact (MinorMkt);
GO
CREATE NONCLUSTERED INDEX IX_AprDrg
ON tbl_fact (apr_drg);
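Given the query pattern posted earlier (filter on plan and year, group by minormkt), single-column indexes like these are unlikely to be combined efficiently for that query. One option worth testing is a composite index keyed on the filtered columns with the aggregated columns as INCLUDEs; this is a sketch only, since the right key order depends on actual selectivity and the INCLUDE list would need every SUMmed field:

```sql
-- Hypothetical composite index matching the filter/group pattern of the posted query
CREATE NONCLUSTERED INDEX IX_Plan_Year_MinorMkt
ON tbl_fact ([Plan], [Year], MinorMkt)
INCLUDE (Admits, [Days], Paid_Amt, bill_amt);  -- add the other SUMmed columns as needed
```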
-- A view was created since there are some fields that have a corresponding lookup table
CREATE VIEW [dbo].[vw_FactData]
AS
SELECT
a.*
,b.HospName
,c.AprDrg_Desc
,d.region
FROM tbl_fact a
LEFT JOIN HOSP_UID_Xref b on a.uid_v3=b.uid_v3 and a.prov_key = b.prov_key
LEFT JOIN aprdrg_lkup c on a.apr_drg=c.apr_drg
LEFT JOIN minormkt d on a.minormkt=d.minormkt
-- The function uf_AppendString takes in a comma-separated string and returns a table variable.
-- If you want I can post the code, but I do not think that is the problem, since I replaced it with
-- hard-coded values and did not find much difference.
-- I hope this information clarifies the current design and puts you in a better position to offer
-- valuable suggestions. The query I posted previously was just from a stored procedure that runs
-- dynamic SQL, taking various parameters and grouping by them.
Thanks.
March 6, 2013 at 11:41 am
Post the code for the function(s) used as well. They can be part of the problem.
March 6, 2013 at 2:57 pm
kk1173 (3/6/2013)
I am sorry for not posting the performance problem in the recommended way.
As Lynn mentioned, we still need to see the code for the function and the Actual Execution Plan saved as an SQL plan file.
--Jeff Moden
Viewing 15 posts - 1 through 15 (of 33 total)