May 25, 2005 at 4:29 pm
There’s table with over 6.5 million records (it has about 20 columns) in our data warehouse. ‘SELECT TOP 100 PERCENT * FROM TABLENAME’ takes over 10 minutes. Also, there is a view which reads data from this table with a couple of join tables. It takes about an hour to query the data from this view.
We have done our due diligence by running ‘Index Tuning Wizard’ and creating couple of indexes (cluster and non clustered indexes) on couple of columns such as client_id, etc We also followed ‘Efficient Index Design’ concepts and ‘Efficient Query Design’ concepts when creating view and indexes.
Since this table has historical data (data warehouse), we can’t CREATE UNIQUE CLUSTERED INDEX, there are no unique records as well.
I tried to index the view, since there are quite a few requirements indexing views such as it can’t have outer joins, I have given-up that idea as well. The next option is to create a table from the view, then we have to populate this table from the view every time data gets added to the base tables (the tables that the view is based on). We don’t want to use triggers to keep it up-to-date as data gets added on batch mode. What are the other options do I have?
The other questions is, how can we increase the performance of this view/table? Is the time it takes right now to retrieve data is reasonable?
Any help would be appreciated..
Thanks,
Chandi
May 25, 2005 at 5:07 pm
Could you post the view create statement and the create statements for your indexes?
But just for my own edification, how long do you expect the server to take to deliver the 6.5 million rows?
May 26, 2005 at 1:57 am
This recent article http://www.sqlservercentral.com/columnists/jBulinckx/indexedviewswithouterjoins.asp showed how to do outer joins with and indexed view.
If you replace your SELECT * statement with explicit column names do you get any improvement in performance?
May 26, 2005 at 8:35 am
Well, I can certainly try that, but it wouldn't make a difference for us. Users access this view through MS Access as an 'ODBC' import table, and it times out. Also, we create Cognos cubes from this view, it takes 6 hours. In both cases we have no control over how we access the view, in fact Cognos reads the data column by column I think…
Thanks,
Chandi
May 26, 2005 at 11:38 am
Here is the view and the index script. Most of the joins are lookup values, ie for '01' we want to display 'Jan' in the view.
/* The View */
CREATE VIEW dbo.ClaimTableforCube WITH SCHEMABINDING AS
SELECT dbo.ClaimsTable.CLIENT_ID, dbo.dp_care_level_code.TD_CODE_LONG_DESCRIPTION AS CI_Care_Level_Code_Description,
dbo.ALL_PROVIDERS.PR_NAME, dbo.ClaimsTable.CI_SERVICE_YEAR,
dbo.ClaimsTable.CI_SERVICE_MONTH, dbo.ClaimsTable.CI_DAYS_OF_SERVICE_AMOUNT,
dbo.ClaimsTable.CI_HOURS_OF_SERVICE_AMOUNT, dbo.ClaimsTable.CI_PAID_AMOUNT,
dbo.calendar_to_fiscal.fiscal, dbo.ALL_CLIENTS.CL_SEX_CODE, dbo.Month_Name.MonthName, H_Unit_1.[Hunit Name] AS ProviderHUnit,
H_Unit2_1.[Hunit name] AS ProviderHUnit2, (CASE dbo.ClaimsTable.CI_CLIENT_CONTRIBUTION_AMT WHEN 999.99 THEN 0.00
ELSE dbo.ClaimsTable.CI_CLIENT_CONTRIBUTION_AMT
END) AS CI_CLIENT_CONTRIBUTION_AMT, dbo.[SERVICE_TYPE_CODE_LOOKUP].TD_CODE_LONG_DESCRIPTION AS service_type,
dbo.[org_code_lookup].TD_CODE_LONG_DESCRIPTION AS org_type, dbo.ALL_CLIENTS.CL_BIRTH_DATE, DATEPART(yyyy,
dbo.ALL_CLIENTS.CL_BIRTH_DATE) AS DOB_Year, DATEPART(m, dbo.ALL_CLIENTS.CL_BIRTH_DATE) AS DOB_Month,
H_Unit_2.[Hunit Name] AS ClientHUnit, H_Unit2_2.[Hunit name] AS ClientHUnit2,
tblPostalCodes_2.PostalCode AS Client_PC, '05/03/2005' AS DateofExtract,
(CASE WHEN [CI_Service_Month] >= DATEPART(m, dbo.ALL_CLIENTS.CL_BIRTH_DATE)
THEN dbo.ClaimsTable.CI_SERVICE_YEAR - DATEPART(yyyy, dbo.ALL_CLIENTS.CL_BIRTH_DATE)
ELSE dbo.ClaimsTable.CI_SERVICE_YEAR - DATEPART(yyyy, dbo.ALL_CLIENTS.CL_BIRTH_DATE) - 1 END) AS Age
FROM dbo.[org_code_lookup]
-- lookup Units (1 and 2)
RIGHT OUTER JOIN dbo.H_Unit H_Unit_2
RIGHT OUTER JOIN dbo.ClaimsTable
LEFT OUTER JOIN dbo.H_Unit2 H_Unit2_2 ON dbo.ClaimsTable.CI_HEALTH_UNIT2_ID = H_Unit2_2.Hunit ON
H_Unit_2.Hunit = dbo.ClaimsTable.CI_HEALTH_UNIT_ID ON
--lookup Organization
dbo.[org_code_lookup].TD_ID = dbo.ClaimsTable.CI_ORGANIZATION_CODE
-- lookup Month
LEFT OUTER JOIN dbo.Month_Name ON dbo.ClaimsTable.CI_SERVICE_MONTH = dbo.Month_Name.MonthCode
--lookup Year
LEFT OUTER JOIN dbo.calendar_to_fiscal ON dbo.ClaimsTable.CI_SERVICE_MONTH = dbo.calendar_to_fiscal.[month] AND
dbo.ClaimsTable.CI_SERVICE_YEAR = dbo.calendar_to_fiscal.[year]
--lookup Level Of Care
LEFT OUTER JOIN dbo.dp_care_level_code ON dbo.ClaimsTable.CI_CARE_LEVEL_CODE = dbo.dp_care_level_code.TD_ID
-- Providers
LEFT OUTER JOIN dbo.ALL_PROVIDERS
--lookup Providers by postal code ( within the ALL_PROVIDERS lookup table)
LEFT OUTER JOIN dbo.tblPostalCodes tblPostalCodes_1 ON dbo.ALL_PROVIDERS.PR_POSTAL_CODE_ADDRESS = tblPostalCodes_1.PostalCode
--lookup Providers by Units 1 and 2 ( within the ALL_PROVIDERS lookup table)LEFT OUTER JOIN dbo.H_Unit H_Unit_1 ON dbo.ALL_PROVIDERS.PR_HEALTH_UNIT_ID = H_Unit_1.Hunit
LEFT OUTER JOIN dbo.H_Unit2 H_Unit2_1 ON dbo.ALL_PROVIDERS.PR_HEALTH_UNIT2_ID = H_Unit2_1.Hunit ON
dbo.ClaimsTable.PROVIDER_ID = dbo.ALL_PROVIDERS.PROVIDER_ID
--lookup Services by service code
LEFT OUTER JOIN dbo.[SERVICE_TYPE_CODE_LOOKUP] ON dbo.ClaimsTable.CI_SERVICE_TYPE_CODE = dbo.[SERVICE_TYPE_CODE_LOOKUP].TD_ID
--lookup Clients by postal code and Client ID
FULL OUTER JOIN dbo.tblPostalCodes tblPostalCodes_2
RIGHT OUTER JOIN dbo.ALL_CLIENTS ON tblPostalCodes_2.PostalCode = dbo.ALL_CLIENTS.CL_POSTAL_CODE_ADDRESS ON
dbo.ClaimsTable.CLIENT_ID = dbo.ALL_CLIENTS.CLIENT_ID
WHERE (dbo.ClaimsTable.PROVIDER_ID <> '12345')
/ * Clustered Index */
CREATE UNIQUE CLUSTERED INDEX ClaimTableforCube _UNIQUE ON ClaimTableforCube (client_id, provider_pc,CI_PAID_AMOUNT, CI_Service_Month)
Note :
ClaimTable - Claim Table which has about 6,692,500 records
CL_ALL_CLIENT - Master Client table which has about 600,000 clients
May 26, 2005 at 11:49 am
Do you really need all those outer joins? You're doing a lot more than just looking up matching values with this.
May 26, 2005 at 12:29 pm
Yes, we need all the joins. We look up Organization, Units(1 and 2) , Month and Year, 'Level Of Care', providers by postal codes and Units, Services by service code, Clients by postal code and Client ID. So, they all are lookup joins.
Chandi
May 26, 2005 at 12:35 pm
Yes, you need the joins to do the lookups, but do they have to be OUTER joins? Outer joins may return more data than you need and hamper the optimizer's ability to use indexes efficiently.
May 26, 2005 at 12:59 pm
So, what's your solusion? INNER JOIN? and what are they?
Thanks,
Chandi
May 26, 2005 at 1:06 pm
If you don't need all the rows from the lookup tables returned, whether you find a match or not, you shouldn't be using OUTER joins.
Here's a clip from Books Online:
Join conditions can be specified in either the FROM or WHERE clauses; specifying them in the FROM clause is recommended. WHERE and HAVING clauses can also contain search conditions to further filter the rows selected by the join conditions.
Joins can be categorized as:
Inner joins use a comparison operator to match rows from two tables based on the values in common columns from each table. For example, retrieving all rows where the student identification number is the same in both the students and courses tables.
Outer joins are specified with one of the following sets of keywords when they are specified in the FROM clause:
The result set of a left outer join includes all the rows from the left table specified in the LEFT OUTER clause, not just the ones in which the joined columns match. When a row in the left table has no matching rows in the right table, the associated result set row contains null values for all select list columns coming from the right table.
A right outer join is the reverse of a left outer join. All rows from the right table are returned. Null values are returned for the left table any time a right table row has no matching row in the left table.
A full outer join returns all rows in both the left and right tables. Any time a row has no match in the other table, the select list columns from the other table contain null values. When there is a match between the tables, the entire result set row contains data values from the base tables.
Cross joins return all rows from the left table, each row from the left table is combined with all rows from the right table. Cross joins are also called Cartesian products.
For example, here is an inner join retrieving the authors who live in the same city and state as a publisher:
USE pubsSELECT a.au_fname, a.au_lname, p.pub_nameFROM authors AS a INNER JOIN publishers AS p ON a.city = p.city AND a.state = p.stateORDER BY a.au_lname ASC, a.au_fname ASC
The tables or views in the FROM clause can be specified in any order with an inner join or full outer join; however, the order of tables or views specified when using either a left or right outer join is important. For more information about table ordering with left or right outer joins, see Using Outer Joins.
May 26, 2005 at 1:16 pm
I've read this many many times. But, it won't work for me. We need all the data
from 'ClaimsTable' table regardless if we find a value in the lookup table or not.
May 27, 2005 at 7:02 am
Maybe I've missed something but shouldn't the lookup table have all possible values. I can imagine having a states table with only 42 states in it because they are not all used...
Maybe you should make sure that all the lookup values are in the table and then use an inner join (or maybe I'm completly off with this topic... sorry if it's the case).
If you're not too sure about the performance boost, try rerunning the query with only inner joins (where possible), just to see the difference in speed.
June 1, 2005 at 1:44 am
Loosely speaking an INNER join requires there to be a matching record in both tables within the join.
If the codes in your main table always exist in your lookup table then you can use inner joins.
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply