January 29, 2008 at 6:31 am
Hi
We are planning to implement a database that ETLs about 2 GB of data daily.
We estimate the overall size on the server will be close to 1 TB on a yearly basis.
We are going to be using an ETL database and a data warehouse, from which we will load the data into cubes. Considering the ETL and the overall size of the database, are there any specifics I need to consider when designing the database? I am sorry I can't provide much detail right now, but we are looking for alternatives in the database design so that we don't trade off too much on space or performance.
I am thinking that I don't want to normalize the database too far. Am I right?
Thanks
Jay
January 29, 2008 at 8:42 am
What about table partitioning? Will it work in your case?
January 29, 2008 at 9:13 am
You want to normalize the data as much as you need to. There really isn't a "too far" or a "not far enough"; meet the business requirements in the best way possible. Remember that normalization not only increases data accuracy, it also reduces the amount of data stored. For example, you can repeat the full address information across 50 million customer rows, over and over again, or you can link to an address table and radically reduce the amount of data stored. That two-table join is not going to seriously impact performance. Three-, four-, and even fifteen-table joins won't seriously impact performance either if you've got good indexes, especially good clustered indexes.
Flattening the structure reduces joins and simplifies queries, but it could make for poorer performance: you'll need to index more columns on the table and maintain all that data on that one table, with more page splits, more index rebuilds, and so on. If flat files were better, we'd never have gone to relational databases in the first place.
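A minimal sketch of the address example (the table and column names are made up, purely for illustration):

-- Hypothetical: store each distinct address once and reference it by key,
-- instead of repeating the address text on every customer row.
CREATE TABLE dbo.Address
(
    AddressID  int IDENTITY PRIMARY KEY,
    Street     varchar(100) NOT NULL,
    City       varchar(50)  NOT NULL,
    PostalCode varchar(10)  NOT NULL
);
GO
CREATE TABLE dbo.Customer
(
    CustomerID int IDENTITY PRIMARY KEY,
    CustName   varchar(100) NOT NULL,
    AddressID  int NOT NULL REFERENCES dbo.Address (AddressID)
);
GO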
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 29, 2008 at 11:21 pm
Yes, we will be implementing a horizontal partition on the main tables, with a partition for each month.
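Something along these lines is what I have in mind (a minimal sketch; the names, boundary dates, and filegroup are made up):

-- Hypothetical monthly partitioning (SQL Server 2005+); the real boundary
-- values and filegroups would come from our load calendar and storage layout.
CREATE PARTITION FUNCTION pfMonthly (datetime)
AS RANGE RIGHT FOR VALUES ('20080101', '20080201', '20080301');
GO
CREATE PARTITION SCHEME psMonthly
AS PARTITION pfMonthly ALL TO ([PRIMARY]);
GO
CREATE TABLE dbo.FactSales
(
    SaleDate datetime NOT NULL,
    Amount   money    NOT NULL
) ON psMonthly (SaleDate);
GO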
Regarding the indexes: we will be loading data once every 5-10 minutes, so I was wondering whether I should use indexes at all, since the tables will have frequent inserts happening. I also need to implement full-text indexing.
I need the ETL to complete in 5-10 minutes, before it starts all over again.
I think indexes could become a performance bottleneck during ETL. Am I right?
Thanks again for your help.
January 30, 2008 at 5:28 am
Indexes can be, but aren't always, a performance problem when performing ETL. The best answer to that question is for you to test your load both ways: running with the indexes in place, and running with a drop and recreate of the indexes. The one thing you can do to speed up either load is, where possible, to ensure that the data being loaded is in the same order as your clustered index. That helps regardless of whether you recreate the indexes or not.
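For the drop-and-recreate side of the test, something along these lines (hypothetical table and index names; on SQL Server 2005 and up you can also disable and rebuild rather than drop and create):

-- Hypothetical sketch: take the nonclustered index offline for the load,
-- rebuild it afterwards, and compare timings against leaving it in place.
ALTER INDEX IX_FactSales_CustomerID ON dbo.FactSales DISABLE;
GO
-- ... run the ETL load here, ideally sorted to match the clustered index ...
GO
ALTER INDEX IX_FactSales_CustomerID ON dbo.FactSales REBUILD;
GO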
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
January 31, 2008 at 1:04 pm
If you have an ETL database for staging the data, you would not have indexes on those tables. The point of the staging area is to pull the information out and prep it for loading into your data warehouse, not to query against.
If you use the data warehouse as the collection of everything for historic purposes, then you should think about implementing data marts (small subsets of the data, using snapshots) for your different internal business clients, and leave the data warehouse for your BI folks; and I don't mean marketing, I mean your internal person who understands data mining algorithms and can perform predictive analytics. The reduced number of data columns, tailored to the specific needs of the internal client group, could save you from index problems.
The thing to keep in mind with indexes is that if the query returns more than 1% of your total data, the optimizer will not use the index (this tidbit comes from Kimberly Tripp at the SQLConnections conference; her site is excellent, BTW: http://www.sqlskills.com/). Your indexes on a DW will add roughly 3-5 times the storage space of the data, and keep in mind that to do online index rebuilds you need free space equal to the size of the index being rebuilt.
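For the snapshot-based marts, a minimal sketch (SQL Server 2005+ Enterprise; the database and file names here are made up, it assumes a single data file, and NAME must match the source database's logical file name):

-- Hypothetical: a point-in-time database snapshot for one client group
-- to report against, leaving the warehouse itself to the BI/mining folks.
CREATE DATABASE DW_MarketingMart_Snapshot
ON ( NAME = DataWarehouse_Data,
     FILENAME = 'D:\Snapshots\DW_MarketingMart.ss' )
AS SNAPSHOT OF DataWarehouse;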
A couple of things to keep in mind for planning purposes. First, the 5-10 minute ETL as a service level agreement may not be workable: 80% of the effort in building a data warehouse and data mining is getting the data clean and into the format you need. Only you are the expert as to the quality of your data, but until you have run through this process I would not commit to 5-10 minutes. Additionally, only you can realistically assess whether a 5-10 minute refresh even makes business sense. Marketing does just fine on a 12 or 24 hour old view of the data, and inventory management does fine with 30 minute views of the data, so long as you have a method to notify customers that you are out of stock when they have already placed an order.
Also, keep in mind that in order to data mine the data you need to use nvarchar and ntext as data types; varchar and varchar(max) won't work. This could double your initial storage requirements, but you want to store the data in these types rather than converting on the fly (which you can do in BI Studio) because of the additional memory overhead associated with the conversion, which leaves you with less memory for the predictive analytics, and those use a lot of memory. This alone is often reason enough to move to a 64-bit version for DW needs.
Based on the size of your data, normalizing to 3NF should work fine. The advantage of a star schema is that it is more intuitive for end-user clients, so if you are looking to use the data mining plug-in for Excel to let end users query the data themselves, this may work to your advantage: it shifts the report-monkey duties to them and away from you, leaving you free to do the heavy lifting such as predictive analytics and multichannel analysis.
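By way of illustration, the smallest possible star (hypothetical names; the dimension deliberately carries the denormalized attributes):

-- Hypothetical star schema: a narrow fact table keyed to a wide,
-- denormalized dimension that end users can browse intuitively.
CREATE TABLE dbo.DimCustomer
(
    CustomerKey int IDENTITY PRIMARY KEY,
    CustName    nvarchar(100) NOT NULL,
    City        nvarchar(50)  NOT NULL   -- folded in, not a separate table
);
GO
CREATE TABLE dbo.FactOrders
(
    CustomerKey int NOT NULL REFERENCES dbo.DimCustomer (CustomerKey),
    OrderDate   datetime NOT NULL,
    Amount      money    NOT NULL
);
GO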
My last recommendation is to use the following tool, which I seem to be recommending a lot lately, to automatically figure out your index needs based on usage patterns. This is from an earlier post but holds true here too.
What you want to do is pull the data from the missing index DMVs, and the script below will do this for you: it creates a database called AUTOINDEXRECS that polls the missing index DMVs and pulls in the info, leaving you to come back at a later time and look at the table to determine what indexes you need to create, which can be dropped, and on what tables. You need sa permission to do this. It comes from the query team at Microsoft, and you should download the .zip here: http://blogs.msdn.com/queryoptteam/archive/2006/06/01/613516.aspx.
I found it on Paul's former storage engine blog.
When you query the recommendation table the results will look like the following:
CREATE INDEX _MS_Sys_1 ON [Database_name].[dbo].[tbl_name]([ResponseID]) INCLUDE ([ResponseText])
This is without a doubt the best tuning tool for a database server; it works wonders in OLAP environments where you don't know beforehand what the reports are going to be, and I am baffled why it is not more widely known.
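Once the job has been running for a while, checking the findings is just a query against the table the script creates, e.g.:

-- The recommendations table is created by the script below;
-- [count] is how many times the DMVs repeated the same suggestion.
SELECT id, recommendation, type, [count], latest_time, status
FROM AUTOINDEXRECS.dbo.recommendations
ORDER BY [count] DESC;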
Lastly, make sure to use upsert statements if you will be updating as well as inserting data, and wrap everything in one transaction; this will save you from the overhead of a row-by-agonizing-row commit for all inserts. And make sure your tempdb has one file for each CPU core on the box, to prevent contention issues.
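A minimal upsert sketch along those lines (hypothetical staging and fact tables; this is the pre-2008 update-then-insert pattern, and SQL Server 2008 adds MERGE for the same job):

-- Hypothetical: one transaction for the whole batch instead of
-- row-by-row commits; update the matches first, then insert the rest.
BEGIN TRANSACTION;
    UPDATE f
    SET    f.Amount = s.Amount
    FROM   dbo.FactSales  AS f
    JOIN   dbo.StageSales AS s ON s.SaleID = f.SaleID;

    INSERT INTO dbo.FactSales (SaleID, Amount)
    SELECT s.SaleID, s.Amount
    FROM   dbo.StageSales AS s
    WHERE  NOT EXISTS (SELECT 1 FROM dbo.FactSales f WHERE f.SaleID = s.SaleID);
COMMIT TRANSACTION;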
The partitioning by month is a good plan, but take a look at a better design methodology to use, from Kim Tripp.
(Entire post here http://www.sqlskills.com/blogs/kimberly/2007/10/03/SQLServer2008OffersPartitionlevelLockEscalationExcellentBut.aspx)
"As a result, I would suggest a slightly different architecture. Instead of using only a single partitioned table for both read-only and read-write data, use at least two tables. One table for read-only data and another for read-write data. If you think this might be defeating the purpose of partitioning... then look at these benefits:
* the read-only portion of the table (which is typically the *much* larger portion of the table - can still be managed with partitioning)
* the read-only portion - once separated from the read-write - can have additional indexes for better [range] query performance
* the read-only portion of the table can actually be partitioned into multiple partitioned tables - to give better per-table statistics (statistics are still at the table-level only so even if your partitioning scheme is "monthly" you might want to have tables that represent a year's worth of data...especially if your trends seem to change year to year)
* large range queries against the read-only portion of the data will only escalate to the "table" (which is now separated from the read-write data)
* the read-write portion of the data can have fewer indexes
* the read-write portion of the data can be placed on different disks (MORE fault tolerant disks) due to the importance/volatility of the data
* finally, and most importantly, the read-write portion of the data can be maintained completely separately from the read-only portion with regard to index rebuilds"
Hope this helps,
--Dave
/****************************************************************************
// Copyright (c) 2005 Microsoft Corporation.
//
// @File: AutoIndex.sql
//
// @test-2:
//
// Purpose:
// Auto create or drop indexes
//
// Notes:
//
//
// @EndHeader@
*****************************************************************************/
CREATE DATABASE AUTOINDEXRECS
go
USE AUTOINDEXRECS
go
-- Table to store recommendations
IF object_id(N'dbo.recommendations', N'U') IS NOT NULL
DROP table [dbo].[recommendations]
GO
create table [dbo].[recommendations]
(
id int IDENTITY primary key,
recommendation nvarchar(400),
type char(2),
initial_time datetime,
latest_time datetime,
[count] int,
status nvarchar(20)
)
GO
-- Table to store recommendation history
IF object_id(N'dbo.recommendations_history', N'U') IS NOT NULL
DROP table [dbo].[recommendations_history]
GO
create table [dbo].[recommendations_history]
(
id int,
operation nvarchar(20),
time datetime,
db_user_name sysname,
login_name sysname
)
GO
-- Table to store index recommendations details
IF object_id(N'dbo.recommendations_details_index', N'U') IS NOT NULL
DROP table [dbo].[recommendations_details_index]
GO
create table [dbo].[recommendations_details_index]
(
id int,
database_id int,
table_id int,
table_modify_time datetime
)
GO
------------------------- add_recommendation_history ----------------------------------------------------
------ SP for adding a recommendation into the recommendations_history table.
IF OBJECT_ID (N'dbo.add_recommendation_history', N'P') IS NOT NULL
DROP PROC [dbo].[add_recommendation_history];
GO
create procedure [dbo].[add_recommendation_history]
@id int,
@operation nvarchar(20),
@time datetime
AS
BEGIN
declare @db_user_name sysname
select @db_user_name = CURRENT_USER
declare @login_name sysname
select @login_name = SUSER_SNAME()
insert into recommendations_history values (@id, @operation, @time, @db_user_name, @login_name)
END
go
------------------------- add_recommendation----------------------------------------------------
------ SP for inserting a new recommendation into the dbo.RECOMMENDATIONS table.
------ If the same entry already exists, it just changes latest_create_date to the latest time
------ and increase the count by one
IF OBJECT_ID (N'dbo.add_recommendation', N'P') IS NOT NULL
DROP PROC [dbo].[add_recommendation];
GO
create procedure [dbo].[add_recommendation]
@recommendation nvarchar(max),
@type_desc char(2),
@id int OUTPUT
AS
BEGIN
declare @create_date datetime
set @create_date = getdate()
IF ( @recommendation not in
(select recommendation from dbo.recommendations))
BEGIN
insert into dbo.recommendations values
(@recommendation, @type_desc, @create_date, @create_date, 1, N'Active')
select @id = @@identity
-- add it into the recommendation history
exec [dbo].[add_recommendation_history] @id, N'ADD', @create_date
return 0
END
ELSE
BEGIN
select @id = id
from dbo.recommendations
where @recommendation = recommendation
update dbo.recommendations
set latest_time = @create_date,
[count] = [count] +1
where id = @id
-- add it into the recommendation history
exec [dbo].[add_recommendation_history] @id, N'UPDATE', @create_date
return 10
END
END
go
------------------------- disable_recommendation----------------------------------------------------
------ SP for disabling a recommendation in the RECOMMENDATIONS table.
IF OBJECT_ID (N'dbo.disable_recommendation', N'P') IS NOT NULL
DROP PROC [dbo].[disable_recommendation];
GO
create procedure [dbo].[disable_recommendation]
@id int
AS
BEGIN
BEGIN TRANSACTION xDisableRecommendation
declare @create_date datetime
set @create_date = getdate()
update recommendations
set status = N'Inactive'
where id = @id
-- add it into the recommendation history
exec [dbo].[add_recommendation_history] @id, N'DISABLE', @create_date
DECLARE @Error int
SET @Error = @@ERROR
IF @Error <> 0
BEGIN
ROLLBACK TRANSACTION xDisableRecommendation
RETURN @Error
END
COMMIT TRANSACTION xDisableRecommendation
END
go
------------------------- enable_recommendation----------------------------------------------------
------ SP for enabling a recommendation in the RECOMMENDATIONS table.
IF OBJECT_ID (N'dbo.enable_recommendation', N'P') IS NOT NULL
DROP PROC [dbo].[enable_recommendation];
GO
create procedure [dbo].[enable_recommendation]
@id int
AS
BEGIN
BEGIN TRANSACTION xEnableRecommendation
declare @create_date datetime
set @create_date = getdate()
update recommendations
set status = N'Active'
where id = @id
-- add it into the recommendation history
exec [dbo].[add_recommendation_history] @id, N'ENABLE', @create_date
DECLARE @Error int
SET @Error = @@ERROR
IF @Error <> 0
BEGIN
ROLLBACK TRANSACTION xEnableRecommendation
RETURN @Error
END
COMMIT TRANSACTION xEnableRecommendation
END
go
------------------------- execute_recommendation----------------------------------------------------
------ SP for executing a recommendation in the RECOMMENDATIONS table.
IF OBJECT_ID (N'dbo.execute_recommendation', N'P') IS NOT NULL
DROP PROC [dbo].[execute_recommendation];
GO
create procedure [dbo].[execute_recommendation]
@id int
AS
BEGIN
declare @recommendation nvarchar(max)
declare @status nvarchar(20)
-- exec the recommendation
select @recommendation = recommendation, @status = status
from [recommendations]
where id = @id
-- check recommendation status
if (@status = 'Inactive')
begin
print N'Error: Recommendation ' + cast ( @id as nvarchar(10)) + ' is Inactive. Change the status to Active before execution'
return 1
end
-- check whether the schema has changed for the table
declare @database_id int
declare @object_id int
declare @stored_modify_date datetime
select @database_id = database_id, @object_id = table_id, @stored_modify_date = table_modify_time
from [dbo].[recommendations_details_index]
where id = @id
declare @database_name sysname
select @database_name = db_name(@database_id)
-- create temporary table to store the current table schema version
create table [#tabSchema] ( modify_date datetime)
truncate table [#tabSchema]
declare @exec_stmt nvarchar(4000)
select @exec_stmt =
'use '+ @database_name +
'; insert [#tabSchema] select modify_date from sys.objects where object_id = ' + cast ( @object_id as nvarchar(10))
--print @exec_stmt
EXEC (@exec_stmt)
declare @modify_date datetime
select @modify_date = modify_date from #tabSchema
if (object_id('[#tabSchema]') is not null)
begin
drop table [#tabSchema]
end
if (@modify_date > @stored_modify_date)
begin
print N'Error: Recommendation ' + cast ( @id as nvarchar(10)) + ' might be invalid since the schema on the table has changed since the recommendation was made'
return 1
end
declare @create_date datetime
set @create_date = getdate()
BEGIN TRANSACTION xExecuteRecommendation
exec (@recommendation)
-- add it into the recommendation history
exec [dbo].[add_recommendation_history] @id, N'EXECUTE', @create_date
DECLARE @Error int
SET @Error = @@ERROR
IF @Error <> 0
BEGIN
ROLLBACK TRANSACTION xExecuteRecommendation
RETURN @Error
END
COMMIT TRANSACTION xExecuteRecommendation
END
go
------------------------- add_recommendation_details_index ----------------------------------------------------
------ SP for adding index recommendation details into the recommendations_details_index table.
IF OBJECT_ID (N'dbo.add_recommendation_details_index', N'P') IS NOT NULL
DROP PROC [dbo].[add_recommendation_details_index];
GO
create procedure [dbo].[add_recommendation_details_index]
@id int,
@database_id int,
@table_id int
AS
BEGIN
declare @database_name sysname
select @database_name = db_name(@database_id)
-- create temporary table to store the current table schema version
create table [#tabSchemaVer] ( modify_date datetime)
truncate table [#tabSchemaVer]
declare @exec_stmt nvarchar(4000)
select @exec_stmt =
'use '+ @database_name +
'; insert [#tabSchemaVer] select modify_date from sys.objects where object_id = ' + cast ( @table_id as nvarchar(10))
--print @exec_stmt
EXEC (@exec_stmt)
declare @tabVer datetime
select @tabVer = modify_date from #tabSchemaVer
insert into recommendations_details_index values (@id,@database_id, @table_id, @tabVer)
if (object_id('[#tabSchemaVer]') is not null)
begin
drop table [#tabSchemaVer]
end
END
go
---------------------------- auto_create_index ------------------------------
IF OBJECT_ID (N'dbo.auto_create_index', N'P') IS NOT NULL
DROP PROC [dbo].[auto_create_index];
GO
create procedure [dbo].[auto_create_index]
as
-- NOTE: This sp will create indexes recommended by the Missing Index DMVs.
--
set nocount on
-- required for creating index on ICC/IVs
set ansi_warnings on
set ansi_padding on
set arithabort on
set concat_null_yields_null on
set numeric_roundabort off
declare @exec_stmt nvarchar(4000)
declare @table_name nvarchar(521)
declare @column_name sysname
declare @column_usage varchar(20)
declare @column_id smallint
declare @index_handle int
declare @database_id int
declare @object_id int
-- find the top 5 indexes with maximum total improvement
declare ms_cri_tnames cursor local static for
Select Top 5 mid.database_id, mid.object_id, mid.statement as table_name, mig.index_handle as index_handle
from
(
select
(user_seeks+user_scans) * avg_total_user_cost * (avg_user_impact * 0.01) as index_advantage, migs.*
from sys.dm_db_missing_index_group_stats migs
) as migs_adv,
sys.dm_db_missing_index_groups mig,
sys.dm_db_missing_index_details mid
where
migs_adv.group_handle = mig.index_group_handle and
mig.index_handle = mid.index_handle
and migs_adv.index_advantage > 10
order by migs_adv.index_advantage DESC
-- create temporary table to store the table names on which we just auto created indexes
create table #tablenametab
( table_name nvarchar(521) collate database_default
)
truncate table #tablenametab
open ms_cri_tnames
fetch next from ms_cri_tnames into @database_id, @object_id, @table_name, @index_handle
--print @table_name
while (@@fetch_status <> -1)
begin
-- don't auto create index on same table again
-- UNDONE: we may try to filter out local temp table in the future
if (@table_name not in (select table_name from #tablenametab ))
begin
-- these are all columns on which we are going to auto create indexes
declare ms_cri_cnames cursor local for
select column_id, quotename(column_name,'['), column_usage
from sys.dm_db_missing_index_columns(@index_handle)
-- now go over all columns for the index to-be-created and
-- construct the create index statement
open ms_cri_cnames
fetch next from ms_cri_cnames into @column_id, @column_name, @column_usage
declare @index_name sysname
declare @include_column_list nvarchar(517)
declare @key_list nvarchar(517)
select @index_name = '_MS_Sys'
select @key_list = ''
select @include_column_list = ''
declare @num_keys smallint
declare @num_include_columns smallint
select @num_keys = 0
select @num_include_columns = 0
while @@fetch_status >= 0
begin
-- construct index name, key list and include column list during the loop
-- Index Name in the format: _MS_Sys_colid1_colid2_..._colidn
if (@column_usage = 'INCLUDE')
begin
if (@num_include_columns = 0)
select @include_column_list = @column_name
else
select @include_column_list = @include_column_list + ', ' +@column_name
select @num_include_columns = @num_include_columns + 1
end
else
begin
if (@num_keys = 0)
select @key_list = @column_name
else
select @key_list = @key_list + ', ' +@column_name
select @num_keys = @num_keys + 1
select @index_name = @index_name + '_'+cast ( @column_id as nvarchar(10))
end
fetch next from ms_cri_cnames into @column_id, @column_name, @column_usage
end
close ms_cri_cnames
deallocate ms_cri_cnames
--print @index_name
--print @table_name
--print @key_list
--print @include_column_list
-- construct create index statement
-- "CREATE INDEX @INDEX_NAME ON @TABLE_NAME (KEY_NAME1, KEY_NAME2, ...) INCLUDE (INCLUDE_COL_NAME1, INCLUDE_COL_NAME2, ...) WITH (ONLINE = ON)" (Note: for recommendation mode, we don't use online option)
if (@num_include_columns > 0)
select @exec_stmt = 'CREATE INDEX ' + @index_name + ' ON ' + @table_name + '(' + @key_list + ') INCLUDE ('+ @include_column_list + ')'-- WITH (ONLINE = ON)'
else
select @exec_stmt = 'CREATE INDEX ' + @index_name + ' ON ' + @table_name + '(' + @key_list + ')'-- WITH (ONLINE = ON)'
--print @exec_stmt
declare @id int
declare @create_date datetime
BEGIN TRANSACTION xAddCreateIdxRecommendation
DECLARE @result int;
EXEC @result = dbo.add_recommendation @exec_stmt, 'CI', @id OUT
if (@result <> 10)
EXEC dbo.add_recommendation_details_index @id, @database_id, @object_id
DECLARE @Error int
SET @Error = @@ERROR
IF @Error <> 0
BEGIN
ROLLBACK TRANSACTION xAddCreateIdxRecommendation
RETURN @Error
END
COMMIT TRANSACTION xAddCreateIdxRecommendation
--EXEC (@exec_stmt)
-- insert the table name into #tablenametab
insert into #tablenametab values (@table_name)
end
fetch next from ms_cri_tnames into @database_id, @object_id, @table_name, @index_handle
end
deallocate ms_cri_tnames
return(0) -- auto_create_index
go
---------------------------- sp_autodropindex ------------------------------
IF OBJECT_ID (N'dbo.auto_drop_index', N'P') IS NOT NULL
DROP PROC [dbo].[auto_drop_index];
GO
create procedure [dbo].[auto_drop_index]
as
-- NOTE: This sp will drop indexes that are automatically created and
-- are no longer very useful in a cost efficient manner, based on feedback
-- from the index usage DMVs.
set nocount on
declare @database_id int
declare @object_id int
declare @index_id int
declare ms_drpi_iids cursor local static for
Select Top 3 database_id, object_id, index_id
from sys.dm_db_index_usage_stats
where user_updates > 10 * (user_seeks+user_scans)
and index_id > 1
order by user_updates / (user_seeks+user_scans+1) DESC
open ms_drpi_iids
fetch next from ms_drpi_iids into @database_id, @object_id, @index_id
-- create temporary table to store the table name and index name
create table #tabIdxnametab
(
table_name nvarchar(1000) collate database_default,
index_name nvarchar(521) collate database_default
)
while (@@fetch_status >= 0)
begin
declare @exec_stmt nvarchar(4000)
declare @database_name sysname
select @database_name = db_name(@database_id)
truncate table #tabIdxnametab
-- insert the table name and index name into the temp table
select @exec_stmt =
'use '+ @database_name + ';'+
'insert #tabIdxnametab select quotename(''' + @database_name+''', ''['')+ ''.'' +quotename(schema_name(o.schema_id), ''['')+''.''+quotename(o.name,''['') , i.name
from sys.objects o, sys.indexes i where o.type = ''U'' and o.is_ms_shipped = 0 and i.is_primary_key = 0 and i.is_unique_constraint = 0 and o.object_id =' + cast ( @object_id as nvarchar(10))+' and o.object_id = i.object_id and index_id = '+ cast ( @index_id as nvarchar(10))
--print @exec_stmt
EXEC (@exec_stmt)
-- get the table_name and index_name
declare @table_name nvarchar(1000)
declare @index_name sysname
select @table_name = table_name, @index_name = index_name from #tabIdxnametab
--use name convention to recognize auto-created indexes for now
--in the future, we will add a special bit inside metadata to distinguish
--if (substring(@index_name, 1, 8) = '_MS_Sys_')
--begin
-- construct drop index statement
-- "DROP INDEX @TABLE_NAME.@INDEX_NAME"
--select @exec_stmt = 'drop index '+@index_name+' on '+@table_name
--print @exec_stmt
--EXEC (@exec_stmt)
--end
--else
--print 'User Index: '+@table_name + '.'+ @index_name
IF (@index_name IS NOT NULL)
begin
select @exec_stmt = 'drop index '+@index_name+' on '+@table_name
declare @id int
declare @create_date datetime
BEGIN TRANSACTION xAddDropIdxRecommendation
DECLARE @result int;
EXEC @result = dbo.add_recommendation @exec_stmt, 'DI', @id out
if (@result <> 10)
EXEC dbo.add_recommendation_details_index @id, @database_id, @object_id
DECLARE @Error int
SET @Error = @@ERROR
IF @Error <> 0
BEGIN
ROLLBACK TRANSACTION xAddDropIdxRecommendation
RETURN @Error
END
COMMIT TRANSACTION xAddDropIdxRecommendation
end
fetch next from ms_drpi_iids into @database_id, @object_id, @index_id
end
if (object_id('[#tabIdxnametab]') is not null)
begin
drop table [#tabIdxnametab]
end
deallocate ms_drpi_iids
return(0) -- auto_drop_index
go
--
-- JOBs for Executing [auto_create_index] and [auto_drop_index]
--
DECLARE @jobId BINARY(16)
EXEC msdb.dbo.sp_add_job
@job_name=N'SQL MDW: Auto Index Management',
@job_id = @jobId OUTPUT
GO
EXEC msdb.dbo.sp_add_jobstep
@job_name=N'SQL MDW: Auto Index Management',
@step_name=N'Auto Create Index',
@step_id=1,
@subsystem=N'TSQL',
@command=N'EXECUTE [dbo].[auto_create_index]',
@on_success_action = 3, -- on success, go to next step
@database_name=N'AUTOINDEXRECS'
GO
EXEC msdb.dbo.sp_add_jobstep
@job_name=N'SQL MDW: Auto Index Management',
@step_name=N'Auto Drop Index',
@step_id=2,
@subsystem=N'TSQL',
@command=N'EXECUTE [dbo].[auto_drop_index]',
@database_name=N'AUTOINDEXRECS'
GO
EXEC msdb.dbo.sp_add_jobserver
@job_name=N'SQL MDW: Auto Index Management'
GO
DECLARE @schedule_id int
EXEC msdb.dbo.sp_add_schedule
@schedule_name = N'SQL MDW: Auto Index Management' ,
@freq_type = 4, -- daily
@freq_interval = 1, -- every day
@freq_subday_type = 4, -- subday interval in minutes
@freq_subday_interval = 30, -- every 30 minutes
@schedule_id = @schedule_id OUTPUT
EXEC msdb.dbo.sp_attach_schedule
@job_name=N'SQL MDW: Auto Index Management',
@schedule_id = @schedule_id
go
February 3, 2008 at 4:02 pm
Hi Jay,
After running a similarly sized data warehouse (without cubes), the following is the main advice I can give you.
1) Business rules for validating and cleaning the data take the bulk of the ETL time.
2) Pay particular attention to your disk configuration and make good use of filegroups and multiple disk arrays (we did not have a SAN but inherited 100 disks with 6 RAID controllers). Make sure your NTFS allocation unit size is right; I found 64K gave the best performance for our configuration. I would recommend using mount points for your disk arrays, as they make for easier configuration and easier restores to other systems, i.e. dev and test.
3) Backup compression software was, in our case, a must to meet backup windows and reduce disk space (we used SQL LiteSpeed with no problems).
4) The warehouse was based around Kimball's dimensional model. We found that we had to add back some of the natural keys to some of the very large fact and dimension tables for performance reasons; the joins between the large tables were killing performance.
5) Appropriate indexing and up-to-date statistics (we updated nightly for all but the largest table) make a huge performance impact.
6) Setting the warehouse to read-only after the ETL made a huge difference to reporting speed (a sketch of 5 and 6 follows below).
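For points 5 and 6, roughly the following (the database and table names are placeholders):

-- Point 5: nightly statistics refresh on a given table
UPDATE STATISTICS dbo.FactSales WITH FULLSCAN;
GO
-- Point 6: flip the warehouse to read-only once the ETL completes
ALTER DATABASE DataWarehouse SET READ_ONLY WITH ROLLBACK IMMEDIATE;
GO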
I trust this is helpful, can give further details if you would like.
Cheers
Brandon