Purging data from Master and 5 child tables

  • Mac1986 (7/5/2014)


    Point taken. My plan is to drop all the nonclustered indexes in all 6 tables and only add them back on a case-by-case basis.

    Can you possibly provide the scripts needed for the partition function and partition scheme for all 6 tables so that I can start further testing?

    Sample table, with TimeStarted as the partitioning key.

    dbo.ConnectionSample, with SampleID as the partitioning key

    ConnectionDB, with ConnectionSampleID as the partitioning key

    ConnectionSS, with ConnectionSampleID as the partitioning key

    ConnectionRT, with ConnectionSampleID as the partitioning key

    Command, with ConnectionSampleID as the partitioning key

    To be honest and with absolutely no malice in my heart when I say this, you're the one that will need to support all of this in the future. The best way to do this is for you to study the subject well enough to do it on your own and then do it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/6/2014)


    Mac1986 (7/5/2014)


    Point taken. My plan is to drop all the nonclustered indexes in all 6 tables and only add them back on a case-by-case basis.

    Can you possibly provide the scripts needed for the partition function and partition scheme for all 6 tables so that I can start further testing?

    Sample table, with TimeStarted as the partitioning key.

    dbo.ConnectionSample, with SampleID as the partitioning key

    ConnectionDB, with ConnectionSampleID as the partitioning key

    ConnectionSS, with ConnectionSampleID as the partitioning key

    ConnectionRT, with ConnectionSampleID as the partitioning key

    Command, with ConnectionSampleID as the partitioning key

    To be honest and with absolutely no malice in my heart when I say this, you're the one that will need to support all of this in the future. The best way to do this is for you to study the subject well enough to do it on your own and then do it.

    But also, if you have questions or run into a temporary roadblock, you can always come back here and ask.
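
    For reference, a minimal, untested sketch of what a partition function/scheme pair looks like — one datetime-based pair for the Sample table and one BIGINT-based pair for the child tables keyed on ConnectionSampleID. The object names, boundary values, and filegroup choices here are placeholders, not scripts tailored to the actual tables:

    ```sql
    -- Sketch only: a RANGE RIGHT function means each boundary value belongs to
    -- the partition on its right, which is the usual choice for date boundaries.
    CREATE PARTITION FUNCTION pf_SampleByDay (datetime)
    AS RANGE RIGHT FOR VALUES ('2014-07-05T00:00:00', '2014-07-06T00:00:00', '2014-07-07T00:00:00');

    -- ALL TO ([PRIMARY]) keeps every partition on one filegroup; real systems
    -- often spread partitions across filegroups instead.
    CREATE PARTITION SCHEME ps_SampleByDay
    AS PARTITION pf_SampleByDay ALL TO ([PRIMARY]);

    -- Equivalent pair for the child tables that only have a BIGINT key;
    -- boundary IDs are made up for illustration.
    CREATE PARTITION FUNCTION pf_ChildById (bigint)
    AS RANGE RIGHT FOR VALUES (1000000, 2000000, 3000000);

    CREATE PARTITION SCHEME ps_ChildById
    AS PARTITION pf_ChildById ALL TO ([PRIMARY]);
    ```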

  • Thanks Lynn. I've partitioned the table the way I wanted, with hourly partitions. Now I'm thinking of using SWITCH to move the data older than 8 days to a cleanup table and then just truncating that cleanup table.

    For the existing table, I will use SPLIT to create new partitions once a day. I understand that to do this I need an exclusive lock on the table for a short period of time, so I will turn off the service, run the SPLIT partition script and the SWITCH partition script, and then turn the services back on. I plan to do this once a day to keep the latency minimal.

    My next step is to write the stored procedures below:

    1) To dynamically decide which partitions to switch out and perform the SWITCH operation accordingly

    2) To dynamically decide what the 24 new partitions for the next day should be. The new partitions will be created for GETDATE() + 2. Meaning, today is 2014-07-06, so I will create 24 partitions for 2014-07-08.

    All of the above needs to be calculated dynamically. This is what I'm thinking; I hope it all works out. FYI, the above applies only to the master table, Sample. For the tables that do not have a datetime column, I will perform the same activity using the BIGINT columns.
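
    A minimal sketch of the two daily operations described above, with hypothetical object names (dbo.Sample_Archive must be an empty table of identical structure on the same filegroup for SWITCH to succeed), showing a single boundary rather than the full set of 24 hourly ones:

    ```sql
    -- Switch an old partition out to the archive table (metadata-only, nearly
    -- instant), then truncate the archive to purge the data.
    ALTER TABLE dbo.Sample SWITCH PARTITION 2 TO dbo.Sample_Archive;
    TRUNCATE TABLE dbo.Sample_Archive;

    -- Add a future boundary. The scheme must be told which filegroup the new
    -- partition will use (NEXT USED) before the function can be SPLIT.
    -- A real job would loop this for all 24 hourly boundaries of GETDATE() + 2.
    ALTER PARTITION SCHEME ps_SampleByDay NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pf_SampleByDay() SPLIT RANGE ('2014-07-08T01:00:00');
    ```

    Splitting only ever into empty partitions at the leading edge keeps SPLIT a metadata-only operation; splitting a populated partition causes heavy data movement and logging.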

    Meanwhile, can you tell me how to convert the datetime '2014-07-09 00:00:00.000' to this weird format: '2014-07-09T00:00:00'?

    Bless me for success. :):)

  • If the column is a datetime data type, there is no formatting of the value. What you are showing are two different character representations of the same datetime value.

  • When I scripted out the partition function, the range values were in that weird format.

    Thanks for the tip; I was able to create the partition function using that format.
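
    For anyone else hitting this: the 'T'-separated form is just the ISO 8601 character representation, which CONVERT produces with style 126. A quick illustration:

    ```sql
    DECLARE @d datetime = '2014-07-09 00:00:00.000';

    -- Style 126 = ISO 8601, the format scripting tools emit for
    -- partition-function boundary values.
    SELECT CONVERT(varchar(23), @d, 126);  -- e.g. 2014-07-09T00:00:00
    ```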

  • I'm virtually certain the tables are clustered improperly. That should be fixed before you do the partitioning. It's even possible you won't need to partition if you put the best clustered indexes on the tables. Of course, given the size of the tables, you'll probably need to archive first, then re-cluster :-). You can move the older data in batches.

    Based only on your current indexing, all but the ConnectionSample have fairly clear-cut clustered indexes:

    Sample ( [TimeStarted], [DepotId] )

    Edit: Sample might be better off with Sample ( [DepotId], [TimeStarted] ), would need more details to decide which is best.

    ConnectionSample ?? ( [ConnectionId], [SampleId] ) ??OR?? ( [SampleId], ?? )

    <last 4 tables> ( [ConnectionSampleId] )
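
    As a sketch of that re-clustering (assuming the current clustered indexes have been dropped first, or use WITH (DROP_EXISTING = ON) when replacing an index of the same name; index names here are made up):

    ```sql
    -- Cluster the master table on the time-based purge/partitioning key.
    CREATE CLUSTERED INDEX CL_Sample
        ON dbo.Sample ( TimeStarted, DepotId );

    -- The four child tables cluster on the FK back to ConnectionSample.
    CREATE CLUSTERED INDEX CL_ConnectionRT
        ON dbo.ConnectionRT ( ConnectionSampleId );
    ```

    On tables this large, the rebuild itself is a heavy operation, which is why the advice above is to archive the old data in batches first and only then re-cluster.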

    But, before you do anything, you need to capture the existing index stats from SQL. The script below will do that. It will tell you existing index usage, which is critical, and what SQL thinks are "missing" indexes, which is useful but must be analyzed carefully (i.e., do NOT build such indexes willy-nilly; SQL over-recommends indexes big time). Note too that index usage is reported very quickly, even for large tables, whereas the missing-index list can take quite some time, depending on the db. It's very useful info, though, so you should run it at least once.

    --Before running, change the lines marked with --!! to match your requirements
    --
    --!! chg to your db name
    USE <your_db_name>
    SET DEADLOCK_PRIORITY LOW --probably irrelevant, but just in case

    DECLARE @list_missing_indexes bit
    DECLARE @table_name_pattern sysname

    --NOTE: showing missing indexes can take some time; set to 0 if you don't want to wait.
    SET @list_missing_indexes = 1 --1=list missing index(es); 0=don't.
    --!! put your table name/name pattern here
    SET @table_name_pattern = '%' --%=all tables;.

    PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)

    IF @list_missing_indexes = 1
    BEGIN
        SELECT
            GETDATE() AS capture_date,
            DB_NAME(mid.database_id) AS Db_Name,
            OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,
            mid.equality_columns, mid.inequality_columns, mid.included_columns,
            ca1.sql_up_days AS days_in_use,
            migs.*,
            mid.statement, mid.object_id, mid.index_handle
        FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)
        CROSS APPLY (
            SELECT DATEDIFF(DAY, create_date, GETDATE()) AS sql_up_days FROM sys.databases WHERE name = 'tempdb'
        ) AS ca1
        LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON
            mig.index_handle = mid.index_handle
        LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON
            migs.group_handle = mig.index_group_handle
        WHERE
            1 = 1
            AND mid.database_id = DB_ID()
            AND OBJECT_NAME(mid.object_id) LIKE @table_name_pattern
        ORDER BY
            --avg_total_user_cost * (user_seeks + user_scans) DESC,
            Db_Name, Table_Name, equality_columns, inequality_columns
    END --IF

    PRINT 'Midpoint @ ' + CONVERT(varchar(30), GETDATE(), 120)

    -- list index usage stats (seeks, scans, etc.)
    SELECT
        ius2.row_num, DB_NAME() AS db_name,
        i.name AS index_name,
        OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name,
        i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
        dps.row_count,
        SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,
        ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
        ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
        fk.Reference_Count AS fk_ref_count,
        FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
        ca1.sql_up_days AS days_in_use,
        ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
        ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
    FROM sys.indexes i WITH (NOLOCK)
    INNER JOIN sys.objects o WITH (NOLOCK) ON
        o.object_id = i.object_id
    CROSS APPLY (
        SELECT DATEDIFF(DAY, create_date, GETDATE()) AS sql_up_days FROM sys.databases WHERE name = 'tempdb'
    ) AS ca1
    OUTER APPLY (
        SELECT
            ', ' + COL_NAME(object_id, ic.column_id)
        FROM sys.index_columns ic
        WHERE
            ic.key_ordinal > 0 AND
            ic.object_id = i.object_id AND
            ic.index_id = i.index_id
        ORDER BY
            ic.key_ordinal
        FOR XML PATH('')
    ) AS key_cols (key_cols)
    OUTER APPLY (
        SELECT
            ', ' + COL_NAME(object_id, ic.column_id)
        FROM sys.index_columns ic
        WHERE
            ic.key_ordinal = 0 AND
            ic.object_id = i.object_id AND
            ic.index_id = i.index_id
        ORDER BY
            COL_NAME(object_id, ic.column_id)
        FOR XML PATH('')
    ) AS nonkey_cols (nonkey_cols)
    LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
        dps.object_id = i.object_id AND
        dps.index_id = i.index_id
    LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
        ius.database_id = DB_ID() AND
        ius.object_id = i.object_id AND
        ius.index_id = i.index_id
    LEFT OUTER JOIN (
        SELECT
            database_id, object_id, MAX(user_scans) AS user_scans,
            ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
        FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
        WHERE
            database_id = DB_ID()
            --AND index_id > 0
        GROUP BY
            database_id, object_id
    ) AS ius2 ON
        ius2.database_id = DB_ID() AND
        ius2.object_id = i.object_id
    LEFT OUTER JOIN (
        SELECT
            referenced_object_id, COUNT(*) AS Reference_Count
        FROM sys.foreign_keys WITH (NOLOCK)
        WHERE
            is_disabled = 0
        GROUP BY
            referenced_object_id
    ) AS fk ON
        fk.referenced_object_id = i.object_id
    WHERE
        i.object_id > 100 AND
        i.is_hypothetical = 0 AND
        i.type IN (0, 1, 2) AND
        o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND
        (
        o.name LIKE @table_name_pattern AND
        o.name NOT LIKE 'dtprop%' AND
        o.name NOT LIKE 'filestream[_]' AND
        o.name NOT LIKE 'MSpeer%' AND
        o.name NOT LIKE 'MSpub%' AND
        --o.name NOT LIKE 'queue[_]%' AND
        o.name NOT LIKE 'sys%'
        )
        --AND OBJECT_NAME(i.object_id, DB_ID()) IN ('tbl1', 'tbl2', 'tbl3')
    ORDER BY
        --row_count DESC,
        --ius.user_scans DESC,
        --ius2.row_num, --user_scans+user_seeks
        -- list clustered index first, if any, then other index(es)
        db_name, table_name, CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END, index_name

    PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)

    SQL DBA, SQL Server MVP (07, 08, 09)
    "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." -- "Norm", on "Cheers"
    Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies."

  • If you make all the FKs ON DELETE CASCADE, you can just delete small batches of rows from the Sample table. You would need indexes on the SampleID columns of all the tables with FKs, though, to get any sort of performance.
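
    A sketch of that approach, with hypothetical constraint names and assuming SampleId is the Sample table's primary key; with the cascading FK in place, deleting parent rows removes the matching child rows automatically:

    ```sql
    -- One child table shown; the other FKs would be altered the same way.
    ALTER TABLE dbo.ConnectionSample
        ADD CONSTRAINT FK_ConnectionSample_Sample
        FOREIGN KEY (SampleId) REFERENCES dbo.Sample (SampleId)
        ON DELETE CASCADE;

    -- Delete in small batches to keep log growth and blocking manageable.
    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000) FROM dbo.Sample
        WHERE TimeStarted < DATEADD(DAY, -8, GETDATE());

        IF @@ROWCOUNT = 0 BREAK;
    END;
    ```

    Note that each batched parent delete also cascades to the child tables inside the same transaction, which is why the SampleID indexes mentioned above matter so much.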
