Purging data from Master and 5 child tables

  • Mac1986 (7/5/2014)


    Point taken. My plan is to drop all the nonclustered indexes in all 6 tables and only add them back on a case-by-case basis.

    Can you possibly provide the scripts needed for the partition function and partition scheme for all 6 tables so that I can start further testing?

    Sample table, with TimeStarted as the partitioning key.

    dbo.ConnectionSample, with SampleID as the partitioning key

    ConnectionDB, with ConnectionSampleID as the partitioning key

    ConnectionSS, with ConnectionSampleID as the partitioning key

    ConnectionRT, with ConnectionSampleID as the partitioning key

    Command, with ConnectionSampleID as the partitioning key

    To be honest and with absolutely no malice in my heart when I say this, you're the one that will need to support all of this in the future. The best way to do this is for you to study the subject well enough to do it on your own and then do it.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (7/6/2014)


    Mac1986 (7/5/2014)


    Point taken. My plan is to drop all the nonclustered indexes in all 6 tables and only add them back on a case-by-case basis.

    Can you possibly provide the scripts needed for the partition function and partition scheme for all 6 tables so that I can start further testing?

    Sample table, with TimeStarted as the partitioning key.

    dbo.ConnectionSample, with SampleID as the partitioning key

    ConnectionDB, with ConnectionSampleID as the partitioning key

    ConnectionSS, with ConnectionSampleID as the partitioning key

    ConnectionRT, with ConnectionSampleID as the partitioning key

    Command, with ConnectionSampleID as the partitioning key

    To be honest and with absolutely no malice in my heart when I say this, you're the one that will need to support all of this in the future. The best way to do this is for you to study the subject well enough to do it on your own and then do it.

    But also, if you have questions or run into a temporary roadblock, you can always come back here and ask.
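
    For reference, a minimal, untested sketch of what a partition function/scheme pair looks like — one datetime-based pair for the Sample table and one BIGINT-based pair for the child tables keyed on ConnectionSampleID. The object names, boundary values, and filegroup choices here are placeholders, not scripts tailored to the actual tables:

    ```sql
    -- Sketch only: a RANGE RIGHT function means each boundary value belongs to
    -- the partition on its right, which is the usual choice for date boundaries.
    CREATE PARTITION FUNCTION pf_SampleByDay (datetime)
    AS RANGE RIGHT FOR VALUES ('2014-07-05T00:00:00', '2014-07-06T00:00:00', '2014-07-07T00:00:00');

    -- ALL TO ([PRIMARY]) keeps every partition on one filegroup; real systems
    -- often spread partitions across filegroups instead.
    CREATE PARTITION SCHEME ps_SampleByDay
    AS PARTITION pf_SampleByDay ALL TO ([PRIMARY]);

    -- Equivalent pair for the child tables that only have a BIGINT key;
    -- boundary IDs are made up for illustration.
    CREATE PARTITION FUNCTION pf_ChildById (bigint)
    AS RANGE RIGHT FOR VALUES (1000000, 2000000, 3000000);

    CREATE PARTITION SCHEME ps_ChildById
    AS PARTITION pf_ChildById ALL TO ([PRIMARY]);
    ```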

  • Thanks Lynn. I've partitioned the table the way I wanted, with hourly partitions. Now I'm thinking of using SWITCH to move the data older than 8 days to a cleanup table and then just truncating that cleanup table.

    For the existing table, I will use SPLIT to create new partitions once a day. I understand that to do this I need an exclusive lock on the table for a short period of time, so I will turn off the service, run the SPLIT partition script and the SWITCH partition script, and then turn the services back on. I plan to do this once a day to keep the latency minimal.

    My next step is to write the stored procedures below:

    1) To dynamically decide which partitions to switch out and perform the SWITCH operation accordingly

    2) To dynamically decide what the 24 new partitions for the next day should be. The new partitions will be created for GETDATE() + 2. Meaning, today is 2014-07-06, so I will create 24 partitions for 2014-07-08.

    All of the above needs to be calculated dynamically. This is what I'm thinking; I hope it all works out. FYI, the above applies only to the master table, Sample. For the tables that do not have a datetime column, I will perform the same activity using the BIGINT columns.
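
    A minimal sketch of the two daily operations described above, with hypothetical object names (dbo.Sample_Archive must be an empty table of identical structure on the same filegroup for SWITCH to succeed), showing a single boundary rather than the full set of 24 hourly ones:

    ```sql
    -- Switch an old partition out to the archive table (metadata-only, nearly
    -- instant), then truncate the archive to purge the data.
    ALTER TABLE dbo.Sample SWITCH PARTITION 2 TO dbo.Sample_Archive;
    TRUNCATE TABLE dbo.Sample_Archive;

    -- Add a future boundary. The scheme must be told which filegroup the new
    -- partition will use (NEXT USED) before the function can be SPLIT.
    -- A real job would loop this for all 24 hourly boundaries of GETDATE() + 2.
    ALTER PARTITION SCHEME ps_SampleByDay NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pf_SampleByDay() SPLIT RANGE ('2014-07-08T01:00:00');
    ```

    Splitting only ever into empty partitions at the leading edge keeps SPLIT a metadata-only operation; splitting a populated partition causes heavy data movement and logging.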

    Meanwhile, can you tell me how to convert the datetime '2014-07-09 00:00:00.000' to this weird format: '2014-07-09T00:00:00'?

    Bless me for success. :):)

  • If the column is a datetime data type, there is no formatting of the value. What you are showing are two different character representations of the same datetime value.

  • When I scripted out the partition function, the range values were in that weird format.

    Thanks for the tip; I was able to create the partition function using that format.
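
    For anyone else hitting this: the 'T'-separated form is just the ISO 8601 character representation, which CONVERT produces with style 126. A quick illustration:

    ```sql
    DECLARE @d datetime = '2014-07-09 00:00:00.000';

    -- Style 126 = ISO 8601, the format scripting tools emit for
    -- partition-function boundary values.
    SELECT CONVERT(varchar(23), @d, 126);  -- e.g. 2014-07-09T00:00:00
    ```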

  • I'm virtually certain the tables are clustered improperly. That should be fixed before you do the partitioning. It's even possible you won't need to partition if you put the best clustered indexes on the tables. Of course, given the size of the tables, you'll probably need to archive first, then re-cluster :-). You can move the older data in batches.

    Based only on your current indexing, all but the ConnectionSample have fairly clear-cut clustered indexes:

    Sample ( [TimeStarted], [DepotId] )

    Edit: Sample might be better off with Sample ( [DepotId], [TimeStarted] ), would need more details to decide which is best.

    ConnectionSample ?? ( [ConnectionId], [SampleId] ) ??OR?? ( [SampleId], ?? )

    <last 4 tables> ( [ConnectionSampleId] )
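
    As a sketch of that re-clustering (assuming the current clustered indexes have been dropped first, or use WITH (DROP_EXISTING = ON) when replacing an index of the same name; index names here are made up):

    ```sql
    -- Cluster the master table on the time-based purge/partitioning key.
    CREATE CLUSTERED INDEX CL_Sample
        ON dbo.Sample ( TimeStarted, DepotId );

    -- The four child tables cluster on the FK back to ConnectionSample.
    CREATE CLUSTERED INDEX CL_ConnectionRT
        ON dbo.ConnectionRT ( ConnectionSampleId );
    ```

    On tables this large, the rebuild itself is a heavy operation, which is why the advice above is to archive the old data in batches first and only then re-cluster.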

    But, before you do anything, you need to capture the existing index stats from SQL. The script below will do that. It will tell you existing index usage, which is critical, and what SQL thinks are "missing" indexes, which is useful but must be analyzed carefully (i.e., do NOT build such indexes willy-nilly; SQL over-recommends indexes big time). Note too that index usage is reported very quickly, even for large tables, whereas the missing-index list can take quite some time, depending on the db. It's very useful info, though, so you should run it at least once.

    --Before running, change the lines marked with --!! to match your requirements
    --
    --!! chg to your db name
    USE <your_db_name>
    SET DEADLOCK_PRIORITY LOW --probably irrelevant, but just in case

    DECLARE @list_missing_indexes bit
    DECLARE @table_name_pattern sysname

    --NOTE: showing missing indexes can take some time; set to 0 if you don't want to wait.
    SET @list_missing_indexes = 1 --1=list missing index(es); 0=don't.
    --!! put your table name/name pattern here
    SET @table_name_pattern = '%' --%=all tables;.

    PRINT 'Started @ ' + CONVERT(varchar(30), GETDATE(), 120)

    IF @list_missing_indexes = 1
    BEGIN
        SELECT
            GETDATE() AS capture_date,
            DB_NAME(mid.database_id) AS Db_Name,
            OBJECT_NAME(mid.object_id /*, mid.database_id*/) AS Table_Name,
            mid.equality_columns, mid.inequality_columns, mid.included_columns,
            ca1.sql_up_days AS days_in_use,
            migs.*,
            mid.statement, mid.object_id, mid.index_handle
        FROM sys.dm_db_missing_index_details mid WITH (NOLOCK)
        CROSS APPLY (
            SELECT DATEDIFF(DAY, create_date, GETDATE()) AS sql_up_days FROM sys.databases WHERE name = 'tempdb'
        ) AS ca1
        LEFT OUTER JOIN sys.dm_db_missing_index_groups mig WITH (NOLOCK) ON
            mig.index_handle = mid.index_handle
        LEFT OUTER JOIN sys.dm_db_missing_index_group_stats migs WITH (NOLOCK) ON
            migs.group_handle = mig.index_group_handle
        WHERE
            1 = 1
            AND mid.database_id = DB_ID()
            AND OBJECT_NAME(mid.object_id) LIKE @table_name_pattern
        ORDER BY
            --avg_total_user_cost * (user_seeks + user_scans) DESC,
            Db_Name, Table_Name, equality_columns, inequality_columns
    END --IF

    PRINT 'Midpoint @ ' + CONVERT(varchar(30), GETDATE(), 120)

    -- list index usage stats (seeks, scans, etc.)
    SELECT
        ius2.row_num, DB_NAME() AS db_name,
        i.name AS index_name,
        OBJECT_NAME(i.object_id/*, DB_ID()*/) AS table_name,
        i.index_id, --ius.user_seeks + ius.user_scans AS total_reads,
        dps.row_count,
        SUBSTRING(key_cols, 3, 8000) AS key_cols, SUBSTRING(nonkey_cols, 3, 8000) AS nonkey_cols,
        ius.user_seeks, ius.user_scans, ius.user_lookups, ius.user_updates,
        ius.last_user_seek, ius.last_user_scan, ius.last_user_lookup, ius.last_user_update,
        fk.Reference_Count AS fk_ref_count,
        FILEGROUP_NAME(i.data_space_id) AS filegroup_name,
        ca1.sql_up_days AS days_in_use,
        ius.system_seeks, ius.system_scans, ius.system_lookups, ius.system_updates,
        ius.last_system_seek, ius.last_system_scan, ius.last_system_lookup, ius.last_system_update
    FROM sys.indexes i WITH (NOLOCK)
    INNER JOIN sys.objects o WITH (NOLOCK) ON
        o.object_id = i.object_id
    CROSS APPLY (
        SELECT DATEDIFF(DAY, create_date, GETDATE()) AS sql_up_days FROM sys.databases WHERE name = 'tempdb'
    ) AS ca1
    OUTER APPLY (
        SELECT
            ', ' + COL_NAME(object_id, ic.column_id)
        FROM sys.index_columns ic
        WHERE
            ic.key_ordinal > 0 AND
            ic.object_id = i.object_id AND
            ic.index_id = i.index_id
        ORDER BY
            ic.key_ordinal
        FOR XML PATH('')
    ) AS key_cols (key_cols)
    OUTER APPLY (
        SELECT
            ', ' + COL_NAME(object_id, ic.column_id)
        FROM sys.index_columns ic
        WHERE
            ic.key_ordinal = 0 AND
            ic.object_id = i.object_id AND
            ic.index_id = i.index_id
        ORDER BY
            COL_NAME(object_id, ic.column_id)
        FOR XML PATH('')
    ) AS nonkey_cols (nonkey_cols)
    LEFT OUTER JOIN sys.dm_db_partition_stats dps WITH (NOLOCK) ON
        dps.object_id = i.object_id AND
        dps.index_id = i.index_id
    LEFT OUTER JOIN sys.dm_db_index_usage_stats ius WITH (NOLOCK) ON
        ius.database_id = DB_ID() AND
        ius.object_id = i.object_id AND
        ius.index_id = i.index_id
    LEFT OUTER JOIN (
        SELECT
            database_id, object_id, MAX(user_scans) AS user_scans,
            ROW_NUMBER() OVER (ORDER BY MAX(user_scans) DESC) AS row_num --user_scans|user_seeks+user_scans
        FROM sys.dm_db_index_usage_stats WITH (NOLOCK)
        WHERE
            database_id = DB_ID()
            --AND index_id > 0
        GROUP BY
            database_id, object_id
    ) AS ius2 ON
        ius2.database_id = DB_ID() AND
        ius2.object_id = i.object_id
    LEFT OUTER JOIN (
        SELECT
            referenced_object_id, COUNT(*) AS Reference_Count
        FROM sys.foreign_keys WITH (NOLOCK)
        WHERE
            is_disabled = 0
        GROUP BY
            referenced_object_id
    ) AS fk ON
        fk.referenced_object_id = i.object_id
    WHERE
        i.object_id > 100 AND
        i.is_hypothetical = 0 AND
        i.type IN (0, 1, 2) AND
        o.type NOT IN ( 'IF', 'IT', 'TF', 'TT' ) AND
        (
        o.name LIKE @table_name_pattern AND
        o.name NOT LIKE 'dtprop%' AND
        o.name NOT LIKE 'filestream[_]' AND
        o.name NOT LIKE 'MSpeer%' AND
        o.name NOT LIKE 'MSpub%' AND
        --o.name NOT LIKE 'queue[_]%' AND
        o.name NOT LIKE 'sys%'
        )
        --AND OBJECT_NAME(i.object_id, DB_ID()) IN ('tbl1', 'tbl2', 'tbl3')
    ORDER BY
        --row_count DESC,
        --ius.user_scans DESC,
        --ius2.row_num, --user_scans+user_seeks
        -- list clustered index first, if any, then other index(es)
        db_name, table_name, CASE WHEN i.index_id IN (0, 1) THEN 1 ELSE 2 END, index_name

    PRINT 'Ended @ ' + CONVERT(varchar(30), GETDATE(), 120)

    SQL DBA, SQL Server MVP (07, 08, 09)
    "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." -- "Norm", on "Cheers"
    Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies."

  • If you make all the FKs ON DELETE CASCADE, you can just delete small batches of rows from the Sample table. You would need indexes on the SampleID columns of all the tables with FKs, though, to get any sort of performance.
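
    A sketch of that approach, with hypothetical constraint names and assuming SampleId is the Sample table's primary key; with the cascading FK in place, deleting parent rows removes the matching child rows automatically:

    ```sql
    -- One child table shown; the other FKs would be altered the same way.
    ALTER TABLE dbo.ConnectionSample
        ADD CONSTRAINT FK_ConnectionSample_Sample
        FOREIGN KEY (SampleId) REFERENCES dbo.Sample (SampleId)
        ON DELETE CASCADE;

    -- Delete in small batches to keep log growth and blocking manageable.
    WHILE 1 = 1
    BEGIN
        DELETE TOP (5000) FROM dbo.Sample
        WHERE TimeStarted < DATEADD(DAY, -8, GETDATE());

        IF @@ROWCOUNT = 0 BREAK;
    END;
    ```

    Note that each batched parent delete also cascades to the child tables inside the same transaction, which is why the SampleID indexes mentioned above matter so much.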
