purging old data and outer joins

  • Indianrock (2/14/2016)


    Thanks Hugo, I will thoroughly review your post.

    I'll experiment with the number of rows gathered into the temp table up front -- this is only done once per day, as the job runs once per day. Or, in the case of this Address table, where there are currently only about 4 million rows available for deletion (meeting purge criteria), I might try gathering all of them into the temp table on a weekend and then running the while loop. I'm also considering having the while loop deletions purge 100 at a time, as long as that value is low enough not to cause blocking.

    Current simplified purge logic:

    Get x number of "id" values from target table into a temp table

    In a while loop, delete one at a time of the ID values in the temp table

    Note that the gathering of rows to be deleted into a temp table (assuming default isolation level) does not cause blocking. Only the deletion does. I think you are aware of this, but just wanted to make sure.

    EDIT: Also, deleting one at a time is a bit too extreme, in my opinion. I think that deleting 5,000 or 10,000 or so at a time would be just fine.

    Do check the estimated execution plan of a delete first, to verify that all foreign key constraint checks can be done using index seek. If index or table scans are needed, it will be slow and cause blocking (regardless of the number of rows deleted).
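    For example, a catalog query along these lines (a rough sketch, assuming the target table is dbo.ADDRESS) lists the foreign keys that reference it and whether each referencing column is the leading key of an index:

    -- Sketch: find referencing FK columns that have no supporting index
    SELECT fk.name AS fk_name,
           OBJECT_NAME(fkc.parent_object_id) AS referencing_table,
           COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS referencing_column,
           CASE WHEN EXISTS ( SELECT *
                              FROM sys.index_columns ic
                              WHERE ic.object_id = fkc.parent_object_id
                                AND ic.column_id = fkc.parent_column_id
                                AND ic.key_ordinal = 1 )
                THEN 'indexed' ELSE 'no supporting index' END AS index_status
    FROM sys.foreign_keys fk
    JOIN sys.foreign_key_columns fkc ON fkc.constraint_object_id = fk.object_id
    WHERE fk.referenced_object_id = OBJECT_ID('dbo.ADDRESS');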


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Indianrock (2/14/2016)


    As far as indexes on the temp table, the temp table is created with an integer ID identity column as primary key.

    CREATE TABLE #tempdata
    (
        ID INT IDENTITY(1, 1) PRIMARY KEY,
        BILLING_ENTRY_ID INT
    );

    Why on earth would you need an identity column for that table?

    Either create it like this:

    CREATE TABLE #tempdata
    (
        BILLING_ENTRY_ID INT PRIMARY KEY
    );

    Or create it as a heap and add the clustered index later:

    CREATE TABLE #tempdata
    (
        BILLING_ENTRY_ID INT
    );

    -- Load data

    ALTER TABLE #tempdata ADD PRIMARY KEY (BILLING_ENTRY_ID);

    (Note that I typed that syntax from memory, so you may want to double-check it.)

    Also: While you edited your message, I was busy editing my previous message. Please check the additional comment there as well.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • The identity column is used in the while loop as shown below.

    SELECT @counter = MAX(ID)
    FROM #tempdata;

    SET @rowcount = 1;

    WHILE @rowcount > 0
        AND @counter > 0
    BEGIN
        BEGIN TRY
            BEGIN TRANSACTION;

            DELETE TOP ( 1 )
            FROM collateralmanager.DBO.ADDRESS
            WHERE ADDRESS.ADDRESS_ID IN ( SELECT ADDRESS_ID
                                          FROM #tempdata
                                          WHERE [ID] = @counter )
            ---WHERE [ID] BETWEEN @counter AND @counter-100 )
            OPTION ( MAXDOP 8 );

            SET @rowcount = @@rowcount;
            SET @counter = @counter - 1;

            IF ( @counter % 1000 ) = 0
            BEGIN
                SET @errmessage = 'Counter down to '
                    + CAST(@counter AS VARCHAR(20)) + ' and deleted this loop=' + CAST(@rowcount AS VARCHAR(20))
                    + ' at ' + CAST(GETDATE() AS VARCHAR(20));
                RAISERROR(@errmessage, 10, 1) WITH NOWAIT;
            END;

            COMMIT;

            SELECT @logpercentage = cntr_value
            FROM sys.dm_os_performance_counters
            WHERE instance_name = 'collateralmanager'
              AND counter_name = 'Percent Log Used';

            IF @logpercentage > 30
            BEGIN
                SET @msg = ' log more than 30 percent full, waiting 5 min';
                RAISERROR(@msg, 10, 1) WITH NOWAIT;
                WAITFOR DELAY '00:05:00';
            END;
        END TRY
        BEGIN CATCH
            -- There was an error
            IF @@TRANCOUNT > 0
                ROLLBACK;

            -- Raise an error with the details of the exception
            SELECT @ErrMsg = ERROR_MESSAGE(),
                   @ErrSeverity = ERROR_SEVERITY();

            RAISERROR(@ErrMsg, @ErrSeverity, 1);

            --DECLARE @emailmessage VARCHAR(350)
            --SET @emailmessage = 'Purge script failed: ' + @errmsg
            --EXECUTE msdb.dbo.sp_notify_operator @name=N'TestOperator', @body=@emailmessage
        END CATCH;

        -----WAITFOR DELAY '00:00:05';
    END;

    IF OBJECT_ID('tempdb..#tempdata') IS NOT NULL
        DROP TABLE #tempdata;

    -- End PurgeProd_Address
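    One note on the commented-out range variant: BETWEEN needs the low bound first, and the counter would then have to step down by the batch size rather than by 1. A rough sketch of that variant (batch size of 100 assumed):

    -- Sketch only, not part of the original job
    DELETE FROM collateralmanager.DBO.ADDRESS
    WHERE ADDRESS.ADDRESS_ID IN ( SELECT ADDRESS_ID
                                  FROM #tempdata
                                  WHERE [ID] BETWEEN @counter - 99 AND @counter );

    SET @counter = @counter - 100;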

  • Attached plans for the alternative query. The "top 5 million" version is still running. To change this delete logic I'd have to run it through QA which could take weeks, but worth testing anyway. The new indexes on early payoff would also take weeks to get to prod.

    -- 20k in 10 minutes in prod without new indexes
    SELECT TOP (500000) ADDRESS_ID
    FROM dbo.ADDRESS
    WHERE NOT EXISTS (SELECT * FROM EARLY_PAYOFF_INFO EPI1 WHERE EPI1.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM EARLY_PAYOFF_INFO EPI2 WHERE EPI2.ADDRESS_ID1 = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM EARLY_PAYOFF_INFO EPI3 WHERE EPI3.ADDRESS_ID2 = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM CHECK_REQUEST_DEFAULT_ADDRESS WHERE CHECK_REQUEST_DEFAULT_ADDRESS.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM DIRECT_LENDING_SERVICE WHERE DIRECT_LENDING_SERVICE.PREV_LIENHOLDER_MAIL_TO_ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM PACKET WHERE PACKET.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM WORK_QUEUE_ITEM WHERE WORK_QUEUE_ITEM.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM LEGAL_ENTITY WHERE LEGAL_ENTITY.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM SERVICE_REQUEST WHERE SERVICE_REQUEST.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM LIENHOLDER WHERE LIENHOLDER.ADDRESS_ID = ADDRESS.ADDRESS_ID)
    ORDER BY ADDRESS_ID;

  • That delete logic looks pretty inefficient. I think you can use this as the delete statement:

    -- Delete first 1,000 addresses
    DELETE FROM collateralmanager.DBO.ADDRESS
    WHERE ADDRESS_ID IN ( SELECT TOP (1000) ADDRESS_ID
                          FROM #tempdata
                          ORDER BY ADDRESS_ID );

    -- Remove those same addresses from the temp table
    DELETE FROM #tempdata
    WHERE ADDRESS_ID IN ( SELECT TOP (1000) ADDRESS_ID
                          FROM #tempdata
                          ORDER BY ADDRESS_ID );

    You also don't need a MAXDOP hint on the delete; as far as I know, a delete is always executed single-threaded. (And with the single-row deletes in your version, or the 1,000-row deletes in my example, you'd get a serial plan even if my memory is incorrect.)

    This will allow you to drop the identity column, and subsequently the ORDER BY in the query that you are actually having trouble with. However, as mentioned before, I do not expect this to matter much since that ordering is already used by the plan for all the merge joins.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Thanks again Hugo. I will test all of those.

  • Thanks for posting the new plans. 20K in 10 minutes is definitely an improvement and I can imagine that you are happy with this. But I think this can go faster still.

    If I see it correctly, the plan you posted is for the code Orlando posted, without the modification I suggested to replace the TOP with ranges of AddressID values. Correct?

    For almost all the tables, I see a huge difference between estimated and actual rows. This can be due to seriously outdated statistics, and I would definitely suggest that you look into that - check how old they are, verify that auto_update_statistics is on and/or that you have statistics maintenance scheduled. But I think that in this case, the actual cause is something else. Especially since all tables that have to be read completely have a perfect estimate, and all tables that only have to be read until the 20,000th row is found have a serious under-estimate.
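    For a quick check, something along these lines (a sketch; the database name is taken from the Percent Log Used query) shows when each statistic on the ADDRESS table was last updated and whether auto-update is enabled:

    -- Sketch: statistics age on dbo.ADDRESS and the auto-update setting
    SELECT s.name AS stats_name,
           STATS_DATE(s.object_id, s.stats_id) AS last_updated
    FROM sys.stats AS s
    WHERE s.object_id = OBJECT_ID('dbo.ADDRESS');

    SELECT name, is_auto_update_stats_on
    FROM sys.databases
    WHERE name = 'collateralmanager';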

    This plan is for a TOP(20000), so SQL Server attempts to estimate what it needs to do in order to find 20000 rows that meet the criteria (all the different NOT EXISTS conditions). For the main (ADDRESS) table, all the way on the top right, SQL Server estimates that it will have to process approximately 800,000 rows before it finds the 20,000th matching row; at that point it can stop working. In reality, as shown by the actual rowcount, it is processing almost 500 million rows. Apparently, you have a huge amount of rows in the table (or at least at the "start" of the table when accessing in Address_ID order) that fail one of the NOT EXISTS tests. My version of the code that is based on the AddressID ranges would still process those rows, but only in one of the iterations. Yours will process them again and again in every iteration.

    I also noticed that the order in which SQL Server tests the various NOT EXISTS clauses matches the order of these clauses in your query exactly. This can be because, based on the information the optimizer has, this order is deemed the most efficient. Or it can be because the optimizer has no idea and just leaves the order untouched. However, looking at the actual rowcounts, I see that the biggest reduction is made based on the last three tables checked: Legal_Entity, Service_Request, and LienHolder. If you move those three NOT EXISTS clauses to the start of the query, you might gain even more performance (but test it, I cannot be sure). And that same performance gain might well apply to both the TOP version of the query and the AddressID range version.
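    For illustration, that reordering would look something like the sketch below (test it; the optimizer is free to choose its own order, so this is not guaranteed to change the plan):

    -- Sketch only: the same query with the three most selective NOT EXISTS tests first
    SELECT TOP (20000) ADDRESS_ID
    FROM dbo.ADDRESS
    WHERE NOT EXISTS (SELECT * FROM LEGAL_ENTITY WHERE LEGAL_ENTITY.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM SERVICE_REQUEST WHERE SERVICE_REQUEST.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM LIENHOLDER WHERE LIENHOLDER.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM EARLY_PAYOFF_INFO EPI1 WHERE EPI1.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM EARLY_PAYOFF_INFO EPI2 WHERE EPI2.ADDRESS_ID1 = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM EARLY_PAYOFF_INFO EPI3 WHERE EPI3.ADDRESS_ID2 = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM CHECK_REQUEST_DEFAULT_ADDRESS WHERE CHECK_REQUEST_DEFAULT_ADDRESS.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM DIRECT_LENDING_SERVICE WHERE DIRECT_LENDING_SERVICE.PREV_LIENHOLDER_MAIL_TO_ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM PACKET WHERE PACKET.ADDRESS_ID = ADDRESS.ADDRESS_ID)
      AND NOT EXISTS (SELECT * FROM WORK_QUEUE_ITEM WHERE WORK_QUEUE_ITEM.ADDRESS_ID = ADDRESS.ADDRESS_ID)
    ORDER BY ADDRESS_ID;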

    I understand the importance and relevance of QA, but a delay of weeks is long. Especially for something as (relatively) simple as adding an index. Perhaps if you can tune this further in QA to get a really impressive speed gain, you can then convince management to push it through faster? (Of course, I don't know how much your databases and your company are hurt by currently not being able to purge - if it doesn't hurt sufficiently, then you will have a harder time convincing management).


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Well, I certainly hope sqlservercentral.com doesn't lose any of this, as it will take a while to go through. 🙂 We do have extensive index maintenance, including update statistics with full scan, run for a few hours every night and longer on Saturdays. I'll get back on this later today.

    Creating the complete temp table in prod (old query form, no TOP, no new indexes, no ORDER BY): 18 minutes.

    -- Production: creating a TOP 1 temp table with the old query form and no ORDER BY: 73 seconds (the new indexes on Early_Payoff do not exist).

    New query form:

    TOP 20k in 10 minutes in prod without the new indexes, including the ORDER BY.

    All 5 million in 10 minutes in prod without the new indexes, including the ORDER BY.

  • Great analysis Hugo. I am learning by watching your thought process on this one. Thanks.

    Hugo Kornelis (2/14/2016)


    Here are my thoughts, based on the query, the execution plan you posted, and the comments. I have not looked at the CREATE TABLE / CREATE INDEX script; it is too long to analyze within the maximum time I can justify spending on unpaid work.

    First: The plan shows that all tables you used are read completely. All except two are read using an index on Address_ID, which matches both the ORDER BY order and the join column used for all joins, which enables SQL Server to use the relatively cheap Merge Join operator. The only exceptions are two of the three Early_Payoff_Info tables. Apparently there is an index on Address_ID2 in this table, but not on Address_ID and Address_ID1, so here SQL Server has to read the clustered index and sort the data. It's just two sorts of 727K rows each, so it will not have a very big impact, but you probably will gain performance by indexing those two columns. (Orlando posted this already, although he also includes an index on the Address_ID2 column - that one is not needed, you already have that index.)

    I missed its existence because, while the table definition for EARLY_PAYOFF_INFO and several of its index definitions were provided, the one in question on Address_ID2 (index_Early_Payoff_Info_Address_Id2) was not. I now see the reference to that index in the plan as an index scan, but with little cost. I suspect SQL Server would rather scan the narrow index than the clustered index, so even if adding the other two indexes on Address_ID still results in scans, they should be better (relative) scans.
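    For reference, the two suggested indexes would be along these lines (index names here are made up; the existing index on Address_ID2 stays as it is):

    -- Sketch of the two missing single-column indexes
    CREATE NONCLUSTERED INDEX IX_EARLY_PAYOFF_INFO_ADDRESS_ID
        ON dbo.EARLY_PAYOFF_INFO (ADDRESS_ID);

    CREATE NONCLUSTERED INDEX IX_EARLY_PAYOFF_INFO_ADDRESS_ID1
        ON dbo.EARLY_PAYOFF_INFO (ADDRESS_ID1);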

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Here is another query form I worked up that should allow you to better guide the QO into filtering data in table-specific order, rather than giving it a flat WHERE clause and leaving the QO to its own devices. What I am seeing is that by changing the order in which the tables are checked across the CTEs, the estimates change such that the Top operations change table order; however, this could be an anecdotal effect. Without your data (let alone your statistics), I'll have to wait and see whether you can check it out and whether it results in a material improvement on your system. Note that I added dbo to all tables as well; in general, you should start doing that everywhere.

    Note that I wrote the query so you can easily swap entire lines containing a WHERE clause between CTEs; just watch out for column-name differences like PREV_LIENHOLDER_MAIL_TO_ADDRESS_ID.

    with cte1 as (SELECT TOP (20000) parent.ADDRESS_ID
                  FROM dbo.ADDRESS parent
                  WHERE NOT EXISTS (SELECT * FROM dbo.EARLY_PAYOFF_INFO child WHERE child.ADDRESS_ID = parent.ADDRESS_ID)),
         cte2 as (SELECT parent.ADDRESS_ID
                  FROM cte1 parent
                  WHERE NOT EXISTS (SELECT * FROM dbo.EARLY_PAYOFF_INFO child WHERE child.ADDRESS_ID1 = parent.ADDRESS_ID)),
         cte3 as (SELECT parent.ADDRESS_ID
                  FROM cte2 parent
                  WHERE NOT EXISTS (SELECT * FROM dbo.EARLY_PAYOFF_INFO child WHERE child.ADDRESS_ID2 = parent.ADDRESS_ID)),
         cte4 as (SELECT parent.ADDRESS_ID
                  FROM cte3 parent
                  WHERE NOT EXISTS (SELECT * FROM dbo.CHECK_REQUEST_DEFAULT_ADDRESS child WHERE child.ADDRESS_ID = parent.ADDRESS_ID)),
         cte5 as (SELECT parent.ADDRESS_ID
                  FROM cte4 parent
                  WHERE NOT EXISTS (SELECT * FROM dbo.DIRECT_LENDING_SERVICE child WHERE child.PREV_LIENHOLDER_MAIL_TO_ADDRESS_ID = parent.ADDRESS_ID)),
         cte6 as (SELECT parent.ADDRESS_ID
                  FROM cte5 parent
                  WHERE NOT EXISTS (SELECT * FROM dbo.PACKET child WHERE child.ADDRESS_ID = parent.ADDRESS_ID)),
         cte7 as (SELECT parent.ADDRESS_ID
                  FROM cte6 parent
                  WHERE NOT EXISTS (SELECT * FROM dbo.WORK_QUEUE_ITEM child WHERE child.ADDRESS_ID = parent.ADDRESS_ID)),
         cte8 as (SELECT parent.ADDRESS_ID
                  FROM cte7 parent
                  WHERE NOT EXISTS (SELECT * FROM dbo.LEGAL_ENTITY child WHERE child.ADDRESS_ID = parent.ADDRESS_ID)),
         cte9 as (SELECT parent.ADDRESS_ID
                  FROM cte8 parent
                  WHERE NOT EXISTS (SELECT * FROM dbo.SERVICE_REQUEST child WHERE child.ADDRESS_ID = parent.ADDRESS_ID)),
         cte10 as (SELECT parent.ADDRESS_ID
                   FROM cte9 parent
                   WHERE NOT EXISTS (SELECT * FROM dbo.LIENHOLDER child WHERE child.ADDRESS_ID = parent.ADDRESS_ID))
    SELECT *
    FROM cte10;

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Indianrock (2/14/2016)


    Well, I certainly hope sqlservercentral.com doesn't lose any of this, as it will take a while to go through. 🙂 We do have extensive index maintenance, including update statistics with full scan, run for a few hours every night and longer on Saturdays. I'll get back on this later today.

    Creating the complete temp table in prod (old query form, no TOP, no new indexes, no ORDER BY): 18 minutes.

    -- Production: creating a TOP 1 temp table with the old query form and no ORDER BY: 73 seconds (the new indexes on Early_Payoff do not exist).

    New query form:

    TOP 20k in 10 minutes in prod without the new indexes, including the ORDER BY.

    All 5 million in 10 minutes in prod without the new indexes, including the ORDER BY.

    If you're populating a temp table with all address IDs that are confirmed for deletion, then the ORDER BY is absolutely wasted overhead.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Deleting addresses from the database does not look like a bright idea.

    There should not be such a task in the first place.

    It only makes sense if an address can never be reused after the entity it was used for has been deleted.

    Which means: each new entry into any correlated table inserts a new record into the Address table without checking whether such an address has already been recorded.

    And it must be done in "free text" form, with no formatting or validation applied to the address inserted into the database.

    If that's the case, then it's better to have addresses recorded in the correlated tables without wasting resources on joins to the Address table.

    Such fake normalisation is much worse than not having normalisation at all.

    I don't know how big your influence on the development process is, but I'd suggest directing your efforts at changing the data entry process so that it looks up an existing address and reuses its ID for new entries, rather than continuously chasing the tails created by the bad design.
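    A very rough sketch of that look-up-and-reuse idea is below; the address columns (LINE1, CITY, STATE_CODE, ZIP) are placeholders, not the real schema, and it assumes ADDRESS_ID is an identity column:

    -- Sketch only: reuse an existing ADDRESS_ID where possible, insert otherwise
    DECLARE @line1 VARCHAR(100) = '123 Main St',
            @city  VARCHAR(60)  = 'Springfield',
            @state CHAR(2)      = 'IL',
            @zip   VARCHAR(10)  = '62701',
            @address_id INT;

    BEGIN TRANSACTION;

    SELECT @address_id = ADDRESS_ID
    FROM dbo.ADDRESS WITH ( UPDLOCK, HOLDLOCK )  -- serialise concurrent inserts of the same address
    WHERE LINE1 = @line1 AND CITY = @city AND STATE_CODE = @state AND ZIP = @zip;

    IF @address_id IS NULL
    BEGIN
        INSERT INTO dbo.ADDRESS ( LINE1, CITY, STATE_CODE, ZIP )
        VALUES ( @line1, @city, @state, @zip );

        SET @address_id = SCOPE_IDENTITY();
    END;

    COMMIT;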

    _____________
    Code for TallyGenerator

  • The application and database were developed by a .NET development team ten years ago. A re-write project is about to start. We are an outsourcing destination for banks and credit unions that want someone else to handle much of their business after a car loan is made. I suspect the addresses are a mix of automobile dealerships and car buyers.

    I'm just trying to fix some purge jobs that weren't performing well.

    ADDRESS

    Records: 497,032,328
    Can be deleted: 4,489,351

  • Latest test in QA (somewhat slow systems). This uses the original criteria to find "delete-able" records, a simplified temp table with just the single ADDRESS_ID column, TOP 100k records and no ORDER BY when creating the temp table, and Hugo's modified delete statements.

    The temp table took 50 minutes (which I could live with, but will test in production); after that, the deletes went pretty rapidly. Plan attached.

    -- Delete first 1,000 addresses
    DELETE FROM cmsperformance3.dbo.ADDRESS
    WHERE ADDRESS_ID IN ( SELECT TOP ( 1000 ) ADDRESS_ID
                          FROM #tempdata
                          ORDER BY ADDRESS_ID );

    SET @rowcount = @@rowcount;
    SET @counter = @counter - @rowcount;

    -- Remove those same addresses from the temp table
    DELETE FROM #tempdata
    WHERE ADDRESS_ID IN ( SELECT TOP ( 1000 ) ADDRESS_ID
                          FROM #tempdata
                          ORDER BY ADDRESS_ID );

    Starting at Feb 14 2016 1:30PM

    temp table for Address has 100000 at Feb 14 2016 2:19PM

    Counter down to 99000 and deleted this loop=1000 at Feb 14 2016 2:19PM

    Counter down to 98000 and deleted this loop=1000 at Feb 14 2016 2:19PM

    Counter down to 97000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 96000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 95000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 94000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 93000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 92000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 91000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 90000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 89000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 88000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 87000 and deleted this loop=1000 at Feb 14 2016 2:20PM

    Counter down to 86000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 85000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 84000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 83000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 82000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 81000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 80000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 79000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 78000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 77000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 76000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 75000 and deleted this loop=1000 at Feb 14 2016 2:21PM

    Counter down to 74000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 73000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 72000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 71000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 70000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 69000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 68000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 67000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 66000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 65000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 64000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 63000 and deleted this loop=1000 at Feb 14 2016 2:22PM

    Counter down to 62000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 61000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 60000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 59000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 58000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 57000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 56000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 55000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 54000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 53000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 52000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 51000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 50000 and deleted this loop=1000 at Feb 14 2016 2:23PM

    Counter down to 49000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 48000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 47000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 46000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 45000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 44000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 43000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 42000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 41000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 40000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 39000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 38000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 37000 and deleted this loop=1000 at Feb 14 2016 2:24PM

    Counter down to 36000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 35000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 34000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 33000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 32000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 31000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 30000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 29000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 28000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 27000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 26000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 25000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 24000 and deleted this loop=1000 at Feb 14 2016 2:25PM

    Counter down to 23000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 22000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 21000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 20000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 19000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 18000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 17000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 16000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 15000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 14000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 13000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 12000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 11000 and deleted this loop=1000 at Feb 14 2016 2:26PM

    Counter down to 10000 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 9000 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 8000 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 7000 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 6000 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 5000 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 4000 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 3000 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 2000 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 1000 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 0 and deleted this loop=1000 at Feb 14 2016 2:27PM

    Counter down to 0 and deleted this loop=0 at Feb 14 2016 2:27PM

  • Indianrock (2/14/2016)


    A re-write project is about to start.

    I believe it's a perfect opportunity to address the issue.

    I'm just trying to fix some purge jobs that weren't performing well.

    ADDRESS

    Records: 497,032,328
    Can be deleted: 4,489,351

    In this case the best solution would be simply to stop the job.

    Removing 1% of the records as an ultimate goal does not make much sense. Achieving it won't make any difference at all.

    The purging would do more damage (in terms of resource consumption, locking, etc.) than any benefit it could add.

    Half a billion addresses - how big are the car dealerships and credit unions you work for?

    How many countries do they cover?

    I guess if you analyse the data stored in the Address table, apply some formatting standards, and eliminate duplicates, the total number of remaining records would hardly be over a million.

    Before they start attacking me - do not alter data in the Prod table. Copy unique, properly formatted address records to a newly created table.

    I must say: without changing the way the application records addresses, this can only be a "proof of concept" exercise. But if you get it right, it will help you get rid of a number of useless, time-consuming maintenance jobs.

    _____________
    Code for TallyGenerator
