October 19, 2021 at 8:02 pm
I believe that SQL Server's fast-forward cursor is a great tool to use if, and only if, multiple actions need to be taken on each record returned.
Opening Pandora's box on this one. I can't remember the last time I used one. Almost all code that needs one can be resolved with a set-based solution, but there are the outliers.
For better, quicker answers, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
October 19, 2021 at 8:19 pm
I indicated that the need for a cursor is rare, but you need to be willing to use one. If you need to update 6 tables based on the data in a 7th table, SQL Server does not have a multi-table UPDATE statement. How would you do this with one set-based statement?
October 20, 2021 at 4:26 am
I would not fixate on sticking to one set-based statement as much as I would on transactions. Each loop in a cursor can be thought of as a transaction, whereas I could update multiple tables with parallel transactions from one source (no exclusive lock needed).
Such a use case would be updating multiple test environments to be in sync.
----------------------------------------------------
October 20, 2021 at 9:24 am
@hmbacon: you could either do 6 UPDATEs, each joining the base table, or - if the SELECT on the base table is slow or difficult - write the IDs that need to be updated into a #tmp table and join to the #tmp.
When you have to work with 100 million rows, you could do it with one set-based statement that runs for 3 hours (bad when it fails for whatever reason), or break it up into smaller chunks of maybe 500k rows. Of course it would cripple your system if you wrote a cursor that performs single-row actions, or worked with only 10 rows at a time in this case, but the "smoother" running of a good-sized chunk versus one gigantic monolithic statement often makes it a viable solution.
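To make that concrete, here is a minimal sketch of the #tmp-plus-chunks idea; every name in it (dbo.source_table, dbo.target_table_1, the id and some_col columns, the 500k batch size) is a made-up placeholder rather than anything from the posts above:

-- All object and column names below are placeholders.
-- Collect the keys once, then work through them in ~500k-row chunks,
-- updating every target table for each chunk before moving on.
SELECT s.id
INTO   #work
FROM   dbo.source_table AS s
WHERE  s.needs_update = 1;          -- whatever identifies the rows to sync

CREATE CLUSTERED INDEX cx_work ON #work (id);

DECLARE @batch TABLE (id INT PRIMARY KEY);

WHILE 1 = 1
BEGIN
    DELETE TOP (500000) FROM #work
    OUTPUT deleted.id INTO @batch;  -- grab the next chunk of keys

    IF @@ROWCOUNT = 0
        BREAK;                      -- nothing left to process

    UPDATE t1
    SET    t1.some_col = s.some_col
    FROM   dbo.target_table_1 AS t1
    JOIN   dbo.source_table   AS s ON s.id = t1.id
    JOIN   @batch             AS b ON b.id = t1.id;

    -- ...repeat the UPDATE for target_table_2 through target_table_6...

    DELETE FROM @batch;             -- empty the chunk before the next pass
END

Each chunk commits on its own, so a failure part-way through costs one chunk of rework instead of a three-hour rollback.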
God is real, unless declared integer.
October 20, 2021 at 10:06 am
Each loop in a cursor can be thought of as a transaction
..Can be thought of? Are you saying that if a loop in a cursor contains multiple actions, and any one of those actions fails, that all of the other actions are rolled back? The following code suggests not:
DROP TABLE IF EXISTS #TranTest;

CREATE TABLE #TranTest
(
    n INT NOT NULL
);

INSERT #TranTest (n)
VALUES
    (1)
   ,(2)
   ,(3);

DECLARE @n INT;

DECLARE test_cursor CURSOR STATIC FORWARD_ONLY READ_ONLY LOCAL FOR
SELECT tt.n
FROM #TranTest tt;

OPEN test_cursor;

BEGIN TRY
    FETCH NEXT FROM test_cursor
    INTO @n;

    WHILE (@@FETCH_STATUS = 0)
    BEGIN
        --BEGIN TRAN;
        INSERT #TranTest (n)
        VALUES
            (@n + 10);

        IF @n = 2
            PRINT 1 / 0; -- deliberate divide-by-zero to raise an error on the second iteration

        --COMMIT;
        FETCH NEXT FROM test_cursor
        INTO @n;
    END;

    CLOSE test_cursor;
    DEALLOCATE test_cursor;
END TRY
BEGIN CATCH
    IF XACT_STATE() <> 0
        ROLLBACK;

    SELECT *
    FROM #TranTest tt;

    CLOSE test_cursor;
    DEALLOCATE test_cursor;
END CATCH;
If you uncomment the BEGIN TRAN and COMMIT rows and run this, the value '12' does not appear in the final result set.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
October 22, 2021 at 5:24 am
Thank you for that piece of information. However, my comment centered specifically on the post about inserting into multiple tables from one source, and how that can be done in parallel (each with its own transaction). This is similar, though not identical, to using a cursor to achieve the same outcome. Hence the abstraction "can be thought of" in this context.
----------------------------------------------------
August 30, 2024 at 7:18 am
I still like this solution when I need to delete a lot of data:
https://michaeljswart.com/2014/09/take-care-when-scripting-batches/
August 30, 2024 at 3:18 pm
If I knew I needed a regular purge job based on record creation datetime then I would be looking at whether that datetime column should be a candidate for the clustered index.
One of the things I miss having moved to other DB platforms is that SQL Server allows a clustered index to be something other than the primary key.
If the clustered index is on any form of sequential column then there is the possibility of a WHERE column BETWEEN range delete (sketched below). The downside to that is if the sequential column values rotate.
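As a rough illustration of that range-style delete, here is a minimal sketch; the dbo.audit_log table, its CreatedAt clustering column, the daily window and the 60-day retention are all assumptions for the example:

-- Assumed names and retention; walk the clustering key in fixed windows so every
-- delete is a narrow, index-ordered range scan instead of one huge statement.
DECLARE @cutoff DATETIME2(3) = DATEADD(DAY, -60, SYSDATETIME());
DECLARE @from   DATETIME2(3) = (SELECT MIN(CreatedAt) FROM dbo.audit_log);
DECLARE @to     DATETIME2(3);

WHILE @from IS NOT NULL AND @from < @cutoff
BEGIN
    SET @to = DATEADD(DAY, 1, @from);

    DELETE FROM dbo.audit_log
    WHERE CreatedAt >= @from
      AND CreatedAt <  @to
      AND CreatedAt <  @cutoff;   -- never overshoot the retention boundary

    SET @from = @to;
END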
At a previous employer we used partition switching to do massive deletes.
As Jeff mentioned earlier, it's worth working out whether it would be more efficient to copy the records you need, switch tables and truncate the original.
Back in the days before partition switching we used an approach that had physical tables aligned to date/time segments. There was a view over the top of the tables that did a UNION ALL so the month-end job created a new table, renamed the previous tables and did an sp_refreshview. Sounds horribly clunky now but it was a reliable process.
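For anyone who has not used the switch-and-truncate trick mentioned above, a minimal sketch might look like this; the names are hypothetical, and the staging table must be empty, identically structured and indexed, and on the same filegroup as the partition being switched out:

-- Hypothetical names: dbo.big_stuff is partitioned by pf_big_stuff_function on date_key;
-- dbo.big_stuff_stage is an empty copy of its structure on the matching filegroup.
DECLARE @p INT = $PARTITION.pf_big_stuff_function('2014-01-31');  -- partition holding the data to purge

ALTER TABLE dbo.big_stuff
    SWITCH PARTITION @p TO dbo.big_stuff_stage;  -- metadata-only move, effectively instant

TRUNCATE TABLE dbo.big_stuff_stage;              -- the "delete" happens here, also metadata-only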
August 30, 2024 at 3:28 pm
At a previous employer we used partition switching to do massive deletes.
when you already have partitions, you could simply do a
TRUNCATE TABLE dbo.big_stuff WITH (PARTITIONS (1 TO $partition.pf_big_stuff_function(date_key)));
Or instead of using 1 TO you could just provide a single partition number, which you either get via the $partition syntax or e.g. from a CURSOR joining sys.partition_functions and sys.partition_range_values (boundary_id corresponds to the partition number).
If there are some rows for that day that must not be deleted, e.g. for legal reasons, you could insert them into a #tmp table and write them back after the TRUNCATE.
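A minimal sketch of that keep-and-put-back idea, using hypothetical names (a legal_hold flag on dbo.big_stuff, the pf_big_stuff_function partition function) and building the single-partition TRUNCATE as dynamic SQL so the computed partition number can be spliced in:

-- Hypothetical names and columns throughout.
DECLARE @purge_date DATE = '2024-08-30';

-- Park the rows that must survive (assumes no IDENTITY column; adjust column lists if needed).
SELECT *
INTO   #keep
FROM   dbo.big_stuff
WHERE  date_key = @purge_date
  AND  legal_hold = 1;

-- Work out which partition holds that day and truncate just that partition.
DECLARE @p INT = $PARTITION.pf_big_stuff_function(@purge_date);
DECLARE @sql NVARCHAR(200) =
        N'TRUNCATE TABLE dbo.big_stuff WITH (PARTITIONS (' + CAST(@p AS NVARCHAR(10)) + N'));';
EXEC sys.sp_executesql @sql;

-- Write the protected rows back and clean up.
INSERT dbo.big_stuff
SELECT *
FROM   #keep;

DROP TABLE #keep;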
God is real, unless declared integer.
August 30, 2024 at 5:19 pm
Your reply reminded me of a problem I ran into long ago. Although it was Oracle, my thoughts were similar to your questions. It was a big table that became unmanageable over time. I remember the whole IT department getting a collective case of PTSD whenever they needed to delete a large amount of data from it, and the only time window in the year for the deletion turned out to be almost too short. It would have been a disaster if the bank had to suspend cash operations at opening time.
How could a bank end up with such a badly designed database system? Did anyone ask the proper questions at the start of the database design? Didn't anyone expect multimillion-row tables at a large bank?
When I read this article, I thought the same way you did. OK, here are x million rows; we must delete 30 million of them. I understand this is an imaginary situation; they don't need to care about why it happened or the other circumstances. But you can't answer "How to Delete Large Amounts of Data" without those answers. I have heard the critique a few times: you spent too long analysing the problem. Maybe they were right, but I'm sure I won't start with a simple DELETE, not even with 5,000 or 10,000 rows at a time.
August 30, 2024 at 7:16 pm
We had the same problem today.
I wrote a script that does the following:
The script cleaned up 36 million records in a fraction of a second.
September 1, 2024 at 11:23 am
This is a very relevant article; far too often code is written in such a way that log space is exhausted by attempting to delete too much data in one action. Knowing the "sweet spot" for how many rows can be deleted quickly is crucial. This will differ based on the construction of the table: number of columns, data types, amount of data per row, etc.
Here is a simple code I use for handling such deletes.
DECLARE @CT INT; -- number of rows still waiting to be deleted
SET @CT = (SELECT COUNT(ID) FROM dbo.[Table] WHERE CreateDate < GETDATE() - 60);

WHILE @CT > 0
BEGIN
    DELETE TOP (4000) FROM dbo.[Table] WHERE CreateDate < GETDATE() - 60;
    SET @CT = @CT - @@ROWCOUNT; -- count down so the loop stops once everything is gone
END
In this scenario, 4000 is the optimal number of rows to delete in a single pass. This simple loop will quickly and easily cycle through the data to be deleted without bloating the log file. Each cycle is fully committed, so in the case of a rollback only the current batch of 4000 records would need to be rolled back.
Yes, I fully understand there are other ways to code this, and that is what makes SQL such a great tool. The goal is to produce the end result in an efficient manner with minimal impact. I hope this simple tip helps you become better at your job.
September 1, 2024 at 11:58 am
Sorry, but this script isn't really good in my opinion.
God is real, unless declared integer.
September 1, 2024 at 12:18 pm
Thank you for your feedback Thomas, but to clarify, my script was based on the example used in the original article. At that time, the originating author showed "4000" to be the optimal number of records to delete in his example. I was simply offering a very valid solution based on his example. As I stated, there are numerous ways to achieve the same result, some will just have less impact (resource wise, performance wise, etc) than others.
September 3, 2024 at 2:21 pm
when you already have partitions, you could simply do a
TRUNCATE TABLE dbo.big_stuff WITH (PARTITIONS (1 TO $partition.pf_big_stuff_function(date_key)));
I leverage partitions to truncate data on large tables as well, and I love your implementation that dynamically gets the set of partition IDs to be truncated using the partition function.
"Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho