Delete takes a long time

  • CREATE TABLE [dbo].[t_Transactions](
        [ID] [int] IDENTITY(1,1) NOT NULL,
        [AccountID] [int] NULL,
        [TransactionID] [varchar](255) NULL,
        [TransactionIDAlt] [nvarchar](50) NULL,
        [TransactionIDSpecific] [varchar](255) NULL,
        [Hash] [varbinary](20) NULL,
        [Narrative] [nvarchar](510) NULL,
        [ActivityProfile] [int] NULL,
        [Amount] [money] NULL,
        [TransactionFee] [decimal](10, 2) NULL,
        [SaturationCount] [int] NULL,
        CONSTRAINT [PK_t_Transactions] PRIMARY KEY CLUSTERED
        (
            [ID] ASC
        ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO

    CREATE NONCLUSTERED INDEX [NCI_t_TransactionsHash] ON [dbo].[t_Transactions]
    (
        [Hash] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

    CREATE NONCLUSTERED INDEX [NCI_t_TransactionsTransactionID] ON [dbo].[t_Transactions]
    (
        [TransactionID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

    CREATE NONCLUSTERED INDEX [IX_t_TransactionsTransactionIDSpecific] ON [dbo].[t_Transactions]
    (
        [AccountID] ASC,
        [TransactionIDAlt] ASC,
        [TransactionID] ASC
    )
    INCLUDE ([TransactionIDSpecific], [ActivityProfile], [Amount])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

    CREATE NONCLUSTERED INDEX [IX_t_TransactionsTransactionID] ON [dbo].[t_Transactions]
    (
        [AccountID] ASC,
        [TransactionID] ASC
    )
    INCLUDE ([Amount])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

    CREATE PROCEDURE [dbo].[DeleteAccount]
        @AccountID INT
    AS
    DELETE
    FROM dbo.t_Transactions
    WHERE AccountID = @AccountID

    In this table we keep about 7 million rows for approximately 600 accounts. Some accounts have 10 or fewer transactions; some have half a million. Several times a day, for every account, we refresh the account by deleting all of its transactions and bulk inserting fresh transactions from a CSV. In the majority of cases the transaction IDs and much of the other information are identical in value to what was deleted, and approximately the same number of rows can be expected. The delete takes a disproportionate amount of time, increasingly so with each index we added. Cardinality on AccountID + TransactionID is very high. What could be done to increase the processing speed? Ideally I would like to have the effect of a TRUNCATE command, but I cannot erase the accounts which I am not ready to bulk insert at that particular moment. Any ideas? Thanks!

  • You could bulk insert into a staging table and insert the missing transactions. If needed, you could update the existing transactions.

    If the ID column is not really used, I would change the clustered index to something more meaningful, such as the key of IX_t_TransactionsTransactionID. Of course, this should be tested properly.
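
    As a rough illustration (sketch only, untested; the index name CIX_t_Transactions_Account_Transaction is invented here), moving the cluster to (AccountID, TransactionID) while keeping the primary key as nonclustered could look like this:

    -- Sketch only: assumes nothing references PK_t_Transactions and a
    -- maintenance window is available; the nonclustered indexes get
    -- rebuilt as a side effect of each step.
    ALTER TABLE dbo.t_Transactions DROP CONSTRAINT [PK_t_Transactions];

    CREATE CLUSTERED INDEX [CIX_t_Transactions_Account_Transaction]
        ON dbo.t_Transactions ([AccountID] ASC, [TransactionID] ASC);

    ALTER TABLE dbo.t_Transactions
        ADD CONSTRAINT [PK_t_Transactions] PRIMARY KEY NONCLUSTERED ([ID] ASC);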

    Some indexes seem redundant. Do you really use them all?

    EDIT:

    Instead of using these 3 indexes:

    - NCI_t_TransactionsTransactionID

    - IX_t_TransactionsTransactionIDSpecific

    - IX_t_TransactionsTransactionID

    The following should cover all 3:

    CREATE NONCLUSTERED INDEX [IX_t_TransactionsTransactionIDSpecific] ON [dbo].[t_Transactions]
    (
        [TransactionID] ASC,
        [AccountID] ASC,
        [TransactionIDAlt] ASC
    )
    INCLUDE ([TransactionIDSpecific], [ActivityProfile], [Amount])
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO

    Or not. Test before implementing any suggestion.

    Luis C.
  • You could bulk insert into a staging table and insert the missing transactions.

    Sorry, I wasn't clear: I already bulk insert from the CSV into a staging table, then delete from this table and copy everything from staging into it.

    If needed, you could update the existing transactions.

    That would not help either. I would have to do one join on the update for the transactions found, a delete for the transactions not found, followed by an insert of those not found. This sounds like it would be a lot worse than it is now.

    If it's not used, I would change the clustered index to something more meaningful and used, such as the IX_t_TransactionsTransactionID. Of course, this should be tested correctly.

    It is used.

    Some indexes seem redundant. Do you really use them all?

    EDIT:

    Instead of using these 3 indexes:

    - NCI_t_TransactionsTransactionID

    - IX_t_TransactionsTransactionIDSpecific

    - IX_t_TransactionsTransactionID

    The following should cover all 3:

    1) Where is my hash index?

    2) I need to be able to access the table efficiently via AccountID "and" TransactionID, etc. The order of the columns varies between the indexes and is important. I need these four indexes as they are coded.

  • That would not help either. I would have to do one join on the update for the transactions found, a delete for transactions not found followed by an insert of those not found. This sounds like it would be a lot worse than it is now.

    It may sound 'a lot worse' (I presume that you are referring to process and code complexity here?) but it's probably also a lot faster, because database reads are so much faster than writes.

    This can be done in one hit using MERGE, by the way.
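
    Not tested against your data, but the MERGE would be along these lines (it assumes the staging table described elsewhere in this thread, that (AccountID, TransactionID) uniquely identifies a row, and an illustrative UPDATE column list):

    -- Sketch only: refreshes a single account from staging in one statement.
    -- @AccountID and dbo.Staging_Transactions are assumed from the rest of the thread.
    MERGE dbo.t_Transactions AS t
    USING dbo.Staging_Transactions AS s
        ON t.AccountID = s.AccountID
       AND t.TransactionID = s.TransactionID
    WHEN MATCHED THEN
        UPDATE SET t.Narrative       = s.Narrative,
                   t.ActivityProfile = s.ActivityProfile,
                   t.Amount          = s.Amount
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (AccountID, TransactionID, TransactionIDAlt, TransactionIDSpecific,
                Hash, Narrative, ActivityProfile, Amount, TransactionFee, SaturationCount)
        VALUES (s.AccountID, s.TransactionID, s.TransactionIDAlt, s.TransactionIDSpecific,
                s.Hash, s.Narrative, s.ActivityProfile, s.Amount, s.TransactionFee, s.SaturationCount)
    -- Restrict the delete branch to the account being refreshed; without this
    -- filter, every other account's rows would be deleted as well.
    WHEN NOT MATCHED BY SOURCE AND t.AccountID = @AccountID THEN
        DELETE;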

    If you find that the deletions specifically are the main problem, there is a possible alternative approach. Instead of physically deleting the rows, add an IsDeleted bit column to your table and set this to 1 for rows which are 'deleted'. This is known as a 'soft delete'. Obviously, you'll need to modify your queries to take account of the change afterwards.
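
    The schema side of a soft delete is small; a minimal sketch (the constraint name is invented here):

    -- Sketch only: existing rows default to 'not deleted'.
    ALTER TABLE dbo.t_Transactions
        ADD [IsDeleted] bit NOT NULL
            CONSTRAINT [DF_t_Transactions_IsDeleted] DEFAULT (0);

    -- A 'delete' of an account then becomes an update...
    UPDATE dbo.t_Transactions
    SET IsDeleted = 1
    WHERE AccountID = @AccountID;

    -- ...and existing queries need an extra predicate such as WHERE IsDeleted = 0.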

    Phil Parkin

  • Is TransactionID independent from AccountID? Or do you need to include AccountID to uniquely identify a transaction?

    How common is the use of the ID column?

    Do you have Foreign keys referencing this table?

    Do transactions get deleted between loads? Do they get modified?

    Luis C.
  • Is TransactionID independent from AccountID? Or do you need to include AccountID to uniquely identify a transaction?

    AccountID is needed to uniquely identify a transaction.

    How common is the use of the ID column?

    ID is used as the clustered index to avoid page splits on inserts. Otherwise it is not used at all.

    Do you have Foreign keys referencing this table?

    I do not have any foreign keys. I do have queries that join to this table on the columns specified in one of the indexes.

    Do transactions get deleted between loads? Do they get modified?

    No and no.

    Please note that when I drop the non-clustered indexes, performance is very good.
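
    For reference, one way to script that experiment is to disable the non-clustered indexes before the refresh and rebuild them afterwards (sketch only, using the index names from the DDL above):

    -- Sketch only: disable each nonclustered index (not the clustered one,
    -- which would make the whole table inaccessible), refresh, then rebuild.
    ALTER INDEX [NCI_t_TransactionsHash] ON dbo.t_Transactions DISABLE;
    ALTER INDEX [NCI_t_TransactionsTransactionID] ON dbo.t_Transactions DISABLE;
    ALTER INDEX [IX_t_TransactionsTransactionIDSpecific] ON dbo.t_Transactions DISABLE;
    ALTER INDEX [IX_t_TransactionsTransactionID] ON dbo.t_Transactions DISABLE;

    -- ... delete + bulk insert here ...

    ALTER INDEX [NCI_t_TransactionsHash] ON dbo.t_Transactions REBUILD;
    ALTER INDEX [NCI_t_TransactionsTransactionID] ON dbo.t_Transactions REBUILD;
    ALTER INDEX [IX_t_TransactionsTransactionIDSpecific] ON dbo.t_Transactions REBUILD;
    ALTER INDEX [IX_t_TransactionsTransactionID] ON dbo.t_Transactions REBUILD;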

  • That's why my first suggestion was to insert only the missing transactions. You reduce your I/O, especially for the accounts with lots of transactions.

    IF OBJECT_ID('dbo.Staging_Transactions') IS NOT NULL
        DROP TABLE [dbo].[Staging_Transactions];

    CREATE TABLE [dbo].[Staging_Transactions](
        [AccountID] [int],
        [TransactionID] [varchar](255),
        [TransactionIDAlt] [nvarchar](50) NULL,
        [TransactionIDSpecific] [varchar](255) NULL,
        [Hash] [varbinary](20) NULL,
        [Narrative] [nvarchar](510) NULL,
        [ActivityProfile] [int] NULL,
        [Amount] [money] NULL,
        [TransactionFee] [decimal](10, 2) NULL,
        [SaturationCount] [int] NULL
    )

    CREATE CLUSTERED INDEX [CI_Staging_Transactions] ON [dbo].[Staging_Transactions]
    (
        [AccountID] ASC,
        [TransactionID] ASC
    )

    --BULK INSERT [dbo].[Staging_Transactions] (the CSV load goes here)

    -- Insert only the rows that are not already in the target table
    INSERT INTO [dbo].[t_Transactions]
        ([AccountID], [TransactionID], [TransactionIDAlt], [TransactionIDSpecific],
         [Hash], [Narrative], [ActivityProfile], [Amount], [TransactionFee], [SaturationCount])
    SELECT
        [AccountID],
        [TransactionID],
        [TransactionIDAlt],
        [TransactionIDSpecific],
        [Hash],
        [Narrative],
        [ActivityProfile],
        [Amount],
        [TransactionFee],
        [SaturationCount]
    FROM [dbo].[Staging_Transactions] st
    WHERE NOT EXISTS (SELECT *
                      FROM t_Transactions t
                      WHERE st.AccountID = t.AccountID
                        AND st.TransactionID = t.TransactionID)

    I'm sure you can reduce the number of indexes, as they are what is slowing your inserts and deletes, but I don't have the full picture. I'd also say that ID is not a great clustered index if its only purpose is preventing page splits.

    Luis C.
