Temporal Tables - Delete Conflict

  • Well, I'm trying to figure out a way around another massive hole in a SQL Server feature.  I have temporal tables, yay!, now I can actually query my data and have it be temporally accurate....well, sort of, as long as you have a use case where it can actually work.

    I have a sales person, John, who is no longer with the company.  Great, we have this temporal table thingy, so we can simply delete John.  Now when we run all of our queries against sales people for everything that's current, John is not there, as it should be.  We can also run these same queries backwards in time using the FOR SYSTEM_TIME clause and John only appears during the time frames he was actually employed at the company.  Great!......sort of.......

    With a temporal table, when a DELETE is issued against current, we aren't actually saying "remove the row".  Instead, we are actually saying "end date this row such that it is not valid after this point in time".  There is a HUGE difference in these two and SQL Server absolutely fails in this regard.  How? In order to end date a row if you are using temporal tables, you have to issue a DELETE against the table in order to move it from current to history.  BUT, you can't delete John's employee record unless you also delete ALL children.  You did apply best practices and actually created foreign keys in your database right?  Foreign keys are enforced, so before you delete John, you must also delete all children.

    Well, John was an employee and we did pay him, several times.  While John has employee and other stuff tied to him such as his personal details, manager, sales region, etc., he also has stuff like customer orders, commissions, pay statements, etc.  Actually issuing a DELETE against the employee table to move John to history and end date his record means that we have to issue a DELETE against everything that is a child of John.  While it might be valid to delete(end date) his personal details and reassign his customers and their orders to someone else, it is absolutely not valid to delete (end date) commission payments, pay roll information, or the general ledger entries that go along with them.  The commission payments, payroll information, general ledger entries, etc. are CURRENT data and should NOT  be end dated.

    So, now you have a major issue and so far I've only found 4 options top get around it:
    1. Quit using temporal tables and go back to the standard data warehouse structures where we maintain all row versions in a single table, end dating as appropriate, and in order to accurately query it, you need to write extremely nasty queries which are limited to a small group of people with the training to construct them correctly.
    2. Do a logical delete instead of a physical delete and then move all of the heavy lifting for temporally querying to the applications and reporting tools.
    3. Remove all foreign keys and enforce all referential integrity via code.
    4. Shred the database structure by moving EVERY relationship out to cross reference tables so that you always have to go through a cross reference table to join between two business entities.  For example, instead of putting the EmployeeID from the Employee table in the PayCheck table, you would create a cross reference table that stored the relationship between an Employee and a PayCheck.

    1. Been there, done that.  It works, but has serious drawbacks in the level of knowledge required and the limited group of people who can actually work with a data warehouse.
    2. Really no different that option 1.
    3. It works, but you now lose all RI enforcement and you lose the entire table relationship map that most applications can consume to build queries.
    4. It works, but introduces incredible data structure complexity.
    5. Well, there is a 5th option that involves essentially inverting your entire database where children are turned into parents, but that would require fundamentally changing the way every application works as well as the way people think that it isn't even worth considering - picture trying to explain why children have to exist before their parents can be created......

    This would have been SOOOOOOOOOOO much easier if the database engine simply recognized what is EFFECTIVELY being done with a DELETE against a temporal table and considered the "temporal table" (current PLUS history) as ONE table in terms of RI instead of the history part of temporal tables being treated like a completely separate table for FK checks.  A SELECT statement doesn't make a distinction and allows you to simply query the "current" table and see data from history if you are using the FOR SYSTEM_TIME clause.

    Michael Hotek

  • Mike Hotek - Friday, March 30, 2018 2:04 PM

    Well, I'm trying to figure out a way around another massive hole in a SQL Server feature.  I have temporal tables, yay!, now I can actually query my data and have it be temporally accurate....well, sort of, as long as you have a use case where it can actually work.

    I have a sales person, John, who is no longer with the company.  Great, we have this temporal table thingy, so we can simply delete John.  Now when we run all of our queries against sales people for everything that's current, John is not there, as it should be.  We can also run these same queries backwards in time using the FOR SYSTEM_TIME clause and John only appears during the time frames he was actually employed at the company.  Great!......sort of.......

    With a temporal table, when a DELETE is issued against current, we aren't actually saying "remove the row".  Instead, we are actually saying "end date this row such that it is not valid after this point in time".  There is a HUGE difference in these two and SQL Server absolutely fails in this regard.  How? In order to end date a row if you are using temporal tables, you have to issue a DELETE against the table in order to move it from current to history.  BUT, you can't delete John's employee record unless you also delete ALL children.  You did apply best practices and actually created foreign keys in your database right?  Foreign keys are enforced, so before you delete John, you must also delete all children.

    Well, John was an employee and we did pay him, several times.  While John has employee and other stuff tied to him such as his personal details, manager, sales region, etc., he also has stuff like customer orders, commissions, pay statements, etc.  Actually issuing a DELETE against the employee table to move John to history and end date his record means that we have to issue a DELETE against everything that is a child of John.  While it might be valid to delete(end date) his personal details and reassign his customers and their orders to someone else, it is absolutely not valid to delete (end date) commission payments, pay roll information, or the general ledger entries that go along with them.  The commission payments, payroll information, general ledger entries, etc. are CURRENT data and should NOT  be end dated.

    So, now you have a major issue and so far I've only found 4 options top get around it:
    1. Quit using temporal tables and go back to the standard data warehouse structures where we maintain all row versions in a single table, end dating as appropriate, and in order to accurately query it, you need to write extremely nasty queries which are limited to a small group of people with the training to construct them correctly.
    2. Do a logical delete instead of a physical delete and then move all of the heavy lifting for temporally querying to the applications and reporting tools.
    3. Remove all foreign keys and enforce all referential integrity via code.
    4. Shred the database structure by moving EVERY relationship out to cross reference tables so that you always have to go through a cross reference table to join between two business entities.  For example, instead of putting the EmployeeID from the Employee table in the PayCheck table, you would create a cross reference table that stored the relationship between an Employee and a PayCheck.

    1. Been there, done that.  It works, but has serious drawbacks in the level of knowledge required and the limited group of people who can actually work with a data warehouse.
    2. Really no different that option 1.
    3. It works, but you now lose all RI enforcement and you lose the entire table relationship map that most applications can consume to build queries.
    4. It works, but introduces incredible data structure complexity.
    5. Well, there is a 5th option that involves essentially inverting your entire database where children are turned into parents, but that would require fundamentally changing the way every application works as well as the way people think that it isn't even worth considering - picture trying to explain why children have to exist before their parents can be created......

    This would have been SOOOOOOOOOOO much easier if the database engine simply recognized what is EFFECTIVELY being done with a DELETE against a temporal table and considered the "temporal table" (current PLUS history) as ONE table in terms of RI instead of the history part of temporal tables being treated like a completely separate table for FK checks.  A SELECT statement doesn't make a distinction and allows you to simply query the "current" table and see data from history if you are using the FOR SYSTEM_TIME clause.

    IMHO, you shouldn't delete John from anything.  He's a part of the "data of record".  I don't know your data or your queries but there should be an "enddate" on John's row in whatever you consider to be the "primary" table for employees and your code should be written to accommodate that.  While I have no great love for the way MS implemented temporal tables, this one (again, IMHO) is not the fault of the MS implementation of temporal tables.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Saturday, March 31, 2018 8:36 AM

    Mike Hotek - Friday, March 30, 2018 2:04 PM

    Well, I'm trying to figure out a way around another massive hole in a SQL Server feature.  I have temporal tables, yay!, now I can actually query my data and have it be temporally accurate....well, sort of, as long as you have a use case where it can actually work.

    I have a sales person, John, who is no longer with the company.  Great, we have this temporal table thingy, so we can simply delete John.  Now when we run all of our queries against sales people for everything that's current, John is not there, as it should be.  We can also run these same queries backwards in time using the FOR SYSTEM_TIME clause and John only appears during the time frames he was actually employed at the company.  Great!......sort of.......

    With a temporal table, when a DELETE is issued against current, we aren't actually saying "remove the row".  Instead, we are actually saying "end date this row such that it is not valid after this point in time".  There is a HUGE difference in these two and SQL Server absolutely fails in this regard.  How? In order to end date a row if you are using temporal tables, you have to issue a DELETE against the table in order to move it from current to history.  BUT, you can't delete John's employee record unless you also delete ALL children.  You did apply best practices and actually created foreign keys in your database right?  Foreign keys are enforced, so before you delete John, you must also delete all children.

    Well, John was an employee and we did pay him, several times.  While John has employee and other stuff tied to him such as his personal details, manager, sales region, etc., he also has stuff like customer orders, commissions, pay statements, etc.  Actually issuing a DELETE against the employee table to move John to history and end date his record means that we have to issue a DELETE against everything that is a child of John.  While it might be valid to delete(end date) his personal details and reassign his customers and their orders to someone else, it is absolutely not valid to delete (end date) commission payments, pay roll information, or the general ledger entries that go along with them.  The commission payments, payroll information, general ledger entries, etc. are CURRENT data and should NOT  be end dated.

    So, now you have a major issue and so far I've only found 4 options top get around it:
    1. Quit using temporal tables and go back to the standard data warehouse structures where we maintain all row versions in a single table, end dating as appropriate, and in order to accurately query it, you need to write extremely nasty queries which are limited to a small group of people with the training to construct them correctly.
    2. Do a logical delete instead of a physical delete and then move all of the heavy lifting for temporally querying to the applications and reporting tools.
    3. Remove all foreign keys and enforce all referential integrity via code.
    4. Shred the database structure by moving EVERY relationship out to cross reference tables so that you always have to go through a cross reference table to join between two business entities.  For example, instead of putting the EmployeeID from the Employee table in the PayCheck table, you would create a cross reference table that stored the relationship between an Employee and a PayCheck.

    1. Been there, done that.  It works, but has serious drawbacks in the level of knowledge required and the limited group of people who can actually work with a data warehouse.
    2. Really no different that option 1.
    3. It works, but you now lose all RI enforcement and you lose the entire table relationship map that most applications can consume to build queries.
    4. It works, but introduces incredible data structure complexity.
    5. Well, there is a 5th option that involves essentially inverting your entire database where children are turned into parents, but that would require fundamentally changing the way every application works as well as the way people think that it isn't even worth considering - picture trying to explain why children have to exist before their parents can be created......

    This would have been SOOOOOOOOOOO much easier if the database engine simply recognized what is EFFECTIVELY being done with a DELETE against a temporal table and considered the "temporal table" (current PLUS history) as ONE table in terms of RI instead of the history part of temporal tables being treated like a completely separate table for FK checks.  A SELECT statement doesn't make a distinction and allows you to simply query the "current" table and see data from history if you are using the FOR SYSTEM_TIME clause.

    IMHO, you shouldn't delete John from anything.  He's a part of the "data of record".  I don't know your data or your queries but there should be an "enddate" on John's row in whatever you consider to be the "primary" table for employees and your code should be written to accommodate that.  While I have no great love for the way MS implemented temporal tables, this one (again, IMHO) is not the fault of the MS implementation of temporal tables.

    I second Jeff on this one, there should be a status property of each entry indicating the status of the entry, the entries should not be deleted. The normal way of handling this would be using an SCD pattern or a temporal table.
    😎

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply