May 24, 2021 at 11:12 pm
Jeffrey Williams wrote:Jeff Moden wrote:Gosh... I agree that it's been quite a while but it seems to me that SQL Server 6.5 and 7 worked just fine in this area and I've not had cause to need such functionality since then, but I wonder why they felt compelled to change it.
If SQL Server 6.5 and 7 worked that way - then it was incorrect. An insert or update statement is implemented as a single transaction that occurs at a specific time and all rows affected are affected as if the transaction occurred all at once, even if the actual insert or update were to take several hours to complete.
Opinion noted. I strongly disagree but noted. 😀 Like I said, it's been a long time since I've needed such a thing but it would be nice to have the choice. And, don't say it doesn't need to be done. You may not know of a reason to do it but someone else (like the OP) may have a perfectly valid reason to do it just as surely as you'd need RAND() to produce a different value for each row upon insert or, even better, you have a UDF (inline or otherwise) use the return of NEWID() or the return of the output of a stored procedure. While it may not usually be something that people would need to do and would sometimes have to be very careful about how they used such a thing, it is nice to have such options (all of the above, which I've used a lot for more than one reason).
It's like saying that overloading a variable in a SELECT statement is a bad thing... when I briefly used Oracle (only 3 years), that pithy little restriction caused me a huge amount of development pain. I did it all the time in SQL Server and still do because SQL Server didn't have the ability to do string aggregation. Even the FOR XML PATH hack had some major pitfalls that were easily avoided by variable overloading.
When you said "If SQL Server 6.5 and 7 worked that way - then it was incorrect.", the only thing that was incorrect there was you thinking so. You have to understand that, as always and mostly forever, "It Depends". 😉
Insrting a row is more than inserting a row.
there is an indexing update involved also. Bulk insert gives you a luxury of redoing index once per the whole set, but it means all records are in the same boat, waiting to be included into updated index at once.
There is also such thing as parallel execution. Which ruins the beautiful world of those developers who believe a clustered index guarantees the order of rows in a table. With 2 or more parallel insert processes none of them is finalised until all of them are stitched together.
Therefore, to me, it makes perfect sense to not distinguish insert timing for different rows in the same INSERT set. They all start together, and none of them is done until all of them are done.
In the end of the day, it's under one of those letters in the acronym ACID. Kinda required.
_____________
Code for TallyGenerator
May 25, 2021 at 12:49 am
I agree with that... but only 99.9% of the time., 😀 "It Depends".
--Jeff Moden
Change is inevitable... Change for the better is not.
May 25, 2021 at 1:48 am
To cover that 0.1% I can add triggers to the mix.
they must have table "inserted" available with all the records in it, the whole set. And it cannot be formed until all records have passed checks for data type conformity, referential integrity and other form of constraints.
BTW, in a trigger GETDATE() must return the time of the trigger start, which is insert transaction end. Gotta check it when on the computer.
_____________
Code for TallyGenerator
May 25, 2021 at 2:23 pm
Jeff Moden wrote:When you said "If SQL Server 6.5 and 7 worked that way - then it was incorrect.", the only thing that was incorrect there was you thinking so. You have to understand that, as always and mostly forever, "It Depends". 😉
Not really - again, an insert or update is considered to have been committed all at once. At the time of the insert or update - all rows affected by that insert were affected at the same time.
The implementation details of how those rows are affected is meaningless. How would you determine the column value of each row? The insert or update would have to be ordered by some mechanism to determine when that row was inserted or updated.
It seems you are looking for a way to identify when a row was either logically or physically inserted or updated. Logically - the row was inserted/updated when the statement was executed. Not when it completed - not during the process - but at the moment the statement was executed.
As for identifying when the row was physically inserted/updated - that isn't possible since we cannot determine when that actually occurs.
Things like IDENTITY, SEQUENCE, NEWID(), NEWSEQUENTIALID() - can create different values because they are not determined by when the insert/update occurred. They are determined by the logical ordering of the data in the set being inserted OR are defined as row identifier functions in SQL Server (and therefore were specifically built to generate a different value for each row in a result set).
They actually are NOT affected at the same time. They are inserted just as they are in any file system. Just because it's a relational database doesn't change the fact that, underneath, it's JUST a piece of software for a file system running on JUST a normal (non quantum) von Nuemann based computer with waits and interrupt handling.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 25, 2021 at 2:26 pm
To cover that 0.1% I can add triggers to the mix.
they must have table "inserted" available with all the records in it, the whole set. And it cannot be formed until all records have passed checks for data type conformity, referential integrity and other form of constraints.
BTW, in a trigger GETDATE() must return the time of the trigger start, which is insert transaction end. Gotta check it when on the computer.
Heh... good luck adding triggers for that 1%. If you can do that, the OP of this thread would be interested in your help.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 25, 2021 at 6:19 pm
They actually are NOT affected at the same time. They are inserted just as they are in any file system. Just because it's a relational database doesn't change the fact that, underneath, it's JUST a piece of software for a file system running on JUST a normal (non quantum) von Nuemann based computer with waits and interrupt handling.
Logically all rows are affected at the same time - that does not mean the physical implementation of the RDBMS does not take any time. Implementation details - like how the data is physically saved to disk, just don't make sense to expose as an attribute.
I would very much like to see how you would identify the value for each row in an insert statement. Let's say we insert 100 rows and the insert duration is 100ms. What row in the insert gets the first millisecond time - and which row gets the last millisecond?
What time is going to be used - when the page is updated in memory, the time when the log is updated, when the log is hardened - or when the page is written to disk? And which time do you really need?
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
How to post questions to get better answers faster
Managing Transaction Logs
May 25, 2021 at 6:54 pm
Jeff Moden wrote:They actually are NOT affected at the same time. They are inserted just as they are in any file system. Just because it's a relational database doesn't change the fact that, underneath, it's JUST a piece of software for a file system running on JUST a normal (non quantum) von Nuemann based computer with waits and interrupt handling.
Logically all rows are affected at the same time - that does not mean the physical implementation of the RDBMS does not take any time. Implementation details - like how the data is physically saved to disk, just don't make sense to expose as an attribute.
I would very much like to see how you would identify the value for each row in an insert statement. Let's say we insert 100 rows and the insert duration is 100ms. What row in the insert gets the first millisecond time - and which row gets the last millisecond?
What time is going to be used - when the page is updated in memory, the time when the log is updated, when the log is hardened - or when the page is written to disk? And which time do you really need?
Like I said, you're preaching to the choir here for 99.9% of the cases.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply