November 2, 2010 at 8:25 pm
Comments posted to this topic are about the item Writing Nearly Codeless Apps: Part 5
November 3, 2010 at 12:21 pm
This is a very interesting area that I've also been working on for 20+ years. I've found that using standard POJOs makes high productivity impossible for medium- to high-complexity applications. Here is an example of actual codeless development, in an 8-minute overview video:
http://www.planetjavainc.com/videos/WOW_WAS/WOW_WAS.html (takes a minute to load; it's a Camtasia recording). -Paul
November 3, 2010 at 6:26 pm
pholm (11/3/2010)
This is a very interesting area that I've also been working on for 20+ years. I've found that using standard POJOs makes high productivity impossible for medium- to high-complexity applications. Here is an example of actual codeless development, in an 8-minute overview video: http://www.planetjavainc.com/videos/WOW_WAS/WOW_WAS.html (takes a minute to load; it's a Camtasia recording). -Paul
That is pretty cool. WOW kind of takes over where RAP leaves off. RAP is about building the database and the data layer; it then leaves you to code the upper levels by hand (although it provides strict templates for these). WOW presumes the existence of a database already. It strikes me as being similar to the MS Access form-generation tools, with a wizard built in (probably MS Access has such a wizard too; I haven't really used it in years).
November 4, 2010 at 3:01 am
Hi David,
I'm surprised to see so few comments yet. But who knows, maybe I'm not the only one with a reading backlog.
At the beginning of your article, you write that RAP rewards developers for creating their foreign keys by generating lookup routines. But in the rest of the article, you only describe routines generated from primary keys and unique constraints. So technically, the developer can still get away with not defining the foreign key, as long as the PK and UQ constraints are defined.
More important is that the performance of your generated code on larger tables will be miserable. This is because you use multi-statement table-valued functions to retrieve rows. These are bad news for the optimizer - it has no option but to execute them first, store the results, then combine these (unindexed) result sets for the final result. Ouch!
You should convert to inline table-valued functions. These, like views, are treated as macros - the reference to the function is first replaced by its definition, and only then will the optimizer be invoked to work out an execution plan. You'll see a major performance gain!
As an example:
create function UFTBadmUser##PK
(
-- declare the 'As Of' parameter
@AsOf datetime,
-- declare other parameters
@userid bigint
)
returns table
as
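-- live branch: when @AsOf is null, read only the current table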
return (select UserId,
LoginName,
Notes,
AuditDate,
AuditUserId,
AuditStatus
from TBadmUser
where UserId = @userid
and @AsOf is null
union all
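-- archive branch: when @AsOf is given, return the latest version at or before that date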
select UserId,
LoginName,
Notes,
AuditDate,
AuditUserId,
AuditStatus
from TBadmUser# A
where UserId = @userid
and AuditStatus <> 'D'
and A.AuditDate = (select max(AuditDate) from TBadmUser#
where UserId = A.UserId
and AuditDate <= @AsOf)
and @AsOf is not null
);
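Calling it looks the same as calling the multi-statement version; for instance (parameter values invented here, and I'm assuming the function lives in the dbo schema):
select * from dbo.UFTBadmUser##PK(null, 42); -- current version of user 42
select * from dbo.UFTBadmUser##PK('20100630', 42); -- user 42 as it stood on June 30, 2010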
Another possible performance optimization is to create separate functions for querying live and archive data. That reduces complexity in the functions (making it easier for the optimizer to come up with an efficient plan). In my experience, it is quite uncommon to use the same queries on both live and archived data. (And since you generate the code, you could also go the whole way and simply generate functions for only live, for only archive, and for the combination, with a note to the developer to only use the latter when needed, because the others perform better.)
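As a sketch of that split (untested; the function names are invented, the column list as in the example above):
create function UFTBadmUser##PKLive
(
@userid bigint
)
returns table
as
return (select UserId, LoginName, Notes, AuditDate, AuditUserId, AuditStatus
from TBadmUser
where UserId = @userid);
go
create function UFTBadmUser##PKArchive
(
@AsOf datetime,
@userid bigint
)
returns table
as
return (select UserId, LoginName, Notes, AuditDate, AuditUserId, AuditStatus
from TBadmUser# A
where UserId = @userid
and AuditStatus <> 'D'
and AuditDate = (select max(AuditDate) from TBadmUser#
where UserId = A.UserId
and AuditDate <= @AsOf));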
November 4, 2010 at 8:02 am
Hugo Kornelis (11/4/2010)
Hi David,... But in the rest of the article, you only describe routines generated from primary keys and unique constraints. So technically, the developer can still get away with not defining the foreign key, as long as the PK and UQ constraints are defined.
...
You should convert to inline table-valued functions. These, like views, are treated as macros - the reference to the function is first replaced by its definition, and only then will the optimizer be invoked to work out an execution plan. You'll see a major performance gain!
Hugo: Thanks so much for your posts, especially this one. First, RAP does generate functions and procedures for every foreign key; the particular table I chose for illustration purposes just doesn't happen to contain any foreign business keys - I chose it for the sake of brevity. If you look at the ExampleCRM tables that have foreign keys that are defined for business purposes (not just the status-field foreign key) you will see the function and procedure definitions.
Secondly, I like your code here. It never occurred to me to use a union. Are you sure that the union would execute faster than the "if" statement and the two simpler separate queries in RAP's generated code? I guess you are, so I am definitely going to check this out.
November 4, 2010 at 9:03 am
David Ziffer (11/4/2010)
If you look at the ExampleCRM tables that have foreign keys that are defined for business purposes (not just the status-field foreign key) you will see the function and procedure definitions.
I have only read the articles; I did not download the full code. *IF* I can find some extra time, I'll check it out. Thanks for pointing out the obvious to me.
Are you sure that the union would execute faster than the "if" statement and the two simpler separate queries in RAP's generated code?
When used by itself? No, probably not.
But a query that references an inline table-valued function almost always runs rings around a query that uses a multi-statement table-valued function, for the reason I explained in my previous post. If you build the Person and PersonName tables you use in the last example in your article and populate them with a few thousand rows of sample data, you should see the difference. Bump the tables up to a few million rows, and it becomes painfully obvious.
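For instance, something along these lines (untested sketch; I'm assuming UserId is an identity and Notes is nullable, so the insert lists only the other columns):
set nocount on;
declare @i int;
set @i = 1;
while @i <= 100000
begin
insert into TBadmUser (LoginName, AuditDate, AuditUserId, AuditStatus)
values ('user' + cast(@i as varchar(10)), getdate(), 1, 'A');
set @i = @i + 1;
end;
-- then time a lookup through each version of the function and compare:
set statistics io on;
set statistics time on;
select * from dbo.UFTBadmUser##PK(null, 500);
set statistics io off;
set statistics time off;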
November 4, 2010 at 10:56 am
from TBadmUser# A
where UserId = @userid
and AuditStatus <> 'D'
and A.AuditDate = (select max(AuditDate) from TBadmUser#
where UserId = A.UserId
and AuditDate <= @AsOf)
and @AsOf is not null
);
Several problems. 1, what if @AsOf is a future date? Your function returns nothing from either the live or the archive table. 2, the select max audit date subquery does not specify "and AuditStatus <> 'D'", so it can return a date that will be excluded by the outer select. The first one is hypothetical, but the second one is a failure-type error.
3, why select max(AuditDate) in the subquery instead of selecting the primary key of the archive table? That seems like a better choice, since then the top-level select is selecting where PK = known value.
November 4, 2010 at 7:46 pm
wbrianwhite (11/4/2010)
from TBadmUser# A
where UserId = @userid
and AuditStatus <> 'D'
and A.AuditDate = (select max(AuditDate) from TBadmUser#
where UserId = A.UserId
and AuditDate <= @AsOf)
and @AsOf is not null
);
Several problems. 1, what if @AsOf is a future date? Your function returns nothing from either the live or the archive table. 2, the select max audit date subquery does not specify "and AuditStatus <> 'D'", so it can return a date that will be excluded by the outer select. The first one is hypothetical, but the second one is a failure-type error.
3, why select max(AuditDate) in the subquery instead of selecting the primary key of the archive table? That seems like a better choice, since then the top-level select is selecting where PK = known value.
You could also find the archive row with just a left outer self join instead of a correlated subquery. Something like this (haven't tested it):
from TBadmUser# A with(nolock)
left join TBadmUser# B with(nolock)
on A.UserId = B.UserId
and A.UserId = @userid
and A.AuditStatus <> 'D'
and B.AuditStatus <> 'D'
and B.AuditDate < A.AuditDate
where A.AuditDate <= @AsOf
and B.UserId is null
I've seen left outer self joins perform much better than correlated subqueries that use aggregate functions. Though they're a bit harder to read at first, once you're familiar with the pattern they're fine.
November 4, 2010 at 8:13 pm
wbrianwhite (11/4/2010)
wbrianwhite (11/4/2010)
Several problems. 1, what if @AsOf is a future date? Your function returns nothing from either the live or the archive table. 2, the select max audit date subquery does not specify "and AuditStatus <> 'D'", so it can return a date that will be excluded by the outer select. The first one is hypothetical, but the second one is a failure-type error. 3, why select max(AuditDate) in the subquery instead of selecting the primary key of the archive table? That seems like a better choice, since then the top-level select is selecting where PK = known value.
You could also find the archive row with just a left outer self join instead of a correlated subquery. Something like this (haven't tested it) ... I've seen left outer self joins perform much better than correlated subqueries that use aggregate functions. Though they're a bit harder to read at first, once you're familiar with the pattern they're fine.
Brian: Several answers:
1. It doesn't make much sense to specify a future audit date, so I would not sacrifice any performance to generate code that specially handles that circumstance. In any event, this code can't possibly be sensitive to whether the date you specify is in the future, because the data in the database has no idea what time it is when you run the query, and furthermore the query is not sensitive to the current time. If you think carefully about the algorithm (or just try it out in the RAP example), you will find that putting in a future date guarantees that you get the latest version of everything, keeping in mind that the latest version of a deleted record is "no record". But you'll get the latest version expensively, because you'll be going through the more calculation-laden archive-table query rather than the ultra-simple current-table query.
2. Either you're expecting a different result than I am or your analysis is flawed. The subquery correctly returns whatever record is the most recent one (meeting the search criterion, in this case matching the correct key) that precedes or equals the AsOf date. If that record happens to be a "D" record then the main query correctly returns no record (which is what we want, because the record has been deleted). If that record is not a "D" record, then it returns the most recent version of the record preceding or equalling the date. It is not our intention to return some record no matter what; if a record has been deleted prior to or at the AsOf date, then we don't want to see it at all.
3. The left-join logic for extracting the maximum date looks interesting, and I must say that I never considered such a thing. However, I think you'd need to reverse the sign so that we're looking for the "A" record where there is no GREATER "B" record (I believe your logic as published would find the minimum "A", not the maximum that we want). But I would have to test this carefully to see how the "where" condition interacts with the join. And we'd have to change the test for "D", because your query here would neglect to properly return "no record" if the AsOf date follows the deletion time of the record. And I would be very suspicious of the claimed performance improvement ... it seems to me that the number of records this join would produce would rise exponentially with the number of archived records matching the search criterion.
Thanks for your inputs. I am always grateful when people take the time to analyze my stuff. It's just that in this case I am unable to agree with your assessments (except possibly that a left-join could be used in place of the subquery). I would be interested in your responses to my responses and/or your clarifications, in case there's something here that I'm just not getting.
November 4, 2010 at 8:19 pm
Hugo Kornelis (11/4/2010)
David Ziffer (11/4/2010)
... a query that references an inline table-valued function almost always runs rings around a query that uses a multi-statement table-valued function, for the reason I explained in my previous post. If you build the Person and PersonName tables you use in the last example in your article and populate them with a few thousand rows of sample data, you should see the difference. Bump the tables up to a few million rows, and it becomes painfully obvious.
Well, the thing that makes me suspicious here is that your "union" code potentially always hits both tables, whereas the RAP-generated code can hit only the one table that it needs to hit. In the "union" code you presented, there is the danger that the optimizer would not be smart enough to realize that when @AsOf is null, it doesn't even need to touch the archive table at all.
But I think I will try your suggested test soon to see if your assertion holds true.
November 5, 2010 at 11:10 am
David Ziffer (11/4/2010)
wbrianwhite (11/4/2010)
...
Brian: Several answers:
1. ... this code can't possibly be sensitive to whether the date you specify is in the future, because the data in the database has no idea what time it is when you run the query ...
2. ... If that record happens to be a "D" record then the main query correctly returns no record (which is what we want, because the record has been deleted). ...
3. ... And I would be very suspicious of the claimed performance improvement ... it seems to me that the number of records this join would produce would rise exponentially with the number of archived records matching the search criterion.
"because the data in the database has no idea what time it is when you run the query,"
GetDate()? You could select from your main table where @AsOf is null or @AsOf >= getDate(). To me, showing the main record would seem more likely to match users' expectations. If this is a general-purpose function, I would expect it to end up used in places you aren't anticipating. In our shop we use a history table where the top row is the current data (a trigger populates all data into the history table, even current data), so we can join against it for any time period, including the future. It also avoids the issue of splitting queries between main and archive tables. What if someone wants to record that user Bob Smith will move from the R&D department to the Implementation department in June of 2011? How could they join against your user-data retrieval function when doing so? It seems like it would return 'no such user Bob Smith in June of 2011'. For us, the same query shows user department changes for any time in the past or future. So that is definitely shaping my expectations here.
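Roughly what I mean, as the live branch of such a function (untested sketch):
select UserId, LoginName, Notes, AuditDate, AuditUserId, AuditStatus
from TBadmUser
where UserId = @userid
and (@AsOf is null or @AsOf >= getdate());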
"If that record happens to be a "D" record then the main query correctly returns no record (which is what we want, because the record has been deleted). "
What if it's been un-deleted since then? The point of using soft deletes in a status column is that they can be undone later, while physical deletes can't. If you stop reporting on deleted rows, you might as well physically delete them and save the disk space and extra query overhead. You could also include an @IncludeDeleted parameter to allow deleted rows to be ignored or included as desired. At our shop someone could 'delete' a group, then realize it was a mistake, get the group list passing in @IncludeDeleted, and undelete it.
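For example (a sketch; the function name and the extra parameter are invented):
create function UFTBadmUser##PKIncDel
(
@AsOf datetime,
@userid bigint,
@IncludeDeleted bit
)
returns table
as
return (select UserId, LoginName, Notes, AuditDate, AuditUserId, AuditStatus
from TBadmUser# A
where UserId = @userid
and (AuditStatus <> 'D' or @IncludeDeleted = 1)
and AuditDate = (select max(AuditDate) from TBadmUser#
where UserId = A.UserId
and AuditDate <= @AsOf));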
"And I would be very suspicious of the claimed performance improvement ... it seems to me that the number of records this join would produce would rise exponentially with the number of archived records matching the search criterion."
Run some tests. I have. The only time I like to use correlated subqueries is when doing an "and exists" correlated subquery, because then SQL Server can exit as soon as it finds a single row. Regular correlated subqueries are expensive. SQL Server is built to handle set-based logic efficiently, and the left-outer-join-where-B-is-null algorithm is exactly that kind of set-based operation. You're right about the condition I put in, though: it should be "and B.AuditDate > A.AuditDate ... where ... and B.UserId is null". That will get the A row where there is no later-dated row in the table that is still at or before the input date. I use it on large data sets all the time with good results. You should just be able to run both versions:
select UserId,
LoginName,
Notes,
AuditDate,
AuditUserId,
AuditStatus
from TBadmUser# A
where UserId = @userid
and AuditStatus <> 'D'
and A.AuditDate = (select max(AuditDate) from TBadmUser#
where UserId = A.UserId
and AuditDate <= @AsOf)
and @AsOf is not null
select case when A.AuditStatus = 'D' then null else A.UserId end [UserId],
case when A.AuditStatus = 'D' then null else A.LoginName end [LoginName],
case when A.AuditStatus = 'D' then null else A.Notes end [Notes],
case when A.AuditStatus = 'D' then null else A.AuditDate end [AuditDate],
case when A.AuditStatus = 'D' then null else A.AuditUserId end [AuditUserId],
case when A.AuditStatus = 'D' then null else A.AuditStatus end [AuditStatus]
from TBadmUser# A
left join TBadmUser# B
on B.UserId = A.UserId
and B.AuditDate > A.AuditDate
and B.AuditDate <= @AsOf -- cap B at @AsOf so A is the latest version as of that date
where A.UserId = @userid -- the filter on A belongs in the where clause, not the left join
and A.AuditDate <= @AsOf
and B.UserId is null
And then include the actual execution plan, and see which select uses more than 50% of the query time. I'd be interested in your results.
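Or measure directly instead of eyeballing the plan, wrapping the two selects above:
set statistics io on;
set statistics time on;
-- ...correlated-subquery select, then left-join select...
set statistics io off;
set statistics time off;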
November 5, 2010 at 10:06 pm
Brian: Your comments suggest to me that you are accustomed to an archiving model that is entirely different from RAP's archiving model, and that you are presuming that RAP works in the same way. I imagine that you are accustomed to the traditional model in which all versions of every record are stored in the same table as the most recent version, and in which queries for a given record must go looking for the most recent one based on the records' AuditDates, and that "deleted" records are merely marked for deletion rather than actually being deleted. In this model deleted records can be brought back to life by simply unmarking them. If this is your view of the RAP model, then I ask that you reread the article carefully and also reread the code sample carefully.
In the RAP model, each conceptual table is really two tables: a "current" table that stores only current records, and an "archive" table that stores a log of every version of every record. Queries for current data look only in the "current" table, where they will find only current records. Such queries never have to check for AuditDates because there is always exactly one version of every record in the "current" table, namely the current version. Queries for current data also never need to check for "deleted" flags because records in the "current" table are, by definition, not deleted.
Queries for current data never look in the "archive" table. The only purpose of the "archive" table is to store past versions of every record for the purpose of reconstructing snapshots of past data as of a given time (for example seeing a report or your application screens as they would have appeared six months ago). There is no need to incorporate any sense of the current time (i.e. the time at which the query is executed) because such data is timeless. There is no reason for a query into this table using a future AsOf date to return anything other than what it currently returns, which is the last recorded version of every record that hasn't been deleted yet.
While it would be possible to reconstruct a new record in the "current" table containing the same business key (and other data) as a previously deleted record, that reconstructed record would necessarily have a new surrogate key (since SQL Server properly would not allow us to reuse a surrogate key in the "current" table) and as such, from the perspective of the "archive" table, that reconstructed record would be a new record (since it has a new key) and not part of the old record's audit trail.
There is no way for RAP to "undelete" a record in the manner you describe because RAP does not delete records by marking them. Rather it deletes records by actually deleting them from the "current" table. The corresponding record with the "D" flag in the "archive" table is merely a historical marker indicating when the lifetime of the record ended. Since queries for current data never look in the "archive" table, and since snapshot queries (i.e. queries with a non-null AsOf date) intentionally never return deleted records, the data in a "D" record is never returned in a query, nor is it intended to be returned.
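Conceptually, a RAP delete boils down to something like this (a simplification for illustration only, not the actual generated code; assume @UserId and @AuditUserId are supplied by the surrounding procedure):
begin transaction;
-- log the end of the record's lifetime in the archive table
insert into TBadmUser# (UserId, LoginName, Notes, AuditDate, AuditUserId, AuditStatus)
select UserId, LoginName, Notes, getdate(), @AuditUserId, 'D'
from TBadmUser
where UserId = @UserId;
-- then physically remove the record from the current table
delete from TBadmUser
where UserId = @UserId;
commit transaction;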
If you reread the article carefully, I think you will find that the RAP model for creating, updating, deleting, and querying data is radically different from what you are imagining. It is intentionally designed to be radically different, and I am hoping you will agree that this intentionally different design is radically better.
November 8, 2010 at 5:56 am
David Ziffer (11/5/2010)
Brian: Your comments suggest to me that you are accustomed to an archiving model that is entirely different from RAP's archiving model ... In the RAP model, each conceptual table is really two tables: a "current" table that stores only current records, and an "archive" table that stores a log of every version of every record. ... There is no way for RAP to "undelete" a record in the manner you describe because RAP does not delete records by marking them. Rather it deletes records by actually deleting them from the "current" table. The corresponding record with the "D" flag in the "archive" table is merely a historical marker indicating when the lifetime of the record ended. ...
And yet your sample query did what? Look in the current table or the archive table 🙂
"Rather it deletes records by actually deleting them from the "current" table. The corresponding record with the "D" flag in the "archive" table is merely a historical marker indicating when the lifetime of the record ended".
This seems odd to me. Why not mark a status on the current row, and insert a D row in the archive table? You can treat it as deleted without deleting it.
I do not agree that this approach has any advantages over a _hist table populated with triggers. It relies on all developers using only the 'approved' procs to manipulate data, a convention that will eventually fail, while triggers will not.
Did you try the performance test I suggested above?
November 8, 2010 at 9:23 am
I imagine that you are accustomed to the traditional model in which all versions of every record are stored in the same table as the most recent version, and in which queries for a given record must go looking for the most recent one
Also, no. I am imagining a user table and a user_hist table, where the hist table is populated by insert/update and delete triggers. Are you familiar with this pattern? If not, it would explain some of your earlier comments. I have literally never heard of someone storing their history data in their main table before; the mind boggles at why someone would do that.
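Roughly, the pattern looks like this (a sketch with invented names, not our production code):
create trigger tr_user_hist on dbo.[user]
after insert, update, delete
as
begin
set nocount on;
-- inserts and updates write the new row version into the hist table
insert into dbo.user_hist (user_id, login_name, changed_at, change_type)
select user_id, login_name, getdate(), 'U'
from inserted;
-- deletes write a final 'D' row
insert into dbo.user_hist (user_id, login_name, changed_at, change_type)
select d.user_id, d.login_name, getdate(), 'D'
from deleted d
where not exists (select * from inserted i where i.user_id = d.user_id);
end;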