Am I stupid? Version History

  • I can't get started with the SQL required for this. Could someone point me in the correct direction?

    I have a table which stores project issues. Each issue is referenced by a unique integer. Every time a user wants to supply a progress update the application creates a new row in the table, copies the details of the issue into the fields including the progres update and saves it. This way in order to see the full version history of issue 2 as it gets updated and closed, you can use some simple SQL such as:

    SELECT * FROM IssuesTable WHERE IssueNumber = 2 ORDER BY CreatedDate DESC

    Obviously the one at the top of the list is the most recent one and therefore the active record for that issue and the rest of the records return are the old versions of the issue.

    But... the client wants to be able to see a list of issues as of a certain date in the past to answer questions as such "what were the open issues as of July 1st?"

    What SQL can I write which will pick whichever record was active at the time? I know I can exclude all rows with a create date of more than July 1st but that still leaves me with multiple records per issue. I just want one record returned per issue and it has to be the one which was active at the time.

    Should I be useing recursion or SELECT statements within SELECT statements somehow? I don't know where to start. 🙁

  • So am I understandin this correctly in saying you need to return all issues that where open on the 1st of July, so all issues where the data is smaller than the 2nd? And you only want one record per ission. If you care getting multiple fields, that are the same accross the records, you can specify the Distinct keyword e.g.:

    SELECT DISTINCT IssueNumber, IssueState, IssueDescription FROM IssuesTable

    WHERE CreatedDate < '2008/07/02'

    GROUP BY IssueNumber

    ORDER BY CreateDate DESC

    This will work if the the columns that you return are the same across all rows per Issue. You might have to add the other Columns, like IssueState and IssueDescription to the Group by clause, depending on your data. i.e. you add the columns that you group the data by.

    I hope this helps, please let me know 🙂

  • I think you will end up with a correlated sub-query any way you handle this. Here are a couple of options:

    [font="Courier New"]SELECT * FROM IssuesTable I

    WHERE I.CreatedDate <= '2008/07/02'

    AND NOT EXISTS (SELECT * FROM IssuesTable I2

    WHERE I2.IssueNumber = I.IssueNumber

    AND I2.CreatedDate <= '2008/07/02'

    AND I2.CreatedDate > I.CreatedDate)

    SELECT * FROM IssuesTable I

    WHERE I.CreatedDate = (SELECT MAX(I2.CreatedDate) FROM IssuesTable I2

    WHERE I2.IssueNumber = I.IssueNumber

    AND I2.CreatedDate <= '2008/07/02')[/font]

  • We have very similar requirements. We do it like this:

    SELECT...

    FROM MyTable m

    WHERE m.VersionId = (SELECT TOP(1) m2.VersionId

    FROM MyTable m2

    WHERE m2.Date < @somedate

    ORDER BY m2.VersionId DESC)

    When you get into a JOIN or CROSS APPLY it looks like this:

    SELECT...

    FROM MainTable m

    JOIN OtherTable o

    ON m.Id = o.Id

    AND m.VersionId <= (SELECT TOP(1) o2.VersionId

    FROM OtherTable o2

    WHERE o2.Id = m.Id

    AND o2.Date < @somedate

    ORDER BY o2.Id,o2.versionId DESC)

    ...

    This would get all versions prior to the date requested for the given ID of the main table. After a lot of experimentation, we found that TOP generally (but not always) performs better than MAX. For some tests of this, look here.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • If you just want the one row of one entity that was in effect as of the date/time specified, the query is quite simple:

    select top 1 *

    from table

    where Key-field = @KeyValue

    and EffectiveDate <= @AsOfDate

    order by Key-field asc, EffectiveDate desc;

    It gets interesting when you want to see the version of all entities that was in effect on that date.

    select t1.*

    from table t1

    where t1.EffectiveDate = (

    select Max( t2.EffectiveDate )

    from table t2

    where t2.Key-field = t1.Key-field

    and t2.EffectiveDate <= @AsOfDate

    );

    The @AsOfDate parameter is the date of interest so you get back the data for the entities that were in effect on that date. Make this into a view and you can then filter any way you wish.

    For the view, the select is the same except the key test is removed and the @AsOfDate is replaced with GetDate().

    The key to efficiency is that the primary key of the table is a composite key defined as the field named Key-field in the code above (what would normally be the primary key by itself) and the EffectiveDate field. However, in defining the key, Key-field is defined ascending (the default) and EffectiveDate is defined descending. Like so:

    create table TableName(

    key-field somedatatype,

    EffectiveDate datetime,

    .,

    .,

    .,

    constraint PK_TableName primary key clustered (

    key-field asc,

    EffectiveDate desc

    )

    );

    I have done a lot of testing in this area and the queries above are based on what I have found to be the most efficient in most uses. I have tested this on large tables (2ook+ rows) and find the performance indistinguishable (using the eye-blink timing method) from just selecting any random row from the same size table.

    To take this a step further, you will have noticed that defining a foreign key in another table which refers to your versioned table is, well, impossible to maintain across versions.

    I have developed a comprehensive versioning pattern which allows the previous versions of an entity to exist in the same table as the current version (much as you have done) yet allows for normal, system-enforced referential integrity checking (i.e foreign keys that refer to the versioned data that are unaffected by the versioning). Views hide the versioning details and allow for normal CRUD operations against the entities. I was willing to sacrifice some performance for the advantages of versioning and normal RI, so I was pleasantly surprised to discover that, so far, such sacrifices have been unnoticeable to the naked eye. The details are much too involved to put into a post reply like this. I really should finish writing up a paper on it because I feel it would probably be of some general interest.

    Until then, anyone interested in reading what is probably the most in-depth treatment of versioning (though incomplete) can read the series by Tom Johnson and Randall Weis for DM Review starting at http://www.dmreview.com/issues/20070501/1082475-1.html. Be prepared, however. They are currently on part 29 of something like a 60-part series. They state it will be another year and a half to complete it at 2 parts each month. The authors are also PHDs so the reading can get a bit tedious at times for an implementation of versioning that, imnsho, is not going to turn out to be nearly as usable as my own. Where do I go to get a gig like that? :Wow:

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply