December 12, 2012 at 5:33 pm
I'm having one of those moments where I'm sure I'm just over thinking the problem, but none the less I have hit a wall.
We have a view that lists all ItemID's, and says if they are in one status or another. In truth there are around 10 statuses that the item could be in, but we only care about if it is currently Status = 'Allowed' or not. There are additional statues like 'Not Allowed' that may remove the item from the 'Allowed' status even after it has been in that status before, but there are also additional statues such as 'Marked For Removal' or 'Removal Complete'.
Our items table presents no issues. The ItemsStatusLog table also stores all the data without an issue. Getting the current status however is what is getting me.
The way the view was written by the last DBA was via a union statement such as:
Create View CurrentItemStatus
As
Select ItemID
,'Allowed' As CurrentStatus
From dbo.Items
Where ItemID In
(Select ItemID
From dbo.ItemStatusLog
Where Status = 'Allowed')
Union
Select ItemID
,'Not Allowed' As CurrentStatus
From dbo.Items
Where ItemID Not In
(Select ItemID
From dbo.ItemStatusLog
Where Status = 'Allowed')
So far no one has really seen the logic bomb, but when I brought it up they said it needs to be changed to reflect the most recent status. And that is where I'm getting stuck. I can get all ItemIDs, and select the Max(StatusDate), but there are issues with just using that:
- Multiple statuses can all share the exact same StatusDate to the second. This includes being marked 'Allowed' and 'Not Allowed' in the same time frame.
- If status is marked 'Allowed', then later marked 'Not Allowed', but the Max(StatusDate) status is set to 'Checking On Item', then there is no real clear value for my view.
I am thinking of presenting the idea that whatever the Max(Row_Number()) is for the same date statuses should be the one we go with to eliminate the multiple values on the same day, but I am open to other suggestions (since I know that the Row_Number() value means next to nothing due to the backend SQL engine functioning). Optimally I need this to get the ID, and then find the Max(StatusDate) where Status In ('Allowed', 'Not Allowed'), and set the value in my view accordingly. I am flexible on this being a view vs. scheduling a job to run twice a day as these values usually don't change that often. I won't bother with including all the various attempts that I've made as none are correct, and I don't want to confuse the issue any further than I already have.
The closest that I've come to it is something like update all ItemID's that are included in a sub-select of Status = 'Allowed', then update those back to 'Not Allowed' where StatusDate > the 'Allowed' status date. Also update to 'Not Allowed' where ItemID in sub-select of Status = 'Not Allowed'...uggg, I can't even really explain my logic any more without hitting a logic circle that I can't find my way out of.
Anyone have any ideas?
[font="Arial"]“Any fool can know. The point is to understand.”
- Albert Einstein
"DOH!"
- Homer Simpson[/font]
December 12, 2012 at 7:20 pm
It would help us if you could post the DDL for the table(s) involved, sample data for the tables, and expected results based on the sample data.
December 14, 2012 at 10:24 am
Sorry for the delay, I was working on another project yesterday.
I've attached all the live data (I've removed all the items that can't be released to the public). This is in the form of a SQL script to build and populate the 3 tables, and the current way the view is created.
I was trying to keep the question as simple as I could in my original post. Even though the rmPhotos table does not impact my question one way or the other I figured I'd err on adding too much information. On the rmPhotos table, the field photoSourceID is a foreign key to rvlProperties.rvlPropertyID. Aside from that this should be fairly straight forward on the naming conventions.
[font="Arial"]“Any fool can know. The point is to understand.”
- Albert Einstein
"DOH!"
- Homer Simpson[/font]
December 14, 2012 at 11:32 am
I haven't looked at your attached file yet, but unless you're storing the datetime value for when the status was updated in something other than an actual datetime field, down to the second can be the same, but the actual values for datetime fields can store considerably greater accuracy, so perhaps the real issue is that the value supplied to populate the field is only accurate to the second? If that's the case, no amount of querying is going to give you an accurate answer, so you're going to have to either get a more accurate time source to populate that field (e.g. using GETDATE() as part of the INSERT or UPDATE query), or application logic is somehow going to have to know the difference because of other data that might be handy (but probably isn't). Unfortunately, that won't solve the problem until ALL the status records get at least one update of some kind to ensure the datetime value is sufficiently accurate for each item.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 14, 2012 at 11:45 am
Great point, and it is one that I've brought up to the team. They say we'll look at that when we get the budget, but that may be months off.
For the time being they seem to like the idea of using a Max(Row_Number()) approach. There are very few records this actually will impact, so they are willing to take the hit on accuracy. The worst that will happen is someone shows up to a property where access has been revoked, and will discuss it with the owner again. If they don't get new permissions then they'll update the system with the new record, and that one will be taken care of.
Even with that, I keep chasing the logic circle around. I'll be working on it for most of the rest of the afternoon, so if I find the answer then I'll be sure to post it.
[font="Arial"]“Any fool can know. The point is to understand.”
- Albert Einstein
"DOH!"
- Homer Simpson[/font]
December 14, 2012 at 12:33 pm
Budget ? Seriously? Somebody must not be thinking. Whatever application is populating the database needs what is likely a 5 second change and then either a redeploy or a recompile and redeploy, and if the whole process were to take more than an hour, I'd be floored. Even a regression test shouldn't take all that long, and having to move something through DEV. TEST, QA, UAT, and PRODUCTION might take a tad longer, but still... this is a seriously simple fix to any query that populates a datetime field.
Alternatively (and now that I think about it), you might be able to add a trigger and do an insert into a new table that would tie the primary key from the original table to a datetime field whose DEFAULT value is GETDATE(). That way, the TRIGGER only has to insert the primary key from the other table, and it keeps the overall cost of the trigger on iinserts and updates fairly small. You'd also have to trigger on DELETE to remove the appropriate records from the new table as well. In the overall scheme of things, might be an easy fix. It doesn't do much for the existing records, but if the volume of those is small, have somebody decide what the status is for each one affected, and manually update this new table to reflect the decisions made.
Your thoughts?
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 14, 2012 at 12:58 pm
I agree, and I like the idea of at least setting up the trigger. The issue that we run into is the application is built for a federal agency. All the hours have to be recorded within 6 minute intervals. Getting the government to approve the spending is a slow process, and because of how our contract is written my company can't do work without explicitly billing.
The good news is I believe I've solved my issue. I have the logic worked out in my head, but I haven't tried it yet. It is ugly, but here's the concept:
Select rvlPropertyID
Max(StatusDate) As MaxDate
Into #TempDataHolder
From dbo.rvlPropertyAccess
Select TDH.rvlPropertyID
,TDH.MaxDate
,rPA.Status
,Row_Number() As RowNum
Into #TempDataHolder2
From #TempDataHolder TDH Inner Join dbo.rvlPropertyAccess rPA
On TDH.rvlPropertyID = rPA.rvlPropertyID
And TDH.MaxDate = rPA.StatusDate
Group By TDH.rvlPropertyID
,TDH.MaxDate
,rPA.Status
With CTE
(ID,RowNum)
As
(Select rvlPropertyID
,Max(RowNum)
From #TempDataHolder2)
Select TDH2.rvlPropertyID
,TDH2.StatusDate
,TDH2.Status
From #TempDataHolder2 TDH2 Inner Join CTE
On TDH2.rvlPropertyID = CTE.rvlPropertyID
And TDH2.RowNum = CTE.RowNum
That's the logic anyway. I'll see how it goes, and post the final results.
[font="Arial"]“Any fool can know. The point is to understand.”
- Albert Einstein
"DOH!"
- Homer Simpson[/font]
December 14, 2012 at 1:08 pm
It's only ugly to the extent that it might not produce consistent results for the records affected by lack of date/time accuracy, and to the extent that performance goes bad (for whatever reason). In other words, it might say that the status of an affected record is one thing today, and another thing toomorrow, without the status actually changing. If they can live with that, then go for it. Otherwise, plan on the trigger and a requirement to make the decision on the affected rows.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
December 17, 2012 at 3:44 pm
As promised I'm posting results. After getting the advice from sgmunson I revisited the original table instead of looking at the Excel version (which is where that whole mess of lack of accurate time lived...sorry about that), and the good news is for the view they had accurate DateTime values. There were no duplicates when reviewing the Max(Date) as I had originally feared. Here is the live code (because I was able to work it into a view still instead of a sproc fired from a job twice a day):
If Object_ID('dbo.GISPropertyAccessFlex') > 0
Drop View dbo.GISPropertyAccessFlex;
Go
----------
Create View dbo.GISPropertyAccessFlex
As
Select rProp.rvlPropertyID
,rProp.rmIncidentID
,rProp.propertyName
,rProp.rmVvlPropertyType
,Rtrim(Ltrim(IsNull
(Case Rtrim(Ltrim(StreetNumber))
When ''
Then Null
Else Rtrim(Ltrim(StreetNumber))
+ ' '
End,'')
+ IsNull
(Case Rtrim(Ltrim(StreetPrefix))
When ''
Then Null
When ' '
Then Null
Else Rtrim(Ltrim(StreetPrefix))
+ ' '
End,'')
+ IsNull
(Case StreetName
When ''
Then Null
Else Rtrim(Ltrim(StreetName))
+ ' '
End,PropertyDesc)
+ IsNull
(Case StreetSuffix
When ''
Then Null
Else StreetSuffix
+ ' '
End,'')
+ IsNull
(Case UnitNUmber
When ''
Then Null
Else UnitNUmber
+ ' '
End,''))) As Address
,rProp.cityName
,rProp.state
,rProp.county
,rProp.zipCode
,rProp.latitude
,rProp.longitude
,cP.PhotoCounter
,rProp.taxID
,Case
When PropNoRes.MaxDate = '1/1/1899'
Then Null
Else PropNoRes.MaxDate
End As LastAttemptDate
,PropRes.rmVvlAccessAttemptResult As CurrentStatus
,GEOMETRY::STGeomFromText('Point('
+ Cast(Longitude As Varchar(15))
+ ' '
+ Cast(Latitude As Varchar(15))
+ ')',4326) As geoData
From
(Select rvlPropertyID
,Max(IsNull(attemptDate,'1/1/1899')) As MaxDate
From dbo.rvlPropertyAccess
Group By rvlPropertyID) PropNoRes
Inner Join
(Select rvlPropertyID
,Max(IsNull(attemptDate,'1/1/1899')) As MaxDate
,rmVvlAccessAttemptResult
From dbo.rvlPropertyAccess
Group By rvlPropertyID
,rmVvlAccessAttemptResult) PropRes
On PropNoRes.rvlPropertyID = PropRes.rvlPropertyID
And PropNoRes.MaxDate = PropRes.MaxDate
Right Join dbo.rvlProperties rProp
On rProp.rvlPropertyID = PropNoRes.rvlPropertyID
Join
(Select rvlProperties.rvlPropertyID
,Count(rmPhotos.photoName) As PhotoCounter
From dbo.rmPhotos Right Outer Join dbo.rvlProperties
On rmPhotos.photoSourceID = rvlProperties.rvlPropertyID
Group By rvlProperties.rvlPropertyID) cP
On rProp.rvlPropertyID = cP.rvlPropertyID
Where rProp.deleted = 0
Or rProp.deleted = ''
Or rProp.deleted Is Null;
Go
----------------------------------------
Select *
From dbo.GISPropertyAccessFlex;
Go
Then we have the less than glorious news that the other view I was working on (with similar issues) had several thousand records where AssessmentDate Is Null. So I used my hack version as well, and in case someone ever wants to see how to do it (because it is a little more involved) here you go:
If Object_ID('dbo.GISPropertyAssessmentFlex') > 0
Drop View dbo.GISPropertyAssessmentFlex;
Go
----------
Create View dbo.GISPropertyAssessmentFlex
As
Select rProp.rvlPropertyID
,rProp.rmIncidentID
,rProp.propertyName
,rProp.rmVvlPropertyType
,Rtrim(Ltrim(IsNull
(Case Rtrim(Ltrim(StreetNumber))
When ''
Then Null
Else Rtrim(Ltrim(StreetNumber))
+ ' '
End,'')
+ IsNull
(Case Rtrim(Ltrim(StreetPrefix))
When ''
Then Null
When ' '
Then Null
Else Rtrim(Ltrim(StreetPrefix))
+ ' '
End,'')
+ IsNull
(Case StreetName
When ''
Then Null
Else Rtrim(Ltrim(StreetName))
+ ' '
End,PropertyDesc)
+ IsNull
(Case StreetSuffix
When ''
Then Null
Else StreetSuffix
+ ' '
End,'')
+ IsNull
(Case UnitNUmber
When ''
Then Null
Else UnitNUmber
+ ' '
End,''))) As Address
,rProp.cityName
,rProp.state
,rProp.county
,rProp.zipCode
,rProp.latitude
,rProp.longitude
,cP.PhotoCounter
,rProp.taxID
,Group2.AssessmentDate As LastAssessmentDate
,Group2.rmVvlStatus As CurrentStatus
,GEOMETRY::STGeomFromText('Point('
+ Cast(Longitude As Varchar(15))
+ ' '
+ Cast(Latitude As Varchar(15))
+ ')',4326) As geoData
From
(Select rvlPropertyID
,Max(RowNum) As MaxRow
From
(Select rvlPropertyID
,AssessmentDate
,rmVvlStatus
,Row_Number() Over(Partition By rvlPropertyID Order By AssessmentDate) As RowNum
From
(Select rPA1.rvlPropertyID As rvlPropertyID
,rPA1.MaxDate As AssessmentDate
,rPA2.rmVvlStatus As rmVvlStatus
From
(Select rvlPropertyID
,Max(IsNull(assessmentDate,'1/1/1899')) As MaxDate
From dbo.rvlPropAssessments
Group By rvlPropertyID) rPA1
Inner Join
(Select rvlPropertyID
,Max(IsNull(assessmentDate,'1/1/1899')) As MaxDate
,rmVvlStatus
From dbo.rvlPropAssessments
Group By rvlPropertyID
,rmVvlStatus) rPA2
On rPA1.rvlPropertyID = rPA2.rvlPropertyID
And rPA1.MaxDate = rPA2.MaxDate) rPA3) rPA4
Group By rvlPropertyID) Group1
Inner Join
(Select rvlPropertyID
,AssessmentDate
,rmVvlStatus
,Row_Number() Over(Partition By rvlPropertyID Order By AssessmentDate) As RowNum
From
(Select rPA1.rvlPropertyID As rvlPropertyID
,rPA1.MaxDate As AssessmentDate
,rPA2.rmVvlStatus As rmVvlStatus
From
(Select rvlPropertyID
,Max(IsNull(assessmentDate,'1/1/1899')) As MaxDate
From dbo.rvlPropAssessments
Group By rvlPropertyID) rPA1
Inner Join
(Select rvlPropertyID
,Max(IsNull(assessmentDate,'1/1/1899')) As MaxDate
,rmVvlStatus
From dbo.rvlPropAssessments
Group By rvlPropertyID
,rmVvlStatus) rPA2
On rPA1.rvlPropertyID = rPA2.rvlPropertyID
And rPA1.MaxDate = rPA2.MaxDate) rPA3) As Group2
On Group1.rvlPropertyID = Group2.rvlPropertyID
And Group1.MaxRow = Group2.RowNum
Right Join dbo.rvlProperties rProp
On rProp.rvlPropertyID = Group1.rvlPropertyID
Join
(Select rvlProperties.rvlPropertyID
,Count(rmPhotos.photoName) As PhotoCounter
From dbo.rmPhotos Right Outer Join dbo.rvlProperties
On rmPhotos.photoSourceID = rvlProperties.rvlPropertyID
Group By rvlProperties.rvlPropertyID) cP
On rProp.rvlPropertyID = cP.rvlPropertyID
Where rProp.deleted = 0
Or rProp.deleted = ''
Or rProp.deleted Is Null;
Go
----------------------------------------
Select *
From dbo.GISPropertyAssessmentFlex
And finally if any one is curious these scripts run as follows on my Alienware M18X, Intel Core i7-3610QM @2.30 GHz, 8 GB Ram, Windows 7 Ultimate 64-bit:
GISPropertyAccessFlex - 46,763 records in 1326 Milliseconds duration
GISPropertyAssessmentFlex - 46,763 records in 1586 Milliseconds duration
[font="Arial"]“Any fool can know. The point is to understand.”
- Albert Einstein
"DOH!"
- Homer Simpson[/font]
December 17, 2012 at 5:55 pm
Glad to see this is finally sorted... and with good results, too...
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply