June 1, 2009 at 12:44 pm
I'm looking for suggestions, practices for situations such as the one I'm about to describe. I have a table like this:
CREATE TABLE [dbo].[Person_Unit_Relationships]
(
[Person_Unit_Relationship_Id] [int] NOT NULL IDENTITY(1, 1),
[Person_ID] [int] NOT NULL,
[Unit_ID] [int] NOT NULL,
[Relationship_Start_Date] [smalldatetime] NOT NULL,
[Relationship_End_Date] [smalldatetime] NULL,
[Relationship_Type_ID] [int] NOT NULL,
[Relationship_Status_ID] [int] NOT NULL,
[Relationship_End_Reason_ID] [int] NULL,
[Retire_Date] [smalldatetime] NULL,
[Is_Primary] [bit] NULL CONSTRAINT [DF_Person_Sending_Entities_Is_Primary] DEFAULT ((0))
)
ALTER TABLE [dbo].[Person_Unit_Relationships] ADD CONSTRAINT [PK_Person_Unit_Relationships] PRIMARY KEY CLUSTERED ([Person_Unit_Relationship_Id])
GO
CREATE UNIQUE NONCLUSTERED INDEX [UX_Person_Unit_Relationships_Person_Unit_Start_Date] ON [dbo].[Person_Unit_Relationships] ([Person_ID], [Unit_ID], [Relationship_Start_Date])
GO
Over time a person can have multiple relationships with the same unit. The most common queries will only want currently active relationships (Relationship_End_Date is NULL or >= GETDATE(), and Retire_Date is NULL or >= GETDATE()) or relationships within a given time range.
How do you handle this and why? The why is the important part.
The options I know of (sketched as queries below) are:
1. Use an arbitrarily late "end of time" date instead of NULL
2. Computed column
3. IsNull or Coalesce
4. Use the OR syntax in the query
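Roughly, here is what each option looks like as the "currently active" filter (the '20790601' sentinel date, the Is_Active column name, and the exact column lists are only placeholders to show the shape of each predicate):

-- 1. Arbitrarily late date: store a sentinel such as '20790601' instead of NULL
SELECT Person_ID, Unit_ID
FROM dbo.Person_Unit_Relationships
WHERE Relationship_End_Date >= GETDATE()
  AND Retire_Date >= GETDATE()

-- 2. Computed column (something like Is_Active, derived from the end dates)
SELECT Person_ID, Unit_ID
FROM dbo.Person_Unit_Relationships
WHERE Is_Active = 1

-- 3. IsNull or Coalesce
SELECT Person_ID, Unit_ID
FROM dbo.Person_Unit_Relationships
WHERE ISNULL(Relationship_End_Date, GETDATE()) >= GETDATE()
  AND ISNULL(Retire_Date, GETDATE()) >= GETDATE()

-- 4. OR syntax
SELECT Person_ID, Unit_ID
FROM dbo.Person_Unit_Relationships
WHERE (Relationship_End_Date IS NULL OR Relationship_End_Date >= GETDATE())
  AND (Retire_Date IS NULL OR Retire_Date >= GETDATE())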
And here's some sample data:
INSERT INTO dbo.Person_Unit_Relationships (
Person_ID,
Unit_ID,
Relationship_Start_Date,
Relationship_End_Date,
Relationship_Type_ID,
Relationship_Status_ID,
Relationship_End_Reason_ID,
Retire_Date,
Is_Primary)
Select
2,
67,
'20040825',
'20060519',
3,
2,
NULL,
NULL,
1
Union ALL
Select
4,
4,
'20010212',
'20011219',
3,
2,
NULL,
NULL,
1
Union ALL
Select
4,
4,
'20040526',
'20050513',
3,
2,
NULL,
NULL,
1
Union ALL
Select
4,
32,
'20050514',
'20060531',
3,
2,
NULL,
NULL,
1
Union ALL
Select
5,
30,
'19881201',
'19891130',
3,
2,
NULL,
NULL,
1
Union ALL
Select
5,
32,
'19891201',
'19900531',
3,
2,
NULL,
NULL,
1
Union ALL
Select
5,
61,
'19880101',
'19881130',
3,
2,
NULL,
NULL,
1
Union ALL
Select
5,
61,
'19900801',
'19950101',
5,
23,
NULL,
NULL,
1
Union ALL
Select
8,
28,
'19720601',
'19730531',
3,
2,
NULL,
NULL,
1
Union ALL
Select
8,
32,
'19760501',
'19770430',
3,
2,
NULL,
NULL,
1
Union ALL
Select
8,
52,
'19750201',
'19760430',
3,
2,
NULL,
NULL,
1
Union ALL
Select
8,
62,
'19780501',
'19980101',
5,
7,
NULL,
NULL,
1
Union ALL
Select
8,
62,
'20010701',
'20040501',
5,
3,
NULL,
NULL,
1
Union ALL
Select
8,
67,
'19710901',
'19720531',
3,
2,
NULL,
NULL,
1
Union ALL
Select
9,
28,
'20030916',
'20031222',
3,
2,
NULL,
NULL,
1
Union ALL
Select
11,
61,
'19990915',
'20010912',
3,
2,
NULL,
NULL,
1
Union ALL
Select
13,
4,
'20030205',
'20030824',
3,
4,
NULL,
NULL,
1
Union ALL
Select
14,
4,
'19860901',
'19870331',
3,
2,
NULL,
NULL,
1
Union ALL
Select
14,
62,
'19870501',
'19941218',
5,
7,
NULL,
NULL,
1
Union ALL
Select
15,
14,
'20020520',
'20040114',
3,
2,
NULL,
NULL,
1
Union ALL
Select
15,
28,
'20000902',
'20020519',
3,
2,
NULL,
NULL,
1
Union ALL
Select
15,
32,
'20040115',
'20041231',
3,
2,
NULL,
NULL,
1
Union ALL
Select
19,
18,
'19760901',
'19770826',
3,
2,
NULL,
NULL,
1
Union ALL
Select
19,
32,
'19770827',
'19780712',
3,
2,
NULL,
NULL,
1
Union ALL
Select
19,
62,
'19780713',
'19790214',
5,
4,
NULL,
NULL,
1
Union ALL
Select
4,
62,
'20060601',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
11,
61,
'20010913',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
15,
62,
'20050101',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
21,
62,
'20020101',
NULL,
5,
28,
NULL,
NULL,
1
Union ALL
Select
28,
62,
'19950101',
NULL,
5,
9,
NULL,
'19950101',
1
Union ALL
Select
35,
42,
'19940101',
NULL,
5,
28,
NULL,
NULL,
1
Union ALL
Select
37,
62,
'19850801',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
39,
28,
'20070821',
NULL,
3,
2,
NULL,
NULL,
1
Union ALL
Select
44,
62,
'20080516',
NULL,
5,
3,
NULL,
NULL,
1
Union ALL
Select
45,
6,
'19700101',
NULL,
5,
9,
NULL,
'19700101',
1
Union ALL
Select
51,
62,
'19950201',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
58,
62,
'20020613',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
61,
62,
'19920801',
NULL,
5,
28,
NULL,
NULL,
1
Union ALL
Select
64,
61,
'20040716',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
73,
62,
'19861201',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
74,
5,
'20000601',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
78,
62,
'19680801',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
83,
36,
'20050601',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
90,
25,
'20080609',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
96,
62,
'19950601',
NULL,
5,
28,
NULL,
NULL,
1
Union ALL
Select
97,
6,
'19960224',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
117,
62,
'20030901',
NULL,
5,
9,
NULL,
'20030901',
1
Union ALL
Select
119,
26,
'20090101',
NULL,
3,
2,
NULL,
NULL,
1
Union ALL
Select
125,
62,
'19720801',
NULL,
5,
7,
NULL,
NULL,
1
Union ALL
Select
130,
6,
'20051230',
NULL,
5,
7,
NULL,
NULL,
1
Edit: Fixed sample code so it would work.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
June 1, 2009 at 12:55 pm
Always an ISNULL for me, interested to hear others' opinions
SELECT * FROM XXX WHERE
BeginDate < @Date AND ISNULL(EndDate, @Date) > @Date
June 1, 2009 at 1:01 pm
Jack,
Would a "Person_Unit_Relationships_Hist" table be an option? You keeps you main table very small and you are still able to get historic information. Since your table is a many to many relationship table there shouldn't be many affected modules.
If not, I would use a computed column (e.g. Is_Active) and an index on (Person_Id, Unit_Id, Is_Active).
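Something along these lines is what I have in mind (just a sketch, and note the computed column deliberately avoids GETDATE(), since a non-deterministic expression can't be indexed; it flags rows with no end date recorded, and rows that end at a future date would still need a date test):

ALTER TABLE dbo.Person_Unit_Relationships
ADD Is_Active AS
    CAST(CASE WHEN Relationship_End_Date IS NULL
               AND Retire_Date IS NULL
              THEN 1 ELSE 0 END AS BIT) PERSISTED
GO
CREATE NONCLUSTERED INDEX IX_Person_Unit_Relationships_Active
    ON dbo.Person_Unit_Relationships (Person_ID, Unit_ID, Is_Active)
GO
-- Open-ended relationships then become a simple, sargable predicate:
SELECT Person_ID, Unit_ID
FROM dbo.Person_Unit_Relationships
WHERE Person_ID = 4 AND Unit_ID = 62 AND Is_Active = 1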
Hope this helps!
Flo
June 1, 2009 at 1:28 pm
Flo,
In this particular application I could do anything I wanted (well, after discussion with the rest of the dev team), but I provided the structure because I usually understand problems better when a concrete structure is provided, and this is one I currently have.
I'm really asking: what is the best, most performant way to handle situations like this? In the future I may have to deal with issues like this on a large database and I want to know the best way; if it's the best way for large DBs then I want to use it on my small ones too!
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
June 1, 2009 at 1:58 pm
If I have to query data based on the above, I've found that the OR syntax gets me better index utilization than IsNull. Beyond that, a computed "active" column, as Flo mentioned, would seem to be a more interesting way to go.
BTW your sample data no worky...
Server: Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type smalldatetime.
The statement has been terminated.
June 1, 2009 at 2:11 pm
I've used calculated columns for that kind of thing a few times, but I've generally found that the best solution is a history table. Set up the indexes on them correctly, and active/historical tables can really work well. If you set them up to follow the rules of an updatable UNION ALL (partitioned) view, as per "CREATE VIEW" in BOL, it makes querying the whole thing quite convenient.
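A stripped-down sketch of that layout (the _Active/_Hist/_All names are only examples, most columns are left out, and this simple version of the view is read-only; to make it updatable you'd follow the partitioned-view rules in the BOL "CREATE VIEW" topic):

-- Active (open-ended) rows only: stays small, no end dates to store
CREATE TABLE dbo.Person_Unit_Relationships_Active
(
    Person_Unit_Relationship_Id INT NOT NULL PRIMARY KEY,
    Person_ID INT NOT NULL,
    Unit_ID INT NOT NULL,
    Relationship_Start_Date SMALLDATETIME NOT NULL
)
-- Ended rows, where the end date can be declared NOT NULL
CREATE TABLE dbo.Person_Unit_Relationships_Hist
(
    Person_Unit_Relationship_Id INT NOT NULL PRIMARY KEY,
    Person_ID INT NOT NULL,
    Unit_ID INT NOT NULL,
    Relationship_Start_Date SMALLDATETIME NOT NULL,
    Relationship_End_Date SMALLDATETIME NOT NULL
)
GO
-- One view for the occasions when you really do want everything
CREATE VIEW dbo.Person_Unit_Relationships_All
AS
SELECT Person_Unit_Relationship_Id, Person_ID, Unit_ID,
       Relationship_Start_Date,
       CAST(NULL AS SMALLDATETIME) AS Relationship_End_Date
FROM dbo.Person_Unit_Relationships_Active
UNION ALL
SELECT Person_Unit_Relationship_Id, Person_ID, Unit_ID,
       Relationship_Start_Date, Relationship_End_Date
FROM dbo.Person_Unit_Relationships_Hist
GO

Queries for current relationships then hit only the small Active table and never have to reason about NULL end dates at all.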
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 1, 2009 at 2:18 pm
Luke L (6/1/2009)
If I have to query data based on the above, I've found that the OR syntax gets me better index utilization than IsNull. Beyond that, a computed "active" column, as Flo mentioned, would seem to be a more interesting way to go. BTW your sample data no worky...
Server: Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type smalldatetime.
The statement has been terminated.
Fixed the sample data in the original post. I generated the insert statement from actual data and left out the quotes (') around a date; without them SQL Server treats the value as an integer, and it won't fit when converted to smalldatetime!
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
June 1, 2009 at 2:43 pm
No love for the ISNULL it seems - can you guys explain the negatives of doing it that way? Thanks 😀
I know there is a small performance hit; is this something that will add up to something meaningful over your data, or are there other reasons? (Indexing was mentioned - is that a factor? I wouldn't think so, so if it is, please let me know!)
June 1, 2009 at 3:19 pm
Mike McQueen (6/1/2009)
No love for the ISNULL it seems - can you guys explain the negatives of doing it that way? Thanks 😀 I know there is a small performance hit; is this something that will add up to something meaningful over your data, or are there other reasons? (Indexing was mentioned - is that a factor? I wouldn't think so, so if it is, please let me know!)
The IsNull solution can end up with non-SARGable queries, which means index scans instead of index seeks. That can (almost always does) make a significant difference in performance.
It can also have a negative effect on the execution plan SQL Server comes up with. In some cases, this will matter more than the SARGability of the Where clause. What if, for example, the first run of the proc has a null value, and that results in 10 rows of data, which then have to be joined to another table? For 10 rows, it'll probably pick Nested Loops for the join mechanism. What if the next query has a specific value, and that makes it 10,000 rows? Suddenly, Nested Loops looks like a bad idea, but the optimizer is quite possibly stuck with the prior plan. Or it has to recompile the query each time it's run, and you end up with compilation locks, which reduce efficient concurrency.
If you split the table, there's less ambiguity, and you'll generally end up with better execution plans and better index use.
Also, if you use the "arbitrarily late date" approach you are, to one extent or another, setting up a Y2K-type situation.
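To make the plan-reuse trap concrete, here's a hypothetical catch-all proc (the name and the @End_Date parameter are invented for illustration, not part of Jack's schema):

CREATE PROCEDURE dbo.Get_Relationships_Sketch
    @Person_ID INT,
    @End_Date  SMALLDATETIME = NULL   -- NULL means "only open-ended rows"
AS
SELECT Person_ID, Unit_ID, Relationship_Start_Date
FROM dbo.Person_Unit_Relationships
WHERE Person_ID = @Person_ID
      -- Catch-all predicate: wrapping the column in ISNULL rules out a seek
      -- on Relationship_End_Date, and the plan compiled for the first call's
      -- parameter values (say, a handful of rows) gets reused for later calls
      -- that may qualify thousands of rows - or the proc recompiles every time.
  AND ISNULL(Relationship_End_Date, GETDATE())
      >= ISNULL(@End_Date, GETDATE())
GO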
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
June 1, 2009 at 3:22 pm
Indexing is definitely a reason for NOT using IsNull. Any time you use a function on a column in a WHERE or JOIN clause, SQL Server has to apply that function to every row in the table/index. In the example I provided it is not a huge deal, as the column(s) I mention are not in an index, and if they were they would either be one of the right-most columns or an included column in the index. So if I were to have a query like this:
SELECT
PUR.Person_ID,
PUR.Unit_ID
FROM
dbo.Person_Unit_Relationships AS PUR
WHERE
PUR.person_Id = 10 AND
PUR.Unit_ID = 62 AND
ISNULL(PUR.Relationship_End_Date, '20790602') >= GETDATE()
I can still get an index seek with a bookmark lookup and then a filter, because I have a unique index on (Person_ID, Unit_ID, Relationship_Start_Date); but if my index were on (Relationship_End_Date, Person_ID, Unit_ID) I would only get an index scan.
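For comparison, here's roughly what the OR form of the same query looks like with the end date left bare in the predicate (the index is hypothetical; it just puts Relationship_End_Date behind the two equality columns so the whole filter can be satisfied from the index):

-- Hypothetical index with the end date as a trailing key column
CREATE NONCLUSTERED INDEX IX_PUR_Person_Unit_EndDate
    ON dbo.Person_Unit_Relationships (Person_ID, Unit_ID, Relationship_End_Date)
GO
SELECT PUR.Person_ID,
       PUR.Unit_ID
FROM dbo.Person_Unit_Relationships AS PUR
WHERE PUR.Person_ID = 10
  AND PUR.Unit_ID = 62
      -- No function wrapped around the column, so the optimizer can seek on
      -- (Person_ID, Unit_ID) and apply the end-date test to the narrow set
      -- of index rows that come back.
  AND (PUR.Relationship_End_Date IS NULL
       OR PUR.Relationship_End_Date >= GETDATE())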
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
June 1, 2009 at 8:09 pm
Cardinal say to Confusedcius 😛 one day, "To avoid pork chop, must look eye.... BIG EYE." SO! When come to fork in road, use fork on pork chop. :hehe:
[font="Courier New"]--===== Do this in a nice safe place
USE TempDB
--===== Conditionally drop the test table for reruns
IF OBJECT_ID('dbo.JBMTest') IS NOT NULL
DROP TABLE dbo.JBMTest
--===== Create and populate a 1,000,000 row test table.
-- Column "RowNum" has a range of 1 to 100,000 unique numbers
-- Column "SomeInt" has a range of 1 to 50,000 non-unique numbers
-- Column "SomeLetters2" has a range of "AA" to "ZZ" non-unique 2 character strings
-- Column "SomeMoney has a range of 0.0000 to 99.9999 non-unique numbers
-- Column "SomeDate" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times
-- Column "SomeCSV" contains 'Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10'
-- for all rows.
-- Column "SomeHex12" contains 12 random hex characters (ie, 0-9,A-F)
--Jeff Moden
SELECT TOP 1000000
SomeID = IDENTITY(INT,1,1),
SomeInt = ABS(CHECKSUM(NEWID()))%50000+1,
SomeLetters2 = CHAR(ABS(CHECKSUM(NEWID()))%26+65)
+ CHAR(ABS(CHECKSUM(NEWID()))%26+65),
SomeCSV = CAST('Part01,Part02,Part03,Part04,Part05,Part06,Part07,Part08,Part09,Part10' AS VARCHAR(80)),
SomeMoney = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS MONEY),
SomeDate = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),
SomeHex12 = RIGHT(NEWID(),12)
INTO dbo.JBMTest
FROM Master.dbo.SysColumns t1,
Master.dbo.SysColumns t2
--===== Make 50k of the dates NULL (no end date). Since they're in random order, it should
-- affect all time frames equally.
UPDATE dbo.JBMTest
SET SomeDate = NULL
WHERE SomeID <= 50000
--===== A table is not properly formed unless a clustered index has been assigned
ALTER TABLE dbo.JBMTest
ADD PRIMARY KEY CLUSTERED (SomeID)
--===== Looky here. Big medicine for 2k5. Keeps pork chop out of nostril.
CREATE INDEX IX_JBMTest_SomeDate
ON dbo.JBMTest (SomeDate) INCLUDE (SomeID,SomeMoney)
PRINT '--===== OR IS NULL ==================================='
SET STATISTICS TIME ON
SELECT SomeID,SomeDate,SomeMoney
FROM dbo.JBMTest
WHERE SomeDate >= GETDATE()
OR SomeDate IS NULL
SET STATISTICS TIME OFF
PRINT '--===== ISNULL ==================================='
SET STATISTICS TIME ON
SELECT SomeID,SomeDate,SomeMoney
FROM dbo.JBMTest
WHERE ISNULL(SomeDate,GETDATE()) >= GETDATE()
SET STATISTICS TIME OFF
PRINT '--===== Coalesce ==================================='
SET STATISTICS TIME ON
SELECT SomeID,SomeDate,SomeMoney
FROM dbo.JBMTest
WHERE COALESCE(SomeDate,GETDATE()) >= GETDATE()
SET STATISTICS TIME OFF
--===== Update the Nulls to the end of SQL time
UPDATE dbo.JBMTest
SET SomeDate = '99991231'
WHERE SomeDate IS NULL
--===== Rebuild that index after such a large update
DBCC DBREINDEX ('dbo.JBMTest','IX_JBMTest_SomeDate')
PRINT '--===== End-of-Time Date ==================================='
SET STATISTICS TIME ON
SELECT SomeID,SomeDate,SomeMoney
FROM dbo.JBMTest
WHERE SomeDate >= GETDATE()
SET STATISTICS TIME OFF
--Jeff Moden
Change is inevitable... Change for the better is not.