November 12, 2008 at 7:57 am
Adam Haines (11/12/2008)
Good article!This type of query can be handled much easier in SQL 2005, but I would still try to avoid a correlated subquery in SQL 2000. You can move the query into a derived table and make it inline. Obviously, this is over simplifying the solution but you get the idea.
SELECT
PersonRecord.*,
Improper.[IsImProper]
FROM PersonRecord
INNER JOIN(
SELECT
a.PersonID,
MAX(CASE
WHEN a.Version B.DEDate
THEN 1
ELSE 0
END) as IsImProper
FROM PersonRecord A
INNER JOIN PersonRecord B
ON A.PersonID = B.PersonID AND
A.Version = B.Version -1
GROUP BY A.PersonId
) AS Improper
ON Improper.[PersonID] = PersonRecord.PersonID
correction: the CASE clause should read:
WHEN a.DEDate > B.DEDate
but this does give the correct results.
The 2005 (partition) technique does not give the desired result of marking all rows for the personID.
November 12, 2008 at 8:01 am
Reply to :Why use the cross join?
Yes it could be done in a way you have shown but then the whole personid should be marked as improper if any DEDATE is found mismatched
Divya,
Your given query is essentially marking the personID as improper. In that case a query like:
SELECT PersonID, 1 as improper
FROM #PersonRecord A
INNER JOIN #PersonRecord B
ON A.PersonID = B.PersonID
AND A.Version = B.Version-1
WHERE A.DEDate >= B.DEDate
GROUP BY PersonID
Seems much more simple than the original code.
In Addition, the CROSS JOIN is not necessary (and probably removed by the optimizer) in the original code. The following is the same logic with the Cross join replaced with an INNER JOIN:
SELECT *,
(SELECT CASE
WHEN (SUM(CASE WHEN B.DEDate =1
THEN 1 ELSE 0
END AS isawk
from #PersonRecord B INNER JOIN #PersonRecord C
ON B.PersonID = C.PersonID
AND B.Version < C.Version
WHERE B.PersonID=A.PersonID) AS [Is ImProper]
from #PersonRecord A
If I were aiming for this result, I would probably use something like:
SELECT A.PersonID, Version, DEDate, COALESCE(Improper,0) as Improper
FROM #PersonRecord A
LEFT JOIN (SELECT DISTINCT B.PersonID, 1 as improper
FROM #PersonRecord B
INNER JOIN #PersonRecord C
ON B.PersonID = C.PersonID
AND B.Version = C.Version-1
WHERE B.DEDate >= C.DEDate) as t
ON A.PersonID = t.PersonID
I prefer to put the subquery in the FROM Clause instead of in the SELECT Clause, if possible. The benefit of this is seen when you want to move that subquery into a VIEW so that it can be used as a source in multiple queries.
--
JimFive
November 12, 2008 at 8:06 am
derailed from the topic and question.....
do not use more than two instances of given table...
try to give output like this.
PersonID IsImproper
1 1
2 1
3 0
4 1
November 12, 2008 at 8:09 am
Carla Wilson (11/12/2008)
Adam Haines (11/12/2008)
Good article!This type of query can be handled much easier in SQL 2005, but I would still try to avoid a correlated subquery in SQL 2000. You can move the query into a derived table and make it inline. Obviously, this is over simplifying the solution but you get the idea.
SELECT
PersonRecord.*,
Improper.[IsImProper]
FROM PersonRecord
INNER JOIN(
SELECT
a.PersonID,
MAX(CASE
WHEN a.Version B.DEDate
THEN 1
ELSE 0
END) as IsImProper
FROM PersonRecord A
INNER JOIN PersonRecord B
ON A.PersonID = B.PersonID AND
A.Version = B.Version -1
GROUP BY A.PersonId
) AS Improper
ON Improper.[PersonID] = PersonRecord.PersonID
correction: the CASE clause should read:
WHEN a.DEDate > B.DEDate
but this does give the correct results.
The 2005 (partition) technique does not give the desired result of marking all rows for the personID.
Thanks. The site completely stripped my code :(. I will edit my post.
November 12, 2008 at 9:31 am
I was curious how the different techniques performed, so I loaded up the PersonRecord table with more data using a Numbers table (please see article http://www.sqlservercentral.com/articles/TSQL/62867/ [/url]for Numbers table)
insert into PersonRecord
select N,0,'11/12/2008'
from Numbers
where N between 4 and 11000
insert into PersonRecord
select N,1,'11/13/2008'
from Numbers
where N between 4 and 11000
insert into PersonRecord
select N,2,'11/10/2008'
from Numbers
where (N between 4 and 11000) and N % 100 = 0
So total of 11000 PersonIds, and 22118 total rows in the PersonRecord table.
I then recorded the execution time for 4 of the techniques.
Original Query 610 Milliseconds duration
Adam Haines
- inner joins and CASE 576 Milliseconds duration
James Goodwin
- Left join on inner join 656 Milliseconds duration
Matt C - SQL2005
row_number() over (partition) 860 Milliseconds duration
Just thought you would find this interesting.
(edited to add link to Numbers table article)
November 12, 2008 at 9:49 am
Very interesting on the elapsed times... I would have figured I would have been the quickest without having to perform a table join, but I guess either the partition or the row_nuber generation makes it quite a bit slower. Thanks for putting that together!
At any rate, is was a good article and I enjoyed learning a couple of new aproaches!
November 12, 2008 at 12:06 pm
I'd be more interested in seeing the execution times where the data size is non-trivial. My suspicion would lead me to believe the row_number() solution might scale better based on my own experience using the related dense_rank() function.
You could even use two dense rank functions that are partitioned identically but have an order by that is based on the version number in one and the date field in the other, then using that to compare when the resulting ranks are not equal. It would require a derived table joined to the original table, but you could get the results you are looking for.
November 12, 2008 at 7:54 pm
Carla Wilson (11/12/2008)
I was curious how the different techniques performed, so I loaded up the PersonRecord table with more data using a Numbers table (please see article http://www.sqlservercentral.com/articles/TSQL/62867/ [/url]for Numbers table)
insert into PersonRecord
select N,0,'11/12/2008'
from Numbers
where N between 4 and 11000
insert into PersonRecord
select N,1,'11/13/2008'
from Numbers
where N between 4 and 11000
insert into PersonRecord
select N,2,'11/10/2008'
from Numbers
where (N between 4 and 11000) and N % 100 = 0
So total of 11000 PersonIds, and 22118 total rows in the PersonRecord table.
I then recorded the execution time for 4 of the techniques.
Original Query 610 Milliseconds duration
Adam Haines
- inner joins and CASE 576 Milliseconds duration
James Goodwin
- Left join on inner join 656 Milliseconds duration
Matt C - SQL2005
row_number() over (partition) 860 Milliseconds duration
Just thought you would find this interesting.
(edited to add link to Numbers table article)
Aye, nicely done. What would be more interesting is the CPU time and the I/O hit...
--Jeff Moden
Change is inevitable... Change for the better is not.
November 12, 2008 at 8:37 pm
Sorry... looks like those were CPU times.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 12, 2008 at 8:42 pm
You should see what happens with the correct and incorrect indexing. Adams code (using his because Carla rated it as the fastest) is more than twice as fast as the cross-join if the Clustered PK of PersonID,Version is correctly added to the table. If a non-unique clustered index is added to the same columns instead of the clustered PK, Adams code is almost 3 times as slow.
Neither method of indexing either helped nor hurt the cross-join.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 12, 2008 at 9:03 pm
Sorry... almost forgot... borrowed on Carla's good code and expanded the test data set a bit for my testing...
CREATE TABLE PersonRecord ( PersonID int NOT NULL, Version int NOT NULL,DEDate datetime NOT NULL)
go
insert into PersonRecord
select N,0,'11/12/2008'
from Tally
where N between 1 and 11000
insert into PersonRecord
select N,1,'11/13/2008'
from Tally
where N between 1 and 11000
insert into PersonRecord
select N,2,'11/10/2008'
from Tally
where (N between 1 and 11000) and N % 100 = 0
insert into PersonRecord
select N,2,'11/14/2008'
from Tally
where (N between 1 and 11000) and N % 100 > 0
insert into PersonRecord
select N,3,'11/15/2008'
from Tally
where N between 1 and 11000
insert into PersonRecord
select N,4,'11/16/2008'
from Tally
where N between 1 and 11000
--===== Uncomment to test with Clustered PK
--ALTER TABLE PersonRecord
--ADD PRIMARY KEY CLUSTERED (PersonID,Version)
--Jeff Moden
Change is inevitable... Change for the better is not.
November 12, 2008 at 11:10 pm
Hi Members,
I have two tables named 'result' and 'status' for which i have 'status_key_id' in my result table with default 'null'. I have to update the 'key_id' of 'status' table to 'status_key_id' in 'result' table.
Here in status table there will be only one row with 'started_at' and 'ended_at' and in result table there will be multiple rows within the date range of 'status' started_at and ended_at.
Here no inputs are given, just i have to check the 'status' table 'started_at >= result table 'started_at' and 'status table ended_at <= result table ended_at' and i have to update the 'status' key_id to 'status_key_id' in result table.
Please provide me a sample query with these requirement...
Regards,
Prabhu.
November 12, 2008 at 11:55 pm
Carla's comparison was really very nice.
Thanks I have learnt a lot many new things by writing this article
--Divya
November 13, 2008 at 5:01 am
It is also possible to do the query with just left joins,
SELECT a.PersonID, a.Version , a.dedate,
[is improper] = CASE WHEN MAX(c.version) IS NOT NULL THEN 1
ELSE 0 END
FROM personrecord a
LEFT JOIN personrecord b ON b.personid = a.personid
LEFT JOIN personrecord c ON c.personid = a.personid
AND c.version >= b.version
AND c.dedate < b.dedate
GROUP BY a.PersonID, a.Version, a.dedate
with the primary key on the table (personid, version) this performs slightly better than the cross join with both the small(8) and medium(circa .75m) rowcount.
I would avoid using the version = version -1 because this does not cater for missing versions (try doubling the version numbers in test data)
November 13, 2008 at 6:12 am
prabhu.rangan (11/12/2008)
Hi Members,I have two tables named 'result' and 'status' for which i have 'status_key_id' in my result table with default 'null'. I have to update the 'key_id' of 'status' table to 'status_key_id' in 'result' table.
Here in status table there will be only one row with 'started_at' and 'ended_at' and in result table there will be multiple rows within the date range of 'status' started_at and ended_at.
Here no inputs are given, just i have to check the 'status' table 'started_at >= result table 'started_at' and 'status table ended_at <= result table ended_at' and i have to update the 'status' key_id to 'status_key_id' in result table.
Please provide me a sample query with these requirement...
Regards,
Prabhu.
Prabhu,
It would probably be better for you to open your own thread on this... this particular thread is a discussion about an article on the subject of finding out of order rows based on dates and versions.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 16 through 30 (of 41 total)
You must be logged in to reply to this topic. Login to reply