June 7, 2010 at 1:11 pm
I went through the available posts I could find on this topic and couldn't find a situation quite like mine. Essentially, I need to return one record from the left side of the join with one record on the right side. My problem is getting the correct record. I have a feeling I'm missing something obvious, but I'm not able to get it.
The problem lies here. I need to find the number of days between an inmates booking date and their release date (inclusive). All the records are kept in a log table. There are two possible event types for a booking event (BOOK and REBK) and one for a release (RELS). Some inmates (name_id 123, below) are nice and clean and have only a single booking event and a single release event. Some are not (name_id 456). For name_id 456 the first book_id (22) is fine because there is only one booking event and one release event. The second book_id (33) is not so much because of the release and rebook events.
Thanks for any help.
Code is below.
--remove temp table if necessary
IF OBJECT_ID('tempdb..#jlog') IS NOT NULL
DROP TABLE #jlog
--make temp table
CREATE TABLE #jlog (
jlog_id int NOT NULL,
name_id int NOT NULL,
book_id int NOT NULL,
eventtype char(4) NOT NULL,
eventdate datetime NULL
)
--poplulate temp table
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (1,123,11,'BOOK','2009-09-10 16:21:00.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (15,123,11,'INMT','2009-09-16 16:37:59.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (65,123,11,'RELS','2009-09-17 01:23:34.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (445,456,22,'OTH ','2009-09-19 00:01:00.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (852,456,22,'BOOK','2009-09-19 04:10:00.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (22,456,22,'INMT','2009-09-19 04:20:40.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (28647,456,22,'BOND','2009-09-19 10:37:32.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (224756,456,22,'BOND','2009-09-19 10:37:33.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (4553,456,22,'BOND','2009-09-19 10:37:35.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (536,456,22,'RELS','2009-09-19 10:45:31.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (8797,456,33,'BOOK','2010-01-08 10:00:00.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (455,456,33,'INMT','2010-01-08 10:11:02.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (756,456,33,'BOND','2010-01-10 09:06:44.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (3115,456,33,'RELS','2010-01-10 09:07:03.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (8648,456,33,'REBK','2010-01-16 09:01:40.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (4389,456,33,'INMT','2010-01-16 09:09:22.000')
INSERT INTO #jlog (jlog_id,name_id,book_id,eventtype,eventdate) VALUES (1315,456,33,'RELS','2010-01-17 13:59:33.000')
--select data from temp table
SELECT jbook.name_id,jbook.book_id,jbook.eventtype,jbook.eventdate
,jrels.eventtype,jrels.eventdate
,DATEDIFF(day, jbook.eventdate, jrels.eventdate) servedDays
FROM #jlog jbook
INNER JOIN #jlog jrels
ON jbook.name_id = jrels.name_id
AND jbook.book_id = jrels.book_id
AND jbook.eventdate < jrels.eventdate
WHERE (jbook.eventtype = 'BOOK' OR jbook.eventtype = 'REBK')
AND jrels.eventtype = 'RELS'
ORDER BY jbook.name_id, jbook.book_id
June 7, 2010 at 2:42 pm
Here is my take on it. basically I would use Row_Number() function to create an event_ID for the BOOK/RELS events and then create the same for the RELS event. This would allow you to pair them up based on the event ID. This pairing allows you to pull the correct dates together. then you can do what you want with them.
DECLARE @InEvent Table
(name_id int NOT NULL,
book_id int NOT NULL,
eventdate datetime NULL,
eventtype char(4) NOT NULL,
eventID int NULL)
insert into @InEvent
Select
name_id
,book_id
,eventdate
,eventtype
,ROW_NUMBER() OVER(Partition BY name_id,book_id ORDER BY eventdate) as EventCnt
from #jlog
where eventtype in('BOOK','REBK')
order by eventdate
DECLARE @OutEvent Table
(name_id int NOT NULL,
book_id int NOT NULL,
eventdate datetime NULL,
eventtype char(4) NOT NULL,
eventID int NULL)
insert into @OutEvent
Select
name_id
,book_id
,eventdate
,eventtype
,ROW_NUMBER() OVER(Partition BY name_id,book_id ORDER BY eventdate) as EventCnt
from #jlog
where eventtype in('RELS')
order by eventdate
Select a.name_id,a.book_id,a.eventdate As BookDate,b.eventdate As RelsDate
from @InEvent a
join @outEvent b on a.name_id=b.name_id and a.book_id=b.book_id and a.eventID=b.eventID
Dan
If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.
June 7, 2010 at 3:01 pm
Looks like Dan beat me - he did this pretty much the same way I was working on.
Here's my solution. The basic difference from Dan's solution is that I'm using CTEs instead of table variables... if the underlying tables have indexes, this could be faster.
;WITH jbook AS
(
select *, RN = ROW_NUMBER() OVER (PARTITION BY name_id, book_id ORDER BY eventdate)
FROM #jlog
WHERE eventtype IN ('BOOK', 'REBK')
)
, jrels AS
(
select *, RN = ROW_NUMBER() OVER (PARTITION BY name_id, book_id ORDER BY eventdate)
FROM #jlog
WHERE eventtype = 'RELS'
)
SELECT jbook.name_id,jbook.book_id,jbook.eventtype,jbook.eventdate
,jrels.eventtype,jrels.eventdate
,DATEDIFF(day, jbook.eventdate, jrels.eventdate) servedDays
FROM jrels
JOIN jbook
ON jbook.name_id = jrels.name_id
AND jbook.book_id = jrels.book_id
AND jbook.RN = jrels.RN
ORDER BY jbook.name_id, jbook.book_id
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
June 7, 2010 at 3:10 pm
and here's the same approach, just without using intermediate tables (I'm using subqueries, or CTE's):
Side note (valid for both solutions provided so far): Those solution will only work if you don't have "dirty data" (e.g. two consecutive 'BOOK' or 'REBK' or 'RELS' rows per name_id, book_id or any othe invalid combination of current eventtype and previous eventtype.
As an alternative you might want to have a look at the "Quirky update" article[/url] by Jeff Moden. The concept described in this article would allow you to handle (almost?) any special scenario. But here's the downside of it: ... as long as you follow ALL the rules provided in that article (for example, having the correct clustered index, in this case I'd say it needs to be name_id, book_id, eventdate).
This "quirky update" solution is most probably the best you can get in terms of performance...
WITH cte AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY name_id, book_id ORDER BY eventdate ) AS ROW
FROM #jlog
WHERE eventtype IN('BOOK','REBK','RELS')
)
SELECT
cte1.name_id,
cte1.book_id,
cte1.eventdate AS book,
cte2.eventdate AS rels,
DATEDIFF(dd,cte1.eventdate,cte2.eventdate) AS days
FROM cte cte1
LEFT OUTER JOIN cte cte2
ON cte1.name_id=cte2.name_id
AND cte1.book_id=cte2.book_id
AND cte1.row=cte2.row-1
WHERE cte2.eventtype ='RELS'
June 8, 2010 at 6:29 am
Thanks all,
All three solutions work great and returned the same results (once I reconciled the selection). The only difference in run time seems to be that the first solution with temp tables runs slightly slower when applied to the full data table (3 seconds compared to sub 1 second). Nice to see it though because it provided for me a good comparison to the CTE solutions.
It's apparent I need to bone up on CTE's.
Now to make sure the data is clean enough to use these ... this is a third party application and I have no control over the data or the users...
Again, thanks to all three,
Steve.
June 8, 2010 at 3:20 pm
Here is a different approach. Not sure if this will work better or worse than other methods, you'd just have to test it on your own data.
If you don't already have an index on these columns, in order:
book_id int NOT NULL,
eventtype char(4) NOT NULL,
eventdate datetime NULL
add one, then you should be able to do this:
SELECT *, DATEDIFF(DAY, eventdate, releasedate) AS ServedDays
FROM (
SELECT jbook.name_id,jbook.book_id,jbook.eventtype,jbook.eventdate
,'RELS' AS [eventtype2]
,(SELECT TOP 1 jrels.eventdate
FROM #jlog jrels
WHERE jrels.book_id = jbook.book_id
AND jrels.eventtype = 'RELS'
AND jrels.eventdate > jbook.eventdate
ORDER BY jrels.eventdate) AS ReleaseDate
FROM #jlog jbook
WHERE (jbook.eventtype = 'BOOK' OR jbook.eventtype = 'REBK')
) AS derived
ORDER BY name_id, book_id
Scott Pletcher, SQL Server MVP 2008-2010
June 9, 2010 at 6:25 am
Yup. That works too. Time is about 1 second against the production table.
Of the four solutions, it is the only one that returns bookings with no releases. Not a requirement, but might be handy information to have available.
Luckily, two of the three indexes were already in place. I have not added the third because this is a third party database, and I try to be as light handed as possible when picking data from someone else's db.
Thanks!
Steve
January 18, 2012 at 11:34 pm
Dan, I'm studying this solution. Please tell me where the tables are being created with syntax 'DECLARE @InEvent Table' and 'INSERT into @InEvent'. This is neither a concrete table nor a temp table....where are tables created with this syntax stored? What is this method of DDL called so that I can google it and learn. Thank you.
-
January 18, 2012 at 11:52 pm
They are "Table Variables".
January 19, 2012 at 1:07 am
Bingo. Thanks for that. That led me to some useful links eg http://www.sql-server-performance.com/2007/temp-tables-vs-variables/
-
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply