October 6, 2010 at 6:53 am
Greetings,
I have a bit of a strange problem. I am trying to find out how to group data together but keep the data in a different order. For example, lets say I have a table in the following format.
DECLARE @BookLoans table
(
BookID int,
StudentID int,
LoanDate datetime,
ReturnDate datetime
)
Then load it with the following data:
INSERT INTO @BookLoans
SELECT 1, 56, '2010-01-01', '2010-01-07'
UNION
SELECT 1, 56, '2010-01-08', '2010-01-10'
UNION
SELECT 1, 78, '2010-01-11', '2010-01-18'
UNION
SELECT 1, 56, '2010-01-19', '2010-01-23'
UNION
SELECT 5, 23, '2010-03-23', '2010-04-01'
UNION
SELECT 5, 23, '2010-04-02', '2010-04-07'
UNION
SELECT 5, 23, '2010-04-08', '2010-04-26'
UNION
SELECT 8, 56, '2010-01-01', '2010-01-07'
UNION
SELECT 8, 56, '2010-01-08', '2010-01-10'
UNION
SELECT 8, 78, '2010-01-11', '2010-01-18'
UNION
SELECT 8, 56, '2010-01-19', '2010-01-23'
UNION
SELECT 8, 56, '2010-01-24', '2010-02-14'
UNION
SELECT 8, 56, '2010-02-14', '2010-03-22'
UNION
SELECT 8, 23, '2010-03-23', '2010-04-01'
UNION
SELECT 8, 23, '2010-04-02', '2010-04-07'
UNION
SELECT 8, 23, '2010-04-08', '2010-04-26'
The desire is to list each book in order. Then list each student that had loaned the book and the first LoanDate and the last ReturnDate with in the date sequence that they had sequencially loaned the book. Then do the same for the next student. The final result should look like the following table.
|---------------------------------------------|
|BookID | StudentID | LoanStart | LoanEnd |
|---------------------------------------------|
| 1 | 56 | 2010-01-01 | 2010-01-10 |
|---------------------------------------------|
| 1 | 78 | 2010-01-11 | 2010-01-18 |
|---------------------------------------------|
| 1 | 56 | 2010-01-19 | 2010-01-23 |
|---------------------------------------------|
| 5 | 23 | 2010-03-23 | 2010-04-26 |
|---------------------------------------------|
| 8 | 56 | 2010-01-01 | 2010-01-10 |
|---------------------------------------------|
| 8 | 78 | 2010-01-11 | 2010-01-18 |
|---------------------------------------------|
| 8 | 56 | 2010-01-19 | 2010-03-22 |
|---------------------------------------------|
| 8 | 23 | 2010-03-23 | 2010-04-26 |
|---------------------------------------------|
I have tried grouping by StudentID, but that gives the wrong answer as it also needs to group by the LoanDate first. But, this fields is changing with each record so each row becomes its own group. I am currently using a CLR to recursively run through the table in proper order and find the start and end of each section, but the performance is poor.
Is there a way to retrieve the desired results with a set based SQL script or am I doomed to have to use the "RBAR" method (thanks Jeff Moden).
Thank you for your time.
Terry Steadman
October 6, 2010 at 7:21 am
Here is what I did, and thank you for making the testing easy!
Select BookID, StudentID, LoanDate, ReturnDate
From @BookLoans
Group By BookID, StudentID, LoanDate, ReturnDate
Order by BookID ASC, LoanDate ASC, ReturnDate ASC
And the results:
1 56 2010-01-01 00:00:00.000 2010-01-07 00:00:00.000
1 56 2010-01-08 00:00:00.000 2010-01-10 00:00:00.000
1 78 2010-01-11 00:00:00.000 2010-01-18 00:00:00.000
1 56 2010-01-19 00:00:00.000 2010-01-23 00:00:00.000
5 23 2010-03-23 00:00:00.000 2010-04-01 00:00:00.000
5 23 2010-04-02 00:00:00.000 2010-04-07 00:00:00.000
5 23 2010-04-08 00:00:00.000 2010-04-26 00:00:00.000
8 56 2010-01-01 00:00:00.000 2010-01-07 00:00:00.000
8 56 2010-01-08 00:00:00.000 2010-01-10 00:00:00.000
8 78 2010-01-11 00:00:00.000 2010-01-18 00:00:00.000
8 56 2010-01-19 00:00:00.000 2010-01-23 00:00:00.000
8 56 2010-01-24 00:00:00.000 2010-02-14 00:00:00.000
8 56 2010-02-14 00:00:00.000 2010-03-22 00:00:00.000
8 23 2010-03-23 00:00:00.000 2010-04-01 00:00:00.000
8 23 2010-04-02 00:00:00.000 2010-04-07 00:00:00.000
8 23 2010-04-08 00:00:00.000 2010-04-26 00:00:00.000
October 6, 2010 at 7:26 am
Greetings g33kspeak,
Nice try, but this does not summarize the start and end dates for the book loan. It is only listing all records as they have been entered. Since you have grouped on the date fields, each row then became its own group and so you really don't have a group at all, just an order by. If you look at the result table I have included at the end of my question, you will see what the result should look like.
Thank you for your time.
Terry Steadman
October 6, 2010 at 7:28 am
I see... I missed the "totaling" of the time in the initial example... π
Back to the drawing board!
October 6, 2010 at 7:43 am
DECLARE @BookLoans table
(
BookID int,
StudentID int,
LoanDate datetime,
ReturnDate datetime
)
INSERT INTO @BookLoans
SELECT 1, 56, '2010-01-01', '2010-01-07'
UNION
SELECT 1, 56, '2010-01-08', '2010-01-10'
UNION
SELECT 1, 78, '2010-01-11', '2010-01-18'
UNION
SELECT 1, 56, '2010-01-19', '2010-01-23'
UNION
SELECT 5, 23, '2010-03-23', '2010-04-01'
UNION
SELECT 5, 23, '2010-04-02', '2010-04-07'
UNION
SELECT 5, 23, '2010-04-08', '2010-04-26'
UNION
SELECT 8, 56, '2010-01-01', '2010-01-07'
UNION
SELECT 8, 56, '2010-01-08', '2010-01-10'
UNION
SELECT 8, 78, '2010-01-11', '2010-01-18'
UNION
SELECT 8, 56, '2010-01-19', '2010-01-23'
UNION
SELECT 8, 56, '2010-01-24', '2010-02-14'
UNION
SELECT 8, 56, '2010-02-14', '2010-03-22'
UNION
SELECT 8, 23, '2010-03-23', '2010-04-01'
UNION
SELECT 8, 23, '2010-04-02', '2010-04-07'
UNION
SELECT 8, 23, '2010-04-08', '2010-04-26'
;WITH FirstOut AS (
SELECT
BookID
,StudentID
,MIN(LoanDate) AS LoanDate
FROM
@BookLoans
GROUP BY
BookID
,StudentID
)
, LastIn AS (
SELECT
BookID
,StudentID
,MAX(ReturnDate) AS ReturnDate
FROM
@BookLoans
GROUP BY
BookID
,StudentID
)
SELECT
f.BookID
,f.StudentID
,f.LoanDate
,l.ReturnDate
FROM
FirstOut f
JOIN
LastIn l
ON
f.BookID = l.BookID
AND
f.StudentID = l.StudentID
John
October 6, 2010 at 8:28 am
Greetings John,
Thank you for your reply, but it does not quite answer the problem. Your solution is grouping by the StudentID first, and then getting the Loan and Return dates. This loses the places where the book switched possession from student (56) to student (78) and then back to student (56) and their respective date ranges.
Terry Steadman
October 6, 2010 at 8:33 am
Greetings all,
As a heads up, I do have some code that will return a correct result. The problem that I am having is that it is too slow. If applied against a table with several thousand entries, it will start taking a very long time to process. This leads me to believe that it is using a "RBAR" method of findind the answer. Definitely not good when this script would be called quite often for larger reports.
The current, but slow, script I have is:
;WITH
BookLoans AS
(
SELECT
BookID,
StudentID,
LoanDate,
ReturnDate,
ROW_NUMBER() OVER (ORDER BY BookID, LoanDate) AS BookRow_ID,
DENSE_RANK() OVER (PARTITION BY BookID ORDER BY LoanDate) AS BookOrder_ID
FROM @BookLoans
),
BookList AS
(
SELECT
BookID,
StudentID,
LoanDate,
ReturnDate,
BookRow_ID
FROM BookLoans
WHERE
BookOrder_ID = 1
),
BookAnchor AS
(
SELECT
BookID,
StudentID,
LoanDate,
ReturnDate,
BookRow_ID,
1 AS BookStudentRow,
StudentID AS LastStudent
FROM BookList
UNION All
SELECT
bt.BookID,
bt.StudentID,
bt.LoanDate,
bt.ReturnDate,
bt.BookRow_ID,
(
CASE
WHEN bt.StudentID <> ba.LastStudent THEN
ba.BookStudentRow + 1
ELSE
ba.BookStudentRow
END
) AS BookStudentRow,
bt.StudentID AS LastStudent
FROM BookLoans AS bt
JOIN BookAnchor AS ba ON
bt.BookID = ba.BookID
AND (bt.BookRow_ID - 1) = ba.BookRow_ID
WHERE
bt.BookRow_ID > 1
)
SELECT
BookID,
StudentID,
MIN(LoanDate) AS LoanStart,
MAX(ReturnDate) AS LoanEnd
FROM BookAnchor
GROUP BY
BookID,
BookStudentRow,
StudentID
This script is actually a simplified version of the actual tables and data, but the logical problem and solution are still the same.
Thank you for your help.
Terry Steadman
October 6, 2010 at 8:43 am
Out of curiosity..what would you want to show up if this happened:
Student loans a book 01-01 to 01-07, then rents it again 01-09 to 01-10
Do you want it to just show up as the 1st to the 10th? Or as two separate, since it wasn't rented on the 8th?
Also...I don't see any way this can be done in a set.
October 6, 2010 at 8:52 am
Greetings Derrick,
In this case, it would show up as 1 line (2010-01-01 to 2010-01-10). I have extra code that would check for vacant or non-loaned lines as needed but for this problem, it would have only caused extra complexity and confusion.
I tend to agree that it does appear that there isn't a set based solution to this problem. That is sad as I had worked with other SQL databases that could group correctly based on an alternate sort method. I was just hoping that the SQL gurus here would notice my dilema and be able to provide a solution that also works with Microsoft SQL. I still hold hope that there is a way. π
Thank you for your help.
Terry Steadman
October 6, 2010 at 10:05 am
I think the order difference technique should work well here.
Try the following and let me know how it performs:
;WITH BLOrderDif
AS
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY BookID ORDER BY LoanDate)
- ROW_NUMBER() OVER (PARTITION BY BookID ORDER BY StudentID, LoanDate) AS OrderDiff
FROM @BookLoans
)
, BLRange
AS
(
SELECT BookID, StudentID, OrderDiff, MIN(LoanDate) AS LoanStart, MAX(ReturnDate) AS LoadEnd
FROM BLOrderDif
GROUP BY BookID, StudentID, OrderDiff
)
SELECT BookID, StudentID, LoanStart, LoadEnd
FROM BLRange
ORDER BY BookID, LoanStart
October 6, 2010 at 11:05 am
I have just had time to look at this again.
To allow for when a book is not on loan try using a numbers/tally table.
(I am just generating the numbers table here for convenience.)
;WITH Numbers5(N) AS ( SELECT 1 UNION ALL SELECT 0 )
,Numbers4(N) AS (SELECT 1 FROM Numbers5 N1 CROSS JOIN Numbers5 N2)
,Numbers3(N) AS (SELECT 1 FROM Numbers4 N1 CROSS JOIN Numbers4 N2)
,Numbers2(N) AS (SELECT 1 FROM Numbers3 N1 CROSS JOIN Numbers3 N2)
,Numbers1(N) AS (SELECT 1 FROM Numbers2 N1 CROSS JOIN Numbers2 N2)
,Numbers0(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM Numbers1)
,Numbers AS (SELECT N FROM Numbers0 WHERE N <= 1000)
,BRange
AS
(
SELECT BookID, MIN(LoanDate) AS LoanDate, MAX(ReturnDate) AS ReturnDate
FROM @BookLoans
GROUP BY BookID
)
, BL
AS
(
SELECT BookID, StudentID, LoanDate, ReturnDate
FROM @BookLoans
UNION ALL
SELECT R.BookID, -1, R.LoanDate + N.N -1, R.LoanDate + N.N -1
FROM BRange R
JOIN Numbers N
ON N.N <= DATEDIFF(day, R.LoanDate, R.ReturnDate) + 1
WHERE NOT EXISTS
(
SELECT *
FROM @BookLoans L1
WHERE L1.BookID = R.BookID
AND R.LoanDate + N.N -1 BETWEEN L1.LoanDate AND L1.ReturnDate
)
)
,BLOrderDif
AS
(
SELECT BookID, StudentID, LoanDate, ReturnDate
,ROW_NUMBER() OVER (PARTITION BY BookID ORDER BY LoanDate)
- ROW_NUMBER() OVER (PARTITION BY BookID ORDER BY StudentID, LoanDate) AS OrderDiff
FROM BL
)
, BLRange
AS
(
SELECT BookID, StudentID, OrderDiff, MIN(LoanDate) AS LoanStart, MAX(ReturnDate) AS LoanEnd
FROM BLOrderDif
GROUP BY BookID, StudentID, OrderDiff
)
SELECT BookID, StudentID, LoanStart, LoanEnd
FROM BLRange
WHERE StudentID > 0
ORDER BY BookID, LoanStart
October 6, 2010 at 11:07 am
Greetings Ken,
We Have A Winner!!! π
I can't say I fully understand how or why it works yet, but it does seem to do what I have been wanting to do before - reduce the date ranges into a single value that changes when the StudentID changes.
I have converted your example to my live test (not really books :-P) and the speed difference was enormous. Before, my fastest time was 29 seconds. Now, with your change in place, it sped up to less than a second (maybe .5 seconds).
Thank you for your help.
Terry Steadman
October 6, 2010 at 11:08 am
Nice one Ken. Very elegant solution to this..I didn't think of grouping on the rowdiff.
October 7, 2010 at 3:57 am
I am glad the order difference technique worked but unfortunately I cannot claim credit for the idea. The technique was used to win a competition but I cannot remember the details or who won it. (It may have been a databasejournal.com competition with either Joe Celko or Itzik Ben-Gan as one of the judges.)
It is an extremely elegant solution to work out when state changes β in this case who has borrowed a book. Like all good ideas, the underlying concept is simple β work out two orders so that the difference is the same while the state is the same.
October 7, 2010 at 5:54 am
That's nicely condensed, Ken. I'd been working on a version
;WITH CTE1 AS (
SELECT BookID, StudentID, LoanDate, ReturnDate,
RowNum1 = ROW_NUMBER() OVER (ORDER BY BookID, LoanDate),
RowNum2 = ROW_NUMBER() OVER (PARTITION BY StudentID ORDER BY BookID, LoanDate)
FROM @BookLoans
), CTE2 AS (
SELECT BookID, StudentID, LoanDate, ReturnDate,
LoanStretch = DENSE_RANK() OVER (ORDER BY BookID, StudentID, RowNum1-RowNum2)
FROM CTE1
)
SELECT BookID, StudentID, LoanStart = MIN(LoanDate), LoanEnd = MAX(ReturnDate)
FROM CTE2
GROUP BY BookID, StudentID, LoanStretch
ORDER BY BookID, MIN(LoanDate)
which does the same thing but includes unnecessary workings - sometimes it's handy to keep it in to help folks understand what's going on.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply