August 5, 2003 at 12:19 pm
I am trying to pull a result set of only the last record in a table using a date field. I need to get the entire record and only one record per grouping (by consumer). The record will have ID field that will link to other tables. When I use grouping I will get multiple rows as the field other than Consumer will have different values. This is the SQL statement I am using, simplified.
SELECT TOP 100 PERCENT Consumer_Id, Provider_Id, MAX(Effective_Date) AS LastVisit
FROM dbo.MHC_Admissions_Provider WHERE (Consumer_Id = 0x0000000000000158)
GROUP BY Consumer_Id, Provider_Id
ORDER BY Consumer_Id, LastVisit Desc
If I do a Max() on both the Effective_Date and Provider_ID I don't get the correct Provider_ID for the last visit.
I only need one row per consumer with the provider_ID for the last visit.
Any suggestions? thanks.
August 5, 2003 at 1:32 pm
If you're only looking for the last record for a specific consumer id, then try
SELECT Consumer_ID, Provider_ID, Effective_Date AS LastVisit
FROM dbo.MHC_Admissions_Provider
WHERE Consumer_ID = 0x0000000000000158
AND Effective_Date = (SELECT MAX(Effective_Date) FROM dbo.MHC_Admissions_Provider WHERE Consumer_ID = 0x0000000000000158)
August 5, 2003 at 1:35 pm
If you're only looking for the last record for a specific consumer id, then try
SELECT Consumer_ID, Provider_ID, Effective_Date AS LastVisit
FROM dbo.MHC_Admissions_Provider
WHERE Consumer_ID = 0x0000000000000158
AND Effective_Date = (SELECT MAX(Effective_Date) FROM dbo.MHC_Admissions_Provider WHERE Consumer_ID = 0x0000000000000158)
August 5, 2003 at 1:53 pm
This works for the specific consumer, but I need to get the last record for each consumer. Any help with this?
Thanks.
August 5, 2003 at 2:52 pm
This should (hopefully!!) work out for all the consumers...you'll have to alias the outer table so that you can reference it in the sub query...
SELECT MAP.Consumer_ID, MAP.Provider_ID, MAP.Effective_Date AS LastVisit
FROM dbo.MHC_Admissions_Provider As MAP
WHERE MAP.Effective_Date = (SELECT MAX(Effective_Date) FROM dbo.MHC_Admissions_Provider WHERE Consumer_ID = MAP.Consumer_ID)
August 6, 2003 at 5:26 am
This is a specific case of a general set of problems I've never found a good solution to, maybe someone else has.
Consider the result set of:
Select a, b, etc
from table
order by a, b
Now consider if you want to return only the top n in that order for each value of A. The top 1 case is a bit simpler, and can sometimes be done as the examples above (complications -- if you are about order within duplicate max dates, or if there are compound sorts that govern the "max).
Complicated inner selects are one approach.
Saving in a table and adding ranking numbers, and using the ranking number in an inner select bounded by the "n" portion of the top N is another.
And of course you can use a cursor, which works easily and straightfoward but for large result sets it is way too slow (seems like I never need this for 100 records, but usually 100,000,000).
Is there any other approach? I think SQL is just limited here.
I am totally unfamiliar with the OLAP tools for example, are there any tools there, or in ADO via openrecordset that might help shape the result?
August 6, 2003 at 8:42 am
This one seems to be coming up for me a lot lately... I'm glad to hear that I'm not the only one stumped by it.
Another approach I've used is to create a cache table populated by triggers. In the scenario from the original post, there'd be a row in the cache table for each consumer, updated to show only the most recent provider. It's redundant storage, but in the cases where I've used it the performance gain over complex subqueries has been well worth the compromise.
This is new to me, though:
quote:
Saving in a table and adding ranking numbers, and using the ranking number in an inner select bounded by the "n" portion of the top N is another.
How would you go about adding the ranking numbers?
August 6, 2003 at 8:47 am
FYI, here's another thread that includes a possible solution to the original question... I haven't tried it myself so I'm curious about the performance, but it looks pretty clean & readable:
August 6, 2003 at 3:34 pm
Re the other thread, that's a common solution but works only if it's one field you need. A related answer is to do that, then do another join of the result where you join over the selected field (which is hopefully unique in context). It can get really messy in the latter case.
As to putting it in a ranking table, I couldn't find an example though I know I did one. Basically create a new table with an identity and insert into it in order of your ranking. If you can then find the first (or last) row number for each grouping, you can do a range around it, so there's still an inner select.
I've also done really ugly things like this:
Select the data into a temp table, and create a null ranking column (or as many as you want if differnt kinds of ranking). Then do:
Update #tmp
set RankNbr = 1 +
select count(*)
from #tmp t2
where (#tmp.field < t2.field) and (#tmp.grouping = t2.grouping)
This gets you the ranking withing the grouping. If it's a composite field you have to do gross things like
(#tmp.f1 < t2.f1 or
(#tmp.f1 = #tmp.f1 and #tmp.f2 < t2.f2) )
Also, if you can have ties in your ranking criteria, what I usually do is select an arbitrary field from NewID() in the temp table, and include that in my ranking, so that there's always a tiebreaker and the results are meaningful (though equal rows are then assigned separate rankings).
But really, all of these stink. What I need is:
select a, b, c
from table
order by a, b, c
returning top 5 a, b
So that it's kind of like group by, but rather than consolidating within, it returns the top 5 within each of the specified grouping fields.
August 6, 2003 at 3:40 pm
i should make two caveats.
The ranking table and selecting a range around it fails if you don't have at least (for TOP N) the N in each grouping, otherwise you might get another group twice. Limit it with a careful where on the group-by fields.
And my example is only half the solution, but then selecting from it to show only the RankNbr < 5 or whatever TOP N is should be obvious.
August 7, 2003 at 9:44 am
Thanks for all of the help. I finally came up with a solution that gets me the last record for each consumer by date, and also gives the correct values for the other fields in the last record.
Here it is:
SELECT DISTINCT TOP 100 PERCENT dbo.MHC_Admissions_Provider.Consumer_Id,
dbo.MHC_Admissions_Provider.Effective_Date,
dbo.MHC_Admissions_Provider.Termination_Date,
dbo.MHC_Admissions_Provider.Provider_Id,
dbo.MHC_Admissions_Provider.MHC_Admissions_Provider_Id
FROM dbo.MHC_Admissions_Provider INNER JOIN
(SELECT Consumer_ID, MAX(Effective_Date) AS LastAdmit
FROM dbo.MHC_Admissions_Provider
GROUP BY Consumer_ID) ConsumerLastAdmission ON
dbo.MHC_Admissions_Provider.Effective_Date =
ConsumerLastAdmission.LastAdmit AND
dbo.MHC_Admissions_Provider.Consumer_Id =
ConsumerLastAdmission.Consumer_ID INNER JOIN
dbo.Contact ON ConsumerLastAdmission.Consumer_ID = dbo.Contact.Contact_Id
WHERE (dbo.MHC_Admissions_Provider.Consumer_Id IS NOT NULL) AND
(dbo.MHC_Admissions_Provider.Effective_Date <= GETDATE()) AND
(dbo.MHC_Admissions_Provider.Termination_Date > GETDATE() OR
dbo.MHC_Admissions_Provider.Termination_Date IS NULL) AND
(dbo.MHC_Admissions_Provider.MHC_Admissions_Provider_Id IS NOT NULL) AND
(dbo.Contact.Type = 'Consumer')
ORDER BY dbo.MHC_Admissions_Provider.Consumer_Id,
dbo.MHC_Admissions_Provider.Effective_Date DESC
By linking the sub query I was able to get the correct records. In my final query I added some additional criteria for selecting the records in case there were duplicates by the linked fields.
The performance on running this was very good and I have a very large database.
I hope this solution is helpful to you all.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply