April 30, 2012 at 10:06 pm
I have table with below structure
Create table ClientDetails
(ID int,
ClientID int,
InterviewDate Datetime)
Insert into ClientDetails values(1,123,'01/01/2009')
Insert into ClientDetails values(2,123,'01/01/2010')
Insert into ClientDetails values(3,123,'04/04/2011')
Insert into ClientDetails values(4,456,'10/10/2011')
Insert into ClientDetails values(5,456,'03/03/2012')
It should return the record with the latest record for each client ID and below should be the result.
3,123,'04/04/2011' -- This is the latest record for Client ID 123 based on the ID
5,456,'03/03/2012' -- This is the latest record for Client ID 456 based on the ID
could someone please help me with the task...Table has around 2 to 3 million records.
April 30, 2012 at 10:56 pm
Please do not cross post, it fragments and duplicates effort
April 30, 2012 at 10:59 pm
Try:
with CTE as
(
select ClientID, max(ID) as ID from ClientDetails group by ClientID
)
select d.ID, d.ClientID, d.InterviewDate
from CTE as c
join ClientDetails as d on d.ID = c.ID
Hope this helps.
May 1, 2012 at 2:10 am
This will work.
;With CTE
As
(Select *, ROW_NUMBER() Over (Partition By DateOfBirth, SSN Order By ID) As Rownum From Employee)
Select b.* From CTE as a
JOIN Employee As b ON a.DateofBirth = b.DateofBirth AND a.SSN = b.SSN
Where a.Rownum > 1
May 1, 2012 at 2:25 am
Sorry missed the group by Division part in my last post...here is the edited query:
;With CTE
As
(Select *, ROW_NUMBER() Over (Partition By DateOfBirth, SSN Order By ID) As Rownum From Employee)
Select b.Id, b.Division, MAX(b.DateofBirth) As DateofBirth, MAX(b.SSN) As SSN, b.EmployeeCode From CTE as a
JOIN Employee As b ON a.DateofBirth = b.DateofBirth AND a.SSN = b.SSN
Where a.Rownum > 1
Group By b.Division, b.EmployeeCode, b.ID
Order By b.Division, b.EmployeeCode DESC
May 1, 2012 at 4:28 am
CREATE TABLE dbo.ClientDetails
(
ID integer NOT NULL,
ClientID integer NOT NULL,
InterviewDate datetime NOT NULL
);
INSERT dbo.ClientDetails VALUES(1,123,'01/01/2009');
INSERT dbo.ClientDetails VALUES(2,123,'01/01/2010');
INSERT dbo.ClientDetails VALUES(3,123,'04/04/2011');
INSERT dbo.ClientDetails VALUES(4,456,'10/10/2011');
INSERT dbo.ClientDetails VALUES(5,456,'03/03/2012');
-- Table needs a primary key
ALTER TABLE dbo.ClientDetails
ADD CONSTRAINT [PK dbo.ClientDetails ID]
PRIMARY KEY (ID);
-- One possible optimal index for this specific query
CREATE NONCLUSTERED INDEX [IX dbo.ClientDetails ClientID, InterviewDate]
ON dbo.ClientDetails (ClientID, InterviewDate);
-- Query (returns multiple rows per client if they have the same InterviewDate)
SELECT
cd.ID,
cd.ClientID,
cd.InterviewDate
FROM dbo.ClientDetails AS cd
WHERE
cd.InterviewDate =
(
SELECT MAX(cd2.InterviewDate)
FROM dbo.ClientDetails AS cd2
WHERE
cd2.ClientID = cd.ClientID
);
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 1, 2012 at 6:19 am
this can be achived using corelated sub query
select * from ClientDetails a where InterviewDate =
(select MAX(InterviewDate) from ClientDetails b where a.ClientID=b.ClientID)
order by 1 asc
May 1, 2012 at 7:58 am
to those who are saying its a cross post look at the DDL and Sample data and output. clearly a different problem just that the OP used the same title. To the OP forum etiquette says to use distinct titles so we can be clear on what is going on. Here is my code to solve the problem:
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY ClientID ORDER BY InterviewDate DESC) Row,* FROM ClientDetails
)
SELECT ID, ClientID, InterviewDate
FROM CTE AS c
WHERE Row = 1
and now a comparison of several different methods. If the OP only needs the ClientID and InterviewDate a simple MAX() query would be the best. Here is the output of my tests:
-----======Coralated sub query======-------
SQL Server Execution Times:
CPU time = 967 ms, elapsed time = 979 ms.
-----======Common Table Expression - MAX(ID) Join======-------
SQL Server Execution Times:
CPU time = 858 ms, elapsed time = 882 ms.
-----======Common Table Expression - MAX(InterviewDate) Join======-------
SQL Server Execution Times:
CPU time = 967 ms, elapsed time = 980 ms.
-----======Common Table Expression RowNumber======-------
SQL Server Execution Times:
CPU time = 1638 ms, elapsed time = 1834 ms.
-----======Simple Max Query======-------
SQL Server Execution Times:
CPU time = 531 ms, elapsed time = 531 ms.
Here is the code i used:
/*
CREATE TABLE ClientDetails
(ID INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
ClientID INT,
InterviewDate DATETIME)
CREATE CLUSTERED INDEX Clus_ClientDetails ON ClientDetails(InterviewDate)
DECLARE @beginDate DATETIME, @endDate DATETIME
SET @beginDate = '1/1/2000'
SET @endDate = DATEADD(DD,0,DATEDIFF(DD,0,GETDATE()))
INSERT INTO ClientDetails (InterviewDate,ClientID)
SELECT TOP 1000000 DATEADD(d, CAST(-1 * DATEDIFF(d, @beginDate, @endDate)
* RAND(CHECKSUM(NEWID())) AS INT), @endDate), ABS(CAST(NEWID() AS BINARY(6)) % 100000) + 1
FROM Tally a, Tally b
*/
DECLARE @Shutup INT -- Make it so i dont have up to 3 million rows of data
DECLARE @Shutup2 DATETIME
PRINT '-----======Coralated sub query======-------'
SET STATISTICS TIME ON
SELECT @Shutup = a.ClientID FROM ClientDetails a WHERE InterviewDate =
(SELECT MAX(InterviewDate) FROM ClientDetails b WHERE a.ClientID=b.ClientID)
SET STATISTICS TIME OFF
PRINT '-----======Common Table Expression - MAX(ID) Join======-------'
SET STATISTICS TIME ON;
WITH CTE AS
(
SELECT ClientID, max(ID) AS ID FROM ClientDetails GROUP BY ClientID
)
SELECT @Shutup = d.ClientID
FROM CTE AS c
INNER JOIN ClientDetails AS d ON d.ID = c.ID
SET STATISTICS TIME OFF;
PRINT '-----======Common Table Expression - MAX(InterviewDate) Join======-------'
SET STATISTICS TIME ON;
WITH CTE AS
(
SELECT ClientID, max(InterviewDate) AS Interview FROM ClientDetails GROUP BY ClientID
)
SELECT @Shutup = d.ClientID
FROM CTE AS c
INNER JOIN ClientDetails AS d ON d.ClientID = c.ClientID AND c.Interview = d.InterviewDate
SET STATISTICS TIME OFF;
PRINT '-----======Common Table Expression RowNumber======-------'
SET STATISTICS TIME ON;
WITH CTE AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY ClientID ORDER BY InterviewDate DESC) Row,* FROM ClientDetails
)
SELECT @Shutup = c.ClientID
FROM CTE AS c
WHERE Row = 1
SET STATISTICS TIME OFF;
PRINT '-----======Simple Max Query======-------'
SET STATISTICS TIME ON;
SELECT @Shutup = ClientID, @Shutup2 = MAX(InterviewDate) FROM ClientDetails GROUP BY ClientID
SET STATISTICS TIME OFF;
EDIT: forgot to add an index to my table once added went a lot quicker, New code posted with teh execution times.
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
May 1, 2012 at 8:40 am
imex (4/30/2012)
Try:
with CTE as
(
select ClientID, max(ID) as ID from ClientDetails group by ClientID
)
select d.ID, d.ClientID, d.InterviewDate
from CTE as c
join ClientDetails as d on d.ID = c.ID
Hope this helps.
the problem with this though is if the interview dates are ever out of order (3,123,'2012-04-30'),(4,123,'2012-04-20') it fails to get the latest interview, just the last interview entered.
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
May 1, 2012 at 5:16 pm
CELKO (5/1/2012)
CREATE TABLE Client_Interviews(client_id INTEGER NOT NULL
REFERENCES Clients(client_id),
interview_date DATE NOT NULL,
PRIMARY KEY (client_id, interview_date ));
That assumes client_id and interview_date is a key - what if clients can have more than one interview on the same day?
CELKO (5/1/2012)
SELECT client_id, MAX(interview_date) AS last_interview_dateFROM Client_Interviews
GROUP BY client_id;
That doesn't return the whole record (row).
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 1, 2012 at 11:29 pm
May 2, 2012 at 8:32 am
vinu512 (5/1/2012)
This does:
SSelect * From ClientDetails
Where InterviewDate IN (Select MAX(InterviewDate) From ClientDetails Group By ClientID)
the problem with that query is the IN list contains the max interview date for each client. however if Client A had interviews on '2012-04-20' and '2012-04-30' and Client B Had interviews on '2012-04-01' and '2012-04-20' you would get '2012-04-20' and '2012-04-30' in your IN list giving the correct result for Client B but both interviews for Client A. you need to have the Client ID in the sub query like so (reposted from above):
SELECT
cd.ID,
cd.ClientID,
cd.InterviewDate
FROM dbo.ClientDetails AS cd
WHERE
cd.InterviewDate =
(
SELECT MAX(cd2.InterviewDate)
FROM dbo.ClientDetails AS cd2
WHERE
cd2.ClientID = cd.ClientID
);
For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]
Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
Jeff Moden's Cross tab and Pivots Part 1[/url]
Jeff Moden's Cross tab and Pivots Part 2[/url]
May 2, 2012 at 8:48 am
The easy way to do this with SQL 2008 (as per the forum this is posted in), is to Cross/Outer Apply from your Clients table to this table.
SELECT *
FROM dbo.Clients
OUTER APPLY
(SELECT TOP 1 *
FROM dbo.ClientDetails
WHERE ClientDetails.ClientID = Clients.ClientID
ORDER BY InterviewDate DESC) AS LastInterview;
It would look something like that. Usually gets a very efficient execution plan.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
May 2, 2012 at 9:05 am
GSquared (5/2/2012)
It would look something like that. Usually gets a very efficient execution plan.
Client ID would need to be a key to avoid duplicates there.
You could also add WITH TIES to the TOP to give the same semantic as the query I posted.
Strictly, the parameter to TOP should be enclosed in parentheses.
APPLY and the my Segment Top plan generally run each other pretty close for performance with proper indexing. I do like just touching the table once, however, so aesthetically at least I tend to prefer the latter. One could also argue that the single scan for Segment Top is more stable than the APPLY's nested loops as the number of groups varies.
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
May 2, 2012 at 9:19 am
SQL Kiwi (5/2/2012)
GSquared (5/2/2012)
It would look something like that. Usually gets a very efficient execution plan.Client ID would need to be a key to avoid duplicates there.
You could also add WITH TIES to the TOP to give the same semantic as the query I posted.
Strictly, the parameter to TOP should be enclosed in parentheses.
APPLY and the my Segment Top plan generally run each other pretty close for performance with proper indexing. I do like just touching the table once, however, so aesthetically at least I tend to prefer the latter. One could also argue that the single scan for Segment Top is more stable than the APPLY's nested loops as the number of groups varies.
On ClientID being a key, that's why I suggested using a hypothetical Clients table as the foundation for the query. Not sure it exists, but it usually does in most databases.
With Ties would definitely help if there are multiple interviews with the same timestamp. Or, if only one is wanted in those cases, there might be another column that could be used as a tie-breaker.
The parentheses on Top are only needed outside of Select statements, but whatever is wanted on the standards for that. That's style, not function, so whatever you like on that.
The Segment Top bit also applies mainly if there's no outer table to use as a foundation for the query. I'm assuming there is one, as per my post. If there isn't, then there are definitely better solutions to this whole thing. But there is also a need to actually normalize the database. Performance-tuning can often work around poor normalization, but you can also make a car get better gas mileage by limiting your driving to days when you will have a strong tail-wind. Neither is a good way to limit yourself.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 15 posts - 1 through 15 (of 19 total)
You must be logged in to reply to this topic. Login to reply