help with SQL query please

  • I have table with below structure

    Create table ClientDetails

    (ID int,

    ClientID int,

    InterviewDate Datetime)

    Insert into ClientDetails values(1,123,'01/01/2009')

    Insert into ClientDetails values(2,123,'01/01/2010')

    Insert into ClientDetails values(3,123,'04/04/2011')

    Insert into ClientDetails values(4,456,'10/10/2011')

    Insert into ClientDetails values(5,456,'03/03/2012')

    It should return the record with the latest record for each client ID and below should be the result.

    3,123,'04/04/2011' -- This is the latest record for Client ID 123 based on the ID

    5,456,'03/03/2012' -- This is the latest record for Client ID 456 based on the ID

    could someone please help me with the task...Table has around 2 to 3 million records.

  • Please do not cross post, it fragments and duplicates effort

  • Try:

    with CTE as

    (

    select ClientID, max(ID) as ID from ClientDetails group by ClientID

    )

    select d.ID, d.ClientID, d.InterviewDate

    from CTE as c

    join ClientDetails as d on d.ID = c.ID

    Hope this helps.

  • This will work.

    ;With CTE

    As

    (Select *, ROW_NUMBER() Over (Partition By DateOfBirth, SSN Order By ID) As Rownum From Employee)

    Select b.* From CTE as a

    JOIN Employee As b ON a.DateofBirth = b.DateofBirth AND a.SSN = b.SSN

    Where a.Rownum > 1

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • Sorry missed the group by Division part in my last post...here is the edited query:

    ;With CTE

    As

    (Select *, ROW_NUMBER() Over (Partition By DateOfBirth, SSN Order By ID) As Rownum From Employee)

    Select b.Id, b.Division, MAX(b.DateofBirth) As DateofBirth, MAX(b.SSN) As SSN, b.EmployeeCode From CTE as a

    JOIN Employee As b ON a.DateofBirth = b.DateofBirth AND a.SSN = b.SSN

    Where a.Rownum > 1

    Group By b.Division, b.EmployeeCode, b.ID

    Order By b.Division, b.EmployeeCode DESC

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • CREATE TABLE dbo.ClientDetails

    (

    ID integer NOT NULL,

    ClientID integer NOT NULL,

    InterviewDate datetime NOT NULL

    );

    INSERT dbo.ClientDetails VALUES(1,123,'01/01/2009');

    INSERT dbo.ClientDetails VALUES(2,123,'01/01/2010');

    INSERT dbo.ClientDetails VALUES(3,123,'04/04/2011');

    INSERT dbo.ClientDetails VALUES(4,456,'10/10/2011');

    INSERT dbo.ClientDetails VALUES(5,456,'03/03/2012');

    -- Table needs a primary key

    ALTER TABLE dbo.ClientDetails

    ADD CONSTRAINT [PK dbo.ClientDetails ID]

    PRIMARY KEY (ID);

    -- One possible optimal index for this specific query

    CREATE NONCLUSTERED INDEX [IX dbo.ClientDetails ClientID, InterviewDate]

    ON dbo.ClientDetails (ClientID, InterviewDate);

    -- Query (returns multiple rows per client if they have the same InterviewDate)

    SELECT

    cd.ID,

    cd.ClientID,

    cd.InterviewDate

    FROM dbo.ClientDetails AS cd

    WHERE

    cd.InterviewDate =

    (

    SELECT MAX(cd2.InterviewDate)

    FROM dbo.ClientDetails AS cd2

    WHERE

    cd2.ClientID = cd.ClientID

    );

  • this can be achived using corelated sub query

    select * from ClientDetails a where InterviewDate =

    (select MAX(InterviewDate) from ClientDetails b where a.ClientID=b.ClientID)

    order by 1 asc

  • to those who are saying its a cross post look at the DDL and Sample data and output. clearly a different problem just that the OP used the same title. To the OP forum etiquette says to use distinct titles so we can be clear on what is going on. Here is my code to solve the problem:

    WITH CTE AS

    (

    SELECT ROW_NUMBER() OVER (PARTITION BY ClientID ORDER BY InterviewDate DESC) Row,* FROM ClientDetails

    )

    SELECT ID, ClientID, InterviewDate

    FROM CTE AS c

    WHERE Row = 1

    and now a comparison of several different methods. If the OP only needs the ClientID and InterviewDate a simple MAX() query would be the best. Here is the output of my tests:

    -----======Coralated sub query======-------

    SQL Server Execution Times:

    CPU time = 967 ms, elapsed time = 979 ms.

    -----======Common Table Expression - MAX(ID) Join======-------

    SQL Server Execution Times:

    CPU time = 858 ms, elapsed time = 882 ms.

    -----======Common Table Expression - MAX(InterviewDate) Join======-------

    SQL Server Execution Times:

    CPU time = 967 ms, elapsed time = 980 ms.

    -----======Common Table Expression RowNumber======-------

    SQL Server Execution Times:

    CPU time = 1638 ms, elapsed time = 1834 ms.

    -----======Simple Max Query======-------

    SQL Server Execution Times:

    CPU time = 531 ms, elapsed time = 531 ms.

    Here is the code i used:

    /*

    CREATE TABLE ClientDetails

    (ID INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,

    ClientID INT,

    InterviewDate DATETIME)

    CREATE CLUSTERED INDEX Clus_ClientDetails ON ClientDetails(InterviewDate)

    DECLARE @beginDate DATETIME, @endDate DATETIME

    SET @beginDate = '1/1/2000'

    SET @endDate = DATEADD(DD,0,DATEDIFF(DD,0,GETDATE()))

    INSERT INTO ClientDetails (InterviewDate,ClientID)

    SELECT TOP 1000000 DATEADD(d, CAST(-1 * DATEDIFF(d, @beginDate, @endDate)

    * RAND(CHECKSUM(NEWID())) AS INT), @endDate), ABS(CAST(NEWID() AS BINARY(6)) % 100000) + 1

    FROM Tally a, Tally b

    */

    DECLARE @Shutup INT -- Make it so i dont have up to 3 million rows of data

    DECLARE @Shutup2 DATETIME

    PRINT '-----======Coralated sub query======-------'

    SET STATISTICS TIME ON

    SELECT @Shutup = a.ClientID FROM ClientDetails a WHERE InterviewDate =

    (SELECT MAX(InterviewDate) FROM ClientDetails b WHERE a.ClientID=b.ClientID)

    SET STATISTICS TIME OFF

    PRINT '-----======Common Table Expression - MAX(ID) Join======-------'

    SET STATISTICS TIME ON;

    WITH CTE AS

    (

    SELECT ClientID, max(ID) AS ID FROM ClientDetails GROUP BY ClientID

    )

    SELECT @Shutup = d.ClientID

    FROM CTE AS c

    INNER JOIN ClientDetails AS d ON d.ID = c.ID

    SET STATISTICS TIME OFF;

    PRINT '-----======Common Table Expression - MAX(InterviewDate) Join======-------'

    SET STATISTICS TIME ON;

    WITH CTE AS

    (

    SELECT ClientID, max(InterviewDate) AS Interview FROM ClientDetails GROUP BY ClientID

    )

    SELECT @Shutup = d.ClientID

    FROM CTE AS c

    INNER JOIN ClientDetails AS d ON d.ClientID = c.ClientID AND c.Interview = d.InterviewDate

    SET STATISTICS TIME OFF;

    PRINT '-----======Common Table Expression RowNumber======-------'

    SET STATISTICS TIME ON;

    WITH CTE AS

    (

    SELECT ROW_NUMBER() OVER (PARTITION BY ClientID ORDER BY InterviewDate DESC) Row,* FROM ClientDetails

    )

    SELECT @Shutup = c.ClientID

    FROM CTE AS c

    WHERE Row = 1

    SET STATISTICS TIME OFF;

    PRINT '-----======Simple Max Query======-------'

    SET STATISTICS TIME ON;

    SELECT @Shutup = ClientID, @Shutup2 = MAX(InterviewDate) FROM ClientDetails GROUP BY ClientID

    SET STATISTICS TIME OFF;

    EDIT: forgot to add an index to my table once added went a lot quicker, New code posted with teh execution times.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • imex (4/30/2012)


    Try:

    with CTE as

    (

    select ClientID, max(ID) as ID from ClientDetails group by ClientID

    )

    select d.ID, d.ClientID, d.InterviewDate

    from CTE as c

    join ClientDetails as d on d.ID = c.ID

    Hope this helps.

    the problem with this though is if the interview dates are ever out of order (3,123,'2012-04-30'),(4,123,'2012-04-20') it fails to get the latest interview, just the last interview entered.


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • CELKO (5/1/2012)


    CREATE TABLE Client_Interviews

    (client_id INTEGER NOT NULL

    REFERENCES Clients(client_id),

    interview_date DATE NOT NULL,

    PRIMARY KEY (client_id, interview_date ));

    That assumes client_id and interview_date is a key - what if clients can have more than one interview on the same day?

    CELKO (5/1/2012)


    SELECT client_id, MAX(interview_date) AS last_interview_date

    FROM Client_Interviews

    GROUP BY client_id;

    That doesn't return the whole record (row).

  • This does:

    SSelect * From ClientDetails

    Where InterviewDate IN (Select MAX(InterviewDate) From ClientDetails Group By ClientID)

    Vinu Vijayan

    For better and faster solutions please check..."How to post data/code on a forum to get the best help" - Jeff Moden[/url] 😉

  • vinu512 (5/1/2012)


    This does:

    SSelect * From ClientDetails

    Where InterviewDate IN (Select MAX(InterviewDate) From ClientDetails Group By ClientID)

    the problem with that query is the IN list contains the max interview date for each client. however if Client A had interviews on '2012-04-20' and '2012-04-30' and Client B Had interviews on '2012-04-01' and '2012-04-20' you would get '2012-04-20' and '2012-04-30' in your IN list giving the correct result for Client B but both interviews for Client A. you need to have the Client ID in the sub query like so (reposted from above):

    SELECT

    cd.ID,

    cd.ClientID,

    cd.InterviewDate

    FROM dbo.ClientDetails AS cd

    WHERE

    cd.InterviewDate =

    (

    SELECT MAX(cd2.InterviewDate)

    FROM dbo.ClientDetails AS cd2

    WHERE

    cd2.ClientID = cd.ClientID

    );


    For faster help in answering any problems Please read How to post data/code on a forum to get the best help - Jeff Moden[/url] for the best way to ask your question.

    For performance Issues see how we like them posted here: How to Post Performance Problems - Gail Shaw[/url]

    Need to Split some strings? Jeff Moden's DelimitedSplit8K[/url]
    Jeff Moden's Cross tab and Pivots Part 1[/url]
    Jeff Moden's Cross tab and Pivots Part 2[/url]

  • The easy way to do this with SQL 2008 (as per the forum this is posted in), is to Cross/Outer Apply from your Clients table to this table.

    SELECT *

    FROM dbo.Clients

    OUTER APPLY

    (SELECT TOP 1 *

    FROM dbo.ClientDetails

    WHERE ClientDetails.ClientID = Clients.ClientID

    ORDER BY InterviewDate DESC) AS LastInterview;

    It would look something like that. Usually gets a very efficient execution plan.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (5/2/2012)


    It would look something like that. Usually gets a very efficient execution plan.

    Client ID would need to be a key to avoid duplicates there.

    You could also add WITH TIES to the TOP to give the same semantic as the query I posted.

    Strictly, the parameter to TOP should be enclosed in parentheses.

    APPLY and the my Segment Top plan generally run each other pretty close for performance with proper indexing. I do like just touching the table once, however, so aesthetically at least I tend to prefer the latter. One could also argue that the single scan for Segment Top is more stable than the APPLY's nested loops as the number of groups varies.

  • SQL Kiwi (5/2/2012)


    GSquared (5/2/2012)


    It would look something like that. Usually gets a very efficient execution plan.

    Client ID would need to be a key to avoid duplicates there.

    You could also add WITH TIES to the TOP to give the same semantic as the query I posted.

    Strictly, the parameter to TOP should be enclosed in parentheses.

    APPLY and the my Segment Top plan generally run each other pretty close for performance with proper indexing. I do like just touching the table once, however, so aesthetically at least I tend to prefer the latter. One could also argue that the single scan for Segment Top is more stable than the APPLY's nested loops as the number of groups varies.

    On ClientID being a key, that's why I suggested using a hypothetical Clients table as the foundation for the query. Not sure it exists, but it usually does in most databases.

    With Ties would definitely help if there are multiple interviews with the same timestamp. Or, if only one is wanted in those cases, there might be another column that could be used as a tie-breaker.

    The parentheses on Top are only needed outside of Select statements, but whatever is wanted on the standards for that. That's style, not function, so whatever you like on that.

    The Segment Top bit also applies mainly if there's no outer table to use as a foundation for the query. I'm assuming there is one, as per my post. If there isn't, then there are definitely better solutions to this whole thing. But there is also a need to actually normalize the database. Performance-tuning can often work around poor normalization, but you can also make a car get better gas mileage by limiting your driving to days when you will have a strong tail-wind. Neither is a good way to limit yourself.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 1 through 15 (of 19 total)

You must be logged in to reply to this topic. Login to reply