retrieving duplicate consecutive records.

  • Hi all

    I have a table in which I save a record each time when a user checks in and checks out the office through a time in/ time out device(this device takes thumb impression of user, thus inserts record)

    table definition is as follows:

    checkInOutID (PK,int,not null)

    employeeID (FK, int, not null)

    isCheckIn (bit, not null)

    checkInOutTime (datetime, not null)

    where ischeckIn is "true" if user checks in and "false" if user checks out.

    if there are alternative checkin/check out records in the table then it means that there's no suspect entry of user. but if there are any multiple consecutive records of check in or check out, then all those entries will be considered to be suspicious.

    PLEASE HELP ME IN MAKING A QUERY OR A STORED PROCEDURE TO RETRIEVE ALL THOSE SUSPECT ENTRIES.

    I will be extremely grateful!!

    Regards,

    Nazish

  • ok, I'm assuming that by consecutive, you mean that the PK is an identity field, and that if you self join the table on itself by way of PK = PK -1, you should find identical results:

    declare @CheckinTable table(

    checkInOutID int identity(1,1) not null primary key,

    employeeID int not null, --no FK reference in this example

    isCheckIn bit not null,

    checkInOutTime datetime not null )

    insert into @CheckinTable(employeeID,isCheckIn,checkInOutTime)

    SELECT 1,1,getdate()

    insert into @CheckinTable(employeeID,isCheckIn,checkInOutTime)

    SELECT 1,0,getdate()

    insert into @CheckinTable(employeeID,isCheckIn,checkInOutTime)

    SELECT 2,1,getdate()

    insert into @CheckinTable(employeeID,isCheckIn,checkInOutTime)

    SELECT 2,1,getdate()

    insert into @CheckinTable(employeeID,isCheckIn,checkInOutTime)

    SELECT 3,1,getdate()

    insert into @CheckinTable(employeeID,isCheckIn,checkInOutTime)

    SELECT 3,0,getdate()

    SELECT * FROM @CheckinTable

    SELECT EARLIER.*,LATER.*

    from @CheckinTable EARLIER

    INNER JOIN @CheckinTable LATER ON EARLIER.checkInOutID = LATER.checkInOutID -1

    AND EARLIER.employeeID = LATER.employeeID

    AND EARLIER.isCheckIn = LATER.isCheckIn

    Results:

    checkInOutID employeeID isCheckIn checkInOutTime checkInOutID employeeID isCheckIn checkInOutTime

    ------------ ----------- --------- ----------------------- ------------ ----------- --------- -----------------------

    3 2 1 2008-02-25 09:50:12.930 4 2 1 2008-02-25 09:50:12.930

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Thanks alot Lowell!! but what if there are more than two consecutive duplicate records????an employee may checkin three,four,five...records on the same time. Should I use a loop here? Note that I have to retrieve all those suspect entries for single employee.

    N.B.EmployeeId is, of course, a foreign key(PK of Employee table).

  • nazish,

    It would be a big help if you provided some sample data according to the following article... help us help you... thanks...

    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • --===== Create the employee info table with

    CREATE TABLE EMPLOYEE_INFO

    (

    employeeID INT IDENTITY(1,1) PRIMARY KEY,

    employeeName VARCHAR(100)

    )

    --===== Create the checkin/out table with

    CREATE TABLE EMPLOYEE_CHECKINOUT

    (

    checkInOutID INT IDENTITY(1,1) PRIMARY KEY,

    employeeID INT FOREIGN KEY(FROM EMPLOYEE_INFO TABLE),

    isCheckInOut BIT,

    timeInOut DATETIME

    )

    --===== Insert the test data into the EMPLOYEE_INFO table

    INSERT INTO EMPLOYEE_INFO

    (

    employeeName

    )

    SELECT 'Nazish' UNION ALL

    SELECT 'Bob' UNION ALL

    SELECT 'Jill'

    --===== Insert the test data into the EMPLOYEE_CHECKINOUT table

    INSERT INTO EMPLOYEE_CHECKINOUT

    (

    employeeID,

    isCheckInOut,

    timeInOut

    )

    SELECT 1,1,gettime() UNION ALL

    SELECT 1,0,gettime() UNION ALL

    SELECT 1,1,gettime() UNION ALL --suspicious entry

    SELECT 1,1,gettime() UNION ALL --suspicious entry

    SELECT 1,1,gettime() UNION ALL --suspicious entry

    SELECT 1,0,gettime() UNION ALL

    SELECT 2,1,gettime() UNION ALL

    SELECT 2,0,gettime() UNION ALL --suspicious entry

    SELECT 2,0,gettime() UNION ALL --suspicious entry

    SELECT 2,1 gettime() UNION ALL

    SELECT 3,1,gettime() UNION ALL

    SELECT 3,0,gettime()

    I have inserted some suspicious entries there intensionally. Now how to retrieve all suspicious entries for each employee????

    Waiting for response.

  • my example still finds all suspected records...try it.

    here's your statements cleaned up so they work:

    the idea is joining the table on itself in order to do the comparison

    --===== Create the employee info table with

    CREATE TABLE EMPLOYEE_INFO

    (

    employeeID INT IDENTITY(1,1) PRIMARY KEY,

    employeeName VARCHAR(100)

    )

    --===== Create the checkin/out table with

    CREATE TABLE EMPLOYEE_CHECKINOUT

    (

    checkInOutID INT IDENTITY(1,1) PRIMARY KEY,

    employeeID INT FOREIGN KEY REFERENCES EMPLOYEE_INFO(employeeID),

    isCheckInOut BIT,

    timeInOut DATETIME

    )

    --===== Insert the test data into the EMPLOYEE_INFO table

    INSERT INTO EMPLOYEE_INFO

    (

    employeeName

    )

    SELECT 'Nazish' UNION ALL

    SELECT 'Bob' UNION ALL

    SELECT 'Jill'

    --===== Insert the test data into the EMPLOYEE_CHECKINOUT table

    INSERT INTO EMPLOYEE_CHECKINOUT

    (

    employeeID,

    isCheckInOut,

    timeInOut

    )

    SELECT 1,1,getdate() UNION ALL

    SELECT 1,0,getdate() UNION ALL

    SELECT 1,1,getdate() UNION ALL --suspicious entry

    SELECT 1,1,getdate() UNION ALL --suspicious entry

    SELECT 1,1,getdate() UNION ALL --suspicious entry

    SELECT 1,0,getdate() UNION ALL

    SELECT 2,1,getdate() UNION ALL

    SELECT 2,0,getdate() UNION ALL --suspicious entry

    SELECT 2,0,getdate() UNION ALL --suspicious entry

    SELECT 2,1, getdate() UNION ALL

    SELECT 3,1,getdate() UNION ALL

    SELECT 3,0,getdate()

    SELECT EARLIER.*,LATER.*

    from EMPLOYEE_CHECKINOUT EARLIER

    INNER JOIN EMPLOYEE_CHECKINOUT LATER ON EARLIER.checkInOutID = LATER.checkInOutID -1

    AND EARLIER.employeeID = LATER.employeeID

    AND EARLIER.isCheckInOut = LATER.isCheckInOut

    checkInOutID employeeID isCheckInOut timeInOut checkInOutID employeeID isCheckInOut timeInOut

    ------------ ----------- ------------ ----------------------- ------------ ----------- ------------ -----------------------

    3 1 1 2008-02-26 07:07:31.200 4 1 1 2008-02-26 07:07:31.200

    4 1 1 2008-02-26 07:07:31.200 5 1 1 2008-02-26 07:07:31.200

    8 2 0 2008-02-26 07:07:31.200 9 2 0 2008-02-26 07:07:31.200

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • You are absolutely right Lowell but what if I insert data like this:

    --===== Insert the test data into the EMPLOYEE_CHECKINOUT table

    INSERT INTO EMPLOYEE_CHECKINOUT

    (

    employeeID,

    isCheckInOut,

    timeInOut

    )

    SELECT 1,1,getdate() UNION ALL

    SELECT 1,0,getdate() UNION ALL

    SELECT 1,1,getdate() UNION ALL --suspicious entry

    SELECT 2,1,getdate() UNION ALL

    SELECT 1,1,getdate() UNION ALL --suspicious entry

    SELECT 1,1,getdate() UNION ALL --suspicious entry

    SELECT 2,0,getdate() UNION ALL --suspicious entry

    SELECT 1,0,getdate() UNION ALL

    SELECT 2,0,getdate() UNION ALL --suspicious entry

    SELECT 2,1, getdate() UNION ALL

    SELECT 3,1,getdate() UNION ALL

    SELECT 3,0,getdate()

    employee2 can check in before the checkout of employee1.

    in that scenario, will this condition work?

    "EARLIER.check_sk_seq = LATER.check_sk_seq -1 "

    waiting for ur response.

    Regards,

    Nazish

  • that goes back to my original assumption...you said consecutive, and i stated my solution would work If you meant consecutive by the PK of the table...now you've changed the rules....

    now you are talking consecutive login/outs by user....that will involve a tally table i think.

    I'll look at it when i get a chance, maybe someone else knows the solution offhand.

    correct me if i'm wrong, but an item is suspect if any employee has two isCheckInOut of the same value in a row, right?

    so two , with value 1 in isCheckInOut are suspect.... but what if there are two rwos with the value 0, they are suspect as well, right?

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Exactly!!!

    two or more than two consecutive 1's or 0's make those records suspicious.

    A table containing alternatives 0 and 1 records will considered to be the perfect table, free from all suspects.

    Waiting restlessly for reply from any member.

    Regards,

    Nazish

  • Try this. Probably not very efficient on large data sets though

    WITH CTE AS

    (SELECT checkInOutID,

    employeeID,

    isCheckInOut,

    timeInOut,

    ROW_NUMBER() OVER(PARTITION BY employeeID ORDER BY checkInOutID) -

    ROW_NUMBER() OVER(PARTITION BY employeeID,isCheckInOut ORDER BY checkInOutID) AS grp

    FROM EMPLOYEE_CHECKINOUT),

    CTE2 AS

    (SELECT checkInOutID,

    employeeID,

    isCheckInOut,

    timeInOut,

    COUNT(*) OVER(PARTITION BY employeeID,isCheckInOut,grp) AS grpcnt

    FROM CTE)

    SELECT checkInOutID,

    employeeID,

    isCheckInOut,

    timeInOut

    FROM CTE2

    WHERE grpcnt>1

    ORDER BY checkInOutID

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • Hats off Marks!!!!

    i posted that pblm on all popular forums but u r the only one who solved my pblm 100%, although that query is very complicated for me to understand.

    Thanks alot of u!

    Regards,

    Nazish

  • I changed the test data a bit, My understanding could be wrong here.

    SELECT 1,1,getdate() UNION ALL --suspicious entry

    SELECT 1,1,getdate() UNION ALL --suspicious entry

    SELECT 2,1,getdate() UNION ALL -- Employee 2 Login

    SELECT 1,1,getdate() UNION ALL -- Employee 1 Login

    SELECT 2,0,getdate() UNION ALL -- Employee 2 Logout

    SELECT 2,0,getdate() UNION ALL --suspicious entry

    SELECT 2,1, getdate() UNION ALL --suspicious entry

    SELECT 3,1,getdate() UNION ALL-- Employee 3 Login

    SELECT 1,0,getdate() UNION ALL-- Employee 1 Logout

    SELECT 1,0,getdate() UNION ALL--suspicious entry

    SELECT 1,0,getdate() UNION ALL--suspicious entry

    SELECT 3,0,getdate() -- Employee 3 Logout

    With CheckIn AS

    (select CheckinoutID,employeeid,ischeckinout,

    row_number() over( partition by employeeid order by checkinoutid) CID

    from EMPLOYEE_CHECKINOUT),

    CheckOut As

    (select CheckinoutID,employeeid,case ischeckinout when 0 then 1

    else 100

    end ischeckinout,

    row_number() over( partition by employeeid order by checkinoutid) CID

    from EMPLOYEE_CHECKINOUT),

    Final As

    (Select CheckIn.CheckinoutID INCheckinoutID,CheckOut.CheckinoutID OutCheckinoutID

    From CheckIn,CheckOut

    where Checkin.CID + 1 = CheckOut.CID

    and checkin.employeeid=checkout.employeeid

    and CheckIn.IsCheckInout=CheckOut.ISCheckinOut)

    Select CheckinoutID From EMPLOYEE_CHECKINOUT

    Where not exists (Select INCheckinoutID from Final where INCheckinoutID=CheckinoutID)

    and not exists (Select OutCheckinoutID from Final where OutCheckinoutID=CheckinoutID)

    Again this query doesn't suit for tables with tons of data.

    --------------------------------------------------------------------------------------
    Save our world, its all we have! A must watch video Pale Blue Dot[/url]

  • Not to trash on your code, xpscodes, but I don't think it works completely like the original poster wants (or ultimately needs, if I understand the goals correctly). But I could be wrong, please let me know.

    One of the issues I saw above is that you're finding the employee with an in-out issue, but not the location of the issue. Because you're using a rowid field, if one record is out of sequence, I think all records after it fail because there's no way for the employeeID to "reset" once it finds a checkin again. [if I follow what you did correctly]

    The other thing I'm thinking is that, although in the example data the times are all identical and the work is all done off of the PK values, if the table is indeed getting INSERTs from the application with the timestamp (instead of generating it itself with a default value or something), it's probably safer to use the timestamp for ordering rather than the PK value, in case rows get inserted out of order for any reason.

    I'm following this up with a new post with code that may work better for nazish.kanwal...

  • Note that I wrote this still using your PK values as the series identifier, but if the timestamp values is better (see my previous comment), you should change the code accordingly:

    Assuming:

    create table #employee_checkinout (

    checkInOutID int identity(1,1) not null primary key,

    employeeID int not null, --no FK reference in this example

    isCheckIn bit not null,

    checkInOutTime datetime not null

    )

    If you want to find users whose first entry was a check-out:

    --look for all employees that checked out before they ever checked in:

    --all employees who have checked in subsequently at least once:

    select A.*

    from (

    select *

    from #employee_checkinout

    where isCheckIn = 0

    ) A inner join (

    select

    employeeID,

    min(checkInOutID) as firstCheckIn

    from #employee_checkinout

    where isCheckIn = 1

    group by employeeID

    ) B

    on A.employeeID = B.employeeID

    and A.checkInOutID < B.firstCheckIn

    union all

    --all who haven't:

    select *

    from #employee_checkinout

    where employeeID in (

    select employeeID

    from #employee_checkinout

    group by employeeID

    having max(cast(isCheckIn as tinyint)) = 0

    )

    Looking for problem cases:

    with CheckIn_NextEmployeeEntry as (

    --for every check-in, look at most recent record after it:

    select A.employeeID, A.checkInOutID, min(B.checkInOutID) as nextcheckInOutID

    from (

    --all check-ins:

    select *

    from #employee_checkinout

    where isCheckIn = 1

    ) A left outer join (

    --all check-outs:

    select *

    from #employee_checkinout

    ) B

    on A.employeeID = B.employeeID

    where A.checkInOutID < B.checkInOutID

    group by A.employeeID, A.checkInOutID

    ),

    --bad check-ins:

    CheckIn_BadRecs as (

    --if most recent for this check-in is also a check-in, show result:

    select C.*

    from CheckIn_NextEmployeeEntry C inner join #employee_checkinout D

    on C.nextcheckInOutID = D.checkInOutID

    where D.isCheckIn = 1

    ),

    --bad check-outs:

    CheckOut_BadRecs as (

    --if check-out and not in check-in to check-out list, show result:

    select E.*

    from (

    select *

    from #employee_checkinout

    where isCheckIn = 0

    ) E left outer join CheckIn_NextEmployeeEntry F

    on E.checkInOutID = F.nextcheckInOutID

    where F.nextcheckInOutID is null

    )

    select *

    from #employee_checkinout

    where checkInOutID in (

    --these are the check-outs related to the problem check-outs:

    select prevcheckInOutID

    from (

    select

    J.employeeID, J.checkInOutID,

    max(K.checkInOutID) as prevcheckInOutID

    from CheckOut_BadRecs J inner join #employee_checkinout K

    on J.employeeID = K.employeeID

    and J.checkInOutID > K.checkInOutID

    group by J.employeeID, J.checkInOutID

    ) Z

    union

    --these are the problem check-outs:

    select checkInOutID

    from CheckOut_BadRecs

    )

    union all

    select *

    from #employee_checkinout

    where checkInOutID in (

    --these are the problem check-ins:

    select checkInOutID

    from CheckIn_BadRecs

    union

    --these are the check-ins related to the problem check-ins:

    select nextcheckInOutID

    from CheckIn_BadRecs

    )

    Personally, I'd recommend you make the first CTE an actual (temporary) table, which you'd then index. Yes, this is classic triangle joins, but I think in lieu of a relatively tricky aggregation function this is better than many alternatives.

  • Thanks for the review. I thought no one will bother looking at the query as the original poster was satisfied with Mark's solution.

    One of the issues I saw above is that you're finding the employee with an in-out issue, but not the location of the issue. Because you're using a rowid field, if one record is out of sequence, I think all records after it fail because there's no way for the employeeID to "reset" once it finds a checkin again

    If the checkinout id is an identity column, is there a chance that a record could be out of sequence. My understanding here could be wrong. Could you provide a sample. Thanks

    --------------------------------------------------------------------------------------
    Save our world, its all we have! A must watch video Pale Blue Dot[/url]

Viewing 15 posts - 1 through 15 (of 15 total)

You must be logged in to reply to this topic. Login to reply