Query for abnormal differences / find ouliers

  • Hi ,

    I am trying to build a query that will identify 'abnormal' differences in the values column between each successive date record available for each individual account. The table will have multiple accounts and I want to identify the abnormal differences. The difference threshold to be identified is an absolute value of 80.

    thanks in advance ,

    some sample data below.

    create table DataTable (
    AccountIdinteger,
    Values_1 decimal(15),
    Time_1 datetime,
    DesignationTinyint
    );


    insert into DataTable values (1,1,'2022-08-01 00:00:00',1);
    insert into DataTable values (1,81,'2022-08-02 00:00:00',1);
    insert into DataTable values (1,88,'2022-08-03 00:00:00',1);
    insert into DataTable values (1,170,'2022-08-04 00:00:00',1);
    insert into DataTable values (1,172,'2022-08-05 00:00:00',1);
    insert into DataTable values (1,180,'2022-08-06 00:00:00',1);
    insert into DataTable values (1,200,'2022-08-07 00:00:00',1);
    insert into DataTable values (3,85,'2022-08-01 00:00:00',1);
    insert into DataTable values (3,90,'2022-08-02 00:00:00',1);
    insert into DataTable values (3,100,'2022-08-03 00:00:00',1);
    insert into DataTable values (3,115,'2022-08-04 00:00:00',1);
    insert into DataTable values (3,195,'2022-08-05 00:00:00',1);
    insert into DataTable values (3,202,'2022-08-06 00:00:00',1);
    insert into DataTable values (3,215,'2022-08-07 00:00:00',1);
    insert into DataTable values (7,1,'2022-08-01 00:00:00',1);
    insert into DataTable values (7,81,'2022-08-02 00:00:00',1);
    insert into DataTable values (7,88,'2022-08-03 00:00:00',1);
    insert into DataTable values (7,170,'2022-08-04 00:00:00',1);
    insert into DataTable values (7,172,'2022-08-05 00:00:00',1);
    insert into DataTable values (7,180,'2022-08-06 00:00:00',1);
    insert into DataTable values (7,200,'2022-08-07 00:00:00',1);
    insert into DataTable values (13,200,'2022-08-01 00:00:00',1);
    insert into DataTable values (13,12000,'2022-08-02 00:00:00',1);
    insert into DataTable values (13,12079,'2022-08-03 00:00:00',1);
    insert into DataTable values (13,12079,'2022-08-04 00:00:00',1);
    insert into DataTable values (13,12159,'2022-08-05 00:00:00',1);
    insert into DataTable values (13,12163,'2022-08-06 00:00:00',1);
    insert into DataTable values (13,6000,'2022-08-07 00:00:00',1);
    insert into DataTable values (17,5,'2022-08-01 00:00:00',1);
    insert into DataTable values (17,25,'2022-08-02 00:00:00',1);
    insert into DataTable values (17,56,'2022-08-03 00:00:00',1);
    insert into DataTable values (17,136,'2022-08-04 00:00:00',1);
    insert into DataTable values (17,170,'2022-08-05 00:00:00',1);
    insert into DataTable values (17,190,'2022-08-06 00:00:00',1);
    insert into DataTable values (17,275,'2022-08-07 00:00:00',1);

     

     

  • This was removed by the editor as SPAM

  • Try this

    with cte as (
    select AccountId,Values_1,Time_1,Designation,
    lag(Values_1) over(partition by AccountId order by Time_1) as PrevValues_1
    from DataTable
    )
    select AccountId,Values_1,PrevValues_1,Time_1,Designation,abs(Values_1 - PrevValues_1) as Diff
    from cte
    where abs(Values_1 - PrevValues_1) >= 80
    order by AccountId,Time_1;

    ____________________________________________________

    Deja View - The strange feeling that somewhere, sometime you've optimised this query before

    How to get the best help on a forum

    http://www.sqlservercentral.com/articles/Best+Practices/61537
  • This was removed by the editor as SPAM

  • A few years back I had a similar issue of finding abnormal values in a table. I found some articles about how to identify Gaps and Islands a in data.  This is a common problem in Transact-SQL and it was described by Itzik Ben-Gan here . Below I have adapted the solution to find gaps greater than 1000 but you can change that to 81.

    IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL 
    BEGIN
    DROP TABLE #TempTable
    END

    SELECT HT_SerNum AS SerialNumber
    INTO #TempTable
    FROM [dbo].[CT2_Data]
    ORDER BY HT_SerNum;

    WITH StartingPoints AS
    (
    SELECT SerialNumber, ROW_NUMBER() OVER(ORDER BY SerialNumber) AS rownum
    FROM #TempTable AS A
    WHERE NOT EXISTS
    (
    SELECT *
    FROM #TempTable AS B
    WHERE B.SerialNumber = A.SerialNumber - 1
    )
    ),
    EndingPoints AS
    (
    SELECT SerialNumber, ROW_NUMBER() OVER(ORDER BY SerialNumber) AS rownum
    FROM #TempTable AS A
    WHERE NOT EXISTS
    (
    SELECT *
    FROM #TempTable AS B
    WHERE B.SerialNumber = A.SerialNumber + 1
    )
    )
    SELECT S.SerialNumber AS start_range ,E.SerialNumber AS end_range
    ,(CAST(E.SerialNumber AS int)- CAST(S.SerialNumber AS int)) AS Range
    FROM StartingPoints AS S JOIN EndingPoints AS E
    ON E.rownum = S.rownum
    WHERE (CAST(E.SerialNumber AS int)- CAST(S.SerialNumber AS int)) > 1000;

     

    • This reply was modified 2 years, 2 months ago by  wburke 85918.
  • Mark Cowne , wburke 85918

    Really appreciate the help.

    That worked perfectly , exactly what I needed.

    Thank you ,

    • This reply was modified 2 years, 2 months ago by  HeftSteady.
  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply