August 23, 2018 at 8:04 pm
Hello,
I have a table that contains every month stats, the sample data is as below:
CREATE TABLE dbo.Transactions(
Date1 Varchar(5) Not NULL,
Amount1 FLOAT,
Amount2 FLOAT
)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201801, 1000000, 2000000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201802, 980000 , 1980000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201803, 1500000, 18500000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201804, 1400000, 19400000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201805, 1350000, 19500000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201806, 1443000, 14430000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201807, 1380000, 13800000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201808, 1400000, 14000000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201809, 1490000, 14490000)
I need to compare both Amount1 and Amount2 (in fact there are more stats columns in the real table)
Using Amount1 as example, as you can see, 201803 is an outlier to 201802 because the delta is too much compare to 201802 vs 201801;
We need only compare the previous month because this is run monthly basis; in 201803 we have 201801-201803 data, in 201808 we have 201801-201808 data, and the ourlier check is made to only the previous month, with that being said, 201804 is NOT outlier anymore, however, 201809 is.
Threshold will be defined (e.g. 20% to Amount1 and 10% to Amount2), hard coded in the query is fine.
Thank you very much.
August 24, 2018 at 2:07 am
I think 201809 is not an outlier as percentage deviation is less than 20% for amount1 and less than 10 % for amount2
CREATE TABLE dbo.Transactions(
Date1 Varchar(6) Not NULL,
Amount1 FLOAT,
Amount2 FLOAT
)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201801, 1000000, 2000000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201802, 980000 , 1980000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201803, 1500000, 18500000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201804, 1400000, 19400000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201805, 1350000, 19500000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201806, 1443000, 14430000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201807, 1380000, 13800000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201808, 1400000, 14000000)
INSERT INTO Transactions (Date1, Amount1, Amount2) VALUES (201809, 1490000, 14490000)
; with cte as
(
Select
Date1,
Amount1 as cur_Amount1,
lag(Amount1) over ( order by Date1 asc) as prv_Amount1,
Amount2 as cur_Amount2,
lag(Amount2) over ( order by Date1 asc) as prv_Amount2
from Transactions
)
Select Date1
,CASE WHEN (cur_Amount1-prv_Amount1)*100/prv_Amount1 >=20 THEN 'Amount1 is an outlier' else 'Amount1 is not an outlier' END AS AMOUNT1_OUTLIER
,CASE WHEN (cur_Amount2-prv_Amount2)*100/prv_Amount2 >=10 THEN 'Amount1 is an outlier' else 'Amount1 is not an outlier' END AS AMOUNT2_OUTLIER
from cte
August 24, 2018 at 8:43 am
Here's my attempt in one pass:CREATE TABLE #Transactions (
Date1 date NOT NULL PRIMARY KEY CLUSTERED,
Amount1 int,
Amount2 int
);
INSERT INTO #Transactions (Date1, Amount1, Amount2)
VALUES ('20180101', 1000000, 2000000),
('20180201', 980000 , 1980000),
('20180301', 1500000, 18500000),
('20180401', 1400000, 19400000),
('20180501', 1350000, 19500000),
('20180601', 1443000, 14430000),
('20180701', 1380000, 13800000),
('20180801', 1400000, 14000000),
('20180901', 1490000, 14490000);
SELECT
CURR.Date1,
CURR.Amount1,
CURR.Amount2,
CASE
WHEN ABS(((CURR.Amount1 - LAG(CURR.Amount1, 1) OVER(ORDER BY CURR.Date1)) * 100)/LAG(CURR.Amount1, 1) OVER(ORDER BY CURR.Date1)) >= 20 THEN 1
ELSE 0
END AS Amount1_Is_OutlierFlag,
CASE
WHEN ABS(((CURR.Amount2 - LAG(CURR.Amount2, 1) OVER(ORDER BY CURR.Date1)) * 100)/LAG(CURR.Amount2, 1) OVER(ORDER BY CURR.Date1)) >= 10 THEN 1
ELSE 0
END AS Amount2_Is_OutlierFlag
FROM #Transactions AS CURR
DROP TABLE #Transactions;
Note that the determination of what is an outlier does NOT agree with the original poster's claim of what is an outlier, and also identifies an outlier based on a change in amounts in EITHER direction instead of just for increases. Not sure if that is the intent, so will need the original poster to speak up.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
August 24, 2018 at 4:49 pm
Thank you both very much for the replies, both work!
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply