July 6, 2017 at 7:07 pm
Hi
I am not sure how you would pull this off in SQL, but I was hoping someone could help me out.
I have the data below and would like to calculate the following:
When the SN_Type is Re-Activation, what is the probability of it going past 0 completion days, based on the number of times it got 'REJE' (Rejected) and 'N-CO' (Not Complete), and also on how long the 'REJE' and 'N-CO' took to occur?
Not really my area, but it would be nice to learn something new.
I would like to create an extra column for each Identifier telling me the probability of it going past 0 days.
Thanks,
Danii
CREATE TABLE #temp
(
ServiceNumber varchar(20) NOT NULL
,Identifier varchar(20) NOT NULL
,CreatedDate DATETIME NOT NULL
,CompletedDate DATETIME NOT NULL
,SN_Type varchar(20) NOT NULL
,SN_Status varchar(20) NOT NULL
);
INSERT INTO #temp (ServiceNumber, Identifier, CreatedDate, CompletedDate, SN_Type, SN_Status)
VALUES ('318403853','61030203647','20170512','20170518','De-Activated','COMP')
      ,('318616723','61030203647','20170613','20170613','Re-Activattion','N-CO')
      ,('318637013','61030203647','20170615','20170615','Re-Activattion','REJE')
      ,('318639191','61030203647','20170615','20170616','Re-Activattion','COMP')
      ,('318637791','61030203666','20170615','20170616','Re-Activattion','Rejected')
      ,('318637792','61030203666','20170614','20170615','Re-Activattion','N-CO')
      ,('000318634115','64074558782','20170615','20170615','Re-Activattion','N-CO')
      ,('000318636669','64074558782','20170615','20170615','Re-Activattion','COMP')
      ,('000318636873','64074558782','20170615','20170614','Re-Activattion','REJE')
      ,('000318623572','64074558782','20170614','20170614','Re-Activattion','N-CO')
      ,('000318627678','64074558782','20170614','20170614','Re-Activattion','N-CO')
      ,('000318614132','64074558782','20170613','20170613','Re-Activattion','N-CO')
      ,('00031861999','64074558799','20170613','20170613','Re-Activattion','N-CO')
      ,('00031861991','64074558791','20170613','20170613','De-Activated','N-CO')
      ,('01031861999','64074558792','20160729','20160805','Re-Activattion','N-CO')
      ,('02031861999','64074558792','20160810','20160810','Re-Activattion','REJE')
      ,('03031861999','64074558792','20160811','20160811','Re-Activattion','COMP')
;
SELECT  Identifier
      , CASE
            WHEN SUM(CASE WHEN SN_Status = 'COMP' AND SN_Type = 'Re-Activattion' THEN 1 ELSE 0 END) > 0
            THEN STR(DATEDIFF(day
                    , MIN(CASE WHEN SN_Type = 'Re-Activattion' THEN CreatedDate END)
                    , MAX(CASE WHEN SN_Type = 'Re-Activattion' AND SN_Status = 'COMP' THEN CompletedDate END)
                    ))
            WHEN SUM(CASE WHEN SN_Type = 'Re-Activattion' THEN 1 ELSE 0 END) > 0
            THEN 'NOT COMP'
            ELSE 'NO RE-ACT'
        END AS RE_ACT_COMPLETION_TIME
      , SUM(CASE WHEN SN_Status = 'SENT' AND SN_Type = 'Re-Activattion' THEN 1 ELSE 0 END) AS [RE-AN SENT #]
      , SUM(CASE WHEN SN_Status = 'ACCE' AND SN_Type = 'Re-Activattion' THEN 1 ELSE 0 END) AS [RE-AN ACCE #]
      , SUM(CASE WHEN SN_Status = 'N-CO' AND SN_Type = 'Re-Activattion' THEN 1 ELSE 0 END) AS [RE-AN N-CO #]
      , SUM(CASE WHEN SN_Status = 'REJE' AND SN_Type = 'Re-Activattion' THEN 1 ELSE 0 END) AS [RE-AN REJE #]
FROM #temp
-- WHERE Identifier = '64074558792'
GROUP BY Identifier
;
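One way to sketch an empirical estimate against the sample data above is to divide the number of delayed re-activation orders by the total number of re-activation orders per Identifier. This assumes "going past 0 days" means DATEDIFF(day, CreatedDate, CompletedDate) > 0, and the column name RE_ACT_DELAY_PROB is invented here for illustration. Note this is only a relative frequency observed in the sample, not a true probability.

```sql
-- Sketch: per-identifier frequency of a re-activation taking more than 0 days.
SELECT  Identifier
      , COUNT(*)                                          AS ReActOrders
      , SUM(CASE WHEN DATEDIFF(day, CreatedDate, CompletedDate) > 0
                 THEN 1 ELSE 0 END)                       AS DelayedOrders
      , CAST(SUM(CASE WHEN DATEDIFF(day, CreatedDate, CompletedDate) > 0
                      THEN 1 ELSE 0 END) AS decimal(9,4))
        / COUNT(*)                                        AS RE_ACT_DELAY_PROB
FROM #temp
WHERE SN_Type = 'Re-Activattion'   -- spelling as stored in the sample data
GROUP BY Identifier;
```

With only a handful of rows per identifier, this ratio is a description of history rather than a prediction, for the reasons the replies below go into.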
July 10, 2017 at 8:33 am
Probability is rather dependent on how you choose to perceive it. On small data samples, the likelihood of that probability being accurate is low. You usually need to measure something called "statistical significance" to ensure that your sample size isn't too small to be worth measuring. If you have data over a longer period of time (usually several years to a decade or more), you might then have some level of predictability.
The problem isn't so much the computation of probability, but understanding the causes of variation in your data. Human error is so often the cause of variation that, regardless of the nature of the data, it can result in unreliable predictions. If you had a cause field with possible values that include human error, you could at least throw that part of the data away and start looking at the numbers from there.
As to getting help with computing probability, you're going to need to explain the nature of your data and exactly what it represents before it makes sense to start computing values.
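To put a number on how wide the uncertainty is on a small sample, here is a sketch assuming a simple binomial model (each re-activation either goes past 0 days or it doesn't) using a Wilson score interval. The counts (3 delayed out of 5) are made up for illustration.

```sql
-- Hypothetical counts: 3 delayed re-activations out of 5 observed.
DECLARE @x float = 3.0, @n float = 5.0, @z float = 1.96;  -- z = 1.96 for a 95% level
DECLARE @p float = @x / @n;

-- Wilson score interval for a binomial proportion
SELECT @p AS PointEstimate
     , (@p + @z*@z/(2*@n) - @z * SQRT(@p*(1-@p)/@n + @z*@z/(4*@n*@n)))
       / (1 + @z*@z/@n)                                    AS WilsonLower
     , (@p + @z*@z/(2*@n) + @z * SQRT(@p*(1-@p)/@n + @z*@z/(4*@n*@n)))
       / (1 + @z*@z/@n)                                    AS WilsonUpper;
```

With five observations the interval runs from roughly 0.23 to 0.88, which illustrates Steve's point: the point estimate of 0.6 on its own tells you very little.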
Steve (aka sgmunson)
Rent Servers for Income (picks and shovels strategy)
July 17, 2017 at 9:33 am
Jacob Wilkins - Monday, July 17, 2017 7:15 AM
Also, just as a note, please do not post the same topic multiple times. You've posted several versions of this question under sub-forums for different versions of SQL Server.
Posting a brand new topic means any information from the older posts is gone, including very helpful things like answers you've given to other members' questions. At a minimum it clutters up the forums.
On that note, what version of SQL Server are you actually using?
Cheers!
The same question, or variants thereof, has been posted 8 times. In some cases the OP has started a new thread with responders' questions left unanswered. Very inefficient, very frustrating for responders.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
July 17, 2017 at 3:38 pm
ChrisM@Work - Monday, July 17, 2017 9:33 AM
Jacob Wilkins - Monday, July 17, 2017 7:15 AM
Also, just as a note, please do not post the same topic multiple times. You've posted several versions of this question under sub-forums for different versions of SQL Server.
Posting a brand new topic means any information from the older posts is gone, including very helpful things like answers you've given to other members' questions. At a minimum it clutters up the forums.
On that note, what version of SQL Server are you actually using?
Cheers!
The same question, or variants thereof, has been posted 8 times. In some cases the OP has started a new thread with responders' questions left unanswered. Very inefficient, very frustrating for responders.
BWAAAA-HAAA!!!! You're "probably" right there, Chris.
--Jeff Moden
Change is inevitable... Change for the better is not.