September 29, 2020 at 5:29 am
I have a table in SQL Server like this:
CREATE TABLE #SymptomData (
PatientID INT NOT NULL,
Symptom VARCHAR(20) NOT NULL,
ConsentDate DATE NOT NULL,
Severity TINYINT NOT NULL );
GO
INSERT INTO #SymptomData (PatientID, Symptom, Severity, ConsentDate)
VALUES (1001, 'Diabetes', 3, '1-jan-2020'),(1001, 'Diabetes', 4, '15-jan-2020'),(1002,'Coronary Artery Disease',2,'01-Jan-2019'),(1002,'Coronary Artery Disease',3,'01-Jan-2020');
I'm trying to do return the count per (Symptom, Severity) in Python, and it's giving me fits. In SQL it's stupid simple
SELECT
PatientID,
Symptom,
MAX(Severity)
FROM #Data
GROUP BY
PatientID,
Symptom
Getting the population size is trivial in both... just do a
COUNT(DISTINCT(PatientID)) in T-SQL or
in T-SQL or
population_size = df.INDIVIDUAL_ID.nunique()
but how do I do get both the count and the percent of total in the same query in Python? I can get one or the other, but not both together.
# create a datafram with just IndividualID, LongName, delivery level
df2 = df.loc[:, ["INDIVIDUAL_ID", "LONG_NAME", "DELIVERY_LEVEL", "CONSENT_DATE"]]
# rename the columns (PatientID, Symptom, Grade)
df2.columns = [
'PatientID',
'Symptom',
'Grade',
'ConsentDate'
]
# get the max grade for each combination of (PatientID and symptom)
df3 = df2.groupby(['PatientID','Symptom'], as_index=False).agg({'Grade':max})
# use the above query and do a count, and divide by the population_size
df3.groupby(by=['Symptom','Grade']).agg("count")/population_size
# up to here works... but how do I combine the MaxGrade and the Count?
In T-SQL it's stupid easy... just wrap the inner query in () and query it in the outer query. But how do I do that in Python?
SELECT @PopulationSize = COUNT(DISTINCT(PatientID)) FROM ToxicityData;
SELECT
mgp.Symptom
,mgp.WorstGrade
,Frequency = COUNT(*)
,Relative_Frequency = COUNT(*)/@PopulationSize
FROM
(SELECT PatientID, Symptom, WorstGrade = MAX(Severity)
FROM ToxicityData
GROUP BY PatientID, Symptom) mgp
GROUP BY mgp.Symptom, mgp.WorstGrade;
Sorry for the daft question... just for the life of me can't figure out how to return a Frequency and Relative Frequency (divided by Population Size) in the same data frame.
September 30, 2020 at 6:10 am
Thanks for posting your issue and hopefully someone will answer soon.
This is an automated bump to increase visibility of your question.
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply