February 1, 2017 at 11:33 am
Please have a look at my stored procedure and let me know if there's a more efficient way of doing it.
What it's supposed to do is to pull data into a temporary table, and based on that table, calculate: ratios, averages, and standard deviations.
The table looks something like this when completely populated:
All calculations are ConsequenceCategory-centric. In addition, the ratios are also project-centric. That is, a ratio is calculated by the sum of all TotalCredits where PeopleRelated = 1 divided by the sum of all TotalCredits period <-- And this is per project, per Consequence Category. For example, you see the value of 0.1304347826 under Ratio in the first three rows <-- this is calculated from the value of 3.000 under TotalCredit in the third row divided by the sum 3.000 + 19.000 + 1.000 = 23.000. We are only concerned with the first three rows for this calculation because they are the only ones for which the project Id is 8 and the Consequence Category is 'Economic'. For other categories in project 8, we calculate a different ratio, as we do for different projects and their categories.
The ratios are the individual data points that go into the average and the standard deviation. Again, the average and standard deviation are ConsequenceCategory-centric. They are the averages and standard deviations per category across all projects.
The point of efficiency that I'm wondering about is in the looping. This happens in more than one spot in the stored procedure. For each one, I'm essentially populating a table, looping through its rows, doing what I have to do for each row, deleting the current row, and getting the next row.
Is there a better way of doing this?
Here is the stored procedure:
ALTER PROCEDURE [dbo].[GetRelianceOnPersonnelByCategory]
@projectIdList nvarchar(max)
AS
-- Parse the project ID list into a table:
DECLARE @projectIdTable TABLE (ProjectId INT)
INSERT INTO @projectIdTable
SELECT ProjectId
FROM dbo.ConvertCsvProjectList(@projectIdList)
-- Prepare the table to hold our results:
DECLARE @resultTable TABLE (ProjectId INT,
ConsequenceCategory NVARCHAR(MAX),
TotalCredit DECIMAL(8,3),
PeopleRelated BIT,
Ratio DECIMAL(11,10),
Average DECIMAL(11,10),
StandardDeviation DECIMAL(11,10))
-- Table to hold unique consequence categories,
-- to be cycled through when calculating ratios, averages, and standard deviations:
DECLARE @CqCatTable TABLE (ConsequenceCategory NVARCHAR(MAX))
-- Variable to hold current project ID when cycling:
DECLARE @currentId INT
SELECT TOP 1 @currentId = ProjectId FROM @projectIdTable
-- Cycle through projects in ProjectIdTable,
-- insert consequence categories, total credits, and is or is not people related for all safeguards:
WHILE @currentId IS NOT NULL
BEGIN
INSERT INTO @resultTable (ProjectId, ConsequenceCategory, TotalCredit, PeopleRelated)
(
SELECT @currentId,
C.Name AS ConsequenceCategory,
SUM(S.Credit) AS TotalCredit,
SRDM.PeopleRelated AS PeopleRelated
FROM ProjectSafeguard PS
INNER JOIN Safeguard S ON PS.ProjectSafeguardId = S.SafeguardId
INNER JOIN Consequence Cq ON Cq.ConsequenceId = S.ConsequenceId
INNER JOIN Category Cat ON Cat.CategoryId = Cq.CategoryId
INNER JOIN Code C ON C.CodeId = Cat.DataMiningCodeId
INNER JOIN SafeRecCategory SRCat ON S.SafeRecCategoryId = SRCat.SafeRecCategoryId
INNER JOIN SafeRecDataMining SRDM ON SRCat.SafeRecDataMiningId = SRDM.SafeRecDataMiningId
WHERE PS.ProjectId = @currentId
AND SRDM.Code != '-'
GROUP BY SRDM.PeopleRelated,
C.Name
)
-- Get all the consequence categories for the current project,
-- for each one, we will calculate the ratio of total credits on people related safeguards
-- to total credits on all safeguards:
DELETE @CqCatTable
INSERT INTO @CqCatTable (ConsequenceCategory)
SELECT DISTINCT ConsequenceCategory FROM @resultTable WHERE ProjectId = @currentId
DECLARE @currentCqCat NVARCHAR(MAX)
SELECT TOP 1 @currentCqCat = ConsequenceCategory FROM @CqCatTable
DECLARE @SumPeopleRelated DECIMAL (16,10)
DECLARE @SumTotalCredit DECIMAL (16,10)
WHILE @currentCqCat IS NOT NULL
BEGIN
-- Get numerator and denominator for ratio:
SET @SumPeopleRelated = CAST(
(SELECT SUM(TotalCredit)
FROM @resultTable
WHERE PeopleRelated = 1 AND projectId = @currentId AND ConsequenceCategory = @currentCqCat)
AS DECIMAL(16,10))
SET @SumTotalCredit = CAST(
(SELECT SUM(TotalCredit)
FROM @resultTable
WHERE projectId = @currentId AND ConsequenceCategory = @currentCqCat)
AS DECIMAL(16,10))
IF @SumPeopleRelated IS NULL
BEGIN
SET @SumPeopleRelated = 0
END
IF @SumTotalCredit IS NULL OR @SumTotalCredit = 0
BEGIN
SET @SumPeopleRelated = 0
SET @SumTotalCredit = 1 -- ratio will be 0 below anyway
END
-- Calculate the ratio:
UPDATE @resultTable
SET Ratio = @SumPeopleRelated / @SumTotalCredit
WHERE projectId = @currentId AND ConsequenceCategory = @currentCqCat
-- Get next category
DELETE @CqCatTable WHERE ConsequenceCategory = @currentCqCat
SET @currentCqCat = NULL
SELECT TOP 1 @currentCqCat = ConsequenceCategory FROM @CqCatTable
END
-- Get next project:
DELETE FROM @projectIdTable WHERE ProjectId = @currentId
SET @currentId = NULL
SELECT TOP 1 @currentId = ProjectId FROM @projectIdTable
END
-- Prepare a table for holding project ID, consequence category, and ratio as unique records,
-- will be used to calculate average and standard deviation:
DECLARE @ProjIdCqCatRatioTable TABLE (ProjId INT, ConsequenceCategory NVARCHAR(MAX), Ratio DECIMAL(13,10))
INSERT INTO @ProjIdCqCatRatioTable
SELECT DISTINCT ProjectId, ConsequenceCategory, Ratio FROM @resultTable
-- Refill the consequence category table
DELETE @CqCatTable
INSERT INTO @CqCatTable (ConsequenceCategory)
SELECT DISTINCT ConsequenceCategory FROM @resultTable
DECLARE @average DECIMAL (11,10)
DECLARE @StdDev DECIMAL (11,10)
SELECT TOP 1 @currentCqCat = ConsequenceCategory FROM @CqCatTable
-- cycle through each category,
-- for each one, calculate average and standard deviation of ratios across all projects
WHILE @currentCqCat IS NOT NULL
BEGIN
SELECT @average = AVG(Ratio) FROM @ProjIdCqCatRatioTable WHERE ConsequenceCategory = @currentCqCat
SELECT @StdDev = STDEV(Ratio) FROM @ProjIdCqCatRatioTable WHERE ConsequenceCategory = @currentCqCat
-- Store average and standard deviation in results table:
UPDATE @resultTable
SET Average = @average, StandardDeviation = @StdDev
WHERE ConsequenceCategory = @currentCqCat
-- Get next category:
DELETE @CqCatTable WHERE ConsequenceCategory = @currentCqCat
SET @currentCqCat = NULL
SELECT TOP 1 @currentCqCat = ConsequenceCategory FROM @CqCatTable
END
-- Return results:
SELECT * FROM @resultTable
RETURN 0
February 1, 2017 at 12:38 pm
First, when posting code it REALLY helps if you use the SQL Code button to format your code.
You're using unnecessary WHILE loops. Get rid of them. The first one can be replaced by a simple JOIN on your @ProjectIDTable. The second one can be replaced by using the windowed versions of AVG() and STDEV().
Also, it's difficult to rewrite this without sample data to test on. And pictures of expected results don't help either. We want a SQL statement that we can execute to get the expected results.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 2, 2017 at 6:11 am
Something like this might help. There's no need to go row by row when you can do all projects at a time. You might want to consider dividing this into multiple queries if it's too complex.
WITH CTE1 AS(
SELECT
PS.ProjectId AS ProjectId,
C.Name AS ConsequenceCategory,
SUM(S.Credit) AS TotalCredit,
SRDM.PeopleRelated AS PeopleRelated
FROM ProjectSafeguard PS
INNER JOIN Safeguard S ON PS.ProjectSafeguardId = S.SafeguardId
INNER JOIN Consequence Cq ON Cq.ConsequenceId = S.ConsequenceId
INNER JOIN Category Cat ON Cat.CategoryId = Cq.CategoryId
INNER JOIN Code C ON C.CodeId = Cat.DataMiningCodeId
INNER JOIN SafeRecCategory SRCat ON S.SafeRecCategoryId = SRCat.SafeRecCategoryId
INNER JOIN SafeRecDataMining SRDM ON SRCat.SafeRecDataMiningId = SRDM.SafeRecDataMiningId
WHERE PS.ProjectId IN( SELECT ProjectID FROM @projectIdTable)
AND SRDM.Code != '-'
GROUP BY SRDM.PeopleRelated,
C.Name
),
CTE2 AS(
SELECT ProjectId,
ConsequenceCategory,
TotalCredit,
PeopleRelated,
ISNULL(SUM( CASE WHEN PeopleRelated = 1 THEN TotalCredit END) OVER(PARTITION BY ProjectId, ConsequenceCategory)
/NULLIF(SUM( TotalCredit) OVER(PARTITION BY ProjectId, ConsequenceCategory),0),0) AS Ratio
FROM CTE1
),
Ratios AS(
SELECT DISTINCT ProjectId, ConsequenceCategory, Ratio
FROM CTE2
)
SELECT c.ProjectId,
c.ConsequenceCategory,
c.TotalCredit,
c.PeopleRelated,
c.Ratio,
x.Average,
x.StandardDeviation
FROM CTE2 c
JOIN (SELECT ConsequenceCategory, AVG(Ratio) Average, STDEV(Ratio) StandardDeviation
FROM Ratios
GROUP BY ConsequenceCategory) x ON c.ConsequenceCategory = x.ConsequenceCategory
ORDER BY ProjectId, ConsequenceCategory;
February 2, 2017 at 10:38 am
Thanks both for your suggestions.
Luis, that's a very neat way of doing it. I'm sure to you it's nothing special, but I'm not quite the SQL guru as you obviously are.
Thanks especially for introducing me to OVER(PARTITION BY...) <-- I didn't know that existed.
February 2, 2017 at 11:58 am
junk.mail291276 - Thursday, February 2, 2017 10:38 AMThanks both for your suggestions.Luis, that's a very neat way of doing it. I'm sure to you it's nothing special, but I'm not quite the SQL guru as you obviously are.
Thanks especially for introducing me to OVER(PARTITION BY...) <-- I didn't know that existed.
So, do you understand how this works? Do you have any questions?
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply