January 31, 2011 at 3:39 am
Hi,
Some help with something new for me wrt SELECT queries.
I have data from a mining operation in a simple table with four fields as follows :
TimeStamp; IDNumber; Value; Period;
TimeStamp includes hh:mm:ss to allow me to enter hourly or shiftly data. The IDNumber is an integer that tells me what the parameter is e.g. tonnage or lab value for a particular stream, value is the data value and period tells me whether I am looking at hourly, shiftly or daily data.
By convention, I store daily values against 07:00:00 in the morning with a Period of 'D', but I may also have hourly and/or shiftly values for that parameter, so I could also have two other values at 07:00:00 but with different periods of 'S1' for the first shift of the day or 'H' for an hourly value.
My problem is best explained with a specific example :
The wet tonnage of ore has an IDNumber of 33080 and the contained % moisture of the ore has an IDNumber of 10080. I want to calculate daily dry tonnages (IDNo = 4080).
I get three shift values a day for the wet tonnage value :
'2011-01-15 07:00:00', 33080, 1000, 'S1'
'2011-01-15 15:00:00', 33080, 1500, 'S2'
'2011-01-15 23:00:00', 33080, 2000, 'S3'
I might get arbitrary hourly values for the contained % moisture :
'2011-01-15 07:00:00', 10080, 12.1, 'H'
'2011-01-15 09:00:00', 10080, 13.2, 'H'
'2011-01-15 11:00:00', 10080, 14.3, 'H'
'2011-01-15 13:00:00', 10080, 15.4, 'H' (Average for these four values is 13.75)
'2011-01-15 16:00:00', 10080, 13.5, 'H'
'2011-01-15 18:00:00', 10080, 14.6, 'H'
'2011-01-15 20:00:00', 10080, 15.7, 'H' (Ave = 14.60)
'2011-01-16 00:00:00', 10080, 14.8, 'H'
'2011-01-16 02:00:00', 10080, 15.9, 'H'
'2011-01-16 04:00:00', 10080, 16.0, 'H' (Ave = 15.57)
Now, I want to calculate an arithmetic average of all the moistures that fall within the 'S1' period which is >=07:00 and < 15:00 and apply that to the wet tonnage measured during 'S1'. Similarly with each period so that I can come up with a single daily total record for the derived dry tonnage :
Total Dry Tonnes = 1000*(100-13.75)/100 + 1500*(100-14.60)/100+2000*(100-15.57)/100 = 3832.17
and the corresponding record which I may wish to insert into the table would be . . .
'2011-01-15 07:00:00', 4080, 3832.17, 'D'
I'm a bit clueless about where to start writing this query because all the data is in the same table. If I had the moistures in a separate table I think this would be easier.
Perhaps this needs to be done in a stored procedure with looping to do the aggregation ? Perhaps it could be done in a view so I don't need to store the calculated results (this would be best).
TIA for any help,
Regards,
Adrian
January 31, 2011 at 5:38 am
A self join should work.
Something like:
-- *** Test Data ***
CREATE TABLE #t
(
[TimeStamp] datetime NOT NULL
,IDNumber int NOT NULL
,Value decimal(19, 2) NOT NULL
,Period varchar(2) NOT NULL
)
INSERT INTO #t
SELECT '20110115 07:00:00', 33080, 1000, 'S1'
UNION ALL SELECT '20110115 15:00:00', 33080, 1500, 'S2'
UNION ALL SELECT '20110115 23:00:00', 33080, 2000, 'S3'
UNION ALL SELECT '20110115 07:00:00', 10080, 12.1, 'H'
UNION ALL SELECT '20110115 09:00:00', 10080, 13.2, 'H'
UNION ALL SELECT '20110115 11:00:00', 10080, 14.3, 'H'
UNION ALL SELECT '20110115 13:00:00', 10080, 15.4, 'H'
UNION ALL SELECT '20110115 16:00:00', 10080, 13.5, 'H'
UNION ALL SELECT '20110115 18:00:00', 10080, 14.6, 'H'
UNION ALL SELECT '20110115 20:00:00', 10080, 15.7, 'H'
UNION ALL SELECT '20110116 00:00:00', 10080, 14.8, 'H'
UNION ALL SELECT '20110116 02:00:00', 10080, 15.9, 'H'
UNION ALL SELECT '20110116 04:00:00', 10080, 16.0, 'H'
-- *** End Test Data ***
;WITH ShiftTonnageMoisture
AS
(
SELECT S.[TimeStamp] AS ShiftTime
,S.Value AS WetTonnage
,COALESCE(ROUND(AVG(H.value), 2), 1.0) AS Moisture
FROM #t S
LEFT JOIN #t H -- outer join just in case no H values
ON H.[TimeStamp] >= S.[TimeStamp]
AND H.[TimeStamp] < DATEADD(hour, 8, S.[TimeStamp])
AND H.Period = 'H'
WHERE LEFT(S.Period, 1) = 'S' -- shift
AND S.IDNumber = 33080 -- wet tonnage
GROUP BY S.[TimeStamp], S.Value
)
-- INSERT INTO #t
SELECT DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0)) AS [TimeStamp]
,4080 AS IDNumber
,SUM((WetTonnage * (100 - Moisture)) / 100) AS Value
,'D' AS Period
FROM ShiftTonnageMoisture S
--WHERE NOT EXISTS
--(
--SELECT *
--FROM #t T
--WHERE T.[TimeStamp] = DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))
--AND T.Period = 'D'
--)
GROUP BY DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))
January 31, 2011 at 6:23 am
Ken, many thanks for your troubles. It will take me a while to digest what you have done, but it runs perfectly with the correct answer for the first one.
Although in this case it is not the "meat" of the solution, I think this is the second time I have had a reply with a construct that begins with . . .
With [Alias] AS ( SELECT . . . )
SELECT . . . . FROM [Alias]
which I gather is a common table expression. Is there a suitable book that springs to mind that would teach me more about how to write queries such as this one ? (I am sure a forum search would reveal many suggestions !)
Regards,
Adrian
January 31, 2011 at 6:47 am
A CTE, except for the special case of recursion, behaves just like a non-persisted view that can only be used in the currect statement.
ie Just think of a CTE as a VIEW.
CTEs are well documented in BOL. (Books online - the help which comes with SQL)
February 2, 2011 at 12:59 am
OK Ken (et al), I have gone through Ken's solution.
In my first posting, I simplified the problem and gave you a very limited data set that included only wet mass and moisture. As a result, you did not have to specify that your join should only be to records with an IDNumber of 10080 for the moisture.
I have provided a new data set (in the way that one should provide test data to this forum I hope) that uses my actual field names (I tried to provide descriptive field names in my original posting) This data set has shiftly data with a PID(=ParameterID) of 33080 which is the Wet Mass and I have included hourly data for moisture and % Cu (PID's = 10080 and 11081 respectively). As it turns out, there is only one hourly moisture value (PID=10080) per shift, but this is not necessarily the case and there are multiple hourly % Cu values per shift which need to be averaged.
Would one have to run a separate query per species (PID) that has related hourly values that need to be aggregated or can one return a Daily subtotal record per PID from a single query?
If one wanted to use a default value for the case that there were zero related hourly values for a particular shift, how would one do this ? In this data set, I already mentioned there are only single hourly values per shift for Moisture. If during one of the shifts there was missing moisture data, I'd like to provide a default value of say 15%.
One final real-life complication in this problem is that the mining operation does not run equal shifts, so this would not allow use of the condition AND H.[LogTime] < DATEADD(hour, 8, S.[LogTime])
in the join where 8 hours has been hard coded. The shifts run 07:00 to 15:00 (8 hours), 15:00 to 21:00 (6 hours) and 21:00 to 07:00 (10 hours).
The new set of test data is . . .
--===== If the test table already exists, drop it
IF OBJECT_ID('RUM..tblTestData','U') IS NOT NULL
DROP TABLE tblTestData
--===== Make the Test Table
CREATE TABLE [dbo].[tblTestData](
[LogTime] [datetime] NOT NULL,
[PID] [int] NOT NULL,
[Value] [float] NOT NULL,
[Period] [varchar](2) NOT NULL)
INSERT INTO tblTestData
(LogTime, PID, Value, Period)
SELECT 'Nov 1 2010 7:00AM','33080','201.6','S1' UNION ALL
SELECT 'Nov 1 2010 8:00AM','10080','12.25','H' UNION ALL
SELECT 'Nov 1 2010 8:00AM','11081','1.48','H' UNION ALL
SELECT 'Nov 1 2010 3:00PM','33080','451.28','S2' UNION ALL
SELECT 'Nov 1 2010 6:00PM','10080','14.66','H' UNION ALL
SELECT 'Nov 1 2010 6:00PM','11081','3.24','H' UNION ALL
SELECT 'Nov 1 2010 8:00PM','11081','1.08','H' UNION ALL
SELECT 'Nov 1 2010 9:00PM','33080','1295.03','S3' UNION ALL
SELECT 'Nov 1 2010 10:00PM','10080','10.62','H' UNION ALL
SELECT 'Nov 1 2010 10:00PM','11081','1.32','H' UNION ALL
SELECT 'Nov 2 2010 12:00AM','11081','1.92','H' UNION ALL
SELECT 'Nov 2 2010 6:00AM','11081','1.48','H' UNION ALL
SELECT 'Nov 2 2010 7:00AM','33080','1255.42','S1' UNION ALL
SELECT 'Nov 2 2010 8:00AM','10080','13.72','H' UNION ALL
SELECT 'Nov 2 2010 8:00AM','11081','1.39','H' UNION ALL
SELECT 'Nov 2 2010 10:00AM','11081','1.52','H' UNION ALL
SELECT 'Nov 2 2010 12:00PM','11081','1.48','H' UNION ALL
SELECT 'Nov 2 2010 3:00PM','33080','731.15','S2' UNION ALL
SELECT 'Nov 2 2010 6:00PM','10080','13.95','H' UNION ALL
SELECT 'Nov 2 2010 6:00PM','11081','1.62','H' UNION ALL
SELECT 'Nov 2 2010 8:00PM','11081','1.65','H' UNION ALL
SELECT 'Nov 2 2010 9:00PM','33080','1652.1','S3' UNION ALL
SELECT 'Nov 2 2010 10:00PM','10080','15.07','H' UNION ALL
SELECT 'Nov 2 2010 10:00PM','11081','1.72','H' UNION ALL
SELECT 'Nov 3 2010 12:00AM','11081','1.53','H' UNION ALL
SELECT 'Nov 3 2010 2:00AM','11081','1.78','H' UNION ALL
SELECT 'Nov 3 2010 4:00AM','11081','1.56','H' UNION ALL
SELECT 'Nov 3 2010 6:00AM','11081','1.64','H'
I have duplicated the SQL script provided by Ken, but I have substituted the new field names to run with the above data set . . .
WITH ShiftTonnageMoisture
AS
(
SELECT S.[LogTime] AS ShiftTime
,S.Value AS WetTonnage
,COALESCE(Round(AVG(H.Value),2), 1.0) AS Moisture
FROM tblTestData S
LEFT JOIN tblTestData H -- outer join just in case no H values
ON H.[LogTime] >= S.[LogTime]
AND H.[LogTime] < DATEADD(hour, 8, S.[LogTime])
AND H.Period = 'H'
WHERE LEFT(S.Period, 1) = 'S' -- shift
AND S.PID = 33080 -- wet tonnage
GROUP BY S.[LogTime], S.Value
)
-- INSERT INTO tblTestData
SELECT DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0)) AS [LogTime]
,4080 AS PID
,SUM((WetTonnage * (100 - Moisture)) / 100) AS Value
,'D' AS Period
FROM ShiftTonnageMoisture S
--WHERE NOT EXISTS
--(
-- SELECT *
-- FROM tblTestData T
-- WHERE T.[LogTime] = DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))
-- AND T.Period = 'D'
--)
GROUP BY DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))
Edit Added later :
I forgot to mention that while the moisture is used to calculate the Dry Mass (PID = 4080) from the Wet Mass, the %Cu(PID=11081) is used to calculate the Tons of Contained Copper (PID = 12080) from the Dry Mass. So, the actual calculations are different in each case :
DryMass = WetMass * (100-%Moisture)/100
TonsCu = DryMass * %Cu/100
The results for the first two days of the month by manual calculation are :
INSERT INTO tblTestData
(LogTime, PID, Value, Period)
SELECT 'Nov 1 2010 7:00AM','4080','1719.52','D' UNION ALL
SELECT 'Nov 1 2010 7:00AM','12080','29.1481','D' UNION ALL
SELECT 'Nov 2 2010 7:00AM','4080','3115.46','D' UNION ALL
SELECT 'Nov 2 2010 7:00AM','12080','49.2327','D'
Regards,
Adrian
February 2, 2011 at 10:39 am
Just do another self join and sort it from there.
Something like the following:
DECLARE @DefaultMoisture float = 15.0
,@DefaultCu float = 2.0;
WITH ShiftTonnageMoistureCu
AS
(
SELECT S.[LogTime] AS ShiftTime
,S.Value AS WetTonnage
,COALESCE(AVG(M.Value), @DefaultMoisture) AS MoistureP
,COALESCE(AVG(C.Value), @DefaultCu) AS CuP
FROM tblTestData S
LEFT JOIN tblTestData M -- outer join just in case no M values
ON M.[LogTime] >= S.[LogTime]
AND M.[LogTime] <
DATEADD(hour, CASE RIGHT(S.period, 1) WHEN '1' THEN 8 WHEN '2' THEN 6 WHEN '3' THEN 10 END, S.[LogTime])
AND M.Period = 'H'
AND M.PID = 10080 -- Moisture %
LEFT JOIN tblTestData C -- outer join just in case no C values
ON C.[LogTime] >= S.[LogTime]
AND C.[LogTime] <
DATEADD(hour, CASE RIGHT(S.period, 1) WHEN '1' THEN 8 WHEN '2' THEN 6 WHEN '3' THEN 10 END, S.[LogTime])
AND C.Period = 'H'
AND C.PID = 11081 -- CU %
WHERE S.PID = 33080 -- wet tonnage shift
GROUP BY S.[LogTime], S.Value
)
, Calcs
AS
(
SELECT DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0)) AS [LogTime]
,ROUND(SUM((WetTonnage * (100 - MoistureP)) / 100), 2) AS DryMass
,ROUND(SUM(((WetTonnage * (100 - MoistureP)) / 100) * CuP /100), 4) AS Cu
FROM ShiftTonnageMoistureCu S
GROUP BY DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))
)
, Statics (PID)
AS
(
SELECT 4080 -- DryMass
UNION ALL SELECT 12080 -- Cu
)
-- INSERT INTO tblTestData
SELECT C.LogTime, S.PID
,CASE
WHEN S.PID = 4080
THEN C.DryMass
ELSE C.Cu
END AS Value
,'D' AS Period
FROM Calcs C
CROSS JOIN Statics S
--WHERE NOT EXISTS
--(
--SELECT *
--FROM tblTestData T
--WHERE T.[LogTime] = C.LogTime
--AND T.Period = 'D'
--AND T.PID = S.PID
--)
February 3, 2011 at 1:44 am
Ken, thank you so much. I now have more work to do understanding what you have done. The result set from your query is EXACTLY what I need.
As a minor side issue, the shift lengths of 8, 6 and 10 are hard coded in the conditional clauses of the joins. To make this query more flexible, would it be reasonable practice to declare some variables for these shift lengths and then read them from a lookup table such as :
DECLARE @S1L shortint, @S2L shortint, @S3L shortint
SET @S1L = (SELECT Shift1Length FROM tblLookup);
SET @S2L = (SELECT Shift2Length FROM tblLookup);
SET @S3L = (SELECT Shift3Length FROM tblLookup);
The hard coded numbers would then be replaced with the relevant variables.
To get back to your solution, I have a proposed solution from a friend that doesn't use any joins, but yields the same correct answers although only to a shift level. This script pivots the results which I don't require, so I will have to modify it to remove the pivot and give Daily values. Do you have any comment on his method vs yours for this particular problem ?
WITH TmpTbl AS
(
SELECT PID, SUM(Value) AS TheSum, COUNT(Value) AS TheAmount,
CASE
WHEN DATEPART(hh, LogTime) >= 21 THEN CONVERT(varchar, LogTime, 106) + ' 21:00:00'
WHEN (DATEPART(hh, LogTime) >= 15 AND DATEPART(hh, LogTime) < 21) THEN CONVERT(varchar, LogTime, 106) + ' 15:00:00'
WHEN DATEPART(hh, LogTime) >= 7 THEN CONVERT(varchar, LogTime, 106) + ' 7:00:00'
ELSE DATEADD(dd, -1, CONVERT(varchar, LogTime, 106)) + ' 21:00:00'
END AS ShiftStart
FROM tblTestData
GROUP BY PID,
CASE
WHEN DATEPART(hh, LogTime) >= 21 THEN CONVERT(varchar, LogTime, 106) + ' 21:00:00'
WHEN (DATEPART(hh, LogTime) >= 15 AND DATEPART(hh, LogTime) < 21) THEN CONVERT(varchar, LogTime, 106) + ' 15:00:00'
WHEN DATEPART(hh, LogTime) >= 7 THEN CONVERT(varchar, LogTime, 106) + ' 7:00:00'
ELSE DATEADD(dd, -1, CONVERT(varchar, LogTime, 106)) + ' 21:00:00'
END
)
SELECT ShiftStart, [33080] AS WMS, [10080] AS H2O_pct, [11081] AS Cu_pct,
(([33080] * (100 - [10080])) / 100) AS DMS,
(([33080] * (100 - [10080])) / 100) * [11081] / 100 AS Cu
FROM
(SELECT ShiftStart, PID, TheSum / TheAmount AS Average
FROM TmpTbl) AS P
PIVOT
(
MAX (Average)
FOR PID IN
( [33080], [10080], [11081])
) AS pvt
ORDER BY ShiftStart
Regards,
Adrian
February 3, 2011 at 3:31 am
I would test both solutions and see which is most efficient with your data.
February 4, 2011 at 1:30 am
Hi Ken,
Thanks for your help. I will compare the two different approaches.
Regards,
Adrian
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply