Query Help - Denormalising Perhaps ?

Question

Post reply

Query Help - Denormalising Perhaps ?

AdrianJ

SSC Journeyman

Points: 91
More actions
January 31, 2011 at 3:39 am

#246030

Hi,
Some help with something new for me wrt SELECT queries.
I have data from a mining operation in a simple table with four fields as follows :
TimeStamp; IDNumber; Value; Period;
TimeStamp includes hh:mm:ss to allow me to enter hourly or shiftly data. The IDNumber is an integer that tells me what the parameter is e.g. tonnage or lab value for a particular stream, value is the data value and period tells me whether I am looking at hourly, shiftly or daily data.
By convention, I store daily values against 07:00:00 in the morning with a Period of 'D', but I may also have hourly and/or shiftly values for that parameter, so I could also have two other values at 07:00:00 but with different periods of 'S1' for the first shift of the day or 'H' for an hourly value.
My problem is best explained with a specific example :
The wet tonnage of ore has an IDNumber of 33080 and the contained % moisture of the ore has an IDNumber of 10080. I want to calculate daily dry tonnages (IDNo = 4080).
I get three shift values a day for the wet tonnage value :
'2011-01-15 07:00:00', 33080, 1000, 'S1'
'2011-01-15 15:00:00', 33080, 1500, 'S2'
'2011-01-15 23:00:00', 33080, 2000, 'S3'
I might get arbitrary hourly values for the contained % moisture :
'2011-01-15 07:00:00', 10080, 12.1, 'H'
'2011-01-15 09:00:00', 10080, 13.2, 'H'
'2011-01-15 11:00:00', 10080, 14.3, 'H'
'2011-01-15 13:00:00', 10080, 15.4, 'H' (Average for these four values is 13.75)
'2011-01-15 16:00:00', 10080, 13.5, 'H'
'2011-01-15 18:00:00', 10080, 14.6, 'H'
'2011-01-15 20:00:00', 10080, 15.7, 'H' (Ave = 14.60)
'2011-01-16 00:00:00', 10080, 14.8, 'H'
'2011-01-16 02:00:00', 10080, 15.9, 'H'
'2011-01-16 04:00:00', 10080, 16.0, 'H' (Ave = 15.57)
Now, I want to calculate an arithmetic average of all the moistures that fall within the 'S1' period which is >=07:00 and < 15:00 and apply that to the wet tonnage measured during 'S1'. Similarly with each period so that I can come up with a single daily total record for the derived dry tonnage :
Total Dry Tonnes = 1000*(100-13.75)/100 + 1500*(100-14.60)/100+2000*(100-15.57)/100 = 3832.17
and the corresponding record which I may wish to insert into the table would be . . .
'2011-01-15 07:00:00', 4080, 3832.17, 'D'
I'm a bit clueless about where to start writing this query because all the data is in the same table. If I had the moistures in a separate table I think this would be easier.
Perhaps this needs to be done in a stored procedure with looping to do the aggregation ? Perhaps it could be done in a view so I don't need to store the calculated results (this would be best).
TIA for any help,
Regards,
Adrian

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply

Ken McKelvey SSCoach Points: 18863 More actions · Answer 1

A self join should work.

Something like:

-- *** Test Data ***

CREATE TABLE #t

(

[TimeStamp] datetime NOT NULL

,IDNumber int NOT NULL

,Value decimal(19, 2) NOT NULL

,Period varchar(2) NOT NULL

)

INSERT INTO #t

SELECT '20110115 07:00:00', 33080, 1000, 'S1'

UNION ALL SELECT '20110115 15:00:00', 33080, 1500, 'S2'

UNION ALL SELECT '20110115 23:00:00', 33080, 2000, 'S3'

UNION ALL SELECT '20110115 07:00:00', 10080, 12.1, 'H'

UNION ALL SELECT '20110115 09:00:00', 10080, 13.2, 'H'

UNION ALL SELECT '20110115 11:00:00', 10080, 14.3, 'H'

UNION ALL SELECT '20110115 13:00:00', 10080, 15.4, 'H'

UNION ALL SELECT '20110115 16:00:00', 10080, 13.5, 'H'

UNION ALL SELECT '20110115 18:00:00', 10080, 14.6, 'H'

UNION ALL SELECT '20110115 20:00:00', 10080, 15.7, 'H'

UNION ALL SELECT '20110116 00:00:00', 10080, 14.8, 'H'

UNION ALL SELECT '20110116 02:00:00', 10080, 15.9, 'H'

UNION ALL SELECT '20110116 04:00:00', 10080, 16.0, 'H'

-- *** End Test Data ***

;WITH ShiftTonnageMoisture

AS

(

SELECT S.[TimeStamp] AS ShiftTime

,S.Value AS WetTonnage

,COALESCE(ROUND(AVG(H.value), 2), 1.0) AS Moisture

FROM #t S

LEFT JOIN #t H -- outer join just in case no H values

ON H.[TimeStamp] >= S.[TimeStamp]

AND H.[TimeStamp] < DATEADD(hour, 8, S.[TimeStamp])

AND H.Period = 'H'

WHERE LEFT(S.Period, 1) = 'S' -- shift

AND S.IDNumber = 33080 -- wet tonnage

GROUP BY S.[TimeStamp], S.Value

)

-- INSERT INTO #t

SELECT DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0)) AS [TimeStamp]

,4080 AS IDNumber

,SUM((WetTonnage * (100 - Moisture)) / 100) AS Value

,'D' AS Period

FROM ShiftTonnageMoisture S

--WHERE NOT EXISTS

--(

--SELECT *

--FROM #t T

--WHERE T.[TimeStamp] = DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))

--AND T.Period = 'D'

--)

GROUP BY DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))

AdrianJ SSC Journeyman Points: 91 More actions · Answer 2

Ken, many thanks for your troubles. It will take me a while to digest what you have done, but it runs perfectly with the correct answer for the first one.

Although in this case it is not the "meat" of the solution, I think this is the second time I have had a reply with a construct that begins with . . .

With [Alias] AS ( SELECT . . . )

SELECT . . . . FROM [Alias]

which I gather is a common table expression. Is there a suitable book that springs to mind that would teach me more about how to write queries such as this one ? (I am sure a forum search would reveal many suggestions !)

Regards,

Adrian

Ken McKelvey SSCoach Points: 18863 More actions · Answer 3

A CTE, except for the special case of recursion, behaves just like a non-persisted view that can only be used in the currect statement.

ie Just think of a CTE as a VIEW.

CTEs are well documented in BOL. (Books online - the help which comes with SQL)

AdrianJ SSC Journeyman Points: 91 More actions · Answer 4

OK Ken (et al), I have gone through Ken's solution.

In my first posting, I simplified the problem and gave you a very limited data set that included only wet mass and moisture. As a result, you did not have to specify that your join should only be to records with an IDNumber of 10080 for the moisture.

I have provided a new data set (in the way that one should provide test data to this forum I hope) that uses my actual field names (I tried to provide descriptive field names in my original posting) This data set has shiftly data with a PID(=ParameterID) of 33080 which is the Wet Mass and I have included hourly data for moisture and % Cu (PID's = 10080 and 11081 respectively). As it turns out, there is only one hourly moisture value (PID=10080) per shift, but this is not necessarily the case and there are multiple hourly % Cu values per shift which need to be averaged.

Would one have to run a separate query per species (PID) that has related hourly values that need to be aggregated or can one return a Daily subtotal record per PID from a single query?

If one wanted to use a default value for the case that there were zero related hourly values for a particular shift, how would one do this ? In this data set, I already mentioned there are only single hourly values per shift for Moisture. If during one of the shifts there was missing moisture data, I'd like to provide a default value of say 15%.

One final real-life complication in this problem is that the mining operation does not run equal shifts, so this would not allow use of the condition AND H.[LogTime] < DATEADD(hour, 8, S.[LogTime]) in the join where 8 hours has been hard coded. The shifts run 07:00 to 15:00 (8 hours), 15:00 to 21:00 (6 hours) and 21:00 to 07:00 (10 hours).

The new set of test data is . . .

--===== If the test table already exists, drop it

IF OBJECT_ID('RUM..tblTestData','U') IS NOT NULL

DROP TABLE tblTestData

--===== Make the Test Table

CREATE TABLE [dbo].[tblTestData](

[LogTime] [datetime] NOT NULL,

[PID] [int] NOT NULL,

[Value] [float] NOT NULL,

[Period] [varchar](2) NOT NULL)

INSERT INTO tblTestData

(LogTime, PID, Value, Period)

SELECT 'Nov 1 2010 7:00AM','33080','201.6','S1' UNION ALL

SELECT 'Nov 1 2010 8:00AM','10080','12.25','H' UNION ALL

SELECT 'Nov 1 2010 8:00AM','11081','1.48','H' UNION ALL

SELECT 'Nov 1 2010 3:00PM','33080','451.28','S2' UNION ALL

SELECT 'Nov 1 2010 6:00PM','10080','14.66','H' UNION ALL

SELECT 'Nov 1 2010 6:00PM','11081','3.24','H' UNION ALL

SELECT 'Nov 1 2010 8:00PM','11081','1.08','H' UNION ALL

SELECT 'Nov 1 2010 9:00PM','33080','1295.03','S3' UNION ALL

SELECT 'Nov 1 2010 10:00PM','10080','10.62','H' UNION ALL

SELECT 'Nov 1 2010 10:00PM','11081','1.32','H' UNION ALL

SELECT 'Nov 2 2010 12:00AM','11081','1.92','H' UNION ALL

SELECT 'Nov 2 2010 6:00AM','11081','1.48','H' UNION ALL

SELECT 'Nov 2 2010 7:00AM','33080','1255.42','S1' UNION ALL

SELECT 'Nov 2 2010 8:00AM','10080','13.72','H' UNION ALL

SELECT 'Nov 2 2010 8:00AM','11081','1.39','H' UNION ALL

SELECT 'Nov 2 2010 10:00AM','11081','1.52','H' UNION ALL

SELECT 'Nov 2 2010 12:00PM','11081','1.48','H' UNION ALL

SELECT 'Nov 2 2010 3:00PM','33080','731.15','S2' UNION ALL

SELECT 'Nov 2 2010 6:00PM','10080','13.95','H' UNION ALL

SELECT 'Nov 2 2010 6:00PM','11081','1.62','H' UNION ALL

SELECT 'Nov 2 2010 8:00PM','11081','1.65','H' UNION ALL

SELECT 'Nov 2 2010 9:00PM','33080','1652.1','S3' UNION ALL

SELECT 'Nov 2 2010 10:00PM','10080','15.07','H' UNION ALL

SELECT 'Nov 2 2010 10:00PM','11081','1.72','H' UNION ALL

SELECT 'Nov 3 2010 12:00AM','11081','1.53','H' UNION ALL

SELECT 'Nov 3 2010 2:00AM','11081','1.78','H' UNION ALL

SELECT 'Nov 3 2010 4:00AM','11081','1.56','H' UNION ALL

SELECT 'Nov 3 2010 6:00AM','11081','1.64','H'

I have duplicated the SQL script provided by Ken, but I have substituted the new field names to run with the above data set . . .

WITH ShiftTonnageMoisture

AS

(

SELECT S.[LogTime] AS ShiftTime

,S.Value AS WetTonnage

,COALESCE(Round(AVG(H.Value),2), 1.0) AS Moisture

FROM tblTestData S

LEFT JOIN tblTestData H -- outer join just in case no H values

ON H.[LogTime] >= S.[LogTime]

AND H.[LogTime] < DATEADD(hour, 8, S.[LogTime])

AND H.Period = 'H'

WHERE LEFT(S.Period, 1) = 'S' -- shift

AND S.PID = 33080 -- wet tonnage

GROUP BY S.[LogTime], S.Value

)

-- INSERT INTO tblTestData

SELECT DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0)) AS [LogTime]

,4080 AS PID

,SUM((WetTonnage * (100 - Moisture)) / 100) AS Value

,'D' AS Period

FROM ShiftTonnageMoisture S

--WHERE NOT EXISTS

--(

-- SELECT *

-- FROM tblTestData T

-- WHERE T.[LogTime] = DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))

-- AND T.Period = 'D'

--)

GROUP BY DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))

Edit Added later :

I forgot to mention that while the moisture is used to calculate the Dry Mass (PID = 4080) from the Wet Mass, the %Cu(PID=11081) is used to calculate the Tons of Contained Copper (PID = 12080) from the Dry Mass. So, the actual calculations are different in each case :

DryMass = WetMass * (100-%Moisture)/100

TonsCu = DryMass * %Cu/100

The results for the first two days of the month by manual calculation are :

INSERT INTO tblTestData

(LogTime, PID, Value, Period)

SELECT 'Nov 1 2010 7:00AM','4080','1719.52','D' UNION ALL

SELECT 'Nov 1 2010 7:00AM','12080','29.1481','D' UNION ALL

SELECT 'Nov 2 2010 7:00AM','4080','3115.46','D' UNION ALL

SELECT 'Nov 2 2010 7:00AM','12080','49.2327','D'

Regards,

Adrian

Ken McKelvey SSCoach Points: 18863 More actions · Answer 5

Just do another self join and sort it from there.

Something like the following:

DECLARE @DefaultMoisture float = 15.0

,@DefaultCu float = 2.0;

WITH ShiftTonnageMoistureCu

AS

(

SELECT S.[LogTime] AS ShiftTime

,S.Value AS WetTonnage

,COALESCE(AVG(M.Value), @DefaultMoisture) AS MoistureP

,COALESCE(AVG(C.Value), @DefaultCu) AS CuP

FROM tblTestData S

LEFT JOIN tblTestData M -- outer join just in case no M values

ON M.[LogTime] >= S.[LogTime]

AND M.[LogTime] <

DATEADD(hour, CASE RIGHT(S.period, 1) WHEN '1' THEN 8 WHEN '2' THEN 6 WHEN '3' THEN 10 END, S.[LogTime])

AND M.Period = 'H'

AND M.PID = 10080 -- Moisture %

LEFT JOIN tblTestData C -- outer join just in case no C values

ON C.[LogTime] >= S.[LogTime]

AND C.[LogTime] <

DATEADD(hour, CASE RIGHT(S.period, 1) WHEN '1' THEN 8 WHEN '2' THEN 6 WHEN '3' THEN 10 END, S.[LogTime])

AND C.Period = 'H'

AND C.PID = 11081 -- CU %

WHERE S.PID = 33080 -- wet tonnage shift

GROUP BY S.[LogTime], S.Value

)

, Calcs

AS

(

SELECT DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0)) AS [LogTime]

,ROUND(SUM((WetTonnage * (100 - MoistureP)) / 100), 2) AS DryMass

,ROUND(SUM(((WetTonnage * (100 - MoistureP)) / 100) * CuP /100), 4) AS Cu

FROM ShiftTonnageMoistureCu S

GROUP BY DATEADD(hour, 7, DATEADD(d, DATEDIFF(d, 0, S.ShiftTime), 0))

)

, Statics (PID)

AS

(

SELECT 4080 -- DryMass

UNION ALL SELECT 12080 -- Cu

)

-- INSERT INTO tblTestData

SELECT C.LogTime, S.PID

,CASE

WHEN S.PID = 4080

THEN C.DryMass

ELSE C.Cu

END AS Value

,'D' AS Period

FROM Calcs C

CROSS JOIN Statics S

--WHERE NOT EXISTS

--(

--SELECT *

--FROM tblTestData T

--WHERE T.[LogTime] = C.LogTime

--AND T.Period = 'D'

--AND T.PID = S.PID

--)

AdrianJ SSC Journeyman Points: 91 More actions · Answer 6

Ken, thank you so much. I now have more work to do understanding what you have done. The result set from your query is EXACTLY what I need.

As a minor side issue, the shift lengths of 8, 6 and 10 are hard coded in the conditional clauses of the joins. To make this query more flexible, would it be reasonable practice to declare some variables for these shift lengths and then read them from a lookup table such as :

DECLARE @S1L shortint, @S2L shortint, @S3L shortint

SET @S1L = (SELECT Shift1Length FROM tblLookup);

SET @S2L = (SELECT Shift2Length FROM tblLookup);

SET @S3L = (SELECT Shift3Length FROM tblLookup);

The hard coded numbers would then be replaced with the relevant variables.

To get back to your solution, I have a proposed solution from a friend that doesn't use any joins, but yields the same correct answers although only to a shift level. This script pivots the results which I don't require, so I will have to modify it to remove the pivot and give Daily values. Do you have any comment on his method vs yours for this particular problem ?

WITH TmpTbl AS

(

SELECT PID, SUM(Value) AS TheSum, COUNT(Value) AS TheAmount,

CASE

WHEN DATEPART(hh, LogTime) >= 21 THEN CONVERT(varchar, LogTime, 106) + ' 21:00:00'

WHEN (DATEPART(hh, LogTime) >= 15 AND DATEPART(hh, LogTime) < 21) THEN CONVERT(varchar, LogTime, 106) + ' 15:00:00'

WHEN DATEPART(hh, LogTime) >= 7 THEN CONVERT(varchar, LogTime, 106) + ' 7:00:00'

ELSE DATEADD(dd, -1, CONVERT(varchar, LogTime, 106)) + ' 21:00:00'

END AS ShiftStart

FROM tblTestData

GROUP BY PID,

CASE

WHEN DATEPART(hh, LogTime) >= 21 THEN CONVERT(varchar, LogTime, 106) + ' 21:00:00'

WHEN (DATEPART(hh, LogTime) >= 15 AND DATEPART(hh, LogTime) < 21) THEN CONVERT(varchar, LogTime, 106) + ' 15:00:00'

WHEN DATEPART(hh, LogTime) >= 7 THEN CONVERT(varchar, LogTime, 106) + ' 7:00:00'

ELSE DATEADD(dd, -1, CONVERT(varchar, LogTime, 106)) + ' 21:00:00'

END

)

SELECT ShiftStart, [33080] AS WMS, [10080] AS H2O_pct, [11081] AS Cu_pct,

(([33080] * (100 - [10080])) / 100) AS DMS,

(([33080] * (100 - [10080])) / 100) * [11081] / 100 AS Cu

FROM

(SELECT ShiftStart, PID, TheSum / TheAmount AS Average

FROM TmpTbl) AS P

PIVOT

(

MAX (Average)

FOR PID IN

( [33080], [10080], [11081])

) AS pvt

ORDER BY ShiftStart

Regards,

Adrian

Ken McKelvey SSCoach Points: 18863 More actions · Answer 7

I would test both solutions and see which is most efficient with your data.

AdrianJ SSC Journeyman Points: 91 More actions · Answer 8

Hi Ken,

Thanks for your help. I will compare the two different approaches.

Regards,

Adrian