Handling Aggregations on a Poorly Designed Database

Question

Handling Aggregations on a Poorly Designed Database

aveek22

Mr or Mrs. 500

Points: 509
More actions
February 17, 2020 at 12:00 am

#3717151

Comments posted to this topic are about the item Handling Aggregations on a Poorly Designed Database
Aveek has been working as an Analytics Engineer for the past couple of years now. His main areas of interests is SQL Server, SSIS, and SSAS. On his leisure, he likes to travel and is also an amateur photographer.
https://twitter.com/avikoleum
https://www.linkedin.com/in/aveekd22/

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

jmabee 26405 Say Hey Kid Points: 666 More actions · Answer 1

I think you could also just use a CTE such as:

;WITH CTE AS
(
SELECT 
 us.Name ,us.TeamID
 ,SUM(wl.HoursWorked) HoursWorked,
 ROW_NUMBER() OVER(PARTITION BY NAME ORDER BY Name) RN
FROM Users us
INNER JOIN WorkLog wl ON wl.UserKey = us.UserKey
GROUP BY us.Name ,us.TeamID
)
SELECT Name,HoursWorked
from CTE 
WHERE RN = 1

ScottPletcher SSC Guru Points: 101238 More actions · Answer 2

How's about this instead?!:

SELECT
    us.Name
    ,SUM(wl.HoursWorked) / COUNT(DISTINCT us.TeamID) AS HoursWorked
FROM Users us
INNER JOIN WorkLog wl ON wl.UserKey = us.UserKey
GROUP BY us.Name

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

SoHelpMeCodd SSCertifiable Points: 5871 More actions · Answer 3

The original design should have imposed a unique constraint on WorkLog.UserKey. Removing such a unique constraint (if done, in order to accommodate new design requirements) can have far reaching and unexpected consequences (beyond this view) - such as SQL that has been embedded in client application code.

The following assumes only 1 set of duplicate user hours will be inserted by the new design (i.e., WorkLog will remain designed as-is - there will be no splitting of a user's work hours between teams, and the same user will not work the same number of hours in different WorkLog entries that are being summed):

SELECT
us.Name
,SUM(distinct wl.HoursWorked) HoursWorked
FROM Users us
INNER JOIN WorkLog wl ON wl.UserKey = us.UserKey
GROUP BY us.Name

If the real tables and their uses are more complex, consider aggregating via the windowing function OVER(PARTITION BY ...), where "..." can be what I am imagining to be a column that is named (or serving the same purpose as a) [day] - using this can eliminate some of my assumptions.

I did not compare execution plans or statistics io.

With respect to the original design, there is also a possibility to sp_rename the original tables, create new base tables (with new names) that are normalized per the new design requirements, import the original tables' data into the new tables, and create update-able views that are named identically to the original tables (and reference the new tables). While that possibility is more work, it allows a new design to be more formally, publicly, and robustly declared in SQL, and yet still allow legacy applications to address what they believe to be the tables (but are now views). As a bonus, those who follow you (years later) will see your intent, formally declared :). There can be concerns with this approach (such as table or index maintenance activities), but usually "a rose by any other name smells just as sweet".

This reply was modified 6 years ago by SoHelpMeCodd.
This reply was modified 6 years ago by SoHelpMeCodd.
This reply was modified 6 years ago by SoHelpMeCodd.
This reply was modified 6 years ago by SoHelpMeCodd.

cstater Old Hand Points: 377 More actions · Answer 4

It seems likely that reporting by team would be a feature of the application. Otherwise, why have teams? The data structure cannot support this without modification. I guess my question is, at what point do you just say 'Your application data structure requires modification in order to handle NEW requirements.'? Instead of band-aiding unsustainable solutions into an already poor design.