two joins to same view is causing my count to be squared

  • There is some concept I don't understand with doing multiple joins to the same table/view. With the query below, when I do the second join to the view vwEventHistoryWithInterpolatedAndOutlierAdjusted it squares all of the counts that are from the COUNT of rows from the first join. That to me, does not make sense. I would think the joins are completely different because I do not tie them together in any way. Both are pulling sets constrained by NSFCategoryHistory. Does anyone have any suggestions? Thanks!

    SELECT nh.[year], nh.nsfCategoryId, nh.targetPercent, n.nsfCategoryDesc, n.nsfCategoryMax, n.nsfCategoryMin, n.SortOrder,

    COUNT(eh.eventHistoryId) AS ValidEventCount

    --COUNT(eh2.eventHistoryId) AS TotalEventCount

    --ROUND( nh.targetPercent / ROUND( 100 * (CONVERT(DECIMAL, COUNT(eh.eventHistoryId) ) ), 4 ), 4 ) AS Weight

    FROM NSFCategoryHistory nh

    JOIN NSFCategory n ON n.nsfCategoryId = nh.nsfCategoryId

    --JOIN Event e ON e.isActive = 1 AND e.isConsumerEvent = 0

    JOIN vwEventHistoryWithInterpolatedAndOutlierAdjusted eh ON eh.[year] = nh.[year]

    AND eh.netSquareFeet > n.nsfCategoryMin

    AND eh.netSquareFeet <= n.nsfCategoryMax
    AND eh.HasTwoYearsContinuousData = 1
    AND eh.HasEnoughDataPoints = 1
    AND eh.isConsumerEvent = 0
    AND eh.netSquareFeet IS NOT NULL

    JOIN vwEventHistoryWithInterpolatedAndOutlierAdjusted eh2 ON eh2.[year] = nh.[year]
    AND eh2.HasTwoYearsContinuousData = 1
    AND eh2.HasEnoughDataPoints = 1
    AND eh2.isConsumerEvent = 0
    AND eh2.netSquareFeet IS NOT NULL

    WHERE nh.[year] > 2000

    GROUP BY nh.[year], nh.nsfCategoryId, nh.targetPercent, n.nsfCategoryDesc, n.nsfCategoryMax, n.nsfCategoryMin, n.SortOrder

  • Does thsi:

    JOIN vwEventHistoryWithInterpolatedAndOutlierAdjusted eh ON eh.[year] = nh.[year]

    AND eh.netSquareFeet > n.nsfCategoryMin

    AND eh.netSquareFeet <= n.nsfCategoryMax

    AND eh.HasTwoYearsContinuousData = 1

    AND eh.HasEnoughDataPoints = 1

    AND eh.isConsumerEvent = 0

    AND eh.netSquareFeet IS NOT NULL

    .. resolve to 1 record per CategoryHistory ? If not, it's already multiplying out number of rows per year.

    And then, the same join again, but with a less-restrictive filer

    JOIN vwEventHistoryWithInterpolatedAndOutlierAdjusted eh2 ON eh2.[year] = nh.[year]

    AND eh2.HasTwoYearsContinuousData = 1

    AND eh2.HasEnoughDataPoints = 1

    AND eh2.isConsumerEvent = 0

    AND eh2.netSquareFeet IS NOT NULL

    If join 1 already returns more than 1 row per year, and you join again with less filtering restrictions, you expand the resultset out further, multiplying it by the number of eh2 rows per year.

     

  • Yes, the point of those joins is to return many results, because I want to perform a COUNT on the result set. So the first one might return 20-75 rows, and the second one would return 180-260 rows. But like I said, I have to perform COUNTS, so I need to join to those sets like this don't I?

    Is there an easy alternative I am missing as far as obtaining these counts? I need the sets from the vwEventHistoryWithInterpolatedAndOutlierAdjusted to be constrained by year (which is from nsfCategoryHistory as well as the max and min's from nsfCategory. It seems like what I am doing does make sense but it definitely comes out with the wrong counts when we join to both sets, and it makes them both the same, even though we're using two different sets of constraints. I know what the counts should be because I did direct queries on that view and hard coded the same constraints for a bunch of different combinations, and I just know from working with the data the last 6 months about what the counts should be.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply