cross join sql

  • I'm attempting to use INNER JOIN to join numerous tables.

    Here is the code:

    IF OBJECT_ID('tempdb..#tmpRecData') IS NOT NULL 
    DROP TABLE #tmpRecData

    --STEP 1
    SELECT DISTINCT
    pr.ChainID, pr.StoreID, pr.SupplierID, pr.ProductID,
    MAX(CAST(pr.ActiveLastDate AS date)) AS 'Active Date'
    --ChainID, SupplierID, StoreID, InvoiceDate, InvoiceNumber, SupplierInvoiceDate, SupplierInvoiceNumber
    INTO
    #tmpRecData
    FROM
    dbo.[ProductPrices_Retailer] AS pr
    LEFT JOIN
    ProductIdentifiers iden ON pr.ProductID = iden.ProductID
    AND iden.ProductIdentifierTypeID = 2
    WHERE
    pr.ChainID = '129121'
    AND pr.ActiveLastDate > '12/01/2016'
    GROUP BY
    pr.ProductID, pr.ProductName, iden.IdentifierValue,
    pr.ChainID, pr.StoreID, pr.SupplierID

    --STEP 2
    SELECT
    rec.ChainID, rec.StoreID, rec.SupplierInvoiceNumber,
    rec.TransactionTypeID, rec.SupplierID, rec.SaleDateTime,
    rec.ProductID, rec.UPC, rec.ProductDescriptionReported,
    rec.RawProductIdentifier
    FROM
    #tmpRecData t
    INNER JOIN
    dbo.StoreTransactions AS rec WITH (NOLOCK) ON rec.ChainID = T.ChainID
    WHERE
    rec.ChainID = '129121'

    DROP TABLE #tmpRecData

    I receive 4096 rows (Step 1) * 14979 rows (Step 2) = 725007693 rows (725 million)

    This is a large amount of records, yet I used INNER JOIN; therefore, why did it work as CROSS JOIN?

  • The joining clause in step 2 needs to be more selective.

    rec.ChainID = T.ChainID will there be a CROSS JOIN as each row in step 1 relates to each and every row in step 2

  • Why are you using DISTINCT and GROUP BY in your first query out of interest? GROUP BY already puts your data into distinct groups, so the DISTINCT in the SELECT would do little more than cause unnecessary overhead.

    I also suggest against putting your aliases in single quotes; they can come with "gotchas" that result in unexpected behaviour and some formats with them are actually deprecated. I'd advise as well a less ambiguous date format; '12/01/2016' could be 12 January 2016 or 01 December 2016. yyyyMMdd would be a better format (such as '20161201' for 12 January 2016).

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Both tables contain a chain id, store id, supplier id, and product id.  I would expect that you would want to include EACH of those columns in your join criteria.

    I also notice that you haven't used ANY information from your temp table in your second query.  You aren't including any fields from that table and you aren't filtering on any fields either.  Why is it even in your second query?

    Finally, using the (NOLOCK) hint can produce dirty data.  You should only use this if you don't care if there are inconsistencies in your data.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Thom A wrote:

    Why are you using DISTINCT and GROUP BY in your first query out of interest? GROUP BY already puts your data into distinct groups, so the DISTINCT in the SELECT would do little more than cause unnecessary overhead.

    I also suggest against putting your aliases in single quotes; they can come with "gotchas" that result in unexpected behaviour and some formats with them are actually deprecated. I'd advise as well a less ambiguous date format; '12/01/2016' could be 12 January 2016 or 01 December 2016. yyyyMMdd would be a better format (such as '20161201' for 12 January 2016).

    The fields listed in the SELECT clause are a subset of the fields in the GROUP BY clause.  What you said about DISTINCT is only necessarily true if the two field lists are the same.  That being said, I suspect that the two SHOULD be the same.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • drew.allen wrote:

    The fields listed in the SELECT clause are a subset of the fields in the GROUP BY clause.  What you said about DISTINCT is only necessarily true if the two field lists are the same.  That being said, I suspect that the two SHOULD be the same.

    Yes, that's sort of the point. If a query has both a GROUP BY and DISTINCT in the same scope this is a good indicator that either the DISTINCT is pointless or the GROUP BY is wrong; it appears the latter is the case here.

    The OP is grouping in the column ProductName but that isn't in their SELECT. THough I would expect that ProductName would be unique for a single value of ProductID so I suspect that the inclusion of the column in the GROUP BY isn't creating "duplicate" rows. I don't know what IdentifierValue is, and I suspect that it should not be in the GROUP BY at all.

    TL;DR: Remove ProductName, IdentifierValue from the GROUP BY and remove the DISTINCT.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • Thom A wrote:

    drew.allen wrote:

    The fields listed in the SELECT clause are a subset of the fields in the GROUP BY clause.  What you said about DISTINCT is only necessarily true if the two field lists are the same.  That being said, I suspect that the two SHOULD be the same.

    Yes, that's sort of the point. If a query has both a GROUP BY and DISTINCT in the same scope this is a good indicator that either the DISTINCT is pointless or the GROUP BY is wrong; it appears the latter is the case here.

    The OP is grouping in the column ProductName but that isn't in their SELECT. THough I would expect that ProductName would be unique for a single value of ProductID so I suspect that the inclusion of the column in the GROUP BY isn't creating "duplicate" rows. I don't know what IdentifierValue is, and I suspect that it should not be in the GROUP BY at all.

    TL;DR: Remove ProductName, IdentifierValue from the GROUP BY and remove the DISTINCT.

    I suspect that IdentifierValue is a generated ID field that is essentially an alternate to the unique combination of SupplierID-ProductID.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Well thank your help and time everyone

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply