April 11, 2023 at 10:58 am
I'm attempting to use INNER JOIN to join numerous tables.
Here is the code:
IF OBJECT_ID('tempdb..#tmpRecData') IS NOT NULL
DROP TABLE #tmpRecData
--STEP 1
SELECT DISTINCT
pr.ChainID, pr.StoreID, pr.SupplierID, pr.ProductID,
MAX(CAST(pr.ActiveLastDate AS date)) AS 'Active Date'
--ChainID, SupplierID, StoreID, InvoiceDate, InvoiceNumber, SupplierInvoiceDate, SupplierInvoiceNumber
INTO
#tmpRecData
FROM
dbo.[ProductPrices_Retailer] AS pr
LEFT JOIN
ProductIdentifiers iden ON pr.ProductID = iden.ProductID
AND iden.ProductIdentifierTypeID = 2
WHERE
pr.ChainID = '129121'
AND pr.ActiveLastDate > '12/01/2016'
GROUP BY
pr.ProductID, pr.ProductName, iden.IdentifierValue,
pr.ChainID, pr.StoreID, pr.SupplierID
--STEP 2
SELECT
rec.ChainID, rec.StoreID, rec.SupplierInvoiceNumber,
rec.TransactionTypeID, rec.SupplierID, rec.SaleDateTime,
rec.ProductID, rec.UPC, rec.ProductDescriptionReported,
rec.RawProductIdentifier
FROM
#tmpRecData t
INNER JOIN
dbo.StoreTransactions AS rec WITH (NOLOCK) ON rec.ChainID = T.ChainID
WHERE
rec.ChainID = '129121'
DROP TABLE #tmpRecData
I receive 4096 rows (Step 1) * 14979 rows (Step 2) = 725007693 rows (725 million)
This is a large amount of records, yet I used INNER JOIN; therefore, why did it work as CROSS JOIN?
April 11, 2023 at 11:06 am
The joining clause in step 2 needs to be more selective.
rec.ChainID = T.ChainID will there be a CROSS JOIN as each row in step 1 relates to each and every row in step 2
April 11, 2023 at 11:50 am
Why are you using DISTINCT
and GROUP BY
in your first query out of interest? GROUP BY
already puts your data into distinct groups, so the DISTINCT
in the SELECT
would do little more than cause unnecessary overhead.
I also suggest against putting your aliases in single quotes; they can come with "gotchas" that result in unexpected behaviour and some formats with them are actually deprecated. I'd advise as well a less ambiguous date format; '12/01/2016'
could be 12 January 2016 or 01 December 2016. yyyyMMdd
would be a better format (such as '20161201'
for 12 January 2016).
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
April 11, 2023 at 1:48 pm
Both tables contain a chain id, store id, supplier id, and product id. I would expect that you would want to include EACH of those columns in your join criteria.
I also notice that you haven't used ANY information from your temp table in your second query. You aren't including any fields from that table and you aren't filtering on any fields either. Why is it even in your second query?
Finally, using the (NOLOCK) hint can produce dirty data. You should only use this if you don't care if there are inconsistencies in your data.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
April 11, 2023 at 1:52 pm
Why are you using
DISTINCT
andGROUP BY
in your first query out of interest?GROUP BY
already puts your data into distinct groups, so theDISTINCT
in theSELECT
would do little more than cause unnecessary overhead.I also suggest against putting your aliases in single quotes; they can come with "gotchas" that result in unexpected behaviour and some formats with them are actually deprecated. I'd advise as well a less ambiguous date format;
'12/01/2016'
could be 12 January 2016 or 01 December 2016.yyyyMMdd
would be a better format (such as'20161201'
for 12 January 2016).
The fields listed in the SELECT
clause are a subset of the fields in the GROUP BY
clause. What you said about DISTINCT
is only necessarily true if the two field lists are the same. That being said, I suspect that the two SHOULD be the same.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
April 11, 2023 at 2:00 pm
The fields listed in the
SELECT
clause are a subset of the fields in theGROUP BY
clause. What you said aboutDISTINCT
is only necessarily true if the two field lists are the same. That being said, I suspect that the two SHOULD be the same.
Yes, that's sort of the point. If a query has both a GROUP BY
and DISTINCT
in the same scope this is a good indicator that either the DISTINCT
is pointless or the GROUP BY
is wrong; it appears the latter is the case here.
The OP is grouping in the column ProductName
but that isn't in their SELECT
. THough I would expect that ProductName
would be unique for a single value of ProductID
so I suspect that the inclusion of the column in the GROUP BY
isn't creating "duplicate" rows. I don't know what IdentifierValue
is, and I suspect that it should not be in the GROUP BY
at all.
TL;DR: Remove ProductName
, IdentifierValue
from the GROUP BY
and remove the DISTINCT
.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
April 11, 2023 at 2:08 pm
drew.allen wrote:The fields listed in the
SELECT
clause are a subset of the fields in theGROUP BY
clause. What you said aboutDISTINCT
is only necessarily true if the two field lists are the same. That being said, I suspect that the two SHOULD be the same.Yes, that's sort of the point. If a query has both a
GROUP BY
andDISTINCT
in the same scope this is a good indicator that either theDISTINCT
is pointless or theGROUP BY
is wrong; it appears the latter is the case here.The OP is grouping in the column
ProductName
but that isn't in theirSELECT
. THough I would expect thatProductName
would be unique for a single value ofProductID
so I suspect that the inclusion of the column in theGROUP BY
isn't creating "duplicate" rows. I don't know whatIdentifierValue
is, and I suspect that it should not be in theGROUP BY
at all.TL;DR: Remove
ProductName
,IdentifierValue
from theGROUP BY
and remove theDISTINCT
.
I suspect that IdentifierValue is a generated ID field that is essentially an alternate to the unique combination of SupplierID-ProductID.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
April 13, 2023 at 8:41 pm
Well thank your help and time everyone
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply