February 25, 2017 at 6:41 pm
I have two tables like this:
Items
ItemID (PK) | Title |
---|---|
101 | iPhone 5 |
102 | iPhone 6 |
103 | Blackberry Storm |
104 | Samsung Galaxy S1 |
105 | Samsung Galaxy S6 |
Sales
SaleID (PK) | Reference | Title |
---|---|---|
1 | 103-B00 | Blackberry Storm |
2 | 101-IK5 | iPhone 5 |
3 | 105-ISG | Samsung Galaxy S6 |
4 | 104-UN3 | Samsung Galaxy S1 |
5 | 101-IK5 | iPhone 5 |
6 | 102-HWB | iPhone 6 |
7 | 105-ISG | Samsung Galaxy S6 |
8 | 101-IK5 | iPhone 5 |
9 | 102-HWB | iPhone 6 |
I want to join them using "ItemID" column in Items table and "Reference" column in Sales table. As you can see, the values of two columns are not matching, but the part before the hyphen in each value of "Reference" column is matched with values in "ItemID" column. How can I join them? I know it's odd to have two columns with such values, but I did not create these tables and must play with what's available. Thanks
February 25, 2017 at 7:21 pm
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This thing is addressing problems that dont exist. Its solution-ism at its worst. We are dumbing down machines that are inherently superior. - Gilfoyle
February 25, 2017 at 9:07 pm
DECLARE @Ref VARCHAR(7) = '103-B00';
SELECT LEFT(@Ref,CHARINDEX('-',@Ref,1)-1);
February 26, 2017 at 4:24 pm
The problem is that isolation of the part of the column from your Sales table is going to need to be calculated for every row in the table. This isn't every row in the result set (because it isn't built yet) but every row in the table. As the number of rows goes up, so will the performance problems. This is only of the problems with having values concatenated into one column instead of keeping the values separate.
If you're stuck with the existing structure, can you add a column? Specifically, can you add a persisted computed column to hold the value of the foreign key? If so, you'll isolate the calculation of your value to when the row is written and save yourself from having to calculate it over the whole table every time the tables are joined. If not, then you'll save yourself some work in the future by using the approach pietlinden suggested. The work you save will be when the values to isolate goes above 999, when you'll have to switch to finding the position of the delimiter anyway.
The best way is to fix the table structure, but I understand if you're stuck with it.
March 22, 2017 at 3:31 am
This was removed by the editor as SPAM
March 22, 2017 at 4:51 am
CREATE TABLE #items ---- Build test tables, credit to the great Jeff Moden for this bit.
(
ItemID INT IDENTITY(101,1)
,Title VARCHAR(6)
)
ALTER TABLE #items
ADD CONSTRAINT PK_ItemID PRIMARY KEY CLUSTERED (ItemID) WITH FILLFACTOR = 90
GO
INSERT INTO #items
SELECT TOP 100 -- Create 100 unique items
Title = CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))
+ CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))
+ CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))
+ CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))
+ CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))
+ CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))
FROM sys.all_columns ac1
CROSS JOIN sys.all_columns ac2
GO
CREATE TABLE #sales
(
SaleID INT IDENTITY(1,1)
,Reference VARCHAR(20)
,Title VARCHAR(40)
)
ALTER TABLE #sales
ADD CONSTRAINT PK_SaleID PRIMARY KEY CLUSTERED (SaleID) WITH FILLFACTOR = 90
GO
INSERT INTO #sales
SELECT ---- Add 10000 sales for each item, 1000000 rows in total
CAST(i.itemID AS VARCHAR(6)) + '-' + CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))
+ CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A'))
+ CHAR(ABS(CHECKSUM(NEWID())) % (ASCII('Z')-ASCII('A')+1) +ASCII('A')) AS Reference
,i.Title
FROM #Items i
GO 10000
/*
The actual query starts here
*/
SELECT
i.ItemID
,i.Title
,der.CleanReference
,der.Reference
,der.Title
,der.SaleID
FROM #Items i
JOIN (
SELECT
CAST(LEFT(s.Reference,CHARINDEX('-',s.Reference)-1) AS INT) AS CleanReference
,s.Reference
,s.Title
,s.SaleID
FROM #Sales s
) der ON der.CleanReference = i.ItemID
DROP TABLE
#Items
,#sales
Does this do what you want? I can't help thinking it's too obvious and there's a glaring flaw though, The whole thing runs in about 22 seconds when the tables are built each time and in about 10 seconds when the tables aren't dropped. That's on a crappy test box so on something with a bit of oomph it might be pretty quick.
--- Edited to make the data more realistic
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
March 22, 2017 at 6:25 am
SELECT *
FROM Items i
INNER JOIN Sales s
ON s.Reference LIKE CAST(i.ItemID AS VARCHAR(50)) + '-%'
-- Retains SARGability of s.Reference
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 22, 2017 at 6:53 am
ChrisM@Work - Wednesday, March 22, 2017 6:25 AMSELECT *
FROM Items i
INNER JOIN Sales s
ON s.Reference LIKE CAST(i.ItemID AS VARCHAR(50)) + '-%'
-- Retains SARGability of s.Reference
I'm not sure what the advantages of that are Chris, I was under the impression that functions in JOINs was to be avoided. It's considerably slower than my version on my laptop against the same million row tables.
--Edited to correct a mistake in terminology.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
March 22, 2017 at 7:49 am
Neil Burton - Wednesday, March 22, 2017 6:53 AMChrisM@Work - Wednesday, March 22, 2017 6:25 AMSELECT *
FROM Items i
INNER JOIN Sales s
ON s.Reference LIKE CAST(i.ItemID AS VARCHAR(50)) + '-%'
-- Retains SARGability of s.Reference
I'm not sure what the advantages of that are Chris, I was under the impression that functions in WHERE clauses were to be avoided. It's considerably slower than my version on my laptop against the same million row tables.
Put an index on s.reference and restrict the output columns to i.ItemID and s.Reference.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 22, 2017 at 7:59 am
CREATE INDEX ix_Stuff ON #sales (Reference) INCLUDE (Title)
SELECT i.ItemID, s.Title
FROM #Items i
INNER JOIN #Sales s
ON s.Reference LIKE CAST(i.ItemID AS VARCHAR(50)) + '-%'
-- 1,000,000 / 00:00:04
Execution plan shows an index seek for #Sales.
Edit: "blackholing" the output to eliminate return & display of the 1000000-row result set.
SET
STATISTICS TIME ON
DECLARE @ItemID INT, @Title VARCHAR(40)
SELECT @ItemID = i.ItemID, @Title = s.Title
FROM #Items i
INNER JOIN #Sales s
ON s.Reference LIKE CAST(i.ItemID AS VARCHAR(50)) + '-%'
SET STATISTICS TIME OFF
-- (1 row(s) affected)
-- SQL Server Execution Times:
-- CPU time = 1214 ms, elapsed time = 402 ms.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 22, 2017 at 8:33 am
I see. It's logical that a function on the right of an ON (which is what I meant when I said WHERE clause above) won't affect SARGability.
It certainly speeds things up having the index in place. Over the million rows my version still seems to edge it though. When there is a WHERE clause in place mine is definitely faster.SELECT
i.ItemID
,der.Reference
FROM #Items i
JOIN (
SELECT
CAST(LEFT(s.Reference,CHARINDEX('-',s.Reference)-1) AS INT) AS CleanReference
,s.Reference
FROM #Sales s
) der ON der.CleanReference = i.ItemID
WHERE der.Reference LIKE '10%'
SELECT
i.itemid
,s.reference
FROM #Items i
INNER JOIN #Sales s ON s.Reference LIKE CAST(i.ItemID AS VARCHAR(50)) + '-%'
WHERE s.Reference LIKE '10%'
Out of those two queries mine is consistently much quicker.
How to post a question to get the most help http://www.sqlservercentral.com/articles/Best+Practices/61537
March 22, 2017 at 9:02 am
Neil Burton - Wednesday, March 22, 2017 8:33 AMI see. It's logical that a function on the right of an ON (which is what I meant when I said WHERE clause above) won't affect SARGability.It certainly speeds things up having the index in place. Over the million rows my version still seems to edge it though. When there is a WHERE clause in place mine is definitely faster.
SELECT
i.ItemID
,der.Reference
FROM #Items i
JOIN (
SELECT
CAST(LEFT(s.Reference,CHARINDEX('-',s.Reference)-1) AS INT) AS CleanReference
,s.Reference
FROM #Sales s) der ON der.CleanReference = i.ItemID
WHERE der.Reference LIKE '10%'SELECT
i.itemid
,s.reference
FROM #Items iINNER JOIN #Sales s ON s.Reference LIKE CAST(i.ItemID AS VARCHAR(50)) + '-%'
WHERE s.Reference LIKE '10%'Out of those two queries mine is consistently much quicker.
It's quicker because a hash match will process 1000000 rows faster than 1000000 individual seeks. LIKE won't play with a hash join.
If you change the row restriction to a filter on #Item (quite likely in the wild) you see the nested loops join (with seeks) overtakes the blunt-force hash join, by a significant amount:
--Run each batch individually several times
DECLARE@ItemID INT, @Reference VARCHAR(50),@Title VARCHAR(50) = (SELECT TOP 1Title FROM #Items)
SET STATISTICS TIME ON
SELECT
@ItemID = i.ItemID
,@Reference= der.Reference
FROM#Items i
JOIN (
SELECT
CAST(LEFT(s.Reference,CHARINDEX('-',s.Reference)-1) AS INT) ASCleanReference
,s.Reference
FROM#Sales s
)der ON der.CleanReference = i.ItemID
WHERE i.Title= @Title
SET STATISTICS TIME OFF
--CPU time = 672 ms, elapsed time = 134ms.
GO
DECLARE@ItemID INT, @Reference VARCHAR(50),@Title VARCHAR(50) = (SELECT TOP 1Title FROM #Items)
SET STATISTICS TIME ON
SELECT
@ItemID = i.itemid
,@Reference= s.reference
FROM#Items i
INNER JOIN#Sales s
ON s.ReferenceLIKE CAST(i.ItemIDAS VARCHAR(50)) + '-%'
WHERE i.Title= @Title
SET STATISTICS TIME OFF
--CPU time = 16 ms, elapsed time = 9 ms.
GO
DECLARE@ItemID INT, @Reference VARCHAR(50),@Title VARCHAR(50) = (SELECT TOP 1Title FROM #Items)
SET STATISTICS TIME ON
SELECT
@ItemID = i.itemid
,@Reference= s.reference
FROM#Items i
INNER JOIN#Sales s
ON s.ReferenceBETWEEN CAST(i.ItemIDAS VARCHAR(50)) AND CAST(i.ItemIDAS VARCHAR(50)) + '¬'
WHERE i.Title= @Title
SET STATISTICS TIME OFF
--CPU time = 16 ms, elapsed time = 5 ms.
GO
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply