September 22, 2016 at 7:25 am
Jason A. Long (9/22/2016)
One issue that I would take with Sergiy's suggestion is that it can't be used in an iTVF. The resulting function will either be a scalar function or mTVF.
A way to get around this, might be to alter/recreate the iTVF I posted on a regular basis, maybe through a trigger in the table. I haven't tried it but it's an idea.
September 22, 2016 at 7:33 am
Very slick, I like it. I was using the priority as a group priority. the random words are group 1, the manufacture was group 2 and the colors are group 3. if I was to use it in its current form I would just multiple the priority by 1000 or something.
I wrote a user UI to keep me out of the maintenance business. and I WAS going to use the dynamically writen function, but this solution its far better.
Thanks for everyone's help!!!
September 22, 2016 at 8:07 am
Sergiy's suggestion looks good but does not give provide for order of extracted words...
so 'IPHONE 64GB ROSE 6S SGL' will not be an exact match for 'IPHONE 6S ROSE 64GB SGL'
DECLARE @ReplacementCodes TABLE (
priority INT NOT NULL,
Code VARCHAR(50)
)
INSERT INTO @ReplacementCodes ( priority, Code )
VALUES
(1, ' ROSE'), (2, ' KIT'),(3, ' SGL'),(4, ' SILVER'), (5, ' BLK'), (6, ' GRAY'),(7, ' PLUS'), (8, ' BK')
DECLARE @Model VARCHAR(200)
SET @Model = 'IPHONE 64GB ROSE 6S SGL'
SELECT @Model = REPLACE(@Model, code, '')
FROM @ReplacementCodes rc
ORDER BY rc.priority
SELECT @Model
DECLARE @Model2 VARCHAR(200)
SET @Model2 = 'IPHONE 6S ROSE 64GB SGL'
SELECT @Model2 = REPLACE(@Model2, code, '')
FROM @ReplacementCodes rc
ORDER BY rc.priority
SELECT @Model2
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 22, 2016 at 8:13 am
I was curious on my previous idea, which seems to be working fine. Although, with more time, I would improve the code.
CREATE TABLE ReplacementCodes(
type_priority int,
ind_priority int,
code varchar(50),
Replacement varchar(50)
);
GO
CREATE TRIGGER TX_ReplacementCodes ON dbo.ReplacementCodes
FOR INSERT, UPDATE, DELETE
AS
DECLARE @FunctionSQL nvarchar(max) = '@ModName';
SELECT @FunctionSQL = CHAR(10) + 'REPLACE(' + @FunctionSQL
+ CHAR(10) + ',' + QUOTENAME( code, '''') + ','''')'
FROM ReplacementCodes
ORDER BY type_priority , ind_priority ;
SET @FunctionSQL = 'CREATE FUNCTION [dbo].[itvf_Model_Name_Compare]
(
@ModName VARCHAR(100)
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
SELECT LTRIM(RTRIM(
REPLACE(REPLACE(' + @FunctionSQL + ','' '','' ''),'' '','' ''))) AS StrippedModelName;';
IF OBJECT_ID( 'itvf_Model_Name_Compare') IS NOT NULL
DROP FUNCTION itvf_Model_Name_Compare;
EXEC( @FunctionSQL);
GO
INSERT INTO ReplacementCodes
VALUES
--special words
( 1, 1 , ' SNGL','')
,( 1, 2 , ' KIT','')
,( 1, 3 , ' KT','')
,( 1, 4 , ' KI','')
,( 1, 5 , ' SGL','')
,( 1, 6 , ' XSGL','')
,( 1, 7 , ' SNG','')
,( 1, 8 , ' XCVR','')
,( 1, 9 , ' HOTSPOT','')
,( 1, 10, ' HTSPT','')
,( 1, 11, ' TRANSCEIVER','')
,( 1, 12, ' HANDSET','')
,( 1, 13, ' SG','')
,( 1, 14, ' DEVICE','')
,( 1, 15, ' DVC','')
,( 1, 16, ' LTE','')
,( 1, 17, ' TABLET','')
,( 1, 18, 'CPO ','')
--manufactors
,( 2, 1 , 'IPHONE ','')
,( 2, 2 , 'HTC ','')
,( 2, 3 , 'LG ','')
,( 2, 4 , 'IPH ','')
,( 2, 5 , 'SAM ','')
,( 2, 6 , 'SAMSUNG ','')
,( 2, 7 , 'PALM ','')
,( 2, 8 , 'SANYO ','')
,( 2, 9 , 'NETGEAR ','')
,( 2, 10, 'KYOCERA ','')
,( 2, 11, 'KYO ','')
,( 2, 12, 'FRANKLIN ','')
,( 2, 13, 'ALCATEL ','')
,( 2, 14, 'MOTO ','')
,( 2, 15, 'ZTE ','')
----colors last
,( 3, 1 , 'WHITE ','')
,( 3, 2 , ' WHITE','')
,( 3, 3 , 'WHT ','')
,( 3, 4 , ' WHT','')
,( 3, 5 , 'WH ','')
,( 3, 6 , ' WH','')
,( 3, 7 , 'PINK ','')
,( 3, 8 , 'GOLD ','')
,( 3, 9 , 'GOLD ','')
,( 3, 10, ' GOLD','')
,( 3, 11, 'GLD ','')
,( 3, 12, ' GLD','')
,( 3, 13, ' GD','')
,( 3, 14, 'BLK ','')
,( 3, 15, ' BLK','')
,( 3, 16, 'BLACK ','')
,( 3, 17, ' BLACK','')
,( 3, 18, ' BK',' ')
,( 3, 19, ' BK ',' ')
,( 3, 20, 'GREEN ','')
,( 3, 21, ' GREEN','')
,( 3, 22, 'GRAY ','')
,( 3, 23, ' GRAY','')
,( 3, 24, 'GRY ','')
,( 3, 25, ' GRY','')
,( 3, 26, 'SILVER ','')
,( 3, 27, ' SILVER','')
,( 3, 28, 'SILV ','')
,( 3, 29, ' SILV','')
,( 3, 30, 'SLV ','')
,( 3, 31, ' SLV','')
,( 3, 32, ' SV','')
,( 3, 33, 'ROSE ','')
,( 3, 34, ' ROSE','')
,( 3, 35, 'BLUE ','')
,( 3, 36, ' BLUE','')
,( 3, 37, 'BLU ','')
,( 3, 38, ' BLU','')
,( 3, 39, 'RED ','')
,( 3, 40, ' RED','')
,( 3, 41, 'NAVY ','')
,( 3, 42, ' NAVY','');
GO
CREATE TABLE Table1 (MODEL_NAME varchar(100), LOAN_MODEL_NAME varchar(100))
INSERT INTO Table1
SELECT 'IPHONE 6S ROSE 64GB SGL ' , 'IPHONE 6S ROSE 64GB KIT ' UNION ALL
SELECT 'IPHONE 6S ROSE 64GB SGL ' , 'IPHONE 6S ROSE 16GB KIT ' UNION ALL
SELECT 'SAMSUNG L900 SILVER ' , 'SAMSUNG N930P BLK KIT ' UNION ALL
SELECT 'SAM G935 32GB BK XCVR SGL' , 'SAMSUNG G935P 32GB BK KIT ' UNION ALL
SELECT 'IPH 6S PLUS GRAY 64GB SGL' , 'IPHONE 6S PLUS GRAY 64GB KIT'
SELECT *,
CASE WHEN mn.StrippedModelName = lmn.StrippedModelName THEN 'Yes'
ELSE 'Needs manual review of model match' END
FROM Table1 t1
CROSS APPLY dbo.[itvf_Model_Name_Compare](MODEL_NAME) mn
CROSS APPLY dbo.[itvf_Model_Name_Compare](LOAN_MODEL_NAME) lmn;
GO
DROP TABLE Table1;
GO
DROP TABLE ReplacementCodes;
September 22, 2016 at 8:34 am
I am just being curious.....
OP said
....
I have 2 tables and I am attempting to match models. (they are described differently in both tables).
IPHONE 6S ROSE 64GB SGL vs IPHONE 6S ROSE 64GB KIT
IPHONE 6S ROSE 64GB SGL vs IPHONE 6S ROSE 16GB KIT
SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT
SAM G935 32GB BK XCVR SGL vs SAMSUNG G935P 32GB BK KIT
IPH 6S PLUS GRAY 64GB SGL vs IPHONE 6S PLUS GRAY 64GB KIT
.....
so with two separate tables how did it get determined to compare 'SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT ' ?
or is the OP intending to compare all rows in primary table with all rows in secondary table
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
September 22, 2016 at 8:45 am
J Livingston SQL (9/22/2016)
I am just being curious.....OP said
....
I have 2 tables and I am attempting to match models. (they are described differently in both tables).
IPHONE 6S ROSE 64GB SGL vs IPHONE 6S ROSE 64GB KIT
IPHONE 6S ROSE 64GB SGL vs IPHONE 6S ROSE 16GB KIT
SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT
SAM G935 32GB BK XCVR SGL vs SAMSUNG G935P 32GB BK KIT
IPH 6S PLUS GRAY 64GB SGL vs IPHONE 6S PLUS GRAY 64GB KIT
.....
so with two separate tables how did it get determined to compare 'SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT ' ?
or is the OP intending to compare all rows in primary table with all rows in secondary table
I'm guessing that the tables are related by other column. Probably a catalog table with the transaction table, or a contract table with a table storing replacement phones that are given during services.
September 22, 2016 at 8:45 am
I like Luis' approach better... The one thing I would add is code to drop the existing function if it exists...
...
SET @FunctionSQL = 'IF OBJECT_ID(''[dbo].[itvf_Model_Name_Compare]'', ''IF'') IS NOT NULL
DROP FUNCTION [dbo].[itvf_Model_Name_Compare];
CREATE FUNCTION [dbo].[itvf_Model_Name_Compare]
...'
Never mind... He had it in there... I just wasn't where I 1st expected to see it... Mea culpa...
September 22, 2016 at 8:56 am
Jason A. Long (9/22/2016)
I like Luis' approach better... The one thing I would add is code to drop the existing function if it exists...
...
SET @FunctionSQL = 'IF OBJECT_ID(''[dbo].[itvf_Model_Name_Compare]'', ''IF'') IS NOT NULL
DROP FUNCTION [dbo].[itvf_Model_Name_Compare];
CREATE FUNCTION [dbo].[itvf_Model_Name_Compare]
...'
Never mind... He had it in there... I just wasn't where I 1st expected to see it... Mea culpa...
The reason for the location is that the function should be the only statement in the batch. 😉
September 22, 2016 at 9:13 am
Luis Cazares (9/22/2016)
Jason A. Long (9/22/2016)
I like Luis' approach better... The one thing I would add is code to drop the existing function if it exists...
...
SET @FunctionSQL = 'IF OBJECT_ID(''[dbo].[itvf_Model_Name_Compare]'', ''IF'') IS NOT NULL
DROP FUNCTION [dbo].[itvf_Model_Name_Compare];
CREATE FUNCTION [dbo].[itvf_Model_Name_Compare]
...'
Never mind... He had it in there... I just wasn't where I 1st expected to see it... Mea culpa...
The reason for the location is that the function should be the only statement in the batch. 😉
Yea... The way you did it makes perfect sense... The fact that you can't use the GO batch separator with dynamic SQL means it can't be combined...
It's one of those things that doesn't hit you (me) until you hit the post button and look at what you (I) just posted... :Whistling:
September 22, 2016 at 9:33 am
You asked "so with two separate tables how did it get determined to compare 'SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT ' ?
or is the OP intending to compare all rows in primary table with all rows in secondary table..."
I just need to produce "Needs Model Review" if I can't get a match. I am joining the 2 tables that have some matching criteria and both of them have the model name in each table and model description is different. The idea was to take what I called non relevant words and strip them out and compare what is left. Upon doing that I noticed that have to take out the words in specific groups to make the next group work and so forth.
Currently if you try to match 'SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT ', this comes out (below)
random words (group 1)
'SAMSUNG L900 SILVER' changes to 'SAMSUNG L900 SILVER' --no change
'SAMSUNG N930P BLK KIT ' changes to 'SAMSUNG N930P BLK ' -- KIT removed
now we have 'SAMSUNG L900 SILVER' vs 'SAMSUNG N930P BLK '
manufacture's (group 2)
'SAMSUNG L900 SILVER' changes to 'L900 SILVER' --SAMSUNG removed
'SAMSUNG N930P BLK ' changes to 'N930P BLK ' -- SAMSUNG removed
now we have 'L900 SILVER' vs 'N930P BLK '
colors (group 3)
'L900 SILVER' changes to 'L900' --SAMSUNG removed
'N930P BLK ' changes to 'N930P' -- SAMSUNG removed
now we have 'L900' vs 'N930P'
And this is of course is a no match.
However if you started with "IPhone 6S 128GB SNGL" vs "IPH 6S 128GB KIT", you would get "6S 128GB" vs "6S 128GB" which would match
This works very well. out of 100,000 records sampled I am getting 89% match, 10.68% don't match (model is different), and .32% (this is mostly spelling errors in the data or something I have yet to put in the table). Using the table new driven function with the same 100,000 records produced a 22% increase of speed.
For my test I created 2 tables. TableA had same data as TableB. Both table have 2 model descriptions. TableA I used my original function, TableB I used the new functions. the new function is faster. I used 2 tables because I did not want caching to be a factor.
September 22, 2016 at 5:12 pm
ghughes (9/22/2016)
I just need to produce "Needs Model Review" if I can't get a match. I am joining the 2 tables that have some matching criteria and both of them have the model name in each table and model description is different. The idea was to take what I called non relevant words and strip them out and compare what is left. Upon doing that I noticed that have to take out the words in specific groups to make the next group work and so forth.
I'd suggest you to create a separate table to store only unique model descriptions from those 2 100k rows table, along with "group 1", "group 2" and "group 3" cleaned-up model names - values in them are generated by the function from the full model description.
Don't forget to index every column.
This table may be auto-populated by a background job or a trigger.
I would not bother deleting records from there, it should not grow big.
Then you can simply join Table1 and Table2 records to 2 instances of this "Parsed Names" table by exact match of Product Description.
And here you can see if those 2 instances have a match on any level of "cleaned names".
Something like this:
SELECT T1.Customer, T1.ModelDescription CustomerPhone,
T2.ModelDescription AvailableReplacement,
CASE
WHEN T1.ModelDescription = T2.ModelDescription THEN 0
WHEN M2.Group1Name = M1.Group1Name THEN 1
WHEN M2.Group2Name = M1.Group2Name THEN 2
ELSE 3 END [Matching Level]
FROM CustomerPhone T1
INNER JOIN ModelNames M1 ON M1.ModelDescription = T1.ModelDescription
INNER JOIN ModelNames M2 ON M2.Group3Name = M1.Group3Name
INNER JOIN PhonesInStock T2 ON T2.ModelDescription = M2.ModelDescription
ORDER BY [Matching Level]
_____________
Code for TallyGenerator
September 22, 2016 at 5:56 pm
My approach would be to not try and use REPLACE in this way because I imagine that this list of models is constantly changing.
Instead create your own key for models (if you can't just use the actual model number) and map the descriptions to it. Then you can easily join on a key instead of a description.
September 23, 2016 at 8:08 am
Sergiy you are right again! The data changes, but not so much that I cant run a process every night/week and create a table of distinct full descriptions, and then create a stripped down version from the function. Then when I run my normal process it will be a table join instead of the scalar function on every row. Indexed properly it will contribute a lot on the speed.
I used a sample is 100,000 of raw data. My production is a little bit more that. If I can squeeze my job in the schedule, it should help on processing during the day also. A win.
Thanks again. This one has been actually a lot of fun. 🙂
September 23, 2016 at 9:30 am
One more alternative (I think) to throw in the mix. Each phone model string is split into individual "words", which are then filtered out by an existence test against the #Badwords table, then concatenated by together with the FOR XML technique.
IF OBJECT_ID('tempdb..#PhoneModels', 'U') IS NOT NULL
D ROP TABLE #PhoneModels;
SELECT ROW_NUMBER() over(order by (select null)) as RowID,
x.Model
INTO #PhoneModels
FROM ( VALUES
('IPHONE 6S ROSE 64GB SGL'), ('IPHONE 6S ROSE 64GB KIT'),
('IPHONE 6S ROSE 64GB SGL'), ('IPHONE 6S ROSE 16GB KIT'),
('SAMSUNG L900 SILVER'), ('SAMSUNG N930P BLK KIT'),
('SAM G935 32GB BK XCVR SGL'), ('SAMSUNG G935P 32GB BK KIT'),
('IPH 6S PLUS GRAY 64GB SGL'), ('IPHONE 6S PLUS GRAY 64GB KIT')
) x (Model);
create unique clustered index PK#PhoneModels on #PhoneModels(Rowid)
IF OBJECT_ID('tempdb..#BadWords', 'U') IS NOT NULL
D ROP TABLE #BadWords;
SELECT
x.Word
INTO #BadWords
FROM ( VALUES
('ROSE'), ('KIT'),('SGL'),('SILVER'), ('BLK'), ('GRAY'),
('PLUS'), ('BK')
) x (Word);
create unique index PK#Badwords on #Badwords(Word);
-- SOLUTION
set statistics time, io on;
select RowID,stuff((SELECT ' ' + ds.item
FROM dbo.DelimitedSplit8K(p.Model, space(1)) ds
WHERE not exists (select 1 from #BadWords b where b.word = ds.Item)
ORDER BY ItemNumber
FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
,1,1,'') as [Concatenated]
from #PhoneModels P
order by RowID
set statistics time, io off;
This gives a "cleaned" value to be used for further comparison.
Edited to add: I just read Sergiy's answer from much earlier and now understand the various potential combinations. Although it would be more complicated, the above code could be extended to produce a level1 string, a level2 string, etc. This would avoid having to maintain the the tables Sergiy suggested, but I think Sergiy has the right idea.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
September 24, 2016 at 7:56 am
ghughes (9/23/2016)
Sergiy you are right again! The data changes, but not so much that I cant run a process every night/week and create a table of distinct full descriptions, and then create a stripped down version from the function. Then when I run my normal process it will be a table join instead of the scalar function on every row. Indexed properly it will contribute a lot on the speed.I used a sample is 100,000 of raw data. My production is a little bit more that. If I can squeeze my job in the schedule, it should help on processing during the day also. A win.
Thanks again. This one has been actually a lot of fun. 🙂
You're very welcome.
🙂
Remember - data normalisation is your best friend in the world of relational databases.
😎
_____________
Code for TallyGenerator
Viewing 15 posts - 16 through 29 (of 29 total)
You must be logged in to reply to this topic. Login to reply