Is there a better faster way then REPLACE in this example?

  • Jason A. Long (9/22/2016)


    One issue that I would take with Sergiy's suggestion is that it can't be used in an iTVF. The resulting function will either be a scalar function or mTVF.

    A way to get around this, might be to alter/recreate the iTVF I posted on a regular basis, maybe through a trigger in the table. I haven't tried it but it's an idea.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Very slick, I like it. I was using the priority as a group priority. the random words are group 1, the manufacture was group 2 and the colors are group 3. if I was to use it in its current form I would just multiple the priority by 1000 or something.

    I wrote a user UI to keep me out of the maintenance business. and I WAS going to use the dynamically writen function, but this solution its far better.

    Thanks for everyone's help!!!

  • Sergiy's suggestion looks good but does not give provide for order of extracted words...

    so 'IPHONE 64GB ROSE 6S SGL' will not be an exact match for 'IPHONE 6S ROSE 64GB SGL'

    DECLARE @ReplacementCodes TABLE (

    priority INT NOT NULL,

    Code VARCHAR(50)

    )

    INSERT INTO @ReplacementCodes ( priority, Code )

    VALUES

    (1, ' ROSE'), (2, ' KIT'),(3, ' SGL'),(4, ' SILVER'), (5, ' BLK'), (6, ' GRAY'),(7, ' PLUS'), (8, ' BK')

    DECLARE @Model VARCHAR(200)

    SET @Model = 'IPHONE 64GB ROSE 6S SGL'

    SELECT @Model = REPLACE(@Model, code, '')

    FROM @ReplacementCodes rc

    ORDER BY rc.priority

    SELECT @Model

    DECLARE @Model2 VARCHAR(200)

    SET @Model2 = 'IPHONE 6S ROSE 64GB SGL'

    SELECT @Model2 = REPLACE(@Model2, code, '')

    FROM @ReplacementCodes rc

    ORDER BY rc.priority

    SELECT @Model2

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • I was curious on my previous idea, which seems to be working fine. Although, with more time, I would improve the code.

    CREATE TABLE ReplacementCodes(

    type_priority int,

    ind_priority int,

    code varchar(50),

    Replacement varchar(50)

    );

    GO

    CREATE TRIGGER TX_ReplacementCodes ON dbo.ReplacementCodes

    FOR INSERT, UPDATE, DELETE

    AS

    DECLARE @FunctionSQL nvarchar(max) = '@ModName';

    SELECT @FunctionSQL = CHAR(10) + 'REPLACE(' + @FunctionSQL

    + CHAR(10) + ',' + QUOTENAME( code, '''') + ','''')'

    FROM ReplacementCodes

    ORDER BY type_priority , ind_priority ;

    SET @FunctionSQL = 'CREATE FUNCTION [dbo].[itvf_Model_Name_Compare]

    (

    @ModName VARCHAR(100)

    )

    RETURNS TABLE WITH SCHEMABINDING

    AS

    RETURN

    SELECT LTRIM(RTRIM(

    REPLACE(REPLACE(' + @FunctionSQL + ','' '','' ''),'' '','' ''))) AS StrippedModelName;';

    IF OBJECT_ID( 'itvf_Model_Name_Compare') IS NOT NULL

    DROP FUNCTION itvf_Model_Name_Compare;

    EXEC( @FunctionSQL);

    GO

    INSERT INTO ReplacementCodes

    VALUES

    --special words

    ( 1, 1 , ' SNGL','')

    ,( 1, 2 , ' KIT','')

    ,( 1, 3 , ' KT','')

    ,( 1, 4 , ' KI','')

    ,( 1, 5 , ' SGL','')

    ,( 1, 6 , ' XSGL','')

    ,( 1, 7 , ' SNG','')

    ,( 1, 8 , ' XCVR','')

    ,( 1, 9 , ' HOTSPOT','')

    ,( 1, 10, ' HTSPT','')

    ,( 1, 11, ' TRANSCEIVER','')

    ,( 1, 12, ' HANDSET','')

    ,( 1, 13, ' SG','')

    ,( 1, 14, ' DEVICE','')

    ,( 1, 15, ' DVC','')

    ,( 1, 16, ' LTE','')

    ,( 1, 17, ' TABLET','')

    ,( 1, 18, 'CPO ','')

    --manufactors

    ,( 2, 1 , 'IPHONE ','')

    ,( 2, 2 , 'HTC ','')

    ,( 2, 3 , 'LG ','')

    ,( 2, 4 , 'IPH ','')

    ,( 2, 5 , 'SAM ','')

    ,( 2, 6 , 'SAMSUNG ','')

    ,( 2, 7 , 'PALM ','')

    ,( 2, 8 , 'SANYO ','')

    ,( 2, 9 , 'NETGEAR ','')

    ,( 2, 10, 'KYOCERA ','')

    ,( 2, 11, 'KYO ','')

    ,( 2, 12, 'FRANKLIN ','')

    ,( 2, 13, 'ALCATEL ','')

    ,( 2, 14, 'MOTO ','')

    ,( 2, 15, 'ZTE ','')

    ----colors last

    ,( 3, 1 , 'WHITE ','')

    ,( 3, 2 , ' WHITE','')

    ,( 3, 3 , 'WHT ','')

    ,( 3, 4 , ' WHT','')

    ,( 3, 5 , 'WH ','')

    ,( 3, 6 , ' WH','')

    ,( 3, 7 , 'PINK ','')

    ,( 3, 8 , 'GOLD ','')

    ,( 3, 9 , 'GOLD ','')

    ,( 3, 10, ' GOLD','')

    ,( 3, 11, 'GLD ','')

    ,( 3, 12, ' GLD','')

    ,( 3, 13, ' GD','')

    ,( 3, 14, 'BLK ','')

    ,( 3, 15, ' BLK','')

    ,( 3, 16, 'BLACK ','')

    ,( 3, 17, ' BLACK','')

    ,( 3, 18, ' BK',' ')

    ,( 3, 19, ' BK ',' ')

    ,( 3, 20, 'GREEN ','')

    ,( 3, 21, ' GREEN','')

    ,( 3, 22, 'GRAY ','')

    ,( 3, 23, ' GRAY','')

    ,( 3, 24, 'GRY ','')

    ,( 3, 25, ' GRY','')

    ,( 3, 26, 'SILVER ','')

    ,( 3, 27, ' SILVER','')

    ,( 3, 28, 'SILV ','')

    ,( 3, 29, ' SILV','')

    ,( 3, 30, 'SLV ','')

    ,( 3, 31, ' SLV','')

    ,( 3, 32, ' SV','')

    ,( 3, 33, 'ROSE ','')

    ,( 3, 34, ' ROSE','')

    ,( 3, 35, 'BLUE ','')

    ,( 3, 36, ' BLUE','')

    ,( 3, 37, 'BLU ','')

    ,( 3, 38, ' BLU','')

    ,( 3, 39, 'RED ','')

    ,( 3, 40, ' RED','')

    ,( 3, 41, 'NAVY ','')

    ,( 3, 42, ' NAVY','');

    GO

    CREATE TABLE Table1 (MODEL_NAME varchar(100), LOAN_MODEL_NAME varchar(100))

    INSERT INTO Table1

    SELECT 'IPHONE 6S ROSE 64GB SGL ' , 'IPHONE 6S ROSE 64GB KIT ' UNION ALL

    SELECT 'IPHONE 6S ROSE 64GB SGL ' , 'IPHONE 6S ROSE 16GB KIT ' UNION ALL

    SELECT 'SAMSUNG L900 SILVER ' , 'SAMSUNG N930P BLK KIT ' UNION ALL

    SELECT 'SAM G935 32GB BK XCVR SGL' , 'SAMSUNG G935P 32GB BK KIT ' UNION ALL

    SELECT 'IPH 6S PLUS GRAY 64GB SGL' , 'IPHONE 6S PLUS GRAY 64GB KIT'

    SELECT *,

    CASE WHEN mn.StrippedModelName = lmn.StrippedModelName THEN 'Yes'

    ELSE 'Needs manual review of model match' END

    FROM Table1 t1

    CROSS APPLY dbo.[itvf_Model_Name_Compare](MODEL_NAME) mn

    CROSS APPLY dbo.[itvf_Model_Name_Compare](LOAN_MODEL_NAME) lmn;

    GO

    DROP TABLE Table1;

    GO

    DROP TABLE ReplacementCodes;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I am just being curious.....

    OP said

    ....

    I have 2 tables and I am attempting to match models. (they are described differently in both tables).

    IPHONE 6S ROSE 64GB SGL vs IPHONE 6S ROSE 64GB KIT

    IPHONE 6S ROSE 64GB SGL vs IPHONE 6S ROSE 16GB KIT

    SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT

    SAM G935 32GB BK XCVR SGL vs SAMSUNG G935P 32GB BK KIT

    IPH 6S PLUS GRAY 64GB SGL vs IPHONE 6S PLUS GRAY 64GB KIT

    .....

    so with two separate tables how did it get determined to compare 'SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT ' ?

    or is the OP intending to compare all rows in primary table with all rows in secondary table

    ________________________________________________________________
    you can lead a user to data....but you cannot make them think
    and remember....every day is a school day

  • J Livingston SQL (9/22/2016)


    I am just being curious.....

    OP said

    ....

    I have 2 tables and I am attempting to match models. (they are described differently in both tables).

    IPHONE 6S ROSE 64GB SGL vs IPHONE 6S ROSE 64GB KIT

    IPHONE 6S ROSE 64GB SGL vs IPHONE 6S ROSE 16GB KIT

    SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT

    SAM G935 32GB BK XCVR SGL vs SAMSUNG G935P 32GB BK KIT

    IPH 6S PLUS GRAY 64GB SGL vs IPHONE 6S PLUS GRAY 64GB KIT

    .....

    so with two separate tables how did it get determined to compare 'SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT ' ?

    or is the OP intending to compare all rows in primary table with all rows in secondary table

    I'm guessing that the tables are related by other column. Probably a catalog table with the transaction table, or a contract table with a table storing replacement phones that are given during services.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • I like Luis' approach better... The one thing I would add is code to drop the existing function if it exists...

    ...

    SET @FunctionSQL = 'IF OBJECT_ID(''[dbo].[itvf_Model_Name_Compare]'', ''IF'') IS NOT NULL

    DROP FUNCTION [dbo].[itvf_Model_Name_Compare];

    CREATE FUNCTION [dbo].[itvf_Model_Name_Compare]

    ...'

    Never mind... He had it in there... I just wasn't where I 1st expected to see it... Mea culpa...

  • Jason A. Long (9/22/2016)


    I like Luis' approach better... The one thing I would add is code to drop the existing function if it exists...

    ...

    SET @FunctionSQL = 'IF OBJECT_ID(''[dbo].[itvf_Model_Name_Compare]'', ''IF'') IS NOT NULL

    DROP FUNCTION [dbo].[itvf_Model_Name_Compare];

    CREATE FUNCTION [dbo].[itvf_Model_Name_Compare]

    ...'

    Never mind... He had it in there... I just wasn't where I 1st expected to see it... Mea culpa...

    The reason for the location is that the function should be the only statement in the batch. 😉

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (9/22/2016)


    Jason A. Long (9/22/2016)


    I like Luis' approach better... The one thing I would add is code to drop the existing function if it exists...

    ...

    SET @FunctionSQL = 'IF OBJECT_ID(''[dbo].[itvf_Model_Name_Compare]'', ''IF'') IS NOT NULL

    DROP FUNCTION [dbo].[itvf_Model_Name_Compare];

    CREATE FUNCTION [dbo].[itvf_Model_Name_Compare]

    ...'

    Never mind... He had it in there... I just wasn't where I 1st expected to see it... Mea culpa...

    The reason for the location is that the function should be the only statement in the batch. 😉

    Yea... The way you did it makes perfect sense... The fact that you can't use the GO batch separator with dynamic SQL means it can't be combined...

    It's one of those things that doesn't hit you (me) until you hit the post button and look at what you (I) just posted... :Whistling:

  • You asked "so with two separate tables how did it get determined to compare 'SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT ' ?

    or is the OP intending to compare all rows in primary table with all rows in secondary table..."

    I just need to produce "Needs Model Review" if I can't get a match. I am joining the 2 tables that have some matching criteria and both of them have the model name in each table and model description is different. The idea was to take what I called non relevant words and strip them out and compare what is left. Upon doing that I noticed that have to take out the words in specific groups to make the next group work and so forth.

    Currently if you try to match 'SAMSUNG L900 SILVER vs SAMSUNG N930P BLK KIT ', this comes out (below)

    random words (group 1)

    'SAMSUNG L900 SILVER' changes to 'SAMSUNG L900 SILVER' --no change

    'SAMSUNG N930P BLK KIT ' changes to 'SAMSUNG N930P BLK ' -- KIT removed

    now we have 'SAMSUNG L900 SILVER' vs 'SAMSUNG N930P BLK '

    manufacture's (group 2)

    'SAMSUNG L900 SILVER' changes to 'L900 SILVER' --SAMSUNG removed

    'SAMSUNG N930P BLK ' changes to 'N930P BLK ' -- SAMSUNG removed

    now we have 'L900 SILVER' vs 'N930P BLK '

    colors (group 3)

    'L900 SILVER' changes to 'L900' --SAMSUNG removed

    'N930P BLK ' changes to 'N930P' -- SAMSUNG removed

    now we have 'L900' vs 'N930P'

    And this is of course is a no match.

    However if you started with "IPhone 6S 128GB SNGL" vs "IPH 6S 128GB KIT", you would get "6S 128GB" vs "6S 128GB" which would match

    This works very well. out of 100,000 records sampled I am getting 89% match, 10.68% don't match (model is different), and .32% (this is mostly spelling errors in the data or something I have yet to put in the table). Using the table new driven function with the same 100,000 records produced a 22% increase of speed.

    For my test I created 2 tables. TableA had same data as TableB. Both table have 2 model descriptions. TableA I used my original function, TableB I used the new functions. the new function is faster. I used 2 tables because I did not want caching to be a factor.

  • ghughes (9/22/2016)


    I just need to produce "Needs Model Review" if I can't get a match. I am joining the 2 tables that have some matching criteria and both of them have the model name in each table and model description is different. The idea was to take what I called non relevant words and strip them out and compare what is left. Upon doing that I noticed that have to take out the words in specific groups to make the next group work and so forth.

    I'd suggest you to create a separate table to store only unique model descriptions from those 2 100k rows table, along with "group 1", "group 2" and "group 3" cleaned-up model names - values in them are generated by the function from the full model description.

    Don't forget to index every column.

    This table may be auto-populated by a background job or a trigger.

    I would not bother deleting records from there, it should not grow big.

    Then you can simply join Table1 and Table2 records to 2 instances of this "Parsed Names" table by exact match of Product Description.

    And here you can see if those 2 instances have a match on any level of "cleaned names".

    Something like this:

    SELECT T1.Customer, T1.ModelDescription CustomerPhone,

    T2.ModelDescription AvailableReplacement,

    CASE

    WHEN T1.ModelDescription = T2.ModelDescription THEN 0

    WHEN M2.Group1Name = M1.Group1Name THEN 1

    WHEN M2.Group2Name = M1.Group2Name THEN 2

    ELSE 3 END [Matching Level]

    FROM CustomerPhone T1

    INNER JOIN ModelNames M1 ON M1.ModelDescription = T1.ModelDescription

    INNER JOIN ModelNames M2 ON M2.Group3Name = M1.Group3Name

    INNER JOIN PhonesInStock T2 ON T2.ModelDescription = M2.ModelDescription

    ORDER BY [Matching Level]

    _____________
    Code for TallyGenerator

  • My approach would be to not try and use REPLACE in this way because I imagine that this list of models is constantly changing.

    Instead create your own key for models (if you can't just use the actual model number) and map the descriptions to it. Then you can easily join on a key instead of a description.

  • Sergiy you are right again! The data changes, but not so much that I cant run a process every night/week and create a table of distinct full descriptions, and then create a stripped down version from the function. Then when I run my normal process it will be a table join instead of the scalar function on every row. Indexed properly it will contribute a lot on the speed.

    I used a sample is 100,000 of raw data. My production is a little bit more that. If I can squeeze my job in the schedule, it should help on processing during the day also. A win.

    Thanks again. This one has been actually a lot of fun. 🙂

  • One more alternative (I think) to throw in the mix. Each phone model string is split into individual "words", which are then filtered out by an existence test against the #Badwords table, then concatenated by together with the FOR XML technique.

    IF OBJECT_ID('tempdb..#PhoneModels', 'U') IS NOT NULL

    D ROP TABLE #PhoneModels;

    SELECT ROW_NUMBER() over(order by (select null)) as RowID,

    x.Model

    INTO #PhoneModels

    FROM ( VALUES

    ('IPHONE 6S ROSE 64GB SGL'), ('IPHONE 6S ROSE 64GB KIT'),

    ('IPHONE 6S ROSE 64GB SGL'), ('IPHONE 6S ROSE 16GB KIT'),

    ('SAMSUNG L900 SILVER'), ('SAMSUNG N930P BLK KIT'),

    ('SAM G935 32GB BK XCVR SGL'), ('SAMSUNG G935P 32GB BK KIT'),

    ('IPH 6S PLUS GRAY 64GB SGL'), ('IPHONE 6S PLUS GRAY 64GB KIT')

    ) x (Model);

    create unique clustered index PK#PhoneModels on #PhoneModels(Rowid)

    IF OBJECT_ID('tempdb..#BadWords', 'U') IS NOT NULL

    D ROP TABLE #BadWords;

    SELECT

    x.Word

    INTO #BadWords

    FROM ( VALUES

    ('ROSE'), ('KIT'),('SGL'),('SILVER'), ('BLK'), ('GRAY'),

    ('PLUS'), ('BK')

    ) x (Word);

    create unique index PK#Badwords on #Badwords(Word);

    -- SOLUTION

    set statistics time, io on;

    select RowID,stuff((SELECT ' ' + ds.item

    FROM dbo.DelimitedSplit8K(p.Model, space(1)) ds

    WHERE not exists (select 1 from #BadWords b where b.word = ds.Item)

    ORDER BY ItemNumber

    FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')

    ,1,1,'') as [Concatenated]

    from #PhoneModels P

    order by RowID

    set statistics time, io off;

    This gives a "cleaned" value to be used for further comparison.

    Edited to add: I just read Sergiy's answer from much earlier and now understand the various potential combinations. Although it would be more complicated, the above code could be extended to produce a level1 string, a level2 string, etc. This would avoid having to maintain the the tables Sergiy suggested, but I think Sergiy has the right idea.

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • ghughes (9/23/2016)


    Sergiy you are right again! The data changes, but not so much that I cant run a process every night/week and create a table of distinct full descriptions, and then create a stripped down version from the function. Then when I run my normal process it will be a table join instead of the scalar function on every row. Indexed properly it will contribute a lot on the speed.

    I used a sample is 100,000 of raw data. My production is a little bit more that. If I can squeeze my job in the schedule, it should help on processing during the day also. A win.

    Thanks again. This one has been actually a lot of fun. 🙂

    You're very welcome.

    🙂

    Remember - data normalisation is your best friend in the world of relational databases.

    😎

    _____________
    Code for TallyGenerator

Viewing 15 posts - 16 through 29 (of 29 total)

You must be logged in to reply to this topic. Login to reply