finding lookup table text strings inside much larger text string fields

  • Thanks for the help in advance, I am really interested to see if this can be achived in SQL.

    I have two tables, one with a notes column, in this varchar(max) column there is lots of text which includes product codes e.g. DYGER_074332 with the string. This table has around 91 thousand rows.

    I have another table I have a column with the product codes in e.g. 1 row per product code around 250 thousand rows.

    I would like to be able to find the product codes which are present the notes and populate another table which records the note id and the product code, essentially one row per found product code.

    Can you please tell me the right way to go about this, I am a bit stuck as to where to start, if you can point me in the right direction that would be great, 😉

    Many Thanks for the advice.

    Oliver

  • Are you saying you want a table with only 1 row per product code with 2 columns: productcode and list of noteids?

    Jared

    Jared
    CE - Microsoft

  • It would be good to have:

    note id product code

    1 DIUNR_47332

    1 FUDIF_38239

    2 DIUNR_47332

    3 YREII_74389

    etc

    Many thanks for the quick reply

  • oliver.morris (12/1/2011)


    It would be good to have:

    note id product code

    1 DIUNR_47332

    1 FUDIF_38239

    2 DIUNR_47332

    3 YREII_74389

    etc

    Many thanks for the quick reply

    Hmm... Fun one! So you want to take and search each note for each product id and write the noteid and productid to a table. First off, you also want an id field in that table:

    CREATE TABLE something (id int (1,1), noteid, productcode)

    That table will be inserted into. Before I work on a solution, is this a one time thing or something that will be run regularly?

    Jared

    Jared
    CE - Microsoft

  • Well currently it would be a once a month job or something around that frequency.

    Thank you for the help,

    Oliver

  • I can't think of any other way than a loop or a cursor to search for the existence of each product id and then writing what it finds. So you would write a set to the table for each product id, then move on to the next one. So, you could load your cursor with the product ids and fetch each one, search for it in the string using a LIKE '%productid%' where productid IS the product id and write to the table.

    Anyone else think this can be done more efficiently?

    Jared

    Jared
    CE - Microsoft

  • Thanks for the help, I can't help but think there must be a smart way to do this. I fear thatnif I go down the cursor route it is going to be very slow going. Anything that anyone can add would be great.

    Thank you very much for the help,

    Oliver

  • oliver.morris (12/1/2011)


    Thanks for the help, I can't help but think there must be a smart way to do this. I fear thatnif I go down the cursor route it is going to be very slow going. Anything that anyone can add would be great.

    Thank you very much for the help,

    Oliver

    Can you maybe FROM producttable

    INNER JOIN notetable

    ON producttable.productid LIKE '%' + notetable.textfield + '%'

    I think this will work!

    Jared

    Jared
    CE - Microsoft

  • INSERT INTO foundTable ( note_#, product_code )

    SELECT n.note_#, p.product_code

    FROM notesTable n

    INNER JOIN productsTable p ON

    n.notes LIKE '%' + p.product_code + '%'

    You have to decide if you need to exclude "false" matches by delimiting the product_code.

    For example: if you might have product codes like 'DYGER_074332' and 'DYGER_074332A', which are two different codes.

    Then insure the product_code appearing in the note is not immediately preceded by a letter or number:

    INNER JOIN productsTable p ON

    n.notes LIKE '%[^a-z0-9]%' + p.product_code + '[^a-z0-9]%'

    EXCEPT that the note could also start or end with the product_code itself, so you''d have to add:

    n.notes LIKE '%[^a-z0-9]%' + p.product_code + '[^a-z0-9]%' OR

    n.notes LIKE p.product_code + '%' OR

    n.notes LIKE '%' + p.product_code

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (12/1/2011)


    INSERT INTO foundTable ( note_#, product_code )

    SELECT n.note_#, p.product_code

    FROM notesTable n

    INNER JOIN productsTable p ON

    n.notes LIKE '%' + p.product_code + '%'

    You have to decide if you need to exclude "false" matches by delimiting the product_code.

    For example: if you might have product codes like 'DYGER_074332' and 'DYGER_074332A', which are two different codes.

    Then insure the product_code appearing in the note is not immediately preceded by a letter or number:

    INNER JOIN productsTable p ON

    n.notes LIKE '%[^a-z0-9]%' + p.product_code + '[^a-z0-9]%'

    EXCEPT that the note could also start or end with the product_code itself, so you''d have to add:

    n.notes LIKE '%[^a-z0-9]%' + p.product_code + '[^a-z0-9]%' OR

    n.notes LIKE p.product_code + '%' OR

    n.notes LIKE '%' + p.product_code

    Nice! Regex's I need to learn more of 🙂

    Jared

    Jared
    CE - Microsoft

  • Thank you for the help on this, I am running this now. Looks like to run through 90,000 rows of varchar(max) data and comparing against 250,000 products its going to take some time to run.

    Will let you know how it goes. If you know any way that this can be sped up that would be really helpful.

    Many Thanks,

    Oliver

  • Thank you for the help on this. I am two hours into running this query but I guess it's going to take a while to run against 90000 notes, and 250000 product id's.

    Any ideas on how I could make this faster would be greatfully appreciated.

    Many Thanks for your help,

    Oliver

  • Hmm, that's a difficult issue. With what you're doing that won't be easy.

    I thought earlier about creating a full-text index across the "notes" column. In theory that could help under the right conditions.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Yeah I will give this a go and see if it helps.

    Many Thanks,

    Oliver

  • oliver.morris (12/2/2011)


    Yeah I will give this a go and see if it helps.

    Many Thanks,

    Oliver

    Whilst you are doing this pesky parsing, I would think about a redesign. Allow the user to input the product ids in a separate field with the note. The product id can still be in the note, but require them to also put it in separate fields or a comma delimited sting that the code behind can parse into individual rows. This way you don't have to do this.

    OR if you have a development team you can have someone write some code that will do this search for you outside of SQL. I think it "could" be much faster if you cannot change the actual application.

    Jared

    Jared
    CE - Microsoft

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply