Table join using LIKE or PATINDEX - Challenge

  • Hi,

    I have the below query where I am joining 2 tables with PATINDEX or LIKE as I donot have an exact join column. The ItemNbr from Item table is varchar(100) and it is contained in ProductNbr column which is Varchar(150) in product table to get the Product Name. I ran the below query with 500k records in Item table and 10k records in Product table which took 5hrs to complete. But the actual product table has 5MM records which will probably take few months to finish. Any other efficient solution to this problem? i tried LIKE which slower than PATINDEX

    SELECT

    Item.ItemNbr

    ,Product.ProductNbr

    , Product.ProductName

    INTO CT.dbo.ItemMatching

    FROM CT.dbo.Item Item WITH (NOLOCK)

    JOIN CT.dbo.Product Product WITH (NOLOCK)

    ON PATINDEX('%' + Item.ItemNbr + '%', Product.ProductNbr) > 0

  • Is the item number a "separate string" inside the product number column, or is it burried in another string?

    Separate would be like:

    Item Number = A15

    Product Number = A15 XYZ

    With a space separating it from the rest of the string.

    Burried would be:

    Item Number = A15

    Product Number = A15XYZ

    No space.

    If it has space around it, then full-text indexing could be used to separate out the pieces of the product number and search against those. Speed can be very good, but it does add to the storage space needed.

    If it doesn't have a space, then you'll have to use PatIndex/Charindex/Like, and it's going to be slow.

    Either way, it's a very flawed database design. Any chance the data can be modelled correctly? (I'm assuming that you're stuck with the current design and data, but I need to ask, just in case.)

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared - Thanks for your reply!

    ItemNbr is buried in another string, ProductNbr. ProductNbr donot have spaces

    Sample data:

    ItemNbr: 1455363 (ItemNbr though varchar(100), it has integer values)

    ProductNbr: ZEFT.97771455363

    Unfortunately, the database design cannot be changed.

  • And there isn't a bill-of-materials somewhere in the data, with the component items mapped to the products they are part of, right? It's just in the string, no mapping anywhere?

    If so, slow and error-prone is the only option you have.

    I say error-prone, because if there's an Item Number 5363, that will also come up as part of that product number, so will item 1455, or item 771. I doubt that's meant to work that way, but it will. Any business rules on where the item number goes in the product number? Like "always the last 7 digits", or anything like that?

    If there is a rule like that, and you have the authority to do so, you could pull that data out of the product numbers, either in a computed column in the table, or in an indexed view of the table, and improve the performance of your query tremendously. If not, then you're stuck with straight up string comparisons.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply