August 16, 2012 at 9:28 am
Hi,
I have the below query where I am joining 2 tables with PATINDEX or LIKE as I donot have an exact join column. The ItemNbr from Item table is varchar(100) and it is contained in ProductNbr column which is Varchar(150) in product table to get the Product Name. I ran the below query with 500k records in Item table and 10k records in Product table which took 5hrs to complete. But the actual product table has 5MM records which will probably take few months to finish. Any other efficient solution to this problem? i tried LIKE which slower than PATINDEX
SELECT
Item.ItemNbr
,Product.ProductNbr
, Product.ProductName
INTO CT.dbo.ItemMatching
FROM CT.dbo.Item Item WITH (NOLOCK)
JOIN CT.dbo.Product Product WITH (NOLOCK)
ON PATINDEX('%' + Item.ItemNbr + '%', Product.ProductNbr) > 0
August 16, 2012 at 9:35 am
Is the item number a "separate string" inside the product number column, or is it burried in another string?
Separate would be like:
Item Number = A15
Product Number = A15 XYZ
With a space separating it from the rest of the string.
Burried would be:
Item Number = A15
Product Number = A15XYZ
No space.
If it has space around it, then full-text indexing could be used to separate out the pieces of the product number and search against those. Speed can be very good, but it does add to the storage space needed.
If it doesn't have a space, then you'll have to use PatIndex/Charindex/Like, and it's going to be slow.
Either way, it's a very flawed database design. Any chance the data can be modelled correctly? (I'm assuming that you're stuck with the current design and data, but I need to ask, just in case.)
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
August 16, 2012 at 9:46 am
GSquared - Thanks for your reply!
ItemNbr is buried in another string, ProductNbr. ProductNbr donot have spaces
Sample data:
ItemNbr: 1455363 (ItemNbr though varchar(100), it has integer values)
ProductNbr: ZEFT.97771455363
Unfortunately, the database design cannot be changed.
August 16, 2012 at 9:56 am
And there isn't a bill-of-materials somewhere in the data, with the component items mapped to the products they are part of, right? It's just in the string, no mapping anywhere?
If so, slow and error-prone is the only option you have.
I say error-prone, because if there's an Item Number 5363, that will also come up as part of that product number, so will item 1455, or item 771. I doubt that's meant to work that way, but it will. Any business rules on where the item number goes in the product number? Like "always the last 7 digits", or anything like that?
If there is a rule like that, and you have the authority to do so, you could pull that data out of the product numbers, either in a computed column in the table, or in an indexed view of the table, and improve the performance of your query tremendously. If not, then you're stuck with straight up string comparisons.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply