November 10, 2017 at 1:03 am
Hi All
I have a table currently holding 5.2 mil rows with a total of 50 + columns in about 10 of the columns could have a string which I need to check against my lookup table with about 15000 rows of the single or pattern of words i need to find the first matching value and insert the DataID from my lookup table into a staging table for later use I have some code that works but is' very slow doing about 10000 rows in about 2 hours and I have 5.2 mil to look through 10 times so any ideas on how I could speed this up please I have put the code and examples of the lookup table and the strings it could be looking up in
Lookup Table "Dim_InMem_ReplaceSize" 15488 rows sorted ascending by len of the value I'm looking for in the string in the products table
LookupValue LookupLenSort DataID
48K 3 15072
50A 3 15073
32 | K 6 13817
34 - A 6 13818
34DD|36D|38DD3 14 7499
30/B,30/C,32/C 14 5622
Products Table with 5.2 mil rows but this will be bigger
One of the columns has the following value
model_number
76576U-Generic Font-Cleverley/18-24 Months
7418BSUSBADUN-- Inter Font-Bobby Boswell/M
Body Sculpture BE-6120GHW Elliptical Cross Trainer CLICK TO VIEW PRODUCT VIDEO! 12 Program ComputerMagnetic Brake System4 user settingsHand Pulse SensorsAdjustable SaddleDual HandlebarsFREE NEXT DAY DELIVERY
Big Air Universal Octagonal 10ft Trampoline + Safety EnclosureFor all the Family56 All New Extreme Bounce Springs2.0mm Gauge Top Frame5 Year Frame WarrantyHOT Galvanised FrameFREE NEXT DAY DELIVERY AVAILABLE
The code I have currently is a function wrapping the column i'm looking up from one at a time in stages
this is the function(
@Value VARCHAR(8000)
)
RETURNS TABLE AS
RETURN
WITH ColourList AS ( -- replace this section with your Colour table when the time comes.
SELECT SCT.LookupValue,SCT.DataID
----,SCT.ReplaceBackSize
FROM dbo.Dim_InMem_ReplaceSize SCT
),
LookupList
AS
(
SELECT
ROW_NUMBER() OVER (ORDER BY
--LEN(cl.LookupValue
cl.LookupLenSort
--)
--DESC
) AS rn, -- prevents "Yellow" from butchering up "LightGoldenRodYellow"..
cl.LookupValue,DataID
----,cl.ReplaceBackSize
FROM
dbo.Dim_InMem_ReplaceSize cl
WHERE
@Value LIKE '%' + cl.LookupValue + '%'
), RecursiveReplace AS (
-- -- My select 1
SELECT cis.DataID AS ReturnString,
----cis.ReplaceBackSize AS ReturnString,
cis.rn
FROM
LookupList cis
WHERE
cis.rn = 1
UNION ALL
-- -- My select 2
SELECT cis.DataID
------cis.ReplaceBackSize
AS ReturnString,
cis.rn
FROM
LookupList cis
JOIN RecursiveReplace rr
ON cis.rn = rr.rn + 1
)
SELECT TOP 1
rr.ReturnString
FROM
RecursiveReplace rr
ORDER BY
rr.rn DESC;
And this is how I'm using it in 10k chunks the lookup is the one taking the time
DECLARE @DateNow DATE = CONVERT(DATE,GETDATE())
-- -- -- -- -- Large batch processing
---- -- -- -- -- Lookup from column model_number
WHILE EXISTS (
SELECT TOP 1
Fact_ProviderProductID = FSM.Fact_ProviderProductID
,Fact_MasterProductID = FSM.Fact_MasterProductID
,FlagProcessed = NULL
FROM dbo.Fact_InMem_Staging_MasterProductLookupProviderProductID FSM
LEFT OUTER JOIN dbo.Fact_InMem_Staging_MasterProductLookupProviderProductIDSplitBatch FSMS
ON FSM.Fact_ProviderProductID = FSMS.Fact_ProviderProductID
WHERE FSMS.Fact_ProviderProductID IS NULL
)
BEGIN
INSERT INTO dbo.Fact_InMem_Staging_MasterProductLookupProviderProductIDSplitBatch
SELECT TOP 10000
Fact_ProviderProductID = FSM.Fact_ProviderProductID
,Fact_MasterProductID = FSM.Fact_MasterProductID
,FlagProcessed = 0
FROM dbo.Fact_InMem_Staging_MasterProductLookupProviderProductID FSM
LEFT OUTER JOIN dbo.Fact_InMem_Staging_MasterProductLookupProviderProductIDSplitBatch FSMS
ON FSM.Fact_ProviderProductID = FSMS.Fact_ProviderProductID
WHERE FSMS.Fact_ProviderProductID IS NULL
INSERT INTO dbo.Staging_Feed_AffW_MasterSizeLookup
SELECT
aw_product_id = SFA.aw_product_id
,[Value] = (SELECT lc.ReturnString FROM dbo.UTfn_LikeSizeTableDataID(REPLACE(SFA.model_number + ' ','/',' ')) lc)
,[Source] = 'model_number'
,Etl_DateFirstLoad = @DateNow
,Etl_DateUpdated = @DateNow
FROM dbo.Fact_InMem_Staging_MasterProductLookupProviderProductIDSplitBatch FSMPP
INNER JOIN dbo.Staging_Feed_AffW SFA
ON FSMPP.Fact_ProviderProductID = SFA.aw_product_id
WHERE FSMPP.FlagProcessed = 0
----(SELECT lc.ReturnString FROM dbo.UTfn_LikeSizeTableDataID(REPLACE(SFA.model_number + ' ','/',' ')) lc) IS NOT NULL
UPDATE dbo.Fact_InMem_Staging_MasterProductLookupProviderProductIDSplitBatch
SET FlagProcessed = 1
WHERE FlagProcessed = 0
END
The above was something that was recommended to me on I think the SQL 2014 forum
Could anyone recommend a faster way to do this please as having to wait for about 2 months for some look up's to complete is not my idea of best pratice
PS I have no control over the source of the data
Kind regards
The Northern Monkey
November 14, 2017 at 8:03 am
Not sure this is ever going to go very fast. Also, using the function that way, while it is an inline table-valued function, the problem is that it's used in a way that I'm not sure is really going to work very well. You also are performing a REPLACE function on long string data, so that's not helping either. You might be able to use FULL TEXT INDEXING, because your search is a CONTAINS kind of search, and it would have to be on the smaller table. I don't know enough about FTI to help much, so Google will be your best friend in that regard.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
November 14, 2017 at 8:18 am
maybe a set up script that creates the tables and inserts some simple sample data and also the expected results would help......you have had many views and only one reply.
just saying.
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
Viewing 3 posts - 1 through 2 (of 2 total)
You must be logged in to reply to this topic. Login to reply