September 28, 2013 at 6:19 am
Hi,
I have two tables named "Table1" and "Table2".
Table1 Details:
id bigint, product_name nvarchar(1000),quantity int
records count on Table1 : 25000( may increase in future)
Table2 Details:
id bigint, product_name nvarchar(max),details nvarchar(1000), description nvarchar(1000)
Note: Table2 productName column will have comma separated values
records count on Table2 : 186289( may increase in future)
sample data:
Table1:
1 canola 120
2 bread 130
3 sauce 140
4 corn 120
Table2:
1 canola,tea,muffin,cheese jellyproducts null
2 vinegar,canola,sunflower oliproducts null
3. cornil,vegoil,canola,sesameoil oilproducts null
my requirement is i will take each prodcutname from Table1 and will search it in Table2's prodcutname column. if matches found then get the row and insert into temp table.
i tried with while loop/cursor with "like" condition but it takes 15+ hours to process the records. I know searching in comma separate data with "like" will take time, but it take 15+ hours and still running.
if am not wrong that looping will be bad on this larger records. so i have tried with joins as well. but nothing seems reducing the processing time.
Try1 :
select T2.*
from dbo.Table1 t1
inner join dbo.Table2 t2 on t2.product_name like '%' + t1.product_name + '%'
Try2:
select T2.*
from dbo.Table2 T2
inner join dbo.Table1 T1 on CHARINDEX(T1.product_name, T2.product_name) > 0
Is there any fastest way to achieve this requirement? please help me on this and share with me sample code if there is any.
September 29, 2013 at 5:06 pm
born2achieve (9/28/2013)
Hi,I have two tables named "Table1" and "Table2".
Table1 Details:
id bigint, product_name nvarchar(1000),quantity int
records count on Table1 : 25000( may increase in future)
Table2 Details:
id bigint, product_name nvarchar(max),details nvarchar(1000), description nvarchar(1000)
Note: Table2 productName column will have comma separated values
records count on Table2 : 186289( may increase in future)
sample data:
Table1:
1 canola 120
2 bread 130
3 sauce 140
4 corn 120
Table2:
1 canola,tea,muffin,cheese jellyproducts null
2 vinegar,canola,sunflower oliproducts null
3. cornil,vegoil,canola,sesameoil oilproducts null
my requirement is i will take each prodcutname from Table1 and will search it in Table2's prodcutname column. if matches found then get the row and insert into temp table.
i tried with while loop/cursor with "like" condition but it takes 15+ hours to process the records. I know searching in comma separate data with "like" will take time, but it take 15+ hours and still running.
if am not wrong that looping will be bad on this larger records. so i have tried with joins as well. but nothing seems reducing the processing time.
Try1 :
select T2.*
from dbo.Table1 t1
inner join dbo.Table2 t2 on t2.product_name like '%' + t1.product_name + '%'
Try2:
select T2.*
from dbo.Table2 T2
inner join dbo.Table1 T1 on CHARINDEX(T1.product_name, T2.product_name) > 0
Is there any fastest way to achieve this requirement? please help me on this and share with me sample code if there is any.
Is there any duplication of single product names in the product_name column of Table 2?
Also, the absolute BEST way to do this would be to properly normalize Table 2. CSV columns are one of the worst things you can store in a permanent table in a database for all the reasons that you're now finding out.
Also, for future posts, it's best if you post readily consumable test data. Please see the article at the first link under "Helpful Links" in my signature line below.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 29, 2013 at 5:09 pm
Jeff Moden (9/29/2013)
Also, the absolute BEST way to do this would be to properly normalize Table 2. CSV columns are one of the worst things you can store in a permanent table in a database for all the reasons that you're now finding out.
This. A thousand times over, this. Unless completely impossible, redesign that table, it will make life soooo much easier.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 29, 2013 at 8:05 pm
Thanks Jeff and Gila for the reply. I totally agree your answers. Appreciate gentle men.
September 29, 2013 at 8:23 pm
born2achieve (9/29/2013)
Thanks Jeff and Gila for the reply. I totally agree your answers. Appreciate gentle men.
No problem although Gail is one fine lady with a manly avatar. 😛
Do you have an answer to my question about duplicate single products (see my previous post, please)?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 30, 2013 at 5:01 am
Oh that's my bad and sorry Gila.
Thank you Jeff for the information. also answer for your question on Table2 we will have duplicate product name as it contains comma separated values column. but on table1 we will not have the duplication. but can have null.
The logic here is take each row value from table1 prodcutname column and map it with table2 productname column(which is comma separated) and get the matched record and make it available for further processing in temp table
any more suggestions....
September 30, 2013 at 11:42 am
born2achieve (9/30/2013)
Oh that's my bad and sorry Gila.Thank you Jeff for the information. also answer for your question on Table2 we will have duplicate product name as it contains comma separated values column. but on table1 we will not have the duplication. but can have null.
The logic here is take each row value from table1 prodcutname column and map it with table2 productname column(which is comma separated) and get the matched record and make it available for further processing in temp table
any more suggestions....
I guess I need to ask the question a different way... can any individual product in the CSV column show up in more than one row? In other words, can an individual product name be assigned to more than one category?
--Jeff Moden
Change is inevitable... Change for the better is not.
September 30, 2013 at 6:00 pm
To be crystal clear, would you ever have data like the following where a given single product appeared in the CSV of more than 1 product category? I ask because this WILL cause a partial Cartesian Product and THAT will determine the "best" way to handle the tables as they currently exist.
1 canola,tea,muffin,cheese jellyproducts null
2 vinegar,canola,sunflower oliproducts null
3 cornil,vegoil,canola,sesameoil oilproducts null
4 margarine,butter,muffin,bagel butterproducts null
5 muffin,bagel,wheatbread breadproducts null
6 tea,coffee,hotcocoa hotdrinkproducts null
--Jeff Moden
Change is inevitable... Change for the better is not.
September 30, 2013 at 6:21 pm
Hi Jeff,
your sample data format is perfect. This is how my data will looks like. Do you have any solutions.....
October 1, 2013 at 7:22 am
born2achieve (9/30/2013)
Hi Jeff,your sample data format is perfect. This is how my data will looks like. Do you have any solutions.....
Not sure yet. I know this is taking a while to wrangle out but getting it right is always important and, because of the huge amount of data the partial Cartesian Products generate, so is performance.
I know your requirement is to return each row in Table2 for every match in Table 1 and to put those matches in a Temp Table for "later processing" but perhaps the underlying problem to this all is, what will that later process actually do with that data? We might be able to streamline what we actually need to return if we knew just a bit more about that process.
I'm off to work. I probably won't be able to reply until I get home tonight.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 1, 2013 at 8:17 am
Hi Jeff,
Thanks a lot for your time on this and from temp table i am taking data and doing some other process. that will not be a deal. the actual issues is what you are looking on. if Cartesian product helps that would be great.
My eyes are rolling over to see your reply always.
Many thanks
October 1, 2013 at 8:22 am
Cartesian products aren't things that help 😉 This is a cartesian product:
SELECT <stuff> FROM LargeTable1 CROSS JOIN LargeTable2
You get a huge resultset with a number of row equal to the product of the row counts in each table (hence cartesian product). It's a great way to kill a server.
Is a table redesign not an option? It really, really would make stuff like this far easier.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
October 1, 2013 at 8:26 am
Hi Gila,
Thanks for your time,
The problem now is that production rolled out. so at this point of time the table design cannot be changed. looking for some temp solution.
Any hope.....
October 1, 2013 at 12:05 pm
you appear to want to match "corn" with "cornil"...is this correct?
some set up data to play with,,,,,
CREATE TABLE [dbo].[Table1] (
[id] [bigint] NULL,
[product_name] [nvarchar](1000) NULL,
[quantity] [int] NULL
) ON [PRIMARY]
CREATE TABLE [dbo].[Table2] (
[id] [bigint] NULL,
[product_name] [nvarchar](1000) NULL,
[details] [nvarchar](1000) NULL,
[description] [nvarchar](1000) NULL
) ON [PRIMARY]
INSERT INTO [Table1]([id],[product_name],[quantity])
VALUES(1,'canola',120)
INSERT INTO [Table1]([id],[product_name],[quantity])
VALUES(2,'bread',130)
INSERT INTO [Table1]([id],[product_name],[quantity])
VALUES(3,'sauce',140)
INSERT INTO [Table1]([id],[product_name],[quantity])
VALUES(4,'corn',120)
INSERT INTO [Table2]([id],[product_name],[details],[description])
VALUES(1,'canola,tea,muffin,cheese ','jellyproducts',NULL)
INSERT INTO [Table2]([id],[product_name],[details],[description])
VALUES(2,'vinegar,canola,sunflower ','oilproducts',NULL)
INSERT INTO [Table2]([id],[product_name],[details],[description])
VALUES(3,'cornil,vegoil,canola,sesameoil ','oilproducts',NULL)
GO
/*Try1 : */
SELECT t2.id,
t2.product_name,
t2.details,
t1.id AS T1_row
FROM Table1 AS t1
INNER JOIN Table2 AS t2 ON t2.product_name LIKE '%' + t1.product_name + '%'
/*Try2: */
SELECT T2.id,
T2.product_name,
T2.details,
T1.id AS T1_row
FROM Table2 AS T2
INNER JOIN Table1 AS T1 ON CHARINDEX(T1.product_name, T2.product_name) > 0
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
October 1, 2013 at 1:01 pm
Hi Livingston,
thanks for your time on this and i have tried whatever you have shown couple of days before, the problem is time consuming and how to fasten the search process.
Viewing 15 posts - 1 through 15 (of 32 total)
You must be logged in to reply to this topic. Login to reply