July 12, 2017 at 8:45 am
I need all combinations (not permutations) of the keywords and their synonyms. I'm having trouble getting groups of words substituting with their synonyms.
A keyword's synonym could be matched in the synonym's table cWord column or cSyn column.
I'm guessing it's going to involve something like a recursive cte w/ cross join (shudders), but the keywords table will rarely be more than 5, and never more than 8, rows.
Can anyone help?
TIA,
Paul
DECLARE @tKeywords TABLE
(nGroup_Id INTEGER,
cWord VARCHAR(255))
DECLARE @tSynonyms TABLE
(cWord VARCHAR(255),
cSyn VARCHAR(255))
INSERT @tKeywords
SELECT 1, 'Employment'
UNION ALL SELECT 1, 'Insurance'
UNION ALL SELECT 1, 'Online'
UNION ALL SELECT 1, 'Application'
INSERT @tSynonyms
SELECT 'Employment', 'Job'
UNION ALL SELECT 'Employment', 'Profession'
UNION ALL SELECT 'Employment', 'Vocation'
UNION ALL SELECT 'Online', 'Electronic'
Expected results:
Group_Id cWord
1 Employment
1 Insurance
1 Online
1 Application
2 Job
2 Insurance
2 Online
2 Application
3 Profession
3 Insurance
3 Online
3 Application
4 Vocation
4 Insurance
4 Online
4 Application
5 Employment
5 Insurance
5 Electronic
5 Application
6 Job
6 Insurance
6 Electronic
6 Application
7 Profession
7 Insurance
7 Electronic
7 Application
8 Vocation
8 Insurance
8 Electronic
8 Application
July 12, 2017 at 9:54 am
I'm, sorry, but I can't for the life of me figure out what you need here. I'll blame it on percocet. 🙂
1) You have a groupid field that is all 1's, but your output has an incrementing groupid
2) Insurance and Application have no matches at all, yet they are in the output
3) Just general confusion. 🙂
Can you please simplify with perhaps just two of each thingy and put some verbiage with each expected output row explaining the rule(s) for why each exists. Perhaps something about why other permutations do NOT make it in the output could be helpful too?
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 12, 2017 at 10:23 am
I think he wants all combinations of one word from each group of synonyms. The groups are (Employment, Job, Profession, Vocation), (Insurance), (Online, Electronic) and (Application). I got close, but I've run out of time, so I'll save what I've done and have another go at it tomorrow if I get a chance.
Edit: what even is percocet? Is it like covfefe?!
John
July 12, 2017 at 10:28 am
John Mitchell-245523 - Wednesday, July 12, 2017 10:23 AMI think he wants all combinations of one word from each group of synonyms. The groups are (Employment, Job, Profession, Vocation), (Insurance), (Online, Electronic) and (Application). I got close, but I've run out of time, so I'll save what I've done and have another go at it tomorrow if I get a chance.Edit: what even is percocet? Is it like covfefe?!
John
Can you post up what you have so far? Maybe I can carry on with it.
Percocet is a narcotic painkiller. I had a surgery go bad almost 2 months ago and I still have to take it regularly. :crying:
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 12, 2017 at 10:43 am
I have annotated the expected results to try to understand better. Am I on the right track? One thing that still gets me is that it (almost) seems like order matters? Still fuzzy ..
Expected results:
Group_Id cWord
1 Employment Employment, 1st hit, NO replacement
1 Insurance
1 Online
1 Application
2 Job Employment --> Job cSyn
2 Insurance
2 Online
2 Application
3 Profession Employment --> Profession cSyn
3 Insurance
3 Online
3 Application
4 Vocation Employment --> Vocation cSyn
4 Insurance
4 Online
4 Application
5 Employment Employment, 2nd hit, replace Online (3rd row) with Electronic cSyn
5 Insurance
5 Electronic
5 Application
6 Job Employment, 2nd hit --> Job cSyn, replace Online (3rd row) with Electronic cSyn
6 Insurance
6 Electronic
6 Application
7 Profession Employment, 2nd hit --> Profession cSyn, replace Online (3rd row) with Electronic cSyn
7 Insurance
7 Electronic
7 Application
8 Vocation Employment, 2nd hit --> Vocation Syn, replace Online (3rd row) with Electronic cSyn
8 Insurance
8 Electronic
8 Application
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 12, 2017 at 11:21 am
Sorry, was -- and am -- out to lunch :Whistling:
Precisely!
But thinking on it over food, I may have been making this harder than it need be.
I can start with the string 'Employment Insurance Online Application', do the substitutions, then run them through the splitter.
It's a WHILE loop, but, seems to work alright.
DECLARE @cCleanStringVar VARCHAR(512) = 'Employment Insurance Online Application'
DECLARE @tKeywords TABLE
(cWord VARCHAR(255))
DECLARE @tSynonyms TABLE
(cWord VARCHAR(255),
cSyn VARCHAR(255))
INSERT @tSynonyms
SELECT 'Employment', 'Job'
UNION ALL SELECT 'Employment', 'Work'
UNION ALL SELECT 'Employment', 'Profession'
UNION ALL SELECT 'Employment', 'Vocation'
UNION ALL SELECT 'Online', 'Electronic'
INSERT @tKeywords
SELECT @cCleanStringVar
WHILE @@ROWCOUNT > 0
INSERT @tKeywords
SELECT DISTINCT --ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) + 1,
STUFF(k.cWord, CHARINDEX(s.cWord, k.cWord, nRow), DATALENGTH(s.cWord), s.cSyn)--,
-- s.cWord
FROM @tKeywords k
CROSS APPLY Project.dbo.ufn_Numbers(DATALENGTH(k.cWord))
JOIN @tSynonyms s ON CHARINDEX(s.cWord, k.cWord, nRow) > 0
AND NOT EXISTS (SELECT *
FROM @tKeywords k2
WHERE k2.cWord = STUFF(k.cWord, CHARINDEX(s.cWord, k.cWord, nRow), DATALENGTH(s.cWord), s.cSyn))
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
cWord
FROM @tKeywords
ORDER BY 1
July 12, 2017 at 12:42 pm
schleep - Wednesday, July 12, 2017 11:21 AMSorry, was -- and am -- out to lunch :Whistling:Precisely!
But thinking on it over food, I may have been making this harder than it need be.
I can start with the string 'Employment Insurance Online Application', do the substitutions, then run them through the splitter.
It's a WHILE loop, but, seems to work alright.
DECLARE @cCleanStringVar VARCHAR(512) = 'Employment Insurance Online Application'
DECLARE @tKeywords TABLE
(cWord VARCHAR(255))DECLARE @tSynonyms TABLE
(cWord VARCHAR(255),
cSyn VARCHAR(255))INSERT @tSynonyms
SELECT 'Employment', 'Job'
UNION ALL SELECT 'Employment', 'Work'
UNION ALL SELECT 'Employment', 'Profession'
UNION ALL SELECT 'Employment', 'Vocation'
UNION ALL SELECT 'Online', 'Electronic'INSERT @tKeywords
SELECT @cCleanStringVarWHILE @@ROWCOUNT > 0
INSERT @tKeywords
SELECT DISTINCT --ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) + 1,
STUFF(k.cWord, CHARINDEX(s.cWord, k.cWord, nRow), DATALENGTH(s.cWord), s.cSyn)--,
-- s.cWord
FROM @tKeywords k
CROSS APPLY Project.dbo.ufn_Numbers(DATALENGTH(k.cWord))
JOIN @tSynonyms s ON CHARINDEX(s.cWord, k.cWord, nRow) > 0
AND NOT EXISTS (SELECT *
FROM @tKeywords k2
WHERE k2.cWord = STUFF(k.cWord, CHARINDEX(s.cWord, k.cWord, nRow), DATALENGTH(s.cWord), s.cSyn))SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
cWord
FROM @tKeywords
ORDER BY 1
That code makes my brain hurt!! 😀 But if it works I'm moving on ... LOL
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
July 12, 2017 at 12:51 pm
Mine too. Here's the final version. I didn't mention before, we also have phrases and synonyms of phrases.
Managed to get rid of the cross apply, which makes me considerably happier.
IF OBJECT_ID('dbo.ufn_IPP_GetKWSynsGroups') IS NOT NULL
DROP FUNCTION dbo.ufn_IPP_GetKWSynsGroups
GO
CREATE FUNCTION dbo.ufn_IPP_GetKWSynsGroups
(@cCleanStringArg VARCHAR(255),
@cLangArg CHAR(1))
RETURNS @tRS TABLE ( nGroup_Id INTEGER,
cWord VARCHAR(512))
AS
BEGIN
DECLARE @nRowCountVar INTEGER
DECLARE @tKeywords TABLE
(nGroup_Id INTEGER IDENTITY,
cWord VARCHAR(255) PRIMARY KEY)
DECLARE @tSynonyms TABLE
(cWord VARCHAR(255),
cSyn VARCHAR(255),
nPos INTEGER)
INSERT @tSynonyms
SELECT cWord, cSyn, CHARINDEX(cWord, @cCleanStringArg)
FROM dbo.IPP_Synonyms s
WHERE CHARINDEX(cWord, @cCleanStringArg) > 0
AND s.cLang = @cLangArg
WHILE @@ROWCOUNT > 0
INSERT @tSynonyms
SELECT s.cWord, s.cSyn, CHARINDEX(s.cWord, @cCleanStringArg, nPos)
FROM dbo.IPP_Synonyms s
CROSS JOIN @tSynonyms t
WHERE CHARINDEX(s.cWord, @cCleanStringArg) > 0
AND s.cLang = @cLangArg
AND NOT EXISTS (SELECT *
FROM @tSynonyms t2
WHERE t2.cWord = s.cWord
AND t2.cSyn = s.cSyn)
INSERT @tKeywords
SELECT @cCleanStringArg
SELECT @nRowCountVar = @@ROWCOUNT
WHILE @nRowCountVar > 0
BEGIN
INSERT @tKeywords
SELECT DISTINCT STUFF(k.cWord, CHARINDEX(s.cWord, k.cWord), DATALENGTH(s.cWord), s.cSyn)
FROM @tKeywords k
JOIN @tSynonyms s ON CHARINDEX(s.cWord, k.cWord) > 0
AND NOT EXISTS (SELECT *
FROM @tKeywords k2
WHERE k2.cWord = STUFF(k.cWord, CHARINDEX(s.cWord, k.cWord), DATALENGTH(s.cWord), s.cSyn))
SELECT @nRowCountVar = @@ROWCOUNT
INSERT @tKeywords
SELECT DISTINCT STUFF(k.cWord, CHARINDEX(s.cSyn, k.cWord), DATALENGTH(s.cSyn), s.cWord)
FROM @tKeywords k
JOIN @tSynonyms s ON CHARINDEX(s.cSyn, k.cWord) > 0
AND NOT EXISTS (SELECT *
FROM @tKeywords k2
WHERE k2.cWord = STUFF(k.cWord, CHARINDEX(s.cSyn, k.cWord), DATALENGTH(s.cSyn), s.cWord))
SELECT @nRowCountVar += @@ROWCOUNT
END
INSERT @tRS
SELECT nGroup_Id,
cRow
FROM (SELECT nGroup_Id,
cWord
FROM @tKeywords) k
CROSS APPLY Project.dbo.ufn_DelimitedStringToTable (k.cWord, ' ')
RETURN
END
GO
July 12, 2017 at 12:52 pm
TheSQLGuru - Wednesday, July 12, 2017 12:42 PMThat code makes my brain hurt!! 😀 But if it works I'm moving on ... LOL
Might wanna pop a couple more percs 😛
Thanks for taking the time.
July 12, 2017 at 3:50 pm
Here's another way (based on the original sample data with two table variables) that uses dynamic TSQL. Of course, all the usual caveats with such things apply (protecting against injection, etc.); this is just to illustrate another technique:
DECLARE
@CASE VARCHAR(MAX)=' CASE '
,@FROM VARCHAR(MAX)=' FROM (SELECT cWord FROM #temp WHERE cWord=cSyn) t '
,@ORDER VARCHAR(MAX)=''
,@SQL VARCHAR(MAX)
;
IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp;
SELECT cWord, cSyn=cWord
INTO #temp
FROM @tKeywords
UNION ALL
SELECT cWord, cSyn
FROM @tSynonyms ts
WHERE EXISTS (SELECT 0 FROM @tKeywords tk WHERE ts.cWord=tk.cWord)
;
SELECT @CASE=@CASE+'WHEN t.cWord='''+cWord+''' THEN '+QUOTENAME(cWord)+'.cSyn ',
@FROM=@FROM+'OUTER APPLY (SELECT cSyn FROM #temp WHERE cWord='''+cWord+''') '+QUOTENAME(cWord)+' ',
@ORDER=@ORDER+', '+QUOTENAME(cWord)+'.cSyn'
FROM @tKeywords;
SELECT @CASE=@CASE+' END ',
@ORDER=STUFF(@ORDER,1,1,'');
SELECT @sql='SELECT group_id=DENSE_RANK() OVER (ORDER BY '+@ORDER+'), cWord='+@CASE+@FROM+' ORDER BY group_id ASC, t.cWord ASC';
EXEC(@SQL);
Cheers!
July 13, 2017 at 9:06 am
Jacob Wilkins - Wednesday, July 12, 2017 3:50 PMHere's another way (based on the original sample data with two table variables) that uses dynamic TSQL. Of course, all the usual caveats with such things apply (protecting against injection, etc.); this is just to illustrate another technique:
DECLARE
@CASE VARCHAR(MAX)=' CASE '
,@FROM VARCHAR(MAX)=' FROM (SELECT cWord FROM #temp WHERE cWord=cSyn) t '
,@ORDER VARCHAR(MAX)=''
,@SQL VARCHAR(MAX)
;IF OBJECT_ID('tempdb..#temp') IS NOT NULL DROP TABLE #temp;
SELECT cWord, cSyn=cWord
INTO #temp
FROM @tKeywords
UNION ALL
SELECT cWord, cSyn
FROM @tSynonyms ts
WHERE EXISTS (SELECT 0 FROM @tKeywords tk WHERE ts.cWord=tk.cWord)
;SELECT @CASE=@CASE+'WHEN t.cWord='''+cWord+''' THEN '+QUOTENAME(cWord)+'.cSyn ',
@FROM=@FROM+'OUTER APPLY (SELECT cSyn FROM #temp WHERE cWord='''+cWord+''') '+QUOTENAME(cWord)+' ',
@ORDER=@ORDER+', '+QUOTENAME(cWord)+'.cSyn'
FROM @tKeywords;SELECT @CASE=@CASE+' END ',
@ORDER=STUFF(@ORDER,1,1,'');SELECT @sql='SELECT group_id=DENSE_RANK() OVER (ORDER BY '+@ORDER+'), cWord='+@CASE+@FROM+' ORDER BY group_id ASC, t.cWord ASC';
EXEC(@SQL);
Cheers!
Brilliant work! This is excellent!
I spent a couple hours trying to come up with a more set based way to do this - it looks really simple but I ended up pulling my hair out.
-- Itzik Ben-Gan 2001
July 13, 2017 at 10:10 am
@alan.B: Thank you sir! Coming from you that's especially high praise 🙂
I absolutely agree; this is a deceptively difficult exercise, and I spent a couple hours last night trying to find something better as well.
I did come up with one other possibility, based on a little math trick to make sure the groups are distributed correctly.
In the first version I used CTEs to do all the required aggregations, but some of them are just painful to do that way.
I switched to using a series of temp tables instead, which makes it much faster, especially since the running product can be calculated using a quirky update style approach (the order in which it's calculated doesn't actually matter).
In my tests with >100 synonyms for some of the keywords, it generally outperforms the dynamic TSQL approach, but it's a lot more bizarre 🙂
Basically, the idea is to count the synonyms for each keyword (including the keyword itself). Then, you number each synonym within a cWord group.
Then, you calculate a running product of the count per keyword, which is used to give both the running product for each cWord and the total number of combinations.
Once you have all that, you can use a tally to generate a possible row for each synonym/group_id combination, and distribute synonyms into groups correctly by requiring that row_number-1=((group_id-1)/(total_combinations/running_product))%count
Essentially you're splitting the possibility space into buckets, So, let's say you have just 2 keywords with 2 possible values each (the keyword itself and one synonym).
For the first cWord (in arbitrary order), the running product is 2. For the second cWord, the running product is 4, which is also the total number of combinations.
The first cWord is then split into buckets of size 2 (total_combinations/running_product) over which its values cycle. So, group_ids 1 and 2 would have the same value, and group_ids 3 and 4 would have the other value.
The second cWord is split into buckets of size 1 (total_combinations/running_product) over which its values cycle. By essentially making sure that each successive cWord cycles through its values within a single cycle of the previous cWord (again, in arbitrary order), you make sure you get all combinations.
I still prefer the dynamic TSQL approach, honestly, but this one was more fun to write 🙂
IF OBJECT_ID('tempdb..#unified') IS NOT NULL DROP TABLE #unified;
CREATE TABLE #unified (rn BIGINT, cWord VARCHAR(255), cSyn VARCHAR(255));
INSERT INTO #unified
SELECT rn=ROW_NUMBER() OVER (PARTITION BY cWord ORDER BY cSyn)-1, cWord, cSyn
FROM
(
SELECT cWord, cSyn=cWord
FROM @tKeywords
UNION ALL
SELECT cWord, cSyn
FROM @tSynonyms ts
)x
;
IF OBJECT_ID('tempdb..#prods') IS NOT NULL DROP TABLE #prods;
SELECT cWord, cnt=COUNT(*),moving_prod=0 INTO #prods FROM #unified GROUP BY cWord;
DECLARE @agg_prod INT=1;
UPDATE #prods SET @agg_prod=moving_prod=cnt*@agg_prod OPTION (MAXDOP 1);
IF OBJECT_ID('tempdb..#some_aggs') IS NOT NULL DROP TABLE #some_aggs;
CREATE TABLE #some_aggs (cWord VARCHAR(255), cnt BIGINT, moving_prod BIGINT, marker BIGINT, N BIGINT);
INSERT INTO #some_aggs SELECT cWord, cnt, moving_prod, marker=((tn.N-1)/(@agg_prod/moving_prod))%cnt,N FROM #prods p CROSS JOIN dbo.TallyN(@agg_prod) tn;
SELECT group_id=s.N, cWord=u.cSyn
FROM #unified u
INNER JOIN
#some_aggs s ON u.cWord=s.cWord AND u.rn=s.marker
ORDER BY s.N ASC, u.cWord ASC
OPTION (RECOMPILE);
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply