January 25, 2015 at 4:10 pm
Hello,
I have the following table with sample data:
create table ##test_table (LP varchar(100), Original varchar (250), New varchar (250))
insert into ##test_table
select '/page1' as LP, 'the quick brown fox jumped over lazy dog' as Original, '' as New
insert into ##test_table
select '/page1' as LP, '' as Original, 'the fat pig couldnt jumped over lazy dog' as New
What I'd like to do is have a query with 3 separate columns which displays the results, separated by commas:
Matching Words
Words In Original Not in New
Words in New Not in Original
The Results for
Matching Words should be: the, jumped, over, lazy, dog
Words In Original Not in New should be: quick, brown, fox
Words In NEW Not in Original should be: fat, pig, couldnt
Output Columns Should Be:
LP Original New Words In Original Not Words In NEW Not in Original
Thank You
January 26, 2015 at 1:23 am
Split the sentence into words, store each word in a new row. Then you can use INTERSECT and EXCEPT to get the results you want.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 26, 2015 at 10:06 pm
Thanks GilaMonster. So it's not possible as is for a query to answer this problem? As I will have many data sets, in rows, so not possible to split sentences into words.
January 27, 2015 at 2:12 am
Err, yes it is possible.
Query splits the sentences into words (look for a split function, Delimited8ksplit is fastest). If large, insert into temp tables, then use INTERSECT and EXCEPT to get your results.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
January 28, 2015 at 5:59 pm
Such an odd requirement could only come as homework. What Gail has suggested will work.
Here is another way that gets you the matching words (MW) column only.
create table #test_table (LP varchar(100), Original varchar (250), New varchar (250));
insert into #test_table
select '/page1' as LP, 'the quick brown fox jumped over lazy dog' as Original, '' as New
UNION ALL select '/page1' as LP, '' as Original, 'the fat pig couldnt jumped over lazy dog' as New;
WITH BothSentences AS
(
SELECT LP, Original=MAX(Original), New=MAX(New)
FROM #test_table
GROUP BY LP
),
AllWords AS
(
SELECT LP, item, grp=SUM(grp), itemnumber=MAX(itemnumber)
FROM
(
SELECT LP, grp=-1, item, itemnumber
FROM BothSentences
CROSS APPLY DelimitedSplit8K(Original, ' ')
UNION
SELECT LP, 1, item, itemnumber
FROM BothSentences
CROSS APPLY DelimitedSplit8K(New, ' ')
) a
GROUP BY LP, item
)
SELECT MW=STUFF(
(
SELECT ', ' + item
FROM AllWords b
WHERE grp = 0 AND a.LP = b.LP
ORDER BY itemnumber
FOR XML PATH('')
), 1, 2, '')
FROM AllWords a
GROUP BY LP;
GO
DROP TABLE #test_table;
Everything you need to get the other columns is available in the above. You just have to figure out what I'm doing and extract it. If you can do that, you get an A.
Edit: Post your final query so that we can all see what you learned.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
January 31, 2015 at 10:52 pm
Hi Dwain,
I tried you're query but am getting the following error:
(2 row(s) affected)
Msg 208, Level 16, State 1, Line 4
Invalid object name 'DelimitedSplit8K'.
By the thanks for helping..its appreciate.
February 1, 2015 at 11:30 am
Ok I read Jeff's article on http://www.sqlservercentral.com/articles/Tally+Table/72993/ and at first put the code in a seperate window and it didn't work, since i suppose the the #test table contained only one # tag. when i put it in same window as code you wrote, dwain it worked. i know there is a difference between sessions on # and ##, so thank you now let me try to understand it.
Thank You both.
February 2, 2015 at 6:01 pm
CELKO (2/2/2015)
This is a garbage pile; it has not key and no way to have a key.
I thought they called that a heap?
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 2, 2015 at 9:58 pm
Dwain.C,
I modified the grp=0, to a -1 and + 1, seperately, and the query got me the result expected, even though it wasn't in seperate columns, which is fine.
SELECT LP, item, grp=SUM(grp), itemnumber=MAX(itemnumber)
FROM
(
SELECT LP, grp=0, item, itemnumber
FROM BothSentences
CROSS APPLY DelimitedSplit8K(Original, ' ')
UNION
SELECT LP, 1, item, itemnumber
FROM BothSentences
CROSS APPLY DelimitedSplit8K(New, ' ')
) a
GROUP BY LP, item
)
I am wanting to add a new column, to display the word count next to the result of the column for matching / non matching...., i added this line to the query in front of the [MW]= STUFF that's all, but didn't work:
SELECT 1 + len([new] - len(replace ([new]),' ','')) AS WordCount, [MW]= STUFF
would you know how to add a word count column next to you're result set?
February 2, 2015 at 11:19 pm
Actually this is more like what I was thinking about.
create table #test_table (LP varchar(100), Original varchar (250), New varchar (250));
insert into #test_table
select '/page1' as LP, 'the quick brown fox jumped over lazy dog' as Original, '' as New
UNION ALL select '/page1' as LP, '' as Original, 'the fat pig couldnt jumped over lazy dog' as New;
WITH BothSentences AS
(
SELECT LP, Original=MAX(Original), New=MAX(New)
FROM #test_table
GROUP BY LP
),
AllWords AS
(
SELECT LP, item, grp=SUM(grp), itemnumber=MAX(itemnumber)
FROM
(
SELECT LP, grp=-1, item, itemnumber
FROM BothSentences
CROSS APPLY DelimitedSplit8K(Original, ' ')
UNION
SELECT LP, 1, item, itemnumber
FROM BothSentences
CROSS APPLY DelimitedSplit8K(New, ' ')
) a
GROUP BY LP, item
)
SELECT MW=STUFF(
(
SELECT ', ' + item
FROM AllWords b
WHERE grp = 0 AND a.LP = b.LP
ORDER BY itemnumber
FOR XML PATH('')
), 1, 2, '')
,ONotInN=STUFF(
(
SELECT ', ' + item
FROM AllWords b
WHERE grp = -1 AND a.LP = b.LP
ORDER BY itemnumber
FOR XML PATH('')
), 1, 2, '')
,NNotInO=STUFF(
(
SELECT ', ' + item
FROM AllWords b
WHERE grp = 1 AND a.LP = b.LP
ORDER BY itemnumber
FOR XML PATH('')
), 1, 2, '')
FROM AllWords a
GROUP BY LP;
GO
DROP TABLE #test_table;
Does that get you everything you want?
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
February 3, 2015 at 7:16 am
CELKO (2/2/2015)
--------------------------------------------------------------------------------
This is a garbage pile; it has not key and no way to have a key.
I thought they called that a heap?
A heap is a table without a clustered index. A table with a primary key may also be a heap. A table without a primary key is not good, or to use Celko's words, a garbage pile.
February 22, 2015 at 9:38 am
Hello Dwain.C,
Yes, thank you for your last update, it was spot on.
I modified you're original query and added a column to No_Of_Words, for the column result set of mw, but got 1 for answer, it should be 5, since i'm wanting it to count the result "the, jumped, over, lazy, dog", would you know what's wrong? ty
I put:
create table #test_table (LP varchar(100), Original varchar (250), New varchar (250));
insert into #test_table
select '/page1' as LP, 'the quick brown fox jumped over lazy dog' as Original, '' as New
UNION ALL select '/page1' as LP, '' as Original, 'the fat pig couldnt jumped over lazy dog' as New;
WITH BothSentences AS
(
SELECT LP, Original=MAX(Original), New=MAX(New)
FROM #test_table
GROUP BY LP
),
AllWords AS
(
SELECT LP, item, grp=SUM(grp), itemnumber=MAX(itemnumber)
FROM
(
SELECT LP, grp=-1, item, itemnumber
FROM BothSentences
CROSS APPLY DelimitedSplit8K(Original, ' ')
UNION
SELECT LP, 1, item, itemnumber
FROM BothSentences
CROSS APPLY DelimitedSplit8K(New, ' ')
) a
GROUP BY LP, item
)
SELECT len(LP) - len(replace(LP, ' ', '')) + 1 No_Of_Words, MW=STUFF(
(
SELECT ', ' + item
FROM AllWords b
WHERE grp = 0 AND a.LP = b.LP
ORDER BY itemnumber
FOR XML PATH('')
), 1, 2, '')
FROM AllWords a
GROUP BY LP;
February 22, 2015 at 5:26 pm
VegasL (2/22/2015)
Hello Dwain.C,Yes, thank you for your last update, it was spot on.
I modified you're original query and added a column to No_Of_Words, for the column result set of mw, but got 1 for answer, it should be 5, since i'm wanting it to count the result "the, jumped, over, lazy, dog", would you know what's wrong? ty
I put:
create table #test_table (LP varchar(100), Original varchar (250), New varchar (250));
insert into #test_table
select '/page1' as LP, 'the quick brown fox jumped over lazy dog' as Original, '' as New
UNION ALL select '/page1' as LP, '' as Original, 'the fat pig couldnt jumped over lazy dog' as New;
WITH BothSentences AS
(
SELECT LP, Original=MAX(Original), New=MAX(New)
FROM #test_table
GROUP BY LP
),
AllWords AS
(
SELECT LP, item, grp=SUM(grp), itemnumber=MAX(itemnumber)
FROM
(
SELECT LP, grp=-1, item, itemnumber
FROM BothSentences
CROSS APPLY DelimitedSplit8K(Original, ' ')
UNION
SELECT LP, 1, item, itemnumber
FROM BothSentences
CROSS APPLY DelimitedSplit8K(New, ' ')
) a
GROUP BY LP, item
)
SELECT len(LP) - len(replace(LP, ' ', '')) + 1 No_Of_Words, MW=STUFF(
(
SELECT ', ' + item
FROM AllWords b
WHERE grp = 0 AND a.LP = b.LP
ORDER BY itemnumber
FOR XML PATH('')
), 1, 2, '')
FROM AllWords a
GROUP BY LP;
Try using New or Original instead of LP in the LEN functions (not sure which words you are counting). Beware that this method may not work if you have repeating blanks in the data.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply