August 3, 2016 at 11:10 am
I have a table called Resume with columns in Text, Word and PDF format. I enabled Full Text Search on the table and able to search the keyword through three columns and also able to rank the profile according to the keyword search .. Used below query snippet for the same:
SELECT ct., ct.[RANK], br.ProfileID ,br.TextVersion, k.KeyWordName
FROM CONTAINSTABLE( dbo.tblResumeTag, TextVersion , @SearchPhrase) as ct
INNER JOIN dbo.tblResumeTag
ON ct. = br.ID
INNER JOIN dbo.tblKeywords k
ON br.TextVersion LIKE '%' + k.KeyWordName + '%'
AND k.KeyWordName LIKE '%' + @SearchPhrase + '%'
ORDER BY ct.[RANK] DESC
Now I want show the occurence(Count ) of the @search phrase in Textversion col, which I am struggling to return the exact count. Any suggestions will be helpful.
Thank you
August 3, 2016 at 1:51 pm
In SQL 2016 you can use the STRING_SPLIT() function to convert your long string value to a series of individual words and then use standard COUNT(*) GROUP BY to get your totals. In the code below, a couple of REPLACE functions were needed so that commas and periods would not be included as part of a word.
d eclare @treatise varchar(max) = ''
d eclare @x int = 0
while @x < 10000
begin
set @x = @x + 1
set @treatise = @treatise + 'Now is the time for all good men to come to the aid of their country. It is of the greatest importance. Man up, men, for fate awaits your country. '
end
;with cte as (select replace(replace(value,'.',''),',','') as Word from string_split(@treatise,' ') )
select word, count(*)
from cte
where word > ''
group by word
order by count(*) desc
Sorry about the D ECLARE, but I couldn't submit the code without making the change. In your code, you should substitute Word > '' in the WHERE clause with Word LIKE your keyword search strings.
Looking at your example, it seems that you want to enter the word BOOK, and if the words BOOKEND and BOOKWORM are in your keyword table, you want to find them, but not words like BOOKING. Is this correct?
Finally I have to ask how big your tables are going to get. I think you are going to take a big performance hit from your joins if you are running this against a large number of rows. If you are going to have hundreds of texts needing to be evaluated, you should first filter them down to WHERE br.TextVersion like @Searchphrase. (If your search phrase isn't in the text, none of the keywords you want will be in the text.) At the very least try producing the list of keywords you want separately with a cte or derived table, then join ON br.TextVersion LIKE '%' + k.KeyWordName + '%'. JOIN clauses should be kept simple as possible with filtering pushed out to the WHERE clause.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 3, 2016 at 2:34 pm
Thank you so much !! Appreciate your help. Yeah got it right I want to find words with '%BOOK% ' . The table does have approx 20,000 rows.
August 3, 2016 at 9:16 pm
I am using SQL server 2014 .. Apologies for posting in wrong section.
I don't understand the use of 'value' ... what does that stands for ?
with cte as (select replace(replace(value,'.',''),',','') as Word from string_split(@treatise,' ') )
Thank you
August 3, 2016 at 9:59 pm
shambhavi13 (8/3/2016)
I have a table called Resume with columns in Text, Word and PDF format. I enabled Full Text Search on the table and able to search the keyword through three columns and also able to rank the profile according to the keyword search .. Used below query snippet for the same:SELECT ct., ct.[RANK], br.ProfileID ,br.TextVersion, k.KeyWordName
FROM CONTAINSTABLE( dbo.tblResumeTag, TextVersion , @SearchPhrase) as ct
INNER JOIN dbo.tblResumeTag
ON ct. = br.ID
INNER JOIN dbo.tblKeywords k
ON br.TextVersion LIKE '%' + k.KeyWordName + '%'
AND k.KeyWordName LIKE '%' + @SearchPhrase + '%'
ORDER BY ct.[RANK] DESC
Now I want show the occurence(Count ) of the @search phrase in Textversion col, which I am struggling to return the exact count. Any suggestions will be helpful.
Thank you
As a bit of a sidebar, I think that this is a really poor way to evaluate resumes. If someone were looking for "SSIS" as a key word, I could type "I hate SSIS" in obfuscated text 100 times and this method would pick me as a leader for "SSIS" skills just based on a word count.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 4, 2016 at 7:16 am
shambhavi13 (8/3/2016)
I have a table called Resume with columns in Text, Word and PDF format. I enabled Full Text Search on the table and able to search the keyword through three columns and also able to rank the profile according to the keyword search .. Used below query snippet for the same:SELECT ct., ct.[RANK], br.ProfileID ,br.TextVersion, k.KeyWordName
FROM CONTAINSTABLE( dbo.tblResumeTag, TextVersion , @SearchPhrase) as ct
INNER JOIN dbo.tblResumeTag
ON ct. = br.ID
INNER JOIN dbo.tblKeywords k
ON br.TextVersion LIKE '%' + k.KeyWordName + '%'
AND k.KeyWordName LIKE '%' + @SearchPhrase + '%'
ORDER BY ct.[RANK] DESC
Now I want show the occurence(Count ) of the @search phrase in Textversion col, which I am struggling to return the exact count. Any suggestions will be helpful.
Thank you
See this article: Nasty Fast N-Grams (Part 1)[/url]. Specifically the section titled, "basic text analytics on resumes stored in a table".
This is exactly the kind of thing that a good NGrams function is for.
-- Itzik Ben-Gan 2001
August 4, 2016 at 9:44 am
Thank you . I will look into the same.
August 4, 2016 at 10:22 am
shambhavi13 (8/3/2016)
I am using SQL server 2014 .. Apologies for posting in wrong section.I don't understand the use of 'value' ... what does that stands for ?
with cte as (select replace(replace(value,'.',''),',','') as Word from string_split(@treatise,' ') )
Thank you
Value is simply the name of the column returned by the STRING_SPLIT() function.
If you aren't on 2016 yet, you should go HERE[/url] to get a copy of Jeff Moden's excellent delimited splitter function. ( By the way, Jeff. Would you trust any paper resume' nowadays? )
Also, with respect to the N-grams. All you are asking for is unigrams (in your example, individual words), and the rest of it is probably beyond the scope of your current problem.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 4, 2016 at 11:02 am
Also, with respect to the N-grams. All you are asking for is unigrams (in your example, individual words), and the rest of it is probably beyond the scope of your current problem.
In this case it would be <size of the search word>-grams...
-- Resume Table
DECLARE @tblResumeTag TABLE ( int identity primary key, TextVersion varchar(8000));
INSERT @tblResumeTag(TextVersion) VALUES
('Joe Doe - Objective: To use my many years of SQL experience... blah blah SQL...
blah SQL this & SQL that... SSIS.. SSIS.. SSIS..'),
('John Green, PMP, MSM - SQL Guru... yada yada SQL... blah blah blah... SSRS... SSAS'),
('Sue Jones Objective: To utilize my awesome skills and experience... yada SQL...
blah SQL this & SQL that... more experience with SQL... SSIS... SSRS');
-- Keywords table
DECLARE @keywords TABLE (KeyWordName varchar(100));
INSERT @keywords VALUES('SQL'),('SSIS'),('SSRS');
-- Getting all instances of the KeyWordName
SELECT , TextVersion, KeyWordName
FROM @tblResumeTag c
CROSS JOIN @keywords k
CROSS APPLY dbo.NGrams8k(c.TextVersion, LEN(k.KeyWordName))
WHERE token = k.KeyWordName
ORDER BY ; -- order by for display only, not required
-- To group by each keyword found
SELECT , KeyWordName, KWSubTotal = COUNT(*)
FROM @tblResumeTag c
CROSS JOIN @keywords k
CROSS APPLY dbo.NGrams8k(c.TextVersion, LEN(k.KeyWordName))
WHERE token = k.KeyWordName
GROUP BY , KeyWordName;
-- To group by number of all keywords found
SELECT , KWSubTotal = COUNT(*)
FROM @tblResumeTag c
CROSS JOIN @keywords k
CROSS APPLY dbo.NGrams8k(c.TextVersion, LEN(k.KeyWordName))
WHERE token = k.KeyWordName
GROUP BY ;
And doing it like this might outperform a splitter-based approach.
-- Itzik Ben-Gan 2001
August 4, 2016 at 12:13 pm
I've read the Nasty Fast N-Grams article and the approach is very similar to Jeff's DelimitedSplit8K() function. The primary difference being that the N-grams would be fixed length strings starting at every Nth character of the input string while the words produced by a delimited split would be of varying length, using space(1) as a delimiter.
By using fixed length strings the N-gram approach avoids having to use charindex to find the end of the word. That means each fixed-length N-gram requires slightly fewer CPU cycles to identify than would a word. But a much larger number of N-gram rows are being produced, as a opposed to word rows. These will have to be filtered which would add some cycles back. I'm sure the size of the input string(s) is important to the relative performance, but I haven't tested both techniques.
How would the N-gram handle a situation where the initial search identifies BOOKWORM and BOOKEND as two keywords to match? They have different lengths. Would two passes be necessary? I'm genuinely curious because, although the benefits of bigrams, trigrams, etc. are clear, the benefits of unigrams over words in this instance is muddy (to me).
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 4, 2016 at 12:48 pm
ALan....consider the following please
DECLARE @tblResumeTag TABLE ( int identity primary key, TextVersion varchar(8000));
INSERT @tblResumeTag(TextVersion) VALUES
('I accelerated the development of the system by making rationale decisions based on my collaboration with senior corporate managers')
DECLARE @keywords TABLE (KeyWordName varchar(100));
INSERT @keywords VALUES('Rat')
SELECT , TextVersion, KeyWordName
FROM @tblResumeTag c
CROSS JOIN @keywords k
CROSS APPLY dbo.NGrams8k(c.TextVersion, LEN(k.KeyWordName))
WHERE token = k.KeyWordName
is there a way to use your ngrams for whole words?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
August 4, 2016 at 1:26 pm
If you were looking for 'Rat' (the word) you could look for Ngrams like '% RAT %'. You'd still have to use REPLACE to handle punctuation marks. The problem I see is that multiple N-grams would be produced.
For example if we were doing unigrams of length 7.
'He is a rat Mugsy.' becomes
'He is a'
'e is a '
' s a ra'
's a rat'
' a rat '
'a rat M'
' rat MU'
etc
The last three of these would satisfy LIKE '% RAT %'.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
August 8, 2016 at 4:32 pm
Thank you all for your help. But as I was working on the queries I realized I have three columns in my table namely Text, Word and PDF Resumes .. I want to display the candidate rank as an avg of all the three columns . Any suggestions will be helpful. And also currently I am using if else statement to choose the column is there any better way so that I can search all three columns at once ?
August 8, 2016 at 4:59 pm
This kind of begs the question... If I had a resume like
SQL Developer 2<24 months>
Company ABC
SSMS, SSRS, SSAS
SQL Developer 1<36 months>
Company XYZ
SMS, SSRS
how would you count my experience? doing word counts seems kind of a simplistic metric. would it not be better to do {skill, time period} at the very least?
August 8, 2016 at 5:34 pm
You are absolutely right... but the idea here is just to tag the candidate with skills like JAVA(6), C++(5) .. where 6, 5 is the number of keyword occurrence.
Viewing 15 posts - 1 through 15 (of 27 total)
You must be logged in to reply to this topic. Login to reply