April 22, 2016 at 12:00 am
Comments posted to this topic are about the item Term Extraction Tokens
April 22, 2016 at 12:22 am
Very good question for the end of the week. I had to do a bit of research on this, so I learnt something new. Thank you.
...
April 22, 2016 at 1:54 am
Nice topic.
Thank you for the question.
April 22, 2016 at 1:57 am
Never seen it before - no points to end the week!
April 22, 2016 at 1:59 am
paul s-306273 (4/22/2016)
Never seen it before - no points to end the week!
Having now read the tech. article, the answer is obvious.
April 22, 2016 at 5:01 am
Nice question. Thanks, Steve. Good timing, I'm going through some of the SSIS topics on transformations this week.
April 22, 2016 at 5:26 am
I'd never even heard of it before. Not being an SSIS user, I'm not surprised I missed it.
April 22, 2016 at 5:38 am
I am confused.
I have never used this transformation and I cannot test it, so I hope someone else will and can give the definitive answer. However, I expected the answer to be two, based on reading exactly the article that is mentioned in the explanation.
The article describes the phases of the process. Splitting on word boundaries is the first, followed by tagging. In that phase "The" and "is" are discarded; "date" is tagged as a noun; "January" is tagged as a proper noun; and "4" and "2015" are tagged as numbers.
But these four words are not all returned. Which ones are returned is determined in a later phase, depending on the configuration of the component: nouns only, noun phrases only, or both.
The word "date" clearly qualifies as a noun. "January", a proper noun, does the same. I am not sure if "4" and "2015" are added to make this a noun phrase or not, as all the examples focus on adjectives. So the second returned term might be either the noun "January" or the noun phrase "January 4, 2015".
Perhaps I am misreading something. Perhaps numbers are also considered nouns, in which case the article is misleading. I do not have an SSIS installation at hand so I cannot test it, but I hope someone else will, and then share the results here.
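The phases described above can be sketched in a few lines of Python. This is only a toy illustration of the reading in this post, not the SSIS implementation: the input sentence is reconstructed from the words quoted above, and the stopword list and tag table (`STOPWORDS`, `TAGS`) are hand-written assumptions.

```python
import re

# Assumed input, reconstructed from the words quoted in this thread.
SENTENCE = "The date is January 4, 2015"

# Hypothetical stopword list and tag table, hand-written for this one sentence.
STOPWORDS = {"the", "is"}
TAGS = {"date": "noun", "January": "proper noun"}


def tokenize(text):
    """Phase 1: split on word boundaries."""
    return re.findall(r"\w+", text)


def tag(tokens):
    """Phase 2: drop stopwords, then tag what is left."""
    tagged = []
    for token in tokens:
        if token.lower() in STOPWORDS:
            continue
        if token.isdigit():
            tagged.append((token, "number"))
        else:
            tagged.append((token, TAGS.get(token, "unknown")))
    return tagged


def noun_candidates(tagged):
    """Later phase: keep only nouns and proper nouns."""
    return [token for token, kind in tagged if kind in ("noun", "proper noun")]


tokens = tokenize(SENTENCE)
tagged = tag(tokens)
print(len(tokens), len(tagged), noun_candidates(tagged))
```

Under these assumptions the sketch gives six raw tokens, four tagged tokens, and two noun candidates. Whether the real component treats the numbers the same way is exactly the open question.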
April 22, 2016 at 5:45 am
I was confused as well, so I tested it quickly: only Date and January are returned (see attachment).
Also, with the default settings nothing is returned. The frequency threshold needs to be changed from 2 to 1.
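The effect of the frequency threshold can be illustrated with a small Python sketch. It is not how SSIS implements the setting: `filter_by_frequency` is a hypothetical helper, it assumes the threshold means "occurs at least this many times", and the candidate list simply reuses the two terms reported above.

```python
from collections import Counter


def filter_by_frequency(candidates, threshold):
    """Keep terms that occur at least `threshold` times (case-insensitive)."""
    counts = Counter(term.lower() for term in candidates)
    return sorted(term for term, count in counts.items() if count >= threshold)


# Each candidate term appears only once in the single test sentence.
candidates = ["Date", "January"]

default_result = filter_by_frequency(candidates, 2)  # default threshold of 2
lowered_result = filter_by_frequency(candidates, 1)  # threshold lowered to 1
print(default_result)
print(lowered_result)
```

With a single input row every term has a count of 1, so a threshold of 2 filters everything out, and lowering it to 1 lets both terms through.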
April 22, 2016 at 6:41 am
I had not seen this transformation before, but I happen to be taking a course with an element of natural language processing right now. There are actually 6 tokens here, and I would have gotten it wrong if 6 had been a choice. So I had to think through the options to get a different answer. I correctly guessed that the stopwords ("the", "is") were ignored and correctly guessed 4.
Good timing for me personally with this question.
April 25, 2016 at 6:40 am
Thanks for the question.
April 26, 2016 at 3:41 am
Interesting question, thanks.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
May 9, 2016 at 11:07 am
I have been using this one for a while, and only date and January will be returned. The numbers will be tokenized and discarded.
Joshua Perry
http://www.greenarrow.net