Count Words in Text Field and/or Column (frequency of use)

  • JonFox (12/6/2011)


    bitbucket-25253 (12/6/2011)


    In case Jeff Moden does not see your request, I believe this is the article he was referring to:

    http://www.sqlservercentral.com/articles/Tally+Table/72993/

    The articles title is:

    Tally OH! An Improved SQL 8K “CSV Splitter” Function

    By Jeff Moden, 2011/05/02

    The resources links are at the very end of the article.

    Oh wow, I hadn't seen that article before! I guess somehow I'd only seen the original article. That is really cool.

    And now you understand my affinity for beer popsicles and dust bunnies. 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I give, how does DelimitedSplit8K split the field MyData into single words?

    CREATE TABLE #TempSplit

    (MyData varchar(200))

    INSERT INTO #TempSplit

    (MyData)

    SELECT 'My SQL skills are broken' UNION ALL

    SELECT 'SQL Server Central rocks' UNION ALL

    SELECT 'My dog at my computer' UNION ALL

    SELECT 'The dog programs average for a dog' UNION ALL

    SELECT 'sql lower case SQL upper CASE' UNION ALL

    SELECT '111 222 3333 111'

  • Normally, each row would have some simple unique identifier (perhaps an IDENTITY column or somesuch) so that you could tell where each split word came from. With that in mind, take a look at the following example to do what you ask...

    DROP TABLE #TempSplit

    --===== Create a table to hold some test data.

    -- This isn't part of the solution.

    CREATE TABLE #TempSplit

    (

    SomeUniqueIdentifier INT IDENTITY(1,1),

    MyData VARCHAR(200)

    )

    ;

    --===== Populate the test table with some test data.

    -- This isn't part of the solution.

    INSERT INTO #TempSplit

    (MyData)

    SELECT 'My SQL skills are broken' UNION ALL

    SELECT 'SQL Server Central rocks' UNION ALL

    SELECT 'My dog at my computer' UNION ALL

    SELECT 'The dog programs average for a dog' UNION ALL

    SELECT 'sql lower case SQL upper CASE' UNION ALL

    SELECT '111 222 3333 111'

    ;

    --===== Demonstrate a simple "split" with the necessary

    -- columns to potentially do a proximity search as

    -- well as other things.

    SELECT source.SomeUniqueIdentifier,

    WordPosition = split.ItemNumber,

    Word = split.Item

    INTO #SplitWord

    FROM #TempSplit source

    CROSS APPLY dbo.DelimitedSplit8K(MyData,' ') split

    ;

    --===== Show the simple content of the table

    SELECT *

    FROM #SplitWord

    ORDER BY SomeUniqueIdentifier, WordPosition

    ;

    --===== Show a simple count of the different words.

    SELECT Word, COUNT(*)

    FROM #SplitWord

    GROUP BY Word

    ;

    Again, if you're going to use the splitter for such things, you may want to modify it to treat various symbols and punctuation marks as if they were spaces and then deduplicate the spaces before splitting on spaces.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff that works flawless, can't wait to run it against the comment database. Thank you!

    It is a neat tool. Would be interesting to run it against Sql Server Centrals forum posts.

  • I have to run this on a few different databases, some of which I only have read permissions. I'm assuming there is a way to work the function into the stored proc, but I can't seem to get it to work.

    SET ANSI_NULLS ON

    GO

    SET QUOTED_IDENTIFIER ON

    GO

    ALTER FUNCTION [dbo].[DelimitedSplit8K]

    --===== Define I/O parameters

    (@pString VARCHAR(8000), @pDelimiter CHAR(1))

    RETURNS TABLE WITH SCHEMABINDING AS

    RETURN

    --===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000...

    -- enough to cover VARCHAR(8000)

    WITH E1(N) AS (

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL

    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1

    ), --10E+1 or 10 rows

    E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows

    E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max

    cteTally(N) AS (--==== This provides the "zero base" and limits the number of rows right up front

    -- for both a performance gain and prevention of accidental "overruns"

    SELECT 0 UNION ALL

    SELECT TOP (DATALENGTH(ISNULL(@pString,1))) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4

    ),

    cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)

    SELECT t.N+1

    FROM cteTally t

    WHERE (SUBSTRING(@pString,t.N,1) = @pDelimiter OR t.N = 0)

    )

    --===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.

    SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY s.N1),

    Item = SUBSTRING(@pString,s.N1,ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000))

    FROM cteStart s

    ;

  • texpic (12/7/2011)


    I have to run this on a few different databases, some of which I only have read permissions. I'm assuming there is a way to work the function into the stored proc, but I can't seem to get it to work.

    Are you saying that you can't create the function in your database?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff, yes. In probably 1/2 the databases I have full permissions, the rest I work with temp files.

  • texpic (12/9/2011)


    Jeff, yes. In probably 1/2 the databases I have full permissions, the rest I work with temp files.

    Pull the data over into a local sandbox? There's really no solution you're going to have that'll run within reason if you can't change something to work with.

    Full Text Indexing is where I'd usually go from here and then dig into the index and histograms, but it's probably overkill if you're not familiar with it.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Ahh ok, I though a function could be part of the stored procedure, that the reason for it was to make it easier to reuse that part of the code. I'll figure out another way to do the servers I'm with read only. Thanks.

  • texpic (12/10/2011)


    Ahh ok, I though a function could be part of the stored procedure, that the reason for it was to make it easier to reuse that part of the code. I'll figure out another way to do the servers I'm with read only. Thanks.

    I'm confused. If you have the privs to make a stored procedure, you should have the privs to make a function. Also, if this is worth doing, why not have the DBA instantiate the splitter function for you on the servers you don't have privs on?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • All I have is read rights on a few servers (can't save a sp either to those databases). I'll just extract the data I need and move it. I was more interested in understanding how to move the function into a sp. Thought I've done that before with other functions.

  • texpic (12/10/2011)


    All I have is read rights on a few servers (can't save a sp either to those databases). I'll just extract the data I need and move it. I was more interested in understanding how to move the function into a sp. Thought I've done that before with other functions.

    Understood but that's the confusing part... you're allowed to create a Stored Procedure but not a Function?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • On the restricted servers I can execute a query, create temp tables, etc. Maybe I can create a sp on them but I'm not supposed to (pretty sure those permissions do not allow it).

  • I didn't find any code for the tally table that you mentioned here. Can you please let me know where I can find that

  • satya.kota2001 (5/21/2013)


    I didn't find any code for the tally table that you mentioned here. Can you please let me know where I can find that

    http://www.sqlservercentral.com/articles/62867/[/url]

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Viewing 15 posts - 16 through 30 (of 32 total)

You must be logged in to reply to this topic. Login to reply