Grouping the words which has maximum count

  • Hi All,

    I have to highest two word in the columns. It should have the maximum two words combination.  I have attached the excel for reference.

     

     

    Attachments:
    You must be logged in to view attached files.
  • This was removed by the editor as SPAM

  • This was removed by the editor as SPAM

  • Is it possible to have an sql-script posted as code to reproduce your question so we don't have to open a possible dangerous excelfile

  • This is my SQL Query to get the single words. I need two words grouping

     

    SELECT value AS keyword, COUNT(*) AS keyword_count

    FROM (

    SELECT value

    FROM [title_one_space]

    CROSS APPLY STRING_SPLIT([title], ' ')

    ) keywords

    GROUP BY value

    ORDER BY keyword_count DESC;

  • This is the column

  • Need the result like this

     

  • We cannot dump an image of your data in SSMS to test any code.

    So, I made up some dummy data for myself.

    The following code appears to do what you are looking for (NOTE the extra parameter in STRING_SPLIT)

    WITH cteTwoWords AS (
    SELECT TwoWords = ss.value + ' ' + LEAD(ss.value) OVER (PARTITION BY td.[title] ORDER BY ss.ordinal)
    FROM [title_one_space] AS td
    CROSS APPLY STRING_SPLIT(td.[title], ' ', 1) AS ss
    )
    SELECT TwoWord_group = cte.TwoWords
    , TwoWord_count = COUNT(*)
    FROM cteTwoWords AS cte
    WHERE TwoWords IS NOT NULL
    GROUP BY TwoWords
    ORDER BY COUNT(*) DESC;
  • This is the error message I had got

     

    Msg 8144, Level 16, State 3, Line 17

    Procedure or function STRING_SPLIT has too many arguments specified.

    Completion time: 2023-05-03T21:27:20.2644947+05:30

  • That error seems to indicate you're not using SQL Server 2022. (Which is probably why DesNorton wrote " (NOTE the extra parameter in STRING_SPLIT)"

    Are you on SQL Server 2022? You posted in a 2022 forum.

  • As ratbak indicated, you have posted in a SQL 2022 forum.  As such, my proposed solution uses functionality that is available in SQL 2022.

    If you are using SQL 2012 to 2019, then I suggest that you read Reaping the benefits of the Window functions in T-SQL by Eirikur Eiriksson.  Under the resources section, you will find a link to download the code for CREATE_FUNCTION_dbo_DelimitedSplit8K_LEAD, which you can install and use.

    My suggested solution is updated to use DelimitedSplit8K_LEAD

    WITH cteTwoWords AS (
    SELECT TwoWords = ss.Item + ' ' + LEAD(ss.Item) OVER (PARTITION BY td.[title] ORDER BY ss.ItemNumber)
    FROM [title_one_space] AS td
    CROSS APPLY dbo.DelimitedSplit8K_LEAD(td.[title], ' ') AS ss
    )
    SELECT TwoWord_group = cte.TwoWords
    , TwoWord_count = COUNT(*)
    FROM cteTwoWords AS cte
    WHERE TwoWords IS NOT NULL
    GROUP BY TwoWords
    ORDER BY COUNT(*) DESC;
  • If you are using an earlier version of SQL, then I suggest that you read Tally OH! An Improved SQL 8K “CSV Splitter” Function by Jeff Moden.  Under the resources section, you will find a link to download the code for the function, which you can install and use.  There is even a version that will work with nvarchar(4000) strings.

    My suggested solution is updated to use DelimitedSplit8K

    WITH cteTwoWords AS (
    SELECT TwoWords = ss.Item + ' ' + LEAD(ss.Item) OVER (PARTITION BY td.[title] ORDER BY ss.ItemNumber)
    FROM [title_one_space] AS td
    CROSS APPLY dbo.DelimitedSplit8K(td.[title], ' ') AS ss
    )
    SELECT TwoWord_group = cte.TwoWords
    , TwoWord_count = COUNT(*)
    FROM cteTwoWords AS cte
    WHERE TwoWords IS NOT NULL
    GROUP BY TwoWords
    ORDER BY COUNT(*) DESC;
  • This was the error message

    Msg 208, Level 16, State 1, Line 14

    Invalid object name 'dbo.DelimitedSplit8K'

     

  • sathishkm wrote:

    This was the error message

    Msg 208, Level 16, State 1, Line 14 Invalid object name 'dbo.DelimitedSplit8K'

    Did you download and install dbo.DelimitedSplit8K on your SQL instance?

  • An alternative

    drop table if exists #title_one_space;
    go
    select * into #title_one_space from (values
    ('one word')
    ,('one word two words')
    ,('one word two words three words')
    ,('one word two words three words four')) v(title);

    select x.two_word, count(*) two_word_count
    from #title_one_space tos
    cross apply dbo.fnTally(1, len(title)-len(replace(title, ' ', ''))) fn
    cross apply (select string_agg(splt.value, ' ')
    within group (order by splt.ordinal)
    from string_split(tos.title, ' ', 1) splt
    where splt.ordinal between fn.n and fn.n+1) x(two_word)
    group by x.two_word
    order by two_word_count desc;

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

Viewing 15 posts - 1 through 15 (of 22 total)

You must be logged in to reply to this topic. Login to reply