Count Data in Large Seprated Text

  • dwain.c (4/8/2012)


    It turns out this will be extensible to when you, despite others recommendations to not do this, you extend your strings to contain other values besides City and Country. 😛

    Nice code. It's likely going to be a bit slower than using a Tally Table splitter though. Concatenation is relatively expensive.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff,

    Thanks for the compliment even if it was a bit hesitant.

    You can't blame me for wanting to have a bit of fun with this one as I'm still in my learning years!

    I knew already, having worked with XML solutions like this in the past, that performance isn't always that great. But sometimes I just like to offer an unusual approach. While performance is important, there are other concerns sometimes when the performance differential isn't that big.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • dwain.c (4/9/2012)


    Jeff,

    Thanks for the compliment even if it was a bit hesitant.

    You can't blame me for wanting to have a bit of fun with this one as I'm still in my learning years!

    I knew already, having worked with XML solutions like this in the past, that performance isn't always that great. But sometimes I just like to offer an unusual approach. While performance is important, there are other concerns sometimes when the performance differential isn't that big.

    Not really true. What if someone comes along and sees some code that will work for their task, but doesn't realize that the code they copied is inefficient for the task they are using it for?

  • Lynn,

    Not really true. What if someone comes along and sees some code that will work for their task, but doesn't realize that the code they copied is inefficient for the task they are using it for?

    The difference is rational behaviour. If someone copies some code because it looks like it will work for their situation but it performs more poorly than another method, and they have not investigated alternatives, that is because they are lazy.

    If I choose to use a slightly slower performing version of a piece of code because of maintainability, extensibility or sometimes just to maintain consistency within a solution, I have made a rational decision. Granted, my rational thinking may not always be apparent to followers, nor would it necessarily stand the test of time, but it was rational for the time and place.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • thanks dwain.c

    what if the cell may contain more than one country or more than one city

    SELECT 1, 'Ronaldo', 'Country:Spain, City:Madrid, Country:Spain'

    UNION ALL SELECT 2, 'Messi','Country:Spain, City:Barcelona, City:Madrid'

    UNION ALL SELECT 3,'Rooney','Country:England, City:Manchester'

    UNION ALL SELECT 4,'Robben','Country:Netherlands, City:Amsterdam'

    UNION ALL SELECT 5,'Del Piero','Country:Italy, City:Milan'

    UNION ALL SELECT 6,'Totti','Country:Italy, City:Milan'

    UNION ALL SELECT 7,'Benzema','Country:Spain, City:Madrid'

    UNION ALL SELECT 8,'Ozil','Country:Spain, City:Madrid'

    UNION ALL SELECT 9,'KAKA','Country:Spain, City:Madrid'

    UNION ALL SELECT 10,'Beckham','Country:England, City:Manchester'

    UNION ALL SELECT 11,'Dwain', 'Country:Thailand'

    how can i handle that

  • I tried that ...to convert each attribute as separated Element

    -----------------------------------------------------------------------

    --------------------------------------------------------------------------

    DECLARE @l TABLE (ID INT, [Name] VARCHAR(30), Location VARCHAR(100))

    INSERT INTO @l (ID, [Name], Location)

    SELECT 1, 'Ronaldo', 'Country:Spain, City:Madrid, Country:France'

    UNION ALL SELECT 2, 'Messi','Country:Spain, City:Barcelona'

    UNION ALL SELECT 3,'Rooney','Country:England, City:Manchester'

    ;WITH CTE AS (

    SELECT CAST('<R><L '+REPLACE(REPLACE(Location, ':', '="'), ',', '"/></R><R><L ') + '"/></R>' AS XML) AS Y

    FROM @l)

    SELECT Country, COUNT(Country)

    FROM (

    SELECT x.I.value('@Country[1]', 'VARCHAR(MAX)') AS Country

    FROM (SELECT * FROM CTE) z

    CROSS APPLY Y.nodes('//R/L') AS x(I)) w

    GROUP BY Country

    ----------------------------------------------------------------

    --------------------------------------------------------------------------

    but always the first record come with null value with count 0 ....i don't know why ...could you tell me how avoid that ...without put that condition in select statement "WHERE Country is not null"

  • ahmedhussein787 (5/8/2012)


    thanks dwain.c

    what if the cell may contain more than one country or more than one city

    SELECT 1, 'Ronaldo', 'Country:Spain, City:Madrid, Country:Spain'

    UNION ALL SELECT 2, 'Messi','Country:Spain, City:Barcelona, City:Madrid'

    UNION ALL SELECT 3,'Rooney','Country:England, City:Manchester'

    UNION ALL SELECT 4,'Robben','Country:Netherlands, City:Amsterdam'

    UNION ALL SELECT 5,'Del Piero','Country:Italy, City:Milan'

    UNION ALL SELECT 6,'Totti','Country:Italy, City:Milan'

    UNION ALL SELECT 7,'Benzema','Country:Spain, City:Madrid'

    UNION ALL SELECT 8,'Ozil','Country:Spain, City:Madrid'

    UNION ALL SELECT 9,'KAKA','Country:Spain, City:Madrid'

    UNION ALL SELECT 10,'Beckham','Country:England, City:Manchester'

    UNION ALL SELECT 11,'Dwain', 'Country:Thailand'

    how can i handle that

    What do you expect the output to be for rows 1 and 2?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The Output shall get the Count For Each Country and Each City

  • Is this the result you are looking for:

    DECLARE @l TABLE (ID INT, [Name] VARCHAR(30), Location VARCHAR(100))

    DECLARE @d CHAR(1)

    SET @d = ','-- Delimiter

    INSERT INTO @l (ID, [Name], Location)

    SELECT 1, 'Ronaldo', 'Country:Spain, City:Madrid, Country:Spain'

    UNION ALL SELECT 2, 'Messi','Country:Spain, City:Barcelona, City:Madrid'

    UNION ALL SELECT 3,'Rooney','Country:England, City:Manchester'

    UNION ALL SELECT 4,'Robben','Country:Netherlands, City:Amsterdam'

    UNION ALL SELECT 5,'Del Piero','Country:Italy, City:Milan'

    UNION ALL SELECT 6,'Totti','Country:Italy, City:Milan'

    UNION ALL SELECT 7,'Benzema','Country:Spain, City:Madrid'

    UNION ALL SELECT 8,'Ozil','Country:Spain, City:Madrid'

    UNION ALL SELECT 9,'KAKA','Country:Spain, City:Madrid'

    UNION ALL SELECT 10,'Beckham','Country:England, City:Manchester'

    UNION ALL SELECT 11,'Dwain', 'Country:Thailand'

    SELECT [CountryCity], [Name], COUNT([Name]) AS [Count]

    FROM (

    SELECT SUBSTRING(LTRIM(strcol), 1, CHARINDEX(':', LTRIM(strcol))) AS [CountryCity]

    ,SUBSTRING(LTRIM(strcol), 1+CHARINDEX(':', LTRIM(strcol)), LEN(LTRIM(strcol))) AS [Name]

    FROM @l

    CROSS APPLY SplitString(',', Location)) x

    GROUP BY [CountryCity], [Name]

    Note that SplitString is any string splitter function (like the excellent one from Jeff posted here: http://www.sqlservercentral.com/articles/Tally+Table/72993/) and strcol is the name of the column returned by that SplitString function.

    The SplitString function I used may return a different column name than Jeff's.

    The result produced by the above is:

    CountryCityNameCount

    City:Amsterdam1

    City:Barcelona1

    Country:England2

    Country:Italy2

    City:Madrid5

    City:Manchester2

    City:Milan2

    Country:Netherlands1

    Country:Spain6

    Country:Thailand1


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Thanks 🙂

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply