Count Data in Large Seprated Text

Question

Count Data in Large Seprated Text

Viewing 10 posts - 16 through 24 (of 24 total)

You must be logged in to reply to this topic. Login to reply

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 1

dwain.c (4/8/2012)
It turns out this will be extensible to when you, despite others recommendations to not do this, you extend your strings to contain other values besides City and Country. 😛

Nice code. It's likely going to be a bit slower than using a Tally Table splitter though. Concatenation is relatively expensive.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 2

Jeff,

Thanks for the compliment even if it was a bit hesitant.

You can't blame me for wanting to have a bit of fun with this one as I'm still in my learning years!

I knew already, having worked with XML solutions like this in the past, that performance isn't always that great. But sometimes I just like to offer an unusual approach. While performance is important, there are other concerns sometimes when the performance differential isn't that big.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 3

dwain.c (4/9/2012)
Jeff,
Thanks for the compliment even if it was a bit hesitant.
You can't blame me for wanting to have a bit of fun with this one as I'm still in my learning years!
I knew already, having worked with XML solutions like this in the past, that performance isn't always that great. But sometimes I just like to offer an unusual approach. While performance is important, there are other concerns sometimes when the performance differential isn't that big.

Not really true. What if someone comes along and sees some code that will work for their task, but doesn't realize that the code they copied is inefficient for the task they are using it for?

Dwain Camps SSC Guru Points: 86978 More actions · Answer 4

Lynn,

Not really true. What if someone comes along and sees some code that will work for their task, but doesn't realize that the code they copied is inefficient for the task they are using it for?

The difference is rational behaviour. If someone copies some code because it looks like it will work for their situation but it performs more poorly than another method, and they have not investigated alternatives, that is because they are lazy.

If I choose to use a slightly slower performing version of a piece of code because of maintainability, extensibility or sometimes just to maintain consistency within a solution, I have made a rational decision. Granted, my rational thinking may not always be apparent to followers, nor would it necessarily stand the test of time, but it was rational for the time and place.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ahmedhussein787 SSC Veteran Points: 231 More actions · Answer 5

thanks dwain.c

what if the cell may contain more than one country or more than one city

SELECT 1, 'Ronaldo', 'Country:Spain, City:Madrid, Country:Spain'

UNION ALL SELECT 2, 'Messi','Country:Spain, City:Barcelona, City:Madrid'

UNION ALL SELECT 3,'Rooney','Country:England, City:Manchester'

UNION ALL SELECT 4,'Robben','Country:Netherlands, City:Amsterdam'

UNION ALL SELECT 5,'Del Piero','Country:Italy, City:Milan'

UNION ALL SELECT 6,'Totti','Country:Italy, City:Milan'

UNION ALL SELECT 7,'Benzema','Country:Spain, City:Madrid'

UNION ALL SELECT 8,'Ozil','Country:Spain, City:Madrid'

UNION ALL SELECT 9,'KAKA','Country:Spain, City:Madrid'

UNION ALL SELECT 10,'Beckham','Country:England, City:Manchester'

UNION ALL SELECT 11,'Dwain', 'Country:Thailand'

how can i handle that

ahmedhussein787 SSC Veteran Points: 231 More actions · Answer 6

I tried that ...to convert each attribute as separated Element

-----------------------------------------------------------------------

--------------------------------------------------------------------------

DECLARE @l TABLE (ID INT, [Name] VARCHAR(30), Location VARCHAR(100))

INSERT INTO @l (ID, [Name], Location)

SELECT 1, 'Ronaldo', 'Country:Spain, City:Madrid, Country:France'

UNION ALL SELECT 2, 'Messi','Country:Spain, City:Barcelona'

UNION ALL SELECT 3,'Rooney','Country:England, City:Manchester'

;WITH CTE AS (

SELECT CAST('<R><L '+REPLACE(REPLACE(Location, ':', '="'), ',', '"/></R><R><L ') + '"/></R>' AS XML) AS Y

FROM @l)

SELECT Country, COUNT(Country)

FROM (

SELECT x.I.value('@Country[1]', 'VARCHAR(MAX)') AS Country

FROM (SELECT * FROM CTE) z

CROSS APPLY Y.nodes('//R/L') AS x(I)) w

GROUP BY Country

----------------------------------------------------------------

--------------------------------------------------------------------------

but always the first record come with null value with count 0 ....i don't know why ...could you tell me how avoid that ...without put that condition in select statement "WHERE Country is not null"

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 7

ahmedhussein787 (5/8/2012)
thanks dwain.c
what if the cell may contain more than one country or more than one city
SELECT 1, 'Ronaldo', 'Country:Spain, City:Madrid, Country:Spain'
UNION ALL SELECT 2, 'Messi','Country:Spain, City:Barcelona, City:Madrid'
UNION ALL SELECT 3,'Rooney','Country:England, City:Manchester'
UNION ALL SELECT 4,'Robben','Country:Netherlands, City:Amsterdam'
UNION ALL SELECT 5,'Del Piero','Country:Italy, City:Milan'
UNION ALL SELECT 6,'Totti','Country:Italy, City:Milan'
UNION ALL SELECT 7,'Benzema','Country:Spain, City:Madrid'
UNION ALL SELECT 8,'Ozil','Country:Spain, City:Madrid'
UNION ALL SELECT 9,'KAKA','Country:Spain, City:Madrid'
UNION ALL SELECT 10,'Beckham','Country:England, City:Manchester'
UNION ALL SELECT 11,'Dwain', 'Country:Thailand'
how can i handle that

What do you expect the output to be for rows 1 and 2?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

ahmedhussein787 SSC Veteran Points: 231 More actions · Answer 8

The Output shall get the Count For Each Country and Each City

Dwain Camps SSC Guru Points: 86978 More actions · Answer 9

Is this the result you are looking for:

DECLARE @l TABLE (ID INT, [Name] VARCHAR(30), Location VARCHAR(100))

DECLARE @d CHAR(1)

SET @d = ','-- Delimiter

INSERT INTO @l (ID, [Name], Location)

SELECT 1, 'Ronaldo', 'Country:Spain, City:Madrid, Country:Spain'

UNION ALL SELECT 2, 'Messi','Country:Spain, City:Barcelona, City:Madrid'

UNION ALL SELECT 3,'Rooney','Country:England, City:Manchester'

UNION ALL SELECT 4,'Robben','Country:Netherlands, City:Amsterdam'

UNION ALL SELECT 5,'Del Piero','Country:Italy, City:Milan'

UNION ALL SELECT 6,'Totti','Country:Italy, City:Milan'

UNION ALL SELECT 7,'Benzema','Country:Spain, City:Madrid'

UNION ALL SELECT 8,'Ozil','Country:Spain, City:Madrid'

UNION ALL SELECT 9,'KAKA','Country:Spain, City:Madrid'

UNION ALL SELECT 10,'Beckham','Country:England, City:Manchester'

UNION ALL SELECT 11,'Dwain', 'Country:Thailand'

SELECT [CountryCity], [Name], COUNT([Name]) AS [Count]

FROM (

SELECT SUBSTRING(LTRIM(strcol), 1, CHARINDEX(':', LTRIM(strcol))) AS [CountryCity]

,SUBSTRING(LTRIM(strcol), 1+CHARINDEX(':', LTRIM(strcol)), LEN(LTRIM(strcol))) AS [Name]

FROM @l

CROSS APPLY SplitString(',', Location)) x

GROUP BY [CountryCity], [Name]

Note that SplitString is any string splitter function (like the excellent one from Jeff posted here: http://www.sqlservercentral.com/articles/Tally+Table/72993/) and strcol is the name of the column returned by that SplitString function.

The SplitString function I used may return a different column name than Jeff's.

The result produced by the above is:

CountryCityNameCount

City:Amsterdam1

City:Barcelona1

Country:England2

Country:Italy2

City:Madrid5

City:Manchester2

City:Milan2

Country:Netherlands1

Country:Spain6

Country:Thailand1

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

ahmedhussein787 SSC Veteran Points: 231 More actions · Answer 10

ahmedhussein787

SSC Veteran

Points: 231

May 9, 2012 at 6:03 am

#1484995

Thanks 🙂