Group on predominant fuzzy time

Question

Post reply

Group on predominant fuzzy time

Erin Ramsay

SSCrazy Eights

Points: 9728
More actions
October 4, 2013 at 10:22 am

#278698

Ok, this will get you started with sample data:
create table #groupings (
ID int,
EventName varchar(255),
Venue varchar(255),
EventDate datetime
)
insert into #groupings
values (1,'Football','Arena','2013-10-04 09:58:42.670'),
(2,'Football','Arena','2013-10-04 09:58:42.670'),
(3,'Football','Arena','2013-10-04 10:45:42.670'),
(4,'Football','Arena','2013-10-04 09:58:42.670'),
(1,'Soccer','Field','2013-10-04 09:58:42.670'),
(2,'Soccer','Field','2013-10-04 09:38:42.670'),
(3,'Soccer','Field','2013-10-04 09:58:42.670'),
(4,'Soccer','Field','2013-10-04 09:28:42.670')
select min(EventName) EventName, Venue, EventDate,
stuff((select ',' + cast(ID as varchar) from #groupings where Venue = g.venue and Eventdate = g.eventdate for xml path ('')),1,1,'') IDs
from #groupings g
group by Venue, EventDate
order by venue, eventdate
drop table #groupings
This gives me:
EventName VenueEventDate IDs
FootballArena 2013-10-04 09:58:42.6701,2,4
FootballArena 2013-10-04 10:45:42.6703
SoccerField 2013-10-04 09:28:42.6704
SoccerField 2013-10-04 09:38:42.6702
SoccerField 2013-10-04 09:58:42.6701,3
and what I need is:
EventName VenueEventDate IDs
FootballArena 2013-10-04 09:58:42.6701,2,3,4
SoccerField 2013-10-04 09:58:42.6701,2,3,4
I know, group on EventName and Venue instead of time but that's not possible as the Event names in my real life data are all different and I have no option but to group on Venue and DateTime (thus the min(EventName) to homogenize them).
What I don't know how to do is how to do something like:
group by Venue, fuzzy-time(EventDate +/- an hour). Oh, and if I really want the data to be correct I need the incorrect times to yield to the majority. If there are two identical times and one unique time I need the result to be the time the "majority" agrees on.
Ideas?
Erin

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

dva2007 SSCertifiable Points: 7707 More actions · Answer 1

Sorry could you please correct your SQL which populates the table?

Erin Ramsay SSCrazy Eights Points: 9728 More actions · Answer 2

(8 row(s) affected)

Works just fine on my machine. Anyone else having difficulties with the SQL?

dva2007 SSCertifiable Points: 7707 More actions · Answer 3

I had to change few things but try this:

create table #groupings (

ID int,

EventName varchar(255),

Venue varchar(255),

EventDate datetime

)

insert into #groupings

select 1,'Football','Arena','2013-10-04 09:58:42.670'

UNion

select 2,'Football','Arena','2013-10-04 09:58:42.670'

union

select 3,'Football','Arena','2013-10-04 10:45:42.670'

union

select 4,'Football','Arena','2013-10-04 09:58:42.670'

union

select 1,'Soccer','Field','2013-10-04 09:58:42.670'

union

select 2,'Soccer','Field','2013-10-04 09:38:42.670'

union

select 3,'Soccer','Field','2013-10-04 09:58:42.670'

union

select 4,'Soccer','Field','2013-10-04 09:28:42.670'

select EventName, Venue, min(EventDate),

substring((select ',' + cast(ID as varchar)

from #groupings g1

where g1.EventName= g2.EventName

order by g1.EventName

for xml path ('')),2, 1000) [Students]

from #groupings g2

group by EventName, Venue

Erin Ramsay SSCrazy Eights Points: 9728 More actions · Answer 4

A valiant effort, DVA2007, but you'll note in my first post that I cannot group on Event Name because they are disparate (sorry I mislead you with my test data).

For example, you've got:

Football

Raiders vs Broncos

Pigskin huddle

Football

as event names instead of the generic 'football' that I've put in place there. Thus the reason for using the min() function.

More ideas.

Sean Lange SSC Guru Points: 286590 More actions · Answer 5

Erin Ramsay (10/4/2013)
More ideas.

Normalize your data and then you aren't fighting this. Honestly the biggest challenge you are facing is the lack of normalization.

That being said you have a real challenge here. You might be able to use the logic from today's article about grouping islands of contiguous dates. I am pretty swamped at the moment but if nobody else jumps in to help I can try to take a look at this early next week.

http://www.sqlservercentral.com/articles/T-SQL/71550/[/url]

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Erin Ramsay SSCrazy Eights Points: 9728 More actions · Answer 6

Nevermind... if I flatten the hour out of the datetime I can do it. Thanks to SQLPrincess for the co-solution!!

Erin Ramsay SSCrazy Eights Points: 9728 More actions · Answer 7

You know, Sean, if I could normalize the data I would but it's third party data and I don't have that option.. good thought though.

mickyT SSChampion Points: 10360 More actions · Answer 8

I know this is a few days late, but I wanted to have a try at this. Here is what I came up with. Does it do what you require?

with countDates as (

select venue,

eventdate,

dateadd(hour, -1, eventdate) beforedate,

dateadd(hour, 1, eventdate) afterdate,

count(*) over (partition by venue, eventdate) repeated

from #groupings

)

select

eventname,

venue,

ca.eventdate

from #groupings g

cross apply (

select top 1 eventdate

from countDates c

where g.venue = c.venue and

g.eventdate between beforedate and afterdate

order by repeated desc) ca

group by

eventname,

venue,

ca.eventdate;

Dwain Camps SSC Guru Points: 86978 More actions · Answer 9

It appears I'm also late to the party but perhaps this is another alternative:

SELECT EventName, Venue, EventDate

,IDs=STUFF((

SELECT ', ' + CAST(ID AS VARCHAR(5))

FROM #groupings b

WHERE a.EventName = b.EventName AND a.Venue = b.Venue

ORDER BY ID

FOR XML PATH('')

), 1, 2, '')

FROM

(

SELECT ID, EventName, Venue, rc, EventDate

,mr=MAX(rc) OVER (PARTITION BY EventName, Venue)

FROM

(

SELECT ID, EventName, Venue, EventDate

,rc=COUNT(*) OVER (PARTITION BY EventName, Venue, EventDate)

FROM #groupings

) a

WHERE mr=rc

GROUP BY EventName, Venue, EventDate;

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St