June 24, 2003 at 10:33 am
Hi there,
I have the following tables:
UserActivity (userId, date, col1, col2, col3)
UserImpression (userId, date, cola, colb, colc)
UserDate (userId, latestDate)
UserActivity has 10 million rows.
UserImpression has 100 million rows.
UserDate has 70 million rows.
UserImpression and UserActivity store information keyed on the composite primary key of (userId, date).
UserDate stores the latest date on which any information for a particular userId was stored, hence a primary key on userId alone.
These tables need to be joined together as follows:
select I.*
from UserActivity A
join UserDate D on D.userId = A.userId
and D.date = A.date
join UserImpression I on D.userId = I.userId and D.latestDate = I.date
If this query returns 70 million rows, what is a reasonable expected runtime?
I'm currently running this on a machine with 2 GB of memory and four PIII Xeon 700 MHz processors.
Due to the enormous size and frequency of data turnover in UserImpression, I have created that table as a partitioned view with a check constraint on the date column. Each day holds approximately 30 million rows, and 30 days of data are stored.
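For reference, a minimal sketch of that kind of setup (the member table name UserImpression_20030601 and the column datatypes are assumptions; the real view unions one member table per day):

create table UserImpression_20030601 (
    userId int not null,
    [date] datetime not null
        check ([date] >= '20030601' and [date] < '20030602'),
    cola int, colb int, colc int,
    constraint PK_UI_20030601 primary key (userId, [date])
)
-- ...29 more member tables, one per day, with non-overlapping check constraints...

create view UserImpression
as
select * from UserImpression_20030601
union all
select * from UserImpression_20030602
-- ...union all over the remaining daily tables...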
Please help.
June 24, 2003 at 10:42 am
No idea. Kinda overwhelmed at the idea of actually returning 70 million rows - what do you do with them? You could try just returning a subset and extrapolating from there.
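If you want to try that, something along these lines would time a fixed-size slice you could extrapolate from (the TOP value is arbitrary, and I'm assuming the UserDate join column is latestDate as described above):

select top 1000000 I.*
from UserActivity A
join UserDate D on D.userId = A.userId and D.latestDate = A.date
join UserImpression I on D.userId = I.userId and D.latestDate = I.date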
Andy
June 24, 2003 at 11:23 am
Actually, I didn't express that correctly: I would return the 70 million rows into a new table and eventually run some aggregating queries on that.
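For what it's worth, a sketch of that pattern (the target table name UserImpressionLatest is made up; select ... into is minimally logged under the simple or bulk-logged recovery model, which matters at this volume):

select I.*
into UserImpressionLatest
from UserActivity A
join UserDate D on D.userId = A.userId and D.latestDate = A.date
join UserImpression I on D.userId = I.userId and D.latestDate = I.date

-- later, aggregate from the materialized copy, for example:
select userId, count(*) as impressions
from UserImpressionLatest
group by userId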
June 25, 2003 at 12:54 am
The ultimate wet dream would be getting this one sub-second. I'd recommend matching the indexes (clustered?) and trying to get them in RAM (more than 2 GB). Of course you'll be disk dependent because of massive I/O operations (at least for UserImpression).
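By "matching" I mean clustered indexes that line up with the join columns, something like this sketch (assuming the existing primary keys are nonclustered or can be rebuilt; the member table name is hypothetical since UserImpression is a partitioned view):

create unique clustered index IX_UserDate on UserDate (userId, latestDate)
create unique clustered index IX_UserActivity on UserActivity (userId, [date])
-- cluster each daily member table of the UserImpression view, e.g.:
create unique clustered index IX_UI_20030601 on UserImpression_20030601 (userId, [date])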
Johan
June 25, 2003 at 4:01 am
Normally such operations are needed for data-warehousing sorts of applications. But even there, the information can often be better represented with statistical summaries such as sums or frequency tables. If you can post the exact requirement, I think many here will be able to help you out.
Instead of a join, if you go for EXISTS in the WHERE clause, it will not block the tables as a whole, since the subquery runs row by row (even if it then takes one more hour, joke).
Regards
John
select I.*
from UserImpression I
where exists (
    select *
    from UserActivity A, UserDate D
    where D.userId = I.userId
      and D.latestDate = I.date
      and D.userId = A.userId
      and D.latestDate = A.date
)
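And as a sketch of the summarizing idea above (the aggregates chosen here are arbitrary examples):

select A.userId, count(*) as activityRows, max(A.[date]) as lastSeen
from UserActivity A
group by A.userId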
June 25, 2003 at 4:05 am
First off, based on
quote:
UserDate (userId, latestDate)
should this line
quote:
join UserDate D on D.userId = A.userId and D.date = A.date
actually be
join UserDate D on D.userId = A.userId
and D.latestDate = A.date
and if so, then why not just do
SELECT I.*
FROM UserActivity A
INNER JOIN UserImpression I
    ON A.userId = I.userId
    AND A.[date] = I.[date]
Even so, based on this
quote:
UserImpression (userId, date, cola, colb, colc)
70 million rows may or may not fit in 2 GB. I don't know the widths of userId, date, cola, colb, and colc, so I can't tell you whether you're running into any memory bottlenecks here.
But you come back and say
quote:
I would return the 70 million rows into a new table
To get maximum insert performance, make sure all constraints, triggers, and indexes are disabled or removed from this table, then add or enable them after the insert is done.
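A sketch of that pattern (table and index names are hypothetical; on this version of SQL Server, nonclustered indexes are dropped and recreated rather than disabled):

-- before the load
drop index UserImpressionLatest.IX_UserImpressionLatest
alter table UserImpressionLatest nocheck constraint all
alter table UserImpressionLatest disable trigger all

-- ...run the big insert...

-- after the load
alter table UserImpressionLatest check constraint all
alter table UserImpressionLatest enable trigger all
create index IX_UserImpressionLatest on UserImpressionLatest (userId, [date])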
Also I assume your indexes would be
UserDate (userId, latestDate) CLUSTERED
UserImpression (userId, date) CLUSTERED
UserActivity (userId, date) CLUSTERED
To maximize join performance, if you always use userId and latestDate/date to join these tables and work with them, I would also consider making (datefield, userId) the order of the columns in the index. It is best to have your most selective column first in a composite (multicolumn) index, because only the first column is sampled to build the statistics.
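For example (hypothetical; WITH DROP_EXISTING assumes an index of that name already exists):

-- recluster UserActivity with the date column leading
create unique clustered index IX_UserActivity on UserActivity ([date], userId)
    with drop_existing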
Another thing, which I assume you have already handled: the columns used in the join work fastest when they are the same datatype. Otherwise SQL Server will convert implicitly, or you will need to perform an explicit CAST (which is better than letting SQL Server convert implicitly).
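For instance, if userId were an int in one table and a varchar in the other (a hypothetical mismatch):

select I.*
from UserDate D
join UserImpression I
    on cast(D.userId as int) = I.userId  -- explicit cast instead of an implicit conversion
    and D.latestDate = I.date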
Other than that it is a matter of execution plan and tweaking to see what can be done.
June 25, 2003 at 5:28 am
If you do run into performance problems due to a lack of memory, consider breaking up the whole insert into smaller chunks.
Should be fairly easy, based on the description you gave of the UserDate table.
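For example, one batch per day, which also lines up with the partitioned view's daily slices (the target table name and the date window are assumptions):

declare @d datetime
select @d = '20030601'  -- hypothetical first day of the 30-day window
while @d < '20030701'
begin
    insert into UserImpressionLatest
    select I.*
    from UserActivity A
    join UserDate D on D.userId = A.userId and D.latestDate = A.date
    join UserImpression I on D.userId = I.userId and D.latestDate = I.date
    where D.latestDate >= @d and D.latestDate < dateadd(day, 1, @d)
    select @d = dateadd(day, 1, @d)
end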