July 29, 2011 at 9:48 am
I'm just curious as to how other database engineers/developers handle the usual onslaught of "how long is this going to take," particularly for queries against a large dataset (1 billion potential records) that you've never run before.
July 29, 2011 at 9:56 am
The best way is to perform some actual tests. Create as similar an environment as you can, create the database objects, and populate them with test data.
You may also have to simulate a typical load on the server with some other queries, updates, and so on.
The probability of survival is inversely proportional to the angle of arrival.
July 29, 2011 at 10:02 am
Thanks. I'm not so much concerned with server load from other connections. This is essentially an ETL box; when the processes finish, the data is published to SOLR. In essence, I've already performed test queries: I took metrics from my initial and subsequently smaller data loads and tried to extrapolate them into estimates for the larger data sets. Unfortunately, it doesn't seem to scale in a linear fashion (i.e., loading 400M records as opposed to 1.2B).
July 29, 2011 at 10:19 am
In my experience, scaling up across orders of magnitude is never linear, because too many different resources are involved and they don't all scale the same way.
Good luck.
July 29, 2011 at 10:21 am
heh...right. So "good luck" means there's no way to give accurate estimates? 🙂
July 29, 2011 at 10:22 am
at least, in my scenario, that is.
July 29, 2011 at 10:27 am
Well, you could always say 3x the data = 5- to 8-fold the time. That way, when it finishes earlier, you look like a genius. If it goes longer, it's much less fun.
July 29, 2011 at 10:34 am
After my linear guesstimates didn't work out, that's what I tried doing instead, really hoping to look like a genius. You're right, not so fun when they weren't thrilled about the ETC in the first place and I'm four days past it with the query still running. 🙂
Manager: "How much longer is it going to take?"
Me: "I thought it was going to be done 3 days ago. So, I'd guess that it has to be on the verge of finishing."
Manager: "So it's going to be finished today?"
Me: "I would certainly hope so."
July 29, 2011 at 10:40 am
coronaride (7/29/2011)
After my linear guesstimates didn't work out, that's what I tried doing instead, really hoping to look like a genius. You're right, not so fun when they weren't thrilled about the ETC in the first place and I'm four days past it with the query still running. 🙂
Manager: "How much longer is it going to take?"
Me: "I thought it was going to be done 3 days ago. So, I'd guess that it has to be on the verge of finishing."
Manager: "So it's going to be finished today?"
Me: "I would certainly hope so."
Sounds about right. That's why you under-promise and over-deliver in a utopian world :hehe:.
Did you check to see if there is any blocking or issues with the hardware?
This query gives estimates for backups and restores. Maybe (0.01% chance) it can give you an estimate for your query.
-- Only requests SQL Server tracks progress for (backups, restores, DBCC, etc.)
-- report a non-zero percent_complete; most SELECT/INSERT work will not show up.
SELECT r.[session_id]
, c.[client_net_address]
, s.[host_name]
, c.[connect_time]
, [request_start_time] = s.[last_request_start_time]
, [current_time] = CURRENT_TIMESTAMP
, r.[percent_complete]
, [estimated_finish_time] = DATEADD(MILLISECOND, r.[estimated_completion_time], CURRENT_TIMESTAMP)
, [estimated_seconds_left] = DATEDIFF(s, CURRENT_TIMESTAMP, DATEADD(MILLISECOND, r.[estimated_completion_time], CURRENT_TIMESTAMP))
-- Offsets are 0-based byte positions, so convert to 1-based character
-- positions (/2 + 1) and compute the actual statement length; -1 means
-- "to the end of the batch".
, current_command = SUBSTRING ( t.[text], r.[statement_start_offset]/2 + 1,
      (CASE r.[statement_end_offset]
           WHEN -1 THEN DATALENGTH(t.[text])
           ELSE r.[statement_end_offset]
       END - r.[statement_start_offset])/2 + 1 )
, module = COALESCE(QUOTENAME(OBJECT_SCHEMA_NAME(t.[objectid], t.[dbid])) + '.' + QUOTENAME(OBJECT_NAME(t.[objectid], t.[dbid])), '<ad hoc>')
FROM sys.dm_exec_requests AS r
INNER JOIN sys.dm_exec_connections AS c
ON r.[session_id] = c.[session_id]
INNER JOIN sys.dm_exec_sessions AS s
ON r.[session_id] = s.[session_id]
CROSS APPLY sys.dm_exec_sql_text(r.[sql_handle]) AS t
WHERE r.[percent_complete] <> 0;
July 29, 2011 at 10:48 am
Thanks. I've used a very similar query, with no luck. The only blocking that's occurring comes from CXPACKET wait types. I've turned down MAXDOP, and it only runs slower.
Since I'm the only one accessing the SQL Server instance, the best I've found I can do is monitor read activity on the source db files, monitor reads/writes in tempdb, monitor writes to the destination db files, and check the row counts in the various tables against what I expect to be in there.
Unfortunately, this only shows me where, relatively, the process is, not how much longer it has left.
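For what it's worth, that file-level read/write monitoring can be done from a DMV rather than watching the files themselves. This is a sketch against sys.dm_io_virtual_file_stats; the counters are cumulative since instance startup, so sample twice and diff to see activity over an interval. The database names in the filter are placeholders for your actual source and destination databases.

```sql
-- Cumulative I/O per database file since instance startup.
-- Snapshot this twice and subtract to measure activity over an interval.
SELECT DB_NAME(vfs.[database_id]) AS [database_name]
     , mf.[physical_name]
     , vfs.[num_of_reads]
     , vfs.[num_of_bytes_read]
     , vfs.[num_of_writes]
     , vfs.[num_of_bytes_written]
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
INNER JOIN sys.master_files AS mf
        ON vfs.[database_id] = mf.[database_id]
       AND vfs.[file_id]     = mf.[file_id]
WHERE DB_NAME(vfs.[database_id]) IN (N'SourceDB', N'tempdb', N'DestDB') -- placeholders
ORDER BY vfs.[num_of_bytes_read] + vfs.[num_of_bytes_written] DESC;
```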
July 29, 2011 at 10:55 am
That's your problem. CXPACKET is NOT a real wait. Remove the MAXDOP hint (not sure you can while it's running) and let that baby fly!
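For anyone who wants to verify whether CXPACKET actually dominates before touching MAXDOP, here's a quick sketch against sys.dm_os_wait_stats. Note the numbers are cumulative since the last service restart (or since the stats were cleared), so interpret accordingly.

```sql
-- Top waits since the last restart or DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR).
-- A high CXPACKET wait_time_ms with low signal_wait_time_ms usually means
-- parallel workers waiting on one another, not CPU pressure.
SELECT TOP (10)
       [wait_type]
     , [wait_time_ms]
     , [signal_wait_time_ms]
     , [waiting_tasks_count]
     , [pct_of_total] = CONVERT(DECIMAL(5, 2),
           100.0 * [wait_time_ms] / SUM([wait_time_ms]) OVER ())
FROM sys.dm_os_wait_stats
WHERE [wait_time_ms] > 0
ORDER BY [wait_time_ms] DESC;
```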
July 29, 2011 at 10:57 am
Here's a nice thread... read from there on down.
July 29, 2011 at 11:08 am
Yeah... I changed it on one particular query just to see if it made any difference, and quickly changed it back. My CXPACKET wait times weren't anything crazy anyhow. Thanks.
July 29, 2011 at 11:11 am
coronaride (7/29/2011)
Yeah... I changed it on one particular query just to see if it made any difference, and quickly changed it back. My CXPACKET wait times weren't anything crazy anyhow. Thanks.
How can you change it back while it's running?? No wonder it's taking days to complete! 🙂
July 29, 2011 at 11:25 am
The thread that will never end. I changed it back quickly *AFTER* it finished. 🙂