Performance problem with SQL queries having a specific column mentioned in WHERE or GROUP BY clause.

Question

Performance problem with SQL queries having a specific column mentioned in WHERE or GROUP BY clause.

daniell-1020276

SSC Rookie

Points: 36
More actions
November 5, 2009 at 9:35 am

#96022

Performance problem with SQL queries having a specific column mentioned in WHERE or GROUP BY clause.
**Table structure:**
CREATE TABLE [dbo].[RPT_LOG_SEARCH_WEB_T](
[SEARCH_DATE] [smalldatetime] NULL,
127.0.0.1 [nvarchar](20) NULL,
[BAR_ID] [uniqueidentifier] NULL,
[QUERY] [nvarchar](2000) NULL,
[QUERY_STRING] [nvarchar](3000) NULL,
[SRC] [nvarchar](6) NULL,
[HTTP_STATUS] [int] NULL,
[CS_USER_AGENT] [nvarchar](4000) NULL,
[COUNTRY] [nvarchar](6) NULL,
[PID] [nvarchar](6) NULL,
[TEST_ID] [nvarchar](6) NULL,
[TEST_GROUP_ID] [nvarchar](500) NULL,
[S_COMPUTERNAME] [nvarchar](50) NULL,
[BROWSER_TYPE] [nvarchar](15) NULL,
[BROWSER_TYPE_ID] [smallint] NULL,
[BROWSER_VERSION] [varchar](10) NULL
) ON [RPT_LOG_SEARCH_WEB_PS]([SEARCH_DATE])
GO
SET ANSI_PADDING OFF
GO
/****** Object: Index [RPT_LOG_SEARCH_WEB_CI] Script Date: 11/05/2009 18:19:33 ******/
CREATE CLUSTERED INDEX [RPT_LOG_SEARCH_WEB_CI] ON [dbo].[RPT_LOG_SEARCH_WEB_T]
(
[SEARCH_DATE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [RPT_LOG_SEARCH_WEB_PS]([SEARCH_DATE])
**Row Count:**
4,137,403,549
**The following query runs fine:**
select query, count(*) from dbo.RPT_LOG_SEARCH_WEB_T
where SEARCH_DATE > '20091105'
group by query
**The following query RUNS FOREVER:**
select ip, count(*) from dbo.RPT_LOG_SEARCH_WEB_T
where SEARCH_DATE > '20091105'
group by ip
Any help on the subject will be appretiated...

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

Al-279884 Old Hand Points: 383 More actions · Answer 1

Is that 4 billion records?!

Without having a clue about the data I would think that [Query] column would have a smaller number of distinct values (how different can be [Query] values from one another?). Hence there is more "counting" work to do with 127.0.0.1 values. You may need an index on 127.0.0.1 field to help you with the second query.

John Rowan SSC Guru Points: 56440 More actions · Answer 2

I'll agree with Al-279884 and add that a compare of the execution plans between the 2 queries may shed some light on the subject.

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

daniell-1020276 SSC Rookie Points: 36 More actions · Answer 3

I tried to get an execution plan for the problematic query.

However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.

daniell-1020276 SSC Rookie Points: 36 More actions · Answer 4

Yes - 4 billion records. One of our smaller tables :-D.

Actually, the number of distinct values for "IP" is much lower than the one for "QUERY"

John Rowan SSC Guru Points: 56440 More actions · Answer 5

daniell-1020276 (11/5/2009)
I tried to get an execution plan for the problematic query.
However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.

SSMS outputs the execution plan after the query completes, but if you use Profiler to capture the query, it will display the execution plan first, then process the query.

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

Al-279884 Old Hand Points: 383 More actions · Answer 6

daniell-1020276 (11/5/2009)
I tried to get an execution plan for the problematic query.
However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.

Try to get Estimated Execution Plan rather than Actual Execution Plan (if it isn't what you tried already).

Also 4 billion records with some wide nvarchar fields must be some serious volume of data. Since these seem to be logging/reporting tables, have you considered partitioning the data?

daniell-1020276 SSC Rookie Points: 36 More actions · Answer 7

Al-279884 (11/5/2009)
daniell-1020276 (11/5/2009)
I tried to get an execution plan for the problematic query.
However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.
Try to get Estimated Execution Plan rather than Actual Execution Plan (if it isn't what you tried already).
Also 4 billion records with some wide nvarchar fields must be some serious volume of data. Since these seem to be logging/reporting tables, have you considered partitioning the data?

I'll try to get an Estimated Execution Plan.

The table is partitioned by SEARCH_DATE.

aravind.s SSC-Addicted Points: 449 More actions · Answer 8

Also You can try dropping the index and check whether the table scan is faster.If its faster with 4 billion records then it will surely be faster with 4 billion + records considering that You might insert more records into the table in the future.

John Rowan SSC Guru Points: 56440 More actions · Answer 9

The only index on the table is the clustered index. I would not drop the clustered index on a table w/ 4 billion records just to see what happens. Check the query plan before you do anything with your indexing.

John Rowan

======================================================
======================================================
Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden