November 5, 2009 at 9:35 am
Performance problem with SQL queries having a specific column mentioned in WHERE or GROUP BY clause.
**Table structure:**
CREATE TABLE [dbo].[RPT_LOG_SEARCH_WEB_T](
[SEARCH_DATE] [smalldatetime] NULL,
127.0.0.1 [nvarchar](20) NULL,
[BAR_ID] [uniqueidentifier] NULL,
[QUERY] [nvarchar](2000) NULL,
[QUERY_STRING] [nvarchar](3000) NULL,
[SRC] [nvarchar](6) NULL,
[HTTP_STATUS] [int] NULL,
[CS_USER_AGENT] [nvarchar](4000) NULL,
[COUNTRY] [nvarchar](6) NULL,
[PID] [nvarchar](6) NULL,
[TEST_ID] [nvarchar](6) NULL,
[TEST_GROUP_ID] [nvarchar](500) NULL,
[S_COMPUTERNAME] [nvarchar](50) NULL,
[BROWSER_TYPE] [nvarchar](15) NULL,
[BROWSER_TYPE_ID] [smallint] NULL,
[BROWSER_VERSION] [varchar](10) NULL
) ON [RPT_LOG_SEARCH_WEB_PS]([SEARCH_DATE])
GO
SET ANSI_PADDING OFF
GO
/****** Object: Index [RPT_LOG_SEARCH_WEB_CI] Script Date: 11/05/2009 18:19:33 ******/
CREATE CLUSTERED INDEX [RPT_LOG_SEARCH_WEB_CI] ON [dbo].[RPT_LOG_SEARCH_WEB_T]
(
[SEARCH_DATE] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [RPT_LOG_SEARCH_WEB_PS]([SEARCH_DATE])
**Row Count:**
4,137,403,549
**The following query runs fine:**
select query, count(*) from dbo.RPT_LOG_SEARCH_WEB_T
where SEARCH_DATE > '20091105'
group by query
**The following query RUNS FOREVER:**
select ip, count(*) from dbo.RPT_LOG_SEARCH_WEB_T
where SEARCH_DATE > '20091105'
group by ip
Any help on the subject will be appretiated...
November 5, 2009 at 10:08 am
Is that 4 billion records?!
Without having a clue about the data I would think that [Query] column would have a smaller number of distinct values (how different can be [Query] values from one another?). Hence there is more "counting" work to do with 127.0.0.1 values. You may need an index on 127.0.0.1 field to help you with the second query.
November 5, 2009 at 10:13 am
I'll agree with Al-279884 and add that a compare of the execution plans between the 2 queries may shed some light on the subject.
November 5, 2009 at 10:18 am
I tried to get an execution plan for the problematic query.
However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.
November 5, 2009 at 10:21 am
Yes - 4 billion records. One of our smaller tables :-D.
Actually, the number of distinct values for "IP" is much lower than the one for "QUERY"
November 5, 2009 at 10:31 am
daniell-1020276 (11/5/2009)
I tried to get an execution plan for the problematic query.However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.
SSMS outputs the execution plan after the query completes, but if you use Profiler to capture the query, it will display the execution plan first, then process the query.
November 5, 2009 at 10:33 am
daniell-1020276 (11/5/2009)
I tried to get an execution plan for the problematic query.However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.
Try to get Estimated Execution Plan rather than Actual Execution Plan (if it isn't what you tried already).
Also 4 billion records with some wide nvarchar fields must be some serious volume of data. Since these seem to be logging/reporting tables, have you considered partitioning the data?
November 5, 2009 at 10:49 am
Al-279884 (11/5/2009)
daniell-1020276 (11/5/2009)
I tried to get an execution plan for the problematic query.However, once the "IP" field is mentioned in the query I can not get an execution plan for the query as well.
Try to get Estimated Execution Plan rather than Actual Execution Plan (if it isn't what you tried already).
Also 4 billion records with some wide nvarchar fields must be some serious volume of data. Since these seem to be logging/reporting tables, have you considered partitioning the data?
I'll try to get an Estimated Execution Plan.
The table is partitioned by SEARCH_DATE.
November 6, 2009 at 2:04 am
Also You can try dropping the index and check whether the table scan is faster.If its faster with 4 billion records then it will surely be faster with 4 billion + records considering that You might insert more records into the table in the future.
November 6, 2009 at 10:45 am
The only index on the table is the clustered index. I would not drop the clustered index on a table w/ 4 billion records just to see what happens. Check the query plan before you do anything with your indexing.
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply