Query is slow because of TOP and Order By Count(*)

  • Hi,

    I have a log table that saves web visits. And I have this query which is pretty slow :

    select top 500 count(1) as nbCount, currentPage as item,

    dateadd(dd, DATEDIFF(dd, 0, dateadd(hh, 0, date)), 0) as sDate from tbl_logs

    where fk_tbl_websites_Id = 10203 and date between '2008-08-28 00:00:00' and '2008-09-29 23:59:00' group by currentPage, dateadd(dd, DATEDIFF(dd, 0, dateadd(hh, 0, date)), 0) order by nbCount desc

    Having analyzed the execution plan, I know why this is slow. I think this is because the set of data is too huge. So, for the last month and for the website #10203, the set has around 2 000 000 rows. The Oerder By clause and the TOP clause show that they account for 44% each of the execution time. That's almost 90% of the total execution time.

    So, my question is how to speed up that kind of query? Any idea?

    Thanks

    Stephane

  • Create a reporting table.

    Run a query daily to load the data into that reporting table.

    Run your reports against that table with proper indexing and you should get very efficient queries.

  • stephane3d (9/27/2008)


    Hi,

    I have a log table that saves web visits. And I have this query which is pretty slow :

    select top 500 count(1) as nbCount, currentPage as item,

    dateadd(dd, DATEDIFF(dd, 0, dateadd(hh, 0, date)), 0) as sDate from tbl_logs

    where fk_tbl_websites_Id = 10203 and date between '2008-08-28 00:00:00' and '2008-09-29 23:59:00' group by currentPage, dateadd(dd, DATEDIFF(dd, 0, dateadd(hh, 0, date)), 0) order by nbCount desc

    Having analyzed the execution plan, I know why this is slow. I think this is because the set of data is too huge. So, for the last month and for the website #10203, the set has around 2 000 000 rows. The Oerder By clause and the TOP clause show that they account for 44% each of the execution time. That's almost 90% of the total execution time.

    So, my question is how to speed up that kind of query? Any idea?

    Thanks

    Stephane

    Stephane... you said the query was slow but gave no actual performance times. That would be a good start for us to help.

    Also, you've given no real clues as to the condition of the table... for example, what is the primary key? Please take a look at the article found in the link in my signature below. Take the time to post the full table creation statement and about 10 lines of code using the methods prescribed in the article. Also, include any indexes or other keys that may be present on the table.

    There's no reason for the code you've presented to be "slow"... but we can't really help if we don't have more detail about the environment. Thanks...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi,

    Thanks for your answers.

    I forgot to tell that the primary key is fk_tbl_websites_id and date. Without the order by clause, it executes in a flash. With it, the clustered index seek account for 13% of the execution time. There is also a non-clustered index on currentPage, but it's not used according to the query plan.

    Here are the table creation plus a sample of data

    CREATE TABLE [dbo].[tbl_logs](

    [fk_tbl_websites_id] [int] NOT NULL,

    [date] [datetime] NOT NULL CONSTRAINT [DF_tbl_logs_tmp_date] DEFAULT (getdate()),

    [id] [bigint] IDENTITY(1,1) NOT NULL,

    [refererDomain] [varchar](200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [refererUrl] [varchar](500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [refererQueryString] [varchar](500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [currentDomain] [varchar](200) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [currentPage] [varchar](500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [currentQueryString] [varchar](500) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [timeSpent] [int] NULL CONSTRAINT [DF_tbl_logs_tmp_timeSpent] DEFAULT ((0)),

    [fk_tbl_visits_id] [bigint] NULL,

    CONSTRAINT [PK_tbl_logs_tmp] PRIMARY KEY CLUSTERED

    (

    [fk_tbl_websites_id] ASC,

    [date] ASC,

    [id] ASC

    )WITH (PAD_INDEX = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY]

    ) ON [PRIMARY]

    INSERT INTO [tbl_logs] ([fk_tbl_websites_id],[date],[id],[refererDomain],[refererUrl],[refererQueryString],[currentDomain],[currentPage],[currentQueryString],[timeSpent],[fk_tbl_visits_id])VALUES(10203,'Aug 28 2008 12:00:04:190AM',139959293,'http://www.jpr.com','/videos/show/4950-scarecrow-hidden-camera-pranks','','http://www.jprs.com','/videos/show/4836-free-kids-hidden-camera-pranks','',109,40705532)

    INSERT INTO [tbl_logs] ([fk_tbl_websites_id],[date],[id],[refererDomain],[refererUrl],[refererQueryString],[currentDomain],[currentPage],[currentQueryString],[timeSpent],[fk_tbl_visits_id])VALUES(10203,'Aug 28 2008 12:00:09:813AM',139959296,'http://www.jpr.com','/search','q=france$t=videos','http://www.jpr.com','/search','q=rachid$t=videos',241,40706115)

    INSERT INTO [tbl_logs] ([fk_tbl_websites_id],[date],[id],[refererDomain],[refererUrl],[refererQueryString],[currentDomain],[currentPage],[currentQueryString],[timeSpent],[fk_tbl_visits_id])VALUES(10203,'Aug 28 2008 12:00:16:190AM',139682279,'http://www.jpr.com','/videos/top','cat=gags','http://www.jprs.com','/channels','',22,40705317)

    INSERT INTO [tbl_logs] ([fk_tbl_websites_id],[date],[id],[refererDomain],[refererUrl],[refererQueryString],[currentDomain],[currentPage],[currentQueryString],[timeSpent],[fk_tbl_visits_id])VALUES(10203,'Aug 28 2008 12:00:28:703AM',139682287,'http://www.jpr.com','/videos/show/4755-sandwich-matress-hidden-camera-pranks','','http://www.jprs.com','/videos/show/4986-blindman-has-to-go-hidden-camera-pranks','',96,40706111)

    INSERT INTO [tbl_logs] ([fk_tbl_websites_id],[date],[id],[refererDomain],[refererUrl],[refererQueryString],[currentDomain],[currentPage],[currentQueryString],[timeSpent],[fk_tbl_visits_id])VALUES(10203,'Aug 28 2008 12:00:31:847AM',139682289,'','','','http://www.jprs.com','/home','',189,40706306)

    INSERT INTO [tbl_logs] ([fk_tbl_websites_id],[date],[id],[refererDomain],[refererUrl],[refererQueryString],[currentDomain],[currentPage],[currentQueryString],[timeSpent],[fk_tbl_visits_id])VALUES(10203,'Aug 28 2008 12:00:33:143AM',139682290,'http://www.jprs.com','/videos/show/4979-wrong-side-hidden-camera-pranks','','http://www.jprs.com','/videos/show/3966-cabin-cleaning-hidden-camera-pranks','',21,40705691)

    INSERT INTO [tbl_logs] ([fk_tbl_websites_id],[date],[id],[refererDomain],[refererUrl],[refererQueryString],[currentDomain],[currentPage],[currentQueryString],[timeSpent],[fk_tbl_visits_id])VALUES(10203,'Aug 28 2008 12:00:38:830AM',139682293,'http://www.jprs.com','/channels','','http://www.jprs.com','/community','',5,40705317)

    INSERT INTO [tbl_logs] ([fk_tbl_websites_id],[date],[id],[refererDomain],[refererUrl],[refererQueryString],[currentDomain],[currentPage],[currentQueryString],[timeSpent],[fk_tbl_visits_id])VALUES(10203,'Aug 28 2008 12:00:43:220AM',139682296,'http://www.jprs.com','/videos/show/472-lovers-mistake-hidden-camera-pranks','','http://www.jprs.com','/videos/show/491-impossible-to-attach-box-hidden-camera-pranks','',128,40704574)

    INSERT INTO [tbl_logs] ([fk_tbl_websites_id],[date],[id],[refererDomain],[refererUrl],[refererQueryString],[currentDomain],[currentPage],[currentQueryString],[timeSpent],[fk_tbl_visits_id])VALUES(10203,'Aug 28 2008 12:00:43:593AM',139682297,'http://www.jprs.com','/community','','http://www.jprs.com','/comedians','',7,40705317)

    INSERT INTO [tbl_logs] ([fk_tbl_websites_id],[date],[id],[refererDomain],[refererUrl],[refererQueryString],[currentDomain],[currentPage],[currentQueryString],[timeSpent],[fk_tbl_visits_id])VALUES(10203,'Aug 28 2008 12:00:45:017AM',139682298,'http://www.jprs.com','/videos/show/4757-that-s-not-my-dog-hidden-camera-pranks','','http://www.jprs.com','/videos/show/4986-blindman-has-to-go-hidden-camera-pranks','',90,40706109)

    Thanks a lot for any help!

    Stephane

  • Can you post the execution plan please (saved as a .sqlplan file, zipped and attached to your post)

    Also, you say it's slow. What do you mean by slow? (5 sec, 5 min, 5 hours?)

    How many rows in the table total? How many would the query return if the top wasn't there?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Hi,

    Thanks for your response.

    There are around 100 M rows in the table. Without the TOP, it would return around 2 M rows. Without the order by, the query runs in less than a second. With it, it runs in about 20 seconds. I know this is not hours, but it's slow enough to annoy the user!

    I'm sure there's a way to speed that up. You guys on SQL forums are unbelievable at finding sql tricks 😉

    The query plan is attached as asked.

    Thanks

    Stephane

  • The first thing I will suggest is that you do an update stats with full scan on that table.

    The execution plan shows that the optimiser is estimating 1 row and getting a lot more. It indicates that the stats are out of date and the plan is very likely sub-optimal. (with two sorts, I think it's an extremely bad plan.). Note that because the row estimates are wrong, the costs of the operators are completely meaningless.

    If you do that and the plan changes, please post the new execution plan.

    Since the cluster contains a ascending date column and the table is large, you may want to schedule a full stats update daily, as the auto-update of stats is probably not keeping up.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The other thing I will suggest is that you consider moving the clustered index to the date column or the ID column (mainly because it will fragment less)

    and put a nonclustered index on (website_id, date, currentpage). The PK is redundant where it is, the ID is unique alone and doesn't need the other two columns

    Add the NC even if you leave the cluster/PK as it is.

    The nonclustered index will provide a more optimal execution path for this query, as the leaf pages of the NC index will be smaller than those of the cluster and hence this query will have to read fewer pages.

    You may also want to consider table partitioning (if you're using enterprise edition) as that's a rather large amount of rows in a single table.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • You're getting great advice from some very smart folks on optimizing this. For my two cents, I'll suggest you consider not using the BETWEEN operator on your datetime values. Note that any data timestamped '2008-09-29 23:59:03' would be outside your criteria. It appears you may want to see all data for dates in your range, and the BETWEEN drops that last minute. It's much more accurate to specify the start and end of the range separately:and date >= '2008-08-28 00:00:00' and date < '2008-09-30 00:00:00'

    Note that to do this you first determine the next date after the last one in your range.

Viewing 9 posts - 1 through 8 (of 8 total)

You must be logged in to reply to this topic. Login to reply