SELECT COUNT(*)

  • I have about 2 terabytes of data, and I need to know how many rows are in a table. I ran a SELECT COUNT(*) against it, but even after 20 minutes it was still executing. I need the row count so I can move on to some other development work I was asked to do. Has anyone worked with a database this gigantic before?

    [font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]

  • Check out this query/view: http://www.sqlservercentral.com/scripts/tables/62545/

    I'd recommend showing the query plan before execution.
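    If you want to inspect the plan without sitting through the 20-minute query, something like this works (a sketch assuming SQL Server; MyBigTable is a placeholder name):

    ```sql
    -- Show the estimated plan instead of executing the statement
    SET SHOWPLAN_TEXT ON;
    GO
    SELECT COUNT(*) FROM MyBigTable; -- not actually executed while SHOWPLAN is on
    GO
    SET SHOWPLAN_TEXT OFF;
    GO
    ```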

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • You can query sysobjects:

    SELECT o.name AS [Table Name], i.rowcnt AS [Row Count]
    FROM sysobjects o
    JOIN sysindexes i ON i.id = o.id
    WHERE i.indid IN (0, 1)
      AND o.xtype = 'U'
      AND o.name <> 'sysdiagrams'
      AND o.name = 'MyTable' --MyTableName

  • Got the above from this author. Have to give credit where it is due 🙂

    http://www.sqlservercentral.com/scripts/Administration/61766/
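    On SQL Server 2005 and later, the same near-instant count is available from the catalog views instead of the deprecated sysobjects/sysindexes (a sketch; the table name is a placeholder):

    ```sql
    -- Approximate row count from partition metadata; no table scan.
    -- index_id 0 = heap, index_id 1 = clustered index.
    SELECT SUM(p.rows) AS [Row Count]
    FROM sys.partitions AS p
    WHERE p.object_id = OBJECT_ID('dbo.MyTable')
      AND p.index_id IN (0, 1);
    ```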

  • Okay, after that, I have to delete data that is more than 12 months old. Whatever is older than a year from today has to be deleted. How do I do this fast and efficiently? I have an identity column as the first column, just FYI.

    [font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]

  • What percentage of the total number of records does that 12 months represent? That will help determine the best way to delete the data.

    You have plenty of options here. You can copy the rows you plan to keep into another table and truncate the source table. Truncating a table is significantly faster than deleting rows because truncation is minimally logged, so it does not inflate your transaction log. Note that this option is more beneficial if you plan to delete a significant portion of the table, say 60% or more.

    Another option is to delete records in batches. It is often faster to delete a few thousand records at a time than 100,000 in one statement.

    You could do something like this (note: compare the dates directly -- converting them to VARCHAR with style 101 would compare them as mm/dd/yyyy strings, which sorts incorrectly):

    DECLARE @Batch INT, @Cutoff DATETIME;
    SET @Batch = 1000;
    -- midnight 12 months ago, so we compare whole days
    SET @Cutoff = DATEADD(dd, DATEDIFF(dd, 0, DATEADD(mm, -12, GETDATE())), 0);

    WHILE EXISTS (SELECT 1
                  FROM MyTable
                  WHERE MyColumn < @Cutoff)
    BEGIN
        DELETE TOP (@Batch)
        FROM MyTable
        WHERE MyColumn < @Cutoff;
    END

  • How much data does that 12-month period represent? If that's small compared to the 2TB, it will be faster to copy out the rows you want to keep to a new table, then DROP the old table and rename the table you just created. Of course, you end up having to recreate constraints, indexes, etc., but it can still be a lot faster.

    If that's not feasible, then perhaps try "walking the clustered index" (which hopefully is that identity column you just mentioned).

    It looks something like this:

    declare @batchsize int;
    set @batchsize = 50000; -- how many records to kill in one operation

    declare @startid int;
    set @startid = 0;

    declare @endid int;
    select @endid = max(id) from MyTableToClean;

    declare @datetokill datetime;
    -- midnight 12 months ago; kill in "full days"
    set @datetokill = dateadd(dd, datediff(dd, 0, dateadd(mm, -12, getdate())), 0);

    while @startid < @endid
    begin
        delete from MyTableToClean
        where id between @startid and @startid + @batchsize
          and mydate < @datetokill;

        set @startid = @startid + @batchsize + 1;
        checkpoint;
    end

    If that really is your clustered index, then this should take care of your deletions pretty efficiently. The nice part is that you can stop and restart this loop if it's making your server miserable.

    Either way - don't forget to rebuild your indexes (and update your stats) when you're done.
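    As a sketch of that cleanup step (object names are placeholders; ALTER INDEX requires SQL Server 2005 or later -- on SQL 2000 use DBCC DBREINDEX instead):

    ```sql
    -- Rebuild all indexes on the table, then refresh its statistics
    ALTER INDEX ALL ON dbo.MyTableToClean REBUILD;
    UPDATE STATISTICS dbo.MyTableToClean WITH FULLSCAN;
    ```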

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Here is a function we wrote based on info posted here and elsewhere:

    /*
    This function will return the number of rows in a table, the same result
    as executing a "SELECT COUNT(*) FROM tableName", except that this is much
    more efficient in that it doesn't have to scan the entire table.

    Note: the value returned could be off by a few records, especially if
    there has been some recent insert and/or delete activity against the
    table. If absolute accuracy is important, be sure to execute the
    following command first:

        DBCC Updateusage( '', @TableName ) [WITH NO_INFOMSGS]

    For large tables, this will still be MUCH faster than "SELECT COUNT(*)".
    */
    create function [dbo].[GetRowCount](
        @TableName sysname
    )
    returns int
    as begin
        declare @Result int;

        select @Result = rows
        from sysindexes
        where id = object_id( @TableName, 'U' )
          and indid < 2;

        return IsNull( @Result, 0 );
    end
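    A quick usage example (the table name is a placeholder):

    ```sql
    SELECT dbo.GetRowCount('dbo.MyTable') AS ApproxRowCount;
    ```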

    Tomm Carr
    --
    Version Normal Form -- http://groups.google.com/group/vrdbms

  • Thank you everybody above. Thanks a bunch guys for the quick help!

    [font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]

  • How did it work out?

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]
