March 17, 2008 at 8:46 pm
I have about 2 terabytes of data, and I need to know how many rows are in it. I ran a SELECT COUNT(*) against the table, but even after 20 minutes it was still executing. I need the row count so I can move on to some other development work I was asked to do. Has anyone worked with a database this gigantic before?
[font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]
March 17, 2008 at 9:03 pm
Check out this query/view: http://www.sqlservercentral.com/scripts/tables/62545/
I'd recommend showing the query plan before execution.
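For example, a minimal sketch of one way to do that (MyBigTable is a placeholder name, not from this thread):
SET SHOWPLAN_TEXT ON;
GO
SELECT COUNT(*) FROM MyBigTable; -- returns the estimated plan instead of executing the query
GO
SET SHOWPLAN_TEXT OFF;
GO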
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
March 17, 2008 at 9:30 pm
You can query the sysobjects and sysindexes system tables:
SELECT o.name AS [Table Name], i.rowcnt AS [Row Count]
FROM sysobjects o
JOIN sysindexes i ON i.id = o.id
WHERE i.indid IN (0, 1) -- 0 = heap, 1 = clustered index
AND o.xtype = 'U' -- user tables only
AND o.name <> 'sysdiagrams'
AND o.name = 'MyTable' --MyTableName
March 17, 2008 at 9:31 pm
Got the above from this author. Have to give credit where it is due 🙂
http://www.sqlservercentral.com/scripts/Administration/61766/
March 17, 2008 at 9:53 pm
Okay, after that, I have to delete data that is more than 12 months old; whatever is older than a year from today has to be deleted. How do I do this fast and efficiently? I have an identity column as the first column, just FYI...
[font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]
March 17, 2008 at 10:16 pm
What percentage of the total number of records is older than 12 months? Knowing that will help determine the best way to delete the data.
You have plenty of options here. You can copy the data you plan to keep to another table and truncate the source table. Truncating a table is significantly faster than deleting data because the truncation is minimally logged (only the page deallocations are recorded), so it does not inflate your t-log. Note that this option is more beneficial if you plan to delete a significant portion of the table, like 60%.
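Roughly like this (just a sketch: MyTable, MyColumn, and the identity column ID are placeholder names, not your actual schema):
SELECT *
INTO MyTable_Keep -- work table holding the rows you want to keep
FROM MyTable
WHERE MyColumn >= DATEADD(mm, -12, GETDATE());

TRUNCATE TABLE MyTable; -- fast, minimally logged deallocation

SET IDENTITY_INSERT MyTable ON; -- needed to reload the original identity values

INSERT INTO MyTable (ID, MyColumn /* , ...remaining columns... */)
SELECT ID, MyColumn /* , ...remaining columns... */
FROM MyTable_Keep;

SET IDENTITY_INSERT MyTable OFF;

DROP TABLE MyTable_Keep;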
Another option is to delete records in batches. It is often faster to delete a few thousand records at a time than 100,000 at once, since each batch holds locks and log space only briefly. You could do something like this:
DECLARE @Batch INT;
SET @Batch = 1000;

WHILE EXISTS (SELECT 1
              FROM MyTable
              -- compare datetimes directly; converting to VARCHAR style 101
              -- compares mm/dd/yyyy strings, which sorts incorrectly and also
              -- prevents any index on MyColumn from being used
              WHERE MyColumn < DATEADD(mm, -12, GETDATE()))
BEGIN
    DELETE TOP (@Batch)
    FROM MyTable
    WHERE MyColumn < DATEADD(mm, -12, GETDATE());
END
March 17, 2008 at 10:20 pm
How much data does that 12-month period represent? If the data you're keeping is small compared to the 2TB, it will be faster to copy out the stuff you want to keep to a new table, then DROP the old table and rename the table you just created. Of course, you end up having to recreate constraints, indexes, etc., but then again, it can still be a lot faster.
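Something along these lines (a sketch only; MyTable and mydate are placeholder names):
SELECT *
INTO MyTable_New
FROM MyTable
WHERE mydate >= DATEADD(mm, -12, GETDATE()); -- keep only the last 12 months

DROP TABLE MyTable;

EXEC sp_rename 'MyTable_New', 'MyTable';

-- SELECT...INTO copies the data only, so recreate constraints, indexes,
-- triggers, and permissions on the renamed table afterwards.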
If that's not feasible, then perhaps try "walking the clustered index" (which hopefully is that identity column you just mentioned). It looks something like this:
declare @batchsize int;
set @batchsize = 50000; -- how many records to kill in one operation

declare @startid int;
set @startid = 0;

declare @endid int;
select @endid = max(id) from MyTableToClean; -- SELECT, not SET, to assign from a query

declare @datetokill datetime;
set @datetokill = dateadd(mm, -12, dateadd(dd, datediff(dd, 0, getdate()), 0)); -- midnight 12 months ago: kill in "full days"

while @startid < @endid
begin
    delete
    from MyTableToClean
    where id between @startid and @startid + @batchsize
      and mydate < @datetokill;

    set @startid = @startid + @batchsize + 1;
    checkpoint; -- in SIMPLE recovery this lets the log clear between batches
end
If that really is your clustered index, then it should take care of your deletions pretty well. The nice part is you should be able to stop and restart this loop if it's making your server miserable.
Either way - don't forget to rebuild your indexes (and update your stats) when you're done.
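For example (same placeholder table name as above; ALTER INDEX needs SQL 2005 or later, on 2000 use DBCC DBREINDEX):
ALTER INDEX ALL ON MyTableToClean REBUILD; -- rebuild every index on the table
UPDATE STATISTICS MyTableToClean WITH FULLSCAN; -- refresh statistics after the mass delete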
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
March 19, 2008 at 3:22 pm
Here is a function we wrote based on info posted here and elsewhere:
/*
This function will return the number of rows in a table, the same result
as executing a "Select Count(*) from tableName" except that this is much
more efficient in that it doesn't have to scan the entire table.
Note: the value returned could be off by a few records, especially if there
has been some recent insert and/or delete activity against the table. If
absolute accuracy is important, be sure to execute the following
command first:
DBCC Updateusage( '', @TableName ) [WITH NO_INFOMSGS]
For large tables, this will still be MUCH faster than "Select Count(*)"
*/
Create function [dbo].[GetRowCount](
@TableName sysname
)
returns int
as begin
declare @Result int;
Select @Result = rows
from sysindexes
where id = object_id( @TableName, 'U' )
and indid < 2; -- 0 = heap, 1 = clustered index; either row holds the table's row count
return IsNull( @Result, 0 );
end
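Example call (the table name is a placeholder):
Select dbo.GetRowCount( 'MyTable' ) as [Approximate Row Count];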
Tomm Carr
--
Version Normal Form -- http://groups.google.com/group/vrdbms
March 19, 2008 at 4:23 pm
Thank you everybody above. Thanks a bunch guys for the quick help!
[font="Verdana"]Imagination is more important than knowledge-Albert Einstein[/font]
March 20, 2008 at 2:56 pm
How did it work out?
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]