February 5, 2008 at 5:44 am
I have a table (SS2000) containing 3m rows that takes hours to process with a stored proc, yet if I limit it to 1m rows the sp takes only 20 minutes. I'm therefore thinking of doing all the processing in a loop, 1m rows at a time. The unique key is an INT called ID; however, there can be numerical gaps between consecutive IDs. Does anyone have a neat way of populating a temporary table like this:
Identity   IDFrom       IDTo
1          3            1,000,987
2          1,000,988    2,122,338
3          2,122,340    3,277,647
which my loop could then step through, enabling me to use a predicate like
AND ID BETWEEN IDFrom AND IDTo to get my 1m-row chunks?
Thanks!
February 5, 2008 at 7:14 am
Are you looking to make sure you run this process on EXACTLY 1M rows, or just looking to limit the recordset? I mean, creating a new ID isn't hard, but if your processing relies on things like indexes on the main table, you're going to have to rebuild all of that on the temp table.
If you're just looking to throttle back the process, then consider just walking your permanent ID:
DECLARE @chunksize INT
DECLARE @ptr INT
DECLARE @maxptr INT

SELECT @chunksize = 1000000, @ptr = 0;
SELECT @maxptr = MAX(id) FROM mytable -- find the highest ID in the table I am trying to walk

WHILE (@ptr <= @maxptr)
BEGIN
    -- do your processing on this chunk
    SELECT *
    FROM mytable
    WHERE mytable.id BETWEEN @ptr AND @ptr + @chunksize - 1;

    -- increment and move on to the next chunk
    SELECT @ptr = @ptr + @chunksize
END
If it IS important to have exactly 1M, then populating the temp table is easy too:
SELECT IDENTITY(INT, 1, 1) AS newTempID, *
INTO #MyTempTable
FROM myTable
If you already have an identity column, you may need to cast it to INT to "hide" the identity property (since SELECT ... INTO will otherwise complain about two identity columns in a single table).
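For example, a minimal sketch assuming the existing identity column is named ID and the rest of the columns are Col2 and Col3:

SELECT IDENTITY(INT, 1, 1) AS newTempID,
       CAST(ID AS INT) AS ID, -- CAST strips the inherited identity property
       Col2, Col3             -- hypothetical remaining columns
INTO #MyTempTable
FROM myTable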
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user. All right - what was my emergency again?
February 5, 2008 at 7:23 am
You can create a temp table with an identity column and have it auto-generate row numbers all the way down.
CREATE TABLE #tmp (
    Row_ID BIGINT IDENTITY(1,1),
    ID INT,
    Col2 INT,
    Col3 INT
)

INSERT INTO #tmp (ID, Col2, Col3)
SELECT ID, Col2, Col3
FROM MyTable

SELECT *
FROM #tmp
WHERE Row_ID BETWEEN 1 AND 1000000
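From there, one minimal sketch of walking the temp table in 1M-row windows (variable names are just placeholders):

DECLARE @lo BIGINT, @hi BIGINT, @maxrow BIGINT

SELECT @lo = 1, @maxrow = MAX(Row_ID) FROM #tmp

WHILE @lo <= @maxrow
BEGIN
    SET @hi = @lo + 999999 -- 1M rows per window

    -- do your processing against this window
    SELECT *
    FROM #tmp
    WHERE Row_ID BETWEEN @lo AND @hi

    SET @lo = @hi + 1
END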
What type of processing are you doing in the stored procedure?
February 5, 2008 at 12:50 pm
create table #Temp (
    ID int identity primary key,
    FromID int,
    ToID int)

insert into #temp (fromid, toid)
select *
from
    (select 1, 1000000 union all
     select 1000001, 2000000 union all
     select 2000001, 3000000) ranges (fromid, toid) -- however many you need
If you have a Numbers table, replace the select in the last bit with:
select 1000000 * (number - 1) + 1, 1000000 * number
from dbo.Numbers
where number > 0
    and number <=
        (select max(id)/1000000 +
            case
                when max(id)%1000000 = 0 then 0
                else 1
            end
         from dbo.MainTable)
That may not give you exactly 1-million rows per run, but unless the EXACT number matters (I can't see how it could), it should work just fine.
I'd imagine that the reason going over 1-million rows slows way down is because of drive swapping, instead of running in RAM. Maybe not, but that's my first guess.
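If you don't have a Numbers table yet, here's one sketch for building a small one (the name and row count are just assumptions; any standard tally-table approach works):

-- 1,000 numbers covers ranges up to 1 billion IDs at 1M per range
select top 1000 identity(int, 1, 1) as number
into dbo.Numbers
from master.dbo.syscolumns a
cross join master.dbo.syscolumns b

alter table dbo.Numbers add constraint PK_Numbers primary key clustered (number)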
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
February 5, 2008 at 3:35 pm
If the process has the correct criteria to keep rows from being updated more than once, then you could SET ROWCOUNT 1000000 and loop until @@ROWCOUNT returns zero...
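A minimal sketch of that pattern, assuming a hypothetical Processed flag is what keeps rows from being picked up twice:

SET ROWCOUNT 1000000 -- limit each pass to 1M rows

WHILE 1 = 1
BEGIN
    UPDATE dbo.MainTable
    SET Processed = 1  -- hypothetical flag column; your real criteria go here
    WHERE Processed = 0

    IF @@ROWCOUNT = 0 BREAK -- nothing left to process
END

SET ROWCOUNT 0 -- always reset when done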
--Jeff Moden
Change is inevitable... Change for the better is not.