August 20, 2010 at 11:32 am
ricardo_chicas (8/20/2010)
Ok, let's do this: forget about the original problem. What I really want is a suggestion on the best way to retrieve a large amount of data, without blocking tables, without using NOLOCK, using a select. When I used Oracle I was able to do something like that using a bulk select (I set the select to retrieve chunks of 10000 rows). I am not sure if that kind of stuff works in SQL Server, but I want something like that.
Thank you
As everyone else is telling you: 5 million rows takes time to move.
The right thing to do is not move the data at all, and instead look at what the process is doing with the data.
Whatever that process is, it can be fixed and cleaned up so it doesn't take so long and doesn't block the server.
Grabbing 5 million rows in 10K-row bites is not going to make anything faster for you; you need to tackle it a different way. For example, how often are those 5 million rows downloaded? More than once? Why are they not cached?
As someone else already said, the secondary process that does stuff to the 5 million rows should be looked at, so that it can be replaced with a single summarizing statement executed on the SQL Server.
Lowell
August 20, 2010 at 12:03 pm
ricardo_chicas (8/20/2010)
Ok, let's do this: forget about the original problem. What I really want is a suggestion on the best way to retrieve a large amount of data, without blocking tables, without using NOLOCK, using a select. When I used Oracle I was able to do something like that using a bulk select (I set the select to retrieve chunks of 10000 rows). I am not sure if that kind of stuff works in SQL Server, but I want something like that.
Thank you
We're just guessing here. Post your current query and post the information requested by the article at the second link in my signature line below. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 20, 2010 at 12:50 pm
ricardo_chicas (8/20/2010)
Ok, let's do this: forget about the original problem. What I really want is a suggestion on the best way to retrieve a large amount of data, without blocking tables, without using NOLOCK, using a select. When I used Oracle I was able to do something like that using a bulk select (I set the select to retrieve chunks of 10000 rows). I am not sure if that kind of stuff works in SQL Server, but I want something like that.
Thank you
Well, if you ask the question like that, the only possible answer is that you don't need to do anything special. The application just needs to issue the select query and then consume the returned data as fast as it can.
There will be blocking: another application will not be able to insert data into the table until your application has finished reading all the data.
To make the blocking shorter you have to minimize the time used to consume the returned data. Your application should just read the data as quickly as possible without waiting for anything else.
To avoid blocking completely you would have to use snapshot isolation (check BOL) or use the NOLOCK hint in the select.
Your real problem is probably that your application is not reading data fast enough from the SQL Server connection.
I have no idea what an Oracle BULK SELECT does.
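To illustrate those two options, here is a minimal sketch; the database, table, and column names are placeholders:

-- Option 1: snapshot isolation, so this read no longer blocks writers
ALTER DATABASE MyDb SET ALLOW_SNAPSHOT_ISOLATION ON;  -- one-time setup
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
SELECT col1, col2 FROM dbo.BigTable;

-- Option 2: a per-query NOLOCK hint (accepts the risk of dirty reads)
SELECT col1, col2 FROM dbo.BigTable WITH (NOLOCK);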
August 20, 2010 at 2:25 pm
Here is a question: if you select 5MM rows at once or select 10K rows 500 times, won't the time it takes to do the select be equal in either case?
If so, what is the point of "BULK" selecting?
I would also just opt for hinting or isolation, and then lastly check what the external program is doing.
August 21, 2010 at 8:54 am
Stefan_G (8/20/2010)
There will be blocking - another application will not be able to insert data into the table until the application has finished reading all the data.
It depends. It depends on what the indexes are (especially the clustered index for inserts) and what the scope/range of data is being selected. Tables are not necessarily totally locked just because someone selects a large number of rows. 5M out of 120M is only a little over 4% of the table and can easily "survive" on page locks alone.
--Jeff Moden
Change is inevitable... Change for the better is not.
August 21, 2010 at 9:05 am
Since I'm no Oracle Ninja, I did a web search for 'Oracle +"Bulk Select"'. It would appear that there's no such animal. There IS a BULK COLLECT INTO, but no BULK SELECT.
You could do a similar thing in SQL Server by using SELECT INTO to create a table on the fly. It's very fast even in the FULL recovery mode. The key here would be to quickly isolate any rows you wanted to work with into a much smaller result set.
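A minimal sketch of that SELECT INTO approach; the table and column names are placeholders:

SELECT col1, col2, col3
INTO dbo.WorkTable                -- created on the fly by SELECT INTO
FROM dbo.BigTable
WHERE SomeDate >= '20100801';     -- isolate the smaller working set up front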
Also, as a side note, one of the OP's original requests is "NO BLOCKING". Folks, the ONLY way that's going to be guaranteed is if you use WITH(NOLOCK) or by setting the transaction isolation level for the query to READ UNCOMMITTED. Of course, that means possible dirty data and, in some cases, duplicated data.
--Jeff Moden
Change is inevitable... Change for the better is not.
March 19, 2012 at 11:17 am
Can't you just use a windowing function inside a stored proc? Something like this (the body of a stored proc that works through the rows a chunk at a time):
CREATE PROCEDURE piece_at_a_time @start int
AS
BEGIN
    -- create the global temp table on the first call
    IF OBJECT_ID('tempdb..##foo') IS NULL
    BEGIN
        CREATE TABLE ##foo (v1 int, v2 int, v3 int, v4 int, v5 int); -- whatever shape your result set needs
    END;

    WHILE 1 = 1
    BEGIN
        -- number the rows once in a derived table, then slice out the next 1000
        INSERT INTO ##foo (v1, v2, v3, v4, v5)
        SELECT f1, f2, f3, f4, f5
        FROM (
            SELECT f1, f2, f3, f4, f5,
                   ROW_NUMBER() OVER (ORDER BY [key]) AS rn
            FROM t1
        ) AS numbered
        WHERE rn > @start AND rn <= @start + 1000;

        IF @@ROWCOUNT = 0 BREAK;

        SET @start = @start + 1000;
    END;

    SELECT * FROM ##foo; -- or use the table directly
END;
March 20, 2012 at 3:45 am
Just to echo what others have said before: I cannot understand why you need to pull out the five million rows. No one is going to be able to use this in a report without some sort of aggregation. Is the best solution not to work out exactly what numbers the report users need, and then do the aggregations outside of core business hours on a daily basis, or even in core hours if needed? Then all that needs to be retrieved is the aggregated data.
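As a minimal sketch of that pre-aggregation idea (all table and column names here are hypothetical placeholders):

DECLARE @day date = '20120319';
-- run once a day, out of hours, so reports only ever read the small summary table
INSERT INTO dbo.DailySalesSummary (SaleDate, ProductID, TotalQty, TotalAmount)
SELECT @day, ProductID, SUM(Qty), SUM(Amount)
FROM dbo.SalesDetail
WHERE SaleTime >= @day AND SaleTime < DATEADD(day, 1, @day)
GROUP BY ProductID;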
I work with some 'large' tables and cannot think of a situation whereby I would need to select x million rows in a query (other than idle curiosity). I just cannot think of the business reason. Perhaps I am missing something here. :hehe:
March 21, 2012 at 7:22 am
Just my 2 cents worth..
I agree with the other posters: you cannot feasibly use 5M rows in a web page efficiently, unless you are consuming them with some web-scraping process. If the users see aggregated data, then do the aggregation in T-SQL and just serve up the results. One thing that might help would be to create a covering index. As of 2005/2008 you can store additional columns in the index which are not part of the index key but can be returned directly from the index rather than having to go back to the table rows; this does, however, significantly increase the size of the index. For example:
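A minimal sketch of such a covering index (the index, table, and column names are placeholders):

CREATE NONCLUSTERED INDEX IX_Orders_Customer_Date
ON dbo.Orders (CustomerID, OrderDate)
INCLUDE (TotalAmount, Status);  -- served straight from the index, not part of the key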
You are SURE that the indexes are created properly??
Are you writing to a specific database table, or to a #tempTable, or to a @tableVariable?
If you are only looking at certain records, could you set Triggers on the source table to write to the report table when a record is created/modified?
Is the bottleneck in creating the report source table, or in serving that data up to the web server?
March 21, 2012 at 9:36 am
Thanks to all.
We actually have a need for all that data; I solved it using custom batches.
March 21, 2012 at 9:40 am
Ricardo, can you post a summary of your solution so that we can add it to the knowledge base?
Aaron
April 20, 2012 at 5:46 pm
We are actually facing a similar requirement. Our database stores big data: roughly 200 GB daily, and we need to retrieve data on a weekly basis for reporting purposes, which is around 1400 GB of data...
What we've done is utilize WAITFOR DELAY in between batches. Basically, we take a break after a certain amount of data has been moved. That way it won't hog the CPU, and other processes can get through instead of ours being a resource pig.
The tricky part is finding the right size for each batch. This has to be done by trial and error, as each company has a different environment.
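One way to sketch that batch-and-pause pattern, assuming the rows are being moved out of a staging table (all names, the batch size, and the delay are placeholders to tune):

DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    -- move one batch, then record how many rows it touched
    DELETE TOP (10000) FROM dbo.StagingTable
    OUTPUT DELETED.* INTO dbo.ReportTable;
    SET @rows = @@ROWCOUNT;

    WAITFOR DELAY '00:00:00.500';  -- pause so other work can get in
END;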
Hope it helps
Cheers
Hui
April 20, 2012 at 5:56 pm
ricardo_chicas (8/20/2010)
Ok, let's do this: forget about the original problem. What I really want is a suggestion on the best way to retrieve a large amount of data, without blocking tables, without using NOLOCK, using a select. When I used Oracle I was able to do something like that using a bulk select (I set the select to retrieve chunks of 10000 rows). I am not sure if that kind of stuff works in SQL Server, but I want something like that.
Thank you
Ricardo, you can do this stuff in T-SQL as well. Assuming you have a RowID (identity column) on that monster table, you can pull 10000 records each time and then pause for 100 milliseconds with something like WAITFOR DELAY '00:00:00.100'.
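A minimal sketch of that approach (the RowID, column, and table names are placeholders):

DECLARE @lastID int = 0;
WHILE 1 = 1
BEGIN
    -- pull the next 10000 rows above the highest key copied so far
    INSERT INTO dbo.ReportTable (RowID, col1, col2)
    SELECT TOP (10000) RowID, col1, col2
    FROM dbo.MonsterTable
    WHERE RowID > @lastID
    ORDER BY RowID;

    IF @@ROWCOUNT = 0 BREAK;

    SELECT @lastID = MAX(RowID) FROM dbo.ReportTable;
    WAITFOR DELAY '00:00:00.100';  -- 100 ms breather between batches
END;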
April 21, 2012 at 1:53 pm
I am sure that I have read somewhere that MSSQL 2012 (Denali) has the facility to do a query offset and return rows n to n+m, which will allow you to do paging on the server side.
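That is the OFFSET/FETCH clause added to ORDER BY in SQL Server 2012. A minimal sketch (the table and column names are placeholders):

SELECT col1, col2
FROM dbo.BigTable
ORDER BY RowID
OFFSET 10000 ROWS
FETCH NEXT 10000 ROWS ONLY;  -- returns rows 10001 through 20000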
April 22, 2012 at 10:20 am
ricardo_chicas (3/21/2012)
Thanks to all. We actually have a need for all that data; I solved it using custom batches.
At least one other person asked what this means. Two-way street here, Ricardo... what did you do with "custom batches" to solve this problem, please?
--Jeff Moden
Change is inevitable... Change for the better is not.