Best way to select a large amount of data?

  • ricardo_chicas (8/20/2010)


    Ok lets do this, forget about the original problem

    what I really want to know is for a suggestion on what is the best way to retrieve large amount of data, without blocking tables, without using no log, using a select, when I use oracle I was able to do something like that using a bulk select ( I set the select to retrieve chunks of 10000 rows), I am not sure if that kind of stuff works in sql server, but i want something like that

    Thank you

    as everyone else is telling you; 5 million rows takes time to move.

    the right thing to do is not move the data at all, and look at what the process is doing with the data.

    whatever that process is, it can be fixed and cleaned up so it doesn't take so long and doesn't block the server.

    grabbing 5 million rows in 10K bites is not going to do anything faster for you; you need to tackle it a different way; for example, how often is that 5million rows downloaded ? more than once? why is it not cached?

    As someone else already said, the secondary process that does stuff to the 5 million rows should be looked at so that a single summarizing statement that is executed on SQL server is used to replace that process.

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • ricardo_chicas (8/20/2010)


    Ok lets do this, forget about the original problem

    what I really want to know is for a suggestion on what is the best way to retrieve large amount of data, without blocking tables, without using no log, using a select, when I use oracle I was able to do something like that using a bulk select ( I set the select to retrieve chunks of 10000 rows), I am not sure if that kind of stuff works in sql server, but i want something like that

    Thank you

    We're just guessing here. Post your current query and post the information requested by the article at the second link in my signature line below. Thanks.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • ricardo_chicas (8/20/2010)


    Ok lets do this, forget about the original problem

    what I really want to know is for a suggestion on what is the best way to retrieve large amount of data, without blocking tables, without using no log, using a select, when I use oracle I was able to do something like that using a bulk select ( I set the select to retrieve chunks of 10000 rows), I am not sure if that kind of stuff works in sql server, but i want something like that

    Thank you

    Well if you ask the question like that the only possible answer is that you dont need to do anything special. The application just needs to issue the select query and then consume the returned data as fast as it can.

    There will be blocking - another application will not be able to insert data into the table until the application has finished reading all the data.

    To make blocking shorter you have to minimize the time used to consume the returned data. You application should just read the data as quicklu as possible without waiting for anything else.

    To avoid blocking completely you would have to use snapshot isolation (check BOL) or use the NOLOCK hint in the select.

    Your real problem is probably that your application is not reading data fast enough from the SQL server connection.

    I have no idea what an Oracle BULK SELECT does.

  • Here is a question, if you select 5mm or selecting 10Kx500 won't the time it takes to do the select equal in any case?

    If so what is the point in "BULK" selecting?

    I would also just opt for hinting or isolation then lastly check to see what the external program is doing.

  • Stefan_G (8/20/2010)


    There will be blocking - another application will not be able to insert data into the table until the application has finished reading all the data.

    It depends. It depends on what the indexes are (especially the clustered index for inserts) and what the scope/range of data is being selected. Tables are not necessarily totally locked just because someone selects a large number of rows. 5M out of 120M is only a little over 4% of the table and can easily "survive" on page locks alone.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Since I'm no Oracle Ninja, I did a web search for 'Oracle +"Bulk Select"'. It would appear that there's no such animal. There IS a BULK COLLECT INTO, but no BULK SELECT.

    You could do a similiar thing in SQL Server by using SELECT INTO to create a table on the fly. It's very fast even in the FULL recovery mode. The key here would be to quickly isolate any rows you wanted to work into a much smaller result set.

    Also as a side, one of the OPs original requests is "NO BLOCKING". Folks, the ONLY way that's going to be guaranteed is if you use WITH(NO LOCK) or by setting the transaction isolation level for the query to READ UNCOMMITTED. Of course, that means possible dirty data and, in some cases, duplicated data.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Can't you just use a windowing function inside a stored proc? Something like this (the body of a stored proc that calls itself)

    CREATE PROCEDURE piece_at_a_time @start int

    AS

    begin

    if not exist (select * from ##foo)

    begin

    create table ##foo (kind of result set you need)

    go

    end

    while

    begin

    insert into ##foo (v1,v2,v3,v4,v5) select f1,f2,f3,f4,f5 from (select f1,f2,f3,f4,f5,row_number() over (order by key) from t1 where row_number() > @start and row_number() < @start+1000 order by key)

    if @@rowcount = 0 break

    set @start = @start + 999

    end

    select * from foo -- or use table directly

    end

  • Just to echo what others had said before - I cannot understand why you need to pull out the five million rows. No one is going to be able to use this in a report without some sort of aggregation. Is not the best solution to work out exactly what numbers report users need and then do the aggregations out of core business hours on a daily basis, or even in core hours if needed. The all that needs to be retrieved is the aggregated data.

    I work with some 'large' tables and cannot think of a situation whereby I would need to select x million rows in a query (other than idle curiosity). I just cannot think of the business reason. Perhaps I am missing something here.:hehe:

  • Just my 2 cents worth..

    I agree with the other posters, you cannot feasibly use 5M rows in a web page efficiently, unless you are consuming that using some web scraping process. If the users see aggregated data then do the aggregation in T-SQL and just serve up the results. One thing that might help would be to create a covering index; this is a feature of 2005/2008, you can store additional columns in the index which are not used as part of the indexing rules but can be returned directly from the index rather than having to go back to the records in the database - this does however significantly increase the size of the index.

    You are SURE that the indexes are created properly??

    Are you writing to a specific database table or to a #tempTable or to a @tableVariable.

    If you are only looking at certain records, could you set Triggers on the source table to write to the report table when a record is created/modified?

    Is the bottleneck in creating the report source table or serving that data up to the web server

  • Thanks to all

    we actually have a need for all that data, I solved it using custom batches

  • Riccado, Can you post a summary of your solution so that we can add to the knowlege base.

    Aaron

  • We are actaully having the similar request. Our database stores big data.

    200G data daily and we need to retrieve data on weekly basis for reporting purposes that is roughly around 1400G data...

    What we've done is utilizing Waitfor Delay in between batches. Bascially,

    we need to take a break after a certain amount of data being moved. In that way,

    it won't block the CPU and let other process go through instead of being a resource pig.

    The tricky part is to find out the right size of each batch . This has to be done

    by trial and error as each company has different environment.

    Hope it helps

    Cheers

    Hui

  • ricardo_chicas (8/20/2010)


    Ok lets do this, forget about the original problem

    what I really want to know is for a suggestion on what is the best way to retrieve large amount of data, without blocking tables, without using no log, using a select, when I use oracle I was able to do something like that using a bulk select ( I set the select to retrieve chunks of 10000 rows), I am not sure if that kind of stuff works in sql server, but i want something like that

    Thank you

    Ricardo, you can do this stuff on T-Sql as well. Assuming you have the rowID (identity column) for that monster table and you just pull 10000 records everytime and then break for 100 mill second. something like this waitfor delay '00:00:00.100'

  • I am sure that I have read somewhere that MSSQL 2012 (Denali) has the facility to do a query offset and return rows n to n+m. Which will allow you to do paging on the server side.

  • ricardo_chicas (3/21/2012)


    Thanks to all

    we actually have a need for all that data, I solved it using custom batches

    A least one other person asked what this means. Two way street here, Ricardo... what did you do with "custom batches" to solve this problem, please?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 33 total)

You must be logged in to reply to this topic. Login to reply