15 million rows and growing, and so does the problem

  • Hi Guys,

        I have 15 million rows in my table (and every week 50K more records are added). On one of the pages the client wants to see ALL the records in a DataGrid (10 records per page). My query takes around 8-9 minutes to get the data, and that's why the client is not accepting the system. Please help me.

        This is what I have done in my Stored Procedure.

        I pass PageNo and PageSize values into my SP and calculate the first and the last row number to be displayed.

        The query is not a simple SELECT; it has 5 joins and an ORDER BY clause.

        Since SQL Server 2000 does not provide functionality like ROW_NUMBER(), I am using a table variable with an identity column. I push all the relevant data into the table variable and then select the relevant 10 records from it.

        Apart from that, I am setting ROWCOUNT to (PageNo * PageSize), but the query still takes 8-9 minutes to fetch 10 records.
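
        In outline, the procedure looks something like this (a simplified sketch only; GetPage, dbo.BigTable, and KeyCol are placeholder names, and the real query has the 5 joins and its own ORDER BY):

        CREATE PROC dbo.GetPage @PageNo INT, @PageSize INT
        AS
        SET NOCOUNT ON

        DECLARE @FirstRow INT, @LastRow INT
        SET @FirstRow = ((@PageNo - 1) * @PageSize) + 1
        SET @LastRow  = @PageNo * @PageSize

        -- Stop copying rows once there are enough to cover the requested page
        SET ROWCOUNT @LastRow

        -- Stage the ordered keys with a synthetic row number
        DECLARE @Rows TABLE (RowNum INT IDENTITY(1,1), KeyCol INT)
        INSERT INTO @Rows (KeyCol)
        SELECT KeyCol
        FROM dbo.BigTable          -- the real query joins 5 tables here
        ORDER BY KeyCol

        SET ROWCOUNT 0

        -- Return just the rows for the requested page
        SELECT t.*
        FROM @Rows r
        INNER JOIN dbo.BigTable t ON t.KeyCol = r.KeyCol
        WHERE r.RowNum BETWEEN @FirstRow AND @LastRow
        ORDER BY r.RowNum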

         Any help is appreciated!!!


    Kindest Regards,

    Himalay

  • So the client wants a system that gains an extra 5,000 pages per week, currently has 1.5 million pages of records, and they insist they will page through all of them?

    By my calculations, at present (assuming a 1-second response time) it would take the client 17 days to page through all the records.

    Putting 15+ million records in a datagrid is a ridiculous requirement.

    The only way I can think of to deal with this is to cheat.

    Split your table into two (active and archive). Make sure that the last 4 weeks of data are in the active portion and everything else in the archive.

    Have your main stored procedure call subsidiary stored procedures depending on which page the client chooses.

    For example, if your client requests a page below 20,000 (4 weeks' worth of data), you call a stored procedure against just the active portion of your data.

    If they want pages from 20,001 onwards, you call a stored procedure against a view that UNION ALLs the active and archive tables.
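
    In sketch form (everything here is illustrative - GetPageActive and GetPageAll are the subsidiary paging procs you would write against the active table and the combined view):

    CREATE VIEW dbo.AllRecords
    AS
    SELECT * FROM dbo.ActiveRecords      -- last 4 weeks only
    UNION ALL
    SELECT * FROM dbo.ArchiveRecords     -- everything older
    GO

    CREATE PROC dbo.GetPageDispatch @PageNo INT, @PageSize INT
    AS
    IF @PageNo <= 20000
        -- recent pages: hit only the small active table
        EXEC dbo.GetPageActive @PageNo, @PageSize
    ELSE
        -- older pages: hit the UNION ALL view across both tables
        EXEC dbo.GetPageAll @PageNo, @PageSize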

  • It sometimes makes you want to grasp them round the throat very tightly!! I believe this issue can be resolved using a cursor which batches its results (a firehose cursor??). I'm not a cursor person, I admit, but search under that term and I'm sure you'll find an answer.

    I've likewise had this issue where searches could return several million rows of data to the user; in my case it pushed over a GB of data to the client, and the query obviously had an adverse effect on my data cache (fortunately I had 30GB of memory!) - but the business declared I couldn't restrict the result set, and I was going to add a SET ROWCOUNT 250 to the proc. Arrghhhh!!!!

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Partitioning the table seems like a very good idea. For query performance, have you looked at your explain plans to make sure things are running optimally? I don't use them very often, but I've had to put hash hints in queries because the optimizer wanted to use loop joins on large tables. What about covering indexes to speed performance? Without the query and some DDL it's hard to guess on performance tuning.
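
    For example, a join hint like this forces a hash join where the optimizer insists on a nested loop (the table names are made up for illustration):

    SELECT o.OrderID, c.CustomerName
    FROM dbo.BigOrders o
    INNER HASH JOIN dbo.BigCustomers c
        ON c.CustomerID = o.CustomerID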

    Tom

  • I used indexes, hints, views, indexed views, and God knows what not. The execution plan did not show use of these indexes - obviously, because I don't have any WHERE condition in my query. I am supposed to show all the data, page by page.

    As David suggested, I tried splitting up the table, but it creates a problem for the rest of the pages, where I again need to take a UNION of result sets from two tables.

    I have tried running a similar query on SQL Server 2005. There I modified the query to use the WITH clause that is available in SQL Server 2005, and the results were amazing: fetching the first 10 records took 3 seconds, and fetching the last 10 records took around 15 seconds.
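
    The 2005 version looked something like this (simplified; the names are placeholders, @FirstRow/@LastRow are the page boundaries computed from PageNo and PageSize, and the real query keeps its 5 joins):

    WITH Numbered AS
    (
        SELECT t.*,
               ROW_NUMBER() OVER (ORDER BY t.KeyCol) AS RowNum
        FROM dbo.BigTable t
    )
    SELECT *
    FROM Numbered
    WHERE RowNum BETWEEN @FirstRow AND @LastRow
    ORDER BY RowNum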

    In the meantime I am trying to convince the client that this is not what he wants: at least do not try to show ALL the records on page load; let the user select some criteria to see the records. Let's see what comes out.

    But until then, any suggestions are welcome.


    Kindest Regards,

    Himalay

  • "Since SQL Server 2000 does not provide functionality like Row_Number(), I am taking a variable of type Table with an identity column. I am pusing all the relavent data into table type variable and than select relevant 10 records from it."

     

    This may be a silly question, but are you creating a clustered primary key on your table variable?  If not, you are doing a table scan of the entire table for each SELECT. 

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • You could do custom paging. If you Google, there are plenty of articles describing how you can do it. In brief, you only retrieve the 10 records for the page that was clicked, so your first page loads faster. The last time I saw an article like this was on 4guysfromrolla.com.

    ******************
    Dinakar Nethi
    Life is short. Enjoy it.
    ******************

  • You have to rewrite your query to memorize the last retrieved key.

    So you could write your proc like this:

    CREATE PROC S_page @lastkey INT
    AS
    -- Seek past the last key the client has already seen
    -- and return only the next page of 10 rows
    SELECT TOP 10 a, b, c
    FROM dbo.MyTable        -- "table" is a reserved word; use your real table name
    WHERE a > @lastkey
    ORDER BY a

    -- First page: EXEC S_page @lastkey = 0 (or any value below the minimum key);
    -- every later page passes the last value of "a" that was displayed

     

    Here you have to assume that column "a" is unique and that there is an index (preferably clustered) on "a" as well.

    This solution works and it is fast, but it's not the most beautiful.



    Bye
    Gabor

  • Please don't tell me you're trying to populate a table variable with 15 million rows?

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/

  • Well, all your doubts are right.

    I am indeed pushing 15 million rows into a table variable, as I do not have an identity column.

    I am using custom paging and fetching only the relevant 10 records via this table variable.

    I cannot use an identity column on my table because the result set has a GROUP BY clause, so the sequence of the identity column is not maintained. Hence the LastKey logic fails in my case.

    I have a composite key and am using a clustered index on this composite key.

    That's the scenario.

    Let me know if you guys can think of something that fits my client's weird requirement.


    Kindest Regards,

    Himalay

  • Himalay,

    You don't need to have an identity column to be able to use my virtual Lastkey idea. In your case the Lastkey is composed of several columns, so you just have to replace the single key with those columns, passing parameters like @Column1 and @Column2 (see the sketch below).

    Because you know (from your client app) which row was the last one retrieved, it is quite easy to pass those values as parameters to the stored proc.

    Just another remark:

    A table variable cannot have any index, but a temp table can.

    Because you are handling such a high volume of data, I wouldn't use any temp table (or table variable); just use proper indexing and the kind of paging I described earlier. This will work equally fast regardless of the number of rows in the base table.

    Instead of the TOP clause you can use SET ROWCOUNT 10 as well.
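
    In sketch form, the composite-key version would look like this (dbo.MyTable and the column names are placeholders). Note that to seek past the last row shown you need a greater-than comparison across the key columns, not plain equality:

    CREATE PROC S_page_composite
        @lastCol1 INT,
        @lastCol2 INT
    AS
    -- Fetch the next 10 rows that come after the last
    -- (Column1, Column2) pair the client displayed
    SELECT TOP 10 Column1, Column2, OtherData
    FROM dbo.MyTable
    WHERE Column1 > @lastCol1
       OR (Column1 = @lastCol1 AND Column2 > @lastCol2)
    ORDER BY Column1, Column2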



    Bye
    Gabor

  • "Just an another remark:

    A table variable cannot have any index, a temp table can."

    You can create a clustered index on a table variable as a primary key. 

    Compare the output of this (a table variable with a clustered primary key):

    declare @table table (value int primary key clustered)

    insert into @table
    select 5 union all
    select 2 union all
    select 3 union all
    select 1

    select * from @table

     

    with this (the same table variable with no index at all):

    declare @table table (value int)

    insert into @table
    select 5 union all
    select 2 union all
    select 3 union all
    select 1

    select * from @table

    In the first example, the execution plan shows a clustered index scan; the second example shows a table scan. Now, try to insert duplicate values in the first example... PK constraint violation.

     

    Regardless of whether you can create a clustered index on a table variable, Gabor is correct that you shouldn't be using temporary storage for 15 million rows.

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help - by Jeff Moden

  • Please don't use a table variable for anything other than a small data set. You'll find a #temp table will optimise far better.

    Clustered indexes allow for enforced uniqueness on a table variable; note that a table scan and a clustered index scan of the whole table are effectively the same amount of work.

    A firehose cursor from ASP/ADO (forward-only, read-only) will fetch the first batch of rows really quickly and continue to stream result sets - I've seen Microsoft demoing these and they are good for what you need.

    [font="Comic Sans MS"]The GrumpyOldDBA[/font]
    www.grumpyolddba.co.uk
    http://sqlblogcasts.com/blogs/grumpyolddba/
