April 25, 2011 at 12:11 pm
For the past few days, my queries have not been performing well. I've used the row_number function over four columns in a subquery, and the outer query fetches the records for a particular row number. However, it seems inefficient when the tables are huge: even to return, say, 2,000 records for a particular page, it processes all the records. Is there a better option?
April 25, 2011 at 1:28 pm
I guess I am not sure what the question is yet. Better option for what? row_number?
If that is the question, then no, there is not a better option, although there can be improvements in performance depending on how the query is structured and what you are doing with it. I suspect that row_number is likely not the cause of a performance problem, however, as I have used it on tables with records well into the upper hundreds of millions without any loss of performance.
Dan
If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.
April 25, 2011 at 10:15 pm
It's not about finding a better way to implement row_number. My query is like this:
select col1, col2, col3
from (select row_number() over (order by a.col1, a.col2) as rownum,
             a.col1, a.col2, a.col3
      from table1 a
      inner join table2 b on a.col1 = b.col1
     ) x
where x.rownum = 5 -- [this can be anything]
The problem here is that the query runs well with thousands of records in both tables, but as the size grows it starts taking time. My thinking is that the inner query processes the entire set of records, joins them, and hands a temporary result for all pages (i.e., all rownum values) to the outer query. The outer query then selects the data for only one particular page; the rest of the pages are not needed at that point.
Is there some way to process only the records for the desired page number within the inner query?
April 25, 2011 at 10:27 pm
How many rows are in the table now?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 26, 2011 at 1:48 am
Around 400,000 records in the table [and this can grow in the future]. If we need 10,000 records per page, we would like to avoid the extra processing of the rest of the records.
April 26, 2011 at 6:27 am
Thinking about how to put this together for you and have just a few more questions.
What will be done with the 10,000 records per page that you speak of? Do you expect a human to read them?
Do you have a given sort order or does the sort order need to be, ummm... "flexible"?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 26, 2011 at 7:24 am
What will be done with the 10,000 records per page that you speak of? Do you expect a human to read them?
A: This is not for a user but for internal processing, and it is required that way by the business.
Do you have a given sort order or does the sort order need to be, ummm... "flexible"?
A: I didn't fully understand your question, but inside the row_number() function we have fixed columns in the ORDER BY clause.
April 26, 2011 at 9:44 am
Thank you for the replies. I'm at work right now and I'll build some code for you tonight.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 28, 2011 at 12:33 am
I was thinking of one mechanism. Say we have an index on the columns participating in the pagination [that way those columns are kept in sorted order]. We can set the rowcount and do the processing. In parallel, we'll need one table keeping the maximum value of the paging columns from the most recently processed page.
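For illustration only (the table, column, and index names are made up, I've assumed integer keys, and I've used TOP rather than SET ROWCOUNT), the idea would look roughly like this:
-- Sketch only: names and data types are illustrative.
-- Index on the pagination columns so the seek below avoids a sort.
create index IX_Table1_Paging on dbo.Table1 (col1, col2) include (col3);

-- One-row control table remembering where the previous page ended.
create table dbo.PageControl (LastCol1 int null, LastCol2 int null);
insert into dbo.PageControl values (null, null);

declare @LastCol1 int, @LastCol2 int;
select @LastCol1 = LastCol1, @LastCol2 = LastCol2 from dbo.PageControl;

-- Grab the next 10,000 rows starting just past the stored key.
select top (10000) a.col1, a.col2, a.col3
into #CurrentPage
from dbo.Table1 a
inner join dbo.Table2 b on a.col1 = b.col1
where @LastCol1 is null -- first page
   or a.col1 > @LastCol1
   or (a.col1 = @LastCol1 and a.col2 > @LastCol2)
order by a.col1, a.col2;

-- ...process #CurrentPage here...

-- Remember the last key of this page for the next pass.
update dbo.PageControl
   set LastCol1 = p.col1, LastCol2 = p.col2
  from (select top (1) col1, col2 from #CurrentPage order by col1 desc, col2 desc) p;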
April 28, 2011 at 8:03 am
Crud. My apologies. We had a bit of a crisis at work and I forgot my commitment to this thread. I'll try to get to it tonight. With the information you've provided, it shouldn't be difficult, and I believe you'll enjoy the performance of the code.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 28, 2011 at 8:36 am
Jeff, I'm awaiting your suggestions on my idea, and your code as well.
April 28, 2011 at 8:26 pm
I know... I'm working on it. While you're waiting, how many rows are in the second table, and how long is your query currently taking?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 28, 2011 at 10:19 pm
Using a join to Table2 as a criterion is really the showstopper here, yet it is (I assume) critical to returning the correct data. I've been testing with a million rows of data in each of Table1 and Table2. With the join and some proper indexing, the best I can get out of it is 7 seconds at row 100,000, which is pretty bad.
Without the join, I can get it down to a second, but that requires a little pre-aggregation in the form of an extra column on Table1 and a trigger on Table2 to keep the new column up to date. What the column would hold is an indication that any given row would be returned by the join, as if the join had already been done.
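Just so you can see the shape of it (the object names here are made up and this isn't my actual test code), the pre-aggregated column and trigger would look something like this:
-- Sketch only: names are illustrative.
-- Extra column on Table1: 1 if a matching Table2 row exists, as if the join had been done.
alter table dbo.Table1 add HasTable2Match bit not null default 0;
go

-- One-time backfill for rows that already have a match.
update t
   set HasTable2Match = 1
  from dbo.Table1 t
 where exists (select 1 from dbo.Table2 b where b.col1 = t.col1);
go

-- Trigger on Table2 keeps the flag current as rows are added, changed, or removed.
create trigger dbo.trg_Table2_MatchFlag on dbo.Table2
after insert, update, delete
as
begin
    set nocount on;

    update t
       set HasTable2Match = case when exists (select 1 from dbo.Table2 b where b.col1 = t.col1)
                                 then 1 else 0 end
      from dbo.Table1 t
     where t.col1 in (select col1 from inserted union select col1 from deleted);
end;
go

-- The paging query can now skip the join entirely.
select col1, col2, col3
from (select row_number() over (order by col1, col2) as rownum, col1, col2, col3
      from dbo.Table1
      where HasTable2Match = 1
     ) x
where x.rownum between 100001 and 110000;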
Knowing all that, where do you want to go from here?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 12:04 am
Jeff, thanks for that idea. However, I just gave you one example. Most of our issues involve more than one join, and a few left outer joins as well.
These days I have been trying to find some out-of-the-box technique to sort out this issue. Right now I've created one physical table containing the pageid, the columns involved in calculating the pageid, and the columns needed in the joins. The data for all pages is processed into this table when pageid = 1 is requested; from then on, every page just needs a select from that physical table filtered to the desired page. Finally, at the last page, we can delete the data for all those pages. Let me know if this sounds good; your feedback is welcome.
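To make that concrete (the names and the 10,000-row page size are just for illustration), the staging approach looks roughly like this:
-- Sketch only: names and the page size are illustrative.
-- Built once, when pageid = 1 is requested: every row gets its pageid up front.
select (row_number() over (order by a.col1, a.col2) - 1) / 10000 + 1 as pageid,
       a.col1, a.col2, a.col3
into dbo.PageStaging
from dbo.Table1 a
inner join dbo.Table2 b on a.col1 = b.col1;

create clustered index IX_PageStaging_PageId on dbo.PageStaging (pageid);

-- Every subsequent page is just a plain select against the staging table.
declare @pageid int = 5; -- whichever page is being processed
select col1, col2, col3
from dbo.PageStaging
where pageid = @pageid;

-- After the last page has been processed, throw the staged data away.
-- truncate table dbo.PageStaging;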
May 4, 2011 at 3:12 am
Can there be a downside to this methodology?