April 25, 2011 at 12:11 pm
For the past few days, my queries have not been performing well. I've used the row_number function over four columns in a subquery, and the outer query fetches the records for a particular row number. However, it seems inefficient when the tables are huge: even to return, say, 2,000 records for a particular page, it processes all the records. Is there a better option?
April 25, 2011 at 1:28 pm
I guess I am not sure what the question is yet. Better option for what? row_number?
If that is the question, then no, there is not a better option, although there can be improvements in performance depending on how the query is structured and what you are doing with it. I suspect that row_number is likely not the cause of a performance problem, however, as I have used it on tables with records well into the upper hundreds of millions without any loss of performance.
Dan
If only I could snap my fingers and have all the correct indexes appear and the buffer clean and.... Start daydream here.
April 25, 2011 at 10:15 pm
It's not about finding a better way to implement row_number. My query is like this:
select col1, col2, col3
from (select row_number() over (order by a.col1, a.col2) as rownum,
             a.col1, a.col2, a.col3
      from table1 a
      inner join table2 b on a.col1 = b.col1
     ) x
where x.rownum = 5 -- [this can be anything]
The problem here is that the query runs well with thousands of records in both tables, but as the size grows it starts taking time. My thinking is that the inner query processes the entire set of records, joins them, and hands a temporary result for all pages (i.e., all rownum values) to the outer query. The outer query then selects the data for only one particular page; the rest of the pages are not needed at that point.
Is there some way to process only the records for the desired page number within the inner query?
April 25, 2011 at 10:27 pm
How many rows are in the table now?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 26, 2011 at 1:48 am
Around 400,000 records in the table [and this can grow in the future]. If we need 10,000 records per page, we would like to avoid the extra processing of the rest of the records.
April 26, 2011 at 6:27 am
Thinking about how to put this together for you and have just a few more questions.
What will be done with the 10,000 records per page that you speak of? Do you expect a human to read them?
Do you have a given sort order or does the sort order need to be, ummm... "flexible"?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 26, 2011 at 7:24 am
What will be done with the 10,000 records per page that you speak of? Do you expect a human to read them?
A: This is not for a user but for internal processing, and it is required that way by the business.
Do you have a given sort order or does the sort order need to be, ummm... "flexible"?
A: I didn't fully understand your question, but inside the row_number() function we have fixed columns in the ORDER BY clause.
April 26, 2011 at 9:44 am
Thank you for the replies. I'm at work right now and I'll build some code for you tonight.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 28, 2011 at 12:33 am
I was thinking of one mechanism. Say we have an index on the columns participating in the pagination [that way those columns are kept in sorted order]. We can set the rowcount and do the processing. In parallel, we'll need one table keeping the maximum value of the paging columns from the most recently processed page.
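For illustration only (the table, column, and index names are made up, I've assumed integer keys, and I've used TOP rather than SET ROWCOUNT), the idea would look roughly like this:
-- Sketch only: names and data types are illustrative.
-- Index on the pagination columns so the seek below avoids a sort.
create index IX_Table1_Paging on dbo.Table1 (col1, col2) include (col3);

-- One-row control table remembering where the previous page ended.
create table dbo.PageControl (LastCol1 int null, LastCol2 int null);
insert into dbo.PageControl values (null, null);

declare @LastCol1 int, @LastCol2 int;
select @LastCol1 = LastCol1, @LastCol2 = LastCol2 from dbo.PageControl;

-- Grab the next 10,000 rows starting just past the stored key.
select top (10000) a.col1, a.col2, a.col3
into #CurrentPage
from dbo.Table1 a
inner join dbo.Table2 b on a.col1 = b.col1
where @LastCol1 is null -- first page
   or a.col1 > @LastCol1
   or (a.col1 = @LastCol1 and a.col2 > @LastCol2)
order by a.col1, a.col2;

-- ...process #CurrentPage here...

-- Remember the last key of this page for the next pass.
update dbo.PageControl
   set LastCol1 = p.col1, LastCol2 = p.col2
  from (select top (1) col1, col2 from #CurrentPage order by col1 desc, col2 desc) p;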
April 28, 2011 at 8:03 am
Crud. My apologies. We had a bit of a crisis at work and I forgot my commitment to this thread. I'll try to get to it tonight. With the information you've provided, it shouldn't be difficult, and I believe you'll enjoy the performance of the code.
--Jeff Moden
Change is inevitable... Change for the better is not.
April 28, 2011 at 8:36 am
Jeff, I'm awaiting your suggestions on my idea, and your code as well.
April 28, 2011 at 8:26 pm
I know... I'm working on it. While you're waiting, how many rows are in the second table, and how long is your query currently taking?
--Jeff Moden
Change is inevitable... Change for the better is not.
April 28, 2011 at 10:19 pm
Using a join to Table2 as a criterion is really the showstopper here, yet it is (I assume) critical to returning the correct data. I've been testing with a million rows of data in each of Table1 and Table2. With the join and some proper indexing, the best I can get out of it is 7 seconds at row 100,000, which is pretty bad.
Without the join, I can get it down to a second, but that requires a little pre-aggregation in the form of an extra column on Table1 and a trigger on Table2 to keep the new column up to date. What the column would hold is an indication that any given row would be returned by the join, as if the join had already been done.
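Just so you can see the shape of it (the object names here are made up and this isn't my actual test code), the pre-aggregated column and trigger would look something like this:
-- Sketch only: names are illustrative.
-- Extra column on Table1: 1 if a matching Table2 row exists, as if the join had been done.
alter table dbo.Table1 add HasTable2Match bit not null default 0;
go

-- One-time backfill for rows that already have a match.
update t
   set HasTable2Match = 1
  from dbo.Table1 t
 where exists (select 1 from dbo.Table2 b where b.col1 = t.col1);
go

-- Trigger on Table2 keeps the flag current as rows are added, changed, or removed.
create trigger dbo.trg_Table2_MatchFlag on dbo.Table2
after insert, update, delete
as
begin
    set nocount on;

    update t
       set HasTable2Match = case when exists (select 1 from dbo.Table2 b where b.col1 = t.col1)
                                 then 1 else 0 end
      from dbo.Table1 t
     where t.col1 in (select col1 from inserted union select col1 from deleted);
end;
go

-- The paging query can now skip the join entirely.
select col1, col2, col3
from (select row_number() over (order by col1, col2) as rownum, col1, col2, col3
      from dbo.Table1
      where HasTable2Match = 1
     ) x
where x.rownum between 100001 and 110000;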
Knowing all that, where do you want to go from here?
--Jeff Moden
Change is inevitable... Change for the better is not.
May 3, 2011 at 12:04 am
Jeff, thanks for that idea. However, I just gave you one example. Most of our issues involve more than one join, and a few left outer joins as well.
These days I have been trying to find some out-of-the-box technique to sort out this issue. Right now I've created one physical table containing the pageid, the columns involved in calculating the pageid, and the columns needed in the joins. The data for all pages is processed into this table when pageid = 1 is requested; from then on, every page just needs a select from that physical table filtered to the desired page. Finally, at the last page, we can delete the data for all those pages. Let me know if this sounds good; your feedback is welcome.
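To make that concrete (the names and the 10,000-row page size are just for illustration), the staging approach looks roughly like this:
-- Sketch only: names and the page size are illustrative.
-- Built once, when pageid = 1 is requested: every row gets its pageid up front.
select (row_number() over (order by a.col1, a.col2) - 1) / 10000 + 1 as pageid,
       a.col1, a.col2, a.col3
into dbo.PageStaging
from dbo.Table1 a
inner join dbo.Table2 b on a.col1 = b.col1;

create clustered index IX_PageStaging_PageId on dbo.PageStaging (pageid);

-- Every subsequent page is just a plain select against the staging table.
declare @pageid int = 5; -- whichever page is being processed
select col1, col2, col3
from dbo.PageStaging
where pageid = @pageid;

-- After the last page has been processed, throw the staged data away.
-- truncate table dbo.PageStaging;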
May 4, 2011 at 3:12 am
Can there be a downside to this methodology?