Views (performance)

  • Hi,

     

    Is there any performance/cost advantage in selecting say 5 columns from a view, which contains 10 or more columns? I mean selecting only the relevant columns instead of all the columns, as in select *.

     

    If I select 5 cols from the 10 which are available, will the view still select all the 10 columns from the base tables and only return me the 5, which I selected? If that were the case then I’d rather have a view for those five columns, but if the view will select only the 5 columns from the base tables, I think it would be performing like a view, which has 5 columns in its definition.

     

    So how does the view internally fire the query on the base tables? Does it optimise it keeping in mind the columns selected by the user or does it fire the query specified in the view definition and fetch all the rows anyways?

     

    Thanks in advance.

     

     


    What I hear I forget, what I see I remember, what I do I understand

  • I've seen studies on this, and if memory serves, I don't think it makes enough of a difference even to worry about.  I usually just select what I need, but that's more of a habit than a studied decision.  When it comes to slowing down a query, the FROM clause is generally where most of the damage is done, with the WHERE clause also weighing in.

    So much for my opinion.  I hope the "big guys" respond, if reality takes a different turn.

  • I can add a little info on this but not much. As far as I know you can have perf degradation when you have lots of views being utilized in the same select. So much so that the query optimizer has problems finding the best plan, or also because the same tables are joined on a few too many times when they really shouldn't.

    However I never experienced that myself so I can't comment much further.

  • Thanks for replying Lee.

    I gave it some more thought and I think a stored procedure with a flag as an input parameter which would determine which query to execute (a query to return 10 or a query returning 5 columns) in an if loop would do.

    But when we are talking about multiple Database support we need to be as generic as possible, hence views would be the obvious choice in this case.

    Don’t know if that was relevant in this discussion but I wanted to add it here lest someone who is thinking of suggesting that should be informed that I have already thought of that possibility.


    What I hear I forget, what I see I remember, what I do I understand

  • Under most circumstances selecting the 5 colums will be marginally more efficient than selecting all 10.  Under some circumstances the benefit is large.  I can't think of a single circumstance where selecting all 10 would be more efficient.

    Why?  SQL Server doesn't need to return the entire result set of the view and then pass only the selected columns and/or rows back to the client.  So, in most circumstances you will see a small (percentage wise) gain by limiting your columns because less data needs to be read and sent. 

    In the case that you use a WHERE clause against a view and all the SELECTed columns are covered by an index, there could be a substantial improvement.

    HTH

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • I think that the question is more about the difference between selecting form a view vs selecting from the base table.

  • I have been guilty of reading questions too quickly before, but I don't think this is one of those times...

    If that really was the question, then there should be no difference because SQL doesn't try to compile two different query plans (one for the view and another one for the select against the view).

    /*****************

    If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek

    *****************/

  • Nice to hear that. I'll try to remember it .

  • OK I did some testing and here are my observations.

    I created 2 tables each containing about 300000 records. I joined them in a view, didn’t use any where clause and the join returned 99999 records when I did  select * from the view. This view, V1, returned 6 columns (i.e. it selected all the columns from the two base tables). I created another view,V2,  which returned 3 columns from the base tables, and it returned 99999 records as well.

    The execution times of the queries were very different, even when I was selecting the same three columns from both the views. The view –V2, performed a lil better than V1.

    Now we already knew this would be the case, right, OK! The question still remains... How are the times different when both the views are returning same records, joining the same tables. Cant the query optimiser be intelligent enough to understand that since only 3 columns are being referenced in the select statement for the first view, it should only select the relevant columns from the base tables? I'd expect it to be THAT SMART!


    What I hear I forget, what I see I remember, what I do I understand

  • Check both execution plan. The answer is most likely there. Maybe you move from an index scan + bookmark lookup to a clustered index scan somewhere and that can make a big difference. Also remember that the data is cached for the 2nd execution so you have to flush it before the 2nd query.

  • I executed these commands before I executed the queries.

    DBCC FREEPROCCACHE

    DBCC DROPCLEANBUFFERS

    And if they werent enough, I even stopped and restarted my dev server.

    The execution plans are as..

    View - V1

    select T1SRNO,T3SRNO,T1FNAME,T3Fname from t1t3
      |--Hash Match(Inner Join, HASH ([t3].[srno])=([t1].[srno]))
           |--Clustered Index Scan(OBJECT ([bank].[dbo].[t3].[t31]))

           |--Clustered Index Scan(OBJECT ([bank].[dbo].[t1].[IX_t1]))

    View - V2

    select T1SRNO,T3SRNO,T1FNAME,T3Fname from t1t3limit
      |--Hash Match(Inner Join, HASH ([t3].[srno])=([t1].[srno]))
           |--Clustered Index Scan(OBJECT ([bank].[dbo].[t3].[t31]))
           |--Clustered Index Scan(OBJECT ([bank].[dbo].[t1].[IX_t1]))

    They appear to be very similar.


    What I hear I forget, what I see I remember, what I do I understand

  • Is this a production server?

    Are you testing directly from the server's pc (so that the network is not a factor)?

    Are you sure you have 2 different views (doesn't show on the plan)?

  • Its a development environment, hence I can afford the restarts.

    Query Analyzer is opened on the server PC so network is not an issue.

    And I do have two different views...

    t1t3 is the name of the first view

    t1t3limit is the name of the second view.

     


    What I hear I forget, what I see I remember, what I do I understand

  • What's the size of the columns that you ommitted in the 2nd select?

  • I dont have that info right now as I am not at my office, but the width (thickness) of a coulmn, how would it impact the query? Since they are the same in both.


    What I hear I forget, what I see I remember, what I do I understand

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply