Preventing usage of "SELECT *..."

  • sorte.orm (11/5/2009)


    I asked why "select *" is such a bad thing, and as far as I can tell there has not been any single answer that was good imho. Data transfer amount was mentioned, but you can't make stupid developers smart by applying constraints like preventing "select *". A developer can still select all columns manually, there's no way to tell if he selects more data than he needs. You just have to rely on developers not being stupid, and if you find stupid developers you have to educate them. Indexes not being used correctly can't really be the case as far as I can tell? The where clause determines which index is used, not the data selected?! If this is not the case, then I would certainly appreciate a link to educate my ignorance on this subject.

    Most sql code is written by hand in a sql console, then tuned and finally implemented in code. That normally starts with select * something, then it is modified to it's final version including just the columns that's needed. We have a lot of select * in our production code because we have created a module that does automatic databinding based on control name. However just one row is selected at a time, so I can hardly see speed or excessive data transfer being valid arguments. Atleast not when you take into account that to implement a new field in a edit screen the only work required is adding the db column, and placing a textbox with the same name as the db column in the interface. No code required.

    The case of removing columns can easily break code, but that breakes code regardless of the deveopers using select * or select column-names.

    This solution was quite cunning imo, but it adresses a problem that's not really a problem by creating a much bigger problem (not beeing able to do select *). However I'm sure there's cases where this hack could prove helpful.

    EDIT:

    @SQL Noob - your post was relevant I think, but I trust developers that do a "select *" really knows what they are doing so that extra columns, or a changed column ordering or something like that doesn't break their code. Also we don't allow end users to access datatables directly, if that was a requirement I would replicate the column in question so that the main database was still safe from evil locks and people that doesn't have a clue on what they are doing.

    we've moved most MS Access users to replicated copies, but in the past they would call the most powerful EVP in the company and he said there was a business reason to hit the main servers. the big one was they needed to change a few rows of data and they would use MS Access. but in a lot of cases there was no app to select only a few rows so they would link a table.

    in our case we run Weblogic which requires a database of its own to operate and it's very sensitive to schema changes. the reason it breaks code is that it gets extra data and it doesn't know what to do with it since it also reads and writes to it's own tracking database as part of operations. it's a complicated system where numerous Java apps hit weblogic for some functions and every little schema change is thoroughly tested to make sure nothing breaks. a lot of the code goes back to a custom java class that is in use by multiple apps and it's not a simple matter of changing the select * statements if they currently work.

  • Phil Brammer (11/5/2009)


    SQL Noob (11/5/2009)

    isn't that going to be caught in testing?

    when we were getting ready for sql 2005 we found our developers had written code that looks like

    select column1, column1 order by column1 or something similar. worked perfect in sql 2000 but 2005 threw errors and they even complained that they had to change their code

    The point is that it works in testing, but after deployment someone adds another column of the same name as an existing column to a joined table.

    you don't test schema changes in a separate environment to make sure they don't break anything? in our case the SOX auditors would give us a big FAIL if we allowed developers to make changes in production

    the devs write code, it's sent to QA to be tested. in some cases QA has several copies of the same database to test different apps. Once testing is complete there is a small group of users who test it in pre-prod and give their comments and approval. then management approves it and the changes are deployed to production with the devs writing deployment instructions

    i know people that worked in very large companies where there is an official software standard which is usually at least one or two SP's behind the latest service pack and 1 major software version behind. everything has to work on that standard and any waivers must be submitted in writing to be approved. nothing is changed without testing and approval. we had index hints being used in code because for whatever reason in sql 2000 the optimizer would default to a scan on a query. with sql 2005 it's now a liability and testing showed the query running faster with no index hints. I think it took a year to change that code to get rid of the hints

  • I'm talking generically about the Exists clause. SQL server accepts it, but as a developer I need to be concious that my application may be moved to a different SQL db. This especially true with Java applications using tools like Hibernate or .Net using NHibernate. Or even platforms like MVC that separate out the db layer. Use of all of these is "good practice" these days.

    The more effort that DBA's put into making the developers life more difficult instead of training them to use better SQL practices widens the knowlege gap and makes everyone job more difficult.

  • You should not be using this at all for an example like these at all. Everyone should be using Parameterized queries and using the column names when selecting fields instead of using *. I believe they are going to stop letting us * pretty soon anyway

    select * from Table1 t join Table2 s on t.ID = s.ID2

    that should have changed into a parameterized query so that you can use it multiple times instead of only when it is identical.

    It may be a good practice to change these ADHOC to parameterized queries (using variables) and this will reduce the amount of CPU being used

    Example:

    Select Description from stages where stageid =0 [It has been used 103705 times]

    If you do the following, there would only be one plan executing for these.

    (@0 int) Select Description from stages where stageid = @0

    we have strayed far from the just of what the discussion began with. Good job to the one who started the thread and good luck to all those who reap from it.

    Got to go, training up developers on why we do not continue working this way.

  • sorte.orm (11/5/2009)


    I asked why "select *" is such a bad thing, and as far as I can tell there has not been any single answer that was good imho. Data transfer amount was mentioned, but you can't make stupid developers smart by applying constraints like preventing "select *". A developer can still select all columns manually, there's no way to tell if he selects more data than he needs. You just have to rely on developers not being stupid, and if you find stupid developers you have to educate them. Indexes not being used correctly can't really be the case as far as I can tell? The where clause determines which index is used, not the data selected?! If this is not the case, then I would certainly appreciate a link to educate my ignorance on this subject.

    Most sql code is written by hand in a sql console, then tuned and finally implemented in code. That normally starts with select * something, then it is modified to it's final version including just the columns that's needed. We have a lot of select * in our production code because we have created a module that does automatic databinding based on control name. However just one row is selected at a time, so I can hardly see speed or excessive data transfer being valid arguments. Atleast not when you take into account that to implement a new field in a edit screen the only work required is adding the db column, and placing a textbox with the same name as the db column in the interface. No code required.

    The case of removing columns can easily break code, but that breakes code regardless of the deveopers using select * or select column-names.

    This solution was quite cunning imo, but it adresses a problem that's not really a problem by creating a much bigger problem (not beeing able to do select *). However I'm sure there's cases where this hack could prove helpful.

    EDIT:

    @SQL Noob - your post was relevant I think, but I trust developers that do a "select *" really knows what they are doing so that extra columns, or a changed column ordering or something like that doesn't break their code. Also we don't allow end users to access datatables directly, if that was a requirement I would replicate the column in question so that the main database was still safe from evil locks and people that doesn't have a clue on what they are doing.

    Check out http://www.sqlservercentral.com/articles/User+Defined+Function+(UDF)/62730/ for a good reason

  • All,

    Thank you for the feedback and comments. Keep 'em coming. Appreciate it!

    This article was meant to suggest a possible way, as an idea (a rude one perhaps) to prevent the use of SELECT *. It should certainly not be mistaken for a solution. It's just a different approach. Am hoping this could give rise to an actual solution for some issue of the present or future.

    As stated in the article and by many of you, this is indeed a lot of work to not allow SELECT *. It also creates some unwanted restrictions...not to forget the frustration with the denials.

    I think we got good set of answers for why SELECT * is not a healthy practice.

    No questions...GDR 2008, code reviews and educating developers/users are excellent ways to eliminate SELECT * usage in code!

    The vision must be followed by the venture. It is not enough to stare up the steps - we must step up the stairs. - Vance Havner
  • Scott.Deagan

    I think you miss the point. This trick is to limit the selection to nominated columns required to satisfy a requirement not just dump all columns down the network.

    The users may have a valid need to access some of the data but rather than limit each user with roles etcetera this trick works well.

    🙂

  • the solution provided does not work on oracle. does it really work on some other RDBMS....i doubt? kindly clarify if some alterations required!

  • I love the idea but don't see myself using soon.

    Wouldn't the error messages be clearer for users if the dummy column was named [*]:

    The SELECT permission was denied on the column '*' of ...

  • Well, select count(customer_name) and count(*)

    will not produce the same results if Nulls are allowed

    in customer_name column.

  • Well, select count(customer_name) and count(*)

    will not produce the same results if Nulls are allowed

    in customer_name column.

  • 1) code deploy to production server should not use select *

    queries should be all in parameterized stored procedure A) each exec permission can be granted B) to prevent SQL injection (weak SQL code can be exploided by SQL injection) C) when there is a change, adding a column, it forces proper code change to avoid wrong field value being displayed into wrong column.

    2) On DEV or QA, do whatever you want to and need to.

    It would be difficult to have 2) and force 1) when deployment happens to production server because it has passed QAT and UAT.

  • Ineresting , but better than enforcement will be understanding....

  • ndrd (11/6/2009)


    the solution provided does not work on oracle. does it really work on some other RDBMS....i doubt? kindly clarify if some alterations required!

    It works quite well in MS SQL Server, the RDBMS that this site is centered around. It does however come with tradeoffs, as discussed in detail on this thread.

    And forgive the nitpick, but bold tends to work best if you highlight small sections out of a larger block. Bolding the entire post is a little counterproductive.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • Well tried mate!. Yes it may not be practical in most situations. That apart, I have one query.

    Can the user be able to view the values of the dummy column (DummyColumn) by using the following query?.

    SELECT DummyColumn FROM dbo.Table_1;

    Pardon me, as I could not try the code, since I had not installed SQL in my new system.

Viewing 15 posts - 46 through 60 (of 140 total)

You must be logged in to reply to this topic. Login to reply