Preventing usage of "SELECT *..."

  • LadyRuna (2/11/2011)


    I have a practical example of where "Select *" fails:

    Our product has a VIEW called "Customerz" defined as "SELECT * FROM CustomerTable" (view & table name changed to protect the guilty). This view is used in SSRS as part of a data model (and due to permissions settings / rules presented by the powers that be, we have to use the view in the data model)

    I added a New column to CustomerTable, then checked the view. It still listed only the original 5 columns for CustomerTable. The only way I could force the view to display all 6 columns (original 5 + the one I added to the base table) was to ALTER the view.

    Fortunately, I was able to find the problem quickly, however, I wonder how many other "gems" of SELECT * in views will bite us later on.

    This is not a flaw of a view, it is a common misunderstanding of how they work. They are a saved query with a saved execution plan. This is the way they behave. If you change the data structure and want your view(s) to be updated you have to recompile them.

    _______________________________________________________________

    Need help? Help us help you.

    Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

    Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

    Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
    Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
    Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
    Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

  • This seems a bit too extreme and doesn't really stop programmers from using more fields than necessary not to mention slowing down your system.

    Work close with your programmers, force them to use views and/or stored procedures, and periodically evaluate / audit the queries being utilized in the system if it is that important.

    In my opinion, a solution like this one is never a good answer over creating procedures and making sure people follow them. This goes for not only creating policy and guidlines for programmers, but also for yourself.

  • This is a good way to start a war between (developers and/or users) and database administrators.

  • This is a great topic and discussion, but I looked and did not find the "Why" part. WHY is it frowned upon to use Select *? I am assuming that it's a performance consideration.

    I know that I have been bitten numerous times when someone changes a field name without notifying me. Then, selecting on the explicit (missing) field name caused an error in processing. Using the Select * avoids the error and the need to update the SP (or code wherever) whenever a field name changes.

    If the source table and the destination table field names are changed in sync, then the Select * transparently allows the data to pass between the tables without the necessity to change the SP any time a field name is changed.

  • The code displayed seems to DENY select permission altogether. Lets assume there are application users who need select access to all the tables but we need to create a condition to DENY only select *. How to achieve this.

  • A good example of what I have personally run into many times over the years.

    SELECT * on a table where many columns are involved, but the code only needs 2 columns. This turned into an issue of just obtaining more data than necessary with outcome being slower query performance and extra bandwidth usage.

    Why pull 20 columns worth of data when you only need 3? Especially when one of those columns just happens to be an image, ntext, (max), ect...

    So to your question, I ask you a question. If you have 20 books, would you carry them around every day or just carry the one you need?

    If you need all the columns and the table will never change... well, it's not THAT bad to use SELECT * in your code. but... it's better to keep things uniform and stop a bad habit from forming. Will help keep your code towards self documenting.

  • Ken Wells (2/11/2011)


    This is a great topic and discussion, but I looked and did not find the "Why" part. WHY is it frowned upon to use Select *? I am assuming that it's a performance consideration.

    No, it isn't a performance issue - except when someone makes one of three silly mistakes:

    (a) combining using select * with not writing sensible filters (on clauses for inner joins and where clauses for all queries);

    (b)using select * when they only want 2 or 3 columns of a 100 column table but they want a lot of rows; or

    (c)using select * when they are looking at a table with a varchar(max) colum with a very high every average length that their code doesn't need to see and they want a lot of rows (or something similar)

    In cases (b) and (c), which are really two extremes of one issue, select * probably shouldn't have been used (it depends whether the maintenance problems arising from not using it are worse than the performance problems from using it, which in turn may depend whether the result of the select is going to go over a netwrok connection or just into local storage within the data engine).

    In case (a) the problem is that the developer has failed to select the rows he wants.

    If none of (a), (b), and (c) applies there is no good reason to ban select * except in code that requires to see an exact column set; usually when problems arise because select * is used when a specific column set is required it's because the programmer didn't have the sense (or was too lazy or careless) to write code that would ignore extra columns, not that select * was inappropriate - and it's sloppy programming like that, or the desire to have applications chucking ad hoc SQL at the server, that has led to the "select * is always bad" myth.

    Of course any time an application is passing raw SQL over a network to a server (instead of only making procedure calls) there is already a security problem, a control and management problem, and probably a performance problem too and select * can make the last of those worse; but the solution there is to put together a secure and controllable architecture, not to ban select *.

    I won't comment on the rest of your note because I would only be agreeing vehemently.

    Tom

  • Uripedes Pants (2/11/2011)


    I thought this topic sounded familiar:

    http://www.sqldev.org/sql-server-database-engine/prevent-select--query-80094.shtml

    Thank you for the link. The posts in the link that you provided above has 5th Nov 2009 as its first post date. Also, Jonathan Kehayias has provided a link to this article in one of the subsequent posts on that page. Thank you Jonathan!

    Just to clarify...this sqlservercentral article was first published on 5th Nov 2009 and hence was sent to the editor a few weeks earlier than the date of first publishing.

    The vision must be followed by the venture. It is not enough to stare up the steps - we must step up the stairs. - Vance Havner
  • I'll just weigh in...that's a little much don't you think?

    Putting a dummy column in EVERY table to prevent Select *? As others have alluded--you've got bigger problems if all your developers are using select *, so preventing that isn't going to fix much.

    I had to check the date because I remember PC magazine putting in articles like this on April 1.

  • LadyRuna (2/11/2011)


    I added a New column to CustomerTable, then checked the view. It still listed only the original 5 columns for CustomerTable. The only way I could force the view to display all 6 columns (original 5 + the one I added to the base table) was to ALTER the view.

    Fortunately, I was able to find the problem quickly, however, I wonder how many other "gems" of SELECT * in views will bite us later on.

    No, not fortunately. You still have the same issue if the view selects five columns and a sixth is added to the table. You have to find the views anyway, to add the sixth column.

    The problem with SELECT * (other than some performance issues) is where it is used to populate temp tables or other places (e.g. SSIS) where adding a new column will cause things to break.

  • vishal.gamji (2/11/2011)


    Well... using SELECT *, 1 or the column name has the same affect in an IF EXISTS clause, and COUNT(PriKey) can always be used for a count.

    COUNT(PriKey) only works if there is a PRIMARY KEY constraint and it is only one column. A composite primary key would not work in this case, as only one column can be passed into COUNT() as an argument.

    SELECT COUNT(*) is not the same as COUNT(Column). COUNT(Column) counts only non-null values in that column. COUNT(*) actually counts all of the rows in the table, using an index with the fewest number of pages to get said count.

  • sorte.orm (11/5/2009)


    I asked why "select *" is such a bad thing, and as far as I can tell there has not been any single answer that was good imho. Data transfer amount was mentioned, but you can't make stupid developers smart by applying constraints like preventing "select *". A developer can still select all columns manually, there's no way to tell if he selects more data than he needs. You just have to rely on developers not being stupid, and if you find stupid developers you have to educate them. Indexes not being used correctly can't really be the case as far as I can tell? The where clause determines which index is used, not the data selected?! If this is not the case, then I would certainly appreciate a link to educate my ignorance on this subject.

    Most sql code is written by hand in a sql console, then tuned and finally implemented in code. That normally starts with select * something, then it is modified to it's final version including just the columns that's needed. We have a lot of select * in our production code because we have created a module that does automatic databinding based on control name. However just one row is selected at a time, so I can hardly see speed or excessive data transfer being valid arguments. Atleast not when you take into account that to implement a new field in a edit screen the only work required is adding the db column, and placing a textbox with the same name as the db column in the interface. No code required.

    The case of removing columns can easily break code, but that breakes code regardless of the deveopers using select * or select column-names.

    This solution was quite cunning imo, but it adresses a problem that's not really a problem by creating a much bigger problem (not beeing able to do select *). However I'm sure there's cases where this hack could prove helpful.

    EDIT:

    @sql Noob - your post was relevant I think, but I trust developers that do a "select *" really knows what they are doing so that extra columns, or a changed column ordering or something like that doesn't break their code. Also we don't allow end users to access datatables directly, if that was a requirement I would replicate the column in question so that the main database was still safe from evil locks and people that doesn't have a clue on what they are doing.

    CREATE TABLE MyOldTable (col1 VARCHAR(100), col2 VARCHAR(100))

    CREATE TABLE MyNewTable (col1 VARCHAR(100), col2 VARCHAR(100))

    No problem:

    INSERT INTO MyNewTable

    SELECT * FROM MyOldTable

    Now:

    ALTER TABLE MyOldTable ADD col3 VARCHAR(100)

    Will no longer work:

    INSERT INTO MyNewTable

    SELECT * FROM MyOldTable

    However, if the code had been written as:

    INSERT INTO MyNewTable

    (

    col1,

    col2

    )

    SELECT col1, col2

    FROM MyOldTable

    The code would continue to work even after MyOldTable was changed.

  • I might be fuelling the fire here, as I'm merely playing devil's advocate rather than agreeing wholeheartedly with the approach...

    But, for the people who say that the DBAs/analysts should be able to run SELECT * (for quick-fire checks or the data structure, etc), how about using roles/groups to decide whether this denial is applied? You could deny select on the dummy column for just the developer users and not the DBAs, so it would help to capture these issues during the testing process (maybe as part of an automated first-stage peer-review process).

    As a sysadmin/DBA in a small dev company who have only recently committed to drafting some sort of coding standards/peer-review system (based on some lengthy meetings regarding improving our quality), tips to 'assist' our developers in moving away from lazier processes are very welcome. As others have mentioned, peer-review takes time and can only reasonably be done by someone who also knows the codebase well enough (unless you reduce it to basic steps, such as '1, check for SELECT *; 2, ...', which just gets you a human equivalent of the same restrictive permission setting).

    Also, it is a particularly small-picture view to take the stance that developers who forget to follow the standard be relieved of their job -- you would lose a lot of prior client/application knowledge for comparitively little gain.

    So, how about a middle ground approach? I know it will be different for many people (and I'm not saying I advocate), but it is still quite a common model to have applications (particularly web apps) that use a single user account to access the database, which is part of the connection string -- be it a SQL user or Windows (domain)? (user|computer) account. In those cases how about setting up a script that adds the dummy column to every table and sets up the deny policy against the single user? You would run this during your promotion from dev to test and your test team (sh|w)ould pick up the exception as part of their process. The developers would be told exactly what to expect and what is required, to prevent them closing the bug report as 'unable to reproduce'; and it would give them a reasonable reminder each time so that they eventually move away from the muscle-memory use of SELECT *.

    TL;DR: how about using this approach on a test server only, as part of a rigorous test regime; enforcement is not wrong if you are trying to move people away from an approach, to a better one.

  • jimbobmcgee (2/11/2011)


    I might be fuelling the fire here, as I'm merely playing devil's advocate rather than agreeing wholeheartedly with the approach...

    But, for the people who say that the DBAs/analysts should be able to run SELECT * (for quick-fire checks or the data structure, etc), how about using roles/groups to decide whether this denial is applied? You could deny select on the dummy column for just the developer users and not the DBAs, so it would help to capture these issues during the testing process (maybe as part of an automated first-stage peer-review process).

    As a sysadmin/DBA in a small dev company who have only recently committed to drafting some sort of coding standards/peer-review system (based on some lengthy meetings regarding improving our quality), tips to 'assist' our developers in moving away from lazier processes are very welcome. As others have mentioned, peer-review takes time and can only reasonably be done by someone who also knows the codebase well enough (unless you reduce it to basic steps, such as '1, check for SELECT *; 2, ...', which just gets you a human equivalent of the same restrictive permission setting).

    Also, it is a particularly small-picture view to take the stance that developers who forget to follow the standard be relieved of their job -- you would lose a lot of prior client/application knowledge for comparitively little gain.

    So, how about a middle ground approach? I know it will be different for many people (and I'm not saying I advocate), but it is still quite a common model to have applications (particularly web apps) that use a single user account to access the database, which is part of the connection string -- be it a SQL user or Windows (domain)? (user|computer) account. In those cases how about setting up a script that adds the dummy column to every table and sets up the deny policy against the single user? You would run this during your promotion from dev to test and your test team (sh|w)ould pick up the exception as part of their process. The developers would be told exactly what to expect and what is required, to prevent them closing the bug report as 'unable to reproduce'; and it would give them a reasonable reminder each time so that they eventually move away from the muscle-memory use of SELECT *.

    TL;DR: how about using this approach on a test server only, as part of a rigorous test regime; enforcement is not wrong if you are trying to move people away from an approach, to a better one.

    DENY to roles is so much better. Great idea on implementing this approach for lower ennvironments only! Thank you!

    The vision must be followed by the venture. It is not enough to stare up the steps - we must step up the stairs. - Vance Havner
  • Oh lighten up. It's 2011. Ever hear of corporate standards? What is the big deal over select * anyhow? It's a tool like any other tool developers or dba's use. This isn't a philosophy class it's business. Things need to get done. Restricting features and usage is just bad.

Viewing 15 posts - 91 through 105 (of 140 total)

You must be logged in to reply to this topic. Login to reply