Preventing usage of "SELECT *..."

  • Another tool along these lines would be to expand select statement itself to allow for an exclusion set of columns. For example:

    select all except (columnX, columnY, columnZ) from Table.....

    That would still suffer some of the issues of Select *, though, in particular that you'd be getting additional columns your application isn't expecting returned if someone adds a column to the database. In fact, the only issue it would fix is the one where you're already returning way more data than your application needs.

  • sorte.orm (11/5/2009)


    Exactly why is "select * " not a good idea? I searched for articles with that tag but came up empty? I can't really see any reason not to do so, as long as you prefix with tablenames/aliases. And by the term "users" does the article author mean developers or end users ?

    we have a weblogic app that does a select * from a table with a where condition. few years ago my predecessor set up replication on this table and instead of making the basic transactional replication he set it up allowing updatable subscriptions. this created a GUID column and crashed the weblogic application which is critical to our call center and other CSR functions. Reason is the code wasn't set up to handle the new column. Normally any new schema changes go through QA where they are tested with all our applications.

    the fix was to create a new blank table, copy the data, delete the original and then rename the new table. I think there were also FK constraints involved and it took close to a day to do this. and meanwhile customers were told we can't help you right now.

    It's also may be a nice way to battle evil MS Access. we used to have people go into production data and lock it all the time. of course most times we were busy with something else and didn't see the email alerts we set up for 15-20 minutes or until the managers from the call center called other managers. imagine you have a branch office with 500 people and 6 T1's. someone uses Access to grab all the data from a 2 million row table that is the most important table in the business. they do this to find one row they need to change. meanwhile a CSR using the CRM app can't change something via the app or pull the data while they are on the phone with a customer.

    and i'm always killing Access connections at night because someone leaves it running, goes home and it starts blocking alter index or replication. i don't really care except that 300 blocking alerts during the night kills my iphone battery

  • The problem with this approach is that the user can no longer do COUNT(*) as was indicated.

    COUNTing a column, especially one that allows NULLs (perhaps a unique non-null primary key is not defined), will not return the correct row count if the column has NULL values in it.

    SELECT COUNT(*) <> SELECT COUNT([Column]) unless [Column] does not contain NULL values.

    This is important to note if you are forcibly preventing users from being able to do SELECT * / COUNT(*) / etc...

  • jts_2003 (11/5/2009)


    Certainly an interesting idea, but as some others have said I woul dhope the DBA would have the authority to vet any code gpiong into the live environment.

    Often times it has less to do with authority and more to do with available time. I can't review every line of code, I don't have time.. But I do try and train my users to not do this..

    CEWII

  • I would imagine that most organizations have folks doing purely business research/analytical work on their data in their databases. While I agree that select * ... isn't appropriate for production code, it often is expedient for research. Production code one assumes gets vetted. This solution seems to solve the wrong problem. The real question should be not "how do we tie our own hands", but rather "how do we enforce code review".

  • Peter_Beery (11/5/2009)


    I would imagine that most organizations have folks doing purely business research/analytical work on their data in their databases. While I agree that select * ... isn't appropriate for production code, it often is expedient for research. Production code one assumes gets vetted. This solution seems to solve the wrong problem. The real question should be not "how do we tie our own hands", but rather "how do we enforce code review".

    I agree 100% with this. When I'm building a query as a proof of concept, I generally type SELECT * just for speed purposes.

    However, with that said, the approach documented in this article could be applied to a single user, perhaps the user account used to run production queries, while leaving the rest of the user base alone.

  • Peter_Beery (11/5/2009)


    This solution seems to solve the wrong problem. The real question should be not "how do we tie our own hands", but rather "how do we enforce code review".

    Pete,

    I could not have said it better myself. Again, as I said before, part of the KISS approach is finding the appropriate and practical solution for the problem at hand, not just solving it with "neat or cute" gimmicks. Also, educating your developer and BI staff on best practices helps a lot too...

    "Technology is a weird thing. It brings you great gifts with one hand, and it stabs you in the back with the other. ...:-D"

  • I asked why "select *" is such a bad thing, and as far as I can tell there has not been any single answer that was good imho. Data transfer amount was mentioned, but you can't make stupid developers smart by applying constraints like preventing "select *". A developer can still select all columns manually, there's no way to tell if he selects more data than he needs. You just have to rely on developers not being stupid, and if you find stupid developers you have to educate them. Indexes not being used correctly can't really be the case as far as I can tell? The where clause determines which index is used, not the data selected?! If this is not the case, then I would certainly appreciate a link to educate my ignorance on this subject.

    Most sql code is written by hand in a sql console, then tuned and finally implemented in code. That normally starts with select * something, then it is modified to it's final version including just the columns that's needed. We have a lot of select * in our production code because we have created a module that does automatic databinding based on control name. However just one row is selected at a time, so I can hardly see speed or excessive data transfer being valid arguments. Atleast not when you take into account that to implement a new field in a edit screen the only work required is adding the db column, and placing a textbox with the same name as the db column in the interface. No code required.

    The case of removing columns can easily break code, but that breakes code regardless of the deveopers using select * or select column-names.

    This solution was quite cunning imo, but it adresses a problem that's not really a problem by creating a much bigger problem (not beeing able to do select *). However I'm sure there's cases where this hack could prove helpful.

    EDIT:

    @sql Noob - your post was relevant I think, but I trust developers that do a "select *" really knows what they are doing so that extra columns, or a changed column ordering or something like that doesn't break their code. Also we don't allow end users to access datatables directly, if that was a requirement I would replicate the column in question so that the main database was still safe from evil locks and people that doesn't have a clue on what they are doing.

  • A very interesting article with a solution I had not considered.

    Of course, to echo so many others I do not think it is practical. I am quite comfortable with my developers doing ad hoc "select top 100 * ...." type queries. It is one of they best ways to get an intuitive feeling for how the data is structured and exactly what is really in a table if they have forgotten.

    In actual procedures/programs/permenant code it is of course a very bad practice, but standards enforced by code reviews are probably a much better way to enforce that. Also, if all source code is consolodated it can be relatively easily searched for any select * type code if you really want to make sure code reviews did not miss any.

    ---
    Timothy A Wiseman
    SQL Blog: http://timothyawiseman.wordpress.com/

  • OK, but what about Exists clauses? We need to use select * from in at least some of those cases (yes I know about null and 1 and whatever, but some SQL versions don't even support that). It's not always practical to block these select statements and not always to the developers or DBA's advantage.

    I liken this to the ban on Goto's from the past. Better development tools have practically eliminated the Goto, and better SQL practice can eliminate select * when it should not be used.

    As a developer, when the dba's and db engine designers can give me a better tool, I'll use it. I can count on one hand the number of times I've used a select * in a production program, but there are thousands of daily ad hoc queries written that require select * just to figure out what's in an unknown table.

    I'd put the emphasis on training rather than making blanket rules and database changes that will be hard to enforce.

  • sorte.orm (11/5/2009)


    ...

    The case of removing columns can easily break code, but that breakes code regardless of the deveopers using select * or select column-names.

    ...

    Actually, depending on what's happening with the data selected, if you have "select *" in your code and a column gets deleted, it might not break code. Or, you might have a case of a column getting deleted and another column getting added. This keeps the same number of columns, so doesn't break the "select *" in the usual way, but depending on what's done with the data, it might break a whole lot of things if the column that replaces the deleted column is a totally different data type, like if the deleted column was BINARY and the replacing column is FLOAT.

    If "Select *" works for you, then go for it. "Select *" does have its valid uses (in IF EXISTS statements for one). But when it gets overused in production codes, that's when it moves over from valid to lazy and it can lead to a lot of code rewriting depending on the table modifications. If you have a column that gets deleted and you have to take that column out of your code, it's a lot easier to do a search for your column name than to have to look for the table, then look through that code to see if the deleted column affects it and then to find the next spot in your code with that table.

    Yes, it can be easier and faster to code, but I would be willing to bet that it makes code modification and debugging take longer.

    -- Kit

  • Here is one reason why I think select * can be bad.

    Lets say that you have two Tables as below which can be joined on ID = ID2

    create table Table1 (ID int not null)

    go

    create table Table2 (ID2 int not null, Name varchar(10) not null)

    go

    and in your code you do the following

    select * from Table1 t join Table2 s on t.ID = s.ID2

    a few days later, another developer decides to add a Name column to Table1 which isn't required by your code

    alter table Table1 add Name varchar(10) not null

    go

    Now your code has to deal with the fact that two name columns are returned

    select * from Table1 t join Table2 s on t.ID = s.ID2

  • djnewman (11/5/2009)


    OK, but what about Exists clauses? We need to use select * from in at least some of those cases (yes I know about null and 1 and whatever, but some SQL versions don't even support that).

    What SQL Server version does not support EXISTS (SELECT NULL/1 FROM TABLE...)?

  • David Betteridge (11/5/2009)


    Here is one reason why I think select * can be bad.

    Lets say that you have two Tables as below which can be joined on ID = ID2

    create table Table1 (ID int not null)

    go

    create table Table2 (ID2 int not null, Name varchar(10) not null)

    go

    and in your code you do the following

    select * from Table1 t join Table2 s on t.ID = s.ID2

    a few days later, another developer decides to add a Name column to Table1 which isn't required by your code

    alter table Table1 add Name varchar(10) not null

    go

    Now your code has to deal with the fact that two name columns are returned

    select * from Table1 t join Table2 s on t.ID = s.ID2

    isn't that going to be caught in testing?

    when we were getting ready for sql 2005 we found our developers had written code that looks like

    select column1, column1 order by column1 or something similar. worked perfect in sql 2000 but 2005 threw errors and they even complained that they had to change their code

  • SQL Noob (11/5/2009)

    isn't that going to be caught in testing?

    when we were getting ready for sql 2005 we found our developers had written code that looks like

    select column1, column1 order by column1 or something similar. worked perfect in sql 2000 but 2005 threw errors and they even complained that they had to change their code

    The point is that it works in testing, but after deployment someone adds another column of the same name as an existing column to a joined table.

Viewing 15 posts - 31 through 45 (of 140 total)

You must be logged in to reply to this topic. Login to reply