The SQL * Wildcard

  • My vote is for the unexpected consequences of having column order changes break client applications.

    Given that you can very simply insert a column at any position in a table in design view in both SSMS and Visual Studio the likelihood of a developer/user with database access simply inserting a new column (e.g. need a third address line on the customer address table...) is pretty high.

    Not too long ago it was considered best practice in the developer community to reference columns by ordinal rather than by column name...

    It's not at all uncommon to see code out there that goes something like:

    Response.Write("Customer ID: " + myRow[0].ToString());

    Response.Write("Last Name: " + myRow[1].ToString());

    ad infinitum.

    Joe

  • Actually AFAIK it is STILL the best practice to reference columns as you state - at least in ADOc. There is a demonstrable gain in efficiency in doing so.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Wow. Profiler recorded nothing.

    I ran both queries twice on a 69 column wide, 7,538,289 row long table. Select Distinct * and Select Distinct (columnlist). I did the DBCC DropCleanBuffers around all queries. I reset Profiler with different events once the first one failed to record anything. Even tried a Showplan ALL on one and a ShowPlan Text on another.

    Nothing on Profiler whatsoever. But here are the time trial results for both.

    Select Distinct * - 4 hours 19 minutes and 34 seconds

    Select Distinct (columnlist) = 4 hours 18 minutes and 10 seconds.

    Mind you, this was run Off-Production, so the box might have hardware limitations that caused this query to take so long. Doesn't seem to matter though. What time was saved by specifying the column list wasn't a whole lot. 1 minute... My end users STILL would have been screaming about query performance if this had been officially requested. @=)

    Moral of the story... Use fewer columns and WHERE clauses when possible. @=)

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • I'd have to agree there with Brandie. I tried this on two "wide" (180 columns) tables with NO records:

    declare @i int

    select @i =0

    while @i<1000000

    begin

    insert tableA

    select * from tableB

    select @i=@i+1

    DBCC FREEPROCCACHE

    END

    I did the insert only so that there would be no return component. I also tried putting the insert into a separate SP (with the ensuing FREEPROCCACHE) just to try to make sure it's not reusing anything.

    Comparing this vs the same with specifying the column names, there's no duration, execution plan, or IO statistics difference of any significance (+/- a few ms). So - if there IS another hidden query in there - I can't see its effects.... I realize the tools tend to hide purely internal operations, but if there is one - it's REALLY hidden.

    So there's another piece of info I can safely flush out of the old pea-shooter.

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Interesting note. In the Microsoft Self Paced Training kit for 70-441, the book says

    However, you cannot optimize all types of queries. For example, a query such as the following cannot be optimized at all.

    Select * from table

    Page 101, chapter 4 lesson 1.

    Yet, my tests show that optimization didn't seem to help me much, if indeed my Select columnlist query was being optimized by the engine.

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • Hi All,

    Im sorry my cable went down so I wasnt able to participate further in this discussion Till today.

    So I talked to my manager about what his solution to the question he posed was after giving my thoughts.

    He basically was going after the "more overhead caused by using the * wildcard". Which is understandable considering this is a MSSales DB with ALOT of users hitting it 24/7 and anything you can do to reduce the work on the servers is not a bad thing.

    The fact of the Select * also breaking any SP that is being run if future schema changes happen is also a factor. Which if I can add seems to be rather commonplace in my group, the tables on the DB's im working with seem to change very often and actually I ran across this problem in the first week of working there, where a user was not getting the results they expected due to a schema change.

    Thanks for all the responses, hope this wasnt too anticlimactic.

    Paul

  • Sounds like you got the job then. @=)

    Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

  • yeah im just Dash trash ATM(contractor), its very hard to break into actual DBA work so right now im a analyst there, but I think the experience will go well towards a DBA career.

  • 1st thank Paul for the update on his particulars, alway useful to hear back from the original poster.

    It looks like it's safe to say that in the end there are many arguments against using "select *" vs. column list all of which are valid - in short "select *" is the devil's work.

    Joe

  • I am not fully satisfied with the answers we have come up with here (not that anyone should really care about that). What about something more substantial like what is found on Microsoft's site (which was in a link to an article posted by someone else):

    Using a SELECT * query not only returns unnecessary data, but it also can force clustered index scans for the query plan, regardless of the WHERE clause restrictions. This happens because the cost of going back to the clustered index to return the remaining data from the row after using a non-clustered index to limit the result set is actually more resource-intensive than scanning the clustered index.

    I guess, if possible, I'd like to have a solid reason for not using select *, that I really understand, to give to our developers. I do not foresee the table modification causing an issue with select * statements in our environment. I don't fully understand what Microsoft said either, particularly since their supporting execution plans did not look to match what they were trying to explain. Here is a link to that article again:

    http://msdn2.microsoft.com/en-us/library/ms998577.aspx

    I am hoping someone else will still be curious too and have the information I am seeking. Otherwise, maybe we are okay here using select * statements at times...

    Thanks...

  • I'm a developer. I think they would understand if you told them to just do it for readability. With a decent editor you don't have to type the column names anyway. You can script them.

    Tom Garth

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • I think I may need more than that - they are pretty tough on us here!

    Example - I opened a call with Microsoft about deadlocking (app. vs. replication) in October. End result was telling the developers that adding a nolock statement (in a part of the stored proc not harmful to the app at all) would resolve the issue. They didn't believe me. 4 months later the deadlocks began happening several times per minute. They put in the nolock hint and there hasn't been one deadlock since. They are also using a select * statement in this same stored procedure and I have been trying to come up with justification as to why that should be changed too, but have yet to come across any argument strong to actually make the recommendation. This is the story behind why I am so curious about this topic (and probably too long of a story at that)...

  • That is truly unfortunate. It seems as though that department's goals are too narrowly focused. That's usually a top down issue, and you probably don't have much control over it.

    I wish you well!

    Tom

    Tom Garth
    Vertical Solutions[/url]

    "There are three kinds of men. The one that learns by reading. The few who learn by observation. The rest of them have to pee on the electric fence for themselves." -- Will Rogers
  • I always thought (nolock) was a default setting for select statements, but Everyone and I mean "Everyone" writing a select statement at MS uses no lock. Not that this has much to do withthis conversation other than a observation on a previous statement.

  • NOLOCK is most certainly NOT the default for SELECT statements. You must explicitly state it or be running under READ UNCOMMITTED ISOLATION to get both the benefits AND drawbacks of it. And as was discovered a while back the drawback is much worse than just not receiving uncommitted transaction modifications. Search the web for Itzik allocation order scan to see what can happen.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 15 posts - 31 through 45 (of 60 total)

You must be logged in to reply to this topic. Login to reply