May 13, 2015 at 8:30 am
I just started working at a software company that has never had a DBA. The database that evolved over time has 1157 Tables with over 28000 columns. Some table are small but many have 200+ columns.
The Problem: All of the SQL is built dynamically in the application and the SQL in Most cases looks like 'SELECT t1.*, t2.* from ........'. Even when the application only uses a couple of the columns returned, the SQL does a SELECT *.
I have spoken to the lead developer and told him that it is a very bad practice and his response is always, 'Prove It'. So, I am looking for good ways to Prove that doing a SELECT * from a 200 column table when you only need 2 of the columns is a Very bad practice. I want to show things like performance, resource utilization, caching issues, and anything else that you bright professionals could turn me on to. Descriptions of how to do it are good, scripts that show it are even better.
Thanks in advance!
Mark.
May 13, 2015 at 8:45 am
Mark
It's not just performance - if columns are added to or removed from a table then your application may break.
But with regard to performance itself, get a copy of your database on to a test server, and compare the execution plans of SELECT * versus SELECT <column list>. Then SET STATISTICS IO ON and SET STATISTICS TIME ON and run the two queries, making sure you clear the data cache in between. Finally, compare the result sets. If they're sent across the network, that could mean many more megabytes trying to squeeze through your already overused pipes than is actually necessary.
John
May 13, 2015 at 10:07 am
For a point lookup (one row) against a clustered index, there really isn't a major performance hit.
But, some of the places where there are issues. First, pages in memory. By loading everything from a table, and then only consuming part of it, you're loading all that data into memory and then not using it. Covering indexes, will never work, so you'll always have a key lookup operation for nonclustered indexes. Minimum, that's three extra reads for each row. It seriously limits what you can do to help performance in places where there is poor performance.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 13, 2015 at 10:23 am
Ask him why he's ever want to try to parse a 200+ column X however many rows array to get the data he wants instead of just parsing a 5 X however many rows array.
Oh and then change the column order in a table and ask him to try to trouble shoot the problem when he's doing a SELECT *.
May 13, 2015 at 2:37 pm
Hahaha. That would truly be funny.
His explanation of why he issues SELECT *'s instead of SELECT <column names> is that he is lazy and does not like to type nor does he want to spend the time up front and determine which columns he is actually going to need. Doing it his way he says means that I will always have the columns I need.
It totally drives me insane.
May 13, 2015 at 2:54 pm
Maybe a real life scenario can help. Someone posted a problem that happens when using SELECT *.
http://www.sqlservercentral.com/Forums/Topic1685042-392-1.aspx
I'm sure that it's faster to code using * instead of the column names. The same goes for naming variables as a,b,c. It's a bad practice that will hit back at the worst moment. There are tools that can help to get column names faster, some are even included in SQL Server or SSMS (sp_help, Alt+F1, drag columns folder from object explorer, etc).
May 13, 2015 at 4:48 pm
You can apply what I'm going to show you to your data. If you want to prove that what you are recommending is faster then visual proof is often the best way to do that.
Let's say you had some data that looks like this (I'm including an example of how to show how to generate sample data):
-- a db we both have
-- Note, not the best data model - was using for a different example...
USE tempdb
GO
SET NOCOUNT ON;
/********************************************************************
SAMPLE DATA
********************************************************************/
-- not the greatest design, just putting something together real quick
IF OBJECT_ID('tempdb.dbo.sales') IS NOT NULL DROP TABLE dbo.sales;
IF OBJECT_ID('tempdb.dbo.offices') IS NOT NULL DROP TABLE dbo.offices;
GO
CREATE TABLE dbo.offices
(
OfficeID int primary key,
Country varchar(100) NOT NULL,
OfficeName varchar(100) NOT NULL
);
CREATE TABLE dbo.sales
(
SaleID int NOT NULL,
OfficeID int NOT NULL,
CustomerID int NOT NULL,
SaleDate date NOT NULL,
SaleTotal money NOT NULL
);
GO
INSERT dbo.offices
VALUES(1,'US', 'Midwest'),(2,'UK', 'London'),(3,'NZ', 'Aukland'),(4,'US', 'SW');
WITH iTally(N) AS
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns a, sys.all_columns b
)
INSERT dbo.sales
SELECT TOP(200000)
N, ABS(CHECKSUM(NewId()))%4+1, ABS(CHECKSUM(NewId()))%1000+1000,
DATEADD(day,(ABS(CHECKSUM(NewId()))%1000)*-1,getdate()),
(ABS(CHECKSUM(NewId()))%100000)*.01
FROM iTally;
ALTER TABLE dbo.sales
ADD CONSTRAINT pk_sales PRIMARY KEY(SaleID);
ALTER TABLE dbo.sales
ADD CONSTRAINT fk_offices FOREIGN KEY(OfficeID) REFERENCES dbo.offices(OfficeID);
CREATE INDEX nc_sales ON dbo.sales(OfficeID, SaleDate) INCLUDE (SaleTotal);
CREATE INDEX nc_office ON dbo.offices(OfficeName, OfficeID);
GO
You can copy/paste this onto a server you are working with, play around with it and then apply it to your own data. The query takes about 10 sec to run on my laptop.
Now let's say you needed the OfficeName, SaleDate and SaleTotal for 2014 through today. Here are two examples... The 1st that returns all columns and the second one that only returns what you need:
SELECT o.*, s.*
FROM dbo.sales s
JOIN dbo.offices o ON s.OfficeID = o.OfficeID
WHERE SaleDate >= '1/1/2014';
SELECT OfficeName, SaleDate, SaleTotal
FROM dbo.sales s
JOIN dbo.offices o ON s.OfficeID = o.OfficeID
WHERE SaleDate >= '1/1/2014';
First thing you can do is run this query in SSMS with "Include actual query plan" turned on...
Exhibit A:
At a quick glance you can see that the SELECT * query is twice as slow. Nonclustered Indexes are usually better than clustered ones. Index Seeks are better than index scans. You can see that the SELECT * query scans but clustered indexes whereas the other one scans a non-clustered index and performs a seek against the other no-clustered index. When you hover over the SELECT statement in each one you can see that the SELECT * query has double the subtree cost - that basically means twice the work.
Exhibit B:
Run the query (as previously mentioned) with STATISTICS IO and STATISTICS TIME ON (and "include query plan OFF) like so:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
PRINT 'WITHOUT "SELECT *" :'
SELECT OfficeName, SaleDate, SaleTotal
FROM dbo.sales s
JOIN dbo.offices o ON s.OfficeID = o.OfficeID
WHERE SaleDate >= '1/1/2014';
PRINT CHAR(10)+'WITH "SELECT *"'
SELECT o.*, s.*
FROM dbo.sales s
JOIN dbo.offices o ON s.OfficeID = o.OfficeID
WHERE SaleDate >= '1/1/2014';
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
Results:
WITHOUT "SELECT *" :
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'sales'. Scan count 4, logical reads 329, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'offices'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 219 ms, elapsed time = 5474 ms.
WITH "SELECT *"
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sales'. Scan count 1, logical reads 796, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'offices'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 718 ms, elapsed time = 7862 ms.
Ignore the elapsed time, the CPU time is what matters in this example. As you can see - the CPU had to work 3 times as hard to produce the results. SELECT * produced like double the reads (e.g. more IO).
This is how I sell my suggested changes and it usually works. Play with what I posted and apply it to your data.
-- Itzik Ben-Gan 2001
May 13, 2015 at 6:17 pm
Careful now... I wouldn't ignore the elapsed time. Frequently, it's a strong indication of disk activity even if stats I/O isn't turned on to see. You just have to make sure that the elapsed time isn't due mostly to the time it takes to display the return in SSMS.
Sometimes it's an indication that you're using the wrong tool to measure with.
http://www.sqlservercentral.com/articles/T-SQL/91724/
--Jeff Moden
Change is inevitable... Change for the better is not.
May 14, 2015 at 12:56 am
Jeff Moden (5/13/2015)
Careful now... I wouldn't ignore the elapsed time. Frequently, it's a strong indication of disk activity even if stats I/O isn't turned on to see. You just have to make sure that the elapsed time isn't due mostly to the time it takes to display the return in SSMS.Sometimes it's an indication that you're using the wrong tool to measure with.
I've actually stopped using STATS I/O and STATS TIME entirely. Too much observer effect[/url]. I keep extended events running most of the time now so I can pull this stuff out easily without it being skewed. But, I agree. I always look at both I/O and time. Neither measure is complete alone.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 14, 2015 at 2:24 am
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
May 14, 2015 at 9:28 pm
I Really appreciate everyone who contributed information. I must also apologize for neglecting to let everyone know the other issue that brings me such joy in this new position: Almost all of the queries that contain the SELECT * from one or more tables is contained inside of a Cursor. In fact, every screen within the system operates using Cursor Processing. An example of this is the Shipment Movement screen. This screen has a single record which contains the header information about a given shipment and 1 - N detail rows containing delivery stop information. The details are displayed in a grid which allows any item to be updated and stops to be added and/or deleted. Once the user makes any needed changes and submits the delivery, a cursor processes all of the detail rows.
When I asked why the software was written using such antiquated patterns, They just said 'Thgat is the way we have always done it and we see no reason to change.
Aint' Life Grand!
May 15, 2015 at 4:10 am
Vyper426 (5/14/2015)
I Really appreciate everyone who contributed information. I must also apologize for neglecting to let everyone know the other issue that brings me such joy in this new position: Almost all of the queries that contain the SELECT * from one or more tables is contained inside of a Cursor. In fact, every screen within the system operates using Cursor Processing. An example of this is the Shipment Movement screen. This screen has a single record which contains the header information about a given shipment and 1 - N detail rows containing delivery stop information. The details are displayed in a grid which allows any item to be updated and stops to be added and/or deleted. Once the user makes any needed changes and submits the delivery, a cursor processes all of the detail rows.When I asked why the software was written using such antiquated patterns, They just said 'Thgat is the way we have always done it and we see no reason to change.
Aint' Life Grand!
That's extremely problematic.
However, if they're not experiencing pain, I guess it's ok. However, if they are experiencing pain, the worst possible answer on earth is "That's the way we have always done it." Actually, any time you ask "Why" and someone gives you that answer, it's a bad answer.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 15, 2015 at 4:29 am
Grant Fritchey (5/15/2015)
Vyper426 (5/14/2015)
I Really appreciate everyone who contributed information. I must also apologize for neglecting to let everyone know the other issue that brings me such joy in this new position: Almost all of the queries that contain the SELECT * from one or more tables is contained inside of a Cursor. In fact, every screen within the system operates using Cursor Processing. An example of this is the Shipment Movement screen. This screen has a single record which contains the header information about a given shipment and 1 - N detail rows containing delivery stop information. The details are displayed in a grid which allows any item to be updated and stops to be added and/or deleted. Once the user makes any needed changes and submits the delivery, a cursor processes all of the detail rows.When I asked why the software was written using such antiquated patterns, They just said 'Thgat is the way we have always done it and we see no reason to change.
Aint' Life Grand!
That's extremely problematic.
However, if they're not experiencing pain, I guess it's ok. However, if they are experiencing pain, the worst possible answer on earth is "That's the way we have always done it." Actually, any time you ask "Why" and someone gives you that answer, it's a bad answer.
Or perhaps it's a bad question. A different way of asking the same thing:
"Would you like to switch to using SQL Server coding best practices and
a) Improve performance noticeably?
b) Reduce network bandwidth requirements significantly?
c) Reduce CPU utilisation considerably?
d) Improve developer morale and end-user satisfaction dramatically?"
I could go on, but I'm running out of adverbs.
Then again, I suppose the answer could be: "No, we like it this way because that's the way it's always been." :w00t:
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
May 15, 2015 at 5:22 am
A SELECT * in a cursor...So that means if a column is added to a table the system uses, the interface breaks at every point where it reads from that table immediately on doing the FETCH. The cursor fetches into variables, which won't match the column list returned by *, so it breaks.
Well, I guess that's one round of ammunition. It sounds like you have your work cut out for you and I don't envy you in the cultural battle you have to engage in.
Let me guess:
1. Is the application is connecting with the 'sa' account too?
2. Did they solve their slow query problem by using NOLOCK on every table to speed things up?
3. They don't have any foreign keys defined because they just get in the way of data being inserted?
4. They have no indexes because they got too fragmented years ago, so the natural answer was to drop them. Besides, the SELECT * wasn't using them for some strange reason.
5. There's no reason to worry about SQL injection because they've never seen it happen
I honestly hope I'm wrong on all these guesses.
May 15, 2015 at 8:53 pm
There's absolutely no sense in arguing with people that obviously know it all. The "developer" that retorted "Prove it" is absolutely spot on whether I agree with his use of SELECT * or not. Either prove it with a code comparison or sit down. Diametrically opposed "Best Practices" can be bantered about between warring parties all bloody day and nothing is resolved except you both now know how far the vein on each other's forehead sticks out and what the maximum intensity of their dermal flush is.
The thing is, you're more knowledgeable about the thing you're speaking of than your opponent. You should be able to easily prove such problems even with the simplicity of right clicking on the instance name in the Object Explorer, selecting {reports}, and then following your nose into the performance related reports.
Arguing about "SELECT *" is a total waste of time on both your parts. You need evidence and then you need an easily demonstrable comparison of performance between your opponent's code and the your code. And both MUST be IN SITU or your well-convinced opponent will either call foul or chide you for changing the environment in your favor.
In this case, it does NOT depend. PROVE IT! 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 60 total)
You must be logged in to reply to this topic. Login to reply