September 26, 2011 at 6:41 am
I posted this on a programming forum as well but I'm always interested in hearing the DBA side of things ...
This is a discussion some colleagues and I are having, and I thought I'd come out here and see if there's a general consensus on it.
It basically comes down to the following 2 opinions on database calls:
1. Make one large call to get everything that may be needed, to reduce the number of DB calls
2. Make smaller, separate calls based on what is requested, to reduce the size of DB calls
Where this is especially coming into play is in common code. We'll use the example of an Employee class as that's fairly straightforward.
Let's say your Employee class has 10 value attributes (first name, last name, hire date, etc.) and then 2 class attributes: one pointing to a Department class and one Supervisor attribute that points back to another Employee object.
In mindset #1, you'd make one call that returns the Employee data as well as the fields needed to populate the Department and Supervisor attributes ... or at least the fields that are most often used from those sub-objects.
In mindset #2, you'd only populate the Employee object at first and then only populate the Department and Supervisor objects if and when they are actually requested.
#2's stance is pretty straightforward ... minimize the size of the requests and the number of database objects that need to be hit each time one of those requests is made. #1's stance is that even if #2 could be implemented properly, the sheer fact that the code would have to make multiple connections is going to put more strain on the connection between the webserver and the database rather than reduce it.
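To make it concrete, here's roughly what the two shapes of call look like against a simplified schema (the table and column names below are just placeholders, not our real ones):

DECLARE @EmployeeID int = 42;   -- hypothetical key, just for illustration

-- Mindset #1: one wide call that pulls the Department and Supervisor columns up front
SELECT  e.EmployeeID, e.FirstName, e.LastName, e.HireDate,
        d.DepartmentID, d.DepartmentName,
        s.EmployeeID AS SupervisorID, s.FirstName AS SupervisorFirstName, s.LastName AS SupervisorLastName
FROM    dbo.Employee AS e
JOIN    dbo.Department AS d ON d.DepartmentID = e.DepartmentID
LEFT JOIN dbo.Employee AS s ON s.EmployeeID = e.SupervisorID
WHERE   e.EmployeeID = @EmployeeID;

-- Mindset #2: a narrow call now ...
SELECT  e.EmployeeID, e.FirstName, e.LastName, e.HireDate, e.DepartmentID, e.SupervisorID
FROM    dbo.Employee AS e
WHERE   e.EmployeeID = @EmployeeID;

-- ... and the related rows only if/when the Department attribute is actually touched
SELECT  d.DepartmentID, d.DepartmentName
FROM    dbo.Department AS d
JOIN    dbo.Employee   AS e ON e.DepartmentID = d.DepartmentID
WHERE   e.EmployeeID = @EmployeeID;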
The driving force behind researching this is that the amount of traffic between our webserver and database server is getting out of control.
TIA
September 26, 2011 at 7:26 am
Let the clients determine what they need by passing a parameter to a single stored procedure to obtain their data structures. One of the parameters to the procedure specifies how much additional data is needed.
It is almost always more efficient to obtain as much data as you can in a single request, or batch. On the other hand, pulling back all possible information from multiple related tables (whether it is needed or not) is wasteful of server resources and can reduce concurrency in some designs.
The best solution is to design an interface procedure that allows the client to determine how much data is needed. The interface procedure can generally obtain the data requested in the most efficient manner and return it to the client.
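A rough sketch of the kind of interface procedure I mean, using the Employee example from the original post (object names are illustrative only):

CREATE PROCEDURE dbo.GetEmployee
    @EmployeeID    int,
    @IncludeDetail bit = 0   -- the client decides how much it needs
AS
BEGIN
    SET NOCOUNT ON;

    -- base result set: always returned
    SELECT  e.EmployeeID, e.FirstName, e.LastName, e.HireDate,
            e.DepartmentID, e.SupervisorID
    FROM    dbo.Employee AS e
    WHERE   e.EmployeeID = @EmployeeID;

    IF @IncludeDetail = 1
    BEGIN
        -- additional result sets: only when the client asks for them
        SELECT  d.DepartmentID, d.DepartmentName
        FROM    dbo.Department AS d
        JOIN    dbo.Employee   AS e ON e.DepartmentID = d.DepartmentID
        WHERE   e.EmployeeID = @EmployeeID;

        SELECT  s.EmployeeID, s.FirstName, s.LastName
        FROM    dbo.Employee AS e
        JOIN    dbo.Employee AS s ON s.EmployeeID = e.SupervisorID
        WHERE   e.EmployeeID = @EmployeeID;
    END
END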
The probability of survival is inversely proportional to the angle of arrival.
September 26, 2011 at 8:17 am
Are you also doing some form of caching in the web server?
September 26, 2011 at 8:41 am
The traditional mechanism is to move only the data you need to move, when you need to move it.
I would not go for the "move everything, we might need it" approach. You'll be taking out more and longer locks on the system while you do this. That will lead to more contention and slower performance overall. That's just on the server. Then we have to talk about your network too. And, as you move very large data sets it means that your indexes are less and less useful so instead of getting seeks, you'll be looking at lots of scans, again, smacking the system around, leading to more contention on resources. And of course, as more waits occur, the queues get longer, which leads to more waits, more queues, more waits... you get the picture.
From a programming standpoint, the "move it all" approach is easier to code. But it's not better for performance.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 27, 2011 at 5:14 am
Grant Fritchey (9/26/2011)
The traditional mechanism is to move only the data you need to move, when you need to move it.
I would not go for the "move everything, we might need it" approach. You'll be taking out more and longer locks on the system while you do this. That will lead to more contention and slower performance overall. That's just on the server. Then we have to talk about your network too. And, as you move very large data sets it means that your indexes are less and less useful so instead of getting seeks, you'll be looking at lots of scans, again, smacking the system around, leading to more contention on resources. And of course, as more waits occur, the queues get longer, which leads to more waits, more queues, more waits... you get the picture.
From a programming standpoint, the "move it all" approach is easier to code. But it's not better for performance.
I'm going to second this.
I'm in the middle of redesigning a process that worked on the "move everything, we might need it" approach, and it fails in several ways: locks on primary tables in an OLTP system that end up causing blocking, table scans, long-running queries, and a system on the other side that processes line by excruciating line, resulting in ASYNC_NETWORK_IO waits that take forever to clear.
I'm fortunate - I have a development team that is willing to work with me to change the way they interact with the system. The old methods were grabbing over 100,000 lines of data and locking about 10 tables when they were actually trying to get a handle on maybe 50 of those lines.
Feed the application what it needs, when it needs it, or you may find yourself in a situation that just doesn't scale well.
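If anyone wants to see whether they're in the same boat, the wait stats DMV gives a quick (if server-wide and cumulative) sanity check:

-- cumulative time the server has spent waiting for clients to consume result sets
SELECT  wait_type, waiting_tasks_count, wait_time_ms,
        wait_time_ms / 60000 AS wait_time_minutes
FROM    sys.dm_os_wait_stats
WHERE   wait_type = N'ASYNC_NETWORK_IO';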
-Ki
September 27, 2011 at 11:31 am
Putts,
As Grant pointed out, it's usually best to return only the columns actually needed by the front end. It reduces network I/O between SQL Server and the application.
However, it's really a question of who needs what and when. How often is the request made? Once every 5 minutes? 100 times a second? Is there a fairly standard set of columns needed or are there many different types of requests from the app?
You have to play these against each other to come up with the optimum solution. You don't want to have 100 different stored procedures or views that return almost the same data to the app. Then again you don't want to have one monstrous procedure or view that returns everything to the app if such a request happens 10 times a second.
Todd Fifield
September 27, 2011 at 11:56 am
I too am going to second Grant's position. I just can't go along with the "we might need it" position; you need to be smart about moving data around. If the screen needs all the data, or much of the same data, every time, then it probably makes sense to include it in a single call. You can always have a "main" sproc and several outliers to handle the edge cases. As another poster mentioned, how often the data is requested also has an effect on how you would want to do this. As an additional thought, if you are requesting a LOT of data OFTEN, I would probably push back to the front-end devs and ask why we need all this data so often. Just because they want it that way doesn't mean it makes sense, and it gives us an opportunity to teach good design practices.
CEWII
September 27, 2011 at 12:18 pm
It's also worth noting that with IIS connection pooling, the spin-up/spin-down time of opening a new connection to pull that extra data is almost nil.
September 27, 2011 at 12:21 pm
I remember one time we had a requirement that we had to return 100,000 rows in 3 seconds. I'm sitting there with my jaw on my chest asking why on earth we had to meet that requirement. The response was "That's what we do now." We beat ourselves up for six weeks trying to figure out how to do it, then finally we asked to see the current app. It returned about 10 rows plus a count of the remaining rows, 100,000. "See, it returns 100,000 rows." Except it didn't. It returned 10, and that's all they wanted: 10 rows and to know that the rest were there. We had it running in well under 1 second, a huge improvement over the old code, and they still thought they were returning 100,000 rows.
Just because people ask for "everything" doesn't mean that's what they want or what they need. You have to be smart about this stuff and build your code the same way.
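If anyone runs into the same requirement, a pattern along these lines (a made-up Orders table, purely for illustration) returns the first 10 rows plus the total in one narrow query:

-- first page of rows plus the overall count, in a single result set
SELECT TOP (10)
        o.OrderID, o.OrderDate, o.CustomerID,
        COUNT(*) OVER () AS TotalRows   -- "and the rest are there"
FROM    dbo.Orders AS o
ORDER BY o.OrderDate DESC;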
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 28, 2011 at 4:14 pm
Grant Fritchey (9/27/2011)
Just because people ask for "everything" doesn't mean that's what they want or what they need. You have to be smart about this stuff and build your code the same way.
Amen! Sometimes we have to educate users and managers as to what they really need and want.
Todd Fifield
September 28, 2011 at 5:43 pm
tfifield (9/28/2011)
Grant Fritchey (9/27/2011)
Just because people ask for "everything" doesn't mean that's what they want or what they need. You have to be smart about this stuff and build your code the same way.
Amen! Sometimes we have to educate users and managers as to what they really need and want.
That is so true. I often have to ask (as I do here in the forums) what it is you are trying to accomplish, because oftentimes they have a solution in mind already and are looking at how to make that solution work as opposed to asking whether that solution makes sense.
CEWII
September 29, 2011 at 8:31 am
While I agree completely with Grant that fetching data you may not need can add unwanted, and unneeded, overhead to back-end systems, there can be an efficiency in returning a multi-table data set over multiple fetches to return individual tables.
In many systems I've seen (and sometimes designed), the DAL classes work on a model that fetches all data at object instantiation. And when you have a class that has object dependencies (e.g., the Department and Supervisor in your example), your model may end up generating several serial data fetches as those objects are instantiated in turn.
In my experience, if this is the model you have, you may increase performance by gathering this data in a single fetch, rather than in serial round trips across the wire.
Again, this is only true IF you are going to fetch this data anyway as part of your object instantiation. As Grant, and others, rightly point out fetching data that is likely to go unused can tax your system.
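For example, sticking with the Employee/Department/Supervisor names from the original post, the batch below can go across the wire as a single command and come back as three result sets the DAL consumes in order, rather than three separate round trips:

DECLARE @EmployeeID int = 42;   -- hypothetical key, for illustration

SELECT  e.EmployeeID, e.FirstName, e.LastName, e.HireDate, e.DepartmentID, e.SupervisorID
FROM    dbo.Employee AS e
WHERE   e.EmployeeID = @EmployeeID;

SELECT  d.DepartmentID, d.DepartmentName
FROM    dbo.Department AS d
JOIN    dbo.Employee   AS e ON e.DepartmentID = d.DepartmentID
WHERE   e.EmployeeID = @EmployeeID;

SELECT  s.EmployeeID, s.FirstName, s.LastName
FROM    dbo.Employee AS e
JOIN    dbo.Employee AS s ON s.EmployeeID = e.SupervisorID
WHERE   e.EmployeeID = @EmployeeID;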
(As a side note, I've just started work on a mobile app project and the lead developer on it, who has way more experience than I do, pushed back when I presented the "Let's not fetch data the user hasn't requested" model. Apparently, in the mobile world the overhead of each data connection is high enough that fewer connections is generally preferred, even if this leads to fetching more data than will be actually consumed, as long as the payloads aren't inordinately large).