A Check and Foreign Key Constraint Improves Query Performance

  • Nice article. Apparently in the editing of the article content, some minor inconsistencies were left in toward the end. In the paragraph that begins "Let's make it a bit more interesting," you refer to the members table, which should be the SalesOrderDetail table, and your reference to the TeamID column should actually be to the CustomerID column.

  • I missed this somehow when it first came out - or at least I don't recall it. It's a good and interesting article, but I think it would have been better if it had used queries against a join view for the example - it would have reduced the number of silly comments I suspect.

    Tom

  • Evil Kraig F (4/26/2013)


    Grant Fritchey (4/26/2013)


    TheGreenShepherd (4/26/2013)


    Don't mean to pile on here, but yeah. I fail to understand why someone would query this data in this fashion. Yeah, it improves performance for that query but it's unlikely that you'd ever actually write that query.

    Here's a blog post I wrote up that gives a different example, probably more realistic, to show how foreign keys help. They really do.

    Grant, unfortunately, I'm of the same mind as the above. I checked out your blog post and really, there's not a lot of difference. If you don't have RI in place, you have to go to those tables to make sure you have data limiters, I agree. Once you have RI in place everything you did was off the 'p' table, including the where clause. He did a where exists, you did a join, but the limitation logic is the same.

    FKs don't help where it matters, which is getting data from the connected tables. Well, they don't help any more than a tuned index does. All a trusted FK allows for is a short-circuit of join logic used purely as a restrictor, a join you wouldn't even need to include in the query if the FK were in place to begin with.
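
    For what it's worth, here is a minimal sketch of the pattern being discussed, using AdventureWorks-style names (the article's exact query isn't reproduced here, so treat the tables and columns as illustrative):

    -- The join to SalesOrderHeader returns no columns; it only restricts detail
    -- rows to those that have a matching header.
    SELECT  d.SalesOrderDetailID,
            d.ProductID,
            d.OrderQty
    FROM    Sales.SalesOrderDetail AS d
            INNER JOIN Sales.SalesOrderHeader AS h
                ON h.SalesOrderID = d.SalesOrderID;

    -- With a trusted foreign key on SalesOrderDetail.SalesOrderID referencing
    -- SalesOrderHeader, every detail row is guaranteed a header, so the optimizer
    -- can eliminate the join, and you could just as well not write it at all:
    SELECT  d.SalesOrderDetailID,
            d.ProductID,
            d.OrderQty
    FROM    Sales.SalesOrderDetail AS d;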

    Well, that's what they're there for, so I'm okay with that. But to me, this isn't really 'optimization'. This is using a tool, similar to indexes, that will slow some things down and speed up others, and can be overused. If you're always associating two tables on the same field to restrict rows, using an FK to enforce it and being able to avoid the churn is probably a good idea for that restriction pattern. If you're connecting up for data, you might as well just use an index and truck on without the constant overhead of the FK.

    True. If you don't need those joins, you're better off writing the query without them. But, if we are getting down to the point where people have properly built queries, then tuning and optimization changes. You don't need things like the simplification optimizations. But, WAY too many people are not writing their queries correctly and those optimizations help them. I don't recommend this instead of tuning queries.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • accidental post - deleted

    Tom

  • Evil Kraig F (4/26/2013)


    Grant Fritchey (4/26/2013)


    TheGreenShepherd (4/26/2013)


    Don't mean to pile on here, but yeah. I fail to understand why someone would query this data in this fashion. Yeah, it improves performance for that query but it's unlikely that you'd ever actually write that query.

    Here's a blog post I wrote up that gives a different example, probably more realistic, to show how foreign keys help. They really do.

    Grant, unfortunately, I'm of the same mind as the above. I checked out your blog post and really, there's not a lot of difference. If you don't have RI in place, you have to go to those tables to make sure you have data limiters, I agree. Once you have RI in place everything you did was off the 'p' table, including the where clause. He did a where exists, you did a join, but the limitation logic is the same.

    Craig, I think you are seeing things from a very narrow perspective: a perspective where the people who write queries (a) write them against base tables, and (b) know all the base table schemata. This is very different from the situation where application developers write queries against views and are not permitted to know the base schemata. In the latter situation, the optimiser is presented with a query whose from clause includes view expansions, which may be joining tables that it can eliminate because the particular query doesn't use anything from those tables.
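
    A minimal sketch of that situation, reusing the AdventureWorks-style names from earlier in the thread (the view name and column list are made up for illustration):

    -- A join view exposed to application developers instead of the base tables.
    CREATE VIEW Sales.vOrderLines
    AS
    SELECT  h.SalesOrderID,
            h.OrderDate,
            h.CustomerID,
            d.SalesOrderDetailID,
            d.ProductID,
            d.OrderQty,
            d.LineTotal
    FROM    Sales.SalesOrderHeader AS h
            INNER JOIN Sales.SalesOrderDetail AS d
                ON d.SalesOrderID = h.SalesOrderID;
    GO

    -- A developer who only needs line-level columns still has to query the view.
    -- After view expansion the FROM clause contains both base tables, but with a
    -- trusted FK on SalesOrderDetail.SalesOrderID the optimiser can eliminate the
    -- join to SalesOrderHeader, because nothing from that table is actually used.
    SELECT  ProductID, OrderQty, LineTotal
    FROM    Sales.vOrderLines;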

    Yes, it's very nice from the point of view of a DBA to be able to design all the queries himself - no app devs write queries, there is no facility which provides for ad hoc queries, and only queries written by the DBA team are ever run. But in the real world it isn't always like that. I'm a great believer in proper modularity and levels of abstraction, and I don't believe an app developer should know the base schemata (mostly because if he does he'll write stuff that depends on them, so that further evolution of the base schemata becomes impossible - and this includes the case where what was supposed to be an ad hoc query is now the most important thing used by the marketing department or by the publicity department or by someone else with a lot of muscle). I prefer to bury the schemata behind a stored procedure interface, rather than a view interface, in the hope of precluding ad-hoc queries produced by all sorts of random people; but I can't guarantee to achieve that, and may have to provide some views that expose the data but not the structure of any part of any schema for which I have no real guarantee that the structure will not need to change in a few years' time to enable us to accept new requirements, cope with new regulations, and enforce new business rules. Given that the requirements change, the base schemata will need to change, so even if I start out exposing a view per base table (usually a fairly stupid idea, but not always) those views might rapidly cease to be in one-to-one correspondence with base tables.

    Given these real world circumstances, where queries are written by people who don't know the underlying schemata, proper declarative referential integrity enforcement gives the optimiser a great assist in getting the best possible performance out of the system for us.

    Tom

  • Grant Fritchey (4/29/2013)


    True. If you don't need those joins, you're better off writing the query without them. But, if we are getting down to the point where people have properly built queries, then tuning and optimization changes. You don't need things like the simplification optimizations. But, WAY too many people are not writing their queries correctly and those optimizations help them. I don't recommend this instead of tuning queries.

    Okay, I can follow and agree with that logic. I guess feeding someone with the firehose instead of step by step MIGHT be construed as a little overboard. 🙂

    @tom (Didn't quote for length):

    Well, you're quite correct there, it's a narrow perspective. However, someone who's writing against views and can't see base schema also won't be able to adjust the FKs or see the execution plan, so this article isn't really written towards them in my mind. It's written for the person trying to cater to them, which means they're already pretty deep in the optimizer to care about this topic at all. However, you are showing a real world example of where this makes an actual difference, as the alternate would be to write umpteen different views depending on exactly what data the developer was trying to access.

    At the same time, I rarely take this approach for app devs to access my data because of all the intrinsic concerns that views bring into play, one of which is the one you mention. Another is data width, one of my biggest concerns. Additionally, it's against most best practices and depending on your security concerns, straight up illegal in production. The fact that I have usually dealt with medical and/or securities and retirement data through my career may have colored my opinion in this regard.

    You're right, however, that ad-hoc queries written against the base schema or the views are difficult to monitor at best. By their nature, however, ad-hocs are supposed to be one-offs. Your business should understand that. If you're ad-hoc'ing the same query at the end of every month, it's no longer ad-hoc. It's requirements you haven't given to your tech department.

    Trying to keep views the same constantly is a nice to have to me, but I have no issue with ripping a view to pieces and rebuilding it if I need to, as a view is merely a subquery container for most usage and should be able to be traced in sys.sql_modules. If your app coders are blazing ahead of your sql resources, then you've unbalanced your development team. You've over-hired on app guys (or maybe you've just got a couple of superstars). It's time to get them more support... even if you have to let go of an app guy to get your ratio(s) right.

    So...

    Given these real world circumstances, where queries are written by people who don't know the underlying schemata, proper declarative referential integrity enforcement gives the optimiser a great assist in getting the best possible performance out of the system for us.

    You're using a shop where inexperienced coders are relying on workarounds for ad-hoc queries as your example of when to use this optimization. You're right, they exist, and when I get there I try to fix that problem. I understand your point, and you're correct within that venue. That the venue exists is a whole different issue I probably shouldn't be derailing the topic with, so I won't go too deep unless the conversation hangs a left turn there.

    I just feel like we're addressing the wrong concern with this. I do appreciate the very detailed explanation however of where something like this could actually come to use... as a stopgap until they fix things properly.

    I've edited this twice now and I think it finally reads correctly. I'm honestly not trying to come off as an ***, but I just can't see people at the level of skill you're discussing having a clue this could help them, nor them looking for it in the first place. Anyone with enough skill to know and/or remember the intricacies of the optimizer at this level, yet who needs to use this for your use case, is either insanely overwhelmed with workload or is in the first steps of cleanup when they arrive at a new site.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig F (4/29/2013)


    At the same time, I rarely take this approach for app devs to access my data because of all the intrinsic concerns that views bring into play, one of which is the one you mention. Another is data width, one of my biggest concerns. Additionally, it's against most best practices and depending on your security concerns, straight up illegal in production. The fact that I have usually dealt with medical and/or securities and retirement data through my career may have colored my opinion in this regard.

    I too have concerns about using views, and have a very strong preference for providing a procedural interface to the apps - not because I think the app developers will write bad SQL (at least it won't be any worse than what most DBAs write, in my experience) but because they provide a functionally less constrained interface, hence poorer modularity. But of course views provide a functionally more constrained interface than does direct access to base tables, because that direct access allows the app to be dependent on the base schemata, ie there is no module boundary at all between the database and the app. Data width is no more a concern with views than it is with base tables - and possibly less of a concern because if a sloppy app developer writes select * against a view he gets only the columns which are in the select list of the view, whereas if he writes select * against a join between base tables he gets all columns in those tables. Security is potentially better with views than with direct access to base tables, so direct access to base tables is more often illegal in production than access through views.

    Trying to keep views the same constantly is a nice to have to me, but I have no issue with ripping a view to pieces and rebuilding it if I need to, as a view is merely a subquery container for most usage and should be able to be traced in sys.sql_modules. If your app coders are blazing ahead of your sql resources, then you've unbalanced your development team. You've over-hired on app guys (or maybe you've just got a couple of superstars). It's time to get them more support... even if you have to let go of an app guy to get your ratio(s) right.

    I think we are talking about different things here: "keeping views the same constantly" doesn't have any clear meaning to me. What I want is freedom to change the base schema without changing the apps - so I want to present the same view to the app when the base schema changes, and have no objection to having to rewrite the view in order to handle the new base schema. I also want freedom to change the apps view of the data without changing the base schema - for example I may want to allow the app to see historical data which it couldn't see before, which might mean either changing a view to add some new columns or adding a new view. Neither of these things strike me as "keeping views the same constantly". And I also want to be able to add columns to a view and have the old version of the app continue to work without change, so that most of the time when an upgrade requires both schema changes and app changes I can upgrade the base schema before I upgrade the app instead of having to synchronize the upgrades - this is often useful in ensuring that you only have to support one version of the base schema even if your customers are using 4 different releases of the app. These are all the classical reasons for wanting strong modular boundaries and concealing the internals of one component from other components so that developers can't accidentally create unwanted dependencies between modules and thus place unwanted constraints on the way the business can develop and evolve.

    So...

    Given these real world circumstances, where queries are written by people who don't know the underlying schemata, proper declarative referential integrity enforcement gives the optimiser a great assist in getting the best possible performance out of the system for us.

    You're using a shop where inexperienced coders are relying on workarounds for ad-hoc queries as your example of when to use this optimization.

    My main example is a system which has extremely intelligent, competent, and experienced people working to a set of rules which rigidly enforces best practice modularity. It is of course useful that this defends you from the idiot in marketing or PR who turns an ad-hoc query into the foundation of his whole function, but that's not the important property - support of narrow modular boundaries is. Without those narrow modular boundaries, you will very quickly have either a product that is so slow to evolve that your competitors eat up all your business or a product so complex and unwieldy and riddled with kludges that it costs the earth to support it.

    I just feel like we're addressing the wrong concern with this. I do appreciate the very detailed explanation however of where something like this could actually come to use... as a stopgap until they fix things properly.

    I've edited this twice now and I think it finally reads correctly. I'm honestly not trying to come off as an ***, but I just can't see people at the level of skill you're discussing having a clue this could help them, nor them looking for it in the first place. Anyone with enough skill to know and/or remember the intricacies of the optimizer at this level, yet who needs to use this for your use case, is either insanely overwhelmed with workload or is in the first steps of cleanup when they arrive at a new site.

    I think it's pretty clear that when you wrote that you hadn't understood what I was talking about. It seems equally clear that I haven't a clue what you are getting at.

    It's clear that we are not both addressing the same concerns. I think that's because of different experience.

    I'm coming from a product development view, where the schemata, middleware, apps, and front ends all went together and were all sold to lots of customers who demanded new features, 24X7 availability, 24/7 support, minimal unscheduled downtime risk (we couldn't guarantee the system would continue to run if vaporised by a thermonuclear device, so we didn't claim zero unscheduled downtime risk), regular upgrades, conformance with data protection regulations, very user-friendly and enjoyable end-user interface that would conform with all the latest fads and fashions, all of which meant rapid change which in turn meant very thoroughly modular design and construction was essential. You seem to be coming from a situation where you have at any time a single user community, with a well understood requirement and rare and slow change, so rare and slow that complex interdependencies between app layer and data interface layer and base schema are perfectly acceptable and modularity doesn't much matter.

    So we can either continue to try to understand each other - which may be difficult, given the different backgrounds - or we can just accept that we have done different things and therefore have needed to take different approaches to doing them.

    edit: spelling and quote tags

    Tom

  • Last thing first. 🙂

    L' Eomot Inversé (5/1/2013)


    So we can either continue to try to understand each other - which may be difficult, given the different backgrounds - or we can just accept that we have done different things and therefore have needed to take different approaches to doing them.

    Without different viewpoints one can never learn. I'd prefer to try to continue the discussion to at least be sure we understand each other, even if we may disagree at the end. I will, however, strive for more clarity.

    I too have concerns about using views, and have a very strong preference for providing a procedural interface to the apps - not because I think the app developers will write bad SQL (at least it won't be any worse than what most DBAs write, in my experience) but because they provide a functionally less constrained interface, hence poorer modularity. But of course views provide a functionally more constrained interface than does direct access to base tables, because that direct access allows the app to be dependent on the base schemata, ie there is no module boundary at all between the database and the app.

    In my experience most app developers DO write horrible SQL, but otherwise I agree with what's above.

    Data width is no more a concern with views than it is with base tables - and possibly less of a concern because if a sloppy app developer writes select * against a view he gets only the columns which are in the select list of the view, whereas if he writes select * against a join between base tables he gets all columns in those tables.

    This is a place where I think we're starting to disconnect. In my experience, views tend to end up overloaded with more joins than are necessary for any particular query. This is more what I meant by data width, and it was a horrible term. Properly requested information from a view that expands through the optimizer properly is no worse than most other calls. It's the extra stuff that comes along for the ride that concerns me.

    Security is potentially better with views than with direct access to base tables, so direct access to base tables is more often illegal in production than access through views.

    When I mentioned this, I was more thinking ANY direct access to large volume data is illegal, in any form. SEC not so much, but being able to pull up patient records en masse can be a significant concern. There are ways to address this, I agree, but many places I've worked at err on the side of paranoid caution.

    I think we are talking about different things here: "keeping views the same constantly" doesn't have any clear meaning to me. What I want is freedom to change the base schema without changing the apps - so I want to present the same view to the app when the base schema changes, and have no objection to having to rewrite the view in order to handle the new base schema.

    We're saying the same thing, just with different words. The view 'being the same' means its output needs to stay consistent. Apps shouldn't be that close to the data layer in most circumstances, at least via the design philosophies I'm comfortable with.

    I also want freedom to change the apps view of the data without changing the base schema - for example I may want to allow the app to see historical data which it couldn't see before, which might mean either changing a view to add some new columns or adding a new view.

    This, to me, sounds like you're more familiar with coding a one size fits all view so the app can come at it with multiple approaches without having to design a proc for each call to the database. My concern with this is if your app is going to do this, it has to use dynamic SQL. To me, that should be the exception for a particular circumstance, not a design method.

    And I also want to be able to add columns to a view and have the old version of the app continue to work without change, so that most of the time when an upgrade requires both schema changes and app changes I can upgrade the base schema before I upgrade the app instead of having to synchronize the upgrades - this is often useful in ensuring that you only have to support one version of the base schema even if your customers are using 4 different releases of the app.

    This is not a concern I would have thought to attempt to address via views, but then my experience is always to release a SQL upgrade script (with a rollback handy) for each front end app deployment. I don't expect the app and the database to go out of sync that way.

    These are all the classical reasons for wanting strong modular boundaries and concealing the internals of one component from other components so that developers can't accidentally create unwanted dependencies between modules and thus place unwanted constraints on the way the business can develop and evolve.

    This is where I think we're differing wholesale. Unless I'm ordered to by on high or it makes sense to cure a problem, my app devs would never even see a view unless they're writing their own procs, too. Dynamic SQL, transactional concerns, data volume... there's been so many problems with a tightly wound app-db integration that I do everything in my power to avoid it and make sure my app devs get exactly what they need with a simple proc call.

    Which leads us to how we got into a different set of assumptions about who would be using this type of optimization:

    My main example is a system which has extremely intelligent, competent, and experienced people working to a set of rules which rigidly enforces best practice modularity. It is of course useful that this defends you from the idiot in marketing or PR who turns an ad-hoc query into the foundation of his whole function, but that's not the important property - support of narrow modular boundaries is. Without those narrow modular boundaries, you will very quickly have either a product that is so slow to evolve that your competitors eat up all your business or a product so complex and unwieldy and riddled with kludges that it costs the earth to support it.

    I've found that neither of these are the case. I can easily modify data pulls without an app change via procedure to adjust for business rule changes, and with reasonable naming conventions and a quick proc name pull from the code by your app tier it's simple to get to a particular piece of code when necessary. Also, by completely de-coupling the app and database tiers via procedures, you allow for simpler troubleshooting of particular components when the old 'oh gods it's slow now!' problems arise.

    As long as your SQL developers aren't lazing around, they should be able to keep up with app building on (in my experience) a ratio of about 1 SQL:4 App.

    It's clear that we are not both addressing the same concerns. I think that's because of different experience.

    I'm coming from a product development view, where the schemata, middleware, apps, and front ends all went together and were all sold to lots of customers who demanded new features, 24X7 availability, 24/7 support, minimal unscheduled downtime risk (we couldn't guarantee the system would continue to run if vaporised by a thermonuclear device, so we didn't claim zero unscheduled downtime risk), regular upgrades, conformance with data protection regulations, very user-friendly and enjoyable end-user interface that would conform with all the latest fads and fashions, all of which meant rapid change which in turn meant very thoroughly modular design and construction was essential. You seem to be coming from a situation where you have at any time a single user community, with a well understood requirement and rare and slow change, so rare and slow that complex interdependencies between app layer and data interface layer and base schema are perfectly acceptable and modularity doesn't much matter.

    I also come from such a location, but I haven't worked in vendorware. All of my development and design has been either in-house custom software or web-portal backends. Rare and slow changes, however, are not part of either of those design philosophies, though your mileage may vary depending on what shop you're at. Well understood requirements are rare, though I've been at a few places that were pretty good at it. Most places you're iteratively deploying because users don't want to look at 'old dev data', they want to see what they put in yesterday... even if they could put in stuff from yesterday in dev.

    I disagree however that it's a complex interdependency. It's proc. *Best caveman voice* You call proc, I deliver data. Dependency done. *end voice* No app interface to the data directly in any way. I think that's the significant disconnect we have. I don't personally care how close the app devs get to the data model and understanding it. The app should never be that close.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig F (5/1/2013)


    Last thing first. 🙂

    In this case that's the right order to tackle them.

    Without different viewpoints one can never learn. I'd prefer to try to continue the discussion to at least be sure we understand each other, even if we may disagree at the end. I will, however, strive for more clarity.

    Yes, I agree. And your latest post has made me understand much better where you are at, and realise where you think I am at. I now think we are both in the same place, apart from minor incidentals (eg how horrible are app devs, or what is a narrow modular interface).

    In my experience most app developers DO write horrible SQL, ...

    I guess it depends what you call horrible. In my experience most developers write SQL that I would be ashamed to put my name to, but then so do most DBAs; I don't call that SQL horrible, because to me "horrible" means "much worse than the usual run of the mill", but this is run of the mill SQL and of course nothing can be much worse than itself. The good developers and the good DBAs both write excellent SQL, nothing like that.

    Properly requested information from a view that expands through the optimizer properly is no worse than most other calls. It's the extra stuff that comes along for the ride that concerns me.

    That extra stuff annoys me too, if there is any. Whoever wrote the query needs to learn to use filters, and maybe whoever wrote the view needs educating too. The fact that extra stuff can happen is the only reason I object to SELECT * (I don't object to it when it can't pull in extra unwanted stuff, as is often true for diagnostic code).

    ..., I was more thinking ANY direct access to large volume data is illegal, in any form. SEC not so much, but being able to pull up patient records en masse can be a significant concern. There are ways to address this, I agree, but many places I've worked at err on the side of paranoid caution.

    Data protection law is pretty fierce in Europe (including UK), and many organisations are paranoid about it. I'm inclined to believe that paranoid caution is the right approach. Given how utterly useless various anonymising schemes have turned out to be we can't trust the "experts" on what it's safe to let out, and given how frequently some idiot parks unencrypted data on a pen drive or pad or laptop and leaves it somewhere public we can't trust data users to care about data protection either. Maybe the only solution is to altogether prevent people getting large chunks of data. I've not had to deal with extremely sensitive stuff as opposed to not-really-sensitive personally identifiable data (I've tried to instill paranoia in the non-technical management even over that).

    This, to me, sounds like you're more familiar with coding a one size fits all view so the app can come at it with multiple approaches without having to design a proc for each call to the database. My concern with this is if your app is going to do this, it has to use dynamic SQL. To me, that should be the exception for a particular circumstance, not a design method.

    Dynamic SQL??? What on earth would that be for? I think you've misunderstood badly. I certainly don't want one-size-fits-all, every system I've been involved with since about 2000 has ended up (in some cases after I've overcome a lot of resistance from the backwoodsmen who want to embed DML statements directly in C++) as a system where the only thing the database presents to the outside world is a bunch of stored procedures.

    ..., so that most of the time when an upgrade requires both schema changes and app changes I can upgrade the base schema before I upgrade the app instead of having to synchronize the upgrades - this is often useful in ensuring that you only have to support one version of the base schema even if your customers are using 4 different releases of the app.

    This is not a concern I would have thought to attempt to address via views, but then my experience is always to release a SQL upgrade script (with a rollback handy) for each front end app deployment. I don't expect the app and the database to go out of sync that way.

    So you never have to cope with a couple of servers and many hundreds of clients, no down-time permitted, client can be upgraded only when it is not in use; client is normally in use 24/7, with occasional periods of typically 4 hours where it is not in use; there is never a situation where many clients are simultaneously not in use; client upgrade is over net from servers, and will sometimes involve a complete reimage of client. Server upgrade has to be one server at a time, since there won't otherwise be enough servers to maintain acceptable response time.

    I think that's the situation when you have to design for server and client being out of sync.

    This is where I think we're differing wholesale. Unless I'm ordered to by on high or it makes sense to cure a problem, my app devs would never even see a view unless they're writing their own procs, too.

    That's what makes me think we are both on the same page, just misunderstanding. For me, using views to isolate the app from the schema is very much a second choice, something I won't do unless I'm ordered from on-high not to take everything to a procedural interface. I had misunderstood your earlier posts to say that you thought views were a bad idea and it would be better to give the app access to the base schema; so my posts were arguing that using views was better than that, which seems to have given you the impression that I thought views were the right way to go. Talk about cross-misunderstandings!

    Dynamic SQL, transactional concerns, data volume... there's been so many problems with a tightly wound app-db integration that I do everything in my power to avoid it and make sure my app devs get exactly what they need with a simple proc call.

    Tightly bound app-db integration is pure poison. If you have that you can't make upgrades of DB and app independent, and for me that was absolutely essential (a company goes bust if you can't do it issue).

    Which leads us to how we got into a different set of assumptions about who would be using this type of optimization:

    My main example is a system which has extremely intelligent, competent, and experienced people working to a set of rules which rigidly enforces best practice modularity. It is of course useful that this defends you from the idiot in marketing or PR who turns an ad-hoc query into the foundation of his whole function, but that's not the important property - support of narrow modular boundaries is. Without those narrow modular boundaries, you will very quickly have either a product that is so slow to evolve that your competitors eat up all your business or a product so complex and unwieldy and riddled with kludges that it costs the earth to support it.

    I've found that neither of these are the case. I can easily modify data pulls without an app change via procedure to adjust for business rule changes, and with reasonable naming conventions and a quick proc name pull from the code by your app tier it's simple to get to a particular piece of code when necessary. Also, by completely de-coupling the app and database tiers via procedures, you allow for simpler troubleshooting of particular components when the old 'oh gods it's slow now!' problems arise.

    That I don't understand - you seem to be contradicting yourself: you will do all in your power to avoid the app and the db being wound tightly together, but you don't mind not having narrow modular boundaries? You can completely decouple the app and database tiers without narrow modular boundaries? That doesn't make any sense - if you have narrow modular boundaries two components can't be tightly wound up in each other; if you don't, they can and almost certainly are. That's the definition of a narrow modular boundary: a set of interfaces which gets the job done while preventing the two components from getting wound up in each other.

    As long as your SQL developers aren't lazing around, they should be able to keep up with app building on (in my experience) a ratio of about 1 SQL:4 App.

    Not terribly productive SQL developers then. I generally did far better than that :laugh:

    Tom

  • L' Eomot Inversé (5/3/2013)


    Dynamic SQL??? What on earth would that be for? ... , every system I've been involved with since about 2000 has ended up (in some cases after I've overcome a lot of resistance from the backwoodsmen who want to embed DML statements directly in C++) as a system where the only thing the database presents to the outside world is a bunch of stored procedures.

    L' Eomot Inversé (5/3/2013)


    Tightly bound app-db integration is pure poison. If you have that you can't make upgrades of DB and app independent, and for me that was absolutely essential (a company goes bust if you can't do it issue).

    My personal experience differs…radically and I figure I am working in another branch of software development than you are. Your single "hard core" approach to databases (purposely provocative description) can be damaging to projects too. It depends on the particular application one works on/with and the requirements. I would argue that most applications, notably the smaller to medium custom written, are better implemented using tight binding between database and application. For ease of development as well as performance and flexibility reasons and then some possibly beyond the scope of your own particular needs and experience!

    If some other application needs access to a specific part of the database you can fall back on stored procedures and provide customized access rights just for them to functionally provide what is needed securely and without being schema bound there. The only situation where this happened to me was when two third party applications for the same customer as ours needed piggybacking on existing login accounts stored in that database of our application. A few stored procedures and a token system was all it took to implement this securely.
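
    A minimal sketch of that fallback, with hypothetical object names throughout, relying on ownership chaining so the partner's login can execute the procedure without being granted any table access:

    -- Hypothetical names: one narrow operation exposed to a partner application.
    CREATE ROLE PartnerAppRole;
    GO

    CREATE PROCEDURE dbo.ValidateLoginToken
        @Token UNIQUEIDENTIFIER
    AS
    BEGIN
        SET NOCOUNT ON;

        SELECT  a.AccountID,
                a.DisplayName
        FROM    dbo.Account AS a
                JOIN dbo.LoginToken AS t
                    ON t.AccountID = a.AccountID
        WHERE   t.Token = @Token
          AND   t.ExpiresAt > SYSUTCDATETIME();
    END;
    GO

    -- The partner's database user goes into the role; EXECUTE on the procedure is
    -- all it gets, and ownership chaining covers the table access inside the proc.
    GRANT EXECUTE ON dbo.ValidateLoginToken TO PartnerAppRole;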

    In a public discussion of this matter it helps to acknowledge not every service is like facebook or a non-client specific application that is deployed over and over and just customized by third parties. Most applications I bet are custom written against a non-public database schema design where the database is provided as part of the service an application provides and serviced by the application provider and no-one else.

    By no means does this translate to "not needing declarative referential integrity applied to its fullest" in the database. Some people seem to think of the database just as a necessary place to store data and keep it as dumb as possible…consider me NOT one of them! It just means business logic constraints that do not fit this declarative pattern can also be implemented in the application layer without any real drawback. But with the benefits this other approach brings, such as much better and efficient dynamic SQL and richer state management.

    And no, I hate ORMs with a passion, not referring to the utter SQL crap those deliver today either.

    As for the requirement of dynamic SQL, nearly all application back-offices need this in spades for search and/or real-time reporting purposes. If I compare the issues we have in our applications developed this way and the number we find others have, it is staggering! We never have the issues related parties have (being unable to get anything data related to be consistent and/or correct). And it is not for lack of having to do complex stuff, it is just a matter of knowing what you do and have the basics in place, plus the proper knowledge level of SQL (this last part is an area that is pretty thin out there I am afraid).

    Part of the reason is that most developers do not think in data, but code and at the same time DBA’s generally have the reverse mindset. Both parties like to have total control and freedom in their own domain and comfort zone! That does not mix and it would be elevating if more people would step a bit further outside their own domain and comfort zone and gain relevant experience and insights that way.

    Sorry for the long text, but after reading the latest discussion I don’t want to have people feel there is only one right way in what seems to be a clash between DBA titans 🙂.

  • EDIT: This is a textwall, I know, sorry for that but I got on a roll. Peter, a response to you starts about 1/3 of the way down. /EDIT

    L' Eomot Inversé (5/3/2013)


    I guess it depends what you call horrible. In my experience most developers write SQL that I would be ashamed to put my name to, but then so do most DBAs; The good developers and the good DBAs both write excellent SQL, nothing like that.

    I fear there's a bit of semanticism in there but I get your drift. What I wrote five years ago I would consider horrible today. However, there may be a locality difference here. DBAs are just another title for SQL Devs here in the states, and they're 'run of the mill' for me. App devs tend to write multiple nested loops and the like because of their familiarity with that form of processing and data manipulation.

    Data protection law is pretty fierce in Europe (including UK), and many organisations are paranoid about it. I'm inclined to believe that paranoid caution is the right approach. ... Maybe the only solution is to altogether prevent people getting large chunks of data. I've not had to deal with extremely sensitive stuff as opposed to not-really-sensitive personally identifiable data (I've tried to instill paranoia in the non-technical management even over that).

    Some of this was definitely from the misunderstanding of how you intend to use views. I'll get more into that below, but most of my contracts lately have been incredibly sensitive data sets... it's part of why I avoid the discussions completely on 'how easy it is to find people' from a few bits of data. I don't need the temptation. 😉

    Dynamic SQL??? What on earth would that be for? I think you've misunderstood badly. I certainly don't want one-size-fits-all, every system I've been involved with since about 2000 has ended up (in some cases after I've overcome a lot of resistance from the backwoodsmen who want to embed DML statements directly in C++) as a system where the only thing the database presents to the outside world is a bunch of stored procedures.

    Definitely misunderstood, but I think we're coming to the point where our approaches were radically different to deal with the same problem.

    I had misunderstood your earlier posts to say that you thought views were a bad idea and it would be better to give the app access to the base schema; so my posts were arguing that using views was better than that, which seems to have given you the impression that I thought views were the right way to go. Talk about cross-misunderstandings!

    *facepalm* Oh, lords yes. That was definitely NOT what I meant! 😛

    Tightly bound app-db integration is pure poison. If you have that you can't make upgrades of DB and app independent, and for me that was absolutely essential (a company goes bust if you can't do it [without] issue).

    ...

    That I don't understand - you seem to be contradicting yourself: you will do all in your power to avoid the app and the db being wound tightly together, but you don't mind not having narrow modular boundaries? You can completely decouple the app and database tiers without narrow modular boundaries?

    Okay, I apparently had no idea of what the definition of a 'narrow modular boundary' was... and possibly still don't.

    Now, how did I deal with the multiple app front ends being 'updated' in pieces for a 24/7 five nines system? We used schemas bound to logins and made sure the front end code simply called for procs, not dbo.procs. Any proc that could work between two versions stayed as 'dbo'. Any proc that was a version output/input change (parameters or output columns/datatypes) was dbo for the old version, and schema'd to the new version. The schema would come in, get its own procs in its schema first, and then default to dbo if there wasn't one for its default schema name. Same way tables work. The next upgrade shuffled that schema down to the DBO level.
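
    A minimal sketch of that resolution trick, with hypothetical names throughout (login, schema, procedure, and table are all made up); unqualified procedure names resolve against the caller's default schema first and then fall back to dbo:

    -- 'v2' clients connect as a user whose default schema is AppV2; older clients'
    -- users default to dbo.
    CREATE SCHEMA AppV2;
    GO

    -- Unchanged between versions: lives in dbo and serves everyone.
    CREATE PROCEDURE dbo.GetCustomer
        @CustomerID INT
    AS
        SELECT CustomerID, Name
        FROM   dbo.Customer
        WHERE  CustomerID = @CustomerID;
    GO

    -- The new version changes the output shape, so it gets a same-named proc in
    -- the version schema; dbo keeps the old behaviour for old clients.
    CREATE PROCEDURE AppV2.GetCustomer
        @CustomerID INT
    AS
        SELECT CustomerID, Name, LoyaltyTier
        FROM   dbo.Customer
        WHERE  CustomerID = @CustomerID;
    GO

    CREATE USER AppV2User FOR LOGIN AppV2Login WITH DEFAULT_SCHEMA = AppV2;
    GO

    -- Both old and new clients issue the same unqualified call:
    --     EXEC GetCustomer @CustomerID = 42;
    -- Name resolution tries the caller's default schema first (AppV2.GetCustomer)
    -- and falls back to dbo.GetCustomer when no version-specific proc exists.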

    What I was reacting to was more than a single version being supported except during switchovers, not the logistics behind five nines system upgrades, which is a different scenario. I've never had to try to support four+ versions off the same schema, and it's not something I've ever really deeply thought about.

    @peter-2:

    peter-757102 (5/3/2013)


    I would argue that most applications, notably the smaller to medium custom written, are better implemented using tight binding between database and application. For ease of development as well as performance and flexibility reasons and then some possibly beyond the scope of your own particular needs and experience!

    Development is eased with direct binding but performance is rarely assisted. Also, you expose yourself to a number of dangers that dynamic SQL poses. I have worked in very small shops where they made practically six calls to 4 tables in a single database... and that's all they ever really needed. The tables could have been denormalized, but no reason, really. In this case... the proc is STILL the better choice. You've encapsulated the call for troubleshooting, removed injection concerns, and have better security separation between front and back end.

    Additionally, tight binding and flexibility rarely go hand in hand. If you mean the flexibility for the devs to work in only their programming language to send code to SQL, then I'd agree. If you mean future programming flexibility, I'd disagree. Any change to schema would require you to re-write multiple code locations instead of a single proc acting as a function call for the front end.

    If some other application needs access to a specific part of the database you can fall back on stored procedures and provide customized access rights just for them to functionally provide what is needed securely and without being schema bound there.

    And yet if the procs are already written you don't even need to do that.

    In a public discussion of this matter it helps to acknowledge not every service is like facebook or a non-client specific application that is deployed over and over and just customized by third parties. Most applications I bet are custom written against a non-public database schema designs where the database is provided as part of the service an application provides and serviced by the application provider and no-one else.

    You're correct, and that's the majority of my work. Now, whether the front ends are public or private depends on what we're doing. Most companies these days have some form of portal for their clientele, and thus you must allow for eventual exposure of some kind of data to them. Now, is the majority of the work being done in the back? Sure.

    Out of curiosity, have you ever worked with DB2? DB2 required the kind of tight binding you're speaking of. There's a reason it's basically out of the market except in legacy systems.

    It just means business logic constraints that do not fit this declarative pattern can also be implemented in the application layer without any real drawback. But with the benefits this other approach brings, such as much better and efficient dynamic SQL and richer state management.

    You'll have to define 'richer state management' to me. However, I agree with there being rules that are more efficient in the app layer than in the database layer. The reverse is also true. I'm more than happy to discuss with devs on the teams I assist which I feel would be best where and why. Particularly formatting/string manipulation, display, and row to row computations are certainly best done in an iterative environment instead of a set based one. Aggregations, data organization and associations, filtering, and usually mathematics are normally best done at the DB level.

    However, there is a significant drawback if you just want to look at time to process a call to the DB. A proper string scraper in the front end to remove all possibilities of injection to your particular system can take anywhere from 15 to 20 milliseconds. No, that's not long. Every parameter for every call however has to be run through that process. It's just one place where you're easing the time for development and placing it on the time to process.

    Another thing is every dynamic call to the DB has to be uniquely compiled. It can't re-use cached plans. Now, cached plans have some of their own quirks but in 99.9% of situations they're just fine.
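
    For illustration, a sketch of the difference in question (the table and parameter are hypothetical): a string with concatenated literals changes its text on every call and so gets compiled each time, while the parameterised form keeps the statement text constant and lets the plan be reused:

    DECLARE @CustomerID INT = 42;   -- hypothetical table and parameter

    -- Literals concatenated into the statement text: each distinct value yields a
    -- distinct string, so each call compiles and caches its own plan.
    DECLARE @sql NVARCHAR(MAX) =
        N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = '
        + CAST(@CustomerID AS NVARCHAR(10)) + N';';
    EXEC (@sql);

    -- Parameterised dynamic SQL: the statement text is identical on every call,
    -- so the cached plan can be reused (and the value is no longer injectable).
    EXEC sys.sp_executesql
        N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE CustomerID = @CustomerID;',
        N'@CustomerID INT',
        @CustomerID = @CustomerID;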

    Thirdly if the app is 'overhandling' data, you're dealing with a lot of unnecessary network traffic and waits.

    Finally, and my most vehement argument, is that eventually one of those little apps grows up. Your 'prototype' becomes the best thing since sliced bread and someone in accounting just dumped every spreadsheet for the last 30 years into it over a weekend. Now, since you've cut so many corners, you're looking at re-designing the entire thing into optimal calls. Wouldn't it have made sense just to have done it 'right' in the first place, and then tweak for volume?

    As for the requirement of dynamic SQL, nearly all application back-offices need this in spades for search and/or real-time reporting purposes.

    Catch All Searches, I agree. And they can be done under the hood without significant problems (a sketch of one follows the two points below). Real Time reporting you'll have to be a bit more explicit to me about. If you mean real time from the data, I hook SSRS reports to procs all day long off our OLTP system and they're used quite efficiently. Prior to SSRS, ASP.NET served similar purposes for us, and so did MS Access with passthrough queries. Now, if you're talking dynamically built user reports, we're in a whole different conversation.

    A) That's not common. Most firms expect Tech to build them for them. Part of why SSRS 2k8 went more dev driven than user driven, like SSRS 2k5 was.

    B) If you've got power users at that level invest some time in SSAS and PowerPivot. It's a lot easier.
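
    As referenced above, a minimal sketch of a catch-all search kept under the hood in a procedure (the table and columns are hypothetical); predicates are appended only for the parameters the caller actually supplied, and the values themselves stay parameterised:

    CREATE PROCEDURE dbo.SearchOrders
        @CustomerID INT         = NULL,
        @Status     VARCHAR(50) = NULL,
        @OrderDate  DATE        = NULL
    AS
    BEGIN
        SET NOCOUNT ON;

        DECLARE @sql NVARCHAR(MAX) =
            N'SELECT OrderID, CustomerID, Status, OrderDate
              FROM dbo.Orders
              WHERE 1 = 1';

        -- Only the filters the caller actually asked for are appended.
        IF @CustomerID IS NOT NULL SET @sql += N' AND CustomerID = @CustomerID';
        IF @Status     IS NOT NULL SET @sql += N' AND Status     = @Status';
        IF @OrderDate  IS NOT NULL SET @sql += N' AND OrderDate  = @OrderDate';

        EXEC sys.sp_executesql
            @sql,
            N'@CustomerID INT, @Status VARCHAR(50), @OrderDate DATE',
            @CustomerID, @Status, @OrderDate;
    END;
    GO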

    If I compare the issues we have in our applications developed this way and the number we find others have, it is staggering! We never have the issues related parties have (being unable to get anything data related to be consistent and/or correct). And it is not for lack of having to do complex stuff, it is just a matter of knowing what you do and have the basics in place, plus the proper knowledge level of SQL (this last part is an area that is pretty thin out there I am afraid).

    If you have the proper level of SQL, putting it in a proc is no more difficult than writing the query and wrapping it in CREATE PROC dbo.MyProc @param1 INT, @param2 VARCHAR(50) AS ... GO
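
    Spelled out, that wrapping is nothing more than this (the names are hypothetical and the body is whatever query the developer already wrote):

    CREATE PROCEDURE dbo.GetOrdersForCustomer
        @param1 INT,            -- e.g. a customer id
        @param2 VARCHAR(50)     -- e.g. an order status
    AS
    BEGIN
        SET NOCOUNT ON;

        -- ... the existing query, now using the parameters ...
        SELECT OrderID, OrderDate, Status
        FROM   dbo.Orders
        WHERE  CustomerID = @param1
          AND  Status     = @param2;
    END;
    GO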

    Part of the reason is that most developers do not think in data, but code and at the same time DBA’s generally have the reverse mindset.

    I couldn't agree with this specific statement more. Thus why I get 'enforced' rules of using things like CI for database scripts in storage. Ever try to CI an index column order modification, and not have it rebuild the index EVERY time the CI is run?

    Both parties like to have total control and freedom in their own domain and comfort zone! That does not mix and it would be elevating if more people would step a bit further outside their own domain and comfort zone and gain relevant experience and insights that way.

    Does it help you if you knew that I started as a programmer? That it helps me understand what my devs need? That I still give them bloody havoc when they start sending dynamic trash down the pipe that ignores the indexes in place and they're scanning a 400 gig audit table?

    That last is mostly because they didn't bother to wrap it in sp_executesql with parameters properly and didn't have me do a code review, rather than because they wrote crap SQL. Sometimes you need to scan. Sometimes there are ways to do it without shared locking the entire table and thus locking the triggers that are trying to write to it from the main table.

    Sorry for the long text, but after reading the latest discussion I don’t want to have people feel there is only one right way in what seems to be a clash between DBA titans.

    I agree, there is not only a single right way... if your machines can handle an unoptimized load on your database(s). Those also aren't the times when you need a SQL Dev professional like the majority of us who hang around here.

    There are cases where I allow my front ends to send me down dynamic SQL. Particular cases where there is an exact scenario that makes for difficult or non-optimal T-SQL coding and it won't break security or expose the company awkwardly if the front end login information is compromised. However, data storage and retrieval logic will never fit directly into object/instance logic, you'll always require widgets of some form. Procs and their ilk are generally the most secure method and allow for both logic reusage and portability as well as separation of components during troubleshooting issues at 3 in the morning when some client 8 hours east of you on the planet is screaming holy rebellions because they can't check their cat's color for today.

    If you want to ease development by avoiding them, that's your shop's call. Just hope the powers that be will give you time to rebuild from scratch once it gets out of scope.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions | Forum Netiquette
    For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • This is going to be a long reply, mostly because of much quoting, bear with me 🙂.

    Evil Kraig F (5/8/2013)

    peter-757102 (5/3/2013)


    I would argue that most applications, notably the smaller to medium custom written, are better implemented using tight binding between database and application. For ease of development as well as performance and flexibility reasons and then some possibly beyond the scope of your own particular needs and experience!

    Development is eased with direct binding but performance is rarely assisted. Also, you expose yourself to a number of dangers that dynamic SQL poses. I have worked in very small shops where they made practically six calls to 4 tables in a single database... and that's all they ever really needed. The tables could have been denormalized, but no reason, really. In this case... the proc is STILL the better choice. You've encapsulated the call for troubleshooting, removed injection concerns, and have better security separation between front and back end.

    Additionally, tight binding and flexibility rarely go hand in hand. If you mean the flexibility for the devs to work in only their programming language to send code to SQL, then I'd agree. If you mean future programming flexibility, I'd disagree. Any change to schema would require you to re-write multiple code locations instead of a single proc acting as a function call for the front end.

    Performance is assisted because you can create better tuned (dynamic) queries without procedural overhead in the database. Procedural logic to make SQL code adaptive is more efficiently handled in a language suited to procedural programming.

    Another advantage of tight binding is that the SQL code that is written can use the same declared values for status (and other) codes as are used by the rest of the application. There still needs to be a match between these declarations and the actual status table contents of course. But at least you do not end up with unmaintainable constants in the SQL code as you would be forced to when writing stored procs!

    Firing unneeded queries happens when people write code not knowing what that code really does behind the abstractions they use. This is the danger of abstraction, and misuse is promoted by constructs like LINQ and ORMs (and yes, this is true for every hidden abstraction, thus stored procedures as well). Developers are using these abstractions most often because they never learned proper SQL. Which is a shame, as any half decent level developer should know quite a bit of advanced SQL (I intend to write a piece on that someday, also for internal use). That there exists a lot of bad database-related code in applications "out there" just has another root cause than tight binding.

    And just to be clear, I want developers to write SQL and become good at it! It promotes awareness of the data and structure they operate on and leads to better understanding, choices and in the end...better developers too. Just for this alone, schema awareness is essential, and shielding the developer from it is a very bad path to follow!

    Your argument that changing the schema is easier with a stored proc is comparing apples to oranges. That same stored proc can be implemented effectively in a single method in the application for application specific databases. And when a schema changes significantly, the application needs to change just the same! Not to mention the advantages of having that code in the same version control as the application!

    Evil Kraig F (5/8/2013)

    If some other application needs access to a specific part of the database you can fall back on stored procedures and provide customized access rights just for them to functionally provide what is needed securely and without being schema bound there.

    And yet if the procs are already written you don't even need to do that.

You are thinking a one-size-fits-all thought. What I write is that only in the rare exception (for most apps) do you need a stored proc for security reasons, and only then do you have to write one. Then a response of "it would be already there" doesn't make much sense! How would always writing procedures be better?

    Evil Kraig F (5/8/2013)

    In a public discussion of this matter it helps to acknowledge not every service is like facebook or a non-client specific application that is deployed over and over and just customized by third parties. Most applications I bet are custom written against a non-public database schema designs where the database is provided as part of the service an application provides and serviced by the application provider and no-one else.

You're correct, and that's the majority of my work. Now, whether the front ends are public or private depends on what we're doing. Most companies these days have some form of portal for their clientele, and thus you must allow for eventual exposure of some kind of data to them. Now, is the majority of the work being done in the back? Sure.

Out of curiosity, have you ever worked with DB2? DB2 required the kind of tight binding you're speaking of. There's a reason it's basically out of the market except in legacy systems.

I have not worked with DB2. At the same time I do not get the feeling you understand where I am coming from. An application consists of several layers; one is the data access layer (DAL for short) and even that can be subdivided into more layers.

What I am talking about is that, at least in the lowest layer of the DAL, there is 100% awareness of the database schema. Everything acting directly on one entity is found there, including constants related to the schema. Things spanning functional groups of entities can be found in higher DAL layers.

    The decoupling between schema and interface in this approach does not happen at the application/database boundary, but instead above the DAL (which resides in the application). It is the interface where you generally find things out of direct alignment with the schema. Additional models supporting those interfaces represent the solution to this disconnect and keep both model groups clean. The DAL (database or otherwise) should not be inflicted with changes every time some part of the user interface is re-arranged.

    Evil Kraig F (5/8/2013)

    It just means business logic constraints that do not fit this declarative pattern can also be implemented in the application layer without any real drawback. But with the benefits this other approach brings, such as much better and efficient dynamic SQL and richer state management.

You'll have to define 'richer state management' for me. However, I agree there are rules that are more efficient in the app layer than in the database layer. The reverse is also true. I'm more than happy to discuss with the devs on the teams I assist which I feel would be best where and why. Particularly formatting/string manipulation, display, and row-to-row computations are certainly best done in an iterative environment instead of a set-based one. Aggregations, data organization and associations, filtering, and usually mathematics are normally best done at the DB level.

    We are on the same page here I think. What I mean with “richer” is handling anything beyond the capabilities of the relational model and constraints. Think of keeping columns from different entities that are somehow related, in sync. You can write triggers for this too, but as usual it depends on the specifics if that is a good solution or not.

    Evil Kraig F (5/8/2013)


However, there is a significant drawback if you just want to look at time to process a call to the DB. A proper string scraper in the front end to remove all possibilities of injection to your particular system can take anywhere from 15 to 20 milliseconds. No, that's not long. Every parameter for every call, however, has to be run through that process. It's just one place where you're easing the time for development and placing it on the time to process.

15-20 milliseconds sounds like an awful lot to me; what I have in mind will be nowhere near even one millisecond! And the cost of parameter passing you will have is identical for stored procedure calls! And as for dynamic SQL: processing stored procedures isn't free either, and procedural logic is generally not efficient in SQL.

    Evil Kraig F (5/8/2013)


Another thing is every dynamic call to the DB has to be uniquely compiled. It can't re-use cached plans. Now, cached plans have some quirks of their own, but in 99.9% of situations they're just fine.

You seem to assume that dynamic SQL sent from the application DAL is somehow not parameterized. To be clear, all parameters not explicitly configured otherwise are parameterized in what I am describing. It is just that all the chores of coding this can be removed from the programmer without alienating him/her from the actual schema and the SQL.

    Evil Kraig F (5/8/2013)


    Thirdly if the app is 'overhandling' data, you're dealing with a lot of unnecessary network traffic and waits.

    Anything you put in a stored procedure has a counterpart in the application DAL. Thus only in situations where one stored procedure would call another stored procedure would there be extra traffic. If that is significant, then it is time to lift the combined logic to a higher level in the DAL and write specialized code for handling this situation in one step. Same as you would do with stored procedures really.

    Evil Kraig F (5/8/2013)


    Finally, and my most vehement argument, is eventually one of those little apps grows up. Your 'prototype' becomes the best thing since sliced bread and someone in accounting just dumped every spreadsheet for the last 30 years into it over a weekend. Now, since you've cut so many corners, you're looking at re-designing the entire thing into optimal calls. Wouldn't it have made sense just to have done it 'right' in the first place, and then tweak for volume?

You are misrepresenting the situation, there were no corners cut! What you will find in your stored procedure solution you now find in the application DAL, neatly isolated. Even more properly isolated than with procedures actually, since the DAL consists of several layers. The application still does the same, thus the models for the interface remain intact too.

    Evil Kraig F (5/8/2013)


    As for the requirement of dynamic SQL, nearly all application back-offices need this in spades for search and/or real-time reporting purposes.

Catch-all searches, I agree. And they can be done under the hood without significant problems. Real-time reporting you'll have to be a bit more explicit about. If you mean real time from the data, I hook SSRS reports to procs all day long off our OLTP system and they're used quite efficiently. Prior to SSRS, ASP.NET served similar purposes for us, and so did MS Access with passthrough queries. Now, if you're talking dynamically built user reports, we're in a whole different conversation.

We don't use catch-all queries for general list views; filter conditions and sorting are added dynamically based on user input. Often we have reporting functionality built on aggregate tables and even user-definable reports. Both types of systems work exclusively by generating SQL code dynamically on the fly.

    Evil Kraig F (5/8/2013)


    A) That's not common. Most firms expect Tech to build them for them. Part of why SSRS 2k8 went more dev driven than user driven, like SSRS 2k5 was.

    B) If you've got power users at that level invest some time in SSAS and PowerPivot. It's a lot easier.

SSRS is awful from what I have seen of it; I wouldn't dare to sell such a solution if my life depended on it (ok, maybe I would in that extreme, but you get the idea)! It's so disconnected from what a user really needs and totally not part of the application regular users use. It also requires its own database, backup procedures, and users, has a bad interface, you name it. Nothing I would like to use as a developer either; it is just really bad all-around. In short, application-specific reporting functionality never needs this, let alone the dependencies that come with it.

    Evil Kraig F (5/8/2013)


If I compare the issues we have in our applications developed this way with the number we find others have, it is staggering! We never have the issues related parties have (being unable to get anything data-related to be consistent and/or correct). And it is not for lack of having to do complex stuff; it is just a matter of knowing what you do and having the basics in place, plus the proper level of SQL knowledge (this last part is an area that is pretty thin out there, I am afraid).

    If you have the proper level of SQL, putting it in a proc is no more difficult than writing the query and wrapping it in CREATE PROC @param1 INT, @param2 VARCHAR(50) AS ... GO

That is just overhead and does not allow for maintainable SQL code. It does however allow you to apply procedure-level security, which as I made clear most single-purpose application databases really do not require. The body of the stored procedure as you describe here can be executed as a batch just the same, without wrapping a redundant interface around it.

In the end it is not about what is more difficult, it is about what develops easier and is cleaner. Having access to constants in an application DAL is one benefit, and source control is another. From a developer perspective, having a stored procedure to call would still mean calling a proxy method for that stored procedure, which in turn is really a proxy for the SQL batch inside the procedure. That chain is overhead and complexity. Then consider aspects like error reporting when a query fails. The application being aware of the query allows much more detailed error reporting, allowing things to be fixed faster!
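
For illustration, a minimal sketch of the kind of proc wrapper being discussed above; the procedure, table, and column names are invented:

    CREATE PROC dbo.GetOrdersByCustomer
        @CustomerID INT,
        @Status     VARCHAR(20)
    AS
    SET NOCOUNT ON;
    -- the same SELECT that would otherwise be sent as an ad-hoc batch from the application
    SELECT OrderID, OrderDate, TotalDue
    FROM dbo.Orders
    WHERE CustomerID = @CustomerID
      AND Status = @Status;
    GO

    -- the application then calls:
    EXEC dbo.GetOrdersByCustomer @CustomerID = 42, @Status = 'Open';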

I have to say the C# environment I recently started developing in is quite underdeveloped in the area of database access, as are the ORMs I have seen for it so far (I think an ORM is for the most part a bad idea anyway). Microsoft, and maybe the industry as a whole, chose the path of least resistance and addressed symptoms instead of curing the causes of bad code (that being a lack of basic knowledge).

I think they have been on this path since the late 90s, when garbage-collection-based languages gained traction. Since then, I feel most code has become less clean, less efficient, less maintainable and more buggy (be it bugs of another kind; I never had memory leaks back in those days either, though). Everyone and their donkey can now write a few lines of code and get away with it for some time, until what is made simply implodes.

    Evil Kraig F (5/8/2013)


    Part of the reason is that most developers do not think in data, but code and at the same time DBA’s generally have the reverse mindset.

I couldn't agree with this specific statement more. Thus why I get 'enforced' rules of using things like CI for database scripts in storage. Ever try to CI an index column order modification, and not have it rebuild the index EVERY time the CI is run?

Can't say I have, as I do not know of CI. I did look it up and maybe you mean CodeIgniter for PHP? I simply haven't worked with that, and I wouldn't, as I don't like PHP at all for various reasons. I haven't checked the latest versions to see whether they have improved.

    Evil Kraig F (5/8/2013)


    Both parties like to have total control and freedom in their own domain and comfort zone! That does not mix and it would be elevating if more people would step a bit further outside their own domain and comfort zone and gain relevant experience and insights that way.

    Does it help you if you knew that I started as a programmer? That it helps me understand what my devs need? That I still give them bloody havoc when they start sending dynamic trash down the pipe that ignores the indexes in place and they're scanning a 400 gig audit table?

That last is mostly because they didn't bother to wrap it in sp_executesql with parameters properly and didn't have me do a code review, rather than because they wrote crap SQL. Sometimes you need to scan. Sometimes there are ways to do it without shared-locking the entire table and thus blocking the triggers that are trying to write to it from the main table.

I know what you mean; where I work there is no separate DBA role but a relatively high level of SQL knowledge, and I expect new members to skill up too. Some problems simply require awareness of the schema and the insight to update the schema if that is beneficial. Other problems, like SQL injection, query parameterization and, when explicitly not wanted, no parameterization (for, say, a status value, to take advantage of a filtered index), require a proper method of writing and executing SQL statements. This is where I found LINQ to SQL, LINQ in general and ORMs to be a major obstacle in nearly every aspect.
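
A quick sketch of the filtered-index point above (object names invented): a literal status value lets the optimizer use the filtered index, while a parameterized one generally cannot, because the cached plan must be valid for any parameter value.

    -- filtered index on a hypothetical Orders table, covering only open orders
    CREATE NONCLUSTERED INDEX IX_Orders_Open
    ON dbo.Orders (CustomerID)
    WHERE Status = 'Open';

    -- a literal predicate can match the filtered index
    SELECT CustomerID FROM dbo.Orders WHERE Status = 'Open';

    -- a parameterized predicate generally cannot, because the plan must work for any @Status
    EXEC sp_executesql
        N'SELECT CustomerID FROM dbo.Orders WHERE Status = @Status',
        N'@Status VARCHAR(20)', @Status = 'Open';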

    Evil Kraig F (5/8/2013)


Sorry for the long text, but after reading the latest discussion I don't want people to feel there is only one right way in what seems a clash between DBA titans.

    I agree, there is not only a single right way... if your machines can handle an unoptimized load on your database(s). Those also aren't the times when you need a SQL Dev professional like the majority of us who hang around here.

Not sure I agree with that; scaling issues can arise even with moderate growth in data. That is why even as a developer you need to know SQL and understand relational databases, and not use them for dumb storage. Often you hear the mantra "premature optimization is evil", where in practice it is mostly applied by people who create horribly inefficient solutions that, with the same amount of effort but more knowledge, would run much better! Therefore my mantra is "performance by design", which is much more positive!

    Evil Kraig F (5/8/2013)


There are cases where I allow my front ends to send me dynamic SQL. Particular cases where there is an exact scenario that makes for difficult or non-optimal T-SQL coding and it won't break security or expose the company awkwardly if the front-end login information is compromised. However, data storage and retrieval logic will never fit directly into object/instance logic; you'll always require widgets of some form. Procs and their ilk are generally the most secure method and allow for both logic reuse and portability, as well as separation of components when troubleshooting issues at 3 in the morning when some client 8 hours east of you on the planet is screaming holy rebellion because they can't check their cat's color for today.

    If you want to ease development by avoiding them, that's your shop's call. Just hope the powers that be will give you time to rebuild from scratch once it gets out of scope.

    I work on several projects that have activity in all time zones and that are “mission critical” for their particular customers. Granted the load we have is not that of say Facebook or twitter, but we do not face the issues you describe because we do not have that model mismatch to begin with! If something significant changes we do refactor the relevant code. We know exactly what needs to be changed and where.

  • peter-757102 (5/3/2013)


    L' Eomot Inversé (5/3/2013)


    Tightly bound app-db integration is pure poison. If you have that you can't make upgrades of DB and app independent, and for me that was absolutely essential (a company goes bust if you can't do it issue).

My personal experience differs…radically, and I figure I am working in another branch of software development than you are. Your single "hard core" approach to databases (purposely provocative description) can be damaging to projects too. It depends on the particular application one works on/with and the requirements. I would argue that most applications, notably the smaller to medium custom-written ones, are better implemented using tight binding between database and application. For ease of development as well as performance and flexibility reasons, and then some possibly beyond the scope of your own particular needs and experience!

    That of course is the good old "modular programming is a waste of effort, it's easier to construct an interwoven mess than to define and implement modules" argument. It is true for a certain class of applications, those that are so small that they can be written (including all the database code and all the presentation code) by one man on the back of a couple of envelopes in a pretty short time and have requirements so stable that the change rate that has to be supported is trivial. I believe that the vast majority of code falls outside that scope.

    peter-757102 (5/3/2013)


    In a public discussion of this matter it helps to acknowledge not every service is like facebook or a non-client specific application that is deployed over and over and just customized by third parties. Most applications I bet are custom written against a non-public database schema designs where the database is provided as part of the service an application provides and serviced by the application provider and no-one else.

    I agree fully. Certainly my applications have been written against non-public schemata and the database was part of the service provided by the application provider. So I think you are barking up the wrong tree if you think I am talking about databases that are not provided along with the application (how on earth would an interface well suited to the application be provided by the database in the other case) or about things like facebook (where there is only one version of the application at any one time, and I want to support multiple versions).

    peter-757102 (5/3/2013)


By no means does this translate to "not needing declarative referential integrity applied to its fullest" in the database. Some people seem to think of the database just as a necessary place to store data and keep it as dumb as possible…consider me NOT one of them! It just means business logic constraints that do not fit this declarative pattern can also be implemented in the application layer without any real drawback.

    More things where we have no disagreement. There are often constraints required by business logic that can't be expressed by the structure of the schema, and they should be placed wherever they are easiest to implement. In fact I might go further than you in this direction, in that I believe that in some data warehouse applications it may even be a good idea to shed some referential integrity constraints.

    peter-757102 (5/3/2013)


But with the benefits this other approach brings, such as much better and efficient dynamic SQL and richer state management.

    With a procedural interface, you can put the dynamic SQL construction and execute call in a stored procedure and get just as efficient dynamic SQL without breaking modularity.
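
A minimal sketch of that pattern, with invented object names: the dynamic SQL construction and the sp_executesql call live inside the procedure, so schema knowledge stays in the database.

    CREATE PROC dbo.SearchOrders
        @CustomerID INT = NULL,
        @Status     VARCHAR(20) = NULL
    AS
    SET NOCOUNT ON;
    DECLARE @sql NVARCHAR(MAX) = N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE 1 = 1';

    -- append only the predicates the caller actually supplied
    IF @CustomerID IS NOT NULL SET @sql += N' AND CustomerID = @CustomerID';
    IF @Status     IS NOT NULL SET @sql += N' AND Status = @Status';

    EXEC sp_executesql @sql,
        N'@CustomerID INT, @Status VARCHAR(20)',
        @CustomerID = @CustomerID, @Status = @Status;
    GO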

    peter-757102 (5/3/2013)


And no, I hate ORMs with a passion, not referring to the utter SQL crap those deliver today either.

    Snap yet again!

    peter-757102 (5/3/2013)


As for the requirement of dynamic SQL, nearly all application back-offices need this in spades for search and/or real-time reporting purposes.

I see three different cases to look at. For regular reporting requirements, with reports that will be used over and over and are running off the active data, there's no reason not to provide stored procs that generate any dynamic SQL that's wanted; that keeps knowledge of the schema inside the database. For genuinely ad-hoc reports I don't mind dynamic SQL outside the database, provided that they really are ad-hoc one-off reports that don't have to be taken into account in decisions about changing the schema, i.e. database changes that break them are guaranteed to be the report writers' problem because those reports are not allowed to constrain the schema in any way.

As for reports using a data warehouse, I'm sure Craig F has a far better view of that than I have. The nearest thing I ever had to a data warehouse was a database which was trickle-updated from an active db that contained only current data (current meant data less than a month old); it held history from day 1 up to reasonably recently, which allowed separate backup and recovery strategies for current and historical data and avoided having to pick a time at which to degrade the active stuff by running a large-scale extraction to the history stuff, while still permitting reports to see both old and current data if they really needed to. I'm sure that (a) that was not the best way to do it (my predecessors had kept all history in the active db and ran reports directly off that, which turned out to be a catastrophe for performance and for backup; my "temporary fix" improved things a lot, but more improvement would have been better, as it was still in use seven years later and for all I know still is now) and (b) someone like Craig, with real experience in this area, will have more clue than I have.

    Tom

  • As usual, this conversation got away from my 'quick reply' ability to respond to the pieces, so my tardiness isn't ignoring it, it's wanting to make sure I had enough time to process your comments and to make sure I was clear.

    peter-757102 (5/10/2013)


    Performance is assisted because you can create better tuned (dynamic) queries without procedural overhead in the database.

    You'll have to clarify this to me, because no matter what you write, there will be procedural overhead in the database. Using dynamic SQL in this manner just means you're compiling queries more often.
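
As a side note, one rough way to check whether parameterized text is actually being recompiled or reused is to look at the plan cache usecounts; the object name in the filter below is just a placeholder:

    SELECT cp.usecounts, st.text
    FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
    WHERE st.text LIKE N'%dbo.Orders%';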

    Procedural logic to make SQL code adaptive is more efficiently handled in a language suited to procedural programming.

Adaptive SQL is not the purpose in most use cases, even small ones, in my experience. Optimized SQL is. If you don't care about optimization due to size, then the method can be what you like as long as it's secure. Three rules, in this order: Accurate, Efficient, Flexible. Flexible must be third because the DB is the bottleneck otherwise.

    Another advantage of tight binding is that the SQL code that is written can use the same declared values for status (and other) codes as is used by the rest of the application.

Can you clarify what you mean by this? I'm afraid whatever correlation you're trying to make is missing me. You mean constant declarations?

    There still needs to be a match between these declarations and the actual status table contents of course. But at least you do not end up with unmaintainable constants in the SQL code as you would be forced to when writing stored procs!

    How are constants any more or less maintainable when sent as a parameter to a proc or directly inserted into your sp_executesql as a parameter? This offer of preferred functionality doesn't make any sense to me.

Firing unneeded queries happens when people write code not knowing what that code really does behind the abstractions they use. This is the danger of abstraction, and misuse is promoted by constructs like LINQ and ORMs (and yes, this is true for every hidden abstraction, thus stored procedures as well). Developers are using these abstractions most often because they never learned proper SQL.

With this, I'm in complete agreement. There are other concerns I have with ORMs, like multiple instances holding cached data that collides when multiple updates are made against the same key and then flushed to the db, but that's another discussion.

Let's assume ORM and LINQ are miserable and stay on target: directly written SQL in the code being sent as ad-hoc calls.

Which is a shame, as any half-decent developer should know quite a bit of advanced SQL (I intend to write a piece on that someday, also for internal use). That there exists a lot of bad database-related code in applications "out there" just has another root cause than tight binding.

    While I don't disagree with this in theory, you may be part of the 2% I know of out there. Most app developers write horrible dataset logic, and most DB Devs can't write efficiently in iterative instance languages. For the record, I am NOT one of those 2%. My app code is crap. The simple logic approaches to the desired goal are so different that it's rare to find anyone who can do both at more than the basic level well without having multiple personalities.

    Now, for advanced SQL? Most app developers are spending their lives just trying to keep up with the libraries that are being shoved down their throats in new versions and whatever they just downloaded when they googled up a solution. They don't have time to deal with the datastore (not even nearing the data engine) as long as it works. I disagree. You can have a life or you can keep yourself up to date on everything in the industry. Rarely can you do both.

And just to be clear, I want developers to write SQL and become good at it! It promotes awareness of the data and structure they operate on and leads to better understanding, choices and in the end...better developers too. Just for this alone, schema awareness is essential and shielding the developer from it is a very bad path to follow!

I am more than happy to train my devs when they're interested and give them rules of thumb to follow, for a DBA to review once they're complete to double-check execution plans and the like. I in no way advocate shielding my devs from the schema. Exactly the opposite. I do however grief the everloving crap out of them if they start sending down ad-hoc queries instead of stabilizing it in a procedure.

    Your argument that changing the schema is easier with a stored proc, is comparing apples to oranges. That same stored proc can be implemented effectively in a single method in the application for application specific databases. And when a schema changes significantly, the application needs to change just the same!

    Not always. I've worked many an app where it wasn't an interface change but 'plumbing repairs' when we do schema modifications that the app never had to see.

    Not to mention the advantages of having that code in the same version control as the application!

    Brakes.

I want to make sure we're clear. There are some components of SQL Server that can act like your front-end controls, are subject to effective continuous integration, and can easily be swapped between versions: procs, functions, security, non-indexed views.

There is something that CANNOT be versioned that way... the tables. Schema adjustments are massive undertakings that affect everything lying on top of them. It's like screwing with the foundation of a house instead of just moving the furniture around and maybe a wall or two. Table modifications (and indexed views) are a different kind of versioning and need to be handled with care.

As long as that's clear, and we're discussing that, then yes. I agree that you're looking at a single-role solution instead of a dual-role solution, and embedding the SQL into the app code saves you a step of organization and possible mix-ups. That alone is not enough of a reason for me to advocate ad-hoc querying, which is what front-end app code that doesn't call procs is doing. Ad-hoc queries, sp_executesql parameterization or not.

You are thinking a one-size-fits-all thought. What I write is that only in the rare exception (for most apps) do you need a stored proc for security reasons, and only then do you have to write one. Then a response of "it would be already there" doesn't make much sense! How would always writing procedures be better?

Because if you've written good SQL, then wrapping it in a procedure does 3 things:

    1) Security controls already built in.

    2) Proc cache optimization

    3) Injection protection

Good SQL sent to the server from the front end (to protect from injection) will be written like this:

EXEC sp_executesql N'SELECT * FROM tbl WHERE a = @a AND b = @b',
    N'@a INT, @b INT',  -- parameter definitions; the types here are illustrative
    @a = @localVariableA, @b = @localVariableB

    Which is roughly the equivalent of:

EXEC procBlah @a = @localVariableA, @b = @localVariableB

So, all you've done is wrap it as a proc call and made sure that it's independently executable of the code for logistics, security, and troubleshooting. What does leaving it in the code do besides open cans of worms no one wants?

I have not worked with DB2. At the same time I do not get the feeling you understand where I am coming from. An application consists of several layers; one is the data access layer (DAL for short) and even that can be subdivided into more layers.

I'm familiar with N-Tier architecture. I'm also familiar with a DAL using procs and/or PL/SQL and/or Paradox and/or... to talk to the individual systems in their particular methods while giving a consistent interface to the front-end devs.

What I am talking about is that, at least in the lowest layer of the DAL, there is 100% awareness of the database schema. Everything acting directly on one entity is found there, including constants related to the schema. Things spanning functional groups of entities can be found in higher DAL layers.

    If you tell me you open database transactions from this layer, which is what you're implying, I'm going to cry in public and run screaming. If you don't know the concerns there hit up google for a bit for the horror stories.

    The decoupling between schema and interface in this approach does not happen at the application/database boundary, but instead above the DAL (which resides in the application). It is the interface where you generally find things out of direct alignment with the schema. Additional models supporting those interfaces represent the solution to this disconnect and keep both model groups clean. The DAL (database or otherwise) should not be inflicted with changes every time some part of the user interface is re-arranged.

Why would the DAL be modified if you're moving labels around? If you need new/different data, you must affect the DAL, or your DAL is horribly inefficient. This is more theory at this point, but proc or N-Tier DAL, you must affect the DAL when you affect the expectations of the usage layer.

    We are on the same page here I think. What I mean with “richer” is handling anything beyond the capabilities of the relational model and constraints. Think of keeping columns from different entities that are somehow related, in sync. You can write triggers for this too, but as usual it depends on the specifics if that is a good solution or not.

    This is coming back to the external transactions controlling data. The reason you rely on triggers is to make sure the data is always consistent within a particular transaction. You do not want external structures controlling transactional integrity. The volume of problems that occur when a simple shared connection overloads or you glitch on communications is amazing and insanely disruptive. I've seen people fired on the spot for coding it that way simply because they forced a database reboot for multiple applications with one glitch after being warned of the concerns.
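
For reference, a minimal sketch of the trigger approach mentioned above, keeping a denormalized column in sync inside the same transaction as the insert; the table and column names are invented:

    CREATE TRIGGER dbo.trg_Orders_SyncLastOrderDate
    ON dbo.Orders
    AFTER INSERT
    AS
    SET NOCOUNT ON;
    -- keep a denormalized LastOrderDate on the customer row, within the inserting transaction
    UPDATE c
    SET c.LastOrderDate = i.MaxOrderDate
    FROM dbo.Customers AS c
    JOIN (SELECT CustomerID, MAX(OrderDate) AS MaxOrderDate
          FROM inserted
          GROUP BY CustomerID) AS i
      ON i.CustomerID = c.CustomerID;
    GO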

15-20 milliseconds sounds like an awful lot to me; what I have in mind will be nowhere near even one millisecond! And the cost of parameter passing you will have is identical for stored procedure calls!

    We're fine till this... however:

And as for dynamic SQL: processing stored procedures isn't free either, and procedural logic is generally not efficient in SQL.

Okay, please elaborate. If you're going to tell me GIGO, then fine; it's crap no matter where a bad dev writes it. Otherwise my eyebrows are going to climb into my hairline at that assertion and I'll wonder if we're working with the same product.

You seem to assume that dynamic SQL sent from the application DAL is somehow not parameterized. To be clear, all parameters not explicitly configured otherwise are parameterized in what I am describing. It is just that all the chores of coding this can be removed from the programmer without alienating him/her from the actual schema and the SQL.

    No, I'm assuming if you're parameterizing it there's no gain leaving it out of a proc except for catch-all search queries and/or dynamic report building, thus any other component of the discussion is based on ad-hoc querying.

SSRS is awful from what I have seen of it; I wouldn't dare to sell such a solution if my life depended on it (ok, maybe I would in that extreme, but you get the idea)! It's so disconnected from what a user really needs and totally not part of the application regular users use. It also requires its own database, backup procedures, and users, has a bad interface, you name it. Nothing I would like to use as a developer either; it is just really bad all-around. In short, application-specific reporting functionality never needs this, let alone the dependencies that come with it.

    We have particularly different experiences with SSRS, but I'm just as comfortable embedding Crystal Reports into ASP apps as well, though I'm not up on the current versions of Crystal... haven't needed to be. I've done plenty of app specific reporting on it and have little to no issues. I also find it's very easy to train my users once in it and then let them loose once new reports become available for them. They don't need to do anything but fill in parameters and get their data.

That is just overhead and does not allow for maintainable SQL code. It does however allow you to apply procedure-level security, which as I made clear most single-purpose application databases really do not require.

No, I'm afraid you haven't made that clear to me. Most databases are designed for single-application access. Most databases use procs to add a layer of security and troubleshooting between the front-end app and the back-end storage. Should a user get your front-end user/password, he has free rein in the database. Period. With execution rights to procs only, he doesn't. Would they eventually get to the same result? Most likely. Security measures are time sinks to slow attackers while you trace them and either block them or apprehend them.
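
A rough sketch of that security separation (all names here are invented, and app_user is assumed to be an existing database user mapped to the front-end login): the application gets EXECUTE on procs and is denied direct table access.

    -- role for the application login; it may only execute procs in the dbo schema
    CREATE ROLE app_exec;
    GRANT EXECUTE ON SCHEMA::dbo TO app_exec;
    -- no direct table rights
    DENY SELECT, INSERT, UPDATE, DELETE ON OBJECT::dbo.Orders TO app_exec;
    ALTER ROLE app_exec ADD MEMBER app_user;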

    As to maintainable SQL code, I'm sorry, but you've given me a chuckle with that one. I have a metric crapton of SQL code out there that I maintain, and I'm sure plenty of others do too... we maintain them just fine in procs.

In the end it is not about what is more difficult, it is about what develops easier and is cleaner. Having access to constants in an application DAL is one benefit, and source control is another. From a developer perspective, having a stored procedure to call would still mean calling a proxy method for that stored procedure, which in turn is really a proxy for the SQL batch inside the procedure. That chain is overhead and complexity. Then consider aspects like error reporting when a query fails. The application being aware of the query allows much more detailed error reporting, allowing things to be fixed faster!

    And as a developer and the guy called at 3AM when the database fails, I need a quick and simple method to determine where the failure is. The error telling me that proc kldjfalsdjfalk coughed up error XYZ from the stack means I go beat on a proc. It's a separation of church and state, if you will, and can halve your troubleshooting in a single glance/test. I believe your method ADDS complexity, not reduces it, at least past the initial development phase. It's not enough to build something, you must maintain it.

No matter what, the dev's using a proxy method, I agree. There's no reason not to use the one that adds security and ease of use.

Error reporting from an sp_executesql call vs. an EXEC proc call is roughly equivalent. What differences are you expecting there?

Can't say I have, as I do not know of CI. I did look it up and maybe you mean CodeIgniter for PHP? I simply haven't worked with that, and I wouldn't, as I don't like PHP at all for various reasons. I haven't checked the latest versions to see whether they have improved.

    Sorry. CI = Continuous Integration


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables

    Twitter: @AnyWayDBA

  • Evil Kraig,

I do not have much time today or the rest of the week, thus I am going to keep it short by just making points as a reply to your post.

* Performance is assisted because "not required" where clauses and even joins can be left out of dynamically built queries, based on input (see the sketch after this list). Dynamic SQL is not the same as a catch-all query and can be applied efficiently in many more scenarios! Trying the same from a stored procedure that builds a string to get the same effect is hard to create and maintain and an inefficient process.

* Only unique instances of queries resulting from the "dynamification" process are compiled. Again, I am not talking about a catch-all solution.

* Adaptive SQL is optimized SQL; you are not carrying overhead for no purpose this way.

* I indeed mean the availability of constant declarations when the SQL is part of the application (no more WHERE IdStatus = 5 hardcoded).

* Constants are part of the logic you try to apply, be it in the application or in a stored procedure. Handing over constants from the application to the procedure, just so you know you are talking about the same values, is like having an important part of the operational logic inside the application anyway. The segregation you try to maintain simply is not there anymore; it's a mirage, as the constants are supplied from an external source.

* Most developers know little to no SQL, because they are not taught to, and that is the mistake, as they are perfectly capable of understanding it all. As for them not thinking in sets, I have to disagree; LINQ and the like, which they are eager to absorb, essentially work on sets of data too!

* Your argument for favoring a stored procedure is flawed, except for the argument about security, which I already argued is not needed to that extent for single-app databases. Procedure cache optimization and injection protection you both get when executing parameterized batches (or single-statement queries) from an application too.

    * The way you write "Good SQL" as executed from an application is NOT how it is done. Purposely or not, your example code is SQL as you would write it in SQL and therefore looks cumbersome and hard to maintain. Do you have the impression that the only way to call anything database related from within an application is via a narrow stored procedure call interface?

ADO.NET, for example, has several ways of executing SQL, specialized around what data to expect back (a scalar, sets, forward-only read-only reading of a set).

* Opening database transactions has to happen where multiple high- or low-level actions need to be grouped and succeed as one. That implies this control is always done from a higher layer in the application. For low-level operations, it is perfectly reasonable that a 2nd-level DAL function does this. If you do not subscribe to this, think of two independently working stored procedures that need to succeed as one. Are you going to write a 3rd procedure doing the work of both in one transaction? Are you going to make that procedure call the other two through EXEC calls? If the answer to this is yes, then good luck... and stop saying you find performance important.

    * I suspect your perception of how transactions are to be controlled in an application is not how I implied it. A developer must NEVER work with objects that directly implement transaction control over a connection. The trick is not to program against the raw ADO connection, but to have a system in place that manages the "named" connection(s) and also any transaction assigned to such names for you. All a programmer needs to do is dictate when a transaction should be in place for a certain block of code and for what named connection (data source is what I like to call it). This method allows for seamless automatic nested transactions and will not get you into the issues of multiple connections operating within the same database, doing different things even when you only have one database.

    * The lowest level of DAL will not need change if you are adding columns to tables that your application does not need to use.

    * The lowest level of DAL is NOT hand coded, when there is change, it can be auto updated from scratch from the database schema. This means low maintenance for common changes.

* Removing columns, changing column types, or removing entire tables will require some changes, but not as many as you might think. Compatibility between application logic and schema can sometimes be maintained with simple logic to ease transitions when evolving schemas. And a user interface can have its own models, tailored to its own "local" needs. This means only the interaction between this model and the DAL needs to be patched and the bulk of the code can be left intact.

* Any parameterization for a query must be sent to the database, regardless of what type of SQL it is you try to parameterize. There is no true distinction in call overhead here between a plain batch with parameters and a stored procedure with identical parameters.

* Your arguments about stored procedures helping troubleshooting are seen through the colored eyes of a DBA supporting an application not his own. For the most common type of application this scenario simply is not reality, as the team/company that develops it also supports it. Also, the fact that constants are being sent to the procedures as parameters means even the stored procedures are not really in control.

* As for failures at 3AM... you are making the assumption of a lone DBA having to support applications he has no control over unless nearly all logic is in stored procedures. Quite frankly, if an application we have written stops working, it is due to a hardware/network failure and not some business logic gone mad all of a sudden. For general diagnosis of issues, we have much richer error logging than what you describe. We can see what input the user posted in a form, who the user is, what he did before this action, what the state of the session is at the time of the submit, etc. Including relevant parts of the source code in text, which is better than a stack trace.

    * Do not always put yourself at the center of the application DATA universe, else you might just become that single point of failure!
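
To illustrate the first point above, a sketch of what such dynamically built, parameterized batches might look like by the time they reach SQL Server; only the predicates and joins the user's input actually requires are present (object names invented):

    -- user filtered on status only: one predicate, no join
    EXEC sp_executesql
        N'SELECT OrderID, OrderDate FROM dbo.Orders WHERE Status = @Status',
        N'@Status VARCHAR(20)', @Status = 'Open';

    -- user also filtered on customer country: the join appears only now
    EXEC sp_executesql
        N'SELECT o.OrderID, o.OrderDate
          FROM dbo.Orders AS o
          JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
          WHERE o.Status = @Status AND c.Country = @Country',
        N'@Status VARCHAR(20), @Country VARCHAR(50)',
        @Status = 'Open', @Country = 'NL';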
