One of the supposed advantages of the EF is that it abstracts you from
your database implementation. This is considered to be a good thing. It
lets me move between different database products (SQL Server, Oracle,
MySQL, etc) without having to rewrite anything the business tier of my
application. I just need to hook up a different provider to the entity
framework and I'm done. All of my business logic and query construction
stays at the application layer.
Ironically, this paradigm, which
comes from the object oriented world, fails to recognize one of the
most basic tenents of object oriented design. The separation of
interface from implementation!
First of all, I'd better point
out the reason for this post. Why is it a bad thing to put the query
construction into automated tools like the entity framework or
nHibernate, or into the hands of the application layer more generally?
The
short answer is performance. Databases aren't given much respect in
many organizations. It's just a place to persist information, right?
No, of course not, that's completely wrong. A database is an extremely
complex system of physical constructs like indexes, partitions, columns
composed of data types, keys, etc. SQL databases also have a
standardized (though vendor extended) language for data retrieval and
modification. This, of course, is SQL.
SQL is not a procedural
language. SQL is a declarative language. Procedural programming fu do
not translate to SQL-fu, and vice versa.
A person focused on
database development generally knows at least a little something about
database design, indexes, partitions, keys, and of course SQL. There's
a lot to know. I've been doing it for 7 years and still learn new and
complex technical details about SQL Server every week, if not every
day. A person focused on database development also, as a rule, knows a
little something about SQL.
A procedural programmer, as a rule
does not. Or at least, obviously someone who has spent 10 years mostly
focused on, say, .NET won't know as much about it as someone who has
spent the same amount of time focused on, say, SQL Server. The people
who wrote an object-relational mapping tool might know a little, but
they don't necessarily. And more importantly, the tool certainly
doesn't know anything about your specific indexes, partitions,
performance requirements, storage system, etc.
Long story short,
this means that the ability to write sensible and efficient SQL for
your specific database will only ever be possible by someone with
knowledge of the logical and physical structure of that database, and
never by an automated tool, or a tool which translates one language
(like LINQ) to another (like SQL).
Now here's the key to this
post. An ORM layer which dictates certain database requirements and
builds your queries for you is COMPLETELY UNNECESSARY in order to
achieve the goal of database vendor independence. The belief that it
accomplishes this disregards the separation of interface from
implementation.
Your database developers can, and should, do
this for you by providing the application code with an API. From an
OOAD point of view, this is simply the separation of public methods
from private data, and has been around FOREVER. How do we accomplish
this? With stored procedures, or routines. If we switch vendors the
only thing we have to do is ensure that the database provides the same
set of routines to the application layer. The code inside those methods
is irrelevant to the API itself. I can build a GetCustomer routine
(stored proc) in SQL. If a decision is made to move to Oracle, all that
matters is that the GetCustomer routine is provided, taking the same
parameters and returning the same results. The fact that internally it
is built using PL/SQL instead of T-SQL is completely hidden from the
application layer.
This does mean that the code INSIDE the
routines may have to be rewritten. I say may because if standards
compliant SQL is used, it doesn't. You can literally do a copy and
paste. But the thing is, you probably want to rewrite that code anyway.
Why? For the reasons I discussed above. The performance of your queries
depends extremely heavily on your logical data model, your physical
database structures, and requires a deep knowledge of same. That's
precisely the problem with automated query generation tools... they
don't. Your performance can be orders of magnitude worse if you don't
tune your queries based on your structures, and vice versa.
Need
a concrete example? What about nHibernate. GUID's are usually needed
for object creation, so each object gets it's own unique GUID which is
then used for lookup. Using GUIDs in a database has the potential for
MASSIVE performance loss. I'm talking orders of magnitude on query
times, as well as vastly (VASTLY) increasing the physical size of the
database on disk. Why? The reasons are complex, which is exactly the
point. Care in how they are used is required to avoid this. But do the
application programmers know the pitfalls? No, they're application
programmers! They don't have to know anything about databases, that's
what nHibernate is for, right?