November 26, 2009 at 9:28 am
In an earlier thread ("SQL Futures") it was suggested that more explanation is needed of the advantages of the Relational Model alternatives to SQL. Here are some examples of the benefits I would expect. I am making an assumption that any future RDBMS would at least meet the requirements of a “D”-compliant system - i.e. it would have features including relation variables, relational closure and so forth.
1. More cost effective OLAP. Many decision support systems make use of non-SQL data stores for their data even where that data is sourced from a SQL-based Data Warehouse, i.e. they use a HOLAP / MOLAP architecture. A principal reason for this is that SQL doesn’t provide the multi-dimensional data structures needed for business analytics. SQL returns its results as essentially “flat” two-dimensional tables - a 2D array or collection being the only data structures supported by SQL query interfaces.
The need to support multi-dimensional data stores separately from SQL is an expensive burden. For Oracle’s OLAP option for example the licence cost is around $20K per processor. Quite apart from the software cost there is the expense of maintaining and managing data and storage in two places and keeping them consistent.
The relational model is a n-dimensional data model. Relational queries preserve keys in their results, in other words they preserve the dimensional structure of the data which SQL does not. Therefore it is possible to use an RDBMS as a multi-dimensional data store to support analytical queries without a separate data store. Given that OLAP is a multi-billion dollar market I think the savings here could be very considerable indeed.
2. Better query optimization and storage strategies. SQL and RDBMS are both logical representations of data but relational databases have certain potential advantages when it comes to optimization. By freeing the database engine of the burden of supporting duplicates in tables and queries we would get the benefit of much smarter query optimisation and processing. The final benefit is of course hard to quantify but there is undoubtedly a large, well-documented body of research into relational database optimization which the industry is currently unable to take advantage of in SQL DBMSs. This ought to represent a very significant gain for RDBMS over SQL DBMS.
3. Developer productivity. SQL is a seriously deficient language for the 21st century when compared to other languages in the object-oriented world (C++, Java or C#). SQL’s 1980’s style type support, lack of type-inheritence and lack of relation types, relation variables or relation assignment are serious omissions that certainly cost development time and effort on practically every project. Incomplete support for set-based queries and the consequent need to rely on row-by-row processing are another feature of SQL. These defects are why abstraction layers that hide SQL complexity and limitations are so popular today.
Take a few examples. The need to eliminate duplicates from queries or from tables without keys are very common requests in forums that deal with SQL problems. Assignment and comparison of tables or sets of rows are two other very frequent SQL problems. Since SQL doesn’t have any straightforward syntax for table assignment or comparison the code has to be written again and again for each new project. RDBMS doesn’t suffer from any of these problems.
Inability to support anything other than a few basic types also causes big challenges for developers who are forced to write or duplicate code to emulate native or user-defined types in other languages.
As a very conservative estimate I think it’s not unreasonable to assume a full-time developer writing complex SQL might save 1-2 days per month by using a more full-featured relational language instead. In other words it could be a 5-10% saving on development costs.
In the above I've tried to concentrate on the benefits which I think are easily demonstrable and relatively easy to estimate. There are many other advantages too but these are pretty well covered in the work of Codd, Chris Date and others so I don't think there's much need to repeat them here.
December 6, 2009 at 2:27 pm
David Portas (11/26/2009)
Relational queries preserve keys in their results, ...
I've been trying figure this out for a week now, and I just can't see how this is true. How can a relational query "preserve keys" across relational operations like PROJECT and JOIN? It just doesn't work as far as I can tell, even SELECT is questionable. Granted, there may be some subtlety that I am missing that allows it for JOIN (if so please let me know). But PROJECT seem plainly impossible, given that one or more of the key columns could be removed and that you cannot validly derive keys merely from data inferencing.
:crazy:
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 6, 2009 at 3:02 pm
RBarryYoung (12/6/2009)How can a relational query "preserve keys" across relational operations like PROJECT and JOIN?
Some examples:
P = PROJECT(a,b) R
If a is a key of R then a is a key of P
If b is a key of R then b is a key of P
Otherwise (a,b) is a key of P
Q = S JOIN T
The union of the key attributes of S and T is a superkey of Q (possibly a candidate key).
Key inference rules exist for all the relational operations. It's just a natural consquence of the closure property, ie: every result is a relation and every relation has at least one key.
December 6, 2009 at 3:41 pm
David Portas (12/6/2009)
RBarryYoung (12/6/2009)How can a relational query "preserve keys" across relational operations like PROJECT and JOIN?
Some examples:
P = PROJECT(a,b) R
If a is a key of R then a is a key of P
If b is a key of R then b is a key of P
Otherwise (a,b) is a key of P
...
But this is a trivial example that dodges the concern that I raised. Specifically, what about:
P = PROJECT(a,c,d) of R(a,b: c,d) ?
Removing one of the columns of a multi-column candidate key leaves you with an ambiguous situation wherein it is impossible to "preserve keys" in any meaningful way. Sure you could declare the whole columnset (a,c,d) as a superkey, but that's not a relational key and ultimately meaningless since you can do that with ANY relation, relvar, relational rowset, or indeed with any SQL rowset so long as you use DISTINCT with your final SELECT. There's nothing actually being preserved here.
[font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
Proactive Performance Solutions, Inc. [/font][font="Verdana"] "Performance is our middle name."[/font]
December 6, 2009 at 4:20 pm
All keys are superkeys. Identifying a candidate key of a relvar is purely a matter of interpretation because it depends on what the intended meaning of the relvar is - except in the special case where the candidate key is the empty set, which is obviously irreducible. Except in the case of the empty set, key constraints in a database are effectively superkey constraints (this incidentally being one criticism of SQL's daft "PRIMARY KEY" constraint - it's really a misleadingly named superkey constraint!).
Of course I agree that you can't necessarily derive candidate keys. Candidate keys must still exist though because every superkey implicitly contains a candidate key. So I don't see what you mean by "not a relational key".
In SQL you can only identify a superkey when there are no nullable columns and when DISTINCT is used. SQL query processors can't and don't assume that. They just don't bother to derive keys at all. Even if they did then the best they could do in many cases would be to assume the whole set of columns is a superkey. The advantage of the relational model is that you can identify smaller keys and more of them, which is much more useful. In decision support queries you normally want to distinguish the dimension attributes from the measures for example. You don't want measure columns to be considered as part of a key.
December 6, 2009 at 9:42 pm
All of this is well and interesting, unfortunately I have to work with what we have and make it perform as best as I can. I don't have the time or energy to engage in an academic discourse on relational theory, which I did learn while in college. It is a interesting topic, I must say, but until the industry can truely build a RDBMS based completely on relational theory, I will continue to work with SQL and the various databases that exist. At this moment, that happens to MS SQL Server, and it currently meets the needs of the organization for which I work.
Perhaps the academic community should strive to work more closely with the leaders in our industry to work at developing such an RDBMS as you envision. When that occurs, I'll be more than happy to learn some new and work with it.
Until then, I really don't see what I can do to change things. I am not in a real position to influence what Microsoft, Oracle, IBM, or any of the other SQL DBMS developers are doing. Yes, you could argue that my one voice added to many others could have an impact, but I need to work in the world of today, not the dreams of tomorrow and what it may provide.
December 6, 2009 at 10:23 pm
Please don't bring the nightmare of inheritence anywhere near an RDBMS. 😉 And I don't care if SQL isn't up to speed when compared to GUI languages... they're simply not built for the same thing and I tire of the people that claim they are.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 6, 2009 at 10:30 pm
SQL Server and all relational databases aren't much more than glorified file handlers. If you accept them as that, then you'll see the falacy of trying to do things like inheritence and the wad of other stuff people think it should do. SQL is not a full fledged language nor should it be. It's a simple file handling language. Tables are really nothing more than files with lines (rows) and fields (columns). It's a very simple thing... why do people keep trying to make it more complex?
And the automatic elimination of duplicates from queries won't fix anything... the engine will still have to remove them if someone writes stupid code that generates the duplicates either due to bad table design or, like I said, stupid code. Learn how to use the language and stop looking for it to make up for a lack of knowledge.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2009 at 12:29 am
Lynn Pettis (12/6/2009)
Until then, I really don't see what I can do to change things. I am not in a real position to influence what Microsoft, Oracle, IBM, or any of the other SQL DBMS developers are doing.
Lynn,
All Microsoft, Oracle and IBM's customers are in a position to influence the features and future direction of their products. Those companies spend a lot of time listening to what customers want. All of them are constantly at work on the next version of their products. So discussing future features is not dreaming - it's something that is happening or can happen now.
Which brings me back to the main reason why I began this thread. Which group of customers is having the most influence on the future direction of database systems right now? Answer: application developers. Just look at the number of new DBMS models that have sprung up in the last 2 years or so. Google, Amazon and the NoSQL open source movement are responding to what developers need and they are changing the face of the DBMS market. The big three are responding to this and it's a trend that will influence the future direction of their products for better or worse.
This is a very hot topic right now for application developers, IT executives and strategists. However, as I've said before, what I've noticed is that SQL database professionals are almost silent on the topic. They don't seem to be part of the conversation right now and I don't think that's a healthy position to be in. As a community database professionals need to recognise the limitations and criticisms of SQL and propose a way forward. Because if they don't then someone else certainly will.
December 7, 2009 at 5:10 am
David,
I continually read to keep up on the changes that occur in technology. Unfortunately I don't see what you see. Most of the NoSQL DBMS's at this time appear to be niche systems answering very specific needs, not the needs in general when it comes to mainstream computing.
Also, until general LOB commercial applications begin using these DBMS's successfully, I don't see many business moving towards them either. Until that happens I will continue to support my organization the best way I can, which is making the best use of the tools I have at hand.
December 7, 2009 at 6:20 am
Which group of customers is having the most influence on the future direction of database systems right now?
Heh... apparently, the wrong ones... look at what they did to Office 2007 and look at all the good features they removed between SQL Server 2000 and 2005.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 7, 2009 at 7:17 am
Jeff Moden (12/6/2009)
SQL Server and all relational databases aren't much more than glorified file handlers.
And OO languages are just syntactic sugar! After all, everything can be done with machine code! 😉
Tables are really nothing more than files with lines (rows) and fields (columns)
Tables are nothing of the kind! A file is a one dimensional stream of bytes. Rows and columns are just two dimensions. Relational tables on the other hand may have any number of dimensions. Data values in a relation cannot be accessed by row or column indices like a spreadsheet. I think that's a pretty important and fundamental distinction for most people.
December 7, 2009 at 9:56 am
Actually, with a bit of fore thought, it's quite easy to build a table that can be referenced as if it were a multidemensional array. It's not a common practice and it's generally frowned on when you do, but it's easy to do.
--Jeff Moden
Change is inevitable... Change for the better is not.
December 12, 2009 at 10:39 am
David,
You say everything can be done with machine code.
But does machine code understand the concept of tables?
Or do you insert the concept of tables into your machine code?
And you say:
Data values in a relation cannot be accessed by row or column indices like a spreadsheet. I think that's a pretty important and fundamental distinction for most people.
Haah: so you're an Excel freak!
I personally use SQL queries together with Excel in end-user applications. The level of detail required by the end user leaves me no choice. Thus I'm sure you're familiar with VBA.
On the other hand, you have these Excel freaks who just want a plain flat file extraction and they take care of the rest with pivot tables and filters and all that Excel stuff.
April 4, 2010 at 8:57 am
David Portas (11/26/2009)
3. Developer productivity. SQL is a seriously deficient language for the 21st century when compared to other languages in the object-oriented world (C++, Java or C#). SQL’s 1980’s style type support, lack of type-inheritence and lack of relation types, relation variables or relation assignment are serious omissions that certainly cost development time and effort on practically every project. Incomplete support for set-based queries and the consequent need to rely on row-by-row processing are another feature of SQL. These defects are why abstraction layers that hide SQL complexity and limitations are so popular today.
This is pure insanity. C++ is probably the most abysmal pig's breakfast of a programming language in serious use today, and comparing SQL (whatever dialect) to C++ is like comparing the crown jewels to horse dung. Maybe if we introduced pointers and pointer arithmetic and coercions between arithmetic types and pointers into SQL you would think we had improved the language? Pure insanity probably wasn't strong enough!
None of C++, Java, and C# makes a decent stab at polymorphism or at inheritance, none of them has classes as first class objects in the language which would be the OO equivalent of SQL having relation variables, so you are barking up the wrong tree there. And C++ doesn't even provide type abstraction ("friend", anyone?).
Take a few examples. The need to eliminate duplicates from queries or from tables without keys are very common requests in forums that deal with SQL problems.
The need to elimate duplicates from queries is exactly the same in relational calculus as it is in SQL - or have you decided (unilaterally - even Date wouldn't agree with you) that we can make do with a subset of relational algebra that doesn't include projection?
Assignment and comparison of tables or sets of rows are two other very frequent SQL problems. Since SQL doesn’t have any straightforward syntax for table assignment or comparison the code has to be written again and again for each new project. RDBMS doesn’t suffer from any of these problems.
SQL has extremely straightforward syntax for adding a rowset to a table, so I don't understand your problem with either table assignment or rowset assignment. It's a pity in fact that rowset and table are considered different and that this is reflected in the syntax, but that's a problem that most relational calculi suffer from too. And yes, it would be nice to have the symmetric difference operator on row sets (or tables, doesn't matter which) in SQL - it would save me a whole line of code (yes, that's one single short line of code) every time I wanted to use it (actually the asymmetric difference would be more useful, but that would mean I had to use it twice to do comparison for equality, so the code saving in the single uncommon instance you have picked would be less).
Inability to support anything other than a few basic types also causes big challenges for developers who are forced to write or duplicate code to emulate native or user-defined types in other languages.
Try supporting Haskell values or even Hope+ values in C++. How do I represent a continuation or a monad or for that matter a table with 1,000,000 rows each about 10,000 bytes long in C++ or in Java or in C# without going down to incredibly detailed low level implementation that I wouldn't have to consider in some other languages (in SQL for the last object named)?
As a very conservative estimate I think it’s not unreasonable to assume a full-time developer writing complex SQL might save 1-2 days per month by using a more full-featured relational language instead. In other words it could be a 5-10% saving on development costs.
Any evidence for this, or is your "conservative estimate" just a number plucked from the top of your head?
If I seem to be confrontational here, that's intentional - David Portas' confrontational and unsubstantiated claim about defects in SQL compared to languages like C++ which contain far more defects deserves a confrontational response.
Tom
Viewing 15 posts - 1 through 15 (of 15 total)
You must be logged in to reply to this topic. Login to reply