Distributed SQL Databases

  • Comments posted to this topic are about the item Distributed SQL Databases

  • edited. See below

    • This reply was modified 3 years, 9 months ago by  Sergiy. Reason: messy spelling and wording

    _____________
    Code for TallyGenerator

  • This is one of my favorite topics. I really enjoy listening to Google's devs talk about how they built Cloud Spanner, like this: https://www.youtube.com/watch?v=nvlt0dA7rsQ and how AWS's teams built DynamoDB and Aurora.

    I don't find myself wanting to learn how to use them myself - I just find it really fun to see how those teams solve the challenges that I see with trying to do, say, multi-master replication in SQL Server.

  • Given Michael Stonebraker's impact on the database world, I am curious about VoltDB.

    Having used Vertica and AWS Redshift for data warehousing, I have to say that there is always some trade-off in distributed systems. It is simply a question of whether that trade-off gives you things you value more in return.

    Vertica does allow you to enforce constraints, but by default they aren't enforced because of the performance hit. We chose to enforce them between reference data tables and tables that aren't subject to regular huge inserts. Ultimately, if you don't enforce constraints, they will be violated.

    Data quality is a perennial problem. I see data quality as the elephant in the room and, from what I have seen so far, distributed DB systems as feeding the elephant.

  • I saw a talk on conflict-free replicated data types at Bits one year. Fascinating, and like Brent, I find the stories interesting, even though I don't really see the need to work with these technologies.

    https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type
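
For anyone who hasn't met CRDTs before, here is a minimal sketch of one of the simplest, a grow-only counter (usually called a G-Counter). It is only an illustration of the idea described on the Wikipedia page above, not something from the talk; the class name, the replica names, and the in-process "replication" are all assumptions made for the example. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge regardless of the order or number of merges.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot max."""

    replica_id: str  # hypothetical replica name, chosen for this example
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever changes its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum of all replica slots.
        return sum(self.counts.values())

    def merge(self, other: GCounter) -> None:
        # Per-slot max is commutative, associative and idempotent,
        # so merges can arrive in any order, any number of times.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas accept writes independently, then exchange state.
a = GCounter("node_a")
b = GCounter("node_b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
b.merge(a)  # a repeated merge changes nothing
```

After the exchange both replicas report the same total without any locking or coordination. The price is that the data type itself is restricted: this one can only count upward.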

  • I find it interesting that we had a fully scalable database platform - and it was killed by HP when they discontinued development on the OS.  VMS had the capability of creating a shared-everything cluster where multiple nodes in the cluster all had access to the same storage.  And - we had several database products that worked in that environment.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • For a start, MS should not have stuffed SQL Server with XML, JSON, columnstore, blobs, and other big data features which have nothing to do with relational algebra.

    Placing big chunks of barely structured data into SQL Server tables makes inserts and updates "huge" (as David Poole mentioned), transactions infinitely long, and locking unbearable. To beat the locking, people resort to replication, which inflates databases even more without actually resolving the locking issue (the replication process holds its own locks as well).

    Placing those non-relational things into separate data units would allow distributing databases quite easily. If, instead of updating blobs, they updated pointers to those blobs, then transactions would be small and locks instantaneous.
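
A rough sketch of the pointer idea, assuming a content-addressed blob table and using SQLite from Python purely as a stand-in (it says nothing about how SQL Server would lock or log this). The table names `blob_store` and `document` and the helpers `put_blob` and `repoint` are hypothetical, invented for the illustration. A new version of the payload is written as a new row, and the "update" to the relational row is just a change of a short key.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    -- Hypothetical schema: payloads live apart from the relational row.
    CREATE TABLE blob_store (
        blob_id TEXT PRIMARY KEY,
        payload BLOB NOT NULL
    );
    CREATE TABLE document (
        doc_id  INTEGER PRIMARY KEY,
        blob_id TEXT NOT NULL REFERENCES blob_store (blob_id)
    );
    """
)


def put_blob(payload: bytes) -> str:
    """Store a payload under its SHA-256 digest; existing blobs are never rewritten."""
    blob_id = hashlib.sha256(payload).hexdigest()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO blob_store (blob_id, payload) VALUES (?, ?)",
            (blob_id, payload),
        )
    return blob_id


def repoint(doc_id: int, blob_id: str) -> None:
    """The only change to the relational row is swapping one short key."""
    with conn:
        conn.execute(
            "UPDATE document SET blob_id = ? WHERE doc_id = ?", (blob_id, doc_id)
        )


v1 = put_blob(b"first version of a large payload")
with conn:
    conn.execute("INSERT INTO document (doc_id, blob_id) VALUES (1, ?)", (v1,))

v2 = put_blob(b"second version of a large payload")
repoint(1, v2)

current = conn.execute(
    "SELECT blob_id FROM document WHERE doc_id = 1"
).fetchone()[0]
```

In this sketch the old payload stays in `blob_store` after the repoint, so cleaning up unreferenced blobs would be a separate job outside the update transaction.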

    For 15 years MS has been driving SQL Server in the wrong direction in terms of scalability. Even if they do a U-turn right now, it will take some time to return to the starting point. And from there they can only play a catch-up game. Which they are trying to play now, anyway.

    _____________
    Code for TallyGenerator

  • Chris Date - An Introduction to Database Systems - 8th Edition  - Chapter 21 Distributed Databases.

    Published in 2004, but everything Date has to say about distributed DBMSs is just as relevant today as it was then.

    One interesting point that Date makes is that client-server is a distributed database.

  • Heh... my observation has been that a lot of people think they need to scale out but, in truth, it's because of bad database design, improper use of ORMs, and a wealth of bad code.  I've also seen many times where people do scale out and then wonder why they still have the same issues.  It's because they worked on the wrong thing... they should have been working on the code. 😀

    "Performance is in the code... or not".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • No code can fix performance issues caused by faulty database design.

    If you think otherwise - try to fix performance (and scalability) issues on msdb.

    _____________
    Code for TallyGenerator

  • Sergiy wrote:

    No code can fix performance issues caused by faulty database design.

    If you think otherwise - try to fix performance (and scalability) issues on msdb.

    I'll be the first to agree that being able to fix performance issues caused by bad database design should never be the reason to not design a database properly.  That, notwithstanding (because people generally ignore that notion)...

    While I agree that faulty database design is a huge PITA and source of such issues, it's just as frequent that the database design won't be changed because of the people that "don't get it" from the very beginning or afterwards. You and I both know that sometimes we have to work with what we have unfortunately been given to deal with. You and I both (as well as a lot of other denizens of this forum) have used code to fix many performance issues.

    We both also know that you can have a totally proper database design and still have performance issues because of the way people write code against it.

    With that, I have to say again that "Performance is in the code... or not". 😀

    --Jeff Moden



  • Well, DDL is still code, right?

    Then yes, it’s in the code. Of some kind.

    🙂

    _____________
    Code for TallyGenerator
