Distributed SQL Databases

Steve Jones - SSC Editor

SSC Guru

Points: 740346
March 9, 2021 at 12:00 am

#3849294

Comments posted to this topic are about the item Distributed SQL Databases

Sergiy SSC Guru Points: 110209 · Answer 1

edited. See below

This reply was modified 4 years, 10 months ago by Sergiy. Reason: messy spelling and wording

_____________
Code for TallyGenerator

Brent Ozar SSCrazy Points: 2455 · Answer 2

This is one of my favorite topics. I really enjoy listening to Google's devs talk about how they built Cloud Spanner, like this: https://www.youtube.com/watch?v=nvlt0dA7rsQ and how AWS's teams built DynamoDB and Aurora.

I don't find myself wanting to learn how to use them myself - I just find it really fun to see how those teams solve the challenges that I see with trying to do, say, multi-master replication in SQL Server.

David.Poole SSC Guru Points: 76323 · Answer 3

Given Michael Stonebraker's impact on the database world I am curious about VoltDB.

Having used Vertica and AWS Redshift for data warehousing, I have to say that there is always some trade-off in distributed systems. It is simply a case of whether that trade-off gives you things you value more in return.

Vertica does allow you to enforce constraints, but by default they aren't enforced because of the performance hit. We choose to enforce them between reference data tables and tables that aren't subject to regular huge inserts. Ultimately, if you don't enforce constraints they will be violated.

Data quality is a perennial problem. I see data quality as the elephant in the room, and from what I have seen so far, distributed DB systems are feeding the elephant.

Steve Jones - SSC Editor SSC Guru Points: 740346 · Answer 4

I saw a talk on conflict-free replicated data types (CRDTs) at Bits one year. Fascinating, and like Brent, I find the stories interesting, even though I don't really see the need to work with these technologies.

https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type
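
For readers who have not met CRDTs before, here is a minimal sketch of the simplest one, a grow-only counter (G-Counter). It is an illustration only, not something from the talk: the class name `GCounter` and its methods are made up for this example. Each replica increments only its own slot, and merging takes the per-replica maximum, so replicas converge no matter what order they exchange state in.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter CRDT (hypothetical illustration)."""

    replica_id: str
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-slot max is commutative, associative, and idempotent,
        # so merges can arrive in any order, any number of times.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


# Two replicas accept writes independently, then exchange state.
a = GCounter("a")
b = GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
```

After the exchange both replicas report the same total, with no coordination or locking between them at write time.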

Jeffrey Williams SSC Guru Points: 90351 · Answer 5

I find it interesting that we had a fully scalable database platform - and it was killed by HP when they discontinued development on the OS. VMS had the capability of creating a shared-everything cluster where multiple nodes in the cluster all had access to the same storage. And - we had several database products that worked in that environment.

Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

― Charles R. Swindoll

How to post questions to get better answers faster
Managing Transaction Logs

Sergiy SSC Guru Points: 110209 · Answer 6

For a start, MS should not have stuffed SQL Server with XML, JSON, columnstore, blobs, and other big data features which have nothing to do with relational algebra.

Placing big chunks of barely structured data into SQL Server tables makes inserts and updates "huge" (as David Poole mentioned), transactions infinitely long, and locking unbearable. To beat the locking, people resort to replication, which inflates databases even further without actually resolving the locking issue (the replication process holds its own locks as well).

Placing those non-relational things into separate data units would allow distributing databases quite easily. If, instead of updating blobs, they updated pointers to those blobs, then transactions would be small and locks instantaneous.
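
One possible reading of the pointer idea, sketched in Python with an in-memory SQLite database. Everything here is an assumption made for illustration: the `document` table, the `blob_store` dictionary standing in for external storage, and the helper names are hypothetical, and SQLite stands in for SQL Server only to keep the example self-contained. The large payload is written outside the transaction, and the transaction itself only swaps a short key.

```python
import hashlib
import sqlite3
from typing import Dict

# Stand-in for storage that lives outside the relational database
# (a file share, an object store, and so on). Hypothetical.
blob_store: Dict[str, bytes] = {}


def put_blob(data: bytes) -> str:
    """Store the payload under its content hash and return the key."""
    key = hashlib.sha256(data).hexdigest()
    blob_store[key] = data
    return key


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE document ("
    " id INTEGER PRIMARY KEY,"
    " title TEXT NOT NULL,"
    " blob_key TEXT NOT NULL)"
)


def save_document(doc_id: int, title: str, payload: bytes) -> str:
    # The slow, large write happens before the transaction starts.
    key = put_blob(payload)
    # The transaction only touches a small row holding the pointer.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO document (id, title, blob_key)"
            " VALUES (?, ?, ?)",
            (doc_id, title, key),
        )
    return key


def load_document(doc_id: int) -> bytes:
    row = conn.execute(
        "SELECT blob_key FROM document WHERE id = ?", (doc_id,)
    ).fetchone()
    return blob_store[row[0]]


first_key = save_document(1, "report", b"x" * 1000)
second_key = save_document(1, "report", b"y" * 1000)
```

The trade-off in this sketch is that the database no longer guarantees the blob exists or gets cleaned up; the old payload stays in `blob_store` until something outside the transaction removes it.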

For 15 years MS has been driving SQL Server in the wrong direction in terms of scalability. Even if they do a U-turn right now, it will take some time to return to the starting point. And from there they can only play a catch-up game. Which they are trying to play now, anyway.

will 58232 SSC-Addicted Points: 479 · Answer 7

Chris Date - An Introduction to Database Systems - 8th Edition - Chapter 21 Distributed Databases.

Published in 2004, but everything Date has to say about distributed DBMSs is just as relevant today as it was then.

One interesting point that Date makes is that client-server is a distributed database.

Jeff Moden SSC Guru Points: 1004686 · Answer 8

Heh... my observation has been that a lot of people think they need to scale out but, in truth, it's because of bad database design, improper use of ORMs, and a wealth of bad code. I've also seen many times where people do scale out and then wonder why they still have the same issues. It's because they worked on the wrong thing... they should have been working on the code. 😀

"Performance is in the code... or not".

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Sergiy SSC Guru Points: 110209 · Answer 9

No code can fix performance issues caused by faulty database design.

If you think otherwise - try to fix performance (and scalability) issues on msdb.

Jeff Moden SSC Guru Points: 1004686 · Answer 10

Sergiy wrote:

No code can fix performance issues caused by faulty database design.
If you think otherwise - try to fix performance (and scalability) issues on msdb.

I'll be the first to agree that being able to fix performance issues caused by bad database design should never be the reason to not design a database properly. That notwithstanding (because people generally ignore that notion)...

While I agree that faulty database design is a huge PITA and a source of such issues, it's just as frequent that the database design won't be changed because of the people that "don't get it" from the very beginning or afterwards. You and I both know that sometimes we have to work with what we have unfortunately been given. You and I both (as well as a lot of other denizens of this forum) have used code to fix many performance issues.

We both also know that you can have a totally proper database design and still have performance issues because of the way people write code against it.

With that, I have to say again that "Performance is in the code... or not". 😀

Sergiy SSC Guru Points: 110209 · Answer 11

Well, DDL is still code, right?

Then yes, it’s in the code. Of some kind.

🙂
