March 9, 2021 at 12:00 am
Comments posted to this topic are about the item Distributed SQL Databases
March 9, 2021 at 3:59 am
edited. See below
_____________
Code for TallyGenerator
March 9, 2021 at 5:56 am
This is one of my favorite topics. I really enjoy listening to Google's devs talk about how they built Cloud Spanner, like this: https://www.youtube.com/watch?v=nvlt0dA7rsQ and how AWS's teams built DynamoDB and Aurora.
I don't find myself wanting to learn how to use them myself - I just find it really fun to see how those teams solve the challenges that I see with trying to do, say, multi-master replication in SQL Server.
March 9, 2021 at 8:35 am
Given Michael Stonebraker's impact on the database world I am curious about VoltDB.
Having used Vertica and AWS RedShift for data warehousing, I have to say that there is always some trade-off in distributed systems. It is simply a case of whether that trade-off gives you things you value more in return.
Vertica does allow you to enforce constraints, but by default they aren't enforced because of the performance hit. We choose to enforce them between reference data tables and tables that aren't subject to regular huge inserts. Ultimately, if you don't enforce constraints they will be violated.
Data quality is a perennial problem. I see Data Quality as the elephant in the room and, from what I have seen so far, distributed DB systems as feeding the elephant.
March 9, 2021 at 4:04 pm
I saw a talk on conflict-free replicated data types at Bits one year. Fascinating, and like Brent, I find the stories interesting, even though I don't really see the need to work with these technologies.
https://en.wikipedia.org/wiki/Conflict-free_replicated_data_type
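For anyone who hasn't met conflict-free replicated data types before, here is a minimal sketch of about the simplest one, a grow-only counter (G-Counter). It is only an illustration, not anything from the talk: the class name, replica names, and method names are made up for this example. Each replica increments its own slot, and merging takes the per-replica maximum, so replicas can exchange state in any order and still converge.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class GCounter:
    """Grow-only counter: one slot per replica, merged by per-slot max."""
    replica_id: str  # hypothetical replica name, e.g. "node_a"
    counts: Dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own slot.
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self) -> int:
        # The counter's value is the sum over all replica slots.
        return sum(self.counts.values())

    def merge(self, other: "GCounter") -> None:
        # Per-slot max is commutative, associative, and idempotent,
        # which is why merge order and repeated merges do not matter.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)


a = GCounter("node_a")
b = GCounter("node_b")
a.increment(3)   # update applied on node_a only
b.increment(2)   # concurrent update applied on node_b only
a.merge(b)       # replicas exchange state in either order
b.merge(a)
a.merge(b)       # merging again changes nothing
```

After the merges both replicas hold the same per-replica counts and report the same total, without any locking or coordination between them.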
March 9, 2021 at 6:06 pm
I find it interesting that we had a fully scalable database platform - and it was killed by HP when they discontinued development on the OS. VMS had the capability of creating a shared-everything cluster where multiple nodes in the cluster all had access to the same storage. And - we had several database products that worked in that environment.
Jeffrey Williams
“We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”
― Charles R. Swindoll
March 12, 2021 at 3:34 am
To begin with, MS should not have stuffed SQL Server with XML, JSON, columnstore, blobs, and other big data features which have nothing to do with relational algebra.
Placing big chunks of barely structured data into SQL Server tables makes inserts and updates "huge" (as David Poole mentioned), transactions infinitely long, and locking unbearable. To beat the locking, people resort to replication, which inflates databases even more without actually resolving the locking issue (the replication process holds its own locks as well).
Placing those non-relational things into separate data units would allow distributing databases quite easily. If, instead of updating blobs, they updated pointers to those blobs, then transactions would be small and locks instantaneous.
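A rough sketch of the pointer idea, under loud assumptions: it uses Python's built-in sqlite3 module as a stand-in (not SQL Server), the table and function names (blob_store, document, store_blob, repoint) are hypothetical, and both tables live in one in-memory database only to keep the example self-contained. It shows the shape of the approach, not its locking behaviour on a real server: a new blob version is written as its own immutable row, and the relational row changes only by a one-integer pointer update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for a separate "data unit" that holds the large payloads.
    CREATE TABLE blob_store (
        blob_id INTEGER PRIMARY KEY,
        payload BLOB NOT NULL
    );
    -- The relational row keeps only a pointer to the current payload.
    CREATE TABLE document (
        doc_id  INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        blob_id INTEGER NOT NULL REFERENCES blob_store(blob_id)
    );
""")


def store_blob(payload: bytes) -> int:
    """Write a new immutable blob version and return its id."""
    with conn:  # commits on success
        cur = conn.execute(
            "INSERT INTO blob_store (payload) VALUES (?)", (payload,)
        )
    return cur.lastrowid


def repoint(doc_id: int, new_blob_id: int) -> None:
    """Swap the pointer: the only write against the relational row."""
    with conn:
        conn.execute(
            "UPDATE document SET blob_id = ? WHERE doc_id = ?",
            (new_blob_id, doc_id),
        )


first = store_blob(b"x" * 1_000_000)
with conn:
    conn.execute(
        "INSERT INTO document (doc_id, title, blob_id) VALUES (1, 'spec', ?)",
        (first,),
    )

# "Updating" the document: the large write goes to blob_store,
# then the document row is changed by a single small UPDATE.
second = store_blob(b"y" * 1_000_000)
repoint(1, second)

current = conn.execute(
    "SELECT b.payload FROM document d "
    "JOIN blob_store b ON b.blob_id = d.blob_id WHERE d.doc_id = 1"
).fetchone()[0]
```

The old blob version is left in place here; cleaning up unreferenced versions would be a separate background job in this kind of design.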
For 15 years MS has been driving SQL Server in the wrong direction in terms of scalability. Even if they do a U-turn right now, it will take some time to return to the starting point. And from there they can only play a catch-up game. Which they are trying to play now, anyway.
March 13, 2021 at 7:32 pm
Chris Date - An Introduction to Database Systems - 8th Edition - Chapter 21 Distributed Databases.
Published in 2004, but everything Date has to say about distributed DBMSs is just as relevant today as it was then.
One interesting point that Date makes is that client-server is a distributed database.
March 13, 2021 at 10:35 pm
Heh... my observation has been that a lot of people think they need to scale out but, in truth, it's because of bad database design, improper use of ORMs, and a wealth of bad code. I've also seen many times where people do scale out and then wonder why they still have the same issues. It's because they worked on the wrong thing... they should have been working on the code. 😀
"Performance is in the code... or not".
--Jeff Moden
Change is inevitable... Change for the better is not.
March 14, 2021 at 1:28 am
No code can fix performance issues caused by faulty database design.
If you think otherwise - try to fix performance (and scalability) issues on msdb.
March 15, 2021 at 2:44 pm
No code can fix performance issues caused by faulty database design.
If you think otherwise - try to fix performance (and scalability) issues on msdb.
I'll be the first to agree that being able to fix performance issues caused by bad database design should never be the reason to not design a database properly. That, notwithstanding (because people generally ignore that notion)...
While I agree that faulty database design is a huge PITA and a source of such issues, it's just as frequent that the database design won't be changed because of the people that "don't get it" from the very beginning or afterwards. You and I both know that sometimes you have to work with what you have unfortunately been given. You and I both (as well as a lot of other denizens of this forum) have used code to fix many performance issues.
We both also know that you can have a totally proper database design and still have performance issues because of the way people write code against it.
With that, I have to say again that "Performance is in the code... or not". 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
March 15, 2021 at 8:25 pm
Well, DDL is still code, right?
Then yes, it’s in code. Of some kind.
🙂