July 31, 2018 at 8:30 am
So let me try to explain what we have. SQL SERVER PRO 2014.
We have an extremely large table that is continuing to grow and become a bit unmanageable. The way the data model is designed, the data is always retrieved by a specific GUID key, and a query will normally return up to about 80,000 records.
I would like to generate a specific table for each of these GUID keys rather than store all of them in the same table.
So instead of 1 table with all the records, it would be 10,000 tables (and growing), with each one containing the records that would be retrieved for that key.
My original idea was to create a function to which you pass the key; the function would create a temporary table, retrieve the data from the proper table based on the key, and return that.
This would have allowed me to join to just the data I wanted with minimal fuss. So instead of SELECT a.field FROM table a WHERE key = @key, it would be SELECT a.field FROM dbo.udfTable(@key) a.
In the function I thought I would use a VARCHAR variable to build the field list and the table name based on the key.
Unfortunately, functions do not let me use EXECUTE sp_executesql @sql.
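Roughly what I had in mind, just as a sketch (the per-key table naming and the columns are invented for illustration); the CREATE fails because dynamic SQL / INSERT...EXEC is not allowed inside a function:

-- hypothetical sketch of the original idea; this does NOT work
CREATE FUNCTION dbo.udfTable (@Key uniqueidentifier)
RETURNS @result TABLE (field1 int, field2 varchar(50))
AS
BEGIN
    -- build the per-key table name dynamically
    DECLARE @sql nvarchar(max) =
        N'SELECT field1, field2 FROM dbo.'
        + QUOTENAME('Data_' + CONVERT(char(36), @Key)) + N';';

    -- fails: dynamic SQL and INSERT...EXEC are not allowed inside a function
    INSERT INTO @result (field1, field2)
    EXEC sys.sp_executesql @sql;

    RETURN;
END;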
Does anyone know a way I can accomplish this?
Thanks
July 31, 2018 at 8:39 am
Yuck. Sounds like a terrible idea.
What does your indexing on the table in question look like?
July 31, 2018 at 8:40 am
Hi,
I think it will be better to normalize the table if possible. Just keep the GUID and a few other common properties in the parent table and distribute the other details into the child table(s).
July 31, 2018 at 9:22 am
This sounds like a table partitioning use case. Any reason not to use the built-in functionality?
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
July 31, 2018 at 9:52 am
Matt Miller (4) - Tuesday, July 31, 2018 9:22 AM
This sounds like a table partitioning use case. Any reason not to use the built-in functionality?
If justlikebike were using partitioning, they could partition on the CHECKSUM of the GUID mod the number of tables:
SELECT ABS(CHECKSUM(NEWID()) % 10000);
But I think it might be only the enterprise version of SQL Server 2014 that has partitioning.
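A rough sketch of that idea (table, column, and object names are invented, and only 10 buckets are used to keep the example short): hash each GUID into a bucket with a persisted computed column, then partition on the bucket.

-- 10 buckets, purely for illustration
CREATE PARTITION FUNCTION pfGuidBucket (int)
    AS RANGE LEFT FOR VALUES (0, 1, 2, 3, 4, 5, 6, 7, 8);

CREATE PARTITION SCHEME psGuidBucket
    AS PARTITION pfGuidBucket ALL TO ([PRIMARY]);

-- hash the GUID key into a bucket
ALTER TABLE dbo.BigTable
    ADD GuidBucket AS (ABS(CHECKSUM(GuidKey) % 10)) PERSISTED NOT NULL;

-- assuming BigTable is currently a heap; otherwise rebuild the existing
-- clustered index onto the partition scheme with DROP_EXISTING
CREATE CLUSTERED INDEX cix_BigTable
    ON dbo.BigTable (GuidBucket, GuidKey)
    ON psGuidBucket (GuidBucket);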
July 31, 2018 at 9:58 am
The system is live and working. We are attempting to introduce a solution that keeps the one database from growing larger than the hard drive, as well as speed up retrieval. We have indexes, etc.
So,
1) We don't want to bring the current system down just to move to a bigger hard drive, which would not solve the problem of a massive database anyway.
2) We would like to speed up inserts; table locking is currently slowing them down.
3) If we can implement a multiple-table solution, I feel we can dynamically grow the databases as the system grows without affecting speed or bringing the system down.
4) Having a multiple-table solution allows for a quicker, simpler archiving solution.
p.s. commenting with suggestions like "this is a terrible idea" without offering either a solution or a reason is not desired.
July 31, 2018 at 11:22 am
Sorry, I got that wrong, it is SQL 2014 Enterprise.
Also another issue we have run into is the maintenance on these giant tables. A reindex takes an hour and a Reorganize takes 12 hours.
If we had many small tables, this wouldn't be an issue.
July 31, 2018 at 11:32 am
Then table partitioning is almost certainly the answer. Basically it works as a whole bunch of separate tables (called partitions) under a common name, so you can query them as if they were a single table. Data is written to and read from partitions based on a partitioning function, and by creating aligned indexes you gain all the benefits you would get from separate tables, but without all the pain that separate tables introduce.
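For context, the moving parts look roughly like this (all names are invented, and partitioning by month is purely for illustration; the real partition key has to fit how the data is actually queried and archived):

-- hypothetical monthly partitioning
CREATE PARTITION FUNCTION pfByMonth (datetime2)
    AS RANGE RIGHT FOR VALUES ('2018-01-01', '2018-02-01', '2018-03-01');

CREATE PARTITION SCHEME psByMonth
    AS PARTITION pfByMonth ALL TO ([PRIMARY]);

-- the table is created (or its clustered index rebuilt) on the scheme
CREATE TABLE dbo.SampleData
(
    GuidKey   uniqueidentifier NOT NULL,
    CreatedAt datetime2        NOT NULL,
    Payload   varchar(200)     NULL
) ON psByMonth (CreatedAt);

-- an aligned index uses the same scheme, so each partition's index can be
-- maintained or switched out independently of the rest
CREATE NONCLUSTERED INDEX ix_SampleData_GuidKey
    ON dbo.SampleData (GuidKey)
    ON psByMonth (CreatedAt);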
The alternative, if you don't want to redesign the entire data structure, is to pass every single query through dynamic SQL that can redirect requests to the right table. This quickly becomes a horrible experience to code for and tends to be terribly fragile.
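A rough sketch of that redirect approach (object names are invented); a stored procedure, unlike a function, is allowed to run dynamic SQL:

-- hypothetical: route the query to the per-key table via dynamic SQL
CREATE PROCEDURE dbo.GetRecordsForKey
    @Key uniqueidentifier
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @table sysname = N'Data_' + CONVERT(char(36), @Key);
    DECLARE @sql nvarchar(max) =
        N'SELECT field1, field2 FROM dbo.' + QUOTENAME(@table) + N';';

    EXEC sys.sp_executesql @sql;
END;

Every caller then has to consume a procedure result set instead of joining to a table or function, which is a big part of why this approach gets painful and fragile.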
July 31, 2018 at 11:43 am
Hi justlikebike,
Just another thought in case it makes sense. You could have a separate set of databases. The relational database you are currently using would be the OLTP part (all insert/update operations would be done on it). But when it comes to reading this data, you could redirect those requests to another read-only database (which could be relational or NoSQL). You would also need to set up synchronization from the existing read-write database to this read-only one at whatever interval you require. This way, you won't need to touch the existing data structure, but you could redesign the structure for the read-only part with the dynamic nature of the data in mind.
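One concrete way to do the read-only side on SQL Server 2014 Enterprise (just one option; the server and database names below are invented) is an Availability Group with a readable secondary, where read-only traffic is routed with a read-intent connection string:

Server=tcp:MyAgListener,1433;Database=MyDb;ApplicationIntent=ReadOnly;Integrated Security=SSPI;

Alternatively, a scheduled ETL into a separate reporting database, reshaped for the read workload, gives the same separation with a latency equal to the ETL interval.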
August 1, 2018 at 7:47 am
justlikebike - Tuesday, July 31, 2018 9:58 AM
The system is live and working. We are attempting to introduce a solution that keeps the one database from growing larger than the hard drive, as well as speed up retrieval. We have indexes, etc.
So,
1) We don't want to bring the current system down just to move to a bigger hard drive, which would not solve the problem of a massive database anyway.
2) We would like to speed up inserts; table locking is currently slowing them down.
3) If we can implement a multiple-table solution, I feel we can dynamically grow the databases as the system grows without affecting speed or bringing the system down.
4) Having a multiple-table solution allows for a quicker, simpler archiving solution.
p.s. commenting with suggestions like "this is a terrible idea" without offering either a solution or a reason is not desired.
Then maybe you should solve your own problem. The idea of having 10,000 tables and growing is far more "unmanageable" than what you have now. It's quite clear that the current design isn't going to remain manageable over time, so clearly you have to do something, but in a situation such as this, without us having, at minimum, the exact same knowledge of how it's used and what it needs to accomplish that you do, we don't stand much of a chance of doing more than guessing at the best solution. There may not actually BE any particularly good solution. Even partitioning quickly becomes unmanageable over time if you keep those partitions at very small sizes.

As to the disk space problem, I'm just not sympathetic. Disk space, in relative terms, is cheap. If you can't take any down time, then blame whoever set up that nightmare to begin with instead of getting upset that folks here point out the fact that you have a "problem". You may not really have a choice on down time. Setting up a system and expecting 100% 24-hour availability has NEVER been realistic for anyone, ever. You are going to need maintenance windows on some kind of regular basis, no matter how good you perceive yourselves to be. Maybe it's only once a quarter or perhaps even just annually, as there are almost always power shutdowns to worry about.

Don't assume that other folks here don't think about these things. If you don't want the truth, then don't post here. We can only help you to the extent you provide information for us to use to try and assist you. Ten thousand and growing as a number of tables, or even a number of partitions, just doesn't sound like it's manageable in any practical way over the long term.

Seems to me more likely that you are long past the point where you need to upgrade your hardware, because if you are running out of disk space because someone "cheaped out" on that expenditure, it wouldn't be too surprising to find out that your CPU and RAM specs on that server might be in a similar state - underfunded and possibly "ignored for too long already". Maybe we should be disappointed, given that you are so smart that you can afford to criticize others who are trying to help you, but are not smart enough to have avoided this particular nightmare in the first place...
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
August 1, 2018 at 8:11 am
A Partitioned View may work in this situation.
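A minimal sketch of a partitioned view (names are invented, and splitting by year is just for illustration); each member table carries a CHECK constraint on the partitioning column, so queries that filter on that column only touch the relevant member table:

CREATE TABLE dbo.Orders_2017
(
    OrderYear int NOT NULL CHECK (OrderYear = 2017),
    GuidKey   uniqueidentifier NOT NULL,
    Payload   varchar(200) NULL,
    CONSTRAINT pk_Orders_2017 PRIMARY KEY (OrderYear, GuidKey)
);

CREATE TABLE dbo.Orders_2018
(
    OrderYear int NOT NULL CHECK (OrderYear = 2018),
    GuidKey   uniqueidentifier NOT NULL,
    Payload   varchar(200) NULL,
    CONSTRAINT pk_Orders_2018 PRIMARY KEY (OrderYear, GuidKey)
);
GO
CREATE VIEW dbo.Orders
AS
SELECT OrderYear, GuidKey, Payload FROM dbo.Orders_2017
UNION ALL
SELECT OrderYear, GuidKey, Payload FROM dbo.Orders_2018;
GO
-- a query that filters on the partitioning column is resolved against one member table
-- SELECT Payload FROM dbo.Orders WHERE OrderYear = 2018 AND GuidKey = @Key;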
August 1, 2018 at 8:42 am
How many rows and what table size are we talking about here?
I have a table with 2.4 billion rows and although it isn't my favourite table, I don't feel the need to carve it up in such a manner.
Describe the scale of the issue, so we have some idea of your problem.
The table schema could be informative, too.
August 1, 2018 at 8:47 am
tripleAxe - Wednesday, August 1, 2018 8:11 AM
A Partitioned View may work in this situation.
Table partitioning may be a solution, but a limited one. SQL Server (2012 and later) supports up to 15,000 partitions per table. Done properly, the partitions could be spread out across multiple filegroups on multiple disks to help manage disk space. The problem I see with this is that you will eventually hit a tipping point regarding the number of partitions. Also, partitioning isn't a performance tool, it is a tool for managing data. If you already need 10,000 tables or partitions, you have another problem: just how much data do you really need to keep in the system? You should also look at archiving data to reduce the amount of data in the table.
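For the disk-space angle, a rough sketch (database, filegroup, file, and path names are all invented) of spreading partitions across filegroups that live on different drives:

ALTER DATABASE MyDb ADD FILEGROUP FG_Old;
ALTER DATABASE MyDb ADD FILEGROUP FG_Current;
ALTER DATABASE MyDb ADD FILE
    (NAME = 'MyDb_Old', FILENAME = 'E:\Data\MyDb_Old.ndf') TO FILEGROUP FG_Old;
ALTER DATABASE MyDb ADD FILE
    (NAME = 'MyDb_Current', FILENAME = 'F:\Data\MyDb_Current.ndf') TO FILEGROUP FG_Current;

-- two partitions, just for illustration
CREATE PARTITION FUNCTION pfByYear (datetime2)
    AS RANGE RIGHT FOR VALUES ('2018-01-01');

-- older data on one drive, current data on another
CREATE PARTITION SCHEME psByYear
    AS PARTITION pfByYear TO (FG_Old, FG_Current);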
You may not like this suggestion, but here it is: hire a consultant to come in and help you with this problem. This isn't something you are going to resolve by asking for help from the internet. It simply comes down to the fact that we can't see what you see, and we don't know how your application works or uses the data.
August 1, 2018 at 9:17 am
This sounds like you are searching for a single answer to many different questions.
The time it takes maintenance to run can be addressed. How are you re-indexing? When is the reindex triggered? Is it done when needed, or blindly done on a schedule? What causes the fragmentation? Have steps been taken to prevent fragmentation, thereby removing the need to reindex? Finally, maybe stop reindexing. It may not make any sense.
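As a rough example of maintenance done only when needed (thresholds and names are just placeholders), check fragmentation first and only rebuild or reorganize what actually warrants it:

-- find indexes that are actually fragmented, instead of blindly hitting everything
SELECT  OBJECT_NAME(ips.object_id) AS table_name,
        i.name                     AS index_name,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN    sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id  = ips.index_id
WHERE   ips.avg_fragmentation_in_percent > 30   -- example rebuild threshold
  AND   ips.page_count > 1000;                  -- ignore tiny indexes

-- then, for just those indexes (name is a placeholder), e.g.
-- ALTER INDEX ix_SampleData_GuidKey ON dbo.SampleData REBUILD WITH (ONLINE = ON);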
If there are locking and blocking issues on this large table, then the answer is likely not going to be partitioning, breaking it apart into thousands of smaller tables, or any number of other ideas. This is a separate issue, and should be addressed as such.
Slow retrieval of data is also a separate issue. What specifically is the issue, and what has been done to alleviate that? Have you attempted to tune the queries? Are your indexes correct? Do statistics need to be updated more frequently? Some possible thoughts, suggestions, and additional questions include:
Pre-aggregation of data. Has this method been considered?
Do all of the selects need to be on real time data? Can you offload some of the workload to different tables that are populated by an ETL?
Can the application be re-written to reduce the number of calls to the DB and the amount of data being retrieved from it? Has caching been explored?
Here's an example. In one of our systems, there is a proc that was being called hundreds of times a minute. It returns a single code. For each session (user/client), the value does not change. It actually only needed to be retrieved once. The original developers were using the call to the procedure like a global variable. The proc ran in a few milliseconds; it used the clustered key and read one row to get the value. But the fact that this was called millions of times an hour caused stress on the system. Once the devs changed the code to stop doing this, the DB servers were a lot happier. The original thought was to add more RAM and CPU to fix the performance issues. The moral here is that unless you have completed some in-depth analysis of the causes, and break these causes apart, your initial idea is going to move the problems from one place to another.
You asked a simple question: what do we think about creating thousands of separate tables to fix this? The response is going to be equally simple. We think it's a bad idea. Without a more in-depth analysis, getting any ideas that may solve your issues is likely not possible.
Michael L John
If you assassinate a DBA, would you pull a trigger?
To properly post on a forum:
http://www.sqlservercentral.com/articles/61537/
August 1, 2018 at 9:30 am
Thank you for your responses.
I will run some testing with partitioning as well as the checksum idea, I like that.
I understand the issues with responding to simple questions; I was attempting to keep the questions specific, but I obviously didn't do a good job. My apologies if I insulted anyone, that was not my intent.