February 4, 2020 at 10:01 am
We have a table (tblASAP_Detail_v3) which we fill from different sources and which was derived from an Excel file with around 100 columns. This table is then used in a view that feeds an Excel pivot.
The table has grown and grown and now needs around 3.7 GB for 2.2 million records. (It will grow by about 1 million records every year.) In order to "shrink" it, I want to get rid of columns and replace them with values from joined tables and views. That is a good approach, right? Less space used for the table tblASAP_Detail_v3.
Right now I have 6 joins and would add more, but I read this article which made me suspicious:
https://www.sqlservercentral.com/blogs/a-view-on-views
Is there a limit to the number of joined tables?
Would it be wise to split the table into dedicated tables for each year (tblASAP_Detail_v3_2019, tblASAP_Detail_v3_2020 and so on), which we then combine with a union? For inserts we only ever write to one year (we insert actual data monthly, budget data only between October and February).
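Something like this is what I have in mind - a view that combines the yearly tables with UNION ALL (the column names below are only placeholders, not our real ones):

CREATE TABLE dbo.tblASAP_Detail_v3_2019
(
    DetailID    int           NOT NULL,
    BookingYear int           NOT NULL CHECK (BookingYear = 2019),
    Amount      decimal(18,2) NOT NULL,
    CONSTRAINT PK_tblASAP_Detail_v3_2019 PRIMARY KEY (DetailID, BookingYear)
);

CREATE TABLE dbo.tblASAP_Detail_v3_2020
(
    DetailID    int           NOT NULL,
    BookingYear int           NOT NULL CHECK (BookingYear = 2020),
    Amount      decimal(18,2) NOT NULL,
    CONSTRAINT PK_tblASAP_Detail_v3_2020 PRIMARY KEY (DetailID, BookingYear)
);
GO

-- The CHECK constraints let SQL Server touch only the years a query actually needs.
CREATE VIEW dbo.vwASAP_Detail_AllYears
AS
SELECT DetailID, BookingYear, Amount FROM dbo.tblASAP_Detail_v3_2019
UNION ALL
SELECT DetailID, BookingYear, Amount FROM dbo.tblASAP_Detail_v3_2020;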
Christoph Kauflin
working in Controlling, but trying to program the tools we need to do that
February 4, 2020 at 2:50 pm
Do you have Enterprise Edition? I would advise against the union solution - if you have Enterprise, then use table partitioning based on the date.
How often is the data for previous years queried?
MVDBA
February 4, 2020 at 4:25 pm
No, it is the size of the table on SQL Server, when I check the "Disk Usage by Top Tables" report. The Excel file is only 80 MB.
Christoph Kauflin
working in Controlling, but trying to program the tools we need to do that
February 4, 2020 at 4:28 pm
No, it is Standard Edition.
Usually we need two years: the current one (which is still 2019 right now) and the next one (2020) for the budget. Currently I am changing the Excel pivot table source from a view to a stored procedure, which makes it possible to pass parameters to it.
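Roughly along these lines (procedure, view and column names below are placeholders, not our real objects):

CREATE PROCEDURE dbo.uspASAP_Detail_ForPivot
    @CurrentYear int    -- e.g. 2019 returns the 2019 actuals plus the 2020 budget year
AS
BEGIN
    SET NOCOUNT ON;

    SELECT *
    FROM dbo.vwASAP_Detail                                 -- the existing reporting view
    WHERE BookingYear IN (@CurrentYear, @CurrentYear + 1);
END;

Excel then calls it with EXEC dbo.uspASAP_Detail_ForPivot @CurrentYear = 2019; as the connection's command text.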
Christoph Kauflin
working in Controlling, but trying to program the tools we need to do that
February 4, 2020 at 5:02 pm
You can use table partitioning with Standard Edition on SQL Server 2017. I believe starting with 2016 SP1 there is support for partitioning with all editions.
Sue
February 4, 2020 at 9:35 pm
I believe starting with 2016 SP1 there is support for partitioning with all editions.
Sue is correct.
More details (2016): https://docs.microsoft.com/en-gb/sql/sql-server/editions-and-components-of-sql-server-2016?view=sql-server-ver15
More details (2017): https://docs.microsoft.com/en-gb/sql/sql-server/editions-and-components-of-sql-server-2017?view=sql-server-ver15
There is an overview of partitioning here https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver15
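As a very rough sketch of what yearly partitioning looks like (the names and boundary dates below are only examples):

CREATE PARTITION FUNCTION pfASAPByYear (date)
AS RANGE RIGHT FOR VALUES ('2019-01-01', '2020-01-01', '2021-01-01');

CREATE PARTITION SCHEME psASAPByYear
AS PARTITION pfASAPByYear ALL TO ([PRIMARY]);   -- or map each partition to its own filegroup

-- The table (or its clustered index) is then created or rebuilt on the scheme, e.g.
-- CREATE TABLE dbo.tblASAP_Detail_v3 ( ... ) ON psASAPByYear (BookingDate);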
Can I ask your reasons for shrinking the table? Do you have problems with the storage space required for the data?
To answer your question about moving columns to other tables: yes, it is generally a good approach because it avoids issues such as data repetition (which can cause problems, for example if someone updates one copy of the data and not the other). However, implementing it on an existing system could introduce complications. Is the view for the Excel pivot the only use for this table?
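For illustration only (all names below are made up), the idea is that a repeated text column moves into a small lookup table and the view joins the text back in:

CREATE TABLE dbo.CostCentre
(
    CostCentreID   int           NOT NULL PRIMARY KEY,
    CostCentreName nvarchar(100) NOT NULL
);
GO

-- tblASAP_Detail_v3 then stores only the small CostCentreID instead of the full name.
CREATE VIEW dbo.vwASAP_Detail
AS
SELECT d.DetailID, d.Amount, cc.CostCentreName
FROM dbo.tblASAP_Detail_v3 AS d
JOIN dbo.CostCentre        AS cc ON cc.CostCentreID = d.CostCentreID;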
Also, sorry if I'm stating the obvious, but changing the table design is likely to affect the processes used to load the data. I'm not saying that's a bad impact - it might even help.
February 5, 2020 at 6:58 am
Thanks (to all) for the advice and links. This seems to be helpful too.
Yes, one reason is space. Our database administrator is really being difficult about the amount of space we are using, both for data and for the log. The table is used for reporting (through a view restricting the available rows) with the Excel pivot. But we also use it for budgeting, so people enter their budget numbers through Access front ends. So space is one issue.
You are right about the second-to-last paragraph (…data repetition…):
But the other reason is that we import the data from 20 Excel sheets, exported from different programs such as SAP and the payment program. We then enrich it, using several stored procedures, with additional columns for the organizational and cost structure, which I want to replace with views. The organizational structure is not so difficult, but the cost structure (6 levels) is really confusing; up to now I didn't know how to do this with a cost view. (I have one for direct cost, one for indirect cost and one for personnel cost.) At the end of the process we had several stored procedures, which we called "cosmetic", taking more and more time to run. Every time something changed in the organization or a cost item was renamed, which happened surprisingly regularly, we had to run the "cosmetic" procedures again.
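What I am aiming for is roughly this (all names below are invented): instead of the "cosmetic" procedures writing the structure into extra columns, a view joins it in at query time, so a renamed cost group is picked up automatically.

CREATE VIEW dbo.vwASAP_Detail_WithStructure
AS
SELECT d.DetailID,
       d.Amount,
       o.OrgUnitName,
       c.CostLevel1, c.CostLevel2, c.CostLevel3,
       c.CostLevel4, c.CostLevel5, c.CostLevel6
FROM dbo.tblASAP_Detail_v3 AS d
JOIN dbo.OrgStructure  AS o ON o.OrgUnitID  = d.OrgUnitID
JOIN dbo.CostStructure AS c ON c.CostItemID = d.CostItemID;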
And we are also aware of the issue mentioned in the last paragraph, adapting our processes, which is why we moved from the Access databases to the MS SQL databases with as few changes as possible, because we had enough to learn already. Now that we know more, we are trying to improve our solutions.
I must admit that I (we) are learning by doing, which is an efficient way to learn, but also quite stressful, because mistakes directly affect our customers. (We have had a development database for a few months now …)
Christoph Kauflin
working in Controlling, but trying to program the tools we need to do that
February 5, 2020 at 8:49 am
If space is the issue, then look at Stretch Database to Azure - it's not cheap, but it is a monthly cost that offsets buying a new SAN/server - and it is a zero-code-change solution. Just read the documentation and choose the correct tables.
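If I remember the syntax right, it's roughly this (database, server and credential names are placeholders - check the docs for the prerequisites and table limitations before trying it):

EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;

ALTER DATABASE [YourDatabase]
SET REMOTE_DATA_ARCHIVE = ON
    (SERVER = N'yourstretchserver.database.windows.net',
     CREDENTIAL = [YourStretchCredential]);

ALTER TABLE dbo.tblASAP_Detail_v3
    SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));   -- start migrating rows to Azure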
MVDBA
February 5, 2020 at 2:28 pm
One thing that I forgot to mention about partitions is that you can store different parts of the table on different storage (different filegroups). The application should not be aware of this; it's managed through SQL Server.
I am not saying Azure is a good, or bad, option. I'm only adding to the information in my earlier post.
Also, the log file size might be linked to your backup plan. Do you know which recovery model you are using for the database and how often you are doing backups?
Do you know if the log file is increasing significantly during the data import? I think the 'bulk logged' recovery model is sometimes used in this situation. Sorry I can't provide any specific information or advice about it because I've never used it.
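If it helps, you can check the recovery model and how full the log is with something like:

SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = DB_NAME();

DBCC SQLPERF (LOGSPACE);   -- log size and percentage used for every database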
February 5, 2020 at 2:52 pm
Do you know if the log file is increasing significantly during the data import? I think the 'bulk logged' recovery model is sometimes used in this situation. Sorry I can't provide any specific information or advice about it because I've never used it.
That's because no one in history has ever used bulk logged 🙂
MVDBA
February 5, 2020 at 3:21 pm
I don't know about the backups; that concerns (only) my DB admin. He has restored some DBs, so that seems to work 😉
But when I copy a big chunk of data, sometimes I have to do it in a loop, month by month, because the 4 GB log file doesn't seem to be big enough. I am trying out the partitioning on my netbook, but with SQL Server Express I run out of space - 10 GB is the database size limit...
Christoph Kauflin
working in Controlling, but trying to program the tools we need to do that
February 5, 2020 at 3:38 pm
I don't know about the backups; that concerns (only) my DB admin. He has restored some DBs, so that seems to work 😉
But when I copy a big chunk of data, sometimes I have to do it in a loop, month by month, because the 4 GB log file doesn't seem to be big enough. I am trying out the partitioning on my netbook, but with SQL Server Express I run out of space - 10 GB is the database size limit...
You do realise that Developer Edition is free for your netbook as long as it's not a production machine? Not sure why you are using Express.
As for looping - as much as I dislike it (for total transaction atomicity), it does make things faster and easier. I test ten thousand rows, then test 100 thousand rows and work out the economy of scale - is it quicker to do 100k in one batch or in 10 batches... just keep experimenting on that one.
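A rough batching pattern for the copy (table and column names are placeholders) - each INSERT commits on its own, so the log can be reused between batches in SIMPLE recovery, or kept small with frequent log backups in FULL recovery:

DECLARE @BatchSize int = 100000;

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.tblASAP_Detail_v3_New (DetailID, BookingYear, Amount)
    SELECT TOP (@BatchSize) s.DetailID, s.BookingYear, s.Amount
    FROM dbo.tblASAP_Detail_v3 AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dbo.tblASAP_Detail_v3_New AS t
                      WHERE t.DetailID = s.DetailID);

    IF @@ROWCOUNT < @BatchSize BREAK;   -- last batch copied
END;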
MVDBA
February 5, 2020 at 4:11 pm
Our IT is quite strict and they wouldn't install Developer Edition for licensing reasons, but as you say, it is not a production machine. Maybe I will install it myself...
Christoph Kauflin
working in Controlling, but trying to program the tools we need to do that
February 5, 2020 at 4:20 pm