April 25, 2018 at 7:41 am
Hello,
I am very new to SQL and am in the process of building my first database. I have one data set that is 20,000 rows and 500 columns. This is quarterly financial data for multiple companies. The database will contain 10 of these tables (to start), it will grow to over 40. This leads to my question. Would it be best to load the all the data (20,000 rows X 500 columns) in one table for Q1 and the same for Q2, Q3 and Q4, respectively. Or should the data be divided into multiple tables to increase speed?
Lastly, is SQL server a secured environment or is it open where others can potentially access your code and or database?
Thanks.
April 25, 2018 at 7:52 am
kenfred88 - Wednesday, April 25, 2018 7:41 AMHello,I am very new to SQL and am in the process of building my first database. I have one data set that is 20,000 rows and 500 columns. This is quarterly financial data for multiple companies. The database will contain 10 of these tables (to start), it will grow to over 40. This leads to my question. Would it be best to load the all the data (20,000 rows X 500 columns) in one table for Q1 and the same for Q2, Q3 and Q4, respectively. Or should the data be divided into multiple tables to increase speed?
Lastly, is SQL server a secured environment or is it open where others can potentially access your code and or database?
Thanks.
500 columns will make for extremely difficult coding. Why so many?
SQL Server has numerous security features. You can lock down to whatever level is appropriate for your data and your customers.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
April 25, 2018 at 8:06 am
Thanks for your reply.
The reason for the number of columns is 2 fold. The first is that is how the files are sent over to me in .csv format. The second is that it includes all the fields in the financial reports submitted by each business. It is an aggregated table that has all the date in each quarter in one tab in excel.
April 25, 2018 at 8:11 am
500 columns is a lot to put into a single table. It is also not a good approach to create a seperate table for each quarter.
Take a good look at the contents of your data and apply the normalization rules (up to 3NF is commonly used). This will minimize duplicate data and structure the contents accross logical tables. Here's some explanation with a simple example to start with. Search the web for more explanation and samples.
April 25, 2018 at 8:13 am
Thank you
April 25, 2018 at 8:14 am
kenfred88 - Wednesday, April 25, 2018 8:06 AMThanks for your reply.The reason for the number of columns is 2 fold. The first is that is how the files are sent over to me in .csv format. The second is that it includes all the fields in the financial reports submitted by each business. It is an aggregated table that has all the date in each quarter in one tab in excel.
Perhaps loading the data from Excel into a staging table can be considered for (mainly) performance reasons. From the staging table you split the data to a normalized database structure. But you could also consider to split the data to a normalized form during the loading process. A tool like SSIS can be used to load the data and split it before writing it to the different destination tables.
April 25, 2018 at 8:30 am
Greatly appreciated!
Since it was suggested that loading all the data by quarter is not the best approach. Do you have suggestion on what the tables would look like? To dumb this down a bit for my sake, suppose we have a balance sheet and P&L and supplemental reports with individual fields for each business. Do you think it is best to list ASSETS as its own table and include all the quarterly data (10 years) in its own table? Asking as I would then have 500 tables given the number of fields.
April 25, 2018 at 8:34 am
Really it depends on how much you actually know about the data and what you're planning to do with the data downstream. At the end of the day 20,000 columns per quarter even for 40 tables isn't a very large amount of data for SQL Server to handle. If all these are are reports you're getting from various companies and people just need access to them there might not be any value in trying to normalize them(assuming that's even possible)
April 25, 2018 at 8:38 am
kenfred88 - Wednesday, April 25, 2018 8:30 AMGreatly appreciated!Since it was suggested that loading all the data by quarter is not the best approach. Do you have suggestion on what the tables would look like? To dumb this down a bit for my sake, suppose we have a balance sheet and P&L and supplemental reports with individual fields for each business. Do you think it is best to list ASSETS as its own table and include all the quarterly data (10 years) in its own table? Asking as I would then have 500 tables given the number of fields.
Han Shi has suggested "normalisation" and provided links - have a read, but essentially yes. You want a table for each entity, which will include Company, Assets, BalanceSheet and P&L. Columns for Company and Datetime will differentiate rows - you don't need separate tables for quarters or companies.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
May 1, 2018 at 2:28 pm
ChrisM@Work - Wednesday, April 25, 2018 8:38 AMkenfred88 - Wednesday, April 25, 2018 8:30 AMGreatly appreciated!Since it was suggested that loading all the data by quarter is not the best approach. Do you have suggestion on what the tables would look like? To dumb this down a bit for my sake, suppose we have a balance sheet and P&L and supplemental reports with individual fields for each business. Do you think it is best to list ASSETS as its own table and include all the quarterly data (10 years) in its own table? Asking as I would then have 500 tables given the number of fields.
Han Shi has suggested "normalisation" and provided links - have a read, but essentially yes. You want a table for each entity, which will include Company, Assets, BalanceSheet and P&L. Columns for Company and Datetime will differentiate rows - you don't need separate tables for quarters or companies.
Unless those companies are your customers, and there's a regulatory need to maintain separation of data.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply