March 12, 2012 at 9:02 am
Hello,
I am converting data from thousands of old DBF tables into SQL Server 2008. Each Old DBF table has a maximum of 25 fields. There are 300 different fields in total from all the Tables. There are 3 similar groups of data from these tables. Similar meaning each group would be indexed
the same. There is a total of 45 million records. The groups are not equal and the largest group will have 30 million records. Also relevant to my
question is number of joins to get to this data. A very common query will be something like this:
Select xxx From Client
Join On Warehouse
Join On Boxes
Join On Files
And Files would of course be my 45 million records. I'm a bit Gun-Shy about creating a table with 300 fields. I don't know why, it just feels wrong. It seems to large, like I'd always be searching for the right column when viewing data. I also have never used a data dictionary with a table containing 30 million records nor have I introduced one knowing it would be a 5th or 6th join in common queries. So, my question, which is more like advise, Do I make my Files table 25 fields and use a data dictionary or make the 300 column table?
I realize how generic this is, I am hoping to tap into some extensive expirence for a general guideline.
Thanks,
TP
March 12, 2012 at 12:21 pm
You are correct, question is a bit generic and exposes very little backgroud on the specifications therefore is a bit hard to guess.
Either way, is this an OLTP system?
In the affirmative case I would suggest to model at least to the third normal form a.k.a. 3NF
Client, Warehouse, Boxes and, Files appear to be different entities - in a E/R model - which usually are physically implemented as different tables.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.March 12, 2012 at 12:29 pm
I dont think there is anythig wrong with having 300 columns in a table , however are they all sparse columns ?
the three groups should be the starting place and you can consider vertical and horizontal paritioning if required.
the current setup leads to too many joins can cause issues if th data in these table are highly transactional
March 13, 2012 at 5:52 am
Jayanth_Kurup (3/12/2012)
...the current setup leads to too many joins can cause issues if th data in these table are highly transactional
Not necessarily. It all depends on having a proper indexing strategy; having multiple joins does not equates to having multiple large scans and in most cases the magic of data buffering reduces the need of most physical I/O to a minimun.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply