March 22, 2010 at 5:03 am
Hi All,
We have a table with 28+ million rows and 30 columns, and we want to split it into
multiple tables.
It holds information
such as name info, address info, company info, employees, sales, web address...
It has a RowID column of identity datatype, and
email is the primary key.
Daily we need data from all of these columns.
Please suggest the best way to split the table,
because it is hurting query performance.
March 22, 2010 at 6:28 am
Without knowing more about the structure and what exactly is performing badly, it's going to be very hard to suggest solutions. Is the performance bad on inserts or on reads? If on reads, have you examined the execution plans to ensure that your indexes are being properly used? Are you maintaining the indexes and statistics?
If on inserts, it sounds like you might need to come up with a more normalized storage mechanism. That's a big topic. Again, it will be hard to advise you on what to do precisely in a post on a news group especially without adequate information. I would suggest getting more knowledgeable on general database design. Check out an old book, Handbook of Relational Database Design by Fleming and von Halle. That should move you in the right direction.
If you need more help, you'll need to post specifics. What is the existing structure? Where are your performance bottlenecks? That sort of thing.
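As a starting point for the index and statistics questions above, something like this shows fragmentation and when statistics were last updated (the table name `dbo.CompanyContacts` is an assumption, not from the original post):

```sql
-- Hypothetical: check index fragmentation and statistics age for the big table.
-- Substitute your actual table name for dbo.CompanyContacts.
SELECT  i.name                                  AS IndexName,
        ps.avg_fragmentation_in_percent,
        ps.page_count,
        STATS_DATE(i.object_id, i.index_id)     AS StatsLastUpdated
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID('dbo.CompanyContacts'),
            NULL, NULL, 'LIMITED') AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id;
```

High fragmentation on a heavily read index, or very stale statistics, would point at maintenance rather than table splitting as the first fix.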
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 22, 2010 at 7:16 am
DBTeam (3/22/2010)
Are you importing this data into some form of staging table daily, or something else?
--Jeff Moden
Change is inevitable... Change for the better is not.
March 22, 2010 at 7:19 am
RedGate's SQL Refactor can do table splitting for you, but it might have trouble biting off that many rows all at once.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 24, 2010 at 4:25 am
Not daily; monthly we insert about 500,000 (5 lakh) rows with INSERT statements.
Daily we need to pull records from that 20-million-row table.
The table keeps growing like this, so now we want to split it.
We have no idea how to do that.
What about splitting it vertically (based on IDs and foreign keys)?
But daily we need data from all the columns.
March 24, 2010 at 6:20 am
If you're seeking to break it up in that way, then instead of ID & foreign key, maybe breaking it up by date makes more sense.
March 25, 2010 at 7:25 am
We do not have a date column.
March 25, 2010 at 7:53 am
Then possibly adding one might be in order. You've pretty much said that the access is by date. If your principal access is by date but you split the table by foreign key, then instead of isolating the reads you're going to spread them, making them less efficient instead of more efficient.
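A minimal sketch of that suggestion, assuming a hypothetical table name (`dbo.CompanyContacts` is not from the original post):

```sql
-- Hypothetical: add a load-date column, defaulted for new rows,
-- so the table can later be split or partitioned by date.
ALTER TABLE dbo.CompanyContacts
    ADD LoadDate datetime NOT NULL
    CONSTRAINT DF_CompanyContacts_LoadDate DEFAULT (GETDATE());
```

Existing rows all get the current date; if historical load dates matter, they would have to be backfilled from some other source.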
March 25, 2010 at 8:25 am
I did not say that accessing by date.
I mentioned that there is a possibility of adding around 5 lakhs per month.
Only id that is identitity datatype is there
remaining all are text columns.
March 25, 2010 at 8:29 am
I'm sorry, I'm not there and I don't know your structure. When you said this earlier:
But daily we need all the columns' data.
I thought that meant that you needed to access data by time.
I still don't understand where you are seeing slow performance, and without that knowledge and full knowledge of your structure, suggesting how you can partition this data is not really possible.
March 26, 2010 at 7:43 am
Again, I'm explaining my exact problem.
Hi All,
Splitting a Large table
We have a table with 30+ million rows and 30 fields,
as follows:
ID
name
job
company
address
city
state
zip
phone
fax
... and so on, up to 30 fields.
In the above, the ID column is an identity datatype and email is the clustered primary key.
Maintaining this large table is becoming difficult for us.
We are planning to split this table.
Please suggest the best way this can be done.
There is no front-end access to this table.
Do I need to normalize it?
March 26, 2010 at 8:10 am
The problem is you're explaining the table and its size and that you have a problem, but you're not describing what the problem is. Are you having trouble on inserts? Are you having trouble on reads? Both? What kind of queries are you running against it? Do you have execution plans?
In general, for most applications, I'd say yes, you absolutely should normalize your data, unless it's a reporting system, in which case you should look at setting up a star schema or some other type of BI-oriented storage architecture. Either way, simply slapping 30 fields up with all sorts of duplicate data (not to mention the complete lack of integrity; how many different spellings do you have in the state, country, or city columns for any given location?) is a poor choice for storage.
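One hedged sketch of what that normalization could look like, splitting the wide table into company, address, and person tables (all table and column names here are assumptions; only a few of the 30 fields from the thread are shown):

```sql
-- Hypothetical normalized layout; names are illustrative, not from the thread.
CREATE TABLE dbo.Company (
    CompanyID   int IDENTITY(1,1) PRIMARY KEY,
    CompanyName varchar(200) NOT NULL,
    WebAddress  varchar(200) NULL
);

CREATE TABLE dbo.Address (
    AddressID int IDENTITY(1,1) PRIMARY KEY,
    Street    varchar(200) NOT NULL,
    City      varchar(100) NOT NULL,
    State     varchar(50)  NOT NULL,
    Zip       varchar(10)  NOT NULL
);

CREATE TABLE dbo.Person (
    PersonID  int IDENTITY(1,1) PRIMARY KEY,
    Email     varchar(320) NOT NULL UNIQUE,  -- still unique, no longer the clustered key
    Name      varchar(100) NOT NULL,
    Job       varchar(100) NULL,
    CompanyID int NOT NULL REFERENCES dbo.Company (CompanyID),
    AddressID int NOT NULL REFERENCES dbo.Address (AddressID)
);
```

The daily "all columns" pull then becomes a three-table join on the foreign keys, and duplicate company/address spellings collapse into single rows.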
March 26, 2010 at 12:03 pm
Give an example of some queries you're running against the data and it would be much easier to get an understanding of what improvements would help.
Unless your business is selling spam email lists, I can almost guarantee you would be much better off picking a clustered index other than the email field. Potentially, depending on what queries you're running, you could see huge performance gains just by fixing that.
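A sketch of that fix, assuming hypothetical table and constraint names (the actual constraint name would come from `sys.key_constraints`):

```sql
-- Hypothetical: move the clustered index off the wide email column
-- onto the narrow, ever-increasing identity column.
ALTER TABLE dbo.CompanyContacts
    DROP CONSTRAINT PK_CompanyContacts_Email;   -- assumed existing PK name

ALTER TABLE dbo.CompanyContacts
    ADD CONSTRAINT PK_CompanyContacts_ID
    PRIMARY KEY CLUSTERED (ID);

-- Keep email lookups and uniqueness with a nonclustered index instead.
CREATE UNIQUE NONCLUSTERED INDEX UQ_CompanyContacts_Email
    ON dbo.CompanyContacts (Email);
```

On a 30-million-row table both steps rebuild a lot of data, so this is an offline/maintenance-window operation, not a casual change.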
March 26, 2010 at 12:40 pm
just my 2ct
In case of splitting the columns of one table to n tables with a (unique) subset of the columns of the original table:
- keep in mind to split according to the assumed usage.
- maybe even start your exercise going to 3NF and only if needed denormalize
- since you're talking about a repeating load process, reconsider the whole load process so you reach your optimal solution as soon as possible.
You should post more info on the performance issues you are experiencing today; maybe a split operation can be avoided by optimizing your current physical structures.
Provide the actual DDL + indexes and a typical problem query with the consumption numbers and execution plan (save the graphical execution plan in SSMS and post it).
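Those consumption numbers can be captured like this (the query and table name are placeholders; substitute one of your real problem queries):

```sql
-- Hypothetical: capture I/O and timing numbers for a problem query,
-- then include the Messages-tab output (plus the saved .sqlplan) in your post.
SET STATISTICS IO, TIME ON;

SELECT Name, Job, Company, City, State     -- placeholder problem query
FROM   dbo.CompanyContacts                 -- assumed table name
WHERE  Email = 'someone@example.com';

SET STATISTICS IO, TIME OFF;
```

The logical-reads figure from STATISTICS IO is usually the most useful single number for this kind of diagnosis.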
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
March 26, 2010 at 8:39 pm
DBTeam (3/26/2010)
Maintaining this large table is becoming difficult for us. We are planning to split this table.
Please suggest the best way this can be done.
There is no front-end access to this table.
Do I need to normalize it?
Yes... but don't do it without a net. Look up "designing databases [SQL Server], normalization" in Books Online and learn how to make a plan and why. Also, be sure to read the link in that section titled "Clustered Index Design Guidelines" to see how very important all of this is.