January 12, 2010 at 8:16 am
Hi SQL experts:
I’m working on a project in which we need to store huge amount of data (more than 3TB). For customers we only support simple read-only operations which will be finished in several seconds. Due to the disk space and performance issue, we are considering to partition the data to multiple machines.
I can think out a simple plan. I define a partition function, place data to several machines according to the function, and clients should use the same function to determine which machine to query against. But this plan has some shortcomings:
(1)Clients are close coupled with Server.
(2)The partition function should be carefully designed. Use another function in the future is risky because of (1)
Maybe we can add a web service as a query router in front of these servers, which redirects queries to proper backend SQL Servers. But this adds the complexity a lot and the web service could become a bottleneck.
Do we have a better solution for this scenario?
Thanks in advance for any comments
January 12, 2010 at 9:29 am
I would consider having a middle tier for this. The clients would always hit this tier and the logic will handle all the fetching/pre-fetching and caching data as needed to handle these requests.
Of course I would need more information regarding application and data details in order to provide architectural specifics.
The probability of survival is inversely proportional to the angle of arrival.
January 12, 2010 at 1:51 pm
This same question has been asked 6 times in different forums. Please only create one thread per question as it causes confusion for those who frequently answer questions. It also takes away for the debate that could ensue if all posters were working the same thread. It is beneficial for everyone, including you, to keep everything inside one thread.
With that said, let's continue this discussion here.
January 12, 2010 at 5:06 pm
Some lite reading:
http://sqlblog.com/blogs/linchi_shea/archive/2007/01/02/some-vldb-availability-tidbits.aspx
http://msdn.microsoft.com/en-us/library/ms345146(SQL.90).aspx
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw[/url]
Learn Extended Events
January 12, 2010 at 11:16 pm
Hi Sturner:
Thanks for your reply.
Basically we might have more than 6TB data and several servers, each with 3TB disks. Does SQL server support this naturally except the middle tier solution?
Please let me know if you need to know more detail information.
January 13, 2010 at 8:49 pm
I'm so sorry to post duplicate messages and cause the confusion. but John's link points to my other topic "How to switch data online". So let's discuss just talk about "How to partition huge data to multiple machines?" here.
Thanks:)
John Rowan (1/12/2010)
This same question has been asked 6 times in different forums. Please only create one thread per question as it causes confusion for those who frequently answer questions. It also takes away for the debate that could ensue if all posters were working the same thread. It is beneficial for everyone, including you, to keep everything inside one thread.With that said, let's continue this discussion here.
January 14, 2010 at 7:58 am
If you wish to be successful in this endeavour, which is complex and has high-end requirements/constraints, you REALLY need to get a qualified professional to assist you. It is my professional opinion that you cannot succeed without assistance. No amount of forum back-and-forth will get you to a point where probability of a good outcome is very high.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 14, 2010 at 8:14 am
xuejianpan (1/13/2010)
I'm so sorry to post duplicate messages and cause the confusion. but John's link points to my other topic "How to switch data online". So let's discuss just talk about "How to partition huge data to multiple machines?" here.Thanks:)
John Rowan (1/12/2010)
This same question has been asked 6 times in different forums. Please only create one thread per question as it causes confusion for those who frequently answer questions. It also takes away for the debate that could ensue if all posters were working the same thread. It is beneficial for everyone, including you, to keep everything inside one thread.With that said, let's continue this discussion here.
Sorry if I added confusion to the threads. I lumped them all together because switching data and partitioning your data are both solved by SQL Server table partitioning.
Like TheSQLGuru stated, this is a pretty complex deal that is not easily implemented without expreience. Getting some help is a good idea, but becoming proficient with the technology yourself is not a bad idea either. Depending on the time frame your working with, you could learn enough to do this on your own.
So back to SQL Server table partitioning. Here's a great read from Kimberly Tripp on the subject.
January 15, 2010 at 7:13 am
You may want to read a bit about MPP.
Matching the server to the storage system is part of this strategy.
I'd imagine that with TB's of data, there are also loading challenges.
http://www.microsoft.com/sqlserver/2008/en/us/parallel-data-warehouse.aspx
Greg E
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply