Database Design

Question

Database Design

krishnavamshi24

Mr or Mrs. 500

Points: 553
More actions
June 6, 2011 at 4:11 am

#238712

Hi Everyone,
We currently have started a project (Work Flow Based Application) , where i have been asked to design a database which would accept both simplified chinese and english characters. My current approach is to use nvarchar and the collation as SQL_Latin1_General_CP1_CI_AS. Would that approach work, i have never designed a database for bilingual interface. Please provide suggestions on collations or tips or your experiences while designing such databases (Challenges , Learnings, Key points).
Also we need to store files, and development team wants to store them on Database, but i feel its better to store them on file system and have a reference of filename on DB. As the Database size would increase very quickly. We are expecting more than 1 lakh files to be loaded with average file size of 2 - 3mb.
Regards,
Krishna
Note: Mods Hope this is the right place for posting the query DB Design

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Krishna

The only thing I would say on this is that the collation you mention is for backwards compatibility. You should use Latin1_General_CI_AS instead. I'm afraid I haven't worked in a multi-lingual environment either. Check out this recently published article to see whether it helps you get started:

http://www.sqlservercentral.com/articles/Stairway+Series/72660/

John

PaulB-TheOneAndOnly SSC Guru Points: 66772 More actions · Answer 2

Let me take the "file store" part of the question.

In my experience Development team will always ask to store files in the database - most of the time they don't understand the difference in between storing those files as LOB and storing those files as file system files. What they want is the "database" to be able to return back a specific file as needed.

The only advantage I can see in storing the files as LOBs is to simplify backup/recovery processes - once you took a backup everything is in there.

On the other hand, depending on the rate of creation and volatility of those LOBs it may be a good idea to take them out of the actual database - provided file system backup procedures are in place, well documented and reliable.

Just my two cents.

_____________________________________
Pablo (Paul) Berzukov

Author of Understanding Database Administration available at Amazon and other bookstores.

Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.

Orlando Colamatteo SSC Guru Points: 182279 More actions · Answer 3

Regarding collations:

I have seen folks on this site posting problems related to storing data in two different languages in the same column.

Regarding file storage:

Here is a good whitepaper I have passed along several times re: LOB storage in the DB: http://research.microsoft.com/apps/pubs/default.aspx?id=64525

How big are the files they want to store? From the whitepaper:

Of course, this depends on the particular filesystem, database system, and workload in question. This study shows that when comparing the NTFS file system and SQL Server 2005 database system on a create, {read, replace}* delete workload, BLOBs smaller than 256KB are more efficiently handled by SQL Server, while NTFS is more efficient BLOBS larger than 1MB. Of course, this break-even point will vary among different database systems, filesystems, and workloads.

How often will the files be updated? They go on to say that performance degrades as the same BLOBS are re-written to the database so for update-heavy apps database storage may not be the way to go even if the file size is small.

The whitepaper above was written in 2006 however the findings are still relevant in 2008 and the research and testing is very complete. Here is a whitepaper targeting 2008 called "Managing Unstructured Data with SQL Server 2008" which essentially says the same thing recommending only using in-database BLOB storage for files less than 250KB. It also talks about FILESTREAM. FILESTREAM may be is an option for you. It is designed to handle any size file (can exceed 2GB limit of in-database BLOB) and retrieval does not affect the buffer pool.

http://www.microsoft.com/sqlserver/2008/en/us/wp-sql-2008-manage-unstructured.aspx

For the benefit of others dropping by the site please post what you decide in terms of a design and any findings along the way.

There are no special teachers of virtue, because virtue is taught by the whole community.
--Plato