A Global Reach

Question

Steve Jones - SSC Editor

SSC Guru

Points: 736602
November 16, 2006 at 11:45 pm

#368069

With the last few weeks being fun polls and next week being a holiday in the US, I thought it would be good to get back to a semi-serious, SQL Server related poll this week. Being in Seattle at the PASS Summit has inspired me a bit as well.
It's interesting to come to the Summit every year and meet so many new people, from so many different places. Each of us uses SQL Server in a different way and together we push the limits of what the product can do. As I've been wandering in and out of sessions and working on various projects here at SQLServerCentral.com and with FourDeuce.com, a few things have come up that I haven't dealt with very much. One of those items relates to handling multiple code pages and character sets in SQL Server, so ...
How many of you work with multiple languages in your database?
I'm curious because I've very rarely dealt with Unicode data or even used nvarchar or ntext datatypes in SQL Server. For the most part I've worked in companies that were English speaking and dealt exclusively with English data. Often American English and badly worded at that, but still the default US English code page and plain old varchar and char data.
So I'm wondering, where do you see a need for Unicode data in your systems, especially if you're an English speaking company? Or if you're using a language other than English in SQL Server, how well does it work and what are some of the interesting problems you've faced?
Steve Jones

Phil Factor SSC-Insane Points: 20254 · Answer 1

The whole issue of multilanguage support is tackled very early on in any database that is designed from the ground up to work in Europe, so others would describe it far better than I. These issues start to get very complex when one includes Arabic and other Semitic languages, and then, when you start including Japanese, it becomes a highly specialised subject.

I wrote a SQL Server-based Internet trading site for a large UK operation a short while ago. We'd got it all finished and operational when the decision was made to go pan-European. This involves a lot of languages, though it seems that, when push comes to shove, one can persuade a European to speak one of eight core languages as long as French and English are amongst them.

You therefore obviously need to separate the country where a person is from the language they want to use. I know this is obvious but when I explained it to the website designers, they seemed surprised. The whole issue of currency, postage rates, and local taxes goes with the country and is entirely separate from language.

Then there are date formats. If the languages are the ones included with SQL Server, then it is a simple matter of setting the language (to get month and day names). If the language isn't there, then it is a bit messy. The actual date formats have to be set separately because they tend to be country-based rather than language-based.

Each user session has to have a language and country setting, taken from the known preferences of the customer if known, or the source of the connection if you can detect it.
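To make the country/language split concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only (the `SessionSettings` class, the lookup tables and their values); it is not the design of the site described above. The point it shows is that translated strings are keyed by language, while date format and currency are keyed by country.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical lookup tables; the values are illustrative only.
GREETINGS = {"en": "Welcome", "fr": "Bienvenue", "de": "Willkommen"}    # keyed by language
DATE_FORMATS = {"GB": "%d/%m/%Y", "US": "%m/%d/%Y", "DE": "%d.%m.%Y"}  # keyed by country
CURRENCIES = {"GB": "GBP", "US": "USD", "DE": "EUR"}                    # keyed by country


@dataclass(frozen=True)
class SessionSettings:
    language: str  # drives translated strings
    country: str   # drives date format, currency, taxes, postage


def render_header(settings: SessionSettings, today: date) -> str:
    """Build a page header using language and country independently."""
    greeting = GREETINGS[settings.language]
    formatted_date = today.strftime(DATE_FORMATS[settings.country])
    currency = CURRENCIES[settings.country]
    return f"{greeting} | {formatted_date} | {currency}"


# A French speaker shopping from the UK: French text, British date and currency.
print(render_header(SessionSettings("fr", "GB"), date(2006, 11, 16)))
# Bienvenue | 16/11/2006 | GBP
```

In a real system the session values would come from stored customer preferences or from the connection, as described above, and the tables would live in the database.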

We'd set the collation to accent-insensitive and case-insensitive, which turned out to be pretty good, so we could then get stuck in to finding all the places where the database returned language-based string data, and re-engineering them to take the correct translation from the tables. Getting the correct translations was a character-building exercise, since the translators generally have little or no understanding of a database. We ended up dumping all the strings that got as far as the web front-end into a large Excel spreadsheet, getting the translators to do their version and then porting it all back into the database. Any DBA should be getting strange feelings of visceral panic at this point because it certainly took more than a few minutes.
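For readers unsure what an accent-insensitive, case-insensitive comparison means in practice, the following Python sketch approximates it. This is an assumption-laden illustration, not SQL Server's collation algorithm, which applies far more elaborate language-specific rules; the `fold` and `same_ai_ci` names are made up for this example.

```python
import unicodedata


def fold(text: str) -> str:
    """Reduce text to a rough case-insensitive, accent-insensitive key."""
    # NFD splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only the base characters.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for comparisons.
    return without_marks.casefold()


def same_ai_ci(a: str, b: str) -> bool:
    """Compare two strings ignoring accents and case."""
    return fold(a) == fold(b)


print(same_ai_ci("Café", "CAFE"))      # True
print(same_ai_ci("Zürich", "zurich"))  # True
print(same_ai_ci("cafe", "cave"))      # False
```

With such a comparison a search for "zurich" finds "Zürich", which is usually what customers typing on a foreign keyboard expect.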

Then we had to check the translations because words mean different things in different contexts. Eeeek! Words with a specialised IT meaning such as 'Submit' took on strange meanings in other languages if the translator wasn't concentrating. Brand names can take on very embarrassing connotations in other languages.

One could go on and on and on about all the detailed work that was necessary. And that was before we got to any issues to do with collations and Unicode data... In the end these were the least of our worries.

Best wishes,
Phil Factor

Scott Arendt SSCrazy Eights Points: 8218 · Answer 2

Four years ago when our current application began its design, the decision was made to support other languages, so we went with Unicode and a way to alter all text based on user preferences.

Jump forward four years: we have had no use for any language other than English, and most of the development no longer takes support for those languages into account. We recently had a customer inquire about another language. It turned out that so much effort would be needed to update the code that it wasn't even worth pursuing.

Regards,

Scott

Stephen Hirsch SSCommitted Points: 1822 · Answer 3

I used to work for Smith Corona. Remember them? I call it the company too stupid to live. They had Palm 5 years before Palm did. Oh well...

Anyways, I did the internationalization of the software. 18 different languages at one point, including Russian. It is a tremendous amount of work and process. I was in at the beginning of the Unicode effort, but the company died before I could make a meaningful contribution.

Michael-284794 SSC Eights! Points: 962 · Answer 4

Oddly enough, despite the fact that South Africa (where I'm currently working) has 11 official languages, this hasn't come up yet for me. Nothing beyond the ASCII character set has been called for, and that has certainly made life simpler...

-----------------

C8H10N4O2

Grant Fritchey SSC Guru Points: 398911 · Answer 5

I'm working for an international company. All the databases we've created in the last five years have used Unicode to support the languages we'll need in the system. There are few rows in all the databases we have developed that actually have another language in them at this point. I'm still not sure why we needed it unless it was to satisfy a regulatory requirement.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

P Jones SSChampion Points: 12361 · Answer 6

Though I only work in English, it often feels like I'm working in another language as I want British English not US English and everything defaults to US. We have colours not colors and 9/11 is the 9th of November - perhaps that's why they chose 7/7 over here - an unambiguous date!

I have to make sure I do the formatting rather than relying on regional settings, as something is bound to have slipped through with the default US settings. I have one particular date that only turns US on the live system; dev and test are fine. Yet the SQL Server, IIS and user boxes are all set to British!!
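The 9/11 ambiguity is easy to reproduce. The Python sketch below (with made-up helper names `parse_british` and `parse_us`) shows the same string read under the two conventions, which is the argument for always stating the format explicitly instead of trusting whatever regional default happens to apply.

```python
from datetime import date, datetime


def parse_british(text: str) -> date:
    """Read a date as day/month/year, regardless of machine settings."""
    return datetime.strptime(text, "%d/%m/%Y").date()


def parse_us(text: str) -> date:
    """Read a date as month/day/year, regardless of machine settings."""
    return datetime.strptime(text, "%m/%d/%Y").date()


raw = "9/11/2006"
print(parse_british(raw).isoformat())  # 2006-11-09
print(parse_us(raw).isoformat())       # 2006-09-11
```

Passing dates between layers in an unambiguous form such as ISO 8601 (year-month-day) and formatting only at display time sidesteps the problem.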

BTW US is also an abbreviation for useless - often how we feel about the settings

Loner SSC-Insane Points: 21279 · Answer 7

When I worked for a market research company, one of my projects was changing a lot of the main fields (name, address, ...) to Unicode because they wanted to expand into Europe and Asia. After I changed the fields, I put data in different languages into the database. SQL Server 2000 actually worked very well with Unicode; all the data in the different languages came out fine (in their own language).

(The only problem was typing the data in, since each language has its own keyboard. I had a hard time using the Chinese keyboard even though my brother and sister taught me. On the French keyboard I could not figure out how to get the ^ on top of the a; it was on the keyboard, I just did not know how to get it. On the German keyboard the @ was not above the 2. When I was in Germany I tried to send an email, and it took me 5 minutes to find the @ sign to type the email address.)

Rudyx - the Doctor SSC-Forever Points: 43695 · Answer 8

US English 20+ years.

Regards
Rudy Komacsar
Senior Database Administrator
"Ave Caesar! - Morituri te salutamus."

Ninja's_RGR'us SSC Guru Points: 294069 · Answer 9

Lol.. simple solution to that. Ask a coworker to IM you the characters... or copy and paste from your address book.

swjohnson Hall of Fame Points: 3254 · Answer 10

At the company I currently work for, we have SQL Server handling text and sorting in over 90 different languages and dialects (including right-to-left languages). It's been a long road (about 6 years since I first started this journey) but we are finally starting to realize the significant benefits of this path. What's funny is that our Oracle team is miles (or kilometers) behind us...HAHAHAHA!

The hardest part was that while SQL Server was ready to handle Unicode, the ASP stuff was not quite up to snuff. However, .NET has overcome that.

Another thing that was difficult was changing the mentality of management and programmers to think from a global perspective instead of a US-centric view. The interface had to be thought about differently too...colors, dates, times, left-to-right issues, spacing and such.

The best thing to do is analyze which fields need to be Unicode and which ones don't. Really analyze how your data is going to be used. Not everything will need to be ntext, nvarchar, or nchar. Know your application and know your data.
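One rough way to start that analysis is to test sample values against the single-byte code page a varchar column would use. The Python sketch below assumes Windows code page 1252 (the usual one behind US English and Western European collations) and uses invented column names and sample data; it only looks at the samples it is given, so it is a first pass and not a substitute for knowing what data is coming.

```python
def fits_code_page(values, code_page="cp1252"):
    """Return True if every sample value can be stored in the given code page."""
    for value in values:
        try:
            value.encode(code_page)
        except UnicodeEncodeError:
            return False
    return True


def suggest_types(columns, code_page="cp1252"):
    """Suggest varchar or nvarchar per column, based only on the samples given."""
    return {
        name: "varchar" if fits_code_page(samples, code_page) else "nvarchar"
        for name, samples in columns.items()
    }


# Hypothetical sample data for three columns.
samples = {
    "product_code": ["AB-1001", "ZX-2040"],
    "city": ["Zürich", "São Paulo"],
    "customer_name": ["Müller", "田中"],
}
for column, suggested in suggest_types(samples).items():
    print(column, suggested)
```

Here only `customer_name` is flagged, because the Japanese value falls outside code page 1252, while the accented Western European city names still fit.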

I will try and find my old resources and post the links/names here if anyone is interested.

SJ

Accepting all invites

Charles Kincaid SSChampion Points: 13593 · Answer 11

Our multilingual support is English and Spanish. The reason all of our fields are nvarchar is that varchar is NOT SUPPORTED on the mobile. Our SP writer did not want to jack with the DDL creator in our sync objects so we just moved everything to nvarchar, ntext, n-whatever and called it good. Well, I just said "acceptable" and bit my lip.

ATB
Charles Kincaid

Jasmine D. Adamson Hall of Fame Points: 3428 · Answer 12

I did the database for Schwab Japan's web site, when they were still in business. It was an absolute nightmare because we were on SQL 6.5 back then. We had to have our COM object translate the Shift-JIS to ASCII and back again. Most of the Japanese-language stuff was in the web code, but a few columns like company names had to be databased. When we upgraded we switched to Unicode and things went very well.

Right now I'm having a problem with Japanese-language stuff on my Netzero email account. If anyone knows anything about that, please let me know. My Japanese emails go out fine, but when I get the reply, it looks like base-64 gibberish...
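For anyone who has not handled Shift-JIS, this Python sketch shows the kind of conversion that happens at the boundary once the storage is Unicode. It is not the COM object described above, whose internals are not given; the function names and the sample company-name fragment are assumptions for illustration.

```python
def sjis_to_unicode(raw: bytes) -> str:
    """Decode Shift-JIS bytes (e.g. from a legacy web form) into Unicode text."""
    return raw.decode("shift_jis")


def unicode_to_sjis(text: str) -> bytes:
    """Encode Unicode text back to Shift-JIS for a client that expects it."""
    return text.encode("shift_jis")


company_name = "株式会社"  # hypothetical sample value, meaning "corporation"

sjis_bytes = unicode_to_sjis(company_name)
round_tripped = sjis_to_unicode(sjis_bytes)

# UTF-16 is a two-byte Unicode encoding similar to what nvarchar columns store.
utf16_bytes = round_tripped.encode("utf-16-le")

print(round_tripped == company_name)       # True
print(len(sjis_bytes), len(utf16_bytes))   # 8 8
```

Once the text is Unicode, the database no longer needs to know which legacy encoding the client used; only the edge of the application does.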

pshotts Right there with Babe Points: 784 · Answer 13

Don't get me started...

It took a lot of trial and error to extract Korean, Chinese, Thai and other characters stored in an Oracle 9i database with a UTF-8 language setting into SQL Server 2000 as Unicode using DTS.

And that is with several years of experience combining Hebrew (right to left, but pre-Unicode characters) with English in various SQL Server databases.

David.Poole SSC Guru Points: 76037 · Answer 14

I worked on government sites that required Urdu, Bengali, Gujarati, Punjabi etc.

The problem with those languages is that there is NO standard font for them and therefore you are at the mercy of the font provider.

The Welsh language has a funny character like a W with a circumflex (ŵ, Unicode code point U+0175, decimal 373), but you can get away with a plain ASCII w.
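Assuming the character meant here is the w with a circumflex (ŵ), a few lines of Python can confirm its code point and show the "get away with ASCII" fallback. The `to_ascii` helper is a made-up name, and dropping accents this way loses information, so it is only suitable where an approximation is acceptable.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Approximate text in ASCII by dropping accents (lossy)."""
    decomposed = unicodedata.normalize("NFD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


w_circumflex = "ŵ"
print(ord(w_circumflex), hex(ord(w_circumflex)))  # 373 0x175
print(unicodedata.name(w_circumflex))             # LATIN SMALL LETTER W WITH CIRCUMFLEX
print(to_ascii("dŵr"))                            # dwr  ("dŵr" is Welsh for water)
```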