Multiple Language Database Design - Additional Columns vs. Additional Tables...

  • unfortunately Spanish does not work with the same collation as English, for ñ is a letter (to appear after n) by itself. moreover ch and ll were 'letters' by themselves until the 1994 reform, so it is also necessary to choose between traditional or modern sorting...

  • I reckon the question should be, why cant we have SQL manage this for us?

    Here would be my ideal scenario:

    Example Schema. Assume we have a 'default culture', (let's say 'en-US') set at the Database Instance level.

    CREATE TABLE Advertisement

    (

    AdvertisementID INT NOT NULL PRIMARY KEY,

    Header NVARCHAR(500),

    Body NVARCHAR(MAX)

    )

    When INSERTing into this table there are 2 ways to handle different cultures:

    - insert to the default culture. no changes for this.. (ie.. INSERT INTO Advertisement (AdvertisementIDm, Header, Body) VALUES (1, 'American Header', 'American Body')).

    - specify a culture string when inserting. i.e:

    SET SESSIONCULTURE 'en-UK'

    GO

    INSERT INTO Advertisement (1, Header, Body) VALUES (1, 'British Header', 'British Body')

    Then, when we select from this table we'd again, have two options

    SELECT * FROM Advertisement WHERE AdvertisementID = 1 --returns default (en-US) value, or;

    SELECT * FROM Advertisement WHERE AdvertisementID = 1 AND @@CultureString == 'en-UK' --returns UK. (or you could do this via a SET SESSION).

    My thinking would be that you woulndt be able to insert a culture-specific value (i.e en-UK) until the default value has been inserted.

    Anybody got any thoughts??

    Cheers,

  • SET SESSIONCULTURE 'en-UK'

    GO

    SELECT * FROM Advertisement WHERE AdvertisementID = 1 AND @@CultureString == 'en-UK'

    I'm not seeing these Functions in SQL anywhere...

  • yeah i wrote them intended to be a wishlisht, or if similar functionality exists, someone might be able to point it out to me??

    Regards,

  • Well, yes, I also wish MS would have just allowed us to specify a culture code. There are two functions @@LANGID and @@LANGUAGE. These will show you what the current Language is configured at on your current ConnectionString (or Query Analyzer / Mgmt Studio).

    Select @@LANGID, @@LANGUAGE

    You can set the "Current Language" on your connection strings (see www.connectionstrings.com). However, this refers to the LangID defined in SQL Server's SysLanguages. So, you can't just specify a culture of "en-US", it has to be LangID "0". British for example is LangID "23".

    So, let's assume you modify your connectionstrings based on the current culture of the client (keep in mind, SQL Server does NOT by any means have a complete list of cultures you can map to - there's currently only 33 on my server). Let's also assume you have a "mapping" table created that maps between SQL Server LANGID's and Culture codes.

    CREATE TABLE dbo.Cultures (

    ID [int] IDENTITY(1,1) Primary Key,

    Name [varchar](50),

    Code [varchar](5),

    LanguageID [int]

    )

    ConnectionString = "Data Source=(local);Initial Catalog=MyDB;Integrated Security=SSPI;Current Language=23;"

    Now, in you Procs or Queries, you can determine the culture from that language.

    Declare @Culture VarChar(5)

    Select @Culture = Culture

    FROM dbo.Cultures

    Where LangID = @@LANGID

    Then...

    Select *

    From dbo.SomeTable

    Where ID = ?

    And Culture = @Culture

    ------------------------------------------------------------

    This is flaky at best because there's no possible way to ensure that SQL has the LangID you want to use. A better approach would be to have a standard applied to all your PROCS that the last parameter is always a "@ClientContext xml" parameter which contains certain specifics about the client requesting data (Culture, TimeZoneOffset, any thing else necessary). Then, use that parameter to determine the correct content.

    Declare @ClientContext xml

    SET @ClientContext = '<Context>

    <Culture>en-US</Culture>

    <TimeZoneOffset>240</TimeZoneOffset>

    </Context>'

    SELECT

    T.ID,

    T.Culture,

    T.CreatedDate As CreatedDateUtc

    DateAdd(mi, T.CreatedDate, @ClientContext.value('//TimeZoneOffset[1]', 'int')) As CreatedDate

    FROM dbo.SomeTable T

    Where (Culture = @ClientContext.value('//Culture[1]', 'varchar(5)'))

    (This is just off the top of my head, so it's just a general idea of how to query things)

  • tymberwyld (12/5/2007)


    Ok, so there's absolutely no design pattern for Globalizing "data". So I've come up with a solution which seems to be working well. It's based on a combination of different patterns including the way the .NET Framework stores Resources, the EAV Model, and my own brew of things. I knew all along this was the way to go but just wanted to see if there were other suggestions.

    First off, let me state I am sorry about the formatting, this Forum seems to strip all Tabs and Spaces out of everthing!

    So, I've created a Table (ObjectGlobalization - sorry, couldn't think of a better name) which allows me to globalize data for any table in my database using an EAV Model. You'll notice there are multiple "Value" columns. Only one is used per Column / Row depending on the Column's DataType. For all DataTypes besides VarChar(MAX), Text, NText, and Xml use the Value column, otherwise use the appropriate ValueText or ValueXml columns (you can add others if need be; for example, if you're storing Icons per Culture add a ValueBinary columns with a data type of either VarBinary(MAX) or Image). Also, because the Value column is a Sql_Variant, you should be able to store Chinese characters in there because it can accept a NVarChar data type.

    CREATE TABLE [dbo].[ObjectGlobalization](

    [ObjectName] [varchar](128) NOT NULL,

    [ColumnName] [varchar](128) NOT NULL,

    [Culture] [varchar](5) NOT NULL,

    [RowID] [sql_variant] NOT NULL,

    [Value] [sql_variant] NOT NULL,

    [ValueText] [varchar](max) NULL,

    [ValueXml] [xml] NULL,

    [CreatedBy] [varchar](50) NULL,

    [CreatedDate] [datetime] NULL CONSTRAINT [DF_ObjectGlobalization_CreatedDate] DEFAULT (getutcdate()),

    [ModifiedBy] [varchar](50) NULL,

    [ModifiedDate] [datetime] NULL CONSTRAINT [DF_ObjectGlobalization_ModifiedDate] DEFAULT (getutcdate()),

    CONSTRAINT [PK_ObjectGlobalization_1] PRIMARY KEY CLUSTERED ([ObjectName] ASC,[ColumnName] ASC,[Culture] ASC,[RowID] ASC) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    CREATE FUNCTION [dbo].[ufn_get_GlobalizedValue]

    (

    @ObjectName VarChar(128),

    @ColumnName VarChar(128),

    @Culture VarChar(10) = Null,

    @RowID Sql_Variant,

    @Default Sql_Variant = Null

    )

    RETURNS Sql_Variant

    AS

    BEGIN

    Declare @Value Sql_Variant

    Select @Value = [Value]

    From dbo.ObjectGlobalization

    Where (ObjectName = @ObjectName)

    And (ColumnName = @ColumnName)

    And (Culture = @Culture)

    And (RowID = @RowID)

    Set @Value = IsNull(@Value, @Default)

    -- Return the result of the function

    RETURN(@Value)

    END

    GO

    CREATE FUNCTION [dbo].[ufn_get_GlobalizedValueText]

    (

    @ObjectName VarChar(128),

    @ColumnName VarChar(128),

    @Culture VarChar(10) = Null,

    @RowID Sql_Variant,

    @Default VarChar(MAX) = Null

    )

    RETURNS VarChar(MAX)

    AS

    BEGIN

    Declare @Value VarChar(MAX)

    Select @Value = IsNull([ValueText], @Default)

    From dbo.ObjectGlobalization

    Where (ObjectName = @ObjectName)

    And (ColumnName = @ColumnName)

    And (Culture = @Culture)

    And (RowID = @RowID)

    -- Return the result of the function

    RETURN(@Value)

    END

    GO

    Ok, so how is this used? Well, for my scenario I needed to store Product Descriptions and Names per Culture. I started out attempting to dynamically determine the Culture by using the SqlConnectionString option called "LocaleID", however, it's WAY too much work because you can't simply set it to an LCID, it needs to be SQL Server's own Unique ID for Locales in the SysLanguages table. So, instead I've opted to *sigh* pass in another parameter to my Procedures called @Culture no less.

    Most of you will be designing you data using "Invariant Culture" which means basically "English". For my Products table, I opted NOT to store the Description in the table but instead to just store all of them including the "Invariant" version in the ObjectGlobalization table. However, I do store the Invariant version of the Product "Name" in the table, so you'll notice two different techniques in this Proc for retrieving these values.

    CREATE PROCEDURE [dbo].[usp_get_ProductsByCategoryID]

    (

    @CategoryID UniqueIdentifier,

    @Culture VarChar(10) = Null-- Defaults to Invariant

    )

    AS

    SET NOCOUNT ON

    Select

    P.ID,

    P.Code,

    Convert(VarChar(50), dbo.ufn_get_GlobalizedValue('Product.Products', 'Name', @Culture, P.ID, P.Name)) As [Name],

    Convert(VarChar(255), Left(dbo.ufn_get_GlobalizedValueText('Product.Products', 'Description', @Culture, P.ID, Null), 255)) As [Description],

    IsNull(P.UnitPrice, 0.00) As UnitPrice,

    IsNull(P.ListPrice, 0.00) As ListPrice,

    P.Attributes,

    P.CreatedBy, P.CreatedDate,

    P.ModifiedBy, P.ModifiedDate

    From Product.Products P

    Inner Join Product.ProductCategories C ON (P.ID = C.ProductID)

    Where

    (C.CategoryID = IsNull(@CategoryID, C.CategoryID))

    You'll notice that you can pass in a "Default" value to these functions which is what I do in the case for the Product Name just in case there is no data for that particular Culture. Also, in my case this is the Proc for returning Products in a List, so I only need the 1st 255 chars of the Description. Also, since the first function (dbo.ufn_get_GlobalizedValue) returns a Sql_Variant, it's very important to explicitly cast it to the correct type so that you UI code performs correctly.

    And lastly, I basically just typed the values I needed for this test using SQL MGMT Studio, however, the following Proc will allow you to save the Globalized values. You'd most likely just call this Proc from you're other Ins/Upd/Save Procs.

    CREATE PROCEDURE [dbo].[usp_save_GlobalizedValue]

    (

    @ObjectName VarChar(128),

    @ColumnName VarChar(128),

    @Culture VarChar(10),

    @RowID Sql_Variant,

    @Value Sql_Variant = Null,

    @ValueText VarChar(MAX) = Null,

    @ValueXml Xml = Null,

    @createdby VarChar(50) = Null,

    @CreatedDate DateTime = Null,

    @ModifiedBy VarChar(50) = Null,

    @ModifiedDate DateTime = Null

    )

    AS

    Set NoCount On

    -- Exit if there is no data

    If (@Value Is Null) And (@ValueText Is Null) And (@ValueXml Is Null) RETURN(1)

    -- Validate the Data

    Select

    @ObjectName = NullIf(LTrim(RTrim(@ObjectName)), ''),

    @ColumnName = NullIf(LTrim(RTrim(@ColumnName)), ''),

    @Culture = NullIf(LTrim(RTrim(@Culture)), ''),

    @createdby = IsNull(@CreatedBy, SUser_Name()),

    @CreatedDate = IsNull(@CreatedDate, GetUtcDate()),

    @ModifiedBy = IsNull(@ModifiedBy, SUser_Name()),

    @ModifiedDate = IsNull(@ModifiedDate, GetUtcDate())

    If (@ObjectName Is Null) RaisError('The [@ObjectName] Parameter cannot be Null or Empty.', 16, 1)

    If (@ColumnName Is Null) RaisError('The [@ColumnName] Parameter cannot be Null or Empty.', 16, 1)

    If (@RowID Is Null) RaisError('The [@RowID] Parameter cannot be Null or Empty.', 16, 1)

    If (@@ERROR <> 0) RETURN(-1)

    BEGIN TRY

    If Not Exists(

    Select Null From dbo.ObjectGlobalization

    Where ObjectName = @ObjectName

    And ColumnName = @ColumnName

    And Culture = IsNull(@Culture, '')

    And RowID = @RowID

    )

    BEGIN

    Insert Into dbo.ObjectGlobalization

    (ObjectName, ColumnName, Culture, RowID,

    Value, ValueText, ValueXml,

    CreatedBy, CreatedDate, ModifiedBy, ModifiedDate)

    Values

    (@ObjectName, @ColumnName, IsNull(@Culture, ''), @RowID,

    @Value, @ValueText, @ValueXml,

    @createdby, @CreatedDate, @ModifiedBy, @ModifiedDate)

    END

    Else

    BEGIN

    Update dbo.ObjectGlobalization Set

    ObjectName = @ObjectName,

    ColumnName = @ColumnName,

    Culture = IsNull(@Culture, ''),

    RowID = @RowID,

    Value = @Value,

    ValueText = @ValueText,

    ValueXml = @ValueXml,

    ModifiedBy = @ModifiedBy,

    ModifiedDate = @ModifiedDate

    Where ObjectName = @ObjectName

    And ColumnName = @ColumnName

    And Culture = IsNull(@Culture, '')

    And RowID = @RowID

    END

    If (@@TRANCOUNT > 0) COMMIT

    RETURN(1)

    END TRY

    BEGIN CATCH

    If (@@TRANCOUNT > 0) ROLLBACK

    RETURN(-1)

    END CATCH MMMMM

  • I have to admint I did not spent too much time reading all the posts carefully but I think I did not find anyone suggesting the usage of the sysmessages table. We create software for the international markets and we have to support a whole set of languages. All our translations are stored in sysmessages tables. The nice thing about this apoproach is that you can use bulid in SQL function FORMATMESSAGE to deliver the translated text to you. On top of this the mesasges could be parametrized so you do not have to worry about the proper string concatenation to create a proper message in a given order (Many languages have different sentence structure).

    In your tables you just have to store the ID of the message stored in sysmessages (and any replacement parameters if you use them) and depending on the connection string options your application is displaying everything in the correct language.

    ---------------------------------------------
    [font="Verdana"]Nothing is impossible.
    It is just a matter of time and money.[/font]

  • If an application uses sys anything in the code means it may change with the next service pack of SQL Server so your user need you to keep that application running my version with byte changes made to Nvarchar/Nchar in SQL Server 2008 most people can make changes to any application as needed.

    Define a column for each language using correct collation and start with translated text, it is that easy.

    Kind regards,
    Gift Peddie

  • I do not think the sysmessages table structure changed since at least version 7

    Plus I am not linking to sysmessages but using FORMATMESSAGE function so I do not care much if the schema changes or not.

    I like to use existing tools instead of reinventing the wheel and FORMATMESSAGE gives you a lot of functionality.

    We use it to translate all texts used by tha application including the windows forms captions.

    This may not be viable solution for everyone but wasn't even considered in the post thread so I figured I can mention it.

    ---------------------------------------------
    [font="Verdana"]Nothing is impossible.
    It is just a matter of time and money.[/font]

  • The thing is what is relevant to Winform is not relevant to Asp.net in localization and even Winform some forms are part of the operating system so to develop for right to left langauges like Arabic and Thai it is better to install Visual Studio for Arabic development.

    You can use the Windows Resource localization editor for Winform but Asp.net translated text is still better because I still get wrong language emails from sites using automatic localization.

    Kind regards,
    Gift Peddie

  • @sam-3 McGoldrick

    It seems that multilingual databases have been though about, but their concept may have been patented and no one is willing to pay the patent license fees. I may be wrong in this, but it's the closest I could come up with as a solution to this all to common problem of localized data.

    Oh, and here is an example of such a patent:

    http://www.freshpatents.com/Techniques-for-processing-data-from-a-multilingual-database-dt20081009ptan20080249998.php

  • I have just looked at that application the US government will not even issue trade mark for crap people like me can create in our sleep with most RDBMS. So I blame the Latin alphabet controlled Unicode standard for such confusion about pedestrian bytes counting and nothing more.

    :hehe:

    Kind regards,
    Gift Peddie

  • Gift Peddie (4/7/2009)


    I have just looked at that application the US government will not even issue trade mark for crap people like me can create in our sleep with most RDBMS...

    I would not be so sure about that. 🙂

    They issued a patent to Microsoft for the Cascading Style Sheets.

    ---------------------------------------------
    [font="Verdana"]Nothing is impossible.
    It is just a matter of time and money.[/font]

  • JacekO (4/7/2009)


    Gift Peddie (4/7/2009)


    I have just looked at that application the US government will not even issue trade mark for crap people like me can create in our sleep with most RDBMS...

    I would not be so sure about that. 🙂

    They issued a patent to Microsoft for the Cascading Style Sheets.

    I am hopeful the US government just rejected Dell application to trade mark Cloud as in Cloud computiing.

    :hehe:

    Kind regards,
    Gift Peddie

  • Am I mistaken in thinking most of the information posted in this thread is more about translating static (or semi static) information back to the end user? Such as within an application, or on a web page?

    Although we look to do similar, I am also interested in learning whether it's possible / feasible to receive input data from the end user in different languages

    In other words, we'll have some users using a single database whereby someone wants to add in their name in their native language (Korean for example) someone else will be adding in similar data in English, another user will be adding it in Arabic, so effectively we want to be populating the same field (first_name, for example) with data from 3 (or more) different languages

    I would be expecting the users to be inputting this data via a public facing web interface

Viewing 15 posts - 16 through 30 (of 40 total)

You must be logged in to reply to this topic. Login to reply