November 14, 2011 at 5:42 am
Hi all,
This is my first real post and english is not my native language so bear with me. Also this topic is not something i have deep knowledge about as i am not a DBA and only has a couple of years experience with databases.
I'm an SQL developer employed with an european telco, we are in total 4 people working on the legacy platform which is a Db2-platform, our main focus is report-generation and serving up data to the users. The past 1 1/2 half years we have been a part of the project that is to replace the old platform with a new SQL Server 2008. Its' quite a big platform handling/processing hundreds of millions of rows on a daily basis. Size-wize we are talking 40+ terabytes of data in the database from around 20 different sources.
At the moment we are finishing up configuring and loading the data into this new platform from various business applications. Most of the development has been done by external consultants which includes a complete framework built on SQL Server to handle the whole ETL-proces. One of the phases for each source is testing the loaded data by comparing output with known output from the old platform.
The other day one of us experienced a mismatch in this comparison, basicly a select yielded a result that differed from the one we expected...
Old platform:
select count(*) as count ,customer_type from
prod.CUSTOMER
where customer_type like 'LVA%'
group by customer_type
order by customer_type
Output:
countcustomer_type
72LVA1
3LVAB
1LVAC
1LVAD
2LVAE
2LVAF
2LVAG
2LVAH
1LVAJ
4LVAA
New Platform
select count(*) as count,customer_type from
prod.CUSTOMER
where customer_type like 'LVA%'
group by customer_type
order by customer_type
Output:
countcustomer_type
72LVA1
3LVAB
1LVAC
1LVAD
2LVAE
2LVAF
2LVAG
2LVAH
1LVAJ
As you can see we are missing some customers in the output of the new platform, the issue is that the collation chosen on our new platform is somewhat different from the one on the old db2-platform - the collation chosen on the new platform is Danish_Norwegian_CI_AS. We have not in the project discussed which collation to use, nor the effects it might have on old code once we move form the old platform to the new one.
The development team (consultants) has replied to us that this collation is the Microsoft recommended one for this particular server. They have also proposed a fix which means creating views on all of the tables with the correct collation and then build all reports/ouput on these views.
I am a bit worried about this "fix" - won't this spell trouble in the long run? And how are we to enforce this when users are granted access to the platform, sure we can limit access to views only but at some point a new developer might be given access to the tables themselves or we might forget to apply the correct collation going forward.
If any of you has got input/experience that might be relevant with regards to the above I would like to hear from you in this thread! Also if something is unclear please let me know and i'll try to elaborate as best as I can!
Thanks
/Marty
November 14, 2011 at 7:30 am
Personally, I would change the collation for the whole database.
If this is the only database on the instance, I would also change the instance default collation, in order to avoid collation conflicts with system databases.
If you're interested, you can read here how to change server collation[/url].
-- Gianluca Sartori
Viewing 2 posts - 1 through 1 (of 1 total)
You must be logged in to reply to this topic. Login to reply