Background
Lately I've been working on a Java application, specifically a search engine
implementation, where I needed to work with data from a SQL Server database. The
company I work for have a Java based platform for implementing search engines
and I've worked with it much before, but this was the first time that the search
engine I was implementing was to index and search data stored in a SQL Server
database. Working with one of the indexes I encountered some problems and
learned a few experiences, so I thought I'd share these in this article. The
index in question was one that contained dates, i.e. a column with data type
datetime in SQL Server.
Dates in Java
In Java, data of type date is stored in objects of the class Date. The normal way to create an instance of one is to supply the constructor with a value of type long, which represents the number of milliseconds that have expired since 1970-01-01 00:00:00. However, when users would use the search engine they would supply dates in the format above, i.e. yyyy-mm-dd hh:nn:ss (ANSI standard format and also the way we write dates in Sweden). So the first thing I needed to do was to convert these strings and the above format into dates. Using a nice programming language such as Java this is easy. The code snippet below shows how to do this:
String s = new String("2003-08-30 16:00:00"); SimpleDateFormat sdf = (SimpleDateFormat) SimpleDateFormat.getDateTimeInstance(); sdf.applyPattern("yyyy-MM-dd HH:mm:ss"); Date d = sdf.parse(s);
SQL Server datetime
Now that I had code that converted the user supplied date to search for I needed to write a T-SQL query that was to be run to populate the index in the search engine. When the search engine starts up it populates all indexes and then all queries run against it are only run inside the search engine, the database (or whatever data store is being indexed) is not used again until the indexes are repopulated or updated. Since dates are stored as number of milliseconds since 1970-01-01 I thought it would be a good idea to convert the dates in SQL Server the same way in my T-SQL query. Using the datediff-function, I tried this query:
SELECT DATEDIFF(ms, '1970-01-01 00:00:00', datecolumn) AS numberofms
Overflow error
This didn't work to well though. As most of you probably know, the datediff-function returns an int, and the int data type in SQL Server can represent numbers from -2^31 (-2,147,483,648) through 2^31 - 1 (2,147,483,647). This means that the greatest difference between the two days that can be represented by a number of milliseconds (using an int) is 24 days, 20 hours, 31 minutes and 23.647 seconds. As soon as the dates in the datecolumn where larger than that I would get an overflow error. So I needed a way to represent the number of milliseconds with a bigint instead, as this data type supports much bigger numbers. I was surprised to find that SQL Server does not have a function for doing this, like a datediff_big (compare with count and count_big). Instead I had to convert it myself. The way I came up with was to first use datediff to get the difference in number of minutes, convert them to milliseconds and then add the seconds part of the date and finally add the milliseconds part of the date. This is the T-SQL query I ended up with:
SELECT CAST(DATEDIFF(n,'1970-01-01 00:00:00',datecolumn) AS bigint) * 60 * 1000 + DATEPART(s,datecolumn) * 1000 + DATEPART(ms,datecolumn) AS numberofms
UTC dates
So, all is well, right? Not quite. When testing I noted that the dates
(represented by long values in my Java application) that I had converted in Java
where different from those I got from SQL Server, even when they where created
using the same date in string format. I found out that Java by default stores
it's dates in UTC time (Universal Time Coordinate). SQL Server however does not,
so I once again started looking through Books Online for a nice function to
convert a datetime value to a UTC datetime. No luck. SQL Server does have one
function that returns the current date and time in UTC time, getutcdate, but
that didn't help me because I needed to convert the stored values in datecolumn.
Next thought was to find out what to add or subtract from my dates to convert
them into UTC time. This proved to be difficult though because in UTC about once
every year or two there is an extra second, called a "leap second." And I
couldn't just use getutcdate and getdate and compare the difference between them
as that would mean that I would have to reset the system time to whatever date
and time I needed to compare for each value in datecolumn. Since leap seconds
are added 'all the time' the difference between two dates in UTC time and
'normal' time is not the same for say 1970-01-01 and 2003-08-30.
Back to Java
Once again I found the answer in Java with all the help it provides being such a rich programming language. Assuming that the computer running SQL Server and the computer running SQL Server is configured to use the same time zone I can use the SimpleDateFormat object to get the current time zone, then use the function getOffset. This function returns the offset of this time zone from UTC at the specified date (represented by a number of long data type). So I revised the Java code from above to the one below, and I was all set!
String s = new String("2003-08-30 16:00:00");
SimpleDateFormat sdf = (SimpleDateFormat) SimpleDateFormat.getDateTimeInstance();
sdf.applyPattern("yyyy-MM-dd HH:mm:ss");
Date temp = sdf.parse(s);
Date d = new Date(temp.getTime() + sdf.getTimeZone().getOffset(temp.getTime()));
Summary
In this article I have discussed my experiences with working with dates in Java
and SQL Server. I hope to come back to it in the future and do a similar
comparison in .Net, or even using SQL Server Yukon. Being under an NDA I can't
say anything about Yukon, but I can say that I wouldn't have implemented my
solution exactly as above if I was using Yukon instead of SQL Server 2000 as my
data storage.