May 21, 2010 at 2:37 pm
Let's say you have an order table defined as:
CREATE TABLE order (
order_number INT NOT NULLl,
order_received DATETIME NOT NULL,
order_filled DATETIME NULL
)
and assume that order_filled is either null, or it is the same or larger than order_received.
The challenge is to find all orders that have been received and not yet filled for a given date:
SELECT order_number
FROM order
WHERE order_received <= given_date
AND (order_filled > given_date OR order_filled IS NULL)
In my experience, an index starting with the order_received field returns results very quickly when the given date is near the last date value in the column; this works quite well since most queries are in the near past. Also an index on the order_filled column helps out when the given date is near the first date value in the column. Splitting the query into two parts, put back together by a UNION can also perform well under certain circumstances. But, given a large quantity of data and a given date that is near the middle of the date range, neither index not the split query will lead to good performance.
Naturally, the users of such a query are baffled by what they see as inconsistant performance.
Does anybody have any tricks I could try out? I'm not looking for a canned solution, simply a new perspective.
May 21, 2010 at 9:05 pm
man ,i'm confused:hehe:.
to find all orders that have been received and not yet filled for a given date, shouldn't the query be like this:
SELECT order_number
FROM order
WHERE order_received <= given_date
AND (order_filled = order_received OR order_filled IS NULL)
instead of yours like:
SELECT order_number
FROM order
WHERE order_received <= given_date
AND (order_filled > given_date OR order_filled IS NULL)
?
May 25, 2010 at 7:16 am
No. Your query would give orders received in the past and either filled on the same date or not at all. In general, orders can be filled on any date on or after the date they are received and the query must pick up all orders that are filled after the given date but were received on or before the given date.
For example, an order received on Jan 2, 2010 and filled on Jan 10, 2010 should be picked up when the given date is anywhere in the range of Jan 2, 2010 to Jan 9, 2010, which would not happen with your query.
May 25, 2010 at 3:22 pm
Do you store date and time in your datetime column or just a date?
If you need to store only date and your table is going to contain very large volume of data, you can significantly improve performance of your queries (with date and date ranges conditions) by storing your date as INT! SQLServer will convert date to int and int to date easely, you will save 4 bytes per value, performance of BETWEEN will be much better. You will only need to convert your int's into datetime when return date value to the consumer which really need it (eg. GUI).
May 25, 2010 at 5:28 pm
Just to make sure...
I am not advising to store date in YYYYMMDD format (or other date format) as INT (that would be stupid).
Try SELECT CAST(GetDate() AS INT)
You will see the interesting value, it is a serialized value of date. That value should be stored!
So, you table can look like
CREATE TABLE SomeName
(
ColId INT,
SomeDate INT
)
Your query can be like this:
declare @iDtNow Int
set @iDtNow = CAST(CAST('20100101' as datetime) as int)
insert into SomeName select 1, @iDtNow
select * from SomeName where SomeDate = CAST('20100101' as datetime)
NOTE, the date should be converted into INT explicitly, but INT can be converted to datetime implicitly.
So, when inserting date into INT, it should be casted explicitly. But
"select * from SomeName where SomeDate = CAST('20100101' as datetime) " will work without explicit convertion of datetime to int. However, here is a trick! If you do this, each value in your INT column will be implicitly converted to date and you will not have any performance benefit. But if you convert date into INT before using it in the WHERE clause, INT will be used for operation giving you better preformance.
WARNING: This technique will only work when you need to operate with only DATE part of DATETIME. Be carfull with using GetDate() as it will be serialised into deifferent INT value depend on time of the day. Safe way of using it is:
select CAST(CAST(CONVERT(VARCHAR,GETDATE(),112) AS DATETIME) AS INT)
Formatting into ISO removes the time part of datetime...
May 26, 2010 at 7:22 am
Thanks for the ideas elutin. Unfortunately, since we allow our users to develop their own reports (as well as hundreds of reports developed by us) using Crystal Reports I'm stuck with the data types we have.
I can see how the change of data types would make the comparisons quicker, but I think the real problem is the number of reads being done by the query, both physical and logical. A lot of rows are being read and rejected to come out with the few that are needed.
I'm persuing an idea involving additional tables that came to me recently, but I'm going to keep it to myself rather than cloud the creative minds of the readers of this forum.
May 26, 2010 at 7:26 am
By the way, the datetimes are really just dates.
May 26, 2010 at 7:56 am
john.delahunt (5/26/2010)
...I can see how the change of data types would make the comparisons quicker, but I think the real problem is the number of reads being done by the query, both physical and logical. ...
Funny enough, the main reason for method I've advised to perform better is:
It will cause around 25-30% less of reads!
Just test the following:
create table TestDates_DT1
(
RecId INT NOT NULL IDENTITY(1,1)
,DTValue DATETIME
)
insert into TestDates_DT1 (DTValue)
select
CAST(
'19' + right(cast((cast(cast(NEWID() as varbinary) as bigint)) as varchar(30)),2)
+
REPLACE(
'0' + right(cast((cast(cast(NEWID() as varbinary) as bigint)) as varchar(30)),1)
+ '0' + right(cast((cast(cast(NEWID() as varbinary) as bigint)) as varchar(30)),1)
, '00','01')
as datetime)
from sys.columns sc1
cross join sys.columns sc2
SELECT RecId, CAST(DTValue as INT) as DTValue INTO TestDates_INT1 FROM TestDates_DT1
set statistics IO ON
select count(*) as reccount
from TestDates_DT1 t1
where DTValue between cast('19500101' as datetime) and cast('19700101' as datetime)
select count(*) as reccount
from TestDates_INT1 t1
where DTValue between cast(cast('19500101' as datetime) as int) and cast(cast('19700101' as datetime) as int)
And for your Crystal Report users there are few options:
1. Create views (or stored procs) and ask users to use them instead of tables (actually, I would not allow Crystal Developers to query my tables directly, who knows what kind of query they can write :-D)
2. Name your date column something like PayrollDateSerialized and ask them to use Crystal Report formula to convert the int to the date in report. At the end, the int value you have from SQL server for the date is nothing else than number of days since 01 Jan 1900!
3. At the end, you can duplicate you date columns as computed columns. You can use INTs in your queries for performance and Crystal devs can use datetime equivalent for reports.
This trick can be used for datawarehouse Dates dimension implementation. The serialised representation of the date is a perfect candidate for the "artificially natural" or "naturally artificial" key of the date dimension. The having such key in the fact tables, will allow , if needed, to get the date attributes without going into dimension itself.
May 26, 2010 at 8:20 am
I'm sure your test queries work much better, but they don't represent the essential element of my required query, which is that the given value is being compared against two different date columns. If written using 'between' (and assuming a given date of '01/01/2009') part of my requirement would be
select order_number where '01/01/2009' between order_received and order_filled
For any value of the given_date, SQL Server has to decide whether to use an index on order_received and one on order_filled. For values near the middle of the range of values, this means using the chosen index to scan about half of the entries. Making the indexes into covering indexes would eliminate the reads back into the data, but would not reduce the number of rows for which the non-indexed date must be examined.
Of course the part about NULL values in order_filled would have to be handled separately.
We do provide views to our clients, but the minute their requirements go outside the scope of the views, they simply link in whatever data they need. At this point our clients have hundreds of reports accumulated over many years. One client has nearly 900 reports! You don't simply tell clients that they have to change their reports without repurcussions which could include losing clients, which, for some reason, upsets senior management.
May 26, 2010 at 8:34 am
Using INTs (with same index structure) would reduce number of reads anyway.
However I agree, the changing legacy system is not a viable option in your case.
May 26, 2010 at 9:42 am
I created 2 test tables, one with datetime columns and the other with int columns and populated them with 9.5 million rows from a copy of a client database. Then I ran the query against each of the databases. The result was 237 logical reads when using datetimes and 238 when using ints. Either way it's much faster than it was against earlier version of either SQL Server or Sybase.
I'm dropping this issue now; performance is adequate.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply