September 30, 2010 at 5:32 am
I have a table that currently has 40,000,000 rows (206 GB data space, 6.9 GB index space). This table holds text messages the system receives. It stores some key information about each message and then the text of the message. These messages are almost all less than 1 MB. The table is constantly being inserted into (usually around 500,000 new messages per day). We only keep the most recent 120 days.
We are now trying to do some reporting against this table and the reporting query performance is poor.
All the data is in the same filegroup. We generally won't need to query to retrieve the message text itself, and if we do, we should be able to make use of an index.
Is there a better way to store these messages? Any thoughts? I've looked at FILESTREAM, but because the messages are less than 1 MB it doesn't appear to be a good fit. Should I store the messages in a separate table and join using a key? Maybe in another filegroup?
The table currently has no key - I've been told there really isn't a unique key to use ...
CREATE TABLE [dbo].[MessageLog](
[uuid] [varchar](50) NOT NULL,
[message_id] [varchar](50) NOT NULL,
[message_version] [varchar](20) NOT NULL,
[message_type] [varchar](20) NOT NULL,
[direction] [varchar](3) NOT NULL,
[processing_code] [char](1) NOT NULL,
[event_time] [datetime] NOT NULL,
[sending_app] [varchar](45) NULL,
[sending_facility] [varchar](45) NULL,
[receiving_app] [varchar](45) NULL,
[receiving_facility] [varchar](45) NULL,
[person_id] [varchar](45) NULL,
[person_last_name] [varchar](45) NULL,
[person_first_name] [varchar](45) NULL,
[person_dob] [varchar](45) NULL,
[path] [varchar](255) NOT NULL,
[service_name] [varchar](45) NOT NULL,
[acknowledged] [nvarchar](max) NULL,
[message] [nvarchar](max) NOT NULL
) ON [PRIMARY]
September 30, 2010 at 5:39 am
What is the search criteria for the reports? Are there any joins?
Which column(s) are the indexes on?
I don't see a PK declared - is there one?
Is there a clustered index, if so which column(s)?
September 30, 2010 at 6:11 am
Please post ...
1- Available indexes
2- Predicate, a.k.a. the "where clause", of the offending queries.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
September 30, 2010 at 6:21 am
There is no primary key.
There are non-clustered indexes on uuid, service_name, event_time, person_last_name and person_id.
Basically any report that causes a full table scan takes a long time (around an hour last time I checked).
The reports are general reports, e.g. a count of messages by service name where the rows also meet another condition on a field without an index.
I was thinking that I might need to separate the message text out of this table so the reports don't read the message field. Or is the query slow because of the volume of inserts?
September 30, 2010 at 6:29 am
Personally, based on what you're describing, I'd look into partitioning with a rolling window. You can separate the data out by date into different partitions on different filegroups - break it up into every... 15 days as a starting point for experimentation. This does a bunch of stuff for you. First, you get a clustered index on the table, which just about every single table should have (exceptions should be exceptional). Next, you get an easy way to remove large amounts of data without having to run delete statements on the system. Finally, data access is split up amongst multiple drives and/or filegroups, which ought to speed things up considerably.
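For illustration, a minimal sketch of the kind of partition function and scheme this involves (the boundary dates, the 15-day interval, and the filegroup names FG1 through FG4 are placeholders, not anything taken from this thread):

-- Sketch only: range-right partition function on event_time with 15-day boundaries.
-- A real rolling window would SPLIT/MERGE boundaries and SWITCH out expired partitions.
CREATE PARTITION FUNCTION pfMessageLogByDate (datetime)
AS RANGE RIGHT FOR VALUES ('2010-09-01', '2010-09-16', '2010-10-01');

CREATE PARTITION SCHEME psMessageLogByDate
AS PARTITION pfMessageLogByDate TO ([FG1], [FG2], [FG3], [FG4]);

-- The table (or its clustered index) would then be built on the scheme, e.g.:
-- CREATE CLUSTERED INDEX CIX_MessageLog_event_time
--     ON dbo.MessageLog (event_time) ON psMessageLogByDate (event_time);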
But, assuming you can't do that, you need to assess the most common access path for the select statements (which columns are used in the WHERE clause) and make that the clustered index, depending on what the columns are, of course. From there, ensure that your queries are well constructed, sargable, etc. - the whole standard set of tuning operations.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 30, 2010 at 6:34 am
jackimo (9/30/2010)
There is no primary key. There are non-clustered indexes on uuid, service_name, event_time, person_last_name and person_id.
Basically any report that causes a full table scan takes a long time (around an hour last time I checked).
The reports are general reports, e.g. a count of messages by service name where the rows also meet another condition on a field without an index.
I was thinking that I might need to separate the message text out of this table so the reports don't read the message field. Or is the query slow because of the volume of inserts?
Hard to help when you don't help us to help you 😀 Why do you refuse to post the offending "where clause"?
Going back to the "reporting" issue which I understand is the core issue here...
-- 40M rows is not a large table
-- I bet a buck the indexing strategy is not designed to serve the offending reporting query.
Look at the offending query's predicate, look at the cardinality of each column used either for access or filtering, and look at the available indexes.
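For example, a quick cardinality check along these lines (just a sketch, using the column names from the posted DDL; note the query itself scans the table, so run it off-hours or against a restored copy):

-- Sketch: how selective would an index on each predicate column be?
SELECT  COUNT(*)                        AS total_rows,
        COUNT(DISTINCT service_name)    AS distinct_service_names,
        COUNT(DISTINCT sending_app)     AS distinct_sending_apps,
        COUNT(DISTINCT message_type)    AS distinct_message_types
FROM    dbo.MessageLog;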
Hope this helps.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
September 30, 2010 at 7:08 am
What is the total number of records returned by the report?
I agree with the above, in that it's doubtful that your indexing (and maintenance) is set up for the reporting queries, which it should be. It's definitely not that big a table, and not very wide either, so it should be workable. Although, personally, I hate to see reports run against a PROD system :hehe:
I would also look at the statistics for this table and find out when they were last updated.
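Something along these lines will show that (a sketch, assuming the table name from the posted DDL):

-- Sketch: last update date for every statistics object on the table.
SELECT  s.name                               AS stats_name,
        STATS_DATE(s.object_id, s.stats_id)  AS last_updated
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID('dbo.MessageLog');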
Do you have 5 indexes? One on each of those fields?
I would suggest a clustered index... criteria for a CI are:
* narrow
* unchanging
* ever increasing
* unique
WHERE statement would be good 😉
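As a rough illustration of those criteria against this table (the surrogate column name "id" is made up, and whether event_time is the right leading key depends on the reports):

-- Sketch only: add a narrow, ever-increasing surrogate key and cluster on
-- (event_time, id) so date-range reports and the 120-day purge both benefit.
-- NB: adding a column and building a clustered index on a 200+ GB table is a
-- long, heavily logged operation - test it somewhere other than production.
ALTER TABLE dbo.MessageLog ADD id bigint IDENTITY(1,1) NOT NULL;

CREATE UNIQUE CLUSTERED INDEX CIX_MessageLog_event_time_id
    ON dbo.MessageLog (event_time, id);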
September 30, 2010 at 8:57 am
I'm not trying to hide the query (I just didn't think it would help) - so here it is:
SELECT
b.sending_app,
b.service_name,
b.event_time,
b.message_type,
b.receiving_app,
b.direction
FROM [Message_Journal].[dbo].[Journal] b
WHERE CONVERT(DATETIME,LEFT(CONVERT(VARCHAR, b.event_time, 120),10)) >= '20100901'
AND CONVERT(DATETIME,LEFT(CONVERT(VARCHAR, b.event_time, 120),10)) <= '20100930'
AND (b.sending_app = 'MAD'
OR b.sending_app = 'VS'
OR b.service_name = 'svcFrom_TEST_V3_MRI'
OR (b.service_name = 'svcFrom_TEST_V3_ADT_MRI' AND
( b.message_type = 'PRPA_IN101204CA' OR b.message_type = 'PRPA_IN101201CA'))
OR b.service_name = 'svcFrom_STAR_ADT'
OR b.service_name = 'svcOutToV3'
OR (b.service_name LIKE 'svcFromCR%' OR b.service_name = 'svcXlateFromCR') -- CR Outbound
)
GROUP BY
b.sending_app, b.service_name, b.event_time, b.message_type, b.receiving_app, b.direction
Statistics were updated 4 days ago. Thanks for all the posts! (and assistance)!
September 30, 2010 at 9:12 am
And there's a table scan:
CONVERT(DATETIME,LEFT(CONVERT(VARCHAR, b.event_time, 120),10))
Having a function on the column means that SQL Server cannot know what the value is, and must scan the available values to find a match. This is known as a non-sargable search argument.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 30, 2010 at 9:56 am
Thanks Grant,
I do understand that, and the fact is that in some cases we'll have to do a full scan. Do you think that storing the message in a separate table (in another filegroup) would help with performance?
Thanks for all the help!
September 30, 2010 at 10:28 am
Really bad idea:
WHERE
CONVERT(DATETIME,LEFT(CONVERT(VARCHAR, b.event_time, 120),10)) >= '20100901'
and
CONVERT(DATETIME,LEFT(CONVERT(VARCHAR, b.event_time, 120),10)) <= '20100930'
This will be much better if you have an index on event_time, because it allows SQL Server to use the index:
where
b.event_time >= '20100901' and
b.event_time < '20101001'
September 30, 2010 at 10:46 am
Michael, that doesn't work for me because the column is a datetime column, so I need a conversion to get only the date - unless I'm missing something?
i.e. the date is 2010-08-18 05:30:36.133
September 30, 2010 at 11:17 am
:blink: I'm a little confused here.
The first post identifies the table as [dbo].[MessageLog] - DDL was provided.
The query is referencing table [Message_Journal].[dbo].[Journal].
Are both the same?
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
September 30, 2010 at 11:22 am
Sorry to confuse you - yes both are the same. I'm changing between environments and even confusing myself. Thanks!
September 30, 2010 at 11:30 am
jackimo (9/30/2010)
Michael, that doesn't work for me because the column is a datetime column, so I need a conversion to get only the date - unless I'm missing something? i.e. the date is 2010-08-18 05:30:36.133
The example you showed would select everything for the month of September, 2010.
So will my query, just more efficiently.
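If it helps, a covering nonclustered index along these lines would let that half-open date range seek instead of scan (the index name and INCLUDE list are just a guess at what the report needs, not something from this thread; the existing event_time index could also simply be rebuilt with these included columns):

-- Sketch: lead on event_time so the rewritten range predicate can seek, and
-- INCLUDE the remaining report columns to avoid key lookups.
CREATE NONCLUSTERED INDEX IX_MessageLog_event_time_report
    ON dbo.MessageLog (event_time)
    INCLUDE (sending_app, service_name, message_type, receiving_app, direction);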