September 30, 2010 at 5:32 am
I have a table that currently has 40,000,000 rows (206 GB data space, 6.9 GB index space). This table holds text messages the system receives. It stores some key information about each message and then the text of the message. These messages are almost all less than 1 MB. The table is constantly being inserted into (usually around 500,000 new messages per day). We only keep the most recent 120 days.
We are now trying to do some reporting against this table and the reporting query performance is poor.
All the data is in the same filegroup. We generally won't need to query to retrieve the message text itself, and if we do, we should be able to make use of an index.
Is there a better way to store these messages? Any thoughts? I've looked at FILESTREAM, but because the messages are less than 1 MB it doesn't appear to be a good fit. Should I store the messages in a separate table and join using a key? Maybe in another filegroup?
The table currently has no key - I've been told there really isn't a unique key to use ...
CREATE TABLE [dbo].[MessageLog](
[uuid] [varchar](50) NOT NULL,
[message_id] [varchar](50) NOT NULL,
[message_version] [varchar](20) NOT NULL,
[message_type] [varchar](20) NOT NULL,
[direction] [varchar](3) NOT NULL,
[processing_code] [char](1) NOT NULL,
[event_time] [datetime] NOT NULL,
[sending_app] [varchar](45) NULL,
[sending_facility] [varchar](45) NULL,
[receiving_app] [varchar](45) NULL,
[receiving_facility] [varchar](45) NULL,
[person_id] [varchar](45) NULL,
[person_last_name] [varchar](45) NULL,
[person_first_name] [varchar](45) NULL,
[person_dob] [varchar](45) NULL,
[path] [varchar](255) NOT NULL,
[service_name] [varchar](45) NOT NULL,
[acknowledged] [nvarchar](max) NULL,
[message] [nvarchar](max) NOT NULL
) ON [PRIMARY]
September 30, 2010 at 5:39 am
What is the search criteria for the reports? Are there any joins?
Which column(s) are the indexes on?
I don't see a PK declared - is there one?
Is there a clustered index, if so which column(s)?
September 30, 2010 at 6:11 am
Please post ...
1- Available indexes
2- Predicate, a.k.a. the "where clause", of the offending queries.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
September 30, 2010 at 6:21 am
There is no primary key.
There are non-clustered indexes on uuid, service_name, event_time, person_last_name and person_id.
Basically any report that causes a full table scan takes a long time (around an hour last time I checked).
The reports are general reports, e.g. a count of messages by service name where the rows also meet another condition on a field without an index.
I was thinking that I might need to separate the message text out of this table so the reports don't read the message field. Or is the query slow because of the volume of inserts?
September 30, 2010 at 6:29 am
Personally, based on what you're describing, I'd look into partitioning with a rolling window. You can separate the data out by date into different partitions on different filegroups - break it up into every... 15 days as a starting point for experimentation. This does a bunch of stuff for you. First, you get a clustered index on the table, which just about every single table should have (exceptions should be exceptional). Next, you get an easy way to remove large amounts of data without having to run delete statements on the system. Finally, data access is split up amongst multiple drives and/or filegroups, which ought to speed things up considerably.
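For illustration, a minimal sketch of the kind of partition function and scheme this involves (the boundary dates, the 15-day interval, and the filegroup names FG1 through FG4 are placeholders, not anything taken from this thread):

-- Sketch only: range-right partition function on event_time with 15-day boundaries.
-- A real rolling window would SPLIT/MERGE boundaries and SWITCH out expired partitions.
CREATE PARTITION FUNCTION pfMessageLogByDate (datetime)
AS RANGE RIGHT FOR VALUES ('2010-09-01', '2010-09-16', '2010-10-01');

CREATE PARTITION SCHEME psMessageLogByDate
AS PARTITION pfMessageLogByDate TO ([FG1], [FG2], [FG3], [FG4]);

-- The table (or its clustered index) would then be built on the scheme, e.g.:
-- CREATE CLUSTERED INDEX CIX_MessageLog_event_time
--     ON dbo.MessageLog (event_time) ON psMessageLogByDate (event_time);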
But, assuming you can't do that, you need to assess the most common access path for the select statements (which columns are used in the WHERE clause) and make that the clustered index, depending on what the columns are, of course. From there, ensure that your queries are well constructed, sargable, etc. - the whole standard set of tuning operations.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 30, 2010 at 6:34 am
jackimo (9/30/2010)
There is no primary key. There are non-clustered indexes on uuid, service_name, event_time, person_last_name and person_id.
Basically any report that causes a full table scan takes a long time (around an hour last time I checked).
The reports are general reports, e.g. a count of messages by service name where the rows also meet another condition on a field without an index.
I was thinking that I might need to separate the message text out of this table so the reports don't read the message field. Or is the query slow because of the volume of inserts?
Hard to help when you don't help us to help you 😀 Why do you refuse to post the offending "where clause"?
Going back to the "reporting" issue which I understand is the core issue here...
-- 40M rows is not a large table
-- I bet a buck the indexing strategy is not designed to serve the offending reporting query.
Look at the offending query's predicate, look at the cardinality of each column used either for access or filtering, and look at the available indexes.
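For example, a quick cardinality check along these lines (just a sketch, using the column names from the posted DDL; note the query itself scans the table, so run it off-hours or against a restored copy):

-- Sketch: how selective would an index on each predicate column be?
SELECT  COUNT(*)                        AS total_rows,
        COUNT(DISTINCT service_name)    AS distinct_service_names,
        COUNT(DISTINCT sending_app)     AS distinct_sending_apps,
        COUNT(DISTINCT message_type)    AS distinct_message_types
FROM    dbo.MessageLog;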
Hope this helps.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
September 30, 2010 at 7:08 am
What is the total number of records returned by the report?
I agree with the above, in that it's doubtful that your indexing (and maintenance) is set up for the reporting queries, which it should be. It's definitely not that big a table, and not very wide either, so it should be workable. Although, personally, I hate to see reports run against a PROD system :hehe:
I would also look at the statistics for this table and find out when they were last updated.
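Something along these lines will show that (a sketch, assuming the table name from the posted DDL):

-- Sketch: last update date for every statistics object on the table.
SELECT  s.name                               AS stats_name,
        STATS_DATE(s.object_id, s.stats_id)  AS last_updated
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID('dbo.MessageLog');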
Do you have 5 indexes? One on each of those fields?
I would suggest a clustered index... criteria for a CI are:
* narrow
* unchanging
* ever increasing
* unique
WHERE statement would be good 😉
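As a rough illustration of those criteria against this table (the surrogate column name "id" is made up, and whether event_time is the right leading key depends on the reports):

-- Sketch only: add a narrow, ever-increasing surrogate key and cluster on
-- (event_time, id) so date-range reports and the 120-day purge both benefit.
-- NB: adding a column and building a clustered index on a 200+ GB table is a
-- long, heavily logged operation - test it somewhere other than production.
ALTER TABLE dbo.MessageLog ADD id bigint IDENTITY(1,1) NOT NULL;

CREATE UNIQUE CLUSTERED INDEX CIX_MessageLog_event_time_id
    ON dbo.MessageLog (event_time, id);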
September 30, 2010 at 8:57 am
I'm not trying to hide the query (I just didn't think it would help) - so here it is:
SELECT
b.sending_app,
b.service_name,
b.event_time,
b.message_type,
b.receiving_app,
b.direction
FROM [Message_Journal].[dbo].[Journal] b
WHERE CONVERT(DATETIME,LEFT(CONVERT(VARCHAR, b.event_time, 120),10)) >= '20100901'
AND CONVERT(DATETIME,LEFT(CONVERT(VARCHAR, b.event_time, 120),10)) <= '20100930'
AND (b.sending_app = 'MAD'
OR b.sending_app = 'VS'
OR b.service_name = 'svcFrom_TEST_V3_MRI'
OR (b.service_name = 'svcFrom_TEST_V3_ADT_MRI' AND
( b.message_type = 'PRPA_IN101204CA' OR b.message_type = 'PRPA_IN101201CA'))
OR b.service_name = 'svcFrom_STAR_ADT'
OR b.service_name = 'svcOutToV3'
OR (b.service_name LIKE 'svcFromCR%' OR b.service_name = 'svcXlateFromCR') -- CR Outbound
)
GROUP BY
b.sending_app, b.service_name, b.event_time, b.message_type, b.receiving_app, b.direction
Statistics were updated 4 days ago. Thanks for all the posts! (and assistance)!
September 30, 2010 at 9:12 am
And there's a table scan:
CONVERT(DATETIME,LEFT(CONVERT(VARCHAR, b.event_time, 120),10))
Having a function on the column means that SQL Server cannot know what the value is, and must scan the available values to find a match. This is known as a non-sargable search argument.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 30, 2010 at 9:56 am
Thanks Grant,
I do understand that, and the fact is that in some cases we'll have to do a full scan. Do you think that storing the message in a separate table (in another filegroup) would help with performance?
Thanks for all the help!
September 30, 2010 at 10:28 am
Really bad idea:
WHERE
CONVERT(DATETIME,LEFT(CONVERT(VARCHAR, b.event_time, 120),10)) >= '20100901'
and
CONVERT(DATETIME,LEFT(CONVERT(VARCHAR, b.event_time, 120),10)) <= '20100930'
This will be much better if you have an index on event_time, because it allows SQL Server to use the index:
where
b.event_time >= '20100901' and
b.event_time < '20101001'
September 30, 2010 at 10:46 am
Michael, that doesn't work for me because the column is a datetime column, so I need a conversion to get only the date - unless I'm missing something?
i.e. the date is 2010-08-18 05:30:36.133
September 30, 2010 at 11:17 am
:blink: I'm a little confused here.
The first post identifies the table as [dbo].[MessageLog] - DDL was provided.
The query is referencing table [Message_Journal].[dbo].[Journal].
Are both the same?
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment, the advisor should not be held liable or responsible for any actions taken based on the given advice.
September 30, 2010 at 11:22 am
Sorry to confuse you - yes both are the same. I'm changing between environments and even confusing myself. Thanks!
September 30, 2010 at 11:30 am
jackimo (9/30/2010)
Michael, that doesn't work for me because the column is a datetime column, so I need a conversion to get only the date - unless I'm missing something? i.e. the date is 2010-08-18 05:30:36.133
The example you showed would select everything for the month of September, 2010.
So will my query, just more efficiently.
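If it helps, a covering nonclustered index along these lines would let that half-open date range seek instead of scan (the index name and INCLUDE list are just a guess at what the report needs, not something from this thread; the existing event_time index could also simply be rebuilt with these included columns):

-- Sketch: lead on event_time so the rewritten range predicate can seek, and
-- INCLUDE the remaining report columns to avoid key lookups.
CREATE NONCLUSTERED INDEX IX_MessageLog_event_time_report
    ON dbo.MessageLog (event_time)
    INCLUDE (sending_app, service_name, message_type, receiving_app, direction);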