Re-design messaging table SQL Server 2008

  • I've recently built messaging functionality into my site, my current tables are designed as follows:

    This is basically the users sent box, when one user contacts another it first inserts a record into this table and then output's the inserted.id which is then inserted into MessageRecipient table

    CREATE TABLE [MailBox].[Message](

    [Id] [bigint] IDENTITY(1,1) NOT NULL,

    [SenderId] [bigint] NOT NULL, -- foreign key to user table

    [Message] [nvarchar](max) NOT NULL,

    [SentDate] [datetime] NOT NULL,

    [DeletedDate] [datetime] NULL,

    This one is basically the users inbox

    CREATE TABLE [MailBox].[MessageRecipient](

    [Id] [bigint] IDENTITY(1,1) NOT NULL,

    [MessageId] [bigint] NOT NULL, -- foreign key to table above

    [RecipientId] [bigint] NOT NULL, -- foreign key to user table

    [ReadDate] [datetime] NULL,

    [DeletedDate] [datetime] NULL,

    Now I'm not experiencing any issues with this, but from my experience just because something works doesn't mean its the correct way to do it. I'm curious to find out if theirs a better way of doing this? reason I ask is because my knowledge on SQL is limited and I've basically be crawling the web looking for examples and the above is what I've ended up with.

    These two tables may potentially grow which may cause issues with performance and also by the looks of it they don't really seem to be very flexible (I maybe thinking this because of my lack on knowledge on SQL)

    So I'm asking for some expert advice from a DBA or maybe someone similar in regards to how I should design such architecture for the messaging functionality.

    Examples would help me tremendously.

  • Do you need BIGINT for FKey to user? That's a lot of users! (and twice the width for each index entry / half the index keys per-index-page)

    What about NVarchar? I'm in UK, we have imposed Varchar on our users and never had a complaint yet that someone couldn't input the extended character they needed to. Users do enter some some foreign accents, but they are few and definitely but no Chinese characters etc., so your requirements may be very different? Varchar would halve your storage requirement, and I suspect speed things up, but obviously that would be useless if your users need extended character support.

    Probably nuts?!! but I suppose you could have both Varchar(MAX) and NVarchar(MAX) columns and only populate the NVarchar column if the message contains at least one extended character. Your display would be "one or the other" as they will never both be populated.

    Bit of a pain if you have Free Text Search functionality on email Messages

    I don't have any suggestions about your table design, looks OK to me. I might have done Message in one table and Sender/Recipient/Message IDs in another (with perhaps DeletedBySender and DeletedByRecipient columns)

    We have messages that are not sent by a user, per se. e.g.broadcast messages which are put in everyone's inbox and then each person can say "read" or "delete". Most such broadcast messages are deleted after some expiry date. I find it a bit easier to implement that with a Message Table and separate Inbox tables.

  • The design is basic, but it seems workable for very straightforward messaging.

    Just some possibilities, not necessarily better than what you have:

    1) I agree with an identity column on the Message table, but you should at least consider clustering that table on SentDate, or ( SentDate, Id ) if you want to explicitly specify a unique clustering key. If you do, you'd need a separate unique index on Id.

    2) I too would reduce the SenderId and RecipientId to int rather than bigint.

    3) I'd leave message as nvarchar, just in case.

    4) Strongly consider clustering the MessageRecipient table on ( MessageId, RecipientId ) rather than on Id. You could even consider dropping the identity column, but in this case I'd probably keep it.

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • I think your basic design is good, although (as others have suggested) I would take a serious look at how many users you're going to have on your system and consider making the Users table ID as an integer instead of a bigint. Then adjust your foreign keys accordingly. Given the volume you might get, I don't think a bigint as your ID on your Messages table is out of line at all. All in all, it looks solid.

    One thing you might not have considered yet are replies and forwards. If this is something you need to keep track of, you're going to need a ParentMessageID column in your Messages table with a foreign key to ID. You'll have to leave it nullable, however, to cover an original message.

    Edit: If there's any chance of users needing to enter Unicode (Chinese, Thai, etc.) then I'd leave them as nvarchar. If there's no chance, then varchar is fine.

  • First, consider giving surrogate key identifiers a unique names, and a broadly used standard is to use <tablename>ID (ie: MessageID instead of just ID). I'd recommend making this same identifier both the primary key (as opposed to what in this case?) and also clustered (if you're concerned about optimizing disk storage, efficient joins, and low maintenance). Also as suggested by others, use the 4 byte Int datatypes instead of 8 byte BigInt.

    It's my understanding that SQL Server doesn't compress LOB columns, so enabling page compression will probably not buy you anything. However, I'd also suggest turning on the table option 'large value types out of row'. What this means is that message data contained within your nvarchar(max) column will be contained in a separate storage space from your other key and header columns. When you think about how most email or messaging columns work, only the header columns are displayed until the user clicks on a specific message to read it, so keeping the LOB data physically separate optimizes the storage and retrieval of the more frequently used header data.

    https://technet.microsoft.com/en-us/library/ms189087(v=sql.105).aspx

    I've also suggested a couple of non-clustered indexes.

    Regarding nvarchar (Unicode) versus varchar, I'd recommend nvarchar for an application that supports free-form messages. For reference data, product descriptions, localized city names, and even customer names, you can use varchar.

    CREATE TABLE MailBox.Message

    (

    constraint PK_Message

    primary key clustered ( MessageId ),

    MessageId int IDENTITY(1,1) NOT NULL,

    SenderId bigint NOT NULL, -- foreign key to user table

    Message nvarchar(max) NOT NULL,

    SentDate datetime NOT NULL,

    DeletedDate datetime NULL

    );

    create index ix_Message_SenderId on MailBox.Message ( SenderId );

    exec sp_tableoption 'MailBox.Message', 'large value types out of row', 'ON';

    For the MailBox.MessageRecipient table, I don't think you really need a surrogate key here. A combination of MessageID + RecipientID would make a suitable primary key. I'm assuming your application won't send the exact same MessageID to the same RecipientID twice. A forward or reply, yes, but should be a new message.

    CREATE TABLE MailBox.MessageRecipient

    (

    constraint PK_MessageRecipient

    primary key clustered ( MessageId, RecipientId ),

    MessageId int NOT NULL, -- foreign key to table above

    RecipientId int NOT NULL, -- foreign key to user table

    ReadDate datetime NULL,

    DeletedDate datetime NULL

    );

    create index ix_MessageRecipient_RecipientId on MailBox.MessageRecipient ( RecipientId );

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Ed Wagner (9/17/2015)


    I think your basic design is good, although (as others have suggested) I would take a serious look at how many users you're going to have on your system and consider making the Users table ID as an integer instead of a bigint. Then adjust your foreign keys accordingly. Given the volume you might get, I don't think a bigint as your ID on your Messages table is out of line at all. All in all, it looks solid.

    One thing you might not have considered yet are replies and forwards. If this is something you need to keep track of, you're going to need a ParentMessageID column in your Messages table with a foreign key to ID. You'll have to leave it nullable, however, to cover an original message.

    Edit: If there's any chance of users needing to enter Unicode (Chinese, Thai, etc.) then I'd leave them as nvarchar. If there's no chance, then varchar is fine.

    That's the thing in regards to replies, I havent got a parent table so when the user replies to a email it follows the same process i.e

    First insert a record into [MailBox].[Message] then insert a record into [MailBox].[MessageRecipient] the way I keep track of a conversation is by passing in two user Id's and basically selecting all correspondents between those users.

    At this time we aren't supporting forwarding so its just a send and reply.

    @SSCRAZY

    For the MailBox.MessageRecipient table, I don't think you really need a surrogate key here. A combination of MessageID + RecipientID would make a suitable primary key. I'm assuming your application won't send the exact same MessageID to the same RecipientID twice. A forward or reply, yes, but should be a new message.

    When the user replies to an email it follows the same process i.e you sent a new message instead of saying it was a reply to a previous message, which is giving me doubts because I personally think I should have a third table that keeps track of a conversation between two users so when I reply I can check that table to see if a conversation exists if it does then I wont need to regenerate a messageId

    But this is where I got confused and ended up starting this thread.

    I will also change the SenderId and ReceipientId to intergers as I've just looked what the max value it for an integer and its 2,147,483,647. Am I going to get more then 2 billion users? no.

  • Eric M Russell (9/17/2015)


    I'd also suggest turning on the table option 'large value types out of row'.

    What about if majority of messages are small enough to fit in-row? (I don't know the answer, but I'm curious about that 🙂 )

    .Netter (9/17/2015)


    Am I going to get more then 2 billion users? no.

    How do you feel about 65K users? Chopping the UserID down to smallint will double the number of keys-per-index-page. TO get to 65k you would have to live with negative user IDs - unless you can live with a 32K UserID limit.

  • Kristen-173977 (9/17/2015)


    Eric M Russell (9/17/2015)


    I'd also suggest turning on the table option 'large value types out of row'.

    How do you feel about 65K users? Chopping the UserID down to smallint will double the number of keys-per-index-page. TO get to 65k you would have to live with negative user IDs - unless you can live with a 32K UserID limit.

    I'm not sure what you mean in regards to chopping the userId to smallint? I think a Integer would be more then sufficient

  • .Netter (9/17/2015)


    I'm not sure what you mean in regards to chopping the userId to smallint? I think a Integer would be more then sufficient

    If your User Base is less than 65,535 then you could use SMALLINT instead of INT. Its half the size - so you'd get twice as many keys on an index page. (But SMALLINT is from -32K to +32K, so either you live with only 32K users (all positive numbers) or you have to also allow negative numbers to get the full 64K - SQL won't care, of course, but the minus-signs can look a bit daft if they need to be materialised on any reports etc. Plus URLs can mangle hyphen and Emails have a habit of word-wrapping a "hyphen" (you can escape/encode it to prevent that, but that's yet-more-hassle ...)

  • Kristen-173977 (9/17/2015)


    .Netter (9/17/2015)


    I'm not sure what you mean in regards to chopping the userId to smallint? I think a Integer would be more then sufficient

    If your User Base is less than 65,535 then you could use SMALLINT instead of INT. Its half the size - so you'd get twice as many keys on an index page. (But SMALLINT is from -32K to +32K, so either you live with only 32K users (all positive numbers) or you have to also allow negative numbers to get the full 64K - SQL won't care, of course, but the minus-signs can look a bit daft if they need to be materialised on any reports etc. Plus URLs can mangle hyphen and Emails have a habit of word-wrapping a "hyphen" (you can escape/encode it to prevent that, but that's yet-more-hassle ...)

    Oh ok, well I'll be sticking with Integer purely because we may have loads of users sign up 🙂

  • Can I be on commission pls? :w00t:

  • ha ha, Ok you have lost me now.......

  • I think that's a good idea. 32K comes sooner than you think, especially if this is something that's going to be public.

  • Kristen-173977 (9/17/2015)


    Eric M Russell (9/17/2015)


    I'd also suggest turning on the table option 'large value types out of row'.

    What about if majority of messages are small enough to fit in-row? (I don't know the answer, but I'm curious about that 🙂 )

    .Netter (9/17/2015)


    Am I going to get more then 2 billion users? no.

    How do you feel about 65K users? Chopping the UserID down to smallint will double the number of keys-per-index-page. TO get to 65k you would have to live with negative user IDs - unless you can live with a 32K UserID limit.

    Even if most messages are < 8 KB (max page size), I'd still probably want to contain message text off-row. My thinking is that you would typically want to optimize performance of retrieving and displaying message headers linked to a specific user. However, it depends on the case usage of the application.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Eric M Russell (9/18/2015)


    My thinking is that you would typically want to optimize performance of retrieving and displaying message headers linked to a specific user. However, it depends on the case usage of the application.

    Ah ... I've got it now (you didn't say anything different I must have just needed to hear it twice!!)

    Perhaps a short Subject column (which would be in-row) would achieve that (otherwise its just some IDs I think ...)

    In fact I'm not sure that a message system, in the style of EMail, is much use without a subject - all you can do is READ NEXT rather than being able to scan down a list looking for interesting ones, or "the one you read earlier which you now want to reply to" and probably a host of other actions.

    Our message system is more of a read-and-delete, rather than (EMail style) Read, Reply, Forward, File ... and just maybe: Delete)

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply