What shall I do with large tables with most of rows expired?

  • Dear guys

    I'm not very good in sql server, but I should develop a database for a web site. My problem is with two tables of subscription and delivery.

    In subscription table, it's obvious that each row of subscription is expired after a period(say 1 month). Then, after a while, I will have a large table of expired lines and a few ongoing lines. how can I help server to find alive lines soon in between too many expired liens? which kind of index? my table contains a column called ExpirationDate for this purpose.

    my problem with delivery table is more or less the same as subscription(but more sever), because delivery records are expired after delivery is performed and after a short while I have too many delivered lines and find alive lines in between them.

    In fact, I can imagine of moving dead lines to another table, but at this time, I have not enough time and I'm not sure about final structure of my tables. Then managing such programs become hard.

    and about size of tables, say less than 2'000'000 lines for next few years. is this size worth of paying attention to such matters?

    Thanks a lot.

  • [font="Verdana"]You think right. Moving such records to History table is good approch, so that you can get active records quickly. Also go through the below URL, this might help you to design your DB.

    http://www.sqlskills.com/resources/Whitepapers/Partitioning%20in%20SQL%20Server%202005%20Beta%20II.htm#_Toc79339943

    Thanks,.

    Mahesh

    [/font]

    MH-09-AM-8694

  • When you say 2 million rows, do you mean, per year or total. If total, I wouldn't sweat migrating the data out of that table assuming you've got good indexes in place and if you don't have good indexes in place, getting good indexes will do more for you than moving data around. If you're talking 2 million rows a year, even then, you might not have any need to move data out of the table. It really depends on how well your queries are functioning with the data & indexes you have.

    Can you set up a test, load the table up with 10 million rows and verify that your queries still work? That's how I'd do it.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks guys. At this time, regarding time of project, I prefere going for Indexes.

    But there are two things that I don't know

    1- In my queries, there are more than 1 criteria to look for. one of them is subscription date. I don't know if sequence of writing criteria is Important or not? I'm worried if query searches for other criteria in whole table and at last searching for Indexed column in result.

    2-I'm going to use Non Clustered index on SubscriptionExpiration column because:

    2-1- There are many values in this field (many dates)

    2-2- Not much of data is retrived(Only one record in each query)

    2-3- This query is frequently used.

    Do you have any recomendations for me? And pleas accept my appologize for so basic questions, As I wrote befor, I'm new to sql server.

    Thanks in advance.

  • Unless you've got a very good reason not to, you should make sure there is a clustered index on the table. It's really hard to tell you which column not knowing the structure or your queries.

    As to additional indexes, again, it really depends on the queries being run against it. You'll know best what's there. I think you'll usually find that there is a common access path, a column or set of columns that is used in most queries against the table. These are the candidates for use in an index. Yes, order of columns does matter. You should have either the most frequently used or the most selective first (best if it's the most frequently used AND the most selective) and the others in order of use and selectivity. There isn't one perfect answer, and even the best answer today may not be tomorrow as the data changes. You'll need to perform tests to verify that you've picked the right index. Use the execution plans to ensure that the indexes are being used appropriately.

    That's the basics. For more detailed answers, you'll need to post queries and structures.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • [font="Verdana"]Grant, whatever you said earlier, regarding indexes, I am completely agrees with you. Using Indexes in proper manner, performance increases. However as Ahmedi said, each row get expire after some time. Means such rows can not be used in future, except for reporting kind of purposes. So still I believe, moving such records to history table(s) is a good approach. DBA needs to maintain them, that’s it. Here is the matter millions of records, which can keep increasing year by year. SQL Server has ability to handle such huge data, but why should increase a bit overhead on server? If Ahmedi has a system where he is concern only about live records, then just keeps those records which are alive and keep the rest in another, say history, table. And here Ahmed can create proper indexes only on alive data, so that he can get much better performance.

    Correct me, if I am going wrong.

    Thanks,

    Mahesh

    [/font]

    MH-09-AM-8694

  • No way, you're not wrong. It's just a question of trade-offs. Does having the "dead" rows affect general performance of the system? No? Then you have to weigh whether or not setting up the mechanism for cleaning them out is worth the trouble or not. It's going to be an individual decision based on factors such as database size, number of rows, how often you need to run maintenance, are there reports that will need to union between the two tables, etc. On the other hand, if the answer is, yes, then you just have to go & set up the maintenance routines to do the work.

    So I'm not saying you're wrong, not by any stretch of the imagination. I'm saying, it depends.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks alot

    I think History tables are necessary ,but I may take care of it after debugging the other parts of project. now I have enough things to think about. and I don't tink within first few months of my web site started, I have more than 100 subscribers.

    Because this site will have many frequent queries to run, later I will need to optimize queries and definetely one of the most important thing will be swaping dead rows. But for now, I think I have time for that.

    This is my subscriptions table:

    CREATE TABLE [dbo].[Subscriptions](

    [AutoNumber] [int] IDENTITY(1,1) NOT NULL,

    [UserName] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,

    [ChannelUniqueID] [int] NULL,

    [SubscriptionEnd] [datetime] NULL,

    [SubscriptionApproved] [bit] NULL CONSTRAINT [DF_Subscriptions_SubscriptionApproved] DEFAULT ((0)),

    [FeePaid] [bit] NULL CONSTRAINT [DF_Subscriptions_FeePaid] DEFAULT ((0)),

    [TotalPips] [int] NULL,

    CONSTRAINT [PK_Subscriptions] PRIMARY KEY CLUSTERED

    (

    [AutoNumber] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    and this is a sample of my queries; This will send emails for all subscribers of a news channel.

    SELECT @ChannelID= ChannelID, @CommandType=CommandType,@OpenTime=OpenTime , @CloseCode=CloseCode

    FROM CommandGroups

    WHERE (GroupUniqueID = @GroupID)--This is for creating message body. @OpenTime is used in next query

    /*The following is main query for finding email address of subscribers*/

    SELECT aspnet_Membership.Email

    FROM aspnet_Users INNER JOIN Subscriptions ON aspnet_Users.LoweredUserName = Subscriptions.UserName INNER JOIN aspnet_Membership ON aspnet_Users.UserId = aspnet_Membership.UserId

    WHERE (Subscriptions.ChannelUniqueID = @ChannelID) AND (Subscriptions.SubscriptionEnd > @OpenTime) AND (Subscriptions.SubscriptionApproved <> 0)

    AND (Subscriptions.FeePaid <> 0)

    OPEN abc

    FETCH NEXT FROM abc into @Emails

    WHILE @@FETCH_STATUS = 0

    BEGIN

    if Len(@Emails)=0

    set @Emails=@Email

    else

    set @Emails= @Emails + ';' + @Email

    if Len(@Emails)>3500

    begin

    EXEC msdb.dbo.sp_send_dbmail

    /*@profile_name = 'AdventureWorks Administrator',*/

    @blind_copy_recipients = @Emails,

    @body = @Message,

    @subject = @MailSubject;

    set @Emails='';

    end

    FETCH NEXT FROM abc into @Email

    END

    if Len(@Emails)>0

    begin

    EXEC msdb.dbo.sp_send_dbmail

    /*@profile_name = 'AdventureWorks Administrator',*/

    @blind_copy_recipients = @Emails,

    @body = @Message,

    @subject = @MailSubject;

    set @Emails='';

    end

    CLOSE abc

    DEALLOCATE abc

    May you have any recommendation for query order or index?

    thanks alot

  • It looks like your main index on your Subscriptions table should be:

    create index IDX_Subscriptions_Main

    on dbo.Subscriptions (UserName, ChannelUniqueID, SubscriptionEnd, SubscriptionApproved,

    FeePaid)

    That will cover what you need for that query. You might need to change the sequence of the columns in the query. Try it with SubscriptionEnd first and with UserName first and see which works best.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply