Speed up big database - MSSQL 2005

  • Hi All,

    I'm working with a 150GB database. There is only one table in it, with 427 columns and more than 175 million records. Each row looks like this:

    77084,6495,0018,M,0,,1,91835,3360,3,1,540600,1,007,48,201,001008,064,TX,,N,0000000000,S,1,N,F,0027A....

    When I run a "select count(*) from table" query, it takes more than 2 hours, and it seems all queries take a similarly long time. I tried creating indexes on some columns but the speed is no better.

    I don't know what the problem is. Is there any way to speed up this DB?

    I'm running Windows Server 2003 with 2 CPUs (2.13GHz), 4GB of RAM, and MS SQL 2005.

    Thanks for any help.

    Eric.

  • For SQL Server 2005 on Windows Server 2003, you do not have enough memory or processors.

    Also, when you do a SELECT COUNT(*) you are scanning the whole table, so changing the indexes around will not help.

    Have you checked to see if you are being blocked when you are running the count?
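
    For example, something along these lines (a sketch using the SQL 2005 DMVs) will show any requests that are currently blocked:

    SELECT session_id, blocking_session_id, wait_type, wait_time
    FROM sys.dm_exec_requests
    WHERE blocking_session_id <> 0 -- nonzero means another session is blocking this one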

  • Hi JDixon,

    Thanks for the reply.

    Running the count was just a sample query. What should I do in this case? Increase memory or buy new CPUs?

  • As a sidenote, to get a count much faster than select count(*):

    DBCC UPDATEUSAGE (0) WITH COUNT_ROWS, NO_INFOMSGS

    SELECT OBJECT_NAME(ID) AS TableName, [Rows]
    FROM dbo.SysIndexes
    WHERE IndID IN (0,1)
    AND OBJECTPROPERTY(ID, 'IsTable') = 1
    ORDER BY TableName

    Performance is much more related to the execution plans of specific slow-running queries. While you can throw hardware at the problem, analysing execution plans and making adjustments will have a larger impact than a hardware upgrade.

    Can you show us the CREATE TABLE for the table? (At 400+ columns, attach it, don't paste it into the forum.)

    Also, what indexes are on the table already?

    Do you regularly update statistics on the table?

    Can you give us an example of a slow-running query? Can you attach its execution plan as a .sqlplan file so we can see it?
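
    For reference, the index and statistics questions can be answered quickly like this (dbo.BigTable is just a placeholder for your table name):

    EXEC sp_helpindex 'dbo.BigTable' -- lists the indexes that already exist on the table
    UPDATE STATISTICS dbo.BigTable WITH FULLSCAN -- full-scan statistics refresh; slow on 175M rows, but thorough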

    Lowell


    --Help us help you! If you post a question, make sure you include a CREATE TABLE... statement and an INSERT INTO... statement for that table, to give the volunteers here representative data. With your description of the problem, we can provide a tested, verifiable solution to your question! Asking the question the right way gets you a tested answer the fastest way possible!

  • Why on earth do you have a table with 400+ columns?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Lowell (10/10/2009)


    As a sidenote, to get a count much faster than select count(*):

    DBCC UPDATEUSAGE (0) WITH COUNT_ROWS, NO_INFOMSGS

    SELECT OBJECT_NAME(ID) AS TableName, [Rows]
    FROM dbo.SysIndexes
    WHERE IndID IN (0,1)
    AND OBJECTPROPERTY(ID, 'IsTable') = 1
    ORDER BY TableName

    DBCC UPDATEUSAGE should not be required on SQL 2005. The bugs that affected the metadata are (mostly) fixed.

    Also, on 2005 you can use sys.partitions instead of sysindexes (which is deprecated). Sys.partitions should be very close to accurate, though not guaranteed 100% accurate.

    SELECT OBJECT_NAME(object_id) AS TableName, SUM(rows) AS TotalRows
    FROM sys.partitions
    WHERE index_id IN (0,1)
    AND OBJECTPROPERTY(object_id, 'IsUserTable') = 1 -- remove this filter if you want to see the system tables too
    GROUP BY object_id

    The SUM is there for partitioned tables, which will have more than one entry in sys.partitions for the heap/cluster.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • JDixon-385826 (10/10/2009)


    Also, when you do a SELECT COUNT(*) you are scanning the whole table, so changing the indexes around will not help.

    COUNT(*) will scan the smallest (most efficient) index that exists on the table. It will only scan the cluster/heap if there are no NC indexes, or if all the NC indexes are less efficient (massive and/or badly fragmented).
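
    For illustration (hypothetical names): even a single narrow nonclustered index gives COUNT(*) a much smaller structure to scan than a 427-column heap/cluster:

    CREATE NONCLUSTERED INDEX IX_BigTable_Narrow ON dbo.BigTable (SomeSmallColumn)
    SELECT COUNT(*) FROM dbo.BigTable -- the optimizer can now scan the narrow index instead of the whole table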

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Please see my attachment for the table details. I tried running a "select distinct STATE_3038 from table" query and it took more than 20 minutes to complete. A query like "select * from table where STATE_3038 = 'FL'" also takes a long time.
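
    For reference, both of those queries could be served by a single narrow index on that column. A sketch, with dbo.BigTable standing in for the real table name:

    CREATE NONCLUSTERED INDEX IX_BigTable_State ON dbo.BigTable (STATE_3038)
    -- the DISTINCT query and the WHERE STATE_3038 = 'FL' query can both use this index instead of the table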

    Should we create many indexes (around 10) on one table?

    Thanks for the reply.

  • You have two options:

    1) Redesign your database - use normalization.

    2) Buy much faster hardware.

    There is one more option: maybe indexed views could help with this problem... but in my opinion you should redesign this database.
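
    For example, an indexed view along these lines (a sketch, with hypothetical object names) materializes per-state counts so queries don't have to recompute them from the 175-million-row table:

    CREATE VIEW dbo.vStateCounts WITH SCHEMABINDING AS
    SELECT STATE_3038, COUNT_BIG(*) AS RowCnt -- COUNT_BIG(*) is required in an indexed view with GROUP BY
    FROM dbo.BigTable
    GROUP BY STATE_3038
    GO
    CREATE UNIQUE CLUSTERED INDEX IX_vStateCounts ON dbo.vStateCounts (STATE_3038)

    Note that outside Enterprise Edition you would query the view WITH (NOEXPAND) for the optimizer to use that index.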

  • Agreed. The only way you will get better performance from your current setup is to upgrade the hardware. If I had to pick between memory and CPU, I would say memory: after the first execution, the data should still be in cache, so the second should run faster.

    I do not think you will get anything like a good execution plan for any query hitting a table with over 400 columns.

    You have a real issue that requires real solutions.

    The best idea would be to increase CPU and memory and, if possible, upgrade to SQL 2008 and use page compression so you can fit more data in memory. You would need a lot more memory and CPU, but you would see better performance in the long run (after the first execution). You might also benefit from filtered indexes.
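
    Both of those are SQL 2008 features. A sketch, with hypothetical object names:

    ALTER TABLE dbo.BigTable REBUILD WITH (DATA_COMPRESSION = PAGE) -- page compression: more rows per page, so more of the table fits in cache
    CREATE NONCLUSTERED INDEX IX_BigTable_TX ON dbo.BigTable (STATE_3038)
    WHERE STATE_3038 = 'TX' -- filtered index: only indexes the slice of rows you actually query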

    Also, there are ways to make a query or sproc better: by following best practices you can make sure you get the best and most reusable execution plan. (That being said, with your limited memory you most likely do not keep cached execution plans, or data in cache, for very long.)

    In short: new hardware, and normalize the data (break it into multiple tables and use a view for display).

    Also, it is not unheard of for total index size to be more than the actual data. If they are needed for performance and do not degrade insert performance, then yes, I would add more indexes.

  • I regularly receive a 150GB (maybe more) flat file from a customer, and it takes me 2 or 3 days to import it into the DB. In that case, is redesigning the DB a good solution? And how would we import the file if we redesign the DB?

  • You could stage the data in the one wide table but then break it out from there for performance. It really depends on how you are using the data, too. Perhaps it would be more beneficial to create summary tables or something similar. How is the data used?
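
    For example (hypothetical names), a small summary table rebuilt once per load can answer the per-state questions without touching the wide table again:

    SELECT STATE_3038, COUNT_BIG(*) AS RowCnt
    INTO dbo.StateSummary -- tiny table; drop and rebuild it after each import
    FROM dbo.StagingWide
    GROUP BY STATE_3038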

  • Eric1208 (10/11/2009)


    I regularly receive a 150GB (maybe more) flat file from a customer, and it takes me 2 or 3 days to import it into the DB.

    2-3 days to import 150GB? That's an exceptionally long time. How are you importing it?

    In that case, is redesigning the DB a good solution? And how would we import the file if we redesign the DB?

    Normalisation is good, and massive single tables are, as you've noticed, often bad. Without knowing how you're currently doing the import, I can't suggest a replacement. Have you looked at SSIS, though?
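
    For comparison, one common fast-load pattern is a plain BULK INSERT of the delimited file into a staging table (hypothetical path and names; SSIS's bulk-load destination works along the same lines):

    BULK INSERT dbo.StagingWide
    FROM 'D:\import\customer_file.txt' -- hypothetical file location
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK, BATCHSIZE = 100000)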

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • No matter how you put it, 400+ columns in a table is flat out bad design.

    There is also no reason for it to take that long to import the data.

    Why are you not using SSIS to do it?

    It does not matter how you get the data. They are sending it to you for a reason. What are you doing with it?

    Why are you doing a COUNT(*)? If you want to know how many records there are, go to the properties of the table and let that tell you.
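
    That properties count comes from the same metadata that sp_spaceused reads, so this returns almost instantly (dbo.BigTable is a placeholder):

    EXEC sp_spaceused 'dbo.BigTable' -- rows, reserved, data and index sizes, without scanning the table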

  • 2 words:

    Slow Disk.

    Disk I/O is almost always the bottleneck in a database.

    Your system could use a little more RAM, too; that would help. Reducing demand is also a good solution.

    To get some facts about the supply of I/O in your system, check perfmon for acceptable disk queue lengths, and check page life expectancy for SQL Server memory, while you're running your imports. You might see some alarming numbers.
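
    You can also read page life expectancy straight from the DMVs instead of perfmon, for example:

    SELECT cntr_value AS page_life_expectancy_seconds
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy'
    AND object_name LIKE '%Buffer Manager%' -- low values under load mean the buffer pool is being churned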

    You can reduce I/O demand by making good index choices and good read/write choices.

    For example: if your database's data and log files are both on a RAID 5 D: drive, don't also put the 150GB import file on D:, or you'll be doing all your reading and writing from the same disks.

    Secondly, use the simple recovery model on the database so the log doesn't have to hold the whole import transaction.

    Thirdly, don't DELETE every row from the table; use TRUNCATE TABLE instead.

    Fourthly, drop the indexes before importing and recreate them after the load is finished; that way your indexes don't start out completely fragmented, and you're not writing more than you need to.
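
    Putting those last few points together, the load pattern looks roughly like this (hypothetical names):

    ALTER DATABASE BigDb SET RECOVERY SIMPLE -- keep the log from holding the entire import
    TRUNCATE TABLE dbo.BigTable -- instead of DELETE
    DROP INDEX IX_BigTable_State ON dbo.BigTable -- drop nonclustered indexes before the load
    -- ... bulk load the flat file here ...
    CREATE NONCLUSTERED INDEX IX_BigTable_State ON dbo.BigTable (STATE_3038) -- recreate afterwards, unfragmented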

    If all of this is already in place, I'd suggest following normalization practices and taking a good look at your indices. I have a strong suspicion that several of the things I've outlined are not in practice yet.

    Thanks
