Size of Data Roadblocks

  • Hello!

    I'm starting to hit some roadblocks with my SQL Server due to the sheer amount of data I am querying. I need to find some ways to speed up SQL Server beyond just optimizing queries and indexes, if possible.

    I say that because something as simple as counting data by day over the course of a week, where there is a clustered index on time and a non-clustered index on the single column I am counting, is taking a long time by itself. The sheer amount of data per day is reaching a mark where it really starts to impact the process.
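
    For reference, the kind of query I mean is just a daily count, something like this (table and column names are invented for illustration):

        SELECT CAST(EventTime AS date) AS EventDay,
               COUNT(SomeColumn)       AS RowsPerDay
        FROM   dbo.FactEvents                     -- hypothetical fact table
        WHERE  EventTime >= '2015-03-02'
          AND  EventTime <  '2015-03-09'          -- half-open one-week range
        GROUP BY CAST(EventTime AS date)
        ORDER BY EventDay;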

    That said, I'm pretty new to SQL Server, with only 2 years in. I deal with a rather large data set in an OLAP structure with about 3 facts and 30 dimensions. Getting data in is not a problem; it arrives in batches and only contains new records for the previous day. Time is critical and is what I cluster on the most, because it allows me to fit data into the structure without a lot of fragmentation.
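
    To illustrate, the clustered index is on the load timestamp, roughly like this (names are placeholders):

        -- Ever-increasing key: new rows append at the end, so fragmentation stays low
        CREATE CLUSTERED INDEX CIX_FactEvents_EventTime
            ON dbo.FactEvents (EventTime);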

    Unfortunately, right now I do not have the luxury of table partitioning, in-memory tables, or loads of processing power. That's why I wanted to fish around the community to see what options are available. What really impacts SQL Server performance, outside of just optimizing queries and indexes, when reading hundreds of millions, potentially even a billion, records? Processing power? Fast disks? Clustering? Where should I look first?

    I'm on SQL Server 2008 R2 Standard. I have 2 processors with 4 cores apiece, 8 cores total. The server is virtualized, with about 16 GB and 2 vprocs dedicated to SQL Server, on 10K disks.

    My first instinct is table partitioning and then more processing power dedicated to SQL Server.

  • If you want table partitioning, you need Enterprise edition.

    Also, table partitioning in itself is not a performance solution. It's more a maintenance solution for big tables. The possible performance benefit is just an extra.

    If you ever upgrade to Enterprise, you should definitely enable page compression on your biggest tables.
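
    As a rough sketch (substitute your own table names), it's a one-off rebuild per table:

        -- Estimate the savings first
        EXEC sp_estimate_data_compression_savings
             @schema_name = 'dbo', @object_name = 'FactEvents',
             @index_id = NULL, @partition_number = NULL,
             @data_compression = 'PAGE';

        -- Then rebuild with page compression (Enterprise edition only on 2008 R2)
        ALTER TABLE dbo.FactEvents
            REBUILD WITH (DATA_COMPRESSION = PAGE);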

    How big is your data? About how many gigabytes are we talking here?

    My first thought is to add more RAM, so that more data can be held in cache.
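
    You can check how the buffer pool is currently being used per database with something like:

        -- Pages are 8 KB, so COUNT(*) * 8 / 1024 gives MB cached per database
        SELECT DB_NAME(database_id) AS database_name,
               COUNT(*) * 8 / 1024  AS cached_mb
        FROM   sys.dm_os_buffer_descriptors
        GROUP BY database_id
        ORDER BY cached_mb DESC;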

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (3/10/2015)


    If you want table partitioning, you need Enterprise edition.

    Also, table partitioning in itself is not a performance solution. It's more a maintenance solution for big tables. The possible performance benefit is just an extra.

    If you ever upgrade to Enterprise, you should definitely enable page compression on your biggest tables.

    How big is your data? About how many gigabytes are we talking here?

    My first thought is to add more RAM, so that more data can be held in cache.

    I know table partitioning is only available in Enterprise. I can upgrade, though. Sorry, I failed to mention that.

    I also know table partitioning is not in itself a performance solution, but it should increase performance here; there should be no doubt about that. The manageability is also needed with a table this size. So don't get me wrong: performance is what I am after, along with other things not mentioned.

    We're talking about 500 GB of data at the moment, indexes included. It could grow to 1 TB or more in the near future; I would plan for about 2 TB of data with indexes, depending on what I restrict.

    Memory is something I can increase to 32 GB in the near future.

    Partitioning is not done for performance reasons. It's for data management, fast loads, and fast deletes. It won't necessarily improve performance; it may, in fact, reduce it.
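
    The "fast deletes" bit, for example, comes from metadata-only partition switching; a sketch (names invented, Enterprise edition):

        -- Instantly detach the oldest partition into a staging table with an identical schema
        ALTER TABLE dbo.FactEvents
            SWITCH PARTITION 1 TO dbo.FactEvents_Stage;

        -- The staging table can then be truncated or archived cheaply
        TRUNCATE TABLE dbo.FactEvents_Stage;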

    If you're looking for performance benefits, stop looking at partitioning.

    My first thought is also memory. My home media server has 24GB of memory, my home (gaming) desktop has 16GB.

    32GB was a lot of memory 5 or so years ago. Standard edition can't support more than 64GB in SQL 2008 R2, so that should be what you increase memory to. If you can upgrade to SQL Server 2014, then Standard supports up to 128GB, so go for that.
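
    Whatever you end up with, remember to cap SQL Server's share so the OS keeps some headroom; a minimal sketch, assuming a 32 GB box:

        -- Value is in MB; leave a few GB for the OS
        EXEC sp_configure 'show advanced options', 1;
        RECONFIGURE;
        EXEC sp_configure 'max server memory (MB)', 28672;  -- ~28 GB of 32 GB
        RECONFIGURE;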

    Above that, you need to identify what the current bottlenecks are. It's pointless increasing CPUs if the server is bottlenecked on IO. It's not useful increasing memory if the server is flatlined at 100% CPU due to bad queries, etc.
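
    A quick way to see what the server spends its time waiting on is the wait stats DMV, along these lines:

        -- Top waits since the last restart (a few benign wait types filtered out)
        SELECT TOP (10)
               wait_type,
               wait_time_ms / 1000.0 AS wait_time_sec,
               waiting_tasks_count
        FROM   sys.dm_os_wait_stats
        WHERE  wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP', 'WAITFOR',
                                 'SQLTRACE_BUFFER_FLUSH', 'REQUEST_FOR_DEADLOCK_SEARCH')
        ORDER BY wait_time_ms DESC;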

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
    It sure sounds like you need to do fundamental query tuning, indexing, statistics maintenance, etc. Assuming you can't get someone in to help who has experience with all this, I'd suggest you start studying now, beginning with the Troubleshooting for the Accidental DBA book. It's free to download. Then move on to my books.

    And partitioning can, in some cases, lead to performance increases, but it is absolutely not a guarantee of any kind. In fact, I have more often seen it lead to performance headaches, because after implementing it people find that they can't guarantee the queries will always filter on the partitioning key. If you can't absolutely guarantee that, then partitioning will not only fail to improve your performance, it will degrade it. Do not approach partitioning as a performance tuning mechanism. It is meant as a data management tool.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (3/10/2015)


    ... Then move on to my books.

    Subtle. :Whistling: 😀

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

    To be honest, I wouldn't like to manage 2TB of data with Standard edition (and possibly 16GB of RAM).

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Koen Verbeeck (3/10/2015)


    Grant Fritchey (3/10/2015)


    ... Then move on to my books.

    Subtle. :Whistling: 😀

    I keep updating them for a reason.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Koen Verbeeck (3/10/2015)


    To be honest, I wouldn't like to manage 2TB of data with standard edition (and possible 16GB of RAM).

    The Standard Edition I could live with (if it were SQL 2014); the 16 GB of memory, though, definitely not. Maybe if the entire 2 TB were on SSD, so that everything wasn't frequently waiting on slow IOs. Maybe.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Grant Fritchey (3/10/2015)


    Koen Verbeeck (3/10/2015)


    Grant Fritchey (3/10/2015)


    ... Then move on to my books.

    Subtle. :Whistling: 😀

    I keep updating them for a reason.

    The private jet doesn't pay for itself, right 😀

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

    I have 2 processors with 4 cores apiece, 8 cores total. I have about 16 GB dedicated to SQL Server

    As has been said, that sounds very underpowered for a data warehouse that's 500 GB now and that you plan to grow past 1 TB. Partitioning won't necessarily help if users are querying the entire data set, which is much more likely in an OLAP setup. And query tuning, index optimizing, defragmenting, and statistics updates will only go so far if you don't have the hardware to support it.

    One option would be to determine what reports people run against it and preload those reports; that might at least shift some of the performance hit to off hours instead of impacting users when they want to see their data.
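
    As a minimal sketch (table names invented), a nightly job could roll the fact table up into a small summary table that the reports hit instead:

        -- Pre-aggregate yesterday's rows; reports then read dbo.DailyCounts
        INSERT INTO dbo.DailyCounts (EventDay, RowCnt)
        SELECT CAST(EventTime AS date), COUNT(*)
        FROM   dbo.FactEvents
        WHERE  EventTime >= CAST(DATEADD(day, -1, GETDATE()) AS date)
          AND  EventTime <  CAST(GETDATE() AS date)
        GROUP BY CAST(EventTime AS date);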

  • Koen Verbeeck (3/10/2015)


    Grant Fritchey (3/10/2015)


    Koen Verbeeck (3/10/2015)


    Grant Fritchey (3/10/2015)


    ... Then move on to my books.

    Subtle. :Whistling: 😀

    I keep updating them for a reason.

    The private jet doesn't pay for itself, right 😀

    Yep. Should have enough money saved up to buy this one soon.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • If the table is already clustered on time (hooray!), then reading a week's worth of data shouldn't take that long.

    We need to review two things (at least).

    First, the query code itself. We need to verify that the query is doing a seek. Maybe the datetime lookup predicates are "non-sargable", forcing SQL to scan the table rather than do a keyed seek. That would have a huge impact on performance with a large table.
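
    For example (names invented), the first query below generally forces a scan, while the second allows a seek on the clustered index:

        -- Non-sargable: wrapping the column in a function hides it from the index
        SELECT COUNT(*) FROM dbo.FactEvents
        WHERE  CONVERT(varchar(10), EventTime, 120) = '2015-03-09';

        -- Sargable: bare column against a half-open range, seek-friendly
        SELECT COUNT(*) FROM dbo.FactEvents
        WHERE  EventTime >= '2015-03-09'
          AND  EventTime <  '2015-03-10';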

    Second, missing index and index usage stats. These will identify how SQL is using the existing index(es) and what index(es) SQL "thinks" are missing.
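
    The relevant DMVs, roughly (run in the target database):

        -- What SQL "thinks" is missing
        SELECT d.statement, d.equality_columns, d.inequality_columns,
               d.included_columns, s.avg_user_impact
        FROM   sys.dm_db_missing_index_details d
        JOIN   sys.dm_db_missing_index_groups g      ON g.index_handle = d.index_handle
        JOIN   sys.dm_db_missing_index_group_stats s ON s.group_handle = g.index_group_handle
        ORDER BY s.avg_user_impact DESC;

        -- How the existing indexes are actually being used
        SELECT OBJECT_NAME(object_id) AS table_name, index_id,
               user_seeks, user_scans, user_lookups, user_updates
        FROM   sys.dm_db_index_usage_stats
        WHERE  database_id = DB_ID();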

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • ScottPletcher (3/10/2015)


    If the table is already clustered on time (hooray!), then reading a week's worth of data shouldn't take that long.

    We need to review two things (at least).

    First, the query code itself. We need to verify that the query is doing a seek. Maybe the datetime lookup predicates are "non-sargable", forcing SQL to scan the table rather than do a keyed seek. That would have a huge impact on performance with a large table.

    Second, missing index and index usage stats. These will identify how SQL is using the existing index(es) and what index(es) SQL "thinks" are missing.

    Yeah, it's doing a seek now. It wasn't at first. I threw in a table hint to point to the clustered index, which shaved about 20 seconds off versus the scan of the non-clustered index on a column that isn't even being filtered at all. That's over 200 million records across 20 days in about 49 seconds with the hint, after a statistics update and with 0% index fragmentation.
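
    The hint was just along these lines (table and index names invented):

        SELECT CAST(EventTime AS date) AS EventDay, COUNT(*) AS RowCnt
        FROM   dbo.FactEvents WITH (INDEX (CIX_FactEvents_EventTime))  -- force the clustered index
        WHERE  EventTime >= '2015-02-17'
          AND  EventTime <  '2015-03-09'
        GROUP BY CAST(EventTime AS date);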

  • Woke up to loads of responses. Good!

    But there are a lot of assumptions being thrown around here about how the data is being used that weren't specified. 😀

    GilaMonster (3/10/2015)


    Partitioning is not done for performance reasons. It's for data management, fast loads and fast deletes. It won't necessarily improve performance, it may, in fact, reduce performance. If you're looking for performance benefits, stop looking at partitioning.

    Unfortunately, I have to disagree here, mainly because you're making a big assumption (see Grant's response). Table partitioning can in fact increase query performance, depending on your partitioning strategy and how it's being used. It doesn't guarantee it, but it can. That also doesn't mean it should be the sole reason to add table partitioning; I have an extremely large table that will benefit from it on other ends too. How you partition and how the partitions are used in queries, from my understanding (correct me here if I'm wrong), greatly impacts whether table partitioning improves or degrades performance. For example, joining on a column that is not the partitioning key will slow the query down, not improve it. That among other things.
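
    For context, the kind of setup I have in mind is monthly partitions on the time column, roughly like this (names and boundary values invented):

        -- Monthly partitions on the load timestamp (Enterprise edition)
        CREATE PARTITION FUNCTION pf_Monthly (datetime)
            AS RANGE RIGHT FOR VALUES ('2015-01-01', '2015-02-01', '2015-03-01');

        CREATE PARTITION SCHEME ps_Monthly
            AS PARTITION pf_Monthly ALL TO ([PRIMARY]);

        -- Queries filtering on EventTime can then eliminate whole partitions
        CREATE CLUSTERED INDEX CIX_FactEvents_EventTime
            ON dbo.FactEvents (EventTime) ON ps_Monthly (EventTime);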

    GilaMonster (3/10/2015)


    Above that, you need to identify what the current bottlenecks are. It's pointless increasing CPUs if the server it bottlenecked on IO. It's not useful increasing memory if the server is flatline 100% CPU due to bad queries, etc.

    Very good point. Right now, I'm just talking about basic queries that use the indexes in place, along with updated statistics and so forth. I'm using the basics to see how the server handles straightforward querying of a large data set. If it's not measuring up, complex queries are likely going to be worse, in my mind. But feel free to correct me if that's the wrong way of thinking, too.
