March 21, 2011 at 10:12 am
Hello everyone. My company does not have a DBA and I can't convince my boss to hire one, even short term, so it is up to me to learn what I can.
We have a growing database for an asp.net application. I am trying to learn performance testing and such.
I'm also wondering whether the database was well designed in the first place. The guy who originally created it is no longer with the company.
Okay, let me try to describe our setup. We have hundreds of clients with several mobile units each. The mobile units submit data to be inserted into the DB 24/7. Clients log in to the website and run reports, typically during business hours in their time zone. I'd call that the peak reporting time, and I suppose it's the peak insert time for the mobile units as well.
I know our inserts aren't as fast as they could be. The insert routine does a few checks to make sure it can insert the data, and it also updates other data based on what was just inserted. You could call it an insert-with-transaction/rollback scenario.
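Roughly, the routine is shaped like this (all table, column, and procedure names here are made up, just to show the pattern):

-- Rough sketch of our insert routine (hypothetical names)
CREATE PROCEDURE dbo.InsertUnitReading
    @UnitId      INT,
    @ReadingTime DATETIME,
    @Payload     VARCHAR(500)
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        BEGIN TRANSACTION;

        -- check the unit is known/active before accepting the reading
        IF NOT EXISTS (SELECT 1 FROM dbo.MobileUnits WHERE UnitId = @UnitId AND IsActive = 1)
            RAISERROR('Unknown or inactive unit', 16, 1);

        INSERT INTO dbo.UnitReadings (UnitId, ReadingTime, Payload)
        VALUES (@UnitId, @ReadingTime, @Payload);

        -- update related data based on the new reading
        UPDATE dbo.MobileUnits
        SET    LastReadingTime = @ReadingTime
        WHERE  UnitId = @UnitId;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        -- re-raise so the app sees the failure
        DECLARE @msg NVARCHAR(2048);
        SET @msg = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);
    END CATCH;
END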
Overall there are thousands of mobiles inserting data. Even though we have hundreds of users I would guess that there are never a hundred users online simultaneously and running reports.
A typical report could take 20 seconds to a minute or more from request to webpage is done loading.
We dump old data after a month to not bog down. Most clients are happy with a month of data but others would love to be able to see a year of data.
My points I would like advice on are:
- I'm thinking I need to alter the design so that mobile data gets submitted to the database quickly, since there can be hundreds of mobiles trying to submit simultaneously.
- Reporting is important, but not at the expense of new data getting inserted in a timely fashion.
- Reporting can't be too slow or users will complain.
- I'm wondering if the data tables should be partitioned somehow for better performance. A query against company A's 1,000 records will be much quicker than a search for those 1,000 among everybody's combined millions of records.
- How to adjust so we can store a year of data without slowing down the whole system.
Any help is greatly appreciated!
March 21, 2011 at 10:25 am
First, you really should think about hiring someone with experience, at least for a short engagement. I understand the boss might not want to do that, but perhaps you can collect some metrics, show some slowdowns, and make the case that consulting would be worth it.
In the short term: redesigning a DB is different from redesigning code. You might think about fixing code where you can rather than changing the database. Database design changes usually require lots of code changes, and that's a big deal.
How much data is there? I'm trying to get a feel for the scale you're working at.
In terms of long-term data, can you move it to another system and run reports from there? Keep the same structure, just provide a way to connect to a different server and run reports against it.
What kind of latency can you tolerate on reports? In other words, do you need reports on data that was just inserted, or can it be seconds/minutes/hours old?
Check your indexes. Often a better indexing strategy can make a huge difference.
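If you're not sure where to start with that, the index usage DMVs are one place to look; this is just a generic sketch (nothing specific to your schema), run in the database in question:

-- Which indexes are used for reads vs. only maintained on writes (SQL 2005+)
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id
 AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY s.user_updates DESC;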
Partitioning can work, if you have Enterprise edition.
March 21, 2011 at 1:26 pm
Thanks for the quick reply, Steve.
I won't give up on getting someone on staff, but entertain me here for a moment.
I'm working off of a backup that has 2.6 million data records over a 30 day period for all clients. I don't know what is considered a lot of records, 2 million, 200 million, more? This table is the one inserted into all day long and is also the main table that reports are run against (inner joined with other tables).
We have a ping feature where a request is made to the unit to report back "right now" so in that case immediate display is necessary.
I want to say the majority of the time a report is run on a date range of data weekly or less frequently. So in that case the inserted data does not need to be immediately available.
Somewhere in between are the people who run a daily report or several times a day report.
We have Workgroup edition of SQL. Can we do partitioning?
I've read where people suggest to partition based on clients into multiple databases.
I've also read suggestions to have a cutoff date so like all data younger than 30 days is in the main database and data older than 30 days is in the archival database, which would be understood by the user to take longer to display since it is an "archive".
What would be an example of designing a database that favors inserts?
March 21, 2011 at 2:13 pm
Well, the first thing to check would be the table design. There are designs that help with multiple, concurrent inserts, and designs that cripple them. The most important thing on that is the structure of the clustered index. Are you comfortable with checking that? (I can't quite tell from your posts how familiar you are with database design, so if I come across as patronizing or whatever, just let me know and I'll up the vocabulary and all that.)
You mentioned table partitioning, which would definitely be something to look into, but first I'd focus on indexes (clustered and non), and on the report code vs the insert code.
Can you post the definition (create script) for the main table?
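If you're not sure where to look for the indexes, something along these lines will list a table's indexes and their key columns (the table name is just a placeholder; substitute your real one):

SELECT i.name      AS index_name,
       i.type_desc AS index_type,   -- CLUSTERED / NONCLUSTERED / HEAP
       c.name      AS column_name,
       ic.key_ordinal
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('dbo.YourMainTable')
ORDER BY i.index_id, ic.key_ordinal;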
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 21, 2011 at 4:32 pm
Partitioning isn't an option with Workgroup Edition. Need Enterprise Edition for that.
2MM isn't a lot; 200MM, yes. However, that depends on table design. If I had a narrow table, say 100 bytes wide, I'd get about 80 rows per page. You can really limit pages and extents with a narrow table: 8MM records would be around 12K extents, holding roughly 700-800MB of data. Not a ton these days.
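Back-of-the-envelope, that math goes something like this (assuming roughly 8,060 usable bytes per 8 KB page):

-- rows per page : 8,060 / 100 bytes per row  ~ 80
-- pages for 8MM : 8,000,000 / 80             = 100,000 pages
-- extents       : 100,000 / 8 pages/extent   = 12,500 extents
-- data size     : 100,000 pages * 8 KB       ~ 780 MB (call it 700-800 MB)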
As Gus mentioned, indexing is probably the first thing I'd look at.
Another thing to consider: can you perhaps pull the data into one database and then slide it into another one for reporting? If you can move the data, you might be able to do some aggregation or separate indexing on the second database that is more in line with what you need for reporting.
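Even a simple scheduled copy can get you started there. A rough sketch (database, table, and column names are all made up), which you'd put in a SQL Agent job:

-- Copy recent rows into a reporting database (all names hypothetical)
INSERT INTO ReportingDB.dbo.UnitReadings (UnitId, ReadingTime, Payload)
SELECT src.UnitId, src.ReadingTime, src.Payload
FROM   ProdDB.dbo.UnitReadings AS src
WHERE  src.ReadingTime >= DATEADD(DAY, -1, GETDATE())
  AND  NOT EXISTS (SELECT 1
                   FROM ReportingDB.dbo.UnitReadings AS dst
                   WHERE dst.UnitId = src.UnitId
                     AND dst.ReadingTime = src.ReadingTime);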
Separating different clients into different databases can help. It allows you to scale out one client if they are out of whack with the others in terms of load. There's admin overhead to this, and you don't necessarily get any benefits up front here unless you can separate out the clients' data onto separate physical drives.
March 21, 2011 at 4:41 pm
allbymyself (3/21/2011)
I'm working off of a backup that has 2.6 million data records over a 30 day period for all clients. I don't know what is considered a lot of records, 2 million, 200 million, more? This table is the one inserted into all day long and is also the main table that reports are run against (inner joined with other tables).
Is it a lot or a little? That's going to depend entirely on your hardware. What does your gear look like? SAN or attached RAID? A single dual-core or a couple of quad-cores? Memory? Is 2k5 the correct version (you're in the 2k5 forum, just confirming)? There are a lot of different things that go into whether 2 million rows is a lot.
We have a ping feature where a request is made to the unit to report back "right now" so in that case immediate display is necessary.
Is this the database pulling the last entry for a unit in a report, the database/app pinging a mobile to give new data, or something else entirely I'm not catching?
I want to say the majority of the time a report is run on a date range of data weekly or less frequently. So in that case the inserted data does not need to be immediately available.
Running for a week overall, or viewing the month grouping by the week?
I've read where people suggest to partition based on clients into multiple databases.
I've also read suggestions to have a cutoff date so like all data younger than 30 days is in the main database and data older than 30 days is in the archival database, which would be understood by the user to take longer to display since it is an "archive".
A reasonable possibility, both of them, but you'd have to have the hardware to support that concept.
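As a rough illustration of the 30-day archive idea (names are made up, and in practice you'd batch the delete rather than do it in one shot), a nightly job could do something like:

DECLARE @cutoff DATETIME;
SET @cutoff = DATEADD(DAY, -30, GETDATE());

BEGIN TRANSACTION;

-- copy old rows into the archive database
INSERT INTO ArchiveDB.dbo.UnitReadings (UnitId, ReadingTime, Payload)
SELECT UnitId, ReadingTime, Payload
FROM   dbo.UnitReadings
WHERE  ReadingTime < @cutoff;

-- then remove them from the main table
DELETE FROM dbo.UnitReadings
WHERE  ReadingTime < @cutoff;

COMMIT TRANSACTION;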
What would be an example of designing a database that favors inserts?
Your design is the classic contention between OLTP (transactional) and OLAP (analytical) database designs. Fewer indexes, tighter data, and append-only indexing usually help OLTP. Heavy indexing, denormalized data, and indexes geared more toward data retrieval than data inclusion are more typical of OLAP. A mix between the two is most likely what you're going to need here.
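To make that concrete, here's a made-up example of an insert-friendly table: narrow rows, an ever-increasing clustered key so new rows always land at the end, and only the nonclustered index the reports actually need:

-- Hypothetical insert-friendly table (narrow, ever-increasing clustered key)
CREATE TABLE dbo.UnitReadings
(
    ReadingId   BIGINT IDENTITY(1,1) NOT NULL,
    UnitId      INT          NOT NULL,
    ReadingTime DATETIME     NOT NULL,
    Payload     VARCHAR(200) NULL,
    CONSTRAINT PK_UnitReadings PRIMARY KEY CLUSTERED (ReadingId)
);

-- One nonclustered index to support the typical report (unit + date range)
CREATE NONCLUSTERED INDEX IX_UnitReadings_Unit_Time
    ON dbo.UnitReadings (UnitId, ReadingTime);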
As above, can you show us the ddl and a few sample rows from your primary table, and maybe a reporting call or two? The more specifics we have about your actual structure and usage methods, the better we can advise you.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
March 22, 2011 at 5:19 am
It's hard to do more than suggest general approaches based on the information provided.
The one concern I would have with the existing structure: what do the clustered indexes look like? Tables, data, and queries are so dependent on those (or the lack of them) that they would be my first concern.
Other than that, gather metrics. Identify the pain points. Work through them one at a time. You'll probably start to identify patterns that are problematic. When you do, try to address the problem as a unit. Test everything. Document the changes you make. The basics.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 22, 2011 at 12:04 pm
Thanks all for the feedback.
In my googling I've come across OLTP vs. OLAP articles and didn't realize that my app was of that kind of magnitude. This has turned into one of those examples of how a college degree can't replace years of experience. I went back to one of my database textbooks and, wouldn't you know it, there is a whole chapter on data warehousing at the END OF THE BOOK, chapter 25. We probably didn't make it further than chapter 12 in class.
I've always worked on solutions thinking of "the database" as "the" place for all data, without the notion that I may need two or more databases with different purposes for one solution. This is new and interesting to me. It makes sense that the reporting queries and the insert routines can adversely affect each other's performance.
It will be quite a task to rethink our solution in terms of a transactional part and an analytical part.
I wonder what the performance gain would be if I were able to successfully split the database into OLTP and OLAP parts. Would individual queries be speedier, or would I just be able to process more simultaneous queries?
I'll get back to you on indexes when I have a little more time.
Thanks
March 22, 2011 at 12:09 pm
It's not a given, but usually when you have two DBs like that, you also put them on two servers, so the speed increase comes almost automatically.
Even if you keep it all on the same machine, you get less contention, so that gives a boost too.
March 22, 2011 at 12:16 pm
Just splitting them up will usually give some performance increase, since it will reduce locks and contention. Maybe not enough, though.
Even in that scenario, the design of the main table, including its clustered index, will still be the key to fast inserts.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 22, 2011 at 12:57 pm
Since you don't have a DBA, I have to ask this. Are you comfortable with your backup and disaster recovery processes?
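At a minimum that usually means scheduled full and log backups that you've actually tested restoring somewhere. For instance (database name and paths are made up, and log backups assume the full recovery model):

BACKUP DATABASE ProdDB TO DISK = N'D:\Backups\ProdDB_full.bak' WITH INIT;
BACKUP LOG ProdDB TO DISK = N'D:\Backups\ProdDB_log.trn';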
March 24, 2011 at 10:52 am
I've managed to get my boss to understand the need for a DBA. I think though that he wouldn't want to hire one fulltime.
What are your thoughts on hiring a fulltime or parttime DBA as an employee vs paying for database consulting services from some company?
Is there enough work to keep a DBA busy 40hrs/week?
Maybe we just need a consultant for periodic auditing and maintenance?
March 24, 2011 at 10:58 am
Just consider the resources required to find and hire a competent DBA (especially with no one around to tell you what a good candidate looks like).
It might be easier to bring in a consultant for a couple of days for an audit, and then follow the recommendations for training, disaster recovery, and whatnot.
Then you'll know whether it's worth investing in full-time staff.
Also, you could find someone like me who's decent enough at DB maintenance and recovery... and can double as a full-time report dev.
March 24, 2011 at 1:00 pm
allbymyself (3/24/2011)
I've managed to get my boss to understand the need for a DBA. I think though that he wouldn't want to hire one fulltime. What are your thoughts on hiring a fulltime or parttime DBA as an employee vs paying for database consulting services from some company?
Is there enough work to keep a DBA busy 40hrs/week?
Maybe we just need a consultant for periodic auditing and maintenance?
It'll depend on what you want by way of ROI on hiring one.
Generally speaking, though, you don't hire a DBA because you have enough work for one 40 hours a week; you hire one as salaried (exempt) because you have enough work to keep them busy most of the year, or because there's enough value in your databases that the salary is worth it just to prevent crash-and-burn scenarios, regardless of day-to-day activity.
I haven't had a DBA job yet where I was busy all day every day. There are very quiet days where I spend my time sharpening skills, networking with other DBAs, or pondering oddities of T-SQL that may or may not have any future value (researching something specific doesn't always pay off, but researching as a general activity almost always pays off). And then there are busy days where I barely have time to breathe every few minutes so I don't die of anoxia (well, I might be exaggerating a little bit on that part).
The questions to ask are: How much would database downtime cost the company? How much does it cost to have slow databases? What are the opportunity costs associated with not knowing whether your databases are administered correctly? How much would it cost if a database were completely lost? Now compare those potential costs, and the odds of them happening, against the known cost of having an on-staff DBA to prevent those problems.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
March 24, 2011 at 1:11 pm
GSquared has great points. Typically a DBA brings value: they tune code, they write new reports, they improve the systems. If you are looking to hire someone who sits around and waits for something to happen or for someone to ask them to get something done, you're hiring the wrong person. There are places that are big enough to have DBAs who just support production work, but if you're smaller, a DBA can bring great value, as well as insurance, to the company.