August 24, 2018 at 10:25 pm
Good Morning Experts,
There is a mega sale on a TV for 5 days, starting tomorrow. We are expecting millions of users to hit our orders database to book the TV. As a DBA, what should I do so that the 5-day sale goes on without any issues? For example, is there any load balancing or anything else that I should be doing?
August 25, 2018 at 3:49 am
coolchaitu - Friday, August 24, 2018 10:25 PM
Good Morning Experts,
There is a mega sale on a TV for 5 days, starting tomorrow. We are expecting millions of users to hit our orders database to book the TV. As a DBA, what should I do so that the 5-day sale goes on without any issues? For example, is there any load balancing or anything else that I should be doing?
You should make sure all the SQL is tuned and any missing indexes added.
August 25, 2018 at 8:29 am
If you're not prepared already, then I would be very careful about changing anything now, considering how close you are. If you're making changes, you're going to have very little time to test against your dev and/or UAT environments and make sure that nothing else is adversely affected. Indexes will definitely help reads, but add too many of them and you might ruin your writes (this is a very sweeping statement, mind). If you don't get an opportunity to make changes now, then make sure you study the problems you have this time, and start preparing for next time as soon as the current event is done.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
August 25, 2018 at 10:30 am
I mean making sure that I, as the DBA, am ready for the big day. For example, making sure there is an index on the product name. If the primary web server goes down, it will automatically fail over to the secondary. What would the checklist be?
August 25, 2018 at 3:28 pm
If it were me, I'd have wanted to run some pretty extensive load testing on it before getting that close to launch so that I was aware of how much traffic it could take before running into issues and having identified missing indexes or other bottlenecks in advance. If you don't have time to do that, it's pretty much going to be a reactive process of trying to monitor things as they happen and responding accordingly.
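As a rough illustration of the kind of load test described above, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `place_order` is a hypothetical stand-in that only sleeps, and in a real test it would be replaced by a call against a test copy of the orders database (never production). The idea is simply to step the concurrency up and watch where latency starts to climb.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def place_order(order_id: int) -> None:
    """Hypothetical stand-in for one order request.

    Replace with a real call against a TEST copy of the orders database.
    """
    time.sleep(0.001)  # simulate a small amount of work


def timed_call(order_id: int) -> float:
    """Run one request and return its latency in seconds."""
    start = time.perf_counter()
    place_order(order_id)
    return time.perf_counter() - start


def run_load(total_requests: int, concurrency: int) -> dict:
    """Fire total_requests calls using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = sorted(pool.map(timed_call, range(total_requests)))
    # Simple nearest-rank style pick for the 95th percentile.
    p95_index = max(0, int(len(latencies) * 0.95) - 1)
    return {
        "requests": len(latencies),
        "median_s": statistics.median(latencies),
        "p95_s": latencies[p95_index],
        "max_s": latencies[-1],
    }


if __name__ == "__main__":
    # Step the concurrency up and compare the latency figures per step.
    for workers in (1, 5, 20):
        print(workers, run_load(total_requests=100, concurrency=workers))
```

A dedicated load-testing tool would normally be a better choice than a hand-rolled script; the sketch only shows the shape of the measurement (requests completed, median, 95th percentile and worst-case latency at each concurrency level).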
August 25, 2018 at 8:47 pm
If it weren't so close to a holiday weekend, I'd say you're trying to get us to answer an interview question for you. 🙂
I can only assume that you have monitoring and alerts already set up to inform you if any machine is struggling. The question now is, have you done any serious stress testing? And asking the night before the big event whether you should set up load balancing and do other preparatory things is a wee bit thoughtless on your part.
That being said, the event started today and I'm curious. How's the real life stress test going? You OK?
--Jeff Moden
Change is inevitable... Change for the better is not.
August 26, 2018 at 1:37 am
Please don't hurt me by saying that I am thoughtless. If you can help, please do, but please don't hurt.
August 26, 2018 at 7:19 am
My apologies... Wrong word and a wee bit thoughtless on my part. I'm just terribly surprised by you waiting until the night before to ask such a question about something so important.
Not sure what to tell you at this point because you're knee deep into the event. I guess my best advice would be to make sure that you get enough sleep. A lack of sleep would be your worst enemy if something that requires (knocking on wood for you) your concentration were to happen. Like I asked in my previous email, you OK? Any problems so far?
--Jeff Moden
August 26, 2018 at 7:30 am
The sale has been postponed by 2 weeks, and it will still run for 5 days. What should my checklist as a DBA be?
August 26, 2018 at 8:03 am
Heh... good golly. When did they make that decision? You posted after 10PM Friday night and the event was supposed to start the next day.
First and foremost... do a stress test and make sure transaction log backups are both tight and guaranteed not to run out of space. I'd be running them every 15 minutes for such an event. I'd also ask management what sort of sales volume they expect and compare that to the normal sales volume. Use that information to extrapolate how much disk space you'll need, then double that estimate and make sure that you have enough room to handle the additional sales.
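The disk-space arithmetic above can be sketched in a few lines of Python. This is only an illustration under a loud assumption: that space consumption scales linearly with order volume, which may not hold for your data. The function name, its parameters and the figures in the example are all made up; plug in your own measurements.

```python
def estimate_disk_gb(normal_daily_gb: float,
                     normal_daily_orders: int,
                     expected_daily_orders: int,
                     sale_days: int,
                     safety_factor: float = 2.0) -> float:
    """Scale normal daily growth by the expected order ratio, then pad it.

    Assumes (for illustration) that disk growth is proportional to orders.
    """
    if normal_daily_orders <= 0:
        raise ValueError("normal_daily_orders must be positive")
    growth_ratio = expected_daily_orders / normal_daily_orders
    projected_gb = normal_daily_gb * growth_ratio * sale_days
    # "Double that estimate" corresponds to the default safety_factor of 2.
    return projected_gb * safety_factor


if __name__ == "__main__":
    # Made-up figures: 2 GB/day of data and log backups at 10,000 orders/day,
    # with 500,000 orders/day expected during a 5-day sale.
    needed = estimate_disk_gb(2.0, 10_000, 500_000, sale_days=5)
    print(f"Plan for at least {needed:.0f} GB of free space")  # 1000 GB
```

With those invented numbers the ratio is 50x, giving 500 GB projected over 5 days and 1000 GB after doubling. Data files, log files and backup targets may sit on different volumes, so the same sum is worth running for each.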
Have you got code ready to do a restore at the drop of a hat? Have you tested it and know precisely how long and what it would take to get back into business if everything else failed? Is there a standby box available if everything else fails even if it's for Dev or Staging or ???
Make sure the stress test goes through your load balancing to make sure that it is actually balancing the load.
If you have "Always On" active or you have passive nodes, make sure that a "flop over" actually works and that such passive nodes actually have the resources to take over if something happens. Check to make sure that any alerts that you have set up will actually work.
Definitely make sure that you're well rested and make sure that you're not the only one on alert for this. Make sure you have a phone list of people that will be needed if something happens. For example, if something goes wrong with the SAN, is that going to be all you or do you have someone that can help with the issue? How about network folks? Will they be monitoring to see if a switch or other piece of hardware goes down?
What about your internet provider? Have you checked with them to see if they can handle the additional load? And what is their SLA for this "hot" time? Do you have contact numbers for them and have a clear understanding of who to call if something happens after hours?
Most people balk at the idea because they don't actually allocate enough RAM to the Windows OS on their servers but I'd be RDC'd into the servers with both PerfMon and SSMS up and running. It hasn't happened to me often but there have been cases where even the Windows Admins couldn't get into one of our production servers. Because I was already in, either I could fix the problem or I could let a Windows Admin take over from my machine to get a server out of trouble. It doesn't take a lot of memory to do such a thing and, when the chips fall on the floor, it's a real lifesaver.
I'd also carry a voice recorder with me to keep a log on (smart phones usually have an app for this). It's a whole lot easier to record your voice than it is to write something down especially if you're in the process of using the keyboard to troubleshoot/repair or whatever. You can also use it to record any verbal orders from someone else so that you don't miss anything.
Of course, a good part of all this should be a part of normal operations. If it's not, plan on making it a part of normal operations (including regular "flop over" and restore tests and stress tests and outage tests, etc) so that no one will view future such sales events as a fire drill.
In retrospect, sorry you were hurt by my initial response but consider the timing of your question... why would you expect people to not be a bit aghast at the timing of it? 😉
--Jeff Moden
August 26, 2018 at 8:17 am
Hi Jeff,
I told my manager that I could not handle it and that I required more time. He understood my pain and convinced management early on Saturday morning to cancel the event. It is not going to be all me; I have people who can help with such issues. In fact, I have various teams (network, SAN, etc.). Thanks a lot for listing everything out in detail.
August 26, 2018 at 8:25 am
That, good Sir, is the sign of someone properly concerned for their company. A lot of people would have said nothing and put the company in harm's way in doing so. My hat's off to you for first knowing your limits, advising others of those limits, and then doing something to expand those limits. Well done!
--Jeff Moden
August 26, 2018 at 8:27 am
p.s. Hopefully, management will also look at themselves and provide more lead time for such large events.
--Jeff Moden