February 28, 2017 at 7:16 am
GilaMonster - Tuesday, February 28, 2017 6:26 AM
jasona.work - Tuesday, February 28, 2017 5:37 AM
So, it's paczki day, Fat Tuesday... Amazing that a lump of fried dough the size of your fist and filled with jelly or crème can be so bad for you but taste so darn good...
If I wanted to be greedy I'd go grab a second one off our conference table... Of course, then the sugar would put me out like a light for the rest of the day, or at least make me less useful than usual... 🙂
Now I want koeksisters.
Oh those sound good!
February 28, 2017 at 7:26 am
GilaMonster - Tuesday, February 28, 2017 6:26 AM
jasona.work - Tuesday, February 28, 2017 5:37 AM
So, it's paczki day, Fat Tuesday... Amazing that a lump of fried dough the size of your fist and filled with jelly or crème can be so bad for you but taste so darn good...
If I wanted to be greedy I'd go grab a second one off our conference table... Of course, then the sugar would put me out like a light for the rest of the day, or at least make me less useful than usual... 🙂
Now I want koeksisters.
That is one thing I miss about South Africa. Guess I'll need to make my own... (read: attempt to destroy the kitchen)
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
This thing is addressing problems that don't exist. It's solutionism at its worst. We are dumbing down machines that are inherently superior. - Gilfoyle
February 28, 2017 at 9:32 am
Brandie Tarvin - Monday, February 27, 2017 6:15 AM
Lynn Pettis - Friday, February 24, 2017 1:19 PM
On something completely different, I was just looking at an internal database built by a developer. If I posted this database schema here you would see Mr. Celko galloping in on his white horse. Every table has a primary key with the name "id".
Actually, I can see his point on this one as I am looking at this database.
GAH. That one is my pet peeve. I've actually yelled at developers (regular, report, and BI) for pulling this crap.
That and all the text fields as varchar(50) are strong suggestions that they brought up the table editor and just took the defaults for almost everything.
I'm dealing with a 'situation' where the same thing is happening except that the field is called GUID. Which isn't a GUID. The different ID fields are GUIDs. And the views link back to the GUID field (using aliases that aren't in the other view) and not the ID field.
And they wonder why we mutter to ourselves all day long.
February 28, 2017 at 9:55 am
JustMarie - Tuesday, February 28, 2017 9:32 AM
Brandie Tarvin - Monday, February 27, 2017 6:15 AM
Lynn Pettis - Friday, February 24, 2017 1:19 PM
On something completely different, I was just looking at an internal database built by a developer. If I posted this database schema here you would see Mr. Celko galloping in on his white horse. Every table has a primary key with the name "id".
Actually, I can see his point on this one as I am looking at this database.
GAH. That one is my pet peeve. I've actually yelled at developers (regular, report, and BI) for pulling this crap.
That and all the text fields as varchar(50) are strong suggestions that they brought up the table editor and just took the defaults for almost everything.
I'm dealing with a 'situation' where the same thing is happening except that the field is called GUID. Which isn't a GUID. The different ID fields are GUIDs. And the views link back to the GUID field (using aliases that aren't in the other view) and not the ID field.
And they wonder why we mutter to ourselves all day long.
Your post reminded me of this, from Brent Ozar.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
March 1, 2017 at 5:31 am
Phil Parkin - Tuesday, February 28, 2017 9:55 AM
JustMarie - Tuesday, February 28, 2017 9:32 AM
Brandie Tarvin - Monday, February 27, 2017 6:15 AM
Lynn Pettis - Friday, February 24, 2017 1:19 PM
On something completely different, I was just looking at an internal database built by a developer. If I posted this database schema here you would see Mr. Celko galloping in on his white horse. Every table has a primary key with the name "id".
Actually, I can see his point on this one as I am looking at this database.
GAH. That one is my pet peeve. I've actually yelled at developers (regular, report, and BI) for pulling this crap.
That and all the text fields as varchar(50) are strong suggestions that they brought up the table editor and just took the defaults for almost everything.
I'm dealing with a 'situation' where the same thing is happening except that the field is called GUID. Which isn't a GUID. The different ID fields are GUIDs. And the views link back to the GUID field (using aliases that aren't in the other view) and not the ID field.
And they wonder why we mutter to ourselves all day long.
Your post reminded me of this, from Brent Ozar.
Phil, thanks for giving me something to smile at first thing in the morning. I especially enjoyed the second bad example.
March 1, 2017 at 5:54 am
I'm positive this incident will convince Jeff to move to the cloud, really and for sure: http://www.theregister.co.uk/2017/03/01/aws_s3_outage/
Also rather shows the vulnerability of IoT devices when you see that several IoT devices ran into problems...
Web / security cams (Nest)
Unknown brand of TV remote
Lighting controller
Gaming mice (!!!)
It'd be interesting to see the e-mails from customers that were affected when these devices quit working. I do have a Nest product (thermostat) but those apparently weren't impacted. But if you had one of their security / web cams in use as a security cam, and it failed to record for several hours because of this, I'd be rather unhappy...
But that's small potatoes compared to the number of fairly well known companies and services that were brought down by this outage. How many millions in revenue were lost because of one company? What is it that you try to avoid when setting up a mission-critical system? Oh yeah, a single point of failure. Well, now Amazon Cloud has become a potential single point of failure. Sure, they could've avoided it by writing their code / app so it would live distributed across different datacenters, but as the article comments:
For various reasons – from the fact that programmers find distributed computing hard to the costs involved – this redundancy isn't always coded in.
Going to be interesting to see what caused the outage, and how companies react to it going forward...
March 1, 2017 at 6:31 am
Phil Parkin - Tuesday, February 28, 2017 9:55 AM
JustMarie - Tuesday, February 28, 2017 9:32 AM
That and all the text fields as varchar(50) are strong suggestions that they brought up the table editor and just took the defaults for almost everything.
I'm dealing with a 'situation' where the same thing is happening except that the field is called GUID. Which isn't a GUID. The different ID fields are GUIDs. And the views link back to the GUID field (using aliases that aren't in the other view) and not the ID field.
And they wonder why we mutter to ourselves all day long.
Your post reminded me of this, from Brent Ozar.
One of the comments on that post sent me to this video of a SQLBits session "Revenge: The SQL!" by Rob Volk (@sql_r). I have some more bad ideas to try now... 🙂
Thomas Rushton
blog: https://thelonedba.wordpress.com
March 1, 2017 at 7:13 am
jasona.work - Wednesday, March 1, 2017 5:54 AM
I'm positive this incident will convince Jeff to move to the cloud, really and for sure: http://www.theregister.co.uk/2017/03/01/aws_s3_outage/
Also rather shows the vulnerability of IoT devices when you see that several IoT devices ran into problems...
Web / security cams (Nest)
Unknown brand of TV remote
Lighting controller
Gaming mice (!!!)
It'd be interesting to see the e-mails from customers that were affected when these devices quit working. I do have a Nest product (thermostat) but those apparently weren't impacted. But if you had one of their security / web cams in use as a security cam, and it failed to record for several hours because of this, I'd be rather unhappy...
But that's small potatoes compared to the number of fairly well known companies and services that were brought down by this outage. How many millions in revenue were lost because of one company? What is it that you try to avoid when setting up a mission-critical system? Oh yeah, a single point of failure. Well, now Amazon Cloud has become a potential single point of failure. Sure, they could've avoided it by writing their code / app so it would live distributed across different datacenters, but as the article comments:
For various reasons – from the fact that programmers find distributed computing hard to the costs involved – this redundancy isn't always coded in.
Going to be interesting to see what caused the outage, and how companies react to it going forward...
But AWS/Azure is only a single point of failure if you assume that they are going to provide you with DR. However, neither service says that. Both offer DR, and by DR, I mean the same thing you'd do on-premises, which is to have a secondary location. So you use the Eastern US data center. You also DR to... somewhere, Western US, Western Europe, Australia, whatever.
More than this outage exposing any cloud provider as a single point of failure, it exposed the fact that relying on any single location for a business that needs to be 24/7/365 is bound to fail. On-premises or in the cloud, the issues are the same.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
March 1, 2017 at 7:33 am
Grant Fritchey - Wednesday, March 1, 2017 7:13 AM
jasona.work - Wednesday, March 1, 2017 5:54 AM
I'm positive this incident will convince Jeff to move to the cloud, really and for sure: http://www.theregister.co.uk/2017/03/01/aws_s3_outage/
Also rather shows the vulnerability of IoT devices when you see that several IoT devices ran into problems...
Web / security cams (Nest)
Unknown brand of TV remote
Lighting controller
Gaming mice (!!!)
It'd be interesting to see the e-mails from customers that were affected when these devices quit working. I do have a Nest product (thermostat) but those apparently weren't impacted. But if you had one of their security / web cams in use as a security cam, and it failed to record for several hours because of this, I'd be rather unhappy...
But that's small potatoes compared to the number of fairly well known companies and services that were brought down by this outage. How many millions in revenue were lost because of one company? What is it that you try to avoid when setting up a mission-critical system? Oh yeah, a single point of failure. Well, now Amazon Cloud has become a potential single point of failure. Sure, they could've avoided it by writing their code / app so it would live distributed across different datacenters, but as the article comments:
For various reasons – from the fact that programmers find distributed computing hard to the costs involved – this redundancy isn't always coded in.
Going to be interesting to see what caused the outage, and how companies react to it going forward...
But AWS/Azure is only a single point of failure if you assume that they are going to provide you with DR. However, neither service says that. Both offer DR, and by DR, I mean the same thing you'd do on-premises, which is to have a secondary location. So you use the Eastern US data center. You also DR to... somewhere, Western US, Western Europe, Australia, whatever.
More than this outage exposing any cloud provider as a single point of failure, it exposed the fact that relying on any single location for a business that needs to be 24/7/365 is bound to fail. On-premises or in the cloud, the issues are the same.
I find myself wondering if they'll release the real root cause. There's so much marketing and perception involved that I really doubt it. I'm sure an "official statement" will use a lot of words to say nothing and people will forget about it when the next announcement comes along.
Jason, that was one of the better articles I've seen on the topic. Thanks for sharing it. Some of the devices that went down could have caused serious problems. Not having redundancy in place can have real consequences, like the front gate being offline.
March 1, 2017 at 8:20 am
Grant Fritchey - Wednesday, March 1, 2017 7:13 AM
jasona.work - Wednesday, March 1, 2017 5:54 AM
I'm positive this incident will convince Jeff to move to the cloud, really and for sure: http://www.theregister.co.uk/2017/03/01/aws_s3_outage/
Also rather shows the vulnerability of IoT devices when you see that several IoT devices ran into problems...
Web / security cams (Nest)
Unknown brand of TV remote
Lighting controller
Gaming mice (!!!)
It'd be interesting to see the e-mails from customers that were affected when these devices quit working. I do have a Nest product (thermostat) but those apparently weren't impacted. But if you had one of their security / web cams in use as a security cam, and it failed to record for several hours because of this, I'd be rather unhappy...
But that's small potatoes compared to the number of fairly well known companies and services that were brought down by this outage. How many millions in revenue were lost because of one company? What is it that you try to avoid when setting up a mission-critical system? Oh yeah, a single point of failure. Well, now Amazon Cloud has become a potential single point of failure. Sure, they could've avoided it by writing their code / app so it would live distributed across different datacenters, but as the article comments:
For various reasons – from the fact that programmers find distributed computing hard to the costs involved – this redundancy isn't always coded in.
Going to be interesting to see what caused the outage, and how companies react to it going forward...
But AWS/Azure is only a single point of failure if you assume that they are going to provide you with DR. However, neither service says that. Both offer DR, and by DR, I mean the same thing you'd do on-premises, which is to have a secondary location. So you use the Eastern US data center. You also DR to... somewhere, Western US, Western Europe, Australia, whatever.
More than this outage exposing any cloud provider as a single point of failure, it exposed the fact that relying on any single location for a business that needs to be 24/7/365 is bound to fail. On-premises or in the cloud, the issues are the same.
No argument there. But I suspect a lot of people do think that they're going to get DR without actually developing for distributed processing, etc.
Which, in such an instance, puts them in the same boat as if they had hosted their software in their own office, in their own datacenter, with a single ISP connection, and had a truck take out the line.
March 1, 2017 at 8:26 am
I hope they release a root cause, as it would be good to know. However, this could happen, and has happened, at many individual companies.
The issue here isn't so much that a single AWS data center had problems, which is what this appears to be, but that companies didn't expect any outage and didn't plan for any failover. They assumed S3 would always work in their data center. Just like companies that assume their single co-lo or in-house data center will always be live can have issues.
March 1, 2017 at 9:22 am
Steve Jones - SSC Editor - Wednesday, March 1, 2017 8:26 AM
I hope they release a root cause, as it would be good to know. However, this could happen, and has happened, at many individual companies.
The issue here isn't so much that a single AWS data center had problems, which is what this appears to be, but that companies didn't expect any outage and didn't plan for any failover. They assumed S3 would always work in their data center. Just like companies that assume their single co-lo or in-house data center will always be live can have issues.
Assumed is the key word. Likely they expected a Data Center would be back online in a couple minutes, not a matter of hours.
DR is always an interesting topic. Cost, risk, and whether a reduced level of service is acceptable all come into play.
Kind of an eye opener to see the ripple effect of how many different things can be impacted by an outage when redundancy is assumed.
What a connected world we depend on.
March 1, 2017 at 10:38 am
Greg Edwards-268690 - Wednesday, March 1, 2017 9:22 AM
Steve Jones - SSC Editor - Wednesday, March 1, 2017 8:26 AM
I hope they release a root cause, as it would be good to know. However, this could happen, and has happened, at many individual companies.
The issue here isn't so much that a single AWS data center had problems, which is what this appears to be, but that companies didn't expect any outage and didn't plan for any failover. They assumed S3 would always work in their data center. Just like companies that assume their single co-lo or in-house data center will always be live can have issues.
Assumed is the key word. Likely they expected a Data Center would be back online in a couple minutes, not a matter of hours.
DR is always an interesting topic. Cost, risk, and whether a reduced level of service is acceptable all come into play.
Kind of an eye opener to see the ripple effect of how many different things can be impacted by an outage when redundancy is assumed.
What a connected world we depend on.
When you say "they" are you referring to Amazon or their customers?
It just shocks me that Amazon's little red icon files for the warning part of the dashboard were housed in only one data center. Like the files (tiny as they had to be) couldn't be stored redundantly elsewhere because ... they didn't have money for the extra storage space?
The Amazon dashboard couldn't tell people that the system was down because the "things have failed" stuff was stored in the data center that went down. That's not just bad DR, it's horrible PR.
March 1, 2017 at 10:50 am
Brandie Tarvin - Wednesday, March 1, 2017 10:38 AM
Greg Edwards-268690 - Wednesday, March 1, 2017 9:22 AM
Steve Jones - SSC Editor - Wednesday, March 1, 2017 8:26 AM
I hope they release a root cause, as it would be good to know. However, this could happen, and has happened, at many individual companies.
The issue here isn't so much that a single AWS data center had problems, which is what this appears to be, but that companies didn't expect any outage and didn't plan for any failover. They assumed S3 would always work in their data center. Just like companies that assume their single co-lo or in-house data center will always be live can have issues.
Assumed is the key word. Likely they expected a Data Center would be back online in a couple minutes, not a matter of hours.
DR is always an interesting topic. Cost, risk, and whether a reduced level of service is acceptable all come into play.
Kind of an eye opener to see the ripple effect of how many different things can be impacted by an outage when redundancy is assumed.
What a connected world we depend on.
When you say "they" are you referring to Amazon or their customers?
It just shocks me that Amazon's little red icon files for the warning part of the dashboard were housed in only one data center. Like the files (tiny as they had to be) couldn't be stored redundantly elsewhere because ... they didn't have money for the extra storage space?
The Amazon dashboard couldn't tell people that the system was down because the "things have failed" stuff was stored in the data center that went down. That's not just bad DR, it's horrible PR.
I'd bet it could apply equally to both.
The customers were operating on the assumption of "It's Amazon, they never go down, and if they do I'm sure they've got plans in place to minimize the downtime and after all, they promise multiple 9s of uptime" while Amazon was likely figuring they were on the ball with lots of redundancy across all the systems within a datacenter.
But, regardless of how well you redundancize your systems, there's ALWAYS going to be something that can kill the whole thing in one go.
March 1, 2017 at 12:43 pm
<rant> Why can't people READ the error messages? Some may be obscure at times, but others actually tell you where the problem resides. </rant>