September 6, 2012 at 10:06 am
Scott, I definitely agree that it's the coder's responsibility. I managed a group of 8-12 programmers for a few years and we used agile practices, but we included robust testing, code reviews, etc. Ultimately the quality of the product was the lead programmer's responsibility (and he/she was held accountable).
September 6, 2012 at 10:12 am
John Hanrahan (9/6/2012)
Scott, I definitely agree that it's the coder's responsibility. I managed a group of 8-12 programmers for a few years and we used agile practices, but we included robust testing, code reviews, etc. Ultimately the quality of the product was the lead programmer's responsibility (and he/she was held accountable).
Looks like you gave them the resources and tools to do the job correctly. Unfortunately, there are probably just as many managers out there that don't.
September 6, 2012 at 10:21 am
Lynn Pettis (9/6/2012)
John Hanrahan (9/6/2012)
I laugh and laugh. Agile often means (to the managers in charge) get it done and skip whatever you think you can. That may not be the book definition but it's how many companies operate and I bet not a one calls what they do cowboy coding. I will say that the bigger the company I've worked with the better it gets (I worked as a consultant for almost 20 years) as in more testing etc.
Then they aren't using Agile as it is supposed to be done. Testing is still an integral part of the development process.
Who's to say it's either's fault. It could be any number of root causes:
- the checking algorithm (the one that validates that the sell price is higher than the buy price) doesn't get promoted at the same pace as the "perform the trade" service because it's too slow (remember this checking has to happen in microseconds). So the thing you tested functionally in QA isn't what actually made it to prod.
- the ranges are data-driven (so the ranges to buy and sell are in a data structure). The ranges were completely sane at the time when it was promoted, but someone then adjusted the table AFTER the fact, but only updated the "buy" price, and not the "sell" price.
- a data caching issue.
- etc...
Without a root cause analysis, it's going to be difficult to assign blame to anyone. I do think you need to consider a LOT more controls when dealing with data not within your control, whether for attacks like injection, or for someone or something trying to "game" your system, or simply that there are lots of other players like you in the market which means things might change outside of your control.
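To make the "more controls" point concrete, here is a minimal sketch of a sanity check that runs every time a data-driven range is loaded, not just once at promotion time. Everything in it (the PriceRange record, its field names, range_problems and load_ranges) is a hypothetical illustration, not anything from the system under discussion.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PriceRange:
    # Hypothetical row from a data-driven range table.
    symbol: str
    buy_limit: float   # highest price we are willing to pay
    sell_limit: float  # lowest price we are willing to accept


def range_problems(r: PriceRange) -> List[str]:
    """Return the reasons a range looks unsafe (empty list if it looks sane)."""
    problems = []
    if r.buy_limit <= 0 or r.sell_limit <= 0:
        problems.append("non-positive price")
    if r.sell_limit <= r.buy_limit:
        problems.append("sell limit is not above buy limit")
    return problems


def load_ranges(rows: List[PriceRange]) -> Dict[str, PriceRange]:
    """Load the table only if every row passes; otherwise refuse the whole load."""
    bad = {r.symbol: range_problems(r) for r in rows if range_problems(r)}
    if bad:
        raise ValueError(f"refusing to load unsafe ranges: {bad}")
    return {r.symbol: r for r in rows}
```

With a check like this on the load path, an after-the-fact edit that touches only the "buy" side fails loudly instead of quietly reaching the trading service.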
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
September 6, 2012 at 11:16 am
... (remember this checking has to happen in microseconds). ...
Trading programs in the Asia Pacific rim have to be careful about leap seconds which are added to the year every so often as those markets are active when the correction is applied.
...
-- FORTRAN manual for Xerox Computers --
September 6, 2012 at 12:54 pm
Anyone from the UK remember BT Phone Day - back in the 1990s sometime, the UK phone company changed all number prefixes to add a 1 - so 071 became 0171 - and 021 became 0121 etc.
Well I worked for this software company back then, whose clientele were all insurance brokers - and pretty much survived on their phone lists. They would have a heck of a job changing all those numbers manually. There were some exceptions, but few enough for some bright spark to have an idea well out of sync with his IQ.
As part of the monthly update - sent out to over 250 brokers - with the insurance rating prices, an extra script was run.
... can you see where this is going ... and yes it did happen.
All instances of '0' in all the phone number fields were replaced by '01' - sometimes causing field overflow, and failure of the update - so the person applying the update would attempt to run the update again - which would overwrite the backup the original update had made.
I was a lowly support person at the time and still shudder at the fallout.
The company still exists, though the name has changed a few times.
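For anyone wondering how the two approaches differ, here is a rough sketch. The function names and the field_width parameter are made up for illustration, and the sketch ignores the handful of exceptions mentioned above; it only contrasts "replace every 0" with "touch the leading 0 only and check the field width first".

```python
def naive_update(number: str) -> str:
    # What the story describes: every '0' in the field becomes '01'.
    return number.replace("0", "01")


def prefix_update(number: str, field_width: int) -> str:
    """Insert '1' after the leading '0' only, and refuse to overflow the field."""
    if not number.startswith("0"):
        return number  # not in national format; leave it untouched
    updated = "01" + number[1:]
    if len(updated) > field_width:
        raise ValueError(f"{updated!r} does not fit in {field_width} characters")
    return updated
```

A test number with a single zero, such as 01234567, comes out the same from both functions, which is exactly why that kind of test data proves nothing. A number with zeros further along, such as 0215550100, is where the naive version goes wrong.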
September 6, 2012 at 12:57 pm
Where was QA to test the update?
September 6, 2012 at 1:01 pm
Lynn Pettis (9/6/2012)
Where was QA to test the update?
Well I'm sure the developers did their own testing using suitable test data
Phone 01234567 changed to 011234567 - so it all right for release innit?
September 6, 2012 at 1:08 pm
Tom Brown (9/6/2012)
Well I'm sure the developers did their own testing using suitable test data
Phone 01234567 changed to 011234567 - so it all right for release innit?
I think as we all know, nothing can take the place of good planning. In IT, it's not if things go wrong but when. Having a good plan for when things go wrong could prevent this. "Just run the update again" obviously isn't the right thing to do since the update is creating a backup. If you run it again, you just backed up the bad data. I don't think you have to make this kind of mistake to learn the lesson. Just plan better.
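As a small illustration of planning for the rerun case, here is a sketch of a backup step that will not clobber an existing backup. The function name and the file paths are hypothetical; the only mechanism it relies on is Python's exclusive-create file mode.

```python
import shutil
from pathlib import Path


def backup_once(data_file: Path, backup_file: Path) -> bool:
    """Copy data_file to backup_file unless a backup already exists.

    Returns True if a new backup was written, False if an earlier one was kept.
    """
    try:
        # Open the source first so a missing data file does not leave an
        # empty backup behind. Mode "xb" raises FileExistsError rather than
        # overwriting an existing backup.
        with open(data_file, "rb") as src, open(backup_file, "xb") as dst:
            shutil.copyfileobj(src, dst)
        return True
    except FileExistsError:
        return False
```

Running the update a second time then leaves the first (good) backup alone, and the False return value gives the installer a chance to stop and ask why a backup is already there.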
September 6, 2012 at 1:09 pm
Tom Brown (9/6/2012)
Well I'm sure the developers did their own testing using suitable test data
Phone 01234567 changed to 011234567 - so it all right for release innit?
Now that takes me back a ways. Had a developer add code in an application to handle memberships. Worked great when that particular 'vendor' was used, but she failed to test for all the others. Totally broke the application and we had to roll back a version until she figured out what she had done wrong.
September 6, 2012 at 4:03 pm
Tom Brown (9/6/2012)
Anyone from the UK remember BT Phone Day - back in the 1990s sometime, the UK phone company changed all number prefixes to add a 1 - so 071 became 0171 - and 021 became 0121 etc.
Well I worked for this software company back then, whose clientele were all insurance brokers - and pretty much survived on their phone lists. They would have a heck of a job changing all those numbers manually. There were some exceptions, but few enough for some bright spark to have an idea well out of sync with his IQ.
As part of the monthly update - sent out to over 250 brokers - with the insurance rating prices, an extra script was run.
... can you see where this is going ... and yes it did happen.
All instances of '0' in all the phone number fields were replaced by '01' - sometimes causing field overflow, and failure of the update - so the person applying the update would attempt to run the update again - which would overwrite the backup the original update had made.
I was a lowly support person at the time and still shudder at the fallout.
The company still exists, though the name has changed a few times.
Your description of the change is broken - adding a 1 to the front of the area code was for non-London numbers only; the London numbers didn't change from 01... to 011... but to something beginning 02.
If the algorithm you described was applied to London numbers that would be another source of catastrophe.
Tom
September 7, 2012 at 9:37 am
Scott D. Jacobson (9/6/2012)
John Hanrahan (9/6/2012)
Steve,
I'm surprised no one mentioned agile development practices. It seems like agile is another word for minimal testing. That's fine when the consequences of your code are small but maybe not so much when it's a stock trading program. Do you know if they used 'agile' development? In the Seattle area it is the buzzword.
John
I suggest you read up on agile programming. Unit testing and acceptance testing both play big roles in the process. Anyone who uses agile to mean "develop and deploy" with no testing whatsoever isn't doing agile. That's cowboy coding. See: http://en.wikipedia.org/wiki/Agile_software_development#Characteristics
+1.
--Jeff Moden
Change is inevitable... Change for the better is not.
September 7, 2012 at 1:33 pm
L' Eomot Inversé (9/6/2012)
Your description of the change is broken - adding a 1 to the front of the area code was for non-London numbers only; the London numbers didn't change from 01... to 011... but to something beginning 02.
If the algorithm you described was applied to London numbers that would be another source of catastrophe.
They have changed the numbers several times. London changed first, from 01 to 071 and 081, I think around 1990 - before computer systems were all that common. Then the whole country changed in 1995 http://www.independent.co.uk/news/uk/bt-braced-for-wrongnumber-barrage-on-phoneday-1615636.html which is the event where the rogue algorithm was used and London was changed to 0171 / 0181. And a 3rd change happened around 2000 when London went to 0207 and 0208, by which time we had learned our lesson and got the brokers to do the changes manually.
Maybe you missed it. How long have you lived in Titerrogaka now?
September 8, 2012 at 10:09 am
Tom Brown (9/7/2012)
They have changed the numbers several times. London changed first, from 01 to 071 and 081, I think around 1990 - before computer systems were all that common. Then the whole country changed in 1995 http://www.independent.co.uk/news/uk/bt-braced-for-wrongnumber-barrage-on-phoneday-1615636.html which is the event where the rogue algorithm was used and London was changed to 0171 / 0181. And a 3rd change happened around 2000 when London went to 0207 and 0208, by which time we had learned our lesson and got the brokers to do the changes manually.
Maybe you missed it. How long have you lived in Titerrogaka now?
Yes, you're right. I had misremembered and conflated those two changes, although they were actually 10 years apart.
Tom
September 8, 2012 at 10:16 am
Steve Jones - SSC Editor (9/6/2012)
Scott D. Jacobson (9/6/2012)
John Hanrahan (9/6/2012)
Steve,
I'm surprised no one mentioned agile development practices. It seems like agile is another word for minimal testing. That's fine when the consequences of your code are small but maybe not so much when it's a stock trading program. Do you know if they used 'agile' development? In the Seattle area it is the buzzword.
John
I suggest you read up on agile programming. Unit testing and acceptance testing both play big roles in the process. Anyone who uses agile to mean "develop and deploy" with no testing whatsoever isn't doing agile. That's cowboy coding. See: http://en.wikipedia.org/wiki/Agile_software_development#Characteristics
I think that many people view Agile as a way to speed development and if that speed causes a lack of testing, that's OK.
I think it's managers that need to read up more on Agile, CI, and other practices.
I agree with the idea that many people have a dangerous idea of what "Agile" is, and that managers need educating. I made some comments elsewhere in response to the idea that Agile is always a good thing; might as well repeat them here:
This of course is caused by having utterly incompetent management in charge of deciding how development will be done, and by the fact that when utterly incompetent managers see a shiny new buzzword like "Agile" they go and skim-read enough about it to extract some disconnected misinterpretations that support their lunatic preconceived idea that what technical people call best practise is just a way of making development too expensive, and then go and impose those on the development teams under the banner of the shiny new buzzword.
Tom