What is the True Version of Code?

  • In principle I agree with everything you said. In particular this:

    Whether anything has eroded my confidence or not, I just believe in making sure. Every time.

    and this, which ideally are constants in all team members regardless of the SCM model:

    Having a VCS does not remove the need for diligence and care...

    And make no mistake, every shop chooses an SCM model whether they realize it or not.

    Don't get me wrong, I'm happy for you that your system works and no-one ever makes changes in Production that have not gone through the correct procedures. I just don't have that environment, and as a consultant on site, working for just one team, it is not part of my duties to re-shape my customers' procedures, so I work in a different way.

    I was brought in, in part, to install TFS and advocate for source control as a way of life, so that was part of my role and it was backed by upper management. It is tough sledding if it goes any other way. I have worked in shops where code was emailed, and everything in between that and what I have set up now. I am grateful to be on the team I am on from that aspect.

    VCS helps a lot, but it's not a magic wand.

    Not at all. Not saying it is by any stretch. It is a tool, a component of an overall strategy, and can make people immensely more effective and efficient in managing code, but you're right in that it won't make a careless person diligent or caring.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Phil Parkin (2/1/2016)


    roger.plowman (2/1/2016)


    below86 (2/1/2016)


    roger.plowman (2/1/2016)


    "

    Instant fail.

    You NEVER EVER EVER put a change into production without going through development/QA/approval/whatever.

    NEVER.

    Here is the reason I would NEVER EVER want to be on call for this type of environment. If you want me on call, and it's 3:30 AM and some code needs to be fixed for a cube that has to be available at 8 AM, then I'm either going around these steps, or you can fire me. 😎

    the VCS topic - Oh how I've missed you, NOT!!!

    If you have to develop new code at 3:30 in the morning the whole development department needs to be fired. This is not a situation where new code *needs* to be created.

    All testing should have been done long before deployment.

    Consider the possibility that a piece of rogue data hits your ETL pipeline and it errors out. The fast solution is to make a change to a proc to handle it and then restart the ETL. The slower solution is to fix the code properly and go through a thorough testing and release process (while explaining to the business why their analytics are out of date - assuming, of course, that you haven't been fired by then).

    Or... 🙂

    You delete the rogue data, then restart the ETL and THEN recode at leisure while also reviewing how the ETL process development missed the rogue data in the first place.

  • roger.plowman (2/2/2016)


    ...

    Or... 🙂

    You delete the rogue data, then restart the ETL and THEN recode at leisure while also reviewing how the ETL process development missed the rogue data in the first place.

    You're a database professional & you are prepared to delete source data, just because you think that it is wrong? In a big system, there is a good chance that you won't have a sufficiently intimate knowledge of all of the data sources to know, in each case, with certainty, that deletion is justified.

    I rarely delete source data. I would rather exclude any data which causes a problem, to get a process running, and then investigate later.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Phil Parkin (2/2/2016)


    ...

    You're a database professional & you are prepared to delete source data, just because you think that it is wrong? In a big system, there is a good chance that you won't have a sufficiently intimate knowledge of all of the data sources to know, in each case, with certainty, that deletion is justified.

    I rarely delete source data. I would rather exclude any data which causes a problem, to get a process running, and then investigate later.

    You delete *staging* data, not original data, of course.

    That's why you stage data, so you can vet it. If data is outside parameters or whatever, of course you delete the data from the staging table. Production code is NOT CHANGED without testing.

    Full stop.

    If you can't change the code, you have to delete the staging data (or better, divert it or flag it as not importable, or whatever leaves the rogue data out of the production data at the end of the day).

    Of course that may cause cascade failures in other data but that's always the consequence of garbage data.

    Better to exclude bad data than corrupt production.
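
    As a rough sketch of that flag-and-exclude approach (the Staging.Sales table, IsImportable flag and RejectReason column here are invented for illustration, not taken from anyone's actual system):

        -- Mark rogue rows in staging instead of touching production code
        -- (table names, column names and the validation rule are hypothetical)
        UPDATE s
        SET    s.IsImportable = 0,
               s.RejectReason = N'Amount outside expected range'
        FROM   Staging.Sales AS s
        WHERE  s.Amount NOT BETWEEN 0 AND 1000000;

        -- The load into production then only picks up rows that passed vetting
        INSERT INTO dbo.FactSales (SaleDate, CustomerID, Amount)
        SELECT s.SaleDate, s.CustomerID, s.Amount
        FROM   Staging.Sales AS s
        WHERE  s.IsImportable = 1;

    The rejected rows stay in staging with a reason attached, so they can be reviewed at leisure once the load has finished.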

  • Steve Jones - SSC Editor (2/1/2016)


    ...

    Here's the rub.

    Let's say you have an issue in production with a proc/view/table/function. You want to test it. However, development has changed. I can't necessarily make the test in dev because I don't have a valid environment in which to test. Same for test/QA/staging. What do I do?

    ...

    For the purpose of smoke testing, every change control process needs a Pre-Production environment that matches the schema of production exactly and can be easily wiped down and reloaded. A SQL Server Express instance with empty schemas or a marginal amount of data can suit the purpose.

    For those occasions where I need to quickly determine if the logic of a new stored procedure will work with actual production data, I'll sometimes take the DDL script for the procedure, comment out the "CREATE PROCEDURE..." declaration, etc. and essentially turn it into a SELECT script driven by variables instead of parameters. At that point what I essentially have is a "unit test" script. I'll run that in production under a special account with db_datareader-only access.

    If the deployment script passes the smoke test in pre-production, and the unit test works in production, then I'm 99% confident in the production deployment, and it only takes maybe half an hour of prep time to set up.
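
    A minimal sketch of that conversion, using an invented procedure and invented parameter names (nothing here comes from a real system):

        -- Original (hypothetical) procedure header, commented out for the test run:
        -- CREATE PROCEDURE dbo.usp_GetOrdersByCustomer
        --     @CustomerID int,
        --     @FromDate   date
        -- AS
        -- BEGIN

        -- Parameters become local variables so the body runs as a plain SELECT
        DECLARE @CustomerID int  = 42,
                @FromDate   date = '20160101';

        -- Procedure body, unchanged, now executing as an ad hoc read-only query
        SELECT o.OrderID, o.OrderDate, o.TotalDue
        FROM   dbo.Orders AS o
        WHERE  o.CustomerID = @CustomerID
          AND  o.OrderDate >= @FromDate;

        -- END

    Run under an account that holds nothing beyond db_datareader, the worst the script can do is read.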

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • roger.plowman (2/2/2016)


    Better to exclude bad data than corrupt production.

    + 1 million

  • roger.plowman (2/2/2016)


    ...

    You delete *staging* data, not original data, of course.

    Better to exclude bad data than corrupt production.

    Ah right. I'm OK with this. The 'bad' data I am referring to causes the ETL to fail between Extract and Staging.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • I'm interested in the why. I like source control, use it all the time for code. Why isn't it used as naturally/often for the database? Is it really that hard? Or is it just because we're not taught to do it in the beginning?

  • Andy Warren (2/2/2016)


    I'm interested in the why. I like source control, use it all the time for code. Why isn't it used as naturally/often for the database? Is it really that hard? Or is it just because we're not taught to do it in the beginning?

    My take...

    This: http://www.sqlservercentral.com/Forums/FindPost1757481.aspx

    In short, the database deployment model lends itself to scripting an object out of the database, changing it, then compiling the changed version back into the database, whereas the build-and-deploy model followed by compiled application languages is more rigid on this point. It is convenient for database developers to make a change right on the deployed version, i.e. the hosted database, and there is comfort in the fact that the change is immediately live and will be backed up. Data can also act as code when it is configuration data or metadata, and the same thing applies: convenience. It takes a mind-shift and discipline to go to a VCS first, and from my experience we (humans) naturally start with the most convenient choice available, gravitating towards it as the path of least resistance, until we are taught why those choices may be a bad idea and may turn out to be the path of most resistance in the longer run. The teaching part is what I contend is not happening early enough, whether in higher education or in on-the-job training.
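
    To make that "path of least resistance" concrete, here is a sketch of the script-out/compile-back cycle run straight against the live database (the procedure name and body are invented):

        -- 1. Script the current version out (right-click "Modify" in SSMS does essentially this)
        EXEC sp_helptext N'dbo.usp_SomeProc';

        -- 2. Edit the text and compile it straight back into the deployed database,
        --    bypassing any VCS entirely
        ALTER PROCEDURE dbo.usp_SomeProc
        AS
        BEGIN
            SELECT 1 AS Placeholder;  -- the edited body would go here
        END;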

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Orlando Colamatteo (2/2/2016)


    ...

    In short, the database deployment model lends itself to scripting an object out of the database, changing it, then compiling the changed version back into the database...

    +1

    The fundamental issue is that we can take the code from the server and look at it. That's not easy in a .DLL or .EXE. It can be done, but it's easier to go to the VCS or filesystem and work from there.

    Essentially, we can "decompile" code from an instance with a right-click and Modify, which is a fundamental design mistake in SQL Server.
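
    To see how little stands in the way, the full module text can be pulled back out of the instance by anyone with VIEW DEFINITION rights (the procedure name below is invented):

        -- The "source" is stored right in the database and is retrievable at will
        SELECT OBJECT_DEFINITION(OBJECT_ID(N'dbo.usp_SomeProc'));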

  • Andy Warren (2/2/2016)


    I'm interested in the why. I like source control, use it all the time for code. Why isn't it used as naturally/often for the database? Is it really that hard? Or is it just because we're not taught to do it in the beginning?

    I think it's simply a result of history - the different dates at which SQL and procedural languages emerged led to different dates at which their users became interested in VCS.

    Relational database programming didn't take off until quite recently; in fact there are nowhere near as many varieties of SQL now as there were of FORTRAN in 1962, and it was, well, 1987 before we had QUEL and more than two versions of SQL (equivalent to the number of programming languages at a higher level than assembly languages in about 1957). Proper use of VCS isn't fully established in the procedural (etc.) programming world even now, and didn't really take off until the mid 80s (although some of us had had versions of it a decade or more earlier). So perhaps it's just that VCS takes time to become recognised as useful even by a few, more time to become recognised by a large number of people, and for ever and a day to be accepted by everyone. In the SQL field we are probably 10 years beyond the stage of VCS acceptance that procedural programming had reached by 1987, but there probably wasn't enough SQL around to have anyone think about VCS for it until about 30 years after the procedural world had largely accepted it. So we've taken about a decade less to get to where we are on VCS than the procedural community took - not of course because we are cleverer, but because the idea of VCS for software was already very well known by the time SQL people started to look at it.

    Tom

  • I tend to agree with Jeff's approach to this - nothing gets into production without going through proper authorisation and being recorded in the VCS.

    However, I agree with those who say it isn't always easy. But then I also agree with those who say it isn't always easy to write code that performs adequately, or even code that works, and I don't see that lack of ease as a reason not to always attempt it.

    In general, the game is much more complex than most here appear from their comments to think.

    First, there often isn't only one "golden" system or only one production system. If you have a lot of customers, your software may have a lot of versions currently in production. Not only different versions in the sense that SQL Server 2012 and SQL Server 2014 are both in use as production systems, but also in the sense that different editions of each, providing different features and limitations, are also in production. If I provide a fix to a production system I want it applied everywhere that it is relevant, which means it has to have been adequately checked that it won't damage any production system running the version of the software to which the fix applies. I also want it to be shown as a merge requirement for every system in VCS which is a later and not yet unsupported version of the system to which the fix applies (anyone who thinks that versions cease to be supported in the same sequence as they were created has never heard of any really nasty release validation errors) and also for every system in VCS which is a descendant of an applicable system and is still in development prior to release. Sorting out the merge requirements is usually not difficult - often the later release already solves the problem for which the fix was created, or an identical source change can be used, or the feature that needed the fix is not supported in the newer release - but that doesn't have to be considered when verifying that the fix is safe for the "golden" system for which it is intended and therefore for all production systems which use that version.

    Secondly, there's the question of how the fix is realised; for SQL and languages like javascript and vbs (and many more) this is pretty easy, as one is just replacing some text. For compiled languages it may be more difficult, particularly if the transformation from source to object produces a massive monolith from lots of small parts (that's why dynamic link-loading eventually won the Unix loader war) - that used to drive people to produce binary patches; but to do it right those patches have to produce the same effect (at the level of functionality) as the source modification that is to go into the VCS, and the VCS must also record exactly what the patch is (so that if the patch "steals" space a later patch can't "steal" the same space). If the transformation generates small modules that are linked at run time or at load time, a replacement module can be used instead, which is a much nicer situation.

    I've worked with systems with multiple supported versions and multiple divergent developments, both in the days of monolithic builds and binary patches and in more modern times; in both "ancient" and modern times VCS has been a life-saver - even when we had to invent our own VCS because there was no useful software available for it.

    The thing is, once you are faced with the incredible complexity that such a set-up entails, the pain and difficulty of having to put every fix through the proper verification system ceases to be important, because if you don't do that you will soon be USCWAP (S for Slimy, of course, we're all oh so correct here).

    Tom

  • Steve, I wonder if calling it a design mistake is correct, or fair. It's a different product with a different purpose. I like the ability to make small changes (via VCS or not) such that I know the overall impact is focused and measurable. I bet devs would too! I think that drives a lot of the micro service adoption. Less surface area at risk when you make a change.

    My cynical take is that there isn't enough bang for the buck for most orgs to worry about deploying the db directly from source control. I've seen lots of places where we would update VCS daily based on production because we did want the history and the safety net. Those changes went through QA and were deployed via change management processes, just not directly from VCS. Perfect? Not at all. But reasonable, at least to me!

    Not saying it's a great reason, but to do more than that requires it to be fairly painless/cheap or it's a non-starter. I also wonder if it's just a lack of experience on my part, and whether I could reduce the pain/cost to the point that it wasn't a reason not to do it.
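
    A very rough sketch of that daily safety-net export: a scheduled query pulls every module's current text out of production, and an external job writes each row to a file and checks the files in (the commit step itself happens outside SQL):

        -- Snapshot of every procedure, view, function and trigger definition,
        -- ordered so that successive exports diff cleanly
        SELECT OBJECT_SCHEMA_NAME(m.object_id) + N'.' + OBJECT_NAME(m.object_id) AS object_name,
               m.definition
        FROM   sys.sql_modules AS m
        ORDER  BY object_name;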

  • From a software engineering perspective, I think it's a design mistake. We are storing the source code along with the compiled code, when there's no reason to do so. We don't do this in most systems, though we have done this in web servers with HTML/ASP/JS files.

    In those cases, as with SQL, we have no shortage of mistakes made. Not always, not even often, but regularly. I think when we look back in the history of our deployments, in almost any organization, we find regular mistakes being made because of poor engineering practices. That's where changing things on the server causes issues.

    Not to say that people don't make mistakes when they write C#/VB/Java/Perl/etc code, but they at least recompile things and there are chances to have the compiler, as well as any additions (static analysis, code coverage, testing, etc), catch issues.

    The bang for the buck may be true, but that's the same process for unregulated, untested C# code. It might not be a big deal in many cases.

    My point would be that better software engineering practices could be enforced in the database from the vendor. Not that this would prevent bugs or mistakes, but it allows one less place for silly, fast, stupid mistakes.

  • From the perspective of a DBA, I'm sure most of us would prefer that stored procedures and views be stored in clear text and that we have the option to script out the DDL.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho
