Sunday, April 26, 2015

Adaptive Value Driven Development

The SDLC process my development group at work follows is based on Agile, but is not Scrum, XP, Lean, or any other named implementation of Agile.  Instead, we have created our own implementation, one that focuses on adaptation over time.  We have added, evaluated, and kept whatever practices added value, arriving at the current incarnation of our process.

The process works within a particular context.  The group is small, fewer than five senior engineers.  No one works remotely.  We are an internal tools team whose focus is on resiliency, reliability, and rapid release and deployment.

Example Service Summary:
  • 100% uptime (0 downtime) over the past 22 months.
  • 80 releases (~90 features) to production over those same 22 months.
  • 800,000 requests served over the past 3 months.
  • 0 dropped requests due to deployments or unavailable upstream systems.
This is the context in which this process was applied.

The Core

The core of our process is focused on adaptation and evolution.  We iterate on our process the same way we iterate on our code, evolving it in response to the requirements placed on it; with each iteration we look to refine, add, or remove practices.  The central question we ask of every practice we use is "Does it provide more value than its cost?"  That leads us to the definition of value, but first, a short digression on process.

Any process, including an SDLC, exists to facilitate work.  No practice or process has value in and of itself; it only has value if it facilitates work.  Value in an SDLC is measured by a practice's ability to contribute to the creation and delivery of maintainable, extensible software in a timely manner.  Any practice whose cost exceeds its value should be changed (to reduce cost or increase value) or removed.  This includes any so-called "best practices".  Try out practices, measure their success, and when you have data, make a decision.  Just because a practice is on someone's list of best practices doesn't make it so.

The evaluation of practices is not a one-time calculation.  All practices should continue to be challenged: not necessarily in every iteration, but whenever the cost/value proposition may have changed.  The SDLC should continue to evolve and change to suit the needs of the team and the software they develop.

The Tenets

Deliver customer value (new features and fixes) as soon as possible, without service interruptions.  Provide the ability to deploy after every fix, without a maintenance window.

All engineers are required to be active participants in shepherding the ongoing evolution of the SDLC.  They must collaborate, express opinions, and engage in technical dialectics, all in an effort to create the best software they can.

Accountability and responsibility are placed on individuals, never the group.  This is true for projects, for project work, and for questioning or proposing changes to the process.

Be pragmatic; there will always be exceptions to the rule.  Follow the process as much as possible, but realize there will be exceptions.  These exceptions should be explicit (raised to the team lead or manager) and then dealt with in the most pragmatic way possible.  Exceptions should be exceptional, not the norm; they should not happen often.

All things being equal, the fewer lines of code, the better.  Always favor implementations that yield smaller code sizes but are still extensible and maintainable.  This is achieved by using any and all programming techniques that reduce code size, including meta-programming and mixed-paradigm programming (object-oriented, functional, aspect-oriented).  The result is fewer defects, less code to manage, and less time spent updating code.  Do not confuse code maintainability/readability with technique familiarity or coding to a lesser skill level.

Collaboration is not encouraged, it is required.  The team should be a team, not just a collection of individuals sitting close together.  That means regular communication throughout the day, being open to that kind of communication, and learning how to accept and manage interruption.

Never implicitly accrue technical debt.  The completion of every piece of work involves rigorous refactoring.  In exceptional cases you may have to accrue technical debt, but do so with knowledge aforethought. Pay off accrued technical debt as soon as possible.

Some of Our Current Practices

Keep in mind all of these practices can have exceptions, but any exception should be explicit and should involve the team lead or manager.  Since they are exceptions, they shouldn't happen often.

Group Ownership. Everyone works on everything.  Engineers should not regularly pick up cards for the same areas of code.  Ideally, they should pick cards that represent the areas they are least familiar with.  It is the engineer's responsibility to get help, as needed, from those with more experience in unfamiliar areas in order to do the work in a reasonable time frame.

Everyone gets an opportunity to be a project lead. As new projects come in, engineers are picked to be the project lead.  The lead is accountable and responsible for the success of the project.  This means they must engage the stakeholders to understand what the software should do, get answers to all the open questions, break the work down into cards, and perform all the due diligence required for the success of the project.

All work tracks back to customer value. Projects are broken down into cards.  All cards must represent customer-facing features (value) or defects.  They must also describe the acceptance criteria based on customer value.

Automate the validation of acceptance criteria. We practice ATDD, so the acceptance criteria become the acceptance test, which is co-developed with the implementation.  A feature is done when there is a working system with that particular feature implemented and its automated acceptance test developed and passing.  As an aside, we don't actually use Selenium or any other typical acceptance test framework.  In a normal acceptance test the server would start up and the test would then execute black-box calls against the system.  We found there is more value in retaining the control provided by the unit test framework, especially when we need to validate particular exception/error conditions.  So we simulate a network call by making a function call to the top of the stack.  The test then exercises all the other ancillary systems such as databases, file systems, and SOAP or REST APIs.  This allows us to leverage mocks or dynamic patching to exercise error conditions or more complex scenarios, as shown in the sketch below.
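
A minimal sketch of this style of test in Python with unittest.mock; the service, handler, and upstream call are hypothetical stand-ins for illustration, not our actual code:

```python
import unittest
from unittest import mock


# --- Hypothetical system under test -------------------------------------
# Stand-in for the top-of-stack request handler that the web framework
# would normally route a network request to; the test calls it directly.

class UpstreamUnavailable(Exception):
    pass


def fetch_inventory(sku):
    """Pretend upstream REST call; patched out in the tests below."""
    raise NotImplementedError("the real implementation calls an upstream API")


def handle_order_request(sku, quantity):
    """Top-of-stack entry point, invoked as a plain function call."""
    try:
        available = fetch_inventory(sku)
    except UpstreamUnavailable:
        return {"status": 503, "error": "inventory service unavailable"}
    if available < quantity:
        return {"status": 409, "error": "insufficient stock"}
    return {"status": 200, "ordered": quantity}


# --- Acceptance tests ----------------------------------------------------

class OrderAcceptanceTest(unittest.TestCase):
    def test_order_succeeds_when_stock_is_available(self):
        with mock.patch(__name__ + ".fetch_inventory", return_value=10):
            response = handle_order_request("SKU-1", 3)
        self.assertEqual(response["status"], 200)

    def test_upstream_outage_is_reported_not_crashed(self):
        # Dynamic patching makes error paths easy to exercise; they would
        # be awkward to trigger through a true black-box network call.
        with mock.patch(__name__ + ".fetch_inventory",
                        side_effect=UpstreamUnavailable):
            response = handle_order_request("SKU-1", 3)
        self.assertEqual(response["status"], 503)


if __name__ == "__main__":
    unittest.main()
```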

Leave clean implementations with a minimum of code. Refactoring is required with every card.  The goal is to have the smallest code size that is still extensible and maintainable.  This doesn't mean we pack as much on a single line as possible.  What it does mean is that we use any and all techniques to reduce code size.  Engineers are expected to learn and grow their skills; this includes meta-programming, functional programming, object-oriented programming, and aspect-oriented programming, as well as any idioms and techniques specific to the language.  We never code "down" or restrict the use of particular language abilities.  Closures, function wrapping, dynamic class and method manipulation, dispatch programming, DSLs, and in some cases even dynamic patching: these techniques and others are all tools to be leveraged as long as their use is appropriate.  One small example of function wrapping follows.
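
As a hypothetical sketch of function wrapping in Python: a single retry decorator replaces a hand-written retry loop at every call site.  The names and the retry policy are invented for illustration:

```python
import functools
import time


def retry(attempts=3, delay=0.1):
    """Wrap a function so transient failures are retried.

    One decorator replaces a hand-written try/except loop at every call
    site that talks to a flaky upstream system.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            last_error = None
            for _ in range(attempts):
                try:
                    return func(*args, **kwargs)
                except ConnectionError as err:
                    last_error = err
                    time.sleep(delay)
            raise last_error
        return wrapper
    return decorator


@retry(attempts=5)
def fetch_user(user_id):
    # Hypothetical upstream call; raises ConnectionError on transient failure.
    ...
```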

Smaller work items, tighter cycles, less risk. Cards are created to be completed in about 1 to 1.5 days as much as possible.  This means the project lead does the job of breaking features down into granular pieces that can be completed in that time frame AND provide value to the customer.  Cards that take too long can be noted and managed explicitly: the lead or manager can decide to accept the longer time frame, or punt on the card and move on earlier with less lost investment.  The team can also switch to a higher priority much more easily, either by finishing the card (1-2 days) or simply losing the investment (1-2 days).  Less risk, more flexibility.  The cost is more work by the lead in breaking down the cards properly; however, this also generally leads to a better-defined solution.

You Ain't Gonna Need It Yet (YAGNI).  Write only the minimal implementation required to meet the acceptance criteria, which means a working system.  This doesn't mean ignoring proper design and code layout and putting everything in one function or object.  It means only writing code that is used in meeting the acceptance criteria: new code for new acceptance criteria, as well as refactors for acceptance criteria that already have implementations.

Track development cadence, understand the sources of variance. Our iterations are similar to Sprints, except they are dissociated from projects.  The reason we dissociate from project timelines is that we constantly have new projects starting or finishing.  Planning is always happening, implementation is always happening; all aspects of our SDLC are always happening.  We do not work in phases.  Our iteration is the work week, and we calculate our velocity as the total work completed for the week.  We then keep a running 12-week mean and standard deviation.  When planning, we use the mean and one standard deviation to produce a relatively accurate time-to-completion estimate for a project, assuming we have already broken down all the work (see the sketch below).  The standard deviation reflects variance in what the team is spending their time on.  For example, this analysis may reveal the team is getting frequent interruptions in the form of questions that may be better addressed to the manager.
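
A minimal sketch of that calculation; the velocities and remaining work here are made up for illustration:

```python
from statistics import mean, stdev

# Hypothetical weekly velocities (work completed) for the last 12 weeks.
weekly_velocity = [9, 11, 8, 10, 12, 9, 7, 10, 11, 9, 8, 10]

avg = mean(weekly_velocity)   # running 12-week mean
sd = stdev(weekly_velocity)   # sample standard deviation

remaining_work = 45  # work left after breaking the project down into cards

# One standard deviation around the mean gives an optimistic and a
# pessimistic bound on the time to completion.
best_case_weeks = remaining_work / (avg + sd)
worst_case_weeks = remaining_work / (avg - sd)

print(f"velocity: {avg:.1f} +/- {sd:.1f} per week")
print(f"estimated completion: {best_case_weeks:.1f} to {worst_case_weeks:.1f} weeks")
```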

Make releases cheap and easy.  We do not use feature branches.  All work and commits are done to HEAD.  Every successful build is a release candidate.  The build should always be successful; build breaks are treated as a top priority to fix.  Our builds tag the repository, automatically add any completed tickets to our ChangeLog, run the automated acceptance test suite, and produce a package that is made available from our software repository for installation on any QA server.  The version of the package is the same as the tag in the repository, so we retain an audit trail allowing us to track back from installed product to repository tag at any time.
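
The key property is that the package version and the repository tag are the same string.  A minimal sketch of that step; the tag format and helper are illustrative assumptions, not our actual build scripts:

```python
import subprocess


def tag_release(version):
    """Create an annotated tag whose name matches the package version.

    The tag format (v<major>.<minor>.<patch>) is an assumption for
    illustration; the property that matters is that the installed
    package version equals the repository tag, preserving the audit trail.
    """
    tag = f"v{version}"
    subprocess.run(["git", "tag", "-a", tag, "-m", f"Release {version}"],
                   check=True)
    subprocess.run(["git", "push", "origin", tag], check=True)
    return tag
```

The build then embeds that same string as the package version, so any installed package can be traced back to its tag.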

System and software design for 100% uptime.  All software should be deployable without a maintenance window.  Only in exceptional cases should we need to take an outage, and generally it should not be a full outage.  This means we design from the start for high availability (Active/Active), because it's relatively easy and there is generally no reason not to.
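
As a sketch of what Active/Active buys you: a caller can fail over between instances, so deploying to one node never drops a request.  The host names and endpoint here are hypothetical:

```python
import urllib.request

# Hypothetical Active/Active instances; while one node is being deployed
# or drained, requests simply succeed against the other.
INSTANCES = ["http://tools-a.internal:8080", "http://tools-b.internal:8080"]


def call_service(path, timeout=2.0):
    """Try each live instance in turn; raise only if all are down."""
    last_error = None
    for base in INSTANCES:
        try:
            with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                return resp.read()
        except OSError as err:  # URLError is a subclass of OSError
            last_error = err    # node down or draining; try the next one
    raise last_error
```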

Summary

We have been very successful as a small development group.  We write internal REST services and web applications, but the services get a significant amount of traffic and are required to be a cut above when it comes to reliability and stability.  The success we have had in the volume, quality, and cadence of our releases can be attributed to our process.

