Continued from part 1 and part 2

In Part 2 I covered the project management spiral of doom (also known as the project management triangle; the shapes differ but the outcome is the same). In Part 3 I'm going to provide another real-life example of one type of technical debt: architectural.

These are both summaries of real client situations, so the setup is straight out of Dragnet: the names are made up but the problems are real.

For our purposes, we'll return to the client General Industry Design Knowledge, Inc. or GIDK for short. As a quick reminder, GIDK is a global brand with multi-billion dollar revenue.

GIDK likes to cut corners in the technology space, and they tend to do so without analyzing the repercussions. One of these decisions related to infrastructure and application development: the architecture of their environments (Dev, QA, UAT, and Production) did not match. This wasn't truly a problem until a new code release and a major cutover from a recently purchased rival went live.

The production impact was that people were unable to use their GIDK products from time to time because the 'dial home' feature to validate the license was periodically unavailable. The problem persisted for more than two weeks and could not be reproduced in any of the other environments, nor during local dev testing. Throughout this period the company was constantly calling their managed services provider (MSP) and complaining that the MSP had broken something. At the end of those two weeks, the infrastructure of each environment was finally compared and the inconsistencies mapped out.

So what was the final problem? Gee, I don't know - was it the difference in the environments? In fact, it was. In production the application was scaled to multiple nodes, where it wasn't anywhere else, and a load balancer was set up in front of the nodes using a least-connections config without sticky sessions. This was important because the sessions themselves were non-portable between nodes (a different piece of technical debt for another time). Without sticky sessions enabled, an end user would authenticate to one node, get a token, use the token for secure communication to validate the license, and get an auth failure whenever that second request landed on a different node, because the token wasn't valid there.
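To make the failure mode concrete, here is a minimal simulation of it. Everything in it is hypothetical: the `Node`, `LoadBalancer`, and `dial_home` names are made up for illustration and are not GIDK's code. It also forces the worst case by keeping the authentication connection open while the license check is sent; in real traffic the second request only sometimes lands on a different node, which is why the outage was intermittent.

```python
import secrets


class Node:
    """One application node. Issued tokens live only in this node's memory."""

    def __init__(self, name):
        self.name = name
        self.active = 0        # open connections, as seen by the load balancer
        self._tokens = set()   # non-portable session state

    def authenticate(self):
        token = secrets.token_hex(8)
        self._tokens.add(token)
        return token

    def validate_license(self, token):
        # A token issued by another node is unknown here.
        return token in self._tokens


class LoadBalancer:
    """Least-connections routing, with optional sticky sessions."""

    def __init__(self, nodes, sticky=False):
        self.nodes = nodes
        self.sticky = sticky
        self._affinity = {}    # client id -> node, used only when sticky

    def route(self, client_id):
        if self.sticky and client_id in self._affinity:
            return self._affinity[client_id]
        node = min(self.nodes, key=lambda n: n.active)
        if self.sticky:
            self._affinity[client_id] = node
        return node


def dial_home(lb, client_id):
    """Authenticate, then validate the license on whatever node the LB picks."""
    auth_node = lb.route(client_id)
    auth_node.active += 1              # auth connection is still open...
    token = auth_node.authenticate()

    license_node = lb.route(client_id)  # ...when the license check is routed
    license_node.active += 1
    ok = license_node.validate_license(token)

    auth_node.active -= 1
    license_node.active -= 1
    return ok


if __name__ == "__main__":
    prod = LoadBalancer([Node("a"), Node("b")], sticky=False)
    qa = LoadBalancer([Node("a")], sticky=False)
    fixed = LoadBalancer([Node("a"), Node("b")], sticky=True)
    print("prod-like:", dial_home(prod, "user-1"))
    print("qa-like:", dial_home(qa, "user-1"))
    print("prod with sticky sessions:", dial_home(fixed, "user-1"))
```

The single-node setup can never show the bug, which matches why nothing reproduced outside production. Sticky sessions hide the problem rather than fix it; the underlying debt is still the non-portable session.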

The MSP was then called to account for why this setup existed. On record, in writing, was the recommendation that the non-prod environments should match production. GIDK's response at the time was that matching wasn't necessary at that cost, and that the other environments could be scaled back.
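The comparison that eventually took two weeks to start can be sketched as a simple drift check. This is only an illustration: the setting names and values below are hypothetical (the real node counts and configuration keys aren't given here), and a real check would pull them from whatever describes the infrastructure.

```python
def diff_environments(baseline, other):
    """Return {setting: (baseline_value, other_value)} for each setting that differs."""
    keys = baseline.keys() | other.keys()
    return {
        key: (baseline.get(key), other.get(key))
        for key in sorted(keys)
        if baseline.get(key) != other.get(key)
    }


# Hypothetical environment descriptions, for illustration only.
production = {
    "app_nodes": 2,
    "load_balancer": "least-connections",
    "sticky_sessions": False,
}
qa = {
    "app_nodes": 1,
    "load_balancer": None,
    "sticky_sessions": None,
}

if __name__ == "__main__":
    for setting, (prod_value, qa_value) in diff_environments(production, qa).items():
        print(f"{setting}: production={prod_value!r} qa={qa_value!r}")
```

A check like this doesn't force the environments to match. It just makes each mismatch a visible, deliberate decision instead of a surprise during an outage.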

You may be thinking "Aha! It was a management decision, not an architectural one! I see the word cost in there!" Smart thinking, and good for you - but wrong. While cost probably was the driving factor, it impacted the architecture of the infrastructure. Architectural technical debt, like all technical debt, can be driven by many factors including cost, and ultimately it was the architecture that caught up to GIDK.

There is one more part to this series on the way: technical debt due to personnel. In it I'll review the cost of poor hiring decisions on technical debt.

If you want to hear me speak about this, I'll be at DevSecCon Boston 2017 and DevSecCon London 2017 giving a talk called "The Harsh Reality of Technical Debt".