By Frank Boshoff, October 26, 2017
In preparation for the NGSIS Platform Modernization cutover next year, I’ve been spending time investigating operational models for the new system. This is more of a human process model than a purely technical one, and is important because we’re moving from a single machine to 119 smaller ones, which means there is more that can go wrong (but the overall benefit is still worthwhile).
In ITS, production issues are addressed as promptly as possible, but building and releasing new applications takes much, much longer than desired. Troubleshooting is a constant challenge, often because many of our applications are “special snowflakes”: made up of different components, different versions of components, and different staff who know each component. These staff are in different departments that have different priorities.
This amounts to what I’ll loosely term “friction”. It feels like inertia, and everything seems to take far more effort than it should. How we’ve ended up in this situation is hardly relevant. Instead, I’m wondering why we can’t do better, and what an ideal system might be (a system consisting of interacting components including people, processes and technology): one that has high throughput and low latency. Adding more staff isn’t always the answer until we know that we’re doing the best we can with the staff we have.
To strive to do better, we could adopt a new maxim: “Frictionless interaction” (with apologies to the Physics majors and their frictionless inclines).
The first challenge is to identify what type of a system we are dealing with. The Cynefin framework (pronounced Kun-ne-vin) helps one make sense of a situation so that one can respond appropriately. Based on Cynefin, introducing a new IT platform for ROSI can be considered a complex endeavor, and action must be taken to move it to a complicated endeavor and thereby improve the chances of success. This is what Enterprise Architects do. The architecture artifacts result in a plan for implementation. The plan for ROSI is currently in progress with a target date for production in May 2018.
With ROSI implementation underway, one of the next steps is to prepare ITS managers and staff to support and operate the new ROSI platform, in order to provide the service levels required by the University. The processes often cross three ITS departments: EASI, ISEA and EIS, requiring handovers from one staff member to another, and this is often where latencies occur.
Handovers should occur for two reasons:
Latencies occur for a number of reasons, for example:
Quality can be addressed by coaching and training, including peer-to-peer knowledge transfer. Unpredictability can be reduced by smart management but requires visibility of the work as a first step. Kanban is an effective method to visualize tasks and their progress, and it makes cycle time measurable: the amount of time it takes for a task to cycle through the Kanban stages. When a new high priority task arrives, it is possible to identify the tasks impacted by the new work. Visualizing the work helps reduce the potential chaos caused by unpredictability but, most importantly, it helps determine whether we’re doing the best with the staff we have.
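To make cycle time concrete, here is a minimal sketch of how it could be computed from a Kanban board export. The task records, field names and dates are made up for illustration and do not come from any ITS system.

```python
from datetime import datetime
from statistics import mean

# Hypothetical task records: when each card left the backlog and when it reached "Done".
tasks = [
    {"id": "T1", "started": datetime(2017, 10, 2, 9, 0), "done": datetime(2017, 10, 4, 9, 0)},
    {"id": "T2", "started": datetime(2017, 10, 3, 9, 0), "done": datetime(2017, 10, 4, 21, 0)},
    {"id": "T3", "started": datetime(2017, 10, 5, 9, 0), "done": datetime(2017, 10, 9, 15, 0)},
]


def cycle_time_hours(task):
    """Elapsed hours between a task starting and finishing."""
    return (task["done"] - task["started"]).total_seconds() / 3600


cycle_times = {t["id"]: cycle_time_hours(t) for t in tasks}
average_cycle_time = mean(cycle_times.values())

print(cycle_times)         # {'T1': 48.0, 'T2': 36.0, 'T3': 102.0}
print(average_cycle_time)  # 62.0
```

Tracking this number over time is one way to tell whether a change to the process actually helped.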
Bottlenecks must be identified and managed (not necessarily eliminated – that might be too simple and could result in chaos). The throughput of the entire process will hinge on the throughput of the bottleneck. There is no point in optimizing in one department if the overall process is not improved directly (this is known as local optimization vs. global optimization).
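The difference between local and global optimization can be shown with a small sketch. The stage names and weekly rates below are invented for illustration, and the model is deliberately simple: it assumes a strictly sequential process in which the slowest stage caps the whole flow.

```python
# Hypothetical stages of a sequential process, with the number of tasks
# each stage can complete per week.
rates = {"build": 10, "review": 3, "deploy": 8}


def process_throughput(stage_rates):
    """In a sequential process, overall throughput is capped by the slowest stage."""
    return min(stage_rates.values())


def bottleneck(stage_rates):
    """Name of the stage with the lowest throughput."""
    return min(stage_rates, key=stage_rates.get)


baseline = process_throughput(rates)

# Local optimization: doubling a stage that is not the bottleneck changes nothing.
faster_build = {**rates, "build": 20}

# Global optimization: improving the bottleneck raises the whole process.
faster_review = {**rates, "review": 6}

print(bottleneck(rates))                  # review
print(baseline)                           # 3
print(process_throughput(faster_build))   # 3
print(process_throughput(faster_review))  # 6
```

Doubling the capacity of the "build" stage leaves the overall result at 3 tasks per week, while the same effort spent on "review" doubles it.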
If multiple handovers occur in a process, latencies can accumulate, causing tasks to take hours or days when they could take minutes to accomplish. To reduce latency, staff need to be available to respond to handovers. Skills must be shared to avoid bottlenecks – we have far too many occurrences of a single person being the only one who can accomplish a task, and this creates enormous latency in our processes.
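A rough sketch shows how waiting at handovers can dominate elapsed time. All figures are assumptions chosen for illustration: three steps of hands-on work, with a four-hour wait and a one-day wait at the two handovers between them.

```python
# Hypothetical process: three steps of hands-on work with two handovers in between.
hands_on_minutes = [10, 5, 15]
handover_wait_minutes = [240, 1440]  # assumed waits: 4 hours, then 1 day

work = sum(hands_on_minutes)
waiting = sum(handover_wait_minutes)
elapsed = work + waiting

# Fraction of the elapsed time that the task spent waiting for someone to pick it up.
waiting_share = waiting / elapsed

print(work)                     # 30
print(elapsed / 60)             # 28.5
print(round(waiting_share, 3))  # 0.982
```

Under these assumptions, 30 minutes of actual work takes 28.5 hours to deliver, and about 98% of that time is spent waiting at handovers.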
Quality has a profound effect on a team’s output. Incomplete handovers must be reduced to a minimum, preferably zero (don’t hand over stuff that doesn’t work). The organizations that get it right enjoy up to a 200x lead time improvement over the worst performers, and 60x better quality.
It seems worthwhile to invest effort in reducing latency and improving throughput. Frictionless interaction. Just imagine.