Human performance is a major factor in overall system performance
Humans are increasingly the bottleneck for system performance
Human factors engineering design drives human performance and thus system performance
Why care about humans?
In many system development efforts, the focus is on the capabilities of the technology: How fast can the jet fly? How accurately can the rifle fire?
We can talk about the horsepower of the engines and the boring of the rifle until the cows come home, but without a human pressing the throttle or pulling the trigger, neither technology is doing anything. A major mistake in many systems engineering efforts is neglecting the impact of the human on the performance of the system.
A great example is the FIM-92 Stinger Man-Portable Air Defense System. Stinger had a requirement to hit the target 60% of the time, which was met easily in developmental testing. However, put in the hands of actual soldiers, it only hit the target 30% of the time. An Army report found that the system suffered from several shortcomings including poor usability and a lack of consideration for the capabilities of the intended user population. The technology hit the mark, but the system as a whole failed [1].
Let’s illustrate with a more everyday example. I play ice hockey and use a professional composite stick. I would guess that my fastest slap shot clocks in at around 50 mph. A pro using the exact same stick could easily break 100 mph. Clearly the technology isn’t any different; I just don’t have the same level of skill. The performance is the combination of the technology and the human using it.
Once we acknowledge that fact, it’s clear that we must understand the capabilities and limitations of the users to understand how the system is going to work in the real world. Most human factors models capture this interaction in one way or another. My preferred model for most systems is the FAA human factors interaction model, shown below. This model shows a continuous loop. The human takes in information through sensory capabilities, makes a decision, and translates that decision into actions to the system; then, the system takes those inputs, responds appropriately, and updates the displays for the loop to repeat.
This just drives home the point that system performance is driven by both technology and human performance. But, simply accounting for human performance is the bare minimum. In most cases we can go much further, designing the human-technology interactions to enhance the performance of the human and thus the integrated system.
The human bottleneck
A related model, often used by the military, is the OODA loop: Observe, Orient, Decide, Act. In any competition from ice hockey to strategy games to aerial dogfights, an entity that can execute the OODA loop faster and more accurately than their opponent, all other factors being equal, will win. This is a useful paradigm for exploring human performance in complex systems.
Systems developers have paid more and more attention to the OODA loop in recent decades, as computer technologies have significantly sped up the loop. We have more ability to collect and act upon information than ever before, to the point that it can be overwhelming if not managed effectively. We’ve come a long way from WWII cockpits with dial gauges and completely manual controls to point-and-click control of otherwise-autonomous aircraft. Computers used to require tedious manual programming with careful planning for even relatively simple tasks, and lots of waiting around for programs to finish running. Now, computers can complete tasks nearly instantaneously [2] and are often idle waiting for the human’s next command. Automation has taken over many simpler tasks, and can do them better and more reliably than a human.
In short, it’s not the technology delaying the OODA loop; the human is the bottleneck.
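To make the bottleneck argument concrete, here is a minimal sketch that treats one pass through the OODA loop as the sum of its stage times. The stage durations are made-up numbers, and the assumption that "orient" and "decide" are the human-dominated stages is mine, purely for illustration.

```python
# Illustrative only: stage times are invented, not measurements.

def loop_time(stage_seconds):
    """Time for one pass through the loop: the sum of its stage times."""
    return sum(stage_seconds.values())


def speedup(baseline, improved):
    """How many times faster the improved loop is than the baseline."""
    return loop_time(baseline) / loop_time(improved)


# Hypothetical split: technology dominates observe/act,
# the human dominates orient/decide.
baseline = {"observe": 0.5, "orient": 4.0, "decide": 5.0, "act": 0.5}

# Halve the technology-dominated stages.
faster_machine = dict(baseline, observe=0.25, act=0.25)

# Halve the human-dominated stages (e.g. through better displays and controls).
faster_human = dict(baseline, orient=2.0, decide=2.5)

print(f"faster technology: {speedup(baseline, faster_machine):.2f}x")
print(f"faster human:      {speedup(baseline, faster_human):.2f}x")
```

With these numbers, halving the technology stages barely moves the total, while halving the human stages nearly doubles the loop rate. The exact figures mean nothing; the shape of the result is the point.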
The role of human factors engineering
Even selecting the very best humans and providing them with the very best training can only improve performance so much, and that’s a pretty costly approach. The solution is obvious: engineer superhumans. Until that’s possible, effective human factors engineering can support and enhance human performance.
Human factors engineering (HFE) is a broad and multidisciplinary field that addresses any interface between human and technology. Depending on the needs of the system, this could be as simple as ensuring that displays are clearly readable. For advanced systems with autonomous capabilities, HFE supports effective functional allocation among the technology and human elements of the system, maximizing the value of both; the technology handles the things that don’t require human decision making to allow the user to focus on the tasks that do require uniquely human capabilities. Effective human interfaces support the human’s tasks by presenting the right information at the right time in the most useful manner, allowing the human sensory and cognitive components to work speedily and accurately. That’s followed by intuitive controls for transmitting the human’s decision back to the technology.
The OODA loop speeds up when the human receives the right information in an effective and timely manner and can act on it just as effectively. When the human is the bottleneck, any HFE design improvements that support human performance have a direct corresponding impact on system performance. In order to have the biggest impact, the HFE effort must be initiated early on when those allocation and design decisions have not yet been made. Additionally, the human must be captured in all system architectural, behavioral, and simulation models.
The Stinger example demonstrates the risk of pushing off human factors engineering, and that was for a relatively straightforward system. To enhance the OODA loop and maintain a competitive edge in advanced modern systems, HFE is a must. System performance is the product of technology and human performance, and HFE is essential for ensuring the human aspect of that equation.
Failures and errors happen frequently. A part breaks, an instruction is misunderstood, a rodent chews through a power cord. The issue gets noticed, we respond to correct it, we clean up any impacts, and we’re back in business.
Occasionally, a catastrophic loss occurs. A plane crashes, a patient dies during an operation, an attacker installs ransomware on the network. We often look for a single cause or freak occurrence to explain the incident. Rarely, if ever, are these explanations accurate.
Chapter 10 of the INCOSE Systems Engineering Handbook covers “Specialty Engineering”. Take a look at the table of contents below. It’s a hodge-podge of roles and skillsets with varying scope.
There doesn’t seem to be rhyme or reason to this list of items. Training Needs Analysis is a perfect example. There’s no doubt that it’s important, but it’s one rather specific task and not a field unto itself. If you’re going to include this activity, why not its siblings Manpower Analysis and Personnel Analysis?
On the other hand, some of the items in this chapter are supposedly “integral” to the engineering process. This is belied by the fact that they’re shunted into this separate chapter at the end of the handbook. In practice, too, they’re often organized into a separate specialty engineering group within a project.
This isn’t very effective.
Many of these roles really are integral to systems engineering. Their involvement early on in each relevant process ensures proper planning, awareness, and execution. They can’t make this impact if they’re overlooked, which often happens when they’re organizationally separated from the rest of the systems engineering team. By including them in the specialty engineering section along with genuinely tangential tasks, INCOSE has basically stated that these roles are less important to the success of the project.
The solution is simple: re-evaluate and remove, or at least re-organize, this section of the handbook.
The actual systems engineering roles should be integrated into the rest of the handbook. Most of them already are mentioned throughout the document. The descriptions of each role currently in the specialty engineering section can be moved to the appropriate process section. Human systems integration, for example, might fit into “Technical Management Processes” or “Cross-Cutting Systems Engineering Methods”.
The tangential tasks, such as Training Needs Analysis, should be removed from the handbook altogether. These would be more appropriate as a list of tools and techniques maintained separately online, where the list can be updated frequently and cross-referenced with other sources.
Of course, the real impact comes when leaders internalize these changes and organize their programs to effectively integrate these functions. That will come with time and demonstrated success.
The 737 is an excellent airplane with a long history of safe, efficient service. Boeing’s cockpit philosophy of direct pilot control and positive mechanical feedback represents excellent human factors [1]. In the latest generation, the 737 Max, Boeing added a new component to the flight control system which deviated from this philosophy, resulting in two fatal crashes. This is a case study in the failure of human factors engineering and systems engineering.
The 737 Max and MCAS
You’ve certainly heard of the 737 Max, the fatal crashes in October 2018 and March 2019, and the Maneuvering Characteristics Augmentation System (MCAS) which has been cited as the culprit. Even if you’re already familiar, I highly recommend these two thorough and fascinating articles:
Darryl Campbell at The Verge traces the market pressures and regulatory environment which led to the design of the Max, describes the cockpit activities leading up to each crash, and analyzes the information Boeing provided to pilots.
Gregory Travis at IEEE Spectrum provides a thorough analysis of the technical design failures from the perspective of a software engineer along with an appropriately glib analysis of the business and regulatory environment.
Typically I’d caution against armchair analysis of an aviation incident until the final crash investigation report is in. However, given the availability of information on the design of the 737 Max, I think the engineering failures are clear even as the crash investigations continue.
The most glaring, obvious, and completely inexplicable design choice was a lack of redundancy in the MCAS sensor inputs. Gregory Travis blames “inexperience, hubris, or lack of cultural understanding” on the part of the software team. That certainly seems to be the case, but it’s nowhere near the whole story.
There’s a team whose job it is to understand how the various aspects of the system work together: systems engineering [2]. One essential job of the systems engineer is to understand all of the possible interactions among system components, how they interact under various conditions, and what happens if any part (or combination of parts) fails. That last part is addressed by hazard analysis techniques such as failure modes, effects, and criticality analysis (FMECA).
The details of risk management may vary among organizations, but the general principles are the same: (1) Identify hazards, (2) categorize by severity and probability, (3) mitigate/control risk as much as practical and to an acceptable level, (4) monitor for any issues. These techniques give the engineering team confidence that the system will be reasonably safe.
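As a rough illustration of steps (1) through (3), here is a minimal sketch of a severity-by-probability risk matrix. The scales, the score thresholds, and the example hazards with their ratings are all hypothetical; real programs use the categories defined in their own safety standard.

```python
from dataclasses import dataclass


@dataclass
class Hazard:
    name: str
    severity: int     # hypothetical scale: 1 (negligible) .. 4 (catastrophic)
    probability: int  # hypothetical scale: 1 (improbable) .. 5 (frequent)

    @property
    def risk_score(self) -> int:
        """Simple severity x probability score."""
        return self.severity * self.probability


def risk_level(hazard: Hazard, acceptable: int = 4, serious: int = 10) -> str:
    """Bucket a hazard by score; the thresholds are illustrative assumptions."""
    if hazard.risk_score <= acceptable:
        return "acceptable"
    if hazard.risk_score <= serious:
        return "mitigate"
    return "unacceptable"


def rank(hazards):
    """Highest-risk hazards first, so mitigation effort goes where it matters."""
    return sorted(hazards, key=lambda h: h.risk_score, reverse=True)


# Invented hazards and ratings, for illustration only.
hazards = [
    Hazard("display hard to read in sunlight", severity=2, probability=3),
    Hazard("single sensor feeds flight controls", severity=4, probability=3),
    Hazard("cosmetic panel scratch", severity=1, probability=2),
]

for h in rank(hazards):
    print(f"{h.risk_score:>2}  {risk_level(h):<12}  {h.name}")
```

Step (4), monitoring, would amount to re-scoring the list as test and field data come in.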
On its own, the angle of attack (AoA) sensor is an important but not critical component. The pilots can fly the plane without it, though stall-protection, automatic trim, and autopilot functions won’t work normally, increasing pilot workload. The interaction between the sensor and flight control augmentation system, MCAS in the case of the Max, can be critical. If MCAS uses incorrect AoA information from a faulty sensor, it can push the nose down and cause the plane to lose altitude. If this happens, the pilots must be able to diagnose the situation and respond appropriately. Thus the probability of a crash caused by an AoA failure can be notionally figured as follows:
P(AoA sensor failure) × P(system unable to recognize failure) × P(system unable to adapt to failure) × P(pilots unable to diagnose failure) × P(pilots unable to disable MCAS) × P(pilots unable to safely fly without MCAS)
AoA sensors can fail, but that shouldn’t be much of an issue because the plane has at least two of them and it’s pretty easy for the computers to notice a mismatch between them and also with other sources of attitude data such as inertial navigation systems. Except, of course, that the MCAS didn’t bother to cross-check; the probability of the Max failing to recognize and adapt to a potential AoA sensor failure was 100%. You can see where I’m going with this: the AoA sensor is a single point of failure with a direct path through the MCAS to the flight controls. Single point of failure and flight controls in the same sentence ought to give any engineer chills.
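The failure chain above can be sketched numerically. Every probability here is a made-up placeholder, and multiplying the terms assumes the events are independent, which is a simplification. The only point is to show what happens when the two "system" terms are set to 100%.

```python
from math import prod


def crash_probability(chain):
    """Notional probability that every link in the failure chain fails.

    Assumes the links are independent, which real failures often are not.
    """
    return prod(chain.values())


# Placeholder values, NOT real data for any aircraft.
with_cross_check = {
    "aoa_sensor_fails": 1e-4,
    "system_unable_to_recognize_failure": 1e-3,
    "system_unable_to_adapt": 1e-2,
    "pilots_unable_to_diagnose": 0.1,
    "pilots_unable_to_disable_mcas": 0.1,
    "pilots_unable_to_fly_without_mcas": 0.1,
}

# No cross-check: the system never recognizes or adapts to a bad sensor.
no_cross_check = dict(
    with_cross_check,
    system_unable_to_recognize_failure=1.0,
    system_unable_to_adapt=1.0,
)

ratio = crash_probability(no_cross_check) / crash_probability(with_cross_check)
print(f"with cross-check:    {crash_probability(with_cross_check):.1e}")
print(f"without cross-check: {crash_probability(no_cross_check):.1e}")
print(f"risk multiplier:     {ratio:.0f}x")
```

Whatever the true numbers are, removing two protective links from the chain leaves the pilots as the only barrier between a sensor fault and the flight controls.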
The next link in our failure chain is the pilots and their ability to recognize, diagnose, and respond to the issue. This implies proper training, procedures, and understanding of the system. From the news coverage, it seems that pilots were not provided sufficient information on the existence of MCAS and how to respond to its failure. Systems and human factors engineers, armed with a hazard analysis, should have known about and addressed this potential contributing factor to reduce the overall risk.
Finally, there’s the ability of the pilots to disable and fly without MCAS. The Ethiopian Airlines crew correctly diagnosed and responded to the issue but the aerodynamic forces apparently prevented them from manually correcting it. The ability to override those forces, plus the time it takes to correct the flight path, should have been part of the FMECA.
I have no specific knowledge of the hazard analyses performed on the 737 Max. Based on recent events, it seems that the risk of this type of failure was severely underestimated or went unaddressed. Either one is equally poor systems engineering.
Cockpit human factors
An inaccurate hazard analysis, though inexcusable, could be an oversight. Compounding that, Boeing made a clear design decision in the cockpit controls which is hard to defend.
In previous 737 models, pilots could quickly override automatic trim control by yanking back on the yoke, similar to disabling cruise control in a car by hitting the brake. This is great human factors and it fit right in with Boeing’s cockpit philosophy of ensuring that the human was always in ultimate control. This function was removed in the Max.
As both the Lion Air and Ethiopian Airlines crews experienced, the aerodynamic forces being fed into the yoke are too strong for the human pilots to overcome. When MCAS directs the nose to go down, the nose goes down. Rather than simply control the airplane, Max pilots first have to disable the automated systems. Comparisons to HAL are not unwarranted.
Boeing is developing a fix for MCAS. It will include redundancy in AoA sensor inputs, not activating MCAS if the sensors disagree, MCAS activating only once per high-angle indication (i.e. not continuously activating after the pilots have given contrary commands), and limiting the feedback forces into the control yoke so that they aren’t stronger than the pilots. This functionality should have been part of the system to begin with.
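To show how little logic the first three items require, here is a toy sketch of a two-sensor cross-check with once-per-event activation. This is not Boeing’s implementation; the class name, the thresholds, and the averaging of the two sensors are all my own assumptions, and the yoke force limiting is not modeled.

```python
class TrimGuard:
    """Toy activation logic; all names and thresholds are hypothetical."""

    def __init__(self, high_aoa_deg=14.0, max_disagree_deg=5.0):
        self.high_aoa_deg = high_aoa_deg          # assumed activation threshold
        self.max_disagree_deg = max_disagree_deg  # assumed cross-check tolerance
        self._armed = True  # ready to act on the next high-AoA event

    def should_activate(self, aoa_left_deg, aoa_right_deg):
        # Redundancy: refuse to act when the two sensors disagree.
        if abs(aoa_left_deg - aoa_right_deg) > self.max_disagree_deg:
            return False

        aoa = (aoa_left_deg + aoa_right_deg) / 2
        if aoa < self.high_aoa_deg:
            self._armed = True  # high-AoA event is over; re-arm for the next one
            return False

        # Activate only once per high-AoA event, not continuously.
        if self._armed:
            self._armed = False
            return True
        return False


guard = TrimGuard()
readings = [(15.0, 15.5), (15.0, 15.5), (3.0, 3.2), (16.0, 16.0), (22.0, 3.0)]
decisions = [guard.should_activate(left, right) for left, right in readings]
print(decisions)
```

In this run the guard activates on the first high reading, stays quiet while the same event persists, re-arms once the angle drops, activates again on the next event, and ignores the final pair because the sensors disagree.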
Along with these fixes, Boeing is likely [3] also re-conducting a complete hazard analysis of MCAS and other flight control systems. Boeing and the FAA should not clear the type until the hazards are completely understood, controlled, quantified, and deemed acceptable.
Many news stories frame the 737 Max crashes in terms of the market and regulatory pressures which resulted in the design. While I don’t disagree, these are not an excuse for the systems engineering failures. The 737 Max is a valuable case study for engineers of all types in any industry, and for systems engineers in high-risk industries in particular.
A system lexicon is a simple tool which can have a big impact on the success of the system. It aligns terminology among technical teams, the customer, subcontractors, support personnel, and end users. This creates shared understanding and improves consistency. Read on to learn how to implement this powerful tool on your program.
Successful systems are created by engineers who understand and design to the ultimate objectives of the project. When we lose sight of those objectives we start making design decisions based on the wrong criteria and thus create sub-optimal designs. Scope creep, groupthink, and simple convenience are frequent causes of this type of variation. An effective design assessment tool is a touchstone by which we can evaluate the effectiveness of ongoing design decisions and keep the focus on the optimal solution.
The application of human systems integration (HSI) throughout a project results in improved system performance, reduced lifecycle cost, reduced development risk, and no increase in development cost when executed effectively.