
Agile SE Part 3: Agile Contracts and the Downfall of Requirements

Welcome to a series on Agile Systems Engineering, exploring the practical aspects of this emerging approach. If you haven’t already, check out Part 1: What is Agile, Anyway? and Part 2: What’s Your Problem?

The antithesis of agile

Requirements are a poor way to acquire a system. They’re great in theory, but they frequently fail in practice. Writing good requirements is hard, much harder than you’d think if you’ve never had the opportunity. Ivy Hooks gives several examples of good and bad requirements in the paper “Writing Good Requirements”. Poor requirements can unnecessarily constrain the design, be interpreted incorrectly, and pose challenges for verification. Over-specification results in spending on capabilities that aren’t really needed, while under-specification can result in a final product that doesn’t provide all of the required functions.

If writing one requirement is hard, try scaling it up to an entire complex system. Requirements-based acquisition rests on the assumption that the specification and statement of work are complete, consistent, and effective. That requires a great deal of up-front work with limited opportunity to correct issues found later. A 2015 GAO report found that “DoD often does not perform sufficient up-front requirements analysis”, leading to “cost, schedule, and performance problems”.

And that’s just the practical issue. The systemic issue with requirements is that the process of analyzing and specifying them is time consuming. One of the more recent DoD acquisition buzzphrases is “speed of relevance”. Up-front requirements are antithetical to this goal. If it takes months or even years just to develop those requirements, the battlefield will have evolved before a contract can be issued. Add years of development and testing, and we’re deploying last-generation technology geared to meeting a past need. That’s the speed of irrelevance.

Agile promises a better approach to deliver capabilities faster. But we have to move away from large up-front requirements efforts.

Still from Back to the Future Part 2 with subtitles changed to: Requirements? Where we're going, we don't need requirements!

Agile contracting

Traditional requirements-based acquisition represents a fixed scope, with up-front planning to estimate the time and cost required to accomplish that scope. Pivoting during the development effort (for example, as we learn more about what is required to accomplish the mission) requires re-planning with significant cost and schedule impacts. The Government Accountability Office (GAO) conducts annual reviews of Major Defense Acquisition Programs (MDAPs). The most recent report analyzing 85 MDAPs found that they have experienced over 54 percent total cost growth and 29 percent schedule growth, resulting in an average delay of more than 2 years.

Defense acquisition leaders talk about delivering essential capabilities faster and then continuing to add value with incremental deliveries, which is a foundational Agile and Dev*Ops concept. But you can’t do that effectively under a fixed-scope contract where the emphasis is on driving to that “complete” solution.

The opposite of a fixed-scope contract is a value stream or capacity of work model. Give the development teams broad objectives and let them get to work. Orient the process around incremental deliveries, prioritize the work that will provide the most value soonest, and start getting those capabilities to the field.
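To make “the most value soonest” concrete, here is a minimal sketch of value-based backlog ordering using a simple value-per-effort heuristic, similar in spirit to Weighted Shortest Job First. The item names and scores are hypothetical, and a real program would also weigh risk, dependencies, and time criticality.

```python
# Illustrative sketch only: ordering a capacity-of-work backlog by a simple
# value-per-effort heuristic. Item names and scores are hypothetical.
from dataclasses import dataclass

@dataclass
class BacklogItem:
    name: str
    mission_value: float  # relative value to the user, arbitrary units
    effort: float         # relative implementation effort, arbitrary units

    @property
    def priority(self) -> float:
        # Deliver the most value soonest: highest value per unit of effort first.
        return self.mission_value / self.effort

backlog = [
    BacklogItem("Threat-detection alerting", mission_value=8, effort=3),
    BacklogItem("Offline map caching", mission_value=5, effort=5),
    BacklogItem("Custom report themes", mission_value=2, effort=4),
]

for item in sorted(backlog, key=lambda i: i.priority, reverse=True):
    print(f"{item.name}: value/effort = {item.priority:.2f}")
```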

triangle with vertices labeled "SCOPE", "COST", and "TIME", the center is the word "QUALITY"
Project Management Triangle

“But wait,” you say, “doesn’t the project have to end at some point?” That’s the best part of this model. The developer’s ‘fixed’ cost and schedule keep getting renewed as long as they’re providing value. The contractor is incentivized to deliver quality products and to work with the customer to prioritize the backlog, or the customer may choose not to renew the contract. The customer has flexibility to adjust funding profiles over time, ramping up or down based on need and funding availability. If the work reaches a natural end point—any additional features wouldn’t be worth the cost or there is no longer a need for the product—the effort can be gracefully wrapped up.

You may be familiar with the project management triangle1. Traditional approaches try to fix all of the aspects, and very often fail. Agile approaches provide guardrails to manage all of the aspects but otherwise allow the effort to evolve organically.

Agile requirements

The most important aspect of agile approaches is that they shift requirements development from an intensive up-front effort to an ongoing, collaborative effort. The graphic below illustrates the difference between traditional and agile approaches. With traditional approaches, the contractor is incentivized to meet the contractual requirements, whether or not the system actually delivers value to the using organization or is effective for the end user.

Block diagram comparing acquisition models. In the traditional model, the using organization defines a need, the acquisition organization writes requirements, the contractor delivers a system, and the using organization deploys it to end users. In the agile model, the using organization defines a need and the acquisition organization creates an agile contract with the contractor; the contractor delivers continuously to the using organization, which deploys to end users, with iterative feedback between the contractor and end users and collaboration among all groups.

In an agile model, the development backlog will be seeded with high-level system objectives. Requirements are developed through collaboration among the stakeholders, and development is shaped by iterative user feedback. The agile contract may include a small set of system requirements or constraints: for example, a requirement for the system to comply with an established architecture or interface, meet particular performance thresholds, or adhere to relevant standards. The key is that the provided set of requirements is as minimal as possible.

The requirements discovery, analysis, and development process is collaborative, iterative, and ongoing. It really isn’t so different from a traditional requirements decomposition: requirements still have to be traceable to top-level objectives. A key difference is that the decomposition happens closer to the development, both in time and organization. The rationale and mission context for a requirement won’t get lost, because the development team is involved in the process and understands the drivers behind the features they’ll be implementing.
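As a minimal sketch of what that closer-to-development traceability might look like (the objective, story, and field names here are entirely hypothetical), each backlog item can carry its parent objective and rationale along with it:

```python
# Hypothetical example: backlog items carry their parent objective and
# rationale, so mission context stays with the team implementing the feature.
objectives = {
    "OBJ-1": "Operators can plan and brief a mission in under 10 minutes",
}

backlog = [
    {
        "id": "STORY-42",
        "title": "Auto-populate route waypoints from the mission template",
        "traces_to": "OBJ-1",
        "rationale": "Manual waypoint entry is the largest driver of planning time",
    },
]

# Simple consistency check: every backlog item must trace to a known objective.
for item in backlog:
    assert item["traces_to"] in objectives, f"{item['id']} has no parent objective"
```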

I’m getting ahead of myself, though! In the next installment of this series we’ll look at cross-functional development teams, the role of Product Owner, and scaling up to a large project.

What are your experiences with agile contracts and agile requirements? Share your best practices, horror stories, and pitfalls to avoid below.

Agile SE Part Two: What’s Your Problem?

Welcome to a series on Agile Systems Engineering, exploring the practical aspects of this emerging approach. If you haven’t already, check out Part 1: What is Agile, Anyway?

A faster horse

“If I had asked people what they wanted, they would have said faster horses.”

Apocryphally attributed to Henry Ford1

When people trot out that quote they’re often trying to make the point that seeking user feedback will only constrain the design because our small-minded <sneer>users</sneer> cannot possibly think outside the box. I disagree with that approach. User feedback is valuable information. It should not constrain the design, but it is essential to be able to understand and empathize with your users. They say “faster horse”? It’s your job to generalize and innovate on that desire to come up with a car. The problem with the “singular visionary” approach is that for every wildly successful visionary there are a dozen more with equally innovative ideas that didn’t find a market.

Sometimes, your research will even lead you to discover something totally unexpected which changes your whole perspective on the problem.

Here’s a great, real-world example from a Stanford Hacking for Defense class:

Customer ≠ user

Team Aqualink was tasked by their customer (the chief medical officer of the Navy SEALs) to build a biometric monitoring kit for Navy divers. These divers face both acute and long-term health impacts due to the duration and severe conditions inherent in their dives. A wearable sensor system would allow divers to monitor their own health during a dive and allow Navy doctors to analyze the data afterwards.

Team Aqualink put themselves in the flippers of a SEAL dive team (literally) and discovered something interesting: many of the dives were longer than necessary because the divers lacked a good navigation system. The medical concerns were, at least partially, really a symptom. What the divers truly wanted was GPS or another navigational system that worked at depth. Solving that root cause would alleviate many of the health concerns and improve mission performance, a much broader impact.

The customer was trying to solve the problem they saw without a deeper understanding of the user’s needs. That’s not a criticism of the customer. Truly understanding user needs is hard and requires substantial effort by engineers well-versed in user requirements discovery.

In the US DoD, the Joint Capabilities Integration and Development System (JCIDS) process is intended to identify mission capability gaps and potential solutions. The Initial Capabilities Document (ICD), Capability Development Document (CDD), and Key Performance Parameters (KPPs) are the basis for every materiel acquisition. This process suffers from the same shortcoming as the biometric project: it’s based on data that is often removed from the everyday experiences of the user. But once requirements are written, it’s very hard to change them even if the development team uncovers groundbreaking new insights.

The Bradley Fighting Vehicle

Still capture from The Pentagon Wars (1998)

The Bradley Fighting Vehicle was lampooned in the 1998 movie The Pentagon Wars2. By contrast, the program to replace the Bradley is being held up as an example of a new way of doing business.

Instead of determining the requirements from the outset, the Army is funding five companies for an 18-month digital prototyping effort. The teams were given a set of nine desired characteristics for the vehicle and will have the freedom to explore varying designs in a low-cost digital environment. The Army realizes that the companies may have tools, experiences, and concepts to innovate in ways the Army hasn’t considered. The Army is defining the problem space and stepping back to allow the contractors to explore the solution space.

Requirements myopia

Systems engineering for the DoD is built around requirements. The aforementioned JCIDS process defines the need. Based on that need, the acquisition command defines the requirements. The contractor bids and develops to those requirements. The test commands evaluate the system against those requirements. In theory, since those requirements tie back to warfighter needs, if we met the requirements we must have met the need.

But, there’s a gap. In the proposal process, contractors evaluate the scope of work and estimate how much effort will be required to complete the work. Sometimes this is based on concrete data from similar efforts in the past. Other times, it’s practically a guess. If requirements are incompletely specified, there could be significant latitude for interpretation. Even really good requirement sets cannot adequately capture the actual, boots-on-the-ground mission and user needs.

So, the contractor has bid a certain cost to complete the work based on their understanding of the requirements provided. If they learn more information about the user need but meeting that need would drive up the cost, they have three options:

  1. Ask the customer for a contractual change and more money to develop the desired functionality
  2. Absorb the additional costs
  3. Build to the requirement even if it isn’t the best way to meet the need (or doesn’t really meet it at all)

Obviously none of these solutions are ideal. Shelling out much more than originally budgeted reflects poorly on the government program office, which has to answer to Congress for significant overruns. Contractors will absorb some additional development cost from a “management reserve” fund built into their bid, but that amount is pretty limited. In many cases, we end up with option 3.

This is heavily driven by incentive structures. Contractors are evaluated and compensated based on meeting the requirements. Therefore, the contractor’s success metrics and leadership bonuses are built around requirements. Leaders put pressure on engineers to meet requirement metrics and so engineers are incentivized to prioritize the metrics over system performance. DoD acquisition reforms such as Human Systems Integration (HSI) have attempted to force programs to do better, but have primarily resulted in more requirements-focused bureaucracy and rarely the desired outcome.

I call this “requirements myopia”: a focus on meeting the requirements rather than delivering value.

Refocusing on value

It doesn’t make sense to get rid of requirements entirely, but we can adapt our approach based on the needs of each acquisition. I touched on this briefly in an earlier article, Agile Government Contracts.

One major issue: if we don’t have requirements, how will we know when the development is “done”? Ponder that until next time, because in the next post in this series we’ll dive into some of the potential approaches.

What are your experiences with requirements, good or bad? Thoughts on the “faster horse”, Team Aqualink’s pivot, or the Optionally Manned Fighting Vehicle (OMFV) prototyping effort? Sound off below!

Agile SE Part One: What is Agile, Anyway?

Welcome to a new series on Agile Systems Engineering exploring the practical aspects of this emerging approach.

What is “Agile”?

Agile is a relatively new approach to software development based on the Agile Manifesto and Agile Principles. These documents are straightforward. I will sum them up as stating that development should be driven by what is most valuable to the customer and that our projects should align around delivering value.

Yes, I’ve obnoxiously italicized the word value as if it were in the glossary of a middle school textbook. That’s because value is the essence of this discussion.

Little-a Agile

With a little-a, “agile” is the ability to adapt to a changing situation. This means collaboration to understand the stakeholder needs and the best way to satisfy those needs. It means changing the plan when the situation (or your understanding of the situation) changes. It means understanding what is valuable to the customer, focusing on delivering that value, and minimizing non-value-added effort.

Big-A Agile

With a big-A, “Agile” is a software development process that aims to fulfill the agile principles. There are actually several variants that fall under the Agile umbrella such as Scrum, Kanban, and Extreme Programming. Each of these has techniques, rituals, and processes that supposedly lead to delivery of a quality product by helping teams focus on value-added work.

“Cargo Cult” Agile

“Agile” has become the hot-new-thing, buzzword darling of the U.S. defense industry. Did I mean Big-A or Little-a? It hardly matters. As contractors have rushed to promote their “new” development practices, they have trampled the distinction. The result is Cargo Cult Agile: following the rituals of an Agile process and expecting that the project will magically become more efficient and effective as a result. I wrote about this previously, calling it agile-in-name-only and FrAgile.

This isn’t necessarily the fault of contractors. They want to follow the latest best practices from commercial industry to most effectively meet the needs of their customers. But as anyone who has worked in the defense industry can tell you, the pace of change is glacial due to a combination of sheer bureaucratic size and byzantine regulations. Most contracts just don’t support agile principles. For example, the Manifesto prioritizes “working software over comprehensive documentation” and one of the Principles is that “working software is the primary measure of progress”; but most defense contracts require heaps of documentation that are evaluated as the primary measure of progress.

The upshot is that, to most engineers in the defense industry, “Agile” is an annoying new project management approach. Project management is already the least enjoyable part of our job, an obstacle to deal with so that we can get on with the real work. Now we have to learn a new way of doing things that may not be the most effective way to organize our teams and has no real impact on the success of the program. This has left many of us with an undeserved bad taste.

If this is your experience with Agile, please understand that this is not the true intent and practice. The rest of this series will talk about how we achieve real agility.

Agile Systems Engineering

So far, I’ve only mentioned Agile as a software development approach. Of course, we’re here because Agile is being appropriated for all types of engineering, especially as “Agile Hardware Development” and “Agile Systems Engineering”. Some people balk at this: how can a software process be applied to hardware and systems? Here, the distinction between little-a agile and big-A Agile is essential. Agile software development evangelists have taken the values in the Manifesto and Principles and created Agile processes and tools that realize them.

It’s incumbent upon other engineering disciplines to do the same. We must understand the agile values, envision how they are useful in our context (type of engineering, type of solution, customer, etc.), and then craft or adapt Agile processes and tools that make sense. Where many projects and teams go wrong is trying to shoehorn their needs into an Agile process that is a poor fit, and then blaming the process.

Stay Tuned

In the rest of this series we’ll explore how agile SE can provide customer value, how our contracts can be crafted to enable effective Agile processes, and what those processes might look like for a systems engineering team. Stay tuned!

Have you worked on a project with “Cargo Cult Agile”? Have you adapted agile principles effectively in your organization? What other resources are out there for Agile systems engineering? Share your thoughts in the comments below.

The Operations Concept: Developing and Using an OpsCon

  • An Operations Concept is more detailed than a Concept of Operations
  • It is a systems engineering artifact that describes how system use cases are realized
  • It is versatile and serves many uses across the project
  • There is no set format, though there are some best practices to consider

Concept of Operations (ConOps)

Let’s start by talking about the OpsCon’s better-known big brother, the ConOps.

Read More

Human Factors Design Drives System Performance

Bottom Line Up Front:

  • Human performance is a major factor in overall system performance
  • Humans are increasingly the bottleneck for system performance
  • Human factors engineering design drives human performance and thus system performance

Why care about humans?

In many system development efforts, the focus is on the capabilities of the technology: How fast can the jet fly? How accurately can the rifle fire?

We can talk about the horsepower of the engines and the bore of the rifle until the cows come home, but without a human pressing the throttle or pulling the trigger, neither technology is doing anything. A major mistake many systems engineering efforts make is neglecting the impact of the human on the performance of the system.

A great example is the FIM-92 Stinger Man Portable Air Defense System. Stinger had a requirement to hit the target 60% of the time, which was met easily in developmental testing. However, put in the hands of actual soldiers, it only hit the target 30% of the time. An Army report found that the system suffered from several shortcomings including poor usability and a lack of consideration for the capabilities of the intended user population. The technology hit the mark, but the system as a whole failed1.

Let’s illustrate with a more everyday example. I play ice hockey and use a professional composite stick. I would guess that my fastest slap shot clocks in at around 50 mph. A pro using the exact same stick could easily break 100 mph. Clearly the technology isn’t any different, I just don’t have the same level of skill. The performance is the combination of the technology and the human using it.

System performance = technology performance * human performance
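As a notional illustration (the decomposition below is simplified, and the numbers are only meant to echo the Stinger story, not reproduce the actual analysis), meeting the technical spec can still leave the fielded system well short of the need:

```python
# Notional numbers only: a system that meets its technical requirement can
# still miss the mark once human performance is factored in.
technology_performance = 0.60  # hit probability under ideal, expert operation
human_performance = 0.50       # degradation from usability and training gaps

system_performance = technology_performance * human_performance
print(f"Fielded hit probability: {system_performance:.0%}")  # -> 30%
```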

Once we acknowledge that fact, it’s clear that we must understand the capabilities and limitations of the users to understand how the system is going to work in the real world. Most human factors models capture this interaction in one way or another. My preferred model for most systems is the FAA human factors interaction model, shown below. It depicts a continuous loop: the human takes in information through sensory capabilities, makes a decision, and translates that decision into actions on the system; the system takes those inputs, responds appropriately, and updates the displays for the loop to repeat.

This just drives home the point that system performance is driven by both technology and human performance. But, simply accounting for human performance is the bare minimum. In most cases we can go much further, designing the human-technology interactions to enhance the performance of the human and thus the integrated system.

The human bottleneck

A related model, often used by the military, is the OODA loop: Observe, Orient, Decide, Act. In any competition from ice hockey to strategy games to aerial dogfights, an entity that can execute the OODA loop faster and more accurately than their opponent, all other factors being equal, will win. This is a useful paradigm for exploring human performance in complex systems.

Systems developers have paid more and more attention to the OODA loop in recent decades, as computer technologies have significantly sped up the loop. We have more ability to collect and act upon information than ever before, to the point that it can be overwhelming if not managed effectively. We’ve come a long way from WWII cockpits with dial gauges and completely manual controls to point-and-click control of otherwise-autonomous aircraft. Computers used to require tedious manual programming with careful planning for even relatively simple tasks, and lots of waiting around for programs to finish running. Now, computers can complete tasks nearly instantaneously2 and are often idle waiting for the human’s next command. Automation has taken over many simpler tasks, and can do them better and more reliably than a human.

In short, it’s not the technology delaying the OODA loop; the human is the bottleneck.

The role of human factors engineering

Even selecting the very best humans and providing them with the very best training can only improve performance so much, and that’s a pretty costly approach. The obvious solution is to engineer superhumans. Failing that, effective human factors engineering can support and enhance human performance.

Human factors engineering (HFE) is a broad and multidisciplinary field that addresses any interface between human and technology. Depending on the needs of the system, this could be as simple as ensuring that displays are clearly readable. For advanced systems with autonomous capabilities, HFE supports effective functional allocation among the technology and human elements of the system, maximizing the value of both; the technology handles the things that don’t require human decision making to allow the user to focus on the tasks that do require uniquely human capabilities. Effective human interfaces support the human’s tasks by presenting the right information at the right time in the most useful manner, allowing the human sensory and cognitive components to work speedily and accurately. That’s followed by intuitive controls for transmitting the human’s decision back to the technology.

The OODA loop speeds up when the human gets the right information, presented in an effective and timely manner, and can act on that information just as effectively and promptly. When the human is the bottleneck, any HFE design improvements that support human performance have a direct corresponding impact on system performance. To have the biggest impact, the HFE effort must be initiated early, before those allocation and design decisions have been made. Additionally, the human must be captured in all system architectural, behavioral, and simulation models.

The Stinger example demonstrates the risk of pushing off human factors engineering, and that was for a relatively straightforward system. To enhance the OODA loop and maintain a competitive edge in advanced modern systems, HFE is a must. System performance is the product of technology and human performance, and HFE is essential for ensuring the human aspect of that equation.

A Functional Team is NOT an Integrated Product Team

“My name is Inigo Montoya. You won a government contract. Prepare to deliver CDRLs.”

TL;DR: An Integrated Product Team (IPT) is a cross-functional group. If everyone on the team has the same background, that’s a functional or discipline team. There’s a difference.

Read More

The Swiss cheese model: Designing to reduce catastrophic losses

Failures and errors happen frequently. A part breaks, an instruction is misunderstood, a rodent chews through a power cord. The issue gets noticed, we respond to correct it, we clean up any impacts, and we’re back in business.

Occasionally, a catastrophic loss occurs. A plane crashes, a patient dies during an operation, an attacker installs ransomware on the network. We often look for a single cause or freak occurrence to explain the incident. Rarely, if ever, are these accurate.

Read More

It’s time to get rid of specialty engineering: A criticism of the INCOSE Handbook

Chapter 10 of the INCOSE Systems Engineering Handbook covers “Specialty Engineering”. Take a look at the table of contents below. It’s a hodge-podge of roles and skillsets with varying scope.

Table of contents for the Specialty Engineering section of the INCOSE handbook.

There doesn’t seem to be rhyme or reason to this list of items. Training Needs Analysis is a perfect example. There’s no doubt that it’s important, but it’s one rather specific task and not a field unto itself. If you’re going to include this activity, why not its siblings Manpower Analysis and Personnel Analysis?

On the other hand, some of the items in this chapter are supposedly “integral” to the engineering process. This is belied by the fact that they’re shunted into this separate chapter at the end of the handbook. In practice, too, they’re often organized into a separate specialty engineering group within a project.

This isn’t very effective.

Many of these roles really are integral to systems engineering. Their involvement early on in each relevant process ensures proper planning, awareness, and execution. They can’t make this impact if they’re overlooked, which often happens when they’re organizationally separated from the rest of the systems engineering team. By including them in the specialty engineering section along with genuinely tangential tasks, INCOSE has basically stated that these roles are less important to the success of the project.

The solution

The solution is simple: re-evaluate and remove, or at least re-organize, this section of the handbook.

The actual systems engineering roles should be integrated into the rest of the handbook. Most of them already are mentioned throughout the document. The descriptions of each role currently in the specialty engineering section can be moved to the appropriate process section. Human systems integration, for example, might fit into “Technical Management Processes” or “Cross-Cutting Systems Engineering Methods”.

The tangential tasks, such as Training Needs Analysis, should be removed from the handbook altogether. These would be more appropriate as a list of tools and techniques maintained separately online, where it can be updated frequently and cross-referenced with other sources.

Of course, the real impact comes when leaders internalize these changes and organize their programs to effectively integrate these functions. That will come with time and demonstrated success.

The Boeing 737 Max crashes represent a failure of systems engineering

The 737 is an excellent airplane with a long history of safe, efficient service. Boeing’s cockpit philosophy of direct pilot control and positive mechanical feedback represents excellent human factors1. In the latest generation, the 737 Max, Boeing added a new component to the flight control system which deviated from this philosophy, resulting in two fatal crashes. This is a case study in the failure of human factors engineering and systems engineering.

The 737 Max and MCAS

You’ve certainly heard of the 737 Max, the fatal crashes in October 2018 and March 2019, and the Maneuvering Characteristics Augmentation System (MCAS) which has been cited as the culprit. Even if you’re already familiar, I highly recommend these two thorough and fascinating articles:

  • Darryl Campbell at The Verge traces the market pressures and regulatory environment which led to the design of the Max, describes the cockpit activities leading up to each crash, and analyzes the information Boeing provided to pilots.
  • Gregory Travis at IEEE Spectrum provides a thorough analysis of the technical design failures from the perspective of a software engineer along with an appropriately glib analysis of the business and regulatory environment.

Typically I’d caution against armchair analysis of an aviation incident until the final crash investigation report is in. However, given the availability of information on the design of the 737 Max, I think the engineering failures are clear even as the crash investigations continue.

Hazard analysis

The most glaring, obvious, and completely inexplicable design choice was a lack of redundancy in the MCAS sensor inputs. Gregory Travis blames “inexperience, hubris, or lack of cultural understanding” on the part of the software team. That certainly seems to be the case, but it’s nowhere near the whole story.

There’s a team whose job it is to understand how the various aspects of the system work together: systems engineering2. One essential job of the systems engineer is to understand all of the possible interactions among system components, how they interact under various conditions, and what happens if any part (or combination of parts) fails. That last part is addressed by hazard analysis techniques such as failure modes, effects, and criticality analysis (FMECA).

The details of risk management may vary among organizations, but the general principles are the same: (1) Identify hazards, (2) categorize by severity and probability, (3) mitigate/control risk as much as practical and to an acceptable level, (4) monitor for any issues. These techniques give the engineering team confidence that the system will be reasonably safe.

FAA Safety Risk Management Process flowchart and Risk Categorization Matrix table
FAA Safety Risk Management Process and Risk Categorization Matrix from FAA Order 8040.4B, Safety Risk Management Policy.
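To show the shape of step (2) in code, here is a simplified, illustrative risk categorization lookup. The severity and likelihood categories loosely follow common aviation safety practice, but the scoring thresholds are invented for this sketch and do not reproduce the actual FAA matrix.

```python
# Simplified, illustrative risk categorization; thresholds are invented and
# do not reproduce the FAA Order 8040.4B matrix.
SEVERITY = ["minimal", "minor", "major", "hazardous", "catastrophic"]
LIKELIHOOD = ["extremely improbable", "extremely remote", "remote", "probable", "frequent"]

def risk_level(severity: str, likelihood: str) -> str:
    s = SEVERITY.index(severity)      # 0 (least severe) .. 4 (most severe)
    l = LIKELIHOOD.index(likelihood)  # 0 (least likely) .. 4 (most likely)
    if (s == 4 and l >= 2) or (s + l >= 7):
        return "high"     # unacceptable without mitigation
    if s + l >= 4:
        return "medium"   # acceptable with mitigation and monitoring
    return "low"          # acceptable

print(risk_level("catastrophic", "remote"))  # -> high
print(risk_level("minor", "remote"))         # -> low
```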

On its own, the angle of attack (AoA) sensor is an important but not critical component. The pilots can fly the plane without it, though stall-protection, automatic trim, and autopilot functions won’t work normally, increasing pilot workload. The interaction between the sensor and flight control augmentation system, MCAS in the case of the Max, can be critical. If MCAS uses incorrect AoA information from a faulty sensor, it can push the nose down and cause the plane to lose altitude. If this happens, the pilots must be able to diagnose the situation and respond appropriately. Thus the probability of a crash caused by an AoA failure can be notionally figured as follows:

P(AoA sensor failure) × P(system unable to recognize failure) × P(system unable to adapt to failure) × P(pilots unable to diagnose failure) × P(pilots unable to disable MCAS) × P(pilots unable to safely fly without MCAS)
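A back-of-the-envelope version of that chain makes the single-point-of-failure problem obvious. Every number below is invented purely for illustration; the point is that a factor stuck at 1.0 (no cross-check of the AoA sensors) dominates the product no matter how good the rest of the chain is.

```python
# All probabilities are invented for illustration only.
factors = {
    "AoA sensor failure": 0.01,
    "system unable to recognize failure": 1.00,  # MCAS did not cross-check sensors
    "system unable to adapt to failure": 1.00,
    "pilots unable to diagnose failure": 0.50,
    "pilots unable to disable MCAS": 0.50,
    "pilots unable to safely fly without MCAS": 0.20,
}

p_chain = 1.0
for p in factors.values():
    p_chain *= p

print(f"Notional probability of the full failure chain: {p_chain:.4f}")
# With sensor redundancy and cross-checking, the two 1.00 factors drop far
# below one and the overall risk falls by orders of magnitude.
```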

AoA sensors can fail, but that shouldn’t be much of an issue because the plane has at least two of them and it’s pretty easy for the computers to notice a mismatch between them and also with other sources of attitude data such as inertial navigation systems. Except, of course, that the MCAS didn’t bother to cross-check; the probability of the Max failing to recognize and adapt to a potential AoA sensor failure was 100%. You can see where I’m going with this: the AoA sensor is a single point of failure with a direct path through the MCAS to the flight controls. Single point of failure and flight controls in the same sentence ought to give any engineer chills.

The next link in our failure chain is the pilots and their ability to recognize, diagnose, and respond to the issue. This implies proper training, procedures, and understanding of the system. From the news coverage, it seems that pilots were not provided sufficient information on the existence of MCAS and how to respond to its failure. Systems and human factors engineers, armed with a hazard analysis, should have known about and addressed this potential contributing factor to reduce the overall risk.

Finally, there’s the ability of the pilots to disable and fly without MCAS. The Ethiopian Airlines crew correctly diagnosed and responded to the issue but the aerodynamic forces apparently prevented them from manually correcting it. The ability to override those forces, plus the time it takes to correct the flight path, should have been part of the FMECA analysis.

I have no specific knowledge of the hazard analyses performed on the 737 Max. Based on recent events, it seems that the risk of this type of failure was severely underestimated or went unaddressed. Either one is equally poor systems engineering.

Cockpit human factors

An inaccurate hazard analysis, though inexcusable, could be an oversight. Compounding that, Boeing made a clear design decision in the cockpit controls which is hard to defend.

In previous 737 models, pilots could quickly override automatic trim control by yanking back on the yoke, similar to disabling cruise control in a car by hitting the brake. This is great human factors and it fit right in with Boeing’s cockpit philosophy of ensuring that the human was always in ultimate control. This function was removed in the Max.

As both the Lion Air and Ethiopian Airlines crew experienced, the aerodynamic forces being fed into the yoke are too strong for the human pilots to overcome. When MCAS directs the nose to go down, the nose goes down. Rather than simply control the airplane, Max pilots first have to disable the automated systems. Comparisons to HAL are not unwarranted.

In summary

Boeing is developing a fix for MCAS. It will include redundancy in AoA sensor inputs, not activating MCAS if the sensors disagree, MCAS activating only once per high-angle indication (i.e. not continuously activating after the pilots have given contrary commands), and limiting the feedback forces into the control yoke so that they aren’t stronger than the pilots. This functionality should have been part of the system to begin with.

Along with these fixes, Boeing is likely3 also re-conducting a complete hazard analysis of MCAS and other flight control systems. Boeing and the FAA should not clear the type until the hazards are completely understood, controlled, quantified, and deemed acceptable.

Many news stories frame the 737 Max crashes in terms of the market and regulatory pressures which resulted in the design. While I don’t disagree, these are not an excuse for the systems engineering failures. The 737 Max is a valuable case study for engineers of all types in any industry, and for systems engineers in high-risk industries in particular.

System lexicons and why your project needs one

A system lexicon is a simple tool which can have a big impact on the success of the system. It aligns terminology among technical teams, the customer, subcontractors, support personnel, and end users. This creates shared understanding and improves consistency. Read on to learn how to implement this powerful tool on your program.

Read More