Boeing’s Annus Horribilis

…continues with today’s failure of Starliner to get to the ISS. Stories from Eric Berger, Marina Koren, and Tim Fernholz.

The question is whether or not they’ll insist on another uncrewed test flight before putting in crew. Given the cost of an Atlas V, I’d be inclined to say no, given that this failure wouldn’t have resulted in loss of crew. Worst case for a crewed flight would be failure to dock, which would just mean a mission abort back to White Sands. But as we all know, SAFETY IS THE HIGHEST PRIORITY.

[Late-morning update]

Loren Grush has another take.

[Update mid-afternoon]

Here’s a story from Emilee Speck in Orlando.

47 thoughts on “Boeing’s Annus Horribilis”

  1. But under the terms of CCtCap, wouldn’t Boeing be on the hook for the extra Atlas V, rather than NASA?

    (I know, I know…stop laughing.)

  2. “ The source of the problem was a glitch in the Starliner’s internal clock, which caused the vehicle to register a different time than it actually was. ”

    Damn Daylight Saving Time strikes again!

    1. I actually had to fix the Linux time files for DST USA rules for an embedded micro-controller thanks to an act of Congress back in 2007!

      P.I.T.A.

  3. I say they should salvage what they can from the mission. They’ve got a capsule full of ISS supplies that will be returning to Earth, and space food that’s actually flown into space is bound to have some serious resale value on Ebay.

    I have no idea if they’ll attempt a crew for the next mission because having a failure due to a software bug always raises the issue of software quality control, review, and the like. How many other bugs might be lurking in the code is typically an open-ended question.

  4. Daylight Saving Time. Aka “clock bothering”, sums it up perfectly. Thank God we don’t have it in Queensland, Australia.

    Seriously, if you can’t get the clock right, what else is wrong?

    1. Well, clock calculation problems raise a few obvious issues about design and testing, but it also raises an issue regarding the approach to the controls system, or perhaps the software side of mission architecture.

      Taking it as a given that there’s a tight-maneuver, precise control mode for use near the ISS, what enables that mode? Obviously Boeing allowed a simple mission clock to set it, yet they knew they’re not going to fly anywhere near the ISS unless they’ve had ongoing ground communications over the previous several hours.

      A human has to give permission for the approach, so their software should have required a signal from either a crewman or ground control before enabling the special maneuvering mode that consumed so much fuel.

      You could argue that letting a clock function do what it did would be akin to a bit of automated airliner code that says:

      if (ETA - current_local_time < 3 minutes) then
        lower landing gear and flaps
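The alternative argued for above, where the clock alone can never enable the fuel-hungry mode, might look something like the following. This is only a minimal sketch and has nothing to do with Boeing's actual flight software: `ApproachGate`, `window_start_s`, and `human_authorized` are hypothetical names, and the 20-hour window is an invented number for illustration.

```python
from dataclasses import dataclass


@dataclass
class ApproachGate:
    """Hypothetical gate for a precise, fuel-hungry maneuvering mode."""

    window_start_s: float  # assumed mission elapsed time at which approach may begin
    human_authorized: bool = False  # set only by an explicit crew or ground command

    def authorize(self) -> None:
        """Record an explicit go from a crew member or ground control."""
        self.human_authorized = True

    def approach_window_open(self, mission_elapsed_s: float) -> bool:
        return mission_elapsed_s >= self.window_start_s

    def may_enable_precise_mode(self, mission_elapsed_s: float) -> bool:
        # The clock alone is never sufficient; a human must also have said go.
        return self.human_authorized and self.approach_window_open(mission_elapsed_s)


gate = ApproachGate(window_start_s=20 * 3600)  # invented 20-hour window
wrong_clock_s = 30 * 3600  # a clock that wrongly reads well past the window

print(gate.may_enable_precise_mode(wrong_clock_s))  # False: nobody has authorized it
gate.authorize()
print(gate.may_enable_precise_mode(wrong_clock_s))  # True: clock and human both agree
```

With a gate like this, a bad clock by itself leaves the vehicle in its normal mode until someone on the ground or on board confirms the approach.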

      1. You mean, of course, aluminium*
        Thanks for the link. Learned something new about Sir Humphry:

        Sir Humphry Davy (who, you may recall, “abominated gravy, and lived in the odium of having discovered sodium”)

        *WordPress spell-checker can bugger off.

  5. I really think we need to trust Boeing on this. After all, they have a proven commitment to quality, unlike those crazy pirates down in Hawthorne. Those Space Ex newbies don’t spend nearly enough money to be trusted.

  6. If they don’t get the Starliner back intact, it may open a real can of worms with respect to FAA/AST and the “learning period.” There are undoubtedly people out there who will say that this was an incident that could have resulted in the deaths of crew members, and didn’t only because none was on board – this time. “Therefore” the time has come to issue a regulation covering the safety of spaceflight participants. With the House in Democrat hands, and most of the best space Members gone, I wouldn’t put it past them to try to amend the CSLAA yet again…

    1. The counterargument would be that if a crew was on board they’d have simply turned off the MCAS system.

      *looks around furtively*

  7. I’m waiting for someone to remind everyone that we learn more from failure than success and that this isn’t really a setback.

    1. I don’t in fact think it’s that much of a setback. They learned something from it, crew wouldn’t have been lost, and they should go ahead with the next crew mission on it once they understand what went wrong and how to prevent it.

        1. Deriving cause from effect isn’t easy. See also Wayne Hale’s tweet from the article:

          https://twitter.com/waynehale/status/1208040666460241920

          Those of us old enough to remember know the subconscious mental angst of T-31.

          It’s not just software that suffers this problem. Hardware can too. And when it’s part of a complex custom integrated circuit, it can cost a fortune to fix if found late, since broken parts have to be replaced. Back in the day (1982ish) when I was at DEC/SEG I developed a simulation device known as a ‘trick box’, to go along with the CPU’s I-Box, M-Box, E-Box, etc. The trick box was not to be implemented in silicon, but was used to provide outside asynchronous stimulus for parts of the design that were typically not simulated but could occur in real life. Like dynamic RAM read or write stalls due to refresh cycles. It also generated “fake” interrupts, just to cause enough commotion to see if it led to confusion. Invariably it did. Saved DEC a fortune and later Alliant too. Using pseudo-random number generators with specific seeds (for repeatability) is now part and parcel of semiconductor design verification. But back in 1982 I got the idea from a book I had read in college back in the ’70s. No, it wasn’t a textbook. It was “The Adolescence of P1”. I stole the same mechanism P1 used for cracking System-360 storage protect. So I defer to that author and his sources for the original idea.
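The seeded-stimulus idea described above can be sketched roughly as follows. This is not the original trick box, just a minimal illustration: the event names, the `stimulus_schedule` function, and the 5% injection rate are all assumptions made up for the example.

```python
import random
from typing import List, Tuple

EVENTS = ("refresh_stall", "fake_interrupt")  # hypothetical stimulus kinds


def stimulus_schedule(seed: int, cycles: int, rate: float = 0.05) -> List[Tuple[int, str]]:
    """Return (cycle, event) pairs to inject at pseudo-random simulation cycles.

    The same seed always yields the same schedule, so a failing run can be replayed.
    """
    rng = random.Random(seed)  # private generator; leaves global random state alone
    schedule = []
    for cycle in range(cycles):
        if rng.random() < rate:
            schedule.append((cycle, rng.choice(EVENTS)))
    return schedule


first = stimulus_schedule(seed=1982, cycles=1000)
replay = stimulus_schedule(seed=1982, cycles=1000)
other = stimulus_schedule(seed=1983, cycles=1000)

print(first == replay)  # True: same seed, same stimulus, so a bug can be reproduced
print(len(first), len(other))  # a different seed gives a different schedule
```

Logging the seed with every simulation run is what makes the commotion useful: when something gets confused, the exact same sequence of stalls and interrupts can be generated again.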

          1. I had a press pass to STS-1 and remember the incident well. In the most common news footage of the launch I can be seen standing just to the left of the countdown sign, watching through binoculars. Immediately to my left, leaping up and down and shouting is, I think, the late space artist Daniel Gauthier. Somewhere around here I’ve got a pocket notebook recording my subjective impressions of the days and hours leading up to the launch. Jerry Pournelle was there too, but I didn’t know him yet.

          2. There was this 20-something skinny kid who, along with 2 friends, was hanging out at the overpass of 405 and US1 after having arrived in a traffic jam at 5:00 AM and ditching their shared Renault ‘Le Car’ (driven continuously about 3 days earlier from New England) about a half mile south on US1. Binoculars as well, and needed at that distance. Then the big nothing. Followed by a day trip to Disney World, then a trip back to the Cape, an overnight camp out along the grassy interchange at said same location. That following morning was not disappointing.

          3. I’ll just point out that the span from the first Shuttle launch (which Rush sampled) to the last Shuttle launch is as long as the span from the death of Glenn Miller in 1944 to the formation of Iron Maiden and Motorhead in 1975.

          4. I drove down from New England in a Datsun B210 pickup. It was parked in the lot at the press site and had some pitting after the launch, making me wonder what I may have inhaled. I camped out at the press site that night, so I could be sure to be there in the morning. At one point, somebody, possibly Miles O’Brien, nudged me awake so I would quit snoring long enough for him to do a stand-up.

  8. If this was a NASA spacecraft, there’s precedent that they’d clear it to fly with a crew. When the unmanned Apollo 6 Saturn V mission experienced severe pogoing, they fixed the problem and flew a crew on Apollo 8. Likewise, when the last Shuttle Enterprise glide test experienced pilot induced oscillations, they fixed the problem, skipped any additional glide tests, and launched a crew on STS-1.

    Since this isn’t a NASA capsule, they may require another unmanned test flight before clearing it for crew. I tend to doubt it unless the investigation shows other problems or there’s a problem with the reentry and landing.

  9. Saying they don’t yet know what caused it, and blaming the clock sounds awfully convenient to me. I hope they get it back in one piece so they can find out what did cause it. And getting a busted ship back is what impresses me.

    1. I assume it was a fairly easy determination because a flight-mode kicked in that wasn’t supposed to kick in, and they just looked at what set the bit. But of course I’m speculating. They will still get a good orbital test of maneuvering and other systems, and then a re-entry and landing test, so all they’re really losing is a test of approach and docking. In theory, that phase will work right or it won’t whether or not a crew is on board, and even with a crew, they’ll have some time to work their way through any problems before they have to give up and return for landing.

      So unless they encounter further issues, I’m with Rand in thinking they’ll likely put a crew on the next one.

  10. I think it is fair to ask what else might be wrong in the software.
    I’m good with pronouncing and spelling Element 13 either way. I’ve been to the U.S. enough to have started on learning American 101.
    Also I know not to lift the “bonnet” to look at the engine in a car nor to put my bags in the “boot”. The “tailplane” is the horizontal stabilizer and the “fin” is the vertical stabilizer.
    As I said to a friend when we were in Wyoming in 1996 when buying a motorglider, the language is similar enough that you can be fooled into thinking you can understand the locals and they can understand you. That was the trip where I walked into a sporting goods store in Billings to buy a pair of hiking boots. When I opened my mouth in response to the sales assistant’s “Can I help you, Sir?” the next thing I got was “you aren’t from around here, are ya?” in a very suspicious manner. When he found out I was from Australia he became quite friendly and assured me his attitude was only because he mistook me for someone from “back East” (that has happened a couple of times) and that they were getting quite a few of “those people” moving into town, but assured me that the City fathers had the problem in hand.
    I can also drive either side of the road.

    1. We can all drive on either side of the road. We just choose not to for health reasons. ^_^

      Space.com has an update with slightly more information. They think the clock problem might have kept an antenna from pointing at a TDRS relay satellite, too.

      They’re going to try and land it tomorrow morning.

  11. As Jim Bridenstine pointed out, having the vehicle outside of its normal planned regime also yields tons of important data on the vehicle, the ground-to-spacecraft systems and even the team response to anomalies. It’s the expensive way to go; obviously it’s cheaper and easier to repeat if the anomalies can be simulated on the ground. But real data also means real value. Keep flying.

  12. To me, the malfunction shortly after vehicle sep, due to the mission elapsed timer taking the wrong thing from the LV memory (I’m going to bet they hard-coded that location) is a huge red flag. There obviously is no verification, nor is there fault tolerance. To me, this indicates major issues with both their software verification testing, and design. SpaceX got zinged hard by NASA for having insufficiently robust software for the first cargo Dragon mission, and rightfully so IMHO, but what we are seeing now appears to be orders of magnitude worse from Boeing.

    What bothers me most, though, like with the Starliner pad abort test, is you had NASA indicating approval of the results *before* fully understanding the root causes. Don’t know means don’t know, so why on earth were they trying to spin this before they knew what had happened? That’s malfeasance, at best.

    Ah well, at least they remembered to put the pins in the parachutes this time.
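The kind of verification said to be missing above could be as simple as a plausibility check before trusting a time value read from the launch vehicle. The sketch below is purely illustrative and assumes the capsule keeps an independent elapsed-time estimate of its own; `validated_met`, both parameter names, and the 5-second tolerance are hypothetical and not taken from any real flight software.

```python
from typing import Optional


def validated_met(lv_met_s: float, own_met_s: float, tolerance_s: float = 5.0) -> Optional[float]:
    """Accept the launch-vehicle mission elapsed time only if it looks sane.

    Returns the accepted value, or None to signal "do not trust this, fall back".
    """
    if lv_met_s < 0:
        return None  # a negative elapsed time is never plausible
    if abs(lv_met_s - own_met_s) > tolerance_s:
        return None  # sources disagree: flag a fault instead of acting on it
    return lv_met_s


print(validated_met(lv_met_s=900.0, own_met_s=901.2))    # 900.0
print(validated_met(lv_met_s=40500.0, own_met_s=901.2))  # None
```

A check like this does not fix a wrong value, but it turns a silent bad read into an explicit fault that the rest of the software, or a human, can respond to.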

    1. …is a huge red flag. There obviously is no verification, nor is there fault tolerance. To me, this indicates major issues with both their software verification testing, and design. SpaceX got zinged hard by NASA for having insufficiently robust software for the first cargo Dragon mission, and rightfully so IMHO, but what we are seeing now appears to be orders of magnitude worse from Boeing.

      Lest my previous posts become mistaken for gung ho for Boeing, I agree with all you have said and Mike Borgelt’s point as well. You don’t know what you *should* know when you failed to look. There is also the “horizon” effect. Finding that bug that lies *just* beyond the limit of simulation. The idea being that no catastrophic bugs can present past that limit. But to be robust enough to allow for the assumption that there will be bugs that show up that were past the limit but can be mitigated with the procedures at hand and fixed afterwards. I think the 1201 and 1202 alarms that occurred on Apollo 11 during lunar descent were a good case in point of this.

  13. BBC.com story

    Boeing has fired its chief executive, Dennis Muilenburg, in a bid to restore confidence in the firm after two deadly crashes involving its 737 Max plane.
