Who watches the watchmen on software testing? SpaceX’s control issue might have been found with better testing, but the test case writer didn’t start with a big enough perturbation for the problem to appear. It’s also not clear that the tester software is sufficiently good to tease out problems with the control software. That’s especially true if the same people are writing the control software and the tester software.
The rest of the entry reads like technobabble from a movie like Fail Safe. Nevertheless, this is the $64 billion question that will determine whether SpaceX becomes another of Musk's successes or whether his Mars colonization plans are grounded altogether.
There are ways to achieve high fidelity to the desired specifications. One is to have testing that is independent of the designers. Another is to have test plans vetted by a second, independent verifier. A third is to have multiple independent testers.
Testers can boil the ocean seeking test scenarios. Tests need to hit all the regimes that are likely to be encountered, but need to do so economically. A good choice is a fractional factorial design that tests every regime of each variable without crossing every variable with every other variable. Deciding what needs to be tested is as important as passing the tests chosen.
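To make that concrete, here's a minimal Python sketch of a two-level half-fraction design. The factor names (thrust_bias, wind_shear, sensor_lag, cg_offset) are invented for illustration, not anything from SpaceX's actual test plan: three base factors are enumerated fully, and the fourth is generated as their product, cutting the run count in half while still exercising both levels of every factor.

```python
from itertools import product

# Minimal sketch of a 2^(4-1) half-fraction factorial design.
# The factor names are hypothetical placeholders, not real test inputs.
base_factors = ["thrust_bias", "wind_shear", "sensor_lag"]

runs = []
for levels in product((-1, +1), repeat=len(base_factors)):
    # Generated factor: cg_offset is the product of the base factors
    # (the classic D = ABC generator), halving the runs from 16 to 8.
    cg_offset = levels[0] * levels[1] * levels[2]
    runs.append(dict(zip(base_factors + ["cg_offset"], levels + (cg_offset,))))

for run in runs:
    print(run)  # 8 runs; every factor still sees both of its levels
```

The price of the shortcut is confounding: in this design, main effects are aliased with three-way interactions, which is usually an acceptable trade when higher-order interactions are assumed to be small.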
It’s still a problem if testers are testing the wrong model. If the control software and the test software both have the same error, then there will be a false negative in testing even if every possible scenario is tested.
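Here's a contrived sketch of how a shared error defeats even exhaustive testing, with an invented drag coefficient standing in for whatever the real model gets wrong:

```python
# Hypothetical common-mode error: the flight model and the test oracle
# were both written from the same (wrong) spec, so they agree everywhere.
WRONG_DRAG = 0.30   # suppose the true coefficient is 0.45

def flight_model(velocity: float) -> float:
    return WRONG_DRAG * velocity ** 2

def test_oracle(velocity: float) -> float:
    return WRONG_DRAG * velocity ** 2  # same bug, inherited from the spec

# Every scenario "passes" -- a false negative on every single run.
assert all(flight_model(v) == test_oracle(v) for v in range(100))
```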
One remedy is to test the testers: deliberately seed known errors into the design and see whether the testers find them. The fraction of seeded errors that escape detection gives a hint about how many unknown errors remain.
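The usual back-of-the-envelope here is a capture-recapture estimate; the function and numbers below are my own illustration, not SpaceX's method. The assumption is that if testing catches a given fraction of the seeded errors, it catches roughly the same fraction of the unknown ones.

```python
def estimate_remaining_faults(seeded: int, seeded_found: int,
                              real_found: int) -> float:
    """Lincoln-Petersen-style estimate of real faults still latent.

    If testing finds seeded_found of the seeded planted errors, assume
    it finds real faults at the same rate, so the estimated total of
    real faults is real_found * seeded / seeded_found.
    """
    if seeded_found == 0:
        raise ValueError("no seeded faults found; the estimate is unbounded")
    estimated_total = real_found * seeded / seeded_found
    return estimated_total - real_found

# Example with made-up numbers: seed 20 errors; testers catch 16 of
# them plus 8 real ones. Estimate: 8 * 20/16 - 8 = 2 real faults remain.
print(estimate_remaining_faults(seeded=20, seeded_found=16, real_found=8))
```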
This is what I do in my day job at Optimal Auctions, where our auction software has been used to buy and sell over $100 billion in cost of goods sold.
I asked SpaceX this question when I toured the company before their first two launches. After their first launch, I expressed confidence that they were getting better at this.
I don’t see much change in culture in their latest flight review, which Rand noted today. Their current culture and methodology may be enough to get them to orbit. With only eight anomalies detected, only one of them fatal, they are in good shape. Actually flying hardware (or, in my case, holding an auction) gives additional confidence that the test plan accurately models the flight hardware. If they do succeed, flawless results are great for their business, but they create a new problem: they can reduce the testers’ vigilance.
For testing success, remember Andy Grove’s dictum: “Only the paranoid survive.”