@Ken: I see a lot of analysis going on including predictive. The results so far seem to bear this out. There is never going to be enough analysis to satisfy everyone.
Right now their record stands at 3 losses out of 7 launches across their product line. That’s not enough “analysis” to satisfy me, at any rate. And as I tried to emphasize above, it’s not just analysis. Testing, analysis, simulation, and design all have to be involved, and all have to be informed by an understanding of launch vehicles — how they succeed, and how they fail.
But yes, they’ve lowered costs. And they’ve done a wonderful job meeting schedule. Maybe NASA could learn something from them….
Right now their record stands at 3 losses out of 7 launches across their product line.
I get very tired of the disingenuousness of this kind of statement. I expect it from morons like “abreakingwind” or Constellation partisans, but I expect better from you. It implies that the failures were random, and that the reliability of (say) the Falcon 9 is only 4/7ths, when the losses were early, on test flights, and the last four flights have been consecutive successes. Not to mention that Falcon 9 is a different “product” than Falcon 1, and it is two for two.
@Larry: Errors happen. We have to deal with them as best we can.
If your point is that the imperfectibility of software is an excuse not to exercise effective software control, then I demur.
You’re not talking about effective software control. You’re thinking that perfection in software is possible.
“Testing can establish the presence of errors, but cannot guarantee their absence.” [E. W. Dijkstra]
Those of us who have worked as professional programmers know this to be true. It’s a fundamental, unescapable reality of life.
@Larry: Using the “fly it and see what breaks” approach, Lockheed’s Skunk Works went from blank paper to flying the Blackbird (A-12 Oxcart version) in about 3 years.
Using the “let’s analyze it to death” approach, the F-22 took over 20 years to become operational and that’s with massive overruns and other associated problems.
I’m not sure you’re stating this correctly. The draft USAF requirements for the ATF were issued in 1981. There was a study phase leading to a downselect and a Dem/Val phase starting in 1986, so we were ‘cutting metal’ by the mid-1980s. The YF’s were flying in summer 1990. IOC was delayed as the Clinton administration dragged their feet in the post-Cold-War environment. It was only the advent of the Bush (43) administration that got us to IOC. I think it’s foolish to blame this on “analyzing it to death”. Given the extraordinary capabilities of the F-22, I think you’d have to say the design effort was also extraordinary — in fact the F-22 team was awarded the Collier Trophy in 2006. (And I’d have to say SpaceX is a worthy contender for the Trophy, too, despite all my grousing…)
But I don’t want to take anything away from Kelly Johnson (in fact, he and his team won the Collier Trophy in 1963 for the SR-71). It’s a legendary aircraft. Dangerous as hell (12 out of 32 lost in accidents), leaked JP-7 like a sieve, but nothing could touch it….
But note the A-12 and SR-71 were two different aircraft, the latter a significant mod of the former — longer, heavier, slower, but more capable. The A-12 was about five years in development (1957-62) and was cancelled before IOC, in December 1966, mainly due to cost overruns. The A-12 did fly some missions in 1967 and 1968, and participated in a fly-off against the SR-71 in 1967. The SR-71, meanwhile, entered service in 1964 (7 years, not 3, after development of the A-12 began). I don’t know how up-to-date the FAS website is, but it is still showing three operational SR-71s assigned to Edwards AFB.
@Larry: You’re not talking about effective software control. You’re thinking that perfection in software is possible.
Where did I say that?
What I did say is that there are effective modern practices of software specification, development, verification, and control that minimize unreliability. And if you don’t follow these practices, and you screw up and upload the wrong flight trajectory, it’s your fault.
Those of us who have worked as professional programmers know this to be true. It’s a fundamental, unescapable reality of life.
Is that why my copy of Firefox crashes every few days?
@Rand: I get very tired of the disingenuousness of this kind of statement. I expect it from morons like “abreakingwind” or Constellation partisans, but I expect better from you. It implies that the failures were random, and that the reliability of (say) the Falcon 9 is only 4/7ths, when the losses were early, on test flights, and the last four flights have been consecutive successes. Not to mention that Falcon 9 is a different “product” than Falcon 1, and it is two for two.
Everyone who has made it this far in the thread understands that the numbers are (0,0,0,1,1) for Falcon 1 and (1,1) for Falcon 9. We’re just discussing what that means. I look at the Falcon 1 failures and I see a pattern of not just design flaws, but flaws in the process of designing the vehicle. I would argue that it is you who is insisting that the failures have no systemic meaning — that they are just weird (“random”?) events. However, I won’t back away from using randomness as a model because that’s what reliability estimates mean. We don’t get to restart the clock on Shuttle failure rate just because we’ve fixed the O-ring problem. We don’t get to restart the mishap rate on F-16s with every block upgrade. It doesn’t work like that; that’s not what the numbers mean.
If there is a systemic problem with SpaceX’s design practice, it is perfectly legitimate to lump these two vehicles together. Indeed, you seem to want to do this, citing “four successes in a row” — even though in the next sentence you say Falcon 1 and Falcon 9 are two different beasts entirely. But if you just take the four successes and ignore the three failures, you’re cooking the data.
And you seem to want to read a lot into the (1,1) record of the Falcon 9. And I have pointed out that this is either disingenuous or thoughtless.
Suppose I have a coin that I know to be unfair. In fact, I suspect that it is much more likely to come up heads [launch success] than tails [launch failure]. If I flip the coin twice and get two consecutive heads, what is my 90% confidence interval on the probability of tails [failure] ? I’ll tell you: the 90% confidence interval is that pfail falls between 2.55% and 77.6%. The 50% confidence value is that pfail is 29.3%. I’m sorry you don’t like the numbers, but those are the facts. Two successes in a row just doesn’t mean that much. I mean, it’s better than three failures in a row, as Falcon 1 started, and it’s better than the single success of Ares I-X, but it absolutely does not “prove” the system is reliable. Like I said above, call me when Falcon 9 reaches 7 successes with no failures. And I will buy you a Diet Coke when they reach 34 successes with no failures.
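For anyone who wants to check the two-for-two figures, here is a minimal sketch. It assumes the bounds come from solving (1 - pfail)^n = q for a run of n successes with no failures; that is one reading of the numbers, since the comment does not state the method. The function name `pfail_bound` is made up for illustration. Under that assumption it reproduces the 29.3% and 77.6% values, and the lower bound comes out at about 2.5%, close to the 2.55% quoted.

```python
def pfail_bound(n_successes: int, confidence: float) -> float:
    """Failure probability p at which a run of n straight successes
    has probability (1 - confidence), i.e. solve (1 - p)**n = 1 - confidence."""
    return 1.0 - (1.0 - confidence) ** (1.0 / n_successes)


# Two launches, two successes (the Falcon 9 record discussed in the thread).
lower = pfail_bound(2, 0.05)   # 5% point of the failure probability
median = pfail_bound(2, 0.50)  # 50% point
upper = pfail_bound(2, 0.95)   # 95% point

print(f"90% interval on pfail: {lower:.2%} to {upper:.2%}")
print(f"50% value of pfail:    {median:.2%}")
```

The same function shows how slowly the interval tightens: calling `pfail_bound(n, 0.95)` with larger `n` gives the upper bound after a longer unbroken run of successes.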
My “four successes” was in the context of your lumping them all together. I do consider Falcon 9 a different vehicle. Is it your contention that two successes with no failures tells us no more about the reliability than would have two failures with no successes?
I would contend that we now know that the design has been validated, in the sense that there are no fundamental flaws with it that prevent it from flying as designed when manufactured and operated to specification. That is all that I’m reading into its record, and I think it’s a fair reading. I have assigned no number to its reliability.
Once you’ve got your rocket flying right, isn’t the biggest part of keeping it that way eliminating faulty design changes, changes in procedure, subsequent errors in assembly, and problems caused by changes in the environment?
Using the Shuttle disasters as examples, if it weren’t for the factors I mention, wouldn’t there have been no Shuttles lost?
I posted the above without seeing Rand’s 1:57 comment which says the same thing
Is it your contention that two successes with no failures tells us no more about the reliability than would have two failures with no successes?
No, why would you think that?
I told you what it means: it means the 90% confidence interval on the failure probability is (2.55%, 77.6%). If Falcon 9 had had one success and one failure, then the 90% confidence interval would have been (22.4%, 97.5%), which is considerably worse.
I would contend that we now know that the design has been validated, in the sense that there are no fundamental flaws with it that prevent it from flying as designed when manufactured and operated to specification.
Flying is better than not flying, no argument there.
It is certainly the case, however, that the functionality of the system when operating at specification limits has not been validated by these flight tests. At best we see two cases that are *probably* (in a quantifiable sense) within a standard deviation of nominal conditions. It remains to be seen how the system performs over time. As any vehicle accumulates launch experiences, the experience band widens in a predictable way. On a particular day in 1986, Challenger discovered a hard limit in launch temperature. I wish SpaceX luck. And I hope they fix the systemic problems with their design process.
Also, specific claims have been made about the reliability of the SpaceX launch vehicles. These claims are not verifiable in two test launches. Getting back to the trigger for this thread, Ric Locke made claims about the superiority of the Soyuz design practice. It seems entirely proper that those claims be explored rationally, in particular claims about the reliability of launch vehicles developed according to different “design cultures”.
“Is that why my copy of Firefox crashes every few days?”
The snarky answer to this is: No, that’s Adobe’s fault. 😀
SpaceX’s solution to “My vendors are crap” is to design and make the key components themselves.
their record stands at 3 losses out of 7 launches
Without further info is a major mischaracterization. When they replaced all the aluminum nuts with steel; that alone made all the following F1 and F9 essentially a new vehicle. When they changed the software to allow for the residual thrust at separation, again it is a new vehicle from that point forward. This will be true in the future as well with any changes they make.
Three losses followed by four success would be a better representation of the facts.
SpaceX itself is a different company. 100 employees in the early years versus 1200 and growing now. Some of those new employees bring aboard some of the institutional knowledge you demand. Growing a company like that is not something just anybody can do.
The main thing SpaceX has done is made a liar out of all those that said it can’t be done. Every company has to mature. You might want to see if they’re hiring consultants.
Three losses where they learned things would be even better.
@Ken: again it is a new vehicle from that point forward.
By your reasoning, then, if SpaceX lost one vehicle, say, every ten flights, then as long as they made the appropriate engineering fixes afterwards they could continue to claim a pristine 100% launch reliability?
In your mind, does Shuttle have a 100% launch reliability? I’m just trying to understand your reasoning here.
I’m just trying to understand your reasoning here.
I’m having a Joe Wilson moment here…
My point is it’s not just about statistics. A point others have made as well and you are too intelligent not to get that.
Proof is in the pudding they say. I expect they will have failures in the future… so does Elon. But you seem to be looking at a few issues and ignoring a great deal more. You don’t know the expertise they are applying, yet are willing to say what they aren’t. You write about reasonable things and then apply them in an unreasonable way.
That’s not all bad but it is presumptuous.
@Ken: My point is it’s not just about statistics.
Well, as someone who sometimes gets paid to advise on statistical matters, I’m biased towards quantitative reasoning. But here are some points:
I’ve already said and will willingly repeat, every successful launch justifies popping the champagne corks. You are right that it’s not just about nudging the stats one way or another.
I’ll also repeat that two successes is better than two failures.
But I will also maintain that two successes is pretty weak evidence if you’re trying to assert that reliability is demonstrated.
OTOH, let me restate what I think Rand was trying to say: the two successes demonstrate capability quite clearly, even if reliability is still in question. And that is admirable, and I would applaud if SpaceX received the Collier Trophy this year.
My presumptuousness in this thread was prompted by Ric’s gleeful (and, I contend, ignorant) assertion that the Soyuz was “the most reliable launch vehicle in the world”. Some people have taken umbrage that I opted to challenge this assertion and its application to SpaceX, and to explore the meaning of SpaceX’s record, and whether it represents a compromise between reliability and cost.
And yes, I am inferring some problems in their design process based on scant data, because they have had very few launches and several highly public, avoidable failures. In turn the defense of SpaceX seems to amount to (1) no, their design process was not defective, (2) they’ve fixed it anyway, and (3) their two successes with Falcon 9 erase any doubts.
This is the aerospace equivalent of the “broken tool” defense, a famous legal paradigm. When defending a client accused of borrowing a tool and returning it broken, lawyers use a three-layered defense: (1) the tool was never borrowed, (2) the tool was broken when the defendant took possession of it, and (3) the tool was returned in perfect condition.
I’d say that the evidence is that SpaceX has a system that will get to space more cheaply than has been done in the past, and that at this stage there’s no reason to claim that they won’t be able to follow the same trend in improving reliability as their operating experience increases that others have previously achieved.
their two successes with Falcon 9 erase any doubts
Who is saying that? Your critique is reasonable. It’s just the assertion without sufficient evidence that not just I question. For example, you really don’t know why they chose not to include baffles originally. They were wrong and fixed it. You can’t assume what consideration they gave before testing.
Reliability depends on a number of things. Quality control during manufacturing is a major component. Assuming you get things right in the design, you also have to keep them right. Here your statistical controls make sense, along with the level of inspection they’ve exhibited.
Development is not so clear cut. People disagree. Testing is the only way to settle issues. I’ve worked for companies where problems persisted because they didn’t take the SpaceX approach. They will not only get results their way, it will be a safer vehicle for humans because they put more emphasis on real world testing than analysis which can hide a lot of faulty assumptions. Testing is just one part of assuring reliability, but it’s the one you can have the most confidence in.
Using aluminum nuts caused a failure. That failure mode no longer exists. It then becomes invalid to lump that failure statistically with a rocket that no longer has that failure mode. That’s why development is considered apart from operations. Sometimes they overlap. They will be making further changes and operating as well. These are risks that everybody takes. Hopefully it’s done with eyes fully open. Frankly, I’d trust SpaceX much more than many others. With 20/20 hindsight you say their mistakes were avoidable. I’d say they learned and matured and will continue to mature.
I think you have a valid point about avoiding mistakes, but I think SpaceX is very much on the right track.
@Ken: Who is saying that?
I think several people in this thread have come close to that. But what I was referring to was a press conference Elon Musk gave after one of the Falcon 9 flights (not sure which one). The reporter on hand invited Musk to say whether their launch success would silence critics. So, okay, there’s an air gap between “silence critics” and “erase doubt”. And to Elon’s credit, he didn’t take the bait. But it doesn’t seem to deter people who should know better from crediting SpaceX’s sparse and troubled flight experience with more than is warranted.
And I should reiterate that SpaceX fans like to say “four successes in a row”, but that ignores SpaceX’s failures to recover boosters. Yes, I understand they say this is not a flight objective, but to my eye this still falls in the pattern of not learning from previous launch vehicle experience. And that’s a difference between your point of view and mine: you see all these failures and shortcomings as unrelated and in that sense “fixable” and ephemeral, while I see them as part of a larger pattern that I don’t see changing.
That failure mode no longer exists. It then becomes invalid to lump that failure statistically with a rocket that no longer has that failure mode.
Well, no, that is not the way it works. I asked rhetorically for you to clarify but now I will just say that this is not the meaning of reliability. You don’t get to restart the clock just because you’ve made an engineering change. Do you really think Shuttle is 100% reliable because the O-ring and foam issues have been addressed? Perhaps you are just one of those people who doesn’t believe in “reliability”, just on general principles. That’s a lifestyle, too, I guess.
Perhaps the most important reason we don’t reset the clock with every engineering change is that the temptation to cook the books is so strong. It is very difficult to come up with objective criteria for trimming data out of a reliability analysis. Of course, partisans can invent any number of subjective reasons for trimming failures to make the statistics look better, but all that gives you is meaningless garbage.
Another reason we don’t restart the clock and throw away early failures is that it makes it difficult to quantify the reliability growth of a system. More on this below.
Several folks have tried to make this argument, that the SpaceX failures don’t mean anything in particular and don’t have anything to do with the system reliability. Just as life is what happens while you’re waiting for something better to come along, unreliability is what happens while you’re perfecting your system. Soyuz, the system that triggered my part of this conversation, is up in the 1000s in terms of launch numbers, and is still undergoing engineering changes (I think), and is still seeing about a 2-3% loss rate. Now, the loss rate seems to have improved over the years, but it doesn’t seem to be headed under 1% anytime soon. OTOH, there are some empirical laws that suggest that with sustained engineering effort invested in a reliability growth program, the reliability will continue to grow. Try googling “Duane plot”. It’s too soon to make a Duane plot for SpaceX, but now that I’m curious, I will compute one for Soyuz.
In fact, if you go to the NIST site which is the first to come up when you google “Duane plot”, you will see an example of tracking improvements in reliability with a Duane plot. NOTE in particular that we do not reset the clock with every engineering change. And note how the reliability improves with time.
bbbeard, rereading through this thread I’m left wondering just what your point is outside of that some commentators in the media make claims and put questions in ways to try to create media stories.
You’ve utterly littered your comments with strawmen. Reread your first paragraph in the above comment and tell me what the point of it was.
Everyone here knows that a high failure rate over the first few launches is nothing unusual.
You’re trying to argue that SpaceX got too much wrong in the first few flights, but hell, as an organisation that’s new to the launch business most people would’ve been surprised if they hadn’t had failures.
At this point I’m wondering, if they had achieved a 100% success rate, you’d be telling us all that this bodes poorly for their future as all rockets will sometimes fail, and by not having the opportunity to learn from failures they’d get too cocky and this arrogance would cost lives in the future.
I think you find fault because you want to find fault.
You don’t get to restart the clock just because you’ve made an engineering change.
What an absurd statement. You’re saying their is no difference in reliability between any of the vehicles. That ignores reality. While you can put a number on the all seven attempts, it’s a meaningless number if you ignore what they’ve learned and changed.
That’s like saying on average I weigh a hundred pounds because at one time I was a five pound baby. Totally meaningless.
@Andrew: You’ve utterly littered your comments with strawmen. Reread your first paragraph in the above comment and tell me what the point of it was.
Ken asked “Who is saying that?” in response to my characterization “(3) their two successes with Falcon 9 erase any doubts” and I was answering the “who” part.
My substantive criticism on this point (for which (3) was a proxy) is that many people in this thread have pointed to the two Falcon 9 successes as though that were a counterargument to concerns about the overall reliability of SpaceX vehicles. If you don’t understand that two successes in two tries is very weak evidence, after all the different explanations I have presented, including the computed confidence intervals, I suspect you won’t respond to rephrasings of that argument. What I would suggest is that you return to my posts where I discuss the confidence intervals for the (0,2) sampling plan and see if you can understand the point.
You’re trying to argue that SpaceX got too much wrong in the first few flights, but hell, as an organisation that’s new to the launch business most people would’ve been surprised if they hadn’t had failures.
You’re missing the point. The point is not that they’ve gotten “too much wrong”. It’s not even that three failures in a row bodes ill for the future. It’s what they’ve gotten wrong. They’ve gotten things wrong for which common risk mitigations are well known across the industry. That’s what bodes ill for their future.
if they had achieved a 100% success rate, you’d be telling us all that this bodes poorly for their future as all rockets will sometimes fail, and by not having the opportunity to learn from failures they’d get too cocky and this arrogance would cost lives in the future.
Okay, now you’re creating straw men (“… what bounces off me sticks to you” as the childhood playground taunt goes). But the one grain of truth in that straw man is that I would be more impressed if they had found new ways to fail launches. Then we could all learn from their failures. Instead we sit and watch the launch from the sidelines and afterwards say, “gosh, didn’t we lose rockets in ’68 and ’72 because of X and then start doing Y? Why didn’t they do Y?”
I think you find fault because you want to find fault.
There is a fine line between a troubleshooter and a troublemaker, isn’t there? 😉
@ken: What an absurd statement. You’re saying the[re] is no difference in reliability between any of the vehicles. That ignores reality. While you can put a number on the all seven attempts, it’s a meaningless number if you ignore what they’ve learned and changed.
No, no, no — I’m not saying there is no difference in reliability between the vehicles. The estimated reliability is growing with every successful launch. That’s what the downward slope of the SpaceX Learning Curve means. The problem is that you don’t like the numbers. I am saying you don’t get to reset the clock after every test-fix cycle and continue to claim a 100% success rate because that sounds better. To me this is a really obvious point. It’s simply not how engineers do reliability growth management (RGM). If you don’t believe me, consult the DoD RAM Guide. Look at Appendix C, in particular the section of the guide dealing with test-fix-test RGM plans. I should also point out we usually use a Duane plot to quantify reliability growth, but until SpaceX loses another vehicle, their Duane plot is singular.
I don’t know what your industry experience is, or what your particular expertise is, but your attitude exemplifies the know-nothingness that I’m concerned about at SpaceX. You mean well, you formulate intelligent arguments, you sound well-educated, but you just can’t seem to accept that thousands of really smart guys — heck, call ’em rocket scientists — have been working launch vehicles for fifty years and have learned a lot about how to do things. One of those things is reliability modeling. I’m telling you how things are done, what they mean, and you respond that you would like to reinvent the wheel because it makes more sense to you to do it your way. I imagine (yes, I have to imagine, because SpaceX has not been responsive to inquiry) that similar folks sat around a conference table at SpaceX discussing why you put baffles in propellant tanks — and succumbed to arguments that software would be much lighter than baffles. And the problem is not so much that they made a bad decision, it’s that their corporate culture was defective. Is it better now? You can say that, but I don’t see much evidence that it is.
The problem is that you don’t like the numbers.
I’m fine with the numbers. I just think reliability is only meaningful in operations. During development you’ve got too many variables to have a meaningful number. That final number should be predictive, but it really is not because you have a combination of too many variables and not enough cases.
Do you really expect they will lose 42% of all future launches? I’m not ignoring the downward curve. I’m saying 42% is either meaningful or it’s not. Once you get to operations you have enough randomness that statistical methods become meaningful. Otherwise, it’s just playing with numbers in a non-random environment.
I’d love to play a roulette wheel that had manufacturing flaws that made it not uniformly random.
Too many variables isn’t really the proper way to say it. Actually, more random variables help statistically. I suppose it’s a question of, is the sample representative? The main problem is we have too few cases to say.
@Ken: I just think reliability is only meaningful in operations.
Sounds like a novel theory. Do you have a cite for that?
If what you’re saying reflects the design philosophy of SpaceX, it’s no wonder they have reliability problems. 😉
Do you really expect they will lose 42% of all future launches? I’m not ignoring the downward curve. I’m saying 42% is either meaningful or it’s not.
No. I don’t think you understand what the 43% means.
Referring back to the discussion of reliability growth management in the DoD RAM Guide, in a test-fix-test cycle, you expect the reliability to continue to grow. So I don’t expect the failure rate to remain at 43%. That’s just where it is right now; 43% is the “maximum likelihood estimate” of the failure probability, right now. In ten launches it will be better (unless they stop improving their vehicles), so no one I know would predict that the failure rate will remain 43% forever. In fact it’s probably better than the maximum likelihood estimate right now; it’s just that we don’t have a robust way to quantify that. (But that doesn’t make the 43% meaningless, it just means it’s a biased estimator.) Once we have enough data, we can set aside the maximum likelihood estimator and make a better estimate based on the power-law improvement of reliability.
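To make the “maximum likelihood estimate” concrete, here is a small sketch. It assumes every launch is an independent trial with the same failure probability, which is exactly the simplification being argued over in this thread. The function names are invented for the example. It scans candidate failure probabilities and picks the one that makes 3 failures in 7 launches most likely, which lands on 3/7, or about 43%.

```python
def binomial_likelihood(p: float, failures: int, launches: int) -> float:
    """Likelihood of the observed record for failure probability p.
    The binomial coefficient is omitted: it does not depend on p,
    so it does not move the location of the maximum."""
    return p ** failures * (1.0 - p) ** (launches - failures)


def mle_pfail(failures: int, launches: int, steps: int = 10_000) -> float:
    """Grid search for the p that maximizes the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: binomial_likelihood(p, failures, launches))


estimate = mle_pfail(failures=3, launches=7)
print(f"MLE of pfail: {estimate:.4f} (closed form 3/7 = {3 / 7:.4f})")
```

The grid search is only there to show what is being maximized; for this model the answer is always failures divided by launches.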
The main problem is we have too few cases to say.
I think I’ve said that.
“Sounds like a novel theory. Do you have a cite for that?”
One method they use to develop engines is to run to destruction.
I think there’s a good argument that failure rates should exclude failures during development, i.e. prior to commissioning.
In ten launches it will be better (unless they stop improving their vehicles)
This is exactly my argument. Assume they do stop improving their vehicle. I’m asserting the vehicle today if launched ten times would not fail 43% of the time. So rather than just being a point on a declining scale, it is in fact not meaningful because it does not actually describe the “maximum likelihood estimate” but is in fact higher. With no improvements or changes at all the true failure rate may actually be say 20% or less. I would agree that the numbers you produce would tend toward an accurate number, but that isn’t saying they are truly meaningful as they are.
Most people weigh less than a ton. They also weigh less than a megaton. Are those very meaningful statements?
@Ken: So rather than just being a point on a declining scale, it is in fact not meaningful because it does not actually describe the “maximum likelihood estimate” but is in fact higher.
Perhaps you should brush up on your statistics before you post more. “Maximum likelihood estimate” has a specific technical meaning of which you appear to be unaware. See http://en.wikipedia.org/wiki/Maximum_likelihood, in particular the section “Discrete distribution, continuous parameter space”.
I am aware that M.L.E. is a technical term with a specific meaning. The problem with statistics is it allows some to avoid common sense. You said…
In ten launches it will be better (unless they stop improving their vehicles)
I offered you some common sense regarding unless. Would you like to address that rather than looking down your nose?
@ken: I am aware that M.L.E. is a technical term with a specific meaning.
you said/I said:
…it is in fact not meaningful because it does not actually describe the “maximum likelihood estimate” but is in fact higher.
to which I replied to the effect that you don’t seem to understand the meaning of the term. Sorry if that sounded nasal-superior to you but this is not a democratic issue. If you want to rephrase your comment I’d be more than happy to let go of it.
Now, having said that, the MLE is definitely not the only estimator. If you want to offer another estimator besides your calibrated gut feeling about SpaceX I’d be glad to discuss it but right now I see nothing quantitative forthcoming.
The problem with statistics is it allows some to avoid common sense.
The problem with lack of statistics is that it enables all sorts of nonsense.
I offered you some common sense regarding unless. Would you like to address that rather than looking down your nose?
Well, my common sense regarding your proposal (about SpaceX doing the next ten flights without any further development) is untestable. I don’t see what there is to comment on.
You’re right that I wasn’t speaking precisely. But the thing about precision is it doesn’t mean accurate. Any type of math tends to be precise because that’s the nature of numbers. But a wrong assumption leads to inaccuracy no matter how many decimal places you calculate to.
Is SpaceX success 4/7 or 2/2? Which better represents the future?
You are right that we need to wait and find out. Place yer bets…
Okay, here’s a completely gut feel bet: I bet that SpaceX will see at least one more failure in the next ten flights (of Falcon 1/9/27, or whatever), even with improvements being made continuously. Stakes? the usual: one 16 oz bottle of Coca-Cola, winner’s choice as to subspecies (I like Cherry Zero). Only for Ken, everybody! Because he has stuck it out this far…. Willing to take this bet?
Dammit, serves me right I guess for only being third most dogged kid on the block.
Elevens a winner for one soda? Yer on.
Failure defined as not getting payload to orbit. Incidentals don’t count.
Failure defined as not getting payload to orbit. Incidentals don’t count.
Agreed. I’m assuming Rand will be around to referee and put us in contact.
I’m inclined to rate Falcon9 at 3/4 reliability at this time. Falcon9 has had 2 launches out of 2. Falcon1 had early failures, the causes of which have been fixed, making them unrepresentative of current vehicles. Then 2 successes. Taking the commonality, such that the successes of Falcon1 before Falcon9 apply somewhat to the latter, count 3 successes for Falcon9. Then since we have such a small sample throw in a failure for the unknown. I expect this rating to improve with more flights.
In any rocket I’m inclined to disregard the first 20% of launches for assessing reliability once the sample set gets large enough. I presume the early failures are corrected for, and if not will be reflected in later launches.
@peterh et al.: In any rocket I’m inclined to disregard the first 20% of launches for assessing reliability once the sample set gets large enough.
I find that when making Duane plots for reliability growth management estimates something like this rule of thumb is needed. Otherwise the trend becomes much less robust and the uncertainty of the estimated RGM parameters is unnecessarily large. Whether this has any bearing on the likelihood of seeing massive jumps in MTBF after the first flights, I can’t say. Your mileage may vary.
As promised, I have created a Duane plot for the Soyuz Voskhod family. Here is the link. In a Duane plot, we show the cumulative MTBF versus time (i.e. launches) on a log-log plot. Duane’s hypothesis is that a system undergoing systematic RGM efforts (e.g. a test-analyze-fix-test cycle) should show a linear trend on a log-log plot, i.e. the MTBF is given by a power law. Only after the tenth failure (on 03 Dec 1971) does the Soyuz MTBF show a systematic improvement. I wonder if that’s when the Soviets got serious about improving the Soyuz.
A key parameter derived from the Duane plot is the log-log slope, i.e. the exponent in the power law. A high slope means you’re doing very well, a low slope, not so much. Here is the advice offered by NIST:
“The reliability improvement slope for virtually all reliability improvement tests will be between .3 and .6. The lower end (.3) describes a minimally effective test – perhaps the cross-functional team is inexperienced or the system has many failure mechanisms that are not well understood. The higher end (.6) approaches the empirical state of the art for reliability improvement activities.”
In accordance with this rule of thumb, I put lines for slope=0.3 and slope=0.6 on the plot. (“SOTA” = state of the art, for those unfamiliar with the jargon.)
By this standard, it would appear that the Soviet team was at the low end of the empirical range, with a slope of 0.341+/-0.012, based on starting the RGM at the 10th flight.
For comparison, here is the Duane plot for the Delta. In this case the cutoff is not so clear, in part because there have been fewer launches (which is why the data appears somewhat coarser). I chose to start the regression at the second failure. But the power-law trend seems fairly clear, with slope 0.314+/-0.037 (overlapping the Soyuz confidence interval). Eyeballing it, it looks like you would get similar results starting at the third or fourth failure.
Note that, as Ken has pointed out, the instantaneous value of the MTBF is higher than the cumulative MTBF in an improving system. I have printed both CMTBF and IMTBF estimates on the charts. These estimates assume that the Duane law with the estimated parameters continues up to the present launch count.
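For anyone who wants to try this at home, here is a minimal sketch of the Duane calculation described above. It is not the spreadsheet behind the linked charts: the function names and the failure list are made up for illustration, the fit is plain least squares on all the points with no cutoff, and the instantaneous MTBF uses the textbook Duane relation (cumulative MTBF divided by one minus the slope).

```python
import math

def duane_fit(failure_launches):
    """Least-squares fit of log(cumulative MTBF) against log(launch count).

    failure_launches: ascending launch numbers at which a failure occurred.
    Returns (slope, intercept) of the straight line on the log-log plot.
    """
    xs, ys = [], []
    for i, t in enumerate(failure_launches, start=1):
        xs.append(math.log(t))
        # cumulative MTBF = launches so far / failures so far
        ys.append(math.log(t / i))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mtbf_estimates(slope, intercept, launches):
    """Cumulative and instantaneous MTBF implied by the fitted power law."""
    cumulative = math.exp(intercept) * launches ** slope
    instantaneous = cumulative / (1.0 - slope)
    return cumulative, instantaneous

# Hypothetical data, NOT a real launch record: failures on launches 1, 4, 9, 16, 25
made_up_failures = [1, 4, 9, 16, 25]
slope, intercept = duane_fit(made_up_failures)
cmtbf, imtbf = mtbf_estimates(slope, intercept, made_up_failures[-1])
print(f"slope={slope:.3f}  CMTBF={cmtbf:.2f}  IMTBF={imtbf:.2f}")
```

The made-up failures were chosen so the cumulative MTBF grows as the square root of the launch count, which makes the fitted slope easy to sanity-check. With real data you would also have to decide where to start the regression, as discussed above.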
Homework: Anyone care to put together a Duane plot for SpaceX? You can start with 3 losses in flights 1-3 if you want, or take Peter’s guesstimate of MTBF=4 after flight 2 for just the Falcon 9. Then add the slope = 0.3 and 0.6 lines to try to bracket their growth. I would advise against starting the failure rate at zero for the F9 first two flights, because this is a log-log plot!
Then try to compute how likely Ken is to win his bet….
Disclaimer: the flight numbers are small right now… so uncertainties are correspondingly large. Note that the Duane plot regression requires failures to feed the calculation. As it happens, most launch vehicles have not had a problem providing data. And as I said above, your mileage may vary. 😉
@Ken: I see a lot of analysis going on including predictive. The results so far seem to bear this out. There is never going to be enough analysis to satisfy everyone.
Right now their record stands at 3 losses out of 7 launches across their product line. That’s not enough “analysis” to satisfy me, at any rate. And as I tried to emphasize above, it’s not just analysis. Testing, analysis, simulation, and design all have to be involved, and all have to be informed by an understanding of launch vehicles — how they succeed, and how they fail.
But yes, they’ve lowered costs. And they’ve done a wonderful job meeting schedule. Maybe NASA could learn something from them….
Right now their record stands at 3 losses out of 7 launches across their product line.
I get very tired of the disingenuousness of this kind of statement. I expect it from morons like “abreakingwind” or Constellation partisans, but I expect better from you. It implies that the failures were random, and that the reliability of (say) the Falcon 9 is only 4/7ths, when the losses were early, on test flights, and the last four flights have been consecutive successes. Not to mention that Falcon 9 is a different “product” than Falcon 1, and it is two for two.
@Larry: Errors happen. We have to deal with them as best we can.
If your point is that the imperfectibility of software is an excuse not to exercise effective software control, then I demur.
You’re not talking about effective software control. You’re thinking that perfection in software is possible.
“Testing can establish the presence of errors, but cannot guarantee their absence.” [E. W. Dijkstra]
Those of us who have worked as professional programmers know this to be true. It’s a fundamental, unescapable reality of life.
@Larry: Using the “fly it and see what breaks” approach, Lockheed’s Skunk Works went from blank paper to flying the Blackbird (A-12 Oxcart version) in about 3 years.
Using the “let’s analyze it to death” approach, the F-22 took over 20 years to become operational and that’s with massive overruns and other associated problems.
I’m not sure you’re stating this correctly. The draft USAF requirements for the ATF were issued in 1981. There was a study phase leading to a downselect and a Dem/Val phase starting in 1986, so we were ‘cutting metal’ by the mid-1980s. The YF’s were flying in summer 1990. IOC was delayed as the Clinton administration dragged their feet in the post-Cold-War environment. It was only the advent of the Bush (43) administration that got us to IOC. I think it’s foolish to blame this on “analyzing it to death”. Given the extraordinary capabilities of the F-22, I think you’d have to say the design effort was also extraordinary — in fact the F-22 team was awarded the Collier Trophy in 2006. (And I’d have to say SpaceX is a worthy contender for the Trophy, too, despite all my grousing…)
But I don’t want to take anything away from Kelly Johnson (in fact, he and his team won the Collier Trophy in 1963 for the SR-71). It’s a legendary aircraft. Dangerous as hell (12 out of 32 lost in accidents), leaked JP-7 like a sieve, but nothing could touch it….
But note the A-12 and SR-71 were two different aircraft, the latter a significant mod of the former — longer, heavier, slower, but more capable. The A-12 was about five years in development (1957-62) and was cancelled before IOC, in December 1966, mainly due to cost overruns. The A-12 did fly some missions in 1967 and 1968, and participated in a fly-off against the SR-71 in 1967. The SR-71, meanwhile, entered service in 1964 (7 years, not 3, after development of the A-12 began). I don’t know how up-to-date the FAS website is, but it is still showing three operational SR-71s assigned to Edwards AFB.
@Larry: You’re not talking about effective software control. You’re thinking that perfection in software is possible.
Where did I say that?
What I did say is that there are effective modern practices of software specification, development, verification, and control that minimize unreliability. And if you don’t follow these practices, and you screw up and upload the wrong flight trajectory, it’s your fault.
Those of us who have worked as professional programmers know this to be true. It’s a fundamental, unescapable reality of life.
Is that why my copy of Firefox crashes every few days?
@Rand: I get very tired of the disingenuousness of this kind of statement. I expect it from morons like “abreakingwind” or Constellation partisans, but I expect better from you. It implies that the failures were random, and that the reliability of (say) the Falcon 9 is only 4/7ths, when the losses were early, on test flights, and the last four flights have been consecutive successes. Not to mention that Falcon 9 is a different “product” than Falcon 1, and it is two for two.
Everyone who has made it this far in the thread understands that the numbers are (0,0,0,1,1) for Falcon 1 and (1,1) for Falcon 9. We’re just discussing what that means. I look at the Falcon 1 failures and I see a pattern of not just design flaws, but flaws in the process of designing the vehicle. I would argue that it is you who is insisting that the failures have no systemic meaning — that they are just weird (“random”?) events. However, I won’t back away from using randomness as a model because that’s what reliability estimates mean. We don’t get to restart the clock on Shuttle failure rate just because we’ve fixed the O-ring problem. We don’t get to restart the mishap rate on F-16s with every block upgrade. It doesn’t work like that; that’s not what the numbers mean.
If there is a systemic problem with SpaceX’s design practice, it is perfectly legitimate to lump these two vehicles together. Indeed, you seem to want to do this, citing “four successes in a row” — even though in the next sentence you say Falcon 1 and Falcon 9 are two different beasts entirely. But if you just take the four successes and ignore the three failures, you’re cooking the data.
And you seem to want to read a lot into the (1,1) record of the Falcon 9. And I have pointed out that this is either disingenuous or thoughtless.
Suppose I have a coin that I know to be unfair. In fact, I suspect that it is much more likely to come up heads [launch success] than tails [launch failure]. If I flip the coin twice and get two consecutive heads, what is my 90% confidence interval on the probability of tails [failure] ? I’ll tell you: the 90% confidence interval is that pfail falls between 2.55% and 77.6%. The 50% confidence value is that pfail is 29.3%. I’m sorry you don’t like the numbers, but those are the facts. Two successes in a row just doesn’t mean that much. I mean, it’s better than three failures in a row, as Falcon 1 started, and it’s better than the single success of Ares I-X, but it absolutely does not “prove” the system is reliable. Like I said above, call me when Falcon 9 reaches 7 successes with no failures. And I will buy you a Diet Coke when they reach 34 successes with no failures.
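One way to arrive at numbers close to these (an assumption about the method, since the comment does not spell it out) is to ask which failure probability would make a run of n straight successes happen with a given probability. The helper name below is hypothetical. This sketch gives roughly 2.5%, 29.3% and 77.6% for two successes, so the lower bound comes out slightly under the 2.55% quoted.

```python
def p_fail_bound(successes, level):
    """Failure probability at which `successes` straight successes
    would occur with probability `level` (no failures observed)."""
    return 1.0 - level ** (1.0 / successes)

# Two successes in two tries
lower = p_fail_bound(2, 0.95)
median = p_fail_bound(2, 0.50)
upper = p_fail_bound(2, 0.05)
print(f"90% interval: {lower:.1%} to {upper:.1%}, median {median:.1%}")

# The same 50% value for the longer unbroken runs mentioned in the comment
median_7 = p_fail_bound(7, 0.50)
median_34 = p_fail_bound(34, 0.50)
print(f"7 straight: {median_7:.1%}   34 straight: {median_34:.1%}")
```

Under the same assumption, the 50% value drops to a bit under 10% after 7 straight successes and to about 2% after 34.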
My “four successes” was in the context of your lumping them all together. I do consider Falcon 9 a different vehicle. Is it your contention that two successes with no failures tells us no more about the reliability than would have two failures with no successes?
I would contend that we now know that the design has been validated, in the sense that there are no fundamental flaws with it that prevent it from flying as designed when manufactured and operated to specification. That is all that I’m reading into its record, and I think it’s a fair reading. I have assigned no number to its reliability.
Once you’ve got your rocket flying right, isn’t the biggest part of keeping it that way eliminating faulty design changes, changes in procedure, subsequent errors in assembly, and problems caused by changes in the environment?
Using the Shuttle disasters as examples, if it weren’t for the factors I mention, wouldn’t there have been no Shuttles lost?
I posted the above without seeing Rand’s 1:57 comment which says the same thing
Is it your contention that two successes with no failures tells us no more about the reliability than would have two failures with no successes?
No, why would you think that?
I told you what it means: it means the 90% confidence interval on the failure probability is (2.55%, 77.6%). If Falcon 9 had had one success and one failure, then the 90% confidence interval would have been (22.4%, 97.5%), which is considerably worse.
I would contend that we now know that the design has been validated, in the sense that there are no fundamental flaws with it that prevent it from flying as designed when manufactured and operated to specification.
Flying is better than not flying, no argument there.
It is certainly the case, however, that the functionality of the system when operating at specification limits has not been validated by these flight tests. At best we see two cases that are *probably* (in a quantifiable sense) within a standard deviation of nominal conditions. It remains to be seen how the system performs over time. As any vehicle accumulates launch experiences, the experience band widens in a predictable way. On a particular day in 1986, Challenger discovered a hard limit in launch temperature. I wish SpaceX luck. And I hope they fix the systemic problems with their design process.
Also, specific claims have been made about the reliability of the SpaceX launch vehicles. These claims are not verifiable in two test launches. Getting back to the trigger for this thread, Ric Locke made claims about the superiority of the Soyuz design practice. It seems entirely proper that those claims be explored rationally, in particular claims about the reliability of launch vehicles developed according to different “design cultures”.
“Is that why my copy of Firefox crashes every few days?”
The snarky answer to this is: No, that’s Adobe’s fault. 😀
SpaceX’s solution to “My vendors are crap” is to design and make the key components themselves.
their record stands at 3 losses out of 7 launches
Without further info is a major mischaracterization. When they replaced all the aluminum nuts with steel; that alone made all the following F1 and F9 essentially a new vehicle. When they changed the software to allow for the residual thrust at separation, again it is a new vehicle from that point forward. This will be true in the future as well with any changes they make.
Three losses followed by four success would be a better representation of the facts.
SpaceX itself is a different company. 100 employees in the early years versus 1200 and growing now. Some of those new employees bring aboard some of the institutional knowledge you demand. Growing a company like that is not something anybody can do.
The main thing SpaceX has done is made a liar out of all those that said it can’t be done. Every company has to mature. You might want to see if they’re hiring consultants.
Three losses where they learned things would be even better.
@Ken: again it is a new vehicle from that point forward.
By your reasoning, then, if SpaceX lost one vehicle, say, every ten flights, then as long as they made the appropriate engineering fixes afterwards they could continue to claim a pristine 100% launch reliability?
In your mind, does Shuttle have a 100% launch reliability? I’m just trying to understand your reasoning here.
I’m just trying to understand your reasoning here.
I’m having a Joe Wilson moment here…
My point is it’s not just about statistics. A point others have made as well and you are too intelligent not to get that.
Proof is in the pudding they say. I expect they will have failures in the future… so does Elon. But you seem to be looking at a few issues and ignoring a great deal more. You don’t know the expertise they are applying, yet are willing to say what they aren’t. You write about reasonable things and then apply them in an unreasonable way.
That’s not all bad but it is presumptuous.
@Ken: My point is it’s not just about statistics.
Well, as someone who sometimes gets paid to advise on statistical matters, I’m biased towards quantitative reasoning. But here are some points:
I’ve already said and will willingly repeat, every successful launch justifies popping the champagne corks. You are right that it’s not just about nudging the stats one way or another.
I’ll also repeat that two successes is better than two failures.
But I will also maintain that two successes is pretty weak evidence if you’re trying to assert that reliability is demonstrated.
OTOH, let me restate what I think Rand was trying to say: the two successes demonstrate capability quite clearly, even if reliability is still in question. And that is admirable, and I would applaud if SpaceX received the Collier Trophy this year.
My presumptuousness in this thread was prompted by Ric’s gleeful (and, I contend, ignorant) assertion that the Soyuz was “the most reliable launch vehicle in the world”. Some people have taken umbrage that I opted to challenge this assertion and its application to SpaceX, and to explore the meaning of SpaceX’s record, and whether it represents a compromise between reliability and cost.
And yes, I am inferring some problems in their design process based on scant data, because they have had very few launches and several highly public, avoidable failures. In turn the defense of SpaceX seems to amount to (1) no, their design process was not defective, (2) they’ve fixed it anyway, and (3) their two successes with Falcon 9 erase any doubts.
This is the aerospace equivalent of the “broken tool” defense, a famous legal paradigm. When defending a client accused of borrowing a tool and returning it broken, lawyers use a three-layered defense: (1) the tool was never borrowed, (2) the tool was broken when the defendant took possession of it, and (3) the tool was returned in perfect condition.
I’d say that the evidence is that SpaceX has a system that will get to space more cheaply than has been done in the past, and that at this stage there’s no reason to claim that they won’t be able to follow the same trend in improving reliability as their operating experience increases that others have previously achieved.
their two successes with Falcon 9 erase any doubts
Who is saying that? Your critique is reasonable. It’s just the assertion without sufficient evidence that not just I question. For example, you really don’t know why they chose not to include baffles originally. They were wrong and fixed it. You can’t assume what consideration they gave before testing.
Reliability depends on a number of things. Quality control during manufacturing is a major component. Assuming you get things right in the design you also have to keep them right. Here your statistical controls make sense along with the level of inspection they’ve exhibited.
Development is not so clear cut. People disagree. Testing is the only way to settle issues. I’ve worked for companies where problems persisted because they didn’t take the SpaceX approach. They will not only get results their way, it will be a safer vehicle for humans because they put more emphasis on real world testing than analysis which can hide a lot of faulty assumptions. Testing is just one part of assuring reliability, but it’s the one you can have the most confidence in.
Using aluminum nuts caused a failure. That failure mode no longer exists. It then becomes invalid to lump that failure statistically with a rocket that no longer has that failure mode. That’s why development is considered apart from operations. Sometimes they overlap. They will be making further changes and operating as well. These are risks that everybody takes. Hopefully it’s done with eyes fully open. Frankly, I’d trust SpaceX much more than many others. With 20/20 hindsight you say their mistakes were avoidable. I’d say they learned and matured and will continue to mature.
I think you have a valid point about avoiding mistakes, but I think SpaceX is very much on the right track.
@Ken: Who is saying that?
I think several people in this thread have come close to that. But what I was referring to was a press conference Elon Musk gave after one of the Falcon 9 flights (not sure which one). The reporter on hand invited Musk to say whether their launch success would silence critics. So, okay, there’s an air gap between “silence critics” and “erase doubt”. And to Elon’s credit, he didn’t take the bait. But it doesn’t seem to deter people who should know better from crediting SpaceX’s sparse and troubled flight experience with more than is warranted.
And I should reiterate that SpaceX fans like to say “four successes in a row”, but that ignores SpaceX’s failures to recover boosters. Yes, I understand they say this is not a flight objective, but to my eye this still falls in the pattern of not learning from previous launch vehicle experience. And that’s a difference between your point of view and mine: you see all these failures and shortcomings as unrelated and in that sense “fixable” and ephemeral, while I see them as part of a larger pattern that I don’t see changing.
That failure mode no longer exists. It then becomes invalid to lump that failure statistically with a rocket that no longer has that failure mode.
Well, no, that is not the way it works. I asked rhetorically for you to clarify but now I will just say that this is not the meaning of reliability. You don’t get to restart the clock just because you’ve made an engineering change. Do you really think Shuttle is 100% reliable because the O-ring and foam issues have been addressed? Perhaps you are just one of those people who doesn’t believe in “reliability”, just on general principles. That’s a lifestyle, too, I guess.
Perhaps the most important reason we don’t reset the clock with every engineering change is that the temptation to cook the books is so strong. It is very difficult to come up with objective criteria for trimming data out of a reliability analysis. Of course, partisans can invent any number of subjective reasons for trimming failures to make the statistics look better, but all that gives you is meaningless garbage.
Another reason we don’t restart the clock and throw away early failures is that it makes it difficult to quantify the reliability growth of a system. More on this below.
Several folks have tried to make this argument, that the SpaceX failures don’t mean anything in particular and don’t have anything to do with the system reliability. Just as life is what happens while you’re waiting for something better to come along, unreliability is what happens while you’re perfecting your system. Soyuz, the system that triggered my part of this conversation, is up in the 1000s in terms of launch numbers, and is still undergoing engineering changes (I think), and is still seeing about a 2-3% loss rate. Now, the loss rate seems to have improved over the years, but it doesn’t seem to be headed under 1% anytime soon. OTOH, there are some empirical laws that suggest that with sustained engineering effort invested in a reliability growth program, the reliability will continue to grow. Try googling “Duane plot”. It’s too soon to make a Duane plot for SpaceX, but now that I’m curious, I will compute one for Soyuz.
In fact, if you go to the NIST site which is the first to come up when you google “Duane plot”, you will see an example of tracking improvements in reliability with a Duane plot. NOTE in particular that we do not reset the clock with every engineering change. And note how the reliability improves with time.
bbbeard, rereading through this thread I’m left wondering just what your point is outside of that some commentators in the media make claims and put questions in ways to try to create media stories.
You’ve utterly littered your comments with strawmen. Reread your first paragraph in the above comment and tell me what the point of it was.
Everyone here knows that a high failure rate over the first few launches is nothing unusual.
You’re trying to argue that SpaceX got too much wrong in the first few flights, but hell, as an organisation that’s new to the launch business most people would’ve been surprised if they hadn’t had failures.
At this point I’m wondering, if they had achieved a 100% success rate, you’d be telling us all that this bode poorly for their future as all rockets will sometimes fail, and by not having the opportunity to learn from failures they’d get too cocky and this arrogance would cost lives in the future.
I think you find fault because you want to find fault.
You don’t get to restart the clock just because you’ve made an engineering change.
What an absurd statement. You’re saying there is no difference in reliability between any of the vehicles. That ignores reality. While you can put a number on all seven attempts, it’s a meaningless number if you ignore what they’ve learned and changed.
That’s like saying on average I weigh a hundred pounds because at one time I was a five pound baby. Totally meaningless.
@Andrew: You’ve utterly littered your comments with strawmen. Reread your first paragraph in the above comment and tell me what the point of it was.
Ken asked “Who is saying that?” in response to my characterization “(3) their two successes with Falcon 9 erase any doubts” and I was answering the “who” part.
My substantive criticism on this point (for which (3) was a proxy) is that many people in this thread have pointed to the two Falcon 9 successes as though that were a counterargument to concerns about the overall reliability of SpaceX vehicles. If you don’t understand that two successes in two tries is very weak evidence, after all the different explanations I have presented, including the computed confidence intervals, I suspect you won’t respond to rephrasings of that argument. What I would suggest is that you return to my posts where I discuss the confidence intervals for the (0,2) sampling plan and see if you can understand the point.
You’re trying to argue that SpaceX got too much wrong in the first few flights, but hell, as an organisation that’s new to the launch business most people would’ve been surprised if they hadn’t had failures.
You’re missing the point. The point is not that they’ve gotten “too much wrong”. It’s not even that three failures in a row bodes ill for the future. It’s what they’ve gotten wrong. They’ve gotten things wrong for which common risk mitigations are well known across the industry. That’s what bodes ill for their future.
if they had achieved a 100% success rate, you’d be telling us all that this bode poorly for their future as all rockets will sometimes fail, and by not having the opportunity to learn from failures they’d get too cocky and this arrogance would cost lives in the future.
Okay, now you’re creating straw men (“… what bounces off me sticks to you” as the childhood playground taunt goes). But the one grain of truth in that straw man is that I would be more impressed if they had found new ways to fail launches. Then we could all learn from their failures. Instead we sit and watch the launch from the sidelines and afterwards say, “gosh, didn’t we lose rockets in ’68 and ’72 because of X and then start doing Y? Why didn’t they do Y?”
I think you find fault because you want to find fault.
There is a fine line between a troubleshooter and a troublemaker, isn’t there? 😉
@ken: What an absurd statement. You’re saying the[re] is no difference in reliability between any of the vehicles. That ignores reality. While you can put a number on the all seven attempts, it’s a meaningless number if you ignore what they’ve learned and changed.
No, no, no — I’m not saying there is no difference in reliability between the vehicles. The estimated reliability is growing with every successful launch. That’s what the downward slope of the SpaceX Learning Curve means. The problem is that you don’t like the numbers. I am saying you don’t get to reset the clock after every test-fix cycle and continue to claim a 100% success rate because that sounds better. To me this is a really obvious point. It’s simply not how engineers do reliability growth management (RGM). If you don’t believe me, consult the DoD RAM Guide. Look at Appendix C, in particular the section of the guide dealing with test-fix-test RGM plans. I should also point out we usually use a Duane plot to quantify reliability growth, but until SpaceX loses another vehicle, their Duane plot is singular.
I don’t know what your industry experience is, or what your particular expertise is, but your attitude exemplifies the know-nothingness that I’m concerned about at SpaceX. You mean well, you formulate intelligent arguments, you sound well-educated, but you just can’t seem to accept that thousands of really smart guys — heck, call ’em rocket scientists — have been working launch vehicles for fifty years and have learned a lot about how to do things. One of those things is reliability modeling. I’m telling you how things are done, what they mean, and you respond that you would like to reinvent the wheel because it makes more sense to you to do it your way. I imagine (yes, I have to imagine, because SpaceX has not been responsive to inquiry) that similar folks sat around a conference table at SpaceX discussing why you put baffles in propellant tanks — and succumbed to arguments that software would be much lighter than baffles. And the problem is not so much that they made a bad decision, it’s that their corporate culture was defective. Is it better now? You can say that, but I don’t see much evidence that it is.
The problem is that you don’t like the numbers.
I’m fine with the numbers. I just think reliability is only meaningful in operations. During development you’ve got too many variables to have a meaningful number. That final number should be predictive, but it really is not because you have a combination of too many variables and not enough cases.
Do you really expect they will lose 42% of all future launches? I’m not ignoring the downward curve. I’m saying 42% is either meaningful or it’s not. Once you get to operations you have enough randomness that statistical methods become meaningful. Otherwise, it’s just playing with numbers in a non random environment.
I’d love to play a roulette wheel that had manufacturing flaws that made it not uniformly random.
Too many variables isn’t really the proper way to say it. Actually, more random variables help statistically. I suppose it’s a question of, is the sample representative? The main problem is we have too few cases to say.
@Ken: I just think reliability is only meaningful in operations.
Sounds like a novel theory. Do you have a cite for that?
If what you’re saying reflects the design philosophy of SpaceX, it’s no wonder they have reliability problems. 😉
Do you really expect they will lose 42% of all future launches? I’m not ignoring the downward curve. I’m saying 42% is either meaningful or it’s not.
No. I don’t think you understand what the 43% means.
Referring back to the discussion of reliability growth management in the DoD RAM Guide, in a test-fix-test cycle, you expect the reliability to continue to grow. So I don’t expect the failure rate to remain at 43%. That’s just where it is right now; 43% is the “maximum likelihood estimate” of the failure probability, right now. In ten launches it will be better (unless they stop improving their vehicles), so no one I know would predict that the failure rate will remain 43% forever. In fact it’s probably better than the maximum likelihood estimate right now, it’s just that we don’t have a robust way to quantify that. (But that doesn’t make the 43% meaningless, it just means it’s a biased estimator). Once we have enough data, we can set aside the maximum likelihood estimator and make a better estimate based on the power-law improvement of reliability.
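For readers wondering where the 43% comes from, here is a small sketch assuming each launch is an independent trial with the same failure probability (exactly the assumption being argued over in this thread). The function names are illustrative only. It computes the closed-form estimate for 3 losses in 7 launches and cross-checks it with a brute-force search over the likelihood.

```python
def failure_mle(failures, launches):
    """Closed-form maximum likelihood estimate for a constant failure probability."""
    return failures / launches

def likelihood(p_fail, failures, launches):
    """Likelihood of the observed record, up to a constant factor."""
    successes = launches - failures
    return p_fail ** failures * (1.0 - p_fail) ** successes

# 3 losses in 7 launches, as quoted in the thread
mle = failure_mle(3, 7)

# Cross-check: scan a grid of candidate probabilities and keep the most likely one
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda p: likelihood(p, 3, 7))

print(f"closed form: {mle:.1%}   grid search: {best:.1%}")
```

Both land on about 43%. The estimate only says which constant failure probability best fits the record so far; it says nothing about how that probability changes as fixes go in.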
The main problem is we have too few cases to say.
I think I’ve said that.
“Sounds like a novel theory. Do you have a cite for that?”
One method they use to develop engines is to run to destruction.
I think there’s a good argument that failure rates should exclude failures during development, i.e. prior to commissioning.
In ten launches it will be better (unless they stop improving their vehicles)
This is exactly my argument. Assume they do stop improving their vehicle. I’m asserting the vehicle today, if launched ten times, would not fail 43% of the time. So rather than just being a point on a declining scale, it is in fact not meaningful because it does not actually describe the “maximum likelihood estimate” but is in fact higher. With no improvements or changes at all, the true failure rate may actually be, say, 20% or less. I would agree that the numbers you produce would tend toward an accurate number, but that isn’t saying they are truly meaningful as they are.
Most people weigh less than a ton. They also weigh less than a megaton. Are those very meaningful statements?
@Ken: So rather than just being a point on a declining scale, it is in fact not meaningful because it does not actually describe the “maximum likelihood estimate” but is in fact higher.
Perhaps you should brush up on your statistics before you post more. “Maximum likelihood estimate” has a specific technical meaning of which you appear to be unaware. See http://en.wikipedia.org/wiki/Maximum_likelihood, in particular the section “Discrete distribution, continuous parameter space”.
I am aware that M.L.E. is a technical term with a specific meaning. The problem with statistics is it allows some to avoid common sense. You said…
In ten launches it will be better (unless they stop improving their vehicles)
I offered you some common sense regarding unless. Would you like to address that rather than looking down your nose?
@Ken: I am aware that M.L.E. is a technical term with a specific meaning.
you said/I said:
…it is in fact not meaningful because it does not actually describe the “maximum likelihood estimate” but is in fact higher.
to which I replied to the effect that you don’t seem to understand the meaning of the term. Sorry if that sounded nasal-superior to you but this is not a democratic issue. If you want to rephrase your comment I’d be more than happy to let go of it.
Now, having said that, the MLE is definitely not the only estimator. If you want to offer another estimator besides your calibrated gut feeling about SpaceX I’d be glad to discuss it but right now I see nothing quantitative forthcoming.
The problem with statistics is it allows some to avoid common sense.
The problem with lack of statistics is that it enables all sorts of nonsense.
I offered you some common sense regarding unless. Would you like to address that rather than looking down your nose?
Well, my common sense regarding your proposal (about SpaceX doing the next ten flights without any further development) is untestable. I don’t see what there is to comment on.
You’re right that I wasn’t speaking precisely. But the thing about precision is that it doesn’t mean accuracy. Any type of math tends to be precise because that’s the nature of numbers. But a wrong assumption leads to inaccuracy no matter how many decimal places you calculate to.
Is SpaceX success 4/7 or 2/2? Which better represents the future?
You are right that we need to wait and find out. Place yer bets…
Okay, here’s a completely gut feel bet: I bet that SpaceX will see at least one more failure in the next ten flights (of Falcon 1/9/27, or whatever), even with improvements being made continuously. Stakes? The usual: one 16 oz bottle of Coca-Cola, winner’s choice as to subspecies (I like Cherry Zero). Only for Ken, everybody! Because he has stuck it out this far…. Willing to take this bet?
Dammit, serves me right I guess for only being third most dogged kid on the block.
Elevens a winner for one soda? Yer on.
Failure defined as not getting payload to orbit. Incidentals don’t count.
Failure defined as not getting payload to orbit. Incidentals don’t count.
Agreed. I’m assuming Rand will be around to referee and put us in contact.
I’m inclined to rate Falcon 9 at 3/4 reliability at this time. Falcon 9 has had 2 successes in 2 launches. Falcon 1 had early failures, the causes of which have been fixed, making them unrepresentative of current vehicles, and then 2 successes. Taking the commonality into account, such that the Falcon 1 successes before Falcon 9 apply somewhat to the latter, I count 3 successes for Falcon 9. Then, since we have such a small sample, I throw in a failure for the unknown. I expect this rating to improve with more flights.
In any rocket I’m inclined to disregard the first 20% of launches for assessing reliability once the sample set gets large enough. I presume the early failures are corrected for, and if not will be reflected in later launches.
@peterh et al.: In any rocket I’m inclined to disregard the first 20% of launches for assessing reliability once the sample set gets large enough.
I find that when making Duane plots for reliability growth management estimates something like this rule of thumb is needed. Otherwise the trend becomes much less robust and the uncertainty of the estimated RGM parameters is unnecessarily large. Whether this has any bearing on the likelihood of seeing massive jumps in MTBF after the first flights, I can’t say. Your mileage may vary.
As promised, I have created a Duane plot for the Soyuz Voskhod family. Here is the link. In a Duane plot, we show the cumulative MTBF versus time (i.e. launches) on a log-log plot. Duane’s hypothesis is that a system undergoing systematic RGM efforts (e.g. a test-analyze-fix-test cycle) should show a linear trend on a log-log plot, i.e. the MTBF is given by a power law. Only after the tenth failure (on 03 Dec 1971) does the Soyuz MTBF show a systematic improvement. I wonder if that’s when the Soviets got serious about improving the Soyuz.
A key parameter derived from the Duane plot is the log-log slope, i.e. the exponent in the power law. A high slope means you’re doing very well, a low slope, not so much. Here is the advice offered by NIST:
“The reliability improvement slope for virtually all reliability improvement tests will be between .3 and .6. The lower end (.3) describes a minimally effective test – perhaps the cross-functional team is inexperienced or the system has many failure mechanisms that are not well understood. The higher end (.6) approaches the empirical state of the art for reliability improvement activities.”
In accordance with this rule of thumb, I put lines for slope=0.3 and slope=0.6 on the plot. (“SOTA” = state of the art, for those unfamiliar with the jargon.)
By this standard, it would appear that the Soviet team was at the low end of the empirical range, with a slope of 0.341+/-0.012, based on starting the RGM at the 10th flight.
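For readers who want to try this themselves, here is a rough Python sketch of the kind of fit described above: cumulative MTBF at each failure, then an ordinary least-squares line through the log-log points, optionally skipping the early failures. The function names are hypothetical, and the failure list is made up so that the answer is known in advance; it is not Soyuz or Delta data, and it is not necessarily how the plots linked above were produced.

```python
import math

def cumulative_mtbf(failure_flights):
    """Return (launch count, cumulative MTBF) evaluated at each failure.

    failure_flights holds the launch numbers on which failures occurred.
    Cumulative MTBF is launches so far divided by failures so far.
    """
    ordered = sorted(failure_flights)
    return [(t, t / (i + 1)) for i, t in enumerate(ordered)]

def duane_fit(points, start_failure=1):
    """Least-squares fit of log(CMTBF) against log(launches).

    start_failure=10 would begin the regression at the 10th failure.
    Returns (slope, intercept) of the log-log line.
    """
    pts = points[start_failure - 1:]
    xs = [math.log(t) for t, _ in pts]
    ys = [math.log(m) for _, m in pts]
    n = len(pts)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up failures on flights 1, 4, 9, 16, 25, so CMTBF = sqrt(launches).
points = cumulative_mtbf([1, 4, 9, 16, 25])
slope, intercept = duane_fit(points)
print(f"Duane slope: {slope:.3f}")  # 0.500 for this synthetic data
```

A real fit would also want the standard error of the slope, which is what the +/- figures quoted above represent; that is left out here to keep the sketch short.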
For comparison, here is the Duane plot for the Delta. In this case the cutoff is not so clear, in part because there have been fewer launches (which is why the data appears somewhat coarser). I chose to start the regression at the second failure. But the power-law trend seems fairly clear, with slope 0.314+/-0.037 (overlapping the Soyuz confidence interval). Eyeballing it, it looks like you would get similar results starting at the third or fourth failure.
Note that, as Ken has pointed out, the instantaneous value of the MTBF is higher than the cumulative MTBF in an improving system. I have printed both CMTBF and IMTBF estimates on the charts. These estimates assume that the Duane law with the estimated parameters continues up to the present launch count.
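The relation between the two estimates follows directly from the Duane law. If cumulative MTBF is k·t^a, then the expected number of failures by launch t is t divided by that, and differentiating with respect to t gives an instantaneous failure rate of (1 − a) divided by the cumulative MTBF. So the instantaneous MTBF is the cumulative MTBF divided by (1 − a). A minimal Python sketch, with a hypothetical function name and a made-up cumulative MTBF of 20 launches purely for illustration:

```python
def instantaneous_mtbf(cumulative_mtbf, slope):
    """Instantaneous MTBF implied by the Duane power law.

    Assumes cumulative MTBF = k * t**slope with 0 <= slope < 1,
    which gives instantaneous MTBF = cumulative MTBF / (1 - slope).
    """
    if not 0.0 <= slope < 1.0:
        raise ValueError("Duane slope must satisfy 0 <= slope < 1")
    return cumulative_mtbf / (1.0 - slope)

# Made-up cumulative MTBF of 20 launches, with the slope fitted above for Soyuz.
example = instantaneous_mtbf(20.0, 0.341)
print(f"Instantaneous MTBF: {example:.1f} launches")
```

With a slope near 0.3 the instantaneous figure is about 1.4 to 1.5 times the cumulative one; at 0.6 it is 2.5 times.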
Homework: Anyone care to put together a Duane plot for SpaceX? You can start with 3 losses in flights 1-3 if you want, or take Peter’s guesstimate of MTBF=4 after flight 2 for just the Falcon 9. Then add the slope = 0.3 and 0.6 lines to try to bracket their growth. I would advise against starting the failure rate at zero for the F9 first two flights, because this is a log-log plot!
Then try to compute how likely Ken is to win his bet….
Disclaimer: the flight numbers are small right now… so uncertainties are correspondingly large. Note that the Duane plot regression requires failures to feed the calculation. As it happens, most launch vehicles have not had a problem providing data. And as I said above, your mileage may vary. 😉
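As a starting point for the last part of the homework, here is a small Python sketch of the bet itself. Ken wins only if the next ten flights all succeed, so under the assumption that flights are independent, his chance is the product of the per-flight success probabilities. The function name is hypothetical, and the third failure probability in the list is invented purely for comparison; the first two are the 3/7 maximum likelihood estimate and Peter’s 3/4 reliability guesstimate from the thread. Holding the failure probability constant ignores reliability growth, so a per-flight list taken from a Duane fit could be passed in instead.

```python
def p_no_failures(per_flight_failure_probs):
    """Probability that every flight succeeds, assuming independent flights."""
    prob = 1.0
    for p in per_flight_failure_probs:
        prob *= (1.0 - p)
    return prob

scenarios = [
    ("MLE from 3 losses in 7", 3 / 7),
    ("Peter's guesstimate, 1 in 4", 1 / 4),
    ("invented optimistic case, 1 in 20", 1 / 20),
]

for label, p_fail in scenarios:
    chance = p_no_failures([p_fail] * 10)
    print(f"{label}: Ken wins with probability {chance:.4f}")
```

Under the first two assumptions Ken’s chances come out at about 0.4% and about 5.6%, so his side of the bet only looks attractive if the current per-flight failure probability is well below either estimate.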