How Many NASA Engineers…

…does it take to screw in a bolt? From comments here:

Take a generic piece of Criticality 2 hardware:

1) First there needs to be a released, CM-controlled drawing signed off by (among others) a stress analyst who does calculations to ensure that the bolt is not being over- or under-torqued. The drawing must be referenced later by the technician to verify the proper torque range of the bolt.

2) Then a project engineer needs to write a Task Performance Sheet (TPS) that is no fewer than 4 pages long and that documents, in excruciating detail, which bolt to tighten, what tools to use, and the exact locations of every piece of hardware involved throughout the entire process. The part numbers, serial numbers, and lot numbers of every part involved are recorded on the TPS. (The work instruction document defining the TPS process is 55 pages long.)

3) The TPS needs to be signed by the Project Engineer, his/her manager, and two Quality Engineers (who designate “Mandatory Inspection Points” (MIPs) where a Quality Assurance Specialist needs to monitor the process); additional signatures (e.g. stress or materials experts) may be needed depending on the job. Then a Quality Assurance Specialist looks over the paper, approves it, and sends it to the Quality Assurance Records Center (QARC) where it is scanned, copied, and then placed in a basket to be worked.

4) Oh, we need the bolt, too. The bolt has to meet certain quality and reliability specifications, so it is purchased from an approved vendor and is most likely a MIL-spec part. When the vendor ships the part, it must be traceable by lot or serial number and accompanied by a Certificate of Conformance (CoC). The Receiving department will open the package, inspect the parts and make sure the CoC is present. Then some percentage of bolts from that lot of bolts will go to the Receiving Inspection and Test Facility (RITF) and be tested to ensure that the bolts actually meet the MIL Specs (in spite of the CoC being present). Then the RITF report is attached to the lot of bolts, with the CoC, and they all go to bonded storage.

5) The Project Engineer takes the TPS to the bond room, and someone pulls the bolt off the shelf, then a QAS makes sure that the proper part was pulled and that the CoC and RITF report are indeed attached. The parts are labeled and bagged and the Project Engineer is called to pick up the paper and part.

6) These get walked to the work area, then the Project Engineer rounds up two QASs and a union technician who has received special training on how to tighten bolts (no joke). The technician gathers the calibrated tools.

7) The technician tightens the bolt and records the tightening torque on the TPS. The QAS and NT QAS stamp the TPS to verify that they witnessed the bolt being torqued. (While the bolt is actually being tightened, 3-4 people are present watching.)

8) The Project Engineer and one of the QASs will take the hardware back to the bond room or wherever it needs to go. If the hardware is going back to the bond room, it has to be cleaned and sealed in a bag first.

9) The Project Engineer takes the TPS back to the quality office, where one or two QASs will go through the document and make sure that all of the required information was recorded and each step in the process was stamped or signed by all of the required people. Then the QAS will stamp the TPS “closed” and send it back to the QARC office, who will scan and copy it again.

I’m not going to debate the wisdom of any of these steps; any one of them is defensible in some instance. But I count around a dozen people immediately involved in the process and in general I’d say it takes a couple of days, assuming none of the required people find something they view as amiss. I’ll also point out that this is the process, as I understand it, as of today; every few weeks someone will get a wild hair up their ass and add another requirement.

In all fairness, though: I’m fairly confident that most of these people don’t make $100k a year. If they do, I need to have a talk with 4 of my managers about my salary 🙂

Every one of those procedures evolved as a response to some kind of mishappening, and they’ve accreted over decades, but if you want to know why NASA programs cost so much and take so long, there you go. And despite all of that, they destroyed two orbiters that cost a couple billion each to build, and shut down the program for years. So even when failure isn’t an option, failures occur. What is needed is an attitude that failures must be allowed for the program to succeed. The other related attitude that’s required is that what we’re doing is important, which allows the taking of risk.

[Update a couple minutes later]

I would note that one of the reasons that SpaceX can avoid a lot of this quality acceptance stuff is that they manufacture so much in house, and are vertically integrated, as a result of the fact that they couldn’t find contractors who were responsive to their needs in terms of price and schedule. The traditional NASA/AF way has bred a culture among the lower subcontractor tiers that isn’t useful for those trying to lower costs. We need to replace the existing infrastructure with more nimble players. The growing new space industry will help make that happen, but it won’t happen overnight.

31 thoughts on “How Many NASA Engineers…”

  1. There’s an old aviation saying that “when the paperwork weighs as much as the plane, the plane is ready to fly.” I guess you can multiply that by at least a factor of 10 for a NASA spacecraft.

    It’s quite likely that most if not all of those procedures were “written in blood” as the saying about aviation regulations goes. Part of the problem is that when a plane crashes, you can usually get to the wreckage to determine the cause. It’s a lot harder with rockets and spacecraft to determine the cause of the failure. The result is mountains of paperwork.

    Remember the stories from the 1980s about $800 toilet seats, coffee makers and the like? They cost that much for a reason – overspecification combined with limited production quantities results in high prices. As a private airplane owner, I can affirm that the only thing cheap on an airplane is the air in the tires. Any certified part for a plane (e.g. an oil filter) will cost several times as much as the identical part made for cars simply due to the cost of getting approval to use it on an airplane.
    By keeping all of their production in-house, SpaceX is able to eliminate a lot of the costs for their rockets while maintaining complete control of the production process. Any company that depends on a lot of outside suppliers will have a very hard time competing against a company like SpaceX.

  2. I find it rather amusing that your blogging software replaced the digit eight followed by a close parenthesis with a ‘smiley face.’ I wasn’t sure if it was the page or my browser, so I checked in IE as well.

  3. Something is doomed with this kind of overhead. I’m reminded of the comparison of NASA to the military, where the military has the advantage of an occasional ‘cleansing’ war to clear out the chaff, bureaucrats and the like who thrive on process rather than results. I’m also of the opinion that the idea of ‘spreading responsibility’ with multiple signoffs and the like only helps up to a point (a very low number of signatures). After that, people will assume that someone else would have caught any problems.

    Also on the emoticon: Not just any smiley face, a smiley face with sunglasses!

  4. The whole time I was reading those requirements, I was seeing scenes from “CSI”, “NCIS”, etc. in my head. Seems to me that there’s an even more stringent “chain of custody” in parts that go into a space vehicle than in evidence for capital criminal cases, even on TV shows…

  5. That’s an impressive list of nonsense. Clearly, that nonsense is meant to lead to “install the correct part, correctly”. That could be done just as accurately with far simpler procedures.

    The relevant question is, what are the institutional incentives to producing simple procedures that are also just as safe and effective? Evidence suggests that NASA’s internal incentives are for maximum CYA, with little regard for simplicity, cost, time etc. Smaller commercial companies (those with a significant chance of failure and/or large growth) have internal incentives to reduce costs, increase simplicity/efficiency, and reduce time to delivery. The employees understand that if these incentives are accomplished they will a) continue to have a job and b) enjoy financial success.

    It is worth noting that it is only the failure of some small commercial companies that produces, by elimination, efficient, successful ones.

  6. a union technician who has received special training on how to tighten bolts (no joke). The technician gathers the calibrated tools.

    No joke, indeed. But, for that tiny proportion of installations where proper torque is vital, that actually sounds right.

    Without a calibrated torque wrench (or equivalent tools) you can be off by quite a bit; and the “average mechanical guy” is liable to think he can wing it, or “give it a little extra” or the like.

    The “special training” should be mostly “no, seriously, do it *exactly* with the reading on the wrench, in one go – and if you screw up, back it out and try again from scratch” – but it’s actually probably necessary.

    (But again, only for that tiny, tiny percentage of bolts where exact torque matters.)

  7. I’ve seen the “cool sunglasses smiley” on lots of different blogs. It’s what you get when you type an 8 followed by a ). 8)

  8. Agree with Sigivald about torque, and sadly we had a problem with this in the past year. A torque requirement was given, but depending on which qualified technician you talked to, they may have measured running torque, breakaway torque, or both. It turned out the value was intended to be breakaway torque, yet that was found to be inadequate. Better results were found when the technicians used, apparently incorrectly, running torque.

  9. It seems to me that if the torque is that critical, you have an unrobust design.

    If it’s for ‘holding something in’, like retaining a solar panel or whatever that can’t take maximum torque, one can switch to retaining clips, then the clips can be ‘maximum hand tight’ without crushing the precious piece.

    Or if the reason is because you’re bolting in a material that’s too thin to retain a full set of threads, there’s a device called a ‘nut’.

    And if the reason the clips and nuts were deleted was because some joker thought that was what ‘reducing part counts’ was all about, please restrain me before you tell me.

  10. The keys are here: “Every one of those procedures evolved as a response to some kind of mishappening, and they’ve accreted over decades…” Indeed. Getting complex payloads to space and then doing complex missions with them there began 50+ years ago as an *extremely* low-margin enterprise, because the technology started out just barely good enough to do space missions at all, never mind do them with reasonable margin against minor goofs and failures.

    NASA’s current culture originated in what it took to do Apollo successfully despite razor-thin margins with a largely new organization with highly skilled and motivated staff. The culture has evolved since to deal with its transition to an entrenched bureaucracy with staff of wildly varying skill and motivation levels largely via this accretion of new layers of management and procedures.

    This culture has mostly NOT evolved to use fifty years of accumulated technology improvements to increase operating margins and move operations more toward the aircraft-like – still rigorous, but not insanely so; modern aircraft fly safely with affordable support procedures and staffs. NASA still tends to push new space exploration systems to the razor’s edge of performance margin, then addresses reliability issues by throwing armies of people and mountains of procedure at potential failure causes. The short explanation is that it’s an organizational cultural thing; the organizational reasons for this failure to evolve would make a mid-sized book…

    The fascinating question in this context is, how much of SpaceX’s remarkable success to date is a result of using the available technology to design systems with more robust margins, and how much is a result of a very motivated and skilled staff that so far gets the little things right even when nobody’s documenting every step? Both factors are visibly at work at SpaceX, of course, but the mix ratio is of interest. Robust margins are sustainable for the long term, while startup-company levels of dedication at some point cool down.

    Put another way, how well and how affordably will SpaceX make the transition to assuring the same level of evident quality for the dozenth Falcon 9 launch? For the hundredth? This is the real question, the one that will determine whether SpaceX will still be something special five to ten years from now, or whether they’ll be just another large aerospace contractor. Not that the latter is anything to sneer at for stockholders, mind – but the former has a far larger upside if they can achieve it.

  11. It seems NASA is OK with parodying itself by not only having TPS reports, but retaining the TPS acronym in our post-Office Space era.

  12. This is a perfect example of how a large organisation will, unchallenged, ossify completely.

    Read “The Rules of The Game” by Andrew Gordon…. How the Royal Navy went from the dynamic cut-throats of Nelson to an entity where an Admiral said “I am an Admiral – they do not pay me to think”. A large part of this was an attempt to prescribe everything in a rule book…. to prevent anything bad happening.

    I worked in a software development environment where this kind of check boxing went on. The main effect was that all inspection reports were faked by management. Quality over-control became no quality control at all.

    Another thought – how many times has safety been compromised by this kind of system? For Apollo 13, they didn’t pull the faulty O2 tank because the procedures for replacing it were so complex that it would have caused a massive delay. So they flew with a defective tank….

  13. Anybody know how the Russians go about tightening the bolt? (This is a serious question, not facetious.)

  14. And if the reason the clips and nuts were deleted was because some joker thought that was what ‘reducing part counts’ was all about, please restrain me before you tell me.

    Keep in mind that they often spend $100,000 or more per kg (for example, the cost of the ISS minus Shuttle costs was in this neighborhood). Merely removing a single 5 gram nut could be valued about $500 in this light.
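The arithmetic in that comment can be checked with a minimal sketch. Only the $100,000 per kg and 5 gram figures come from the thread; the function name and the default argument are illustrative assumptions.

```python
def mass_value_dollars(mass_grams: float, cost_per_kg: float = 100_000.0) -> float:
    """Dollar value attributed to a part's mass at a given cost per kilogram.

    The default of $100,000/kg is the rough figure quoted in the comment,
    not an official number.
    """
    # Multiply first, then convert grams to kilograms.
    return mass_grams * cost_per_kg / 1000.0


if __name__ == "__main__":
    nut_value = mass_value_dollars(5)  # the 5 gram nut from the comment
    print(f"5 g nut at $100,000/kg: ${nut_value:,.0f}")
```

At that rate every gram is worth about $100, which is where the $500 figure for the nut comes from.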

  15. Merely removing a single 5 gram nut could be valued about $500 in this light.

    Now, evaluate that cost against the cost of following those insane procedures. Figure out how many hours of labor were involved at each step and the loaded cost per hour. I think it would easily exceed $500.

    Back in 1992, I was at a meeting at Headquarters Air Force Space Command. There were over 15 of us there including 3 full colonels and a high ranking civil service type. Those idiot colonels argued for over 20 minutes about a $500 expenditure. Finally, the civil service type offered to pay the $500 out of his pocket to shut the colonels up. They’d wasted far more than $500 of our time for nothing.

  16. To answer Jim Bennett’s question:
    The Russians have the same person doing the job for the last 40+ years. I heard that there was one woman that would sew all of the thermal blankets onto the vehicles manually. Who needs a procedure when you’ve been doing the same thing for decades?

    But, the corollary is guys at Michoud who had to spit into the batch to extend the cure time because NASA changed the formulation to suit EPA standards.

  17. I withdraw my earlier comment about the Michoud workers spitting into the mix; I went back to look it up and couldn’t find it.

  18. Now, evaluate that cost against the cost of following those insane procedures. Figure out how many hours of labor were involved at each step and the loaded cost per hour. I think it would easily exceed $500.

    The insane procedures are probably followed whether or not the nut is there. For some reason, I doubt NASA is just going to let a technician hand tighten a nut on a billion dollar system. Even if someone bought a five cent nut from a hardware store, they’d probably have to provide torquing instructions and most of the rest of the procedure would be followed anyway.

  19. Whatever happened to the KISS method?

    I would note that one of the reasons that SpaceX can avoid a lot of this quality acceptance stuff is that they manufacture so much in house, and are vertically integrated, as a result of the fact that they couldn’t find contractors who were responsive to their needs in terms of price and schedule. The traditional NASA/AF way has bred a culture among the lower subcontractor tiers that isn’t useful for those trying to lower costs. We need to replace the existing infrastructure with more nimble players. The growing new space industry will help make that happen, but it won’t happen overnight.

    The day SpaceX doesn’t need to manufacture a LOX valve in-house, but can instead buy an Unreasonable (for example) LOX valve off the shelf, will be a very good day. I agree that day is not tomorrow, but that day is coming.

  20. Does anyone know the similar routine for construction of a commercial airliner? I’m assuming that once a batch of parts (or even a manufacturer) is certified, most of the ‘evidence chain’ stuff can be streamlined. Also, with the idea of building multiple copies, much of the rest of the procedure listed here would become moot.

  21. Remember the stories from the 1980s about $800 toilet seats, coffee makers and the like? They cost that much for a reason – overspecification combined with limited production quantities results in high prices.

    The high lifecycle cost due to the weight of the items might also have something to do with it. What’s the fuel cost of transporting that toilet seat over thousands of flights? It would make sense to buy what would appear to be an absurdly expensive custom item if it were sufficiently lighter.

  22. Torque is always critical. Just because you get away with not properly torquing bolts in your backyard doesn’t mean that it’s correct. Overtorquing can lead to premature failure of bolts, possibly causing serious damage or loss of life. Undertorquing can be just as dangerous. This applies whether it’s spacecraft, automobiles, bicycles, buildings, etc.

    And to comment on the extensive procedures, half of the complexity can be attributed to ISO standards, which are followed by many companies, and are proven to reduce errors and costs.

    I will agree, to some extent, that there is room for reduction of overhead. The future NASA will likely be evaluating this. However, NASA and its contractors largely follow Lean Six-Sigma practices that are designed to eliminate or reduce non-value-added activities and minimize defects, so I would assume someone has already looked into this (politics notwithstanding) and determined that the inspection points are indeed necessary.

    By reading this post, it appears to me that the author is commenting via third-party account. There are many inaccuracies in the description of his bolt-tightening procedure. Perhaps this could be attributed to conventions used at another center, but as far as I know, acronyms, standards, and specifications usually apply agency-wide, not just to one specific center.

  23. The high lifecycle cost due to the weight of the items might also have something to do with it. What’s the fuel cost of transporting that toilet seat over thousands of flights? It would make sense to buy what would appear to be an absurdly expensive custom item if it were sufficiently lighter.

    In general aviation, it’s quite common for many parts to be no different than those used in cars except for the certification for aviation use. Obtaining and maintaining that certification is quite expensive and since so few units are sold (compared to cars), the prices are high.

    While I can’t find the figures, I’d be willing to bet that the cost of a toilet seat for an airliner is quite a bit higher than what you’d find on the shelf at your local hardware store. It’s unlikely it’d be any lighter than an off-the-shelf unit; it just has to meet FAA certification requirements. Back in the 1980s, did the military try to use airliner-certified toilet seats, coffee makers, etc. for their transport planes, or did they order custom-made units that would be much more expensive? I don’t know the answer to that.

    Back to the original topic, those insane procedures explain why it took a cast of thousands of people several months to prepare a Shuttle for launch, which is why the Shuttles were so expensive to operate.

  24. Compare the procedures involved with changing a bolt to the procedures SpaceX used on that cracked engine nozzle. How long would NASA have taken, and how many millions of dollars would have been spent, to do what it took a single SpaceX technician to do? The biggest cost to SpaceX was probably flying him across the country at short notice.

  25. This really isn’t so bad, I don’t necessarily see it as a ‘problem’. To those who say they could come up with a better/simpler method, I would like to see you prove it.

    What is outlined here is a pretty basic change control process with some added bells and whistles (i.e. the physical QA witnesses). It is efficient when you consider the hundreds of thousands of parts and millions of transactions going on in NASA when they put government-grade space-faring vehicles together and launch them on a regular basis to complete missions in outer space.

  26. What a misleading piece of crap-ola. He even has to expand this article to include steps like making a copy of a piece of paper. That’s a sign he’s trying to make this look as bad as possible and not present an honest assessment. There are so many incorrect and misleading statements in here but I don’t want to waste my next 2 hours responding to each of them. But a few things jump out: first of all, he doesn’t even say what “Criticality 2” means. It means this is part of an assembly that is a single point failure. If this fails, the $xxxx million mission is over. So, yes, it seems smart to be very careful with this bolt. Someone makes this same comment in the comments section and he responds “It seems to me that if the torque is that critical, you have an unrobust design.” For a car, maybe. But for spaceflight hardware, weight is an extremely valuable commodity and you cannot have this so-called robust design anytime you want it. These are world class engineers designing these systems and to criticize them just shows your ignorance. He goes on about testing bolts but fails to mention that there is a problem with counterfeit bolts in the aerospace industry and, yes, they need to be checked, unfortunately.
