Optimal Resource Allocation Process

The Right Mix of Value and Performance

The heat shield, also known as the thermal protection system (TPS), can make or break a NASA space mission. The key objective of this effort was to define the most cost-effective investment strategy for improving the accuracy of the Orion heat shield reliability assessment process. This was done by designing a customized, user-friendly, probabilistic, real-time decision support tool that enables rapid, risk-informed decisions by non-engineering users without the assistance of software specialists.

Project Overview

A heat shield is the unique sub-system of a space capsule that is designed to protect the rest of the space capsule, and its human or scientific payload, from the extreme and uncertain temperature and pressure loads encountered during the capsule’s entry into an atmosphere. For the National Aeronautics and Space Administration (NASA) Orion space capsule (Figure 1), the heat shield is designed to withstand temperatures of thousands of degrees, and pressures several times that of the normal atmosphere, during the entry of the capsule into the Earth’s atmosphere. Heat shields typically employ a variety of thermal mechanisms (e.g., conduction, convection, radiation, ablation and melting) to accomplish this extremely important objective. The heat shield is usually the only sub-system of the space capsule for which no form of redundancy exists. Hence, even if everything else in the mission goes perfectly, the success of the entire mission ultimately depends upon the success of the heat shield. This piece of equipment is so important that it is routinely over-designed; in fact, the mission goals and objectives may be limited in order to accommodate this over-design.

Figure 1.
The Orion space capsule.

Figure 2.
STS Columbia.

The NASA Space Transportation System (STS, i.e., “space shuttle”) Columbia (Figure 2) is the most obvious example of a heat shield failure. The vehicle was completely destroyed (February 1, 2003, Figure 3) during re-entry into Earth’s atmosphere because of unforeseen and undetected damage to the tile-based heat shield. The damage was caused by pieces of insulating foam shedding from the fuel tanks during the ascent phase of the mission from Earth two weeks earlier. The Columbia failure grounded all subsequent STS flights for about two and a half years while an in-depth investigation was conducted. The findings, observations and recommendations of the investigation resulted in significant cultural changes throughout the agency and expense increases and operational changes to provide a variety of heat shield inspections for the remaining 21 STS flights through 2011.

Figure 3.
STS Columbia’s fatal re-entry.

Figure 4.
Illustration of Arc Jet Testing.

The overall goals of the current project were to: 1) quantify the reliability of the Orion thermal protection system (TPS, i.e., “heat shield”) to withstand a wide variety of possible re-entry conditions and thermal loads, and 2) identify the key sensitivities of the predicted reliability to uncertainties and TPS design variables. Within these goals, the objective of this particular effort was to develop a real-time Bayesian-based decision support tool to provide actionable guidance for optimal resource allocation.

Problem Definition

Optimal resource allocation is a decision support process that enables decision makers to assess, quickly and accurately in real time, the costs and benefits of various possible courses of investment in order to achieve the “best” outcome among competing choices. In this case, the objective was to define the most cost-effective investment strategy related to improving the accuracy of the Orion heat shield reliability assessment process (please see the companion Case Studies related to the Orion Real-Time Decision Support Tool and the Orion TPS Reliability Assessment). However, because of the large number of contributors to uncertainty, a consistent and repeatable methodology to aid in this decision process is required.

Subject Matter Experts (SMEs) identified 19 possible investment areas related to the reliability assessment process (see Figure 4). In order to assess the optimal investment strategy, it was necessary to construct cost-to-benefit models, and to quantify their associated uncertainties, for each of the possible investment areas. It is important that the cost-to-benefit models all be compatible; that is, they must all show cost-to-benefit on the same type of scale. Also, the cost-to-benefit models must be calibrated to the number of units that can be purchased, noting any possible limitations or discounts in the way units can be acquired (e.g., “buy one get one free”, or “limit 10 units per customer”).

Figure 4.
Illustration of the Optimal Resource Allocation Process.

The team was initially skeptical of this process. The default assumption among the team members was that new arc jet data was the only investment that could improve the reliability assessment process. However, unless the existing arc jet data are completely discarded, the new data can only impact the statistical metrics (mean value and standard deviation) of the resulting data set (existing plus new data) to a certain extent. A rigorous, mathematically and statistically based technique was proposed which bounds the effects that obtaining new data can have upon the statistical metrics of the resulting data set. Cost modeling proved to be generally quite easy, whereas benefit modeling among the various options proved to be more challenging.
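The limited influence of new data on an existing data set can be illustrated with basic pooled statistics. The following is a minimal sketch, not the technique proposed in the project: the function names, the assumption that new measurements fall within a known range, and all numbers in the usage note are hypothetical.

```python
import math


def pooled_mean_bounds(n_old, mean_old, n_new, lo, hi):
    """Range of the combined mean if all n_new new values lie in [lo, hi].

    Assumption: the new measurements are bounded by a known range [lo, hi].
    """
    total = n_old + n_new
    low = (n_old * mean_old + n_new * lo) / total
    high = (n_old * mean_old + n_new * hi) / total
    return low, high


def pooled_mean_std(n_old, mean_old, std_old, n_new, mean_new, std_new):
    """Mean and sample standard deviation of existing plus new data."""
    total = n_old + n_new
    mean = (n_old * mean_old + n_new * mean_new) / total
    # Sum of squared deviations of both groups about the combined mean.
    ss = ((n_old - 1) * std_old ** 2 + n_old * (mean_old - mean) ** 2
          + (n_new - 1) * std_new ** 2 + n_new * (mean_new - mean) ** 2)
    return mean, math.sqrt(ss / (total - 1))
```

For example, with 20 existing points averaging 100.0 and 5 new points that must lie between 90.0 and 110.0, `pooled_mean_bounds(20, 100.0, 5, 90.0, 110.0)` shows that the combined mean can only move to somewhere between 98.0 and 102.0, however the new tests turn out.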

PredictionProbe’s Solution

A manual, TPS-specific, prototype resource allocation process has been developed and demonstrated. This process builds upon the reliability assessment process described in a companion Case Study. The additional cost and benefit modeling needed to evaluate each potential investment option for its cost-effective influence on the system reliability estimate was based upon interviews with a team of SMEs. The team included people from management and specialists in several areas, including testing, aerothermal effects, trajectory, and computational modeling. Each SME was individually asked to rate the cost effectiveness of each of the 19 potential investment options on a numerical scale from zero to unity, according to its potential to improve the reliability assessment process. Here, zero = worst possible investment area, and unity = best possible investment area. The responses for each investment choice and each SME were recorded. For each investment choice, statistical metrics were then constructed (mean, i.e., team average, value; 90% low value; and 90% high value). The statistical metrics for each investment choice were then presented back to the team for discussion without attributing specific ratings to individual SMEs.
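The construction of per-option statistical metrics from the SME ratings can be sketched as follows. This is an illustration only: the source does not define how the 90% low and 90% high values were computed, so this sketch assumes they are the 10th and 90th percentiles (linear interpolation), and the function names are hypothetical.

```python
def percentile(values, p):
    """Percentile of values for a fraction p in [0, 1], linear interpolation."""
    s = sorted(values)
    k = (len(s) - 1) * p
    lo = int(k)                      # index at or below the target position
    hi = min(lo + 1, len(s) - 1)     # next index, clamped to the last element
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def summarize_ratings(ratings_by_option, low_p=0.10, high_p=0.90):
    """Map each investment option to (mean, low, high) of its SME ratings.

    Assumption: "90% low" and "90% high" are taken to be the low_p and
    high_p percentiles; the source does not specify the exact definition.
    """
    summary = {}
    for option, ratings in ratings_by_option.items():
        mean = sum(ratings) / len(ratings)
        summary[option] = (mean,
                           percentile(ratings, low_p),
                           percentile(ratings, high_p))
    return summary
```

With five hypothetical ratings of 0.2, 0.4, 0.6, 0.8 and 1.0 for one option, the summary is a mean of 0.6, a low value of about 0.28 and a high value of about 0.92. Percentile-based bounds of this kind stay away from the extreme ratings of exactly zero or unity that a single SME might give.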

Using the established reliability process, the information flow operations were then implemented within three distinct executions of the Hugin Explorer (version 7.6) Bayesian network software from Hugin Expert A/S of Denmark (PredictionProbe, Inc. (PPI) uses this tool as the Bayesian update core within SPISE and distributes it within the US). This software was chosen for two reasons: 1) it offers extensive Bayesian network development and analysis capability, and 2) it could be leveraged through work performed for the NASA Aviation Safety Program. The first solution used the 90% Lo numerical values, the second solution used the team average values, and the third solution used the 90% Hi values. This technique avoided the zero or unity values that could contaminate the Bayesian analyses if they used the minimum and maximum SME values for each investment area.

The investment options included additional testing for each of the relevant material properties, as well as additional testing to reduce key uncertainties in the process, improved arc jet testing capabilities, improved computational capabilities, improved failure mode modeling, and additional failure mode testing. The complete list of investment choices was:

Additional material properties testing for virgin density (RHOV)
Additional material properties testing for char density (RHOC)
Additional material properties testing for virgin specific heat (CPVRG)
Additional material properties testing for char specific heat (CPCHR)
Additional material properties testing for virgin thermal conductivity (XKVRG)
Additional material properties testing for char thermal conductivity (XKCHR)
Additional material properties testing for virgin emissivity (EMVT)
Additional material properties testing for char emissivity (EMCT)
Additional material properties testing for surface roughness effects (ROUGHT)
Additional material properties testing for normalized recession model (B’ tables) (ZBPRIM)
Additional material properties testing for pyrolysis gas blowing coefficient (CPGAS)
Improved recession amount uncertainty (REC)
Improved recession RS fit uncertainty (Additional STAB cases to be executed for each reliability case)
Improved arc jet testing capability (EXP MOD)
Improved computational (STAB) capability (CMP MOD)
Improved failure mode modeling (PFL MOD)
Additional arc jet test reproducibility pairs (REP TST); each reproducibility pair consists of two arc jet tests conducted at the same conditions within the same facility
Additional arc jet test validation pairs (VAL TST); each validation pair consists of one arc jet test and one STAB computation conducted at the same conditions
Additional failure mode testing (PFL TST) for the current failure mode

Ultimately, a fully automated and generic combined probabilistic and Bayesian framework that can accommodate a flow of uncertain information forward and backward, and which allows for using expert opinion mixed with physics-based probabilities is desirable. The tools and knowledge exist to accomplish this probabilistic and Bayesian integration, but the workload exceeds the scope of this project.

Results and Conclusions

The results of three distinct Bayesian network analyses, for the 90-percent Lo, team average, and 90-percent Hi scoring inputs, were presented to Orion Program Management. Results were presented in order from the most favorable to the least favorable investment areas. The most favorable investment areas within the 90-percent Lo model recommendations are considered “low-hanging fruit” because there is SME consensus on these recommendations. Conversely, the team average and 90-percent Hi recommendations represent less of a team consensus. The three most highly rated investment options were (in order from most to least favorable): improving the STAB ablation modeling computational code, improving the arc jet testing capabilities, and reducing the validation uncertainty between arc jet tests and the computational modeling of those tests.

Given a specific amount of investment, very specific investment recommendations could be provided to the Program Management to improve the TPS reliability. These recommendations took the form of (for example) getting a specific minimum number of new test samples to satisfy distinct needs within the reliability assessment or making specific improvements within the reliability assessment process.
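How a fixed budget might be turned into a concrete purchase plan can be sketched with a simple greedy rule: buy units of the option with the highest benefit per unit cost first, subject to each option's unit limit. This is a simplified heuristic for illustration, not the Bayesian network analysis used in the project. The `Option` class, the `allocate` function, and all costs, benefits and limits are hypothetical; only the option abbreviations come from the list of investment choices.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    unit_cost: float     # cost of one unit (hypothetical currency units)
    unit_benefit: float  # benefit of one unit, on a common scale
    max_units: int       # purchase limit for this option


def allocate(options, budget):
    """Greedy allocation by benefit-to-cost ratio; returns (plan, leftover)."""
    plan = {}
    remaining = budget
    ranked = sorted(options,
                    key=lambda o: o.unit_benefit / o.unit_cost,
                    reverse=True)
    for opt in ranked:
        # Buy as many units as the limit and the remaining budget allow.
        units = min(opt.max_units, int(remaining // opt.unit_cost))
        if units > 0:
            plan[opt.name] = units
            remaining -= units * opt.unit_cost
    return plan, remaining


# Hypothetical inputs, for illustration only.
example_options = [
    Option("REP TST", unit_cost=10.0, unit_benefit=1.0, max_units=10),
    Option("CMP MOD", unit_cost=50.0, unit_benefit=10.0, max_units=1),
    Option("VAL TST", unit_cost=20.0, unit_benefit=3.0, max_units=4),
]
example_plan, example_leftover = allocate(example_options, budget=100.0)
```

With these made-up numbers, the plan buys one unit of CMP MOD, two of VAL TST and one of REP TST, and uses the whole budget. A greedy rule does not guarantee the best possible plan when units are indivisible, so it should be read as a starting point rather than an optimal allocation.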

About PredictionProbe, Inc.

PredictionProbe, Inc. is a small business and proud provider of an elite offering of world-class predictive technologies, tools, and services that enable decision makers with real solutions for real world challenges. To learn more visit us at: predictionprobe.com