The relationship between POD (Probability of Detection) and False Positives depends on more than the inspection itself. It also depends on the frequency of defectives in the population being inspected.
Your nondestructive evaluation (NDE) system signals a "hit." Is it really a crack? Or is it a "false positive?" Such a simple question; such a complicated answer!
Consider these two distinct inspection situations:
1) You are performing an inspection on a test piece with known provenance: you already know whether a defect is present, and the question is how likely the inspection is to find it (or to pass a defect-free piece). Those questions are answered by sensitivity (POD) and specificity.
2) You have performed an inspection on a part with uncertain history: the system signals a "hit," and the question is how likely it is that the part actually contains a defect. That question is answered by the positive predictive value, PPV (and, for parts that pass, by the negative predictive value, NPV).
The first thing to understand is that sensitivity and PPV are NOT the same, nor are specificity and NPV. Consider all possible outcomes of a generic inspection, summarized in Table 1:
Table 1: Generic Contingency Table of Possible Inspection Outcomes
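As a minimal sketch (in Python, not part of the original article), here are the four quantities that the generic contingency table of Table 1 summarizes, written as functions of the cell counts TP, FP, FN, and TN (true/false positives and negatives); the variable names are mine, chosen for illustration:

```python
# Quantities summarized by a generic inspection contingency table.

def sensitivity(tp, fn):
    # POD: P(hit | defect) -- fraction of defective parts that signal a hit
    return tp / (tp + fn)

def specificity(tn, fp):
    # P(no hit | no defect) -- fraction of defect-free parts that pass
    return tn / (tn + fp)

def ppv(tp, fp):
    # Positive Predictive Value: P(defect | hit)
    return tp / (tp + fp)

def npv(tn, fn):
    # Negative Predictive Value: P(no defect | no hit)
    return tn / (tn + fn)
```

Sensitivity and specificity are properties of the inspection alone; PPV and NPV also depend on how many defectives are in the population being inspected.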
We will consider two numerical examples. The first is a "good" inspection, with specificity = 90% and sensitivity also 90%. The second is a coin toss representing a random "inspection," where both are 50%. In these examples (Tables 2 and 3) the frequency of defects in the population being inspected is 0.3%, the same as the prevalence of AIDS in the US. (See note 2, below.)
Table 2: Contingency Table of Possible Inspection Outcomes (90% sensitivity, 90% specificity, 0.3% prevalence)
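The entries of Table 2 follow directly from the stated sensitivity, specificity, and prevalence. A sketch of the arithmetic, assuming a purely illustrative population of 100,000 inspected parts (the population size is my assumption, not from the article):

```python
# Table 2 arithmetic: 90% sensitivity, 90% specificity, 0.3% prevalence.
N          = 100_000                 # assumed illustrative population size
prevalence = 0.003
sens, spec = 0.90, 0.90

defects = N * prevalence             #    300 parts with a defect
good    = N - defects                # 99,700 parts without a defect

tp = sens * defects                  #    270 true positives (defects found)
fn = defects - tp                    #     30 missed defects
tn = spec * good                     # 89,730 correct passes
fp = good - tn                       #  9,970 false calls

ppv = tp / (tp + fp)                 # ~0.026  -> fewer than 3% of hits are real
npv = tn / (tn + fn)                 # ~0.99967
```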
This is unexpected! The conditional probability of a defect, given a "hit," is less than 3%! How could that happen?
Here's why: The population has a very small prevalence of defects, P(defect) = 0.003 (this is the prevalence of AIDS in the US), so the false calls (false positives) vastly outnumber the true positives: P(+ and no defect) = 0.10 × 0.997 ≈ 0.0997, while P(+ and defect) = 0.90 × 0.003 = 0.0027. Thus the fraction of positives that actually have the defect is small. (This is why "screening" physicians for AIDS is a bad idea: 97% of those testing positive would not have AIDS, assuming the screening test has sensitivity = 90%. And re-testing wouldn't improve the situation either, since the inspections would not be independent.)
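The same result is a one-line application of Bayes's theorem, written out here for clarity with the numbers used in the text:

\[
\mathrm{PPV} = P(\text{defect}\mid +) =
\frac{P(+\mid \text{defect})\,P(\text{defect})}
     {P(+\mid \text{defect})\,P(\text{defect}) + P(+\mid \text{no defect})\,P(\text{no defect})}
= \frac{0.90 \times 0.003}{0.90 \times 0.003 + 0.10 \times 0.997} \approx 0.026
\]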
Why bother to inspect? Look closely at the NPV, the Negative Predictive Value: the fraction of parts passed by the inspection that are truly defect-free. NPV = 0.99967. The test is doing what it is supposed to do (albeit helped considerably by the low defect rate). This inspection is about ten times more effective than a coin toss, as illustrated in Table 3.
Table 3: Contingency Table of Possible Inspection Outcomes (coin-toss "inspection": 50% sensitivity, 50% specificity, 0.3% prevalence)
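The coin-toss counts in Table 3 can be sketched the same way, again assuming the same illustrative 100,000-part population used above:

```python
# Table 3 arithmetic: coin-toss "inspection" with sensitivity = specificity = 0.50,
# applied to the same assumed population of 300 defective and 99,700 good parts.
tp = 0.50 * 300                      #    150 defects "found" by luck
fn = 300 - tp                        #    150 missed defects
fp = 0.50 * 99_700                   # 49,850 false calls
tn = 99_700 - fp                     # 49,850 correct passes

ppv = tp / (tp + fp)                 # = 0.003 -- no better than the prevalence itself
npv = tn / (tn + fn)                 # = 0.997

# The 90/90 inspection's PPV (~0.026) is roughly ten times the coin toss's 0.003.
```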
Result to Remember:
POD (sensitivity) answers the question, "Given a defect, how likely is the inspection to find it?" PPV answers a different question: "Given a hit, how likely is it that a defect is really there?" They are not the same, and PPV depends strongly on the prevalence of defects in the population being inspected.
Receiver Operating Characteristic (ROC) Curve:
Changing the decision criterion (threshold) can improve the POD (sensitivity), but at the expense of increased false calls (diminished specificity). A plot of sensitivity vs. 1 − specificity, called a Receiver Operating Characteristic (ROC) curve, was popularized during World War II and still has advocates today, in spite of the fact that it cannot account for the frequency of defectives in the population, and thus ignores PPV and NPV.
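A minimal sketch of how an ROC curve arises, assuming a simple Gaussian signal model for defect-free and defective parts (the model, means, and standard deviation are my assumptions, chosen only to illustrate the threshold trade-off):

```python
import numpy as np
from scipy.stats import norm

# Assumed signal-amplitude distributions: noise-only vs. defect-present.
mu_noise, mu_defect, sigma = 0.0, 2.0, 1.0
thresholds = np.linspace(-3, 6, 200)

# Sweeping the decision threshold trades sensitivity against specificity.
sensitivity = 1 - norm.cdf(thresholds, mu_defect, sigma)   # P(hit | defect)
false_call  = 1 - norm.cdf(thresholds, mu_noise,  sigma)   # 1 - specificity

# Each (false_call, sensitivity) pair is one point on the ROC curve.
# Note what is NOT in this calculation: the prevalence of defects,
# and therefore neither PPV nor NPV.
```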
Why was the ROC effective in WWII but hopelessly ineffective for contemporary inspections(1)? In WWII the prevalence of targets in the general population was very high, say > 50%. (If you detected airplanes in bomber formation flying toward your coast, they were unlikely to be friendly.) In contemporary inspections the prevalence of defects is very, very low. (3 per 1000 for AIDS(2), for example; much lower for intrinsic material defects.) Thus the PPV (positive predictive value) in WWII was high, but in contemporary inspections it is unacceptably low.
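The effect of prevalence alone can be sketched by applying the same 90%-sensitive, 90%-specific "inspection" to the two situations (the 50% WWII figure is the text's rough estimate; the function is the PPV formula from Bayes's theorem above):

```python
def ppv(sens, spec, prevalence):
    # Positive Predictive Value as a function of the inspection and the population.
    return (sens * prevalence) / (sens * prevalence + (1 - spec) * (1 - prevalence))

print(ppv(0.90, 0.90, 0.50))    # ~0.90  : WWII-like prevalence -- most "hits" are real
print(ppv(0.90, 0.90, 0.003))   # ~0.026 : contemporary defect rate -- most "hits" are false
```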