
Joint, Marginal, and Conditional Distributions

We engineers often ignore the distinctions between joint, marginal, and conditional probabilities - to our detriment.

Figure 1 - How the Joint, Marginal, and Conditional distributions are related.

conditional probability: f(x | y, theta) is the probability density of x by itself, given a specific value of the variable y, and the distribution parameters, vector theta. (See Figure 1.) If x and y represent events A and B, then P(A|B) = nAB/nB, where nAB is the number of times both A and B occur, and nB is the number of times B occurs. Since P(AB) = nAB/N and P(B) = nB/N, where N is the total number of observations, it follows that

P(A|B) = P(AB)/P(B) = (nAB/N) / (nB/N) = nAB/nB
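As a quick illustration of this counting definition, here is a minimal sketch in Python (the events and their probabilities are made up for illustration) that estimates P(A|B) both ways from simulated data:

import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Hypothetical dependent events: B occurs with probability 0.4,
# and A is more likely when B occurs (0.7) than when it does not (0.2).
B = rng.random(N) < 0.4
A = np.where(B, rng.random(N) < 0.7, rng.random(N) < 0.2)

n_B = B.sum()          # nB:  number of times B occurs
n_AB = (A & B).sum()   # nAB: number of times A and B occur together

print("P(A|B) as nAB/nB:    ", n_AB / n_B)
print("P(A|B) as P(AB)/P(B):", (n_AB / N) / (n_B / N))   # identical, by the algebra above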


Joint probability is the probability of two or more things happening together. f(x, y | theta) is the probability density of x and y together as a pair, given the distribution parameters, vector theta. Often these events are not independent, and sadly this is often ignored.  Furthermore, the correlation coefficient itself does NOT adequately describe these interrelationships.
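To see why the correlation coefficient alone can mislead, here is a small sketch (assuming a standard normal x and the deliberately nonlinear choice y = x^2) in which y is completely determined by x, yet the linear correlation is essentially zero:

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)
y = x**2                      # y is completely determined by x...

r = np.corrcoef(x, y)[0, 1]   # ...yet their linear correlation is near zero
print("correlation coefficient:", round(r, 3))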

Consider first the idea of a probability density or distribution: f(x | theta) is the probability density of x, given the distribution parameters, vector theta.  For a normal distribution, theta = (mu, sigma), where mu is the mean and sigma is the standard deviation.  This is sometimes called a pdf, probability density function.  The integral of a pdf, the area under the curve (corresponding to the probability) between specified values of x, is a cdf, cumulative distribution function, CDF(x | theta). For a discrete f, the CDF is the corresponding summation.
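A minimal sketch of the pdf/cdf relationship for a normal distribution, using illustrative (assumed) parameters mu = 10 and sigma = 2 and scipy for the density; the cdf is also checked by numerically integrating the pdf:

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

mu, sigma = 10.0, 2.0   # assumed parameters, theta = (mu, sigma)
x = 12.0

# pdf: density at a point; cdf: area under the pdf up to that point
print("pdf f(x | theta):  ", norm.pdf(x, loc=mu, scale=sigma))
print("cdf CDF(x | theta):", norm.cdf(x, loc=mu, scale=sigma))

# The cdf is the integral of the pdf from -infinity up to x
area, _ = quad(lambda t: norm.pdf(t, loc=mu, scale=sigma), -np.inf, x)
print("integral of the pdf:", area)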

A joint probability density of two or more variables is called a multivariate distribution. It is often summarized by a vector of parameters, which may or may not be sufficient to characterize the distribution completely. For example, the normal is summarized (sufficiently) by a mean vector and covariance matrix.
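For instance, a bivariate normal is completely characterized by its mean vector and covariance matrix; a brief sketch with made-up numbers, using scipy.stats.multivariate_normal:

import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([0.0, 1.0])        # assumed mean vector
cov = np.array([[1.0, 0.8],        # assumed covariance matrix:
                [0.8, 2.0]])       # variances 1 and 2, covariance 0.8

joint = multivariate_normal(mean=mean, cov=cov)
print("joint density f(x, y | theta) at (0, 1):", joint.pdf([0.0, 1.0]))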

marginal probability: f(x | theta) is the probability density of x, for all possible values of y, given the distribution parameters, vector theta.  The marginal probability is determined from the joint distribution of x and y by integrating over all values of y, called "integrating out" the variable y.  In applications of Bayes's Theorem, y is often a matrix of possible parameter values.  The figure illustrates joint, marginal, and conditional probability relationships.
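A sketch of "integrating out" the variable y, continuing the bivariate normal above: the marginal density of x, obtained by numerically integrating the joint density over y, should match a univariate normal with x's own mean and variance:

import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

mean = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.8],
                [0.8, 2.0]])
joint = multivariate_normal(mean=mean, cov=cov)

x = 0.5
marginal_x, _ = quad(lambda y: joint.pdf([x, y]), -np.inf, np.inf)

print("marginal f(x | theta), y integrated out:", marginal_x)
print("univariate normal pdf for comparison:   ",
      norm.pdf(x, loc=mean[0], scale=np.sqrt(cov[0, 0])))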

conditional probability

Note that in general the conditional probability of A given B is not the same as the conditional probability of B given A.  The probability of both A and B together is P(AB), which can be written either as P(A|B) x P(B) or as P(B|A) x P(A); if both P(A) and P(B) are non-zero, equating the two gives a statement of Bayes's Theorem:

P(A|B) = P(B|A) x P(A) / P(B)  and

P(B|A) = P(A|B) x P(B) / P(A) 
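A small numeric sketch of the first form (all probabilities are made-up illustrative values; P(B) is obtained from P(B|A), P(B|not A), and P(A) by the law of total probability):

p_A = 0.3              # assumed P(A)
p_B_given_A = 0.8      # assumed P(B|A)
p_B_given_notA = 0.1   # assumed P(B|not A)

# Law of total probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A)
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes's Theorem: P(A|B) = P(B|A) x P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B

print("P(B)   =", p_B)                      # 0.31
print("P(A|B) =", round(p_A_given_B, 4))    # roughly 0.7742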

Conditional probability is also the basis for statistical dependence and statistical independence.

Independence: Two variables, A and B, are independent if their conditional probability is equal to their unconditional probability. In other words, A and B are independent if, and only if, P(A|B) = P(A) and P(B|A) = P(B). In engineering terms, A and B are independent if knowing something about one tells nothing about the other. This is the origin of the familiar, but often misused, formula P(AB) = P(A) x P(B), which is true only when A and B are independent.
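A brief sketch that checks independence from an assumed 2x2 joint table of P(A, B); the table is constructed as an outer product, so the product rule holds by design:

import numpy as np

# Rows index A in {0, 1}, columns index B in {0, 1}; entries are P(A, B).
p_A = np.array([0.6, 0.4])
p_B = np.array([0.3, 0.7])
joint = np.outer(p_A, p_B)    # independent by construction

# Marginals recovered from the joint table
marg_A = joint.sum(axis=1)
marg_B = joint.sum(axis=0)

# When A and B are independent, P(A|B=1) equals the unconditional P(A)
p_A_given_B1 = joint[:, 1] / marg_B[1]
print("P(A|B=1):", p_A_given_B1)   # [0.6 0.4]
print("P(A):    ", marg_A)         # [0.6 0.4]
print("P(AB) == P(A) x P(B)?", np.allclose(joint, np.outer(marg_A, marg_B)))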

conditional independence: A and B are conditionally independent, given C, if

Prob(A=a, B=b | C=c) = Prob(A=a | C=c) x Prob(B=b | C=c) whenever Prob(C=c) > 0.

So the joint probability of ABC, when A and B are conditionally independent given C, is then

Prob(ABC) = Prob(C) x Prob(A | C) x Prob(B | C).

A directed graph illustrating this conditional independence is A <- C -> B.
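A closing sketch for the A <- C -> B structure: it builds the joint distribution from assumed tables Prob(C), Prob(A|C), and Prob(B|C), then checks that Prob(A, B | C=c) factors into Prob(A | C=c) x Prob(B | C=c):

import numpy as np

# Assumed (made-up) tables for binary A, B, and C
p_C = np.array([0.5, 0.5])                 # Prob(C=c)
p_A_given_C = np.array([[0.9, 0.2],        # rows index a, columns index c: Prob(A=a | C=c)
                        [0.1, 0.8]])
p_B_given_C = np.array([[0.7, 0.3],        # Prob(B=b | C=c)
                        [0.3, 0.7]])

# Joint Prob(A=a, B=b, C=c) = Prob(C=c) x Prob(A=a | C=c) x Prob(B=b | C=c)
joint = np.einsum('c,ac,bc->abc', p_C, p_A_given_C, p_B_given_C)

for c in (0, 1):
    cond_AB = joint[:, :, c] / joint[:, :, c].sum()            # Prob(A, B | C=c)
    product = np.outer(p_A_given_C[:, c], p_B_given_C[:, c])   # Prob(A | C=c) x Prob(B | C=c)
    print(f"C={c}: conditionally independent?", np.allclose(cond_AB, product))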