Covariance
Very similar algorithms can be used to compute the covariance. The naive algorithm is:
For the algorithm above, one could use the following pseudocode:
def naive_covariance(data1, data2): n = len(data1) sum12 = 0 sum1 = sum(data1) sum2 = sum(data2) for i in range(n): sum12 += data1*data2 covariance = (sum12 - sum1*sum2 / n) / n return covarianceA more numerically stable two-pass algorithm first computes the sample means, and then the covariance:
The two-pass algorithm may be written as:
def two_pass_covariance(data1, data2): n = len(data1) mean1 = sum(data1) / n mean2 = sum(data2) / n covariance = 0 for i in range(n): a = data1 - mean1 b = data2 - mean2 covariance += a*b / n return covarianceA slightly more accurate compensated version performs the full naive algorithm on the residuals. The final sums and should be zero, but the second pass compensates for any small error.
A stable one-pass algorithm exists, similar to the one above, that computes :
The apparent asymmetry in that last equation is due to the fact that, so both update terms are equal to . Even greater accuracy can be achieved by first computing the means, then using the stable one-pass algorithm on the residuals.
Likewise, there is a formula for combining the covariances of two sets that can be used to parallelize the computation:
- .
Read more about this topic: Algorithms For Calculating Variance