February 2017

Volume 32 Number 2

[Test Run]

The Sign Test Using C#

By James McCaffrey

The sign test is most often used in situations where you have “before and after” data and you want to determine if there’s statistical evidence of an effect of some sort. The idea is best explained by example. Suppose you’re working at a pharmaceutical company and want to know if a new weight-loss drug is effective. You get eight volunteers to use your drug for several weeks. You look at the weights of your eight subjects before and after the experiment. Say six out of eight of the subjects lost weight. Is there solid statistical evidence to suggest that your drug worked?

Weight loss is a classic sign test example, but the test applies in many IT and software scenarios, too. Suppose you have 40 Web server machines and you apply a software patch designed to improve performance. You measure response times before and after applying the patch. What can you conclude if 32 servers showed better performance, two servers showed no change, and six servers showed worse performance?

The best way to see where this article is headed is to take a look at the demo program in Figure 1. After reading this article, you’ll have a solid grasp of what type of problem the sign test solves, know exactly how to perform a sign test using C# and understand how to interpret the results of a sign test. All of the source code for the demo program is presented in this article. You can also get the complete demo program in the code download that accompanies this article.

Figure 1 The Sign Test Using C#

The demo program sets up eight pairs of before-and-after data where the goal is to determine if some weight loss regimen had an effect or not. From the data, six of the subjects did show a weight loss, but two subjects showed a weight increase. The demo program computes the probability of “no effect” to be 0.1445. It’s up to you to interpret the results, for example, “The data shows a weak indication (p = 0.8555) that the weight loss effort had an effect.”

This article assumes you have at least intermediate programming skill but doesn’t assume you know anything about the sign test. The demo code is written in C# and relies on the .NET System.Numerics namespace, so you’ll need the Microsoft .NET Framework 4 (released in 2010) or later.

The Demo Program Structure

To create the demo program I launched Visual Studio and selected the C# Console Application template from the New Project menu item. I named the project SignTestUsingCSharp. After the template code loaded into the editor window, I right-clicked on the file Program.cs in the Solution Explorer window and renamed the file to SignTestProgram.cs, then allowed Visual Studio to rename class Program for me.

Next, I right-clicked on the project name and selected the Add | Reference item. From the Assemblies | Framework list, I selected the System.Numerics assembly and clicked OK to add it to my project. At the top of the editor window, I deleted all using statements except for the one referencing the top-level System namespace, and then I added a using statement to reference the System.Numerics namespace.

The overall structure of the program is presented in Figure 2. For simplicity, the program uses a strictly static method approach rather than an object-oriented programming (OOP) approach. Methods DoCounts and ShowVector are utility helpers. The work of calculating the no-effect probability is performed by method BinomRightTail. Methods BinomProb and Choose are helpers for BinomRightTail.

Figure 2 Sign Test Demo Program Structure

using System;
using System.Numerics;
namespace SignTestUsingCSharp
{
  class SignTestProgram
  {
    static void Main(string[] args)
    {
      Console.WriteLine("\nBegin Sign Test demo \n");
      // All calling statements go here
      Console.WriteLine("\n\nEnd Sign Test demo \n");
      Console.ReadLine();
    }
    static int[] DoCounts(double[] before,
      double[] after) { . . }
    static void ShowVector(string pre, double[] v,
      int dec, string post) { . . }
    static double BinomProb(int k, int n,
      double p) { . . }
    static double BinomRightTail(int k, int n,
      double p) { . . }
    static BigInteger Choose(int n, int k) { . . }
  }
}

After displaying a couple of introductory messages, the Main method sets up and displays demo data for a sign test:

double[] before = new double[] { 70, 80, 75, 85, 70, 75, 50, 60 };
double[] after  = new double[] { 65, 78, 72, 87, 68, 74, 48, 63 };
Console.WriteLine("The weight data is: \n");
ShowVector("Before:  ", before, 0, "");
ShowVector("After :  ", after, 0, "\n");

In a non-demo scenario with anything larger than about 30 data pairs you’d likely have data stored in a text file, and you’d write a helper method to read and store the data. Using parallel arrays is the most common approach when doing a sign test.
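For larger data sets, a loader along the following lines might be used. This is a minimal sketch and not part of the demo program: the class name PairedDataLoader, the method name LoadPairs and the file layout (one comma-separated "before,after" pair per line) are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

static class PairedDataLoader
{
  // Hypothetical helper: reads lines of the form "before,after"
  // and returns the values as two parallel arrays.
  // Blank lines are skipped; any other malformed line throws.
  public static void LoadPairs(TextReader reader,
    out double[] before, out double[] after)
  {
    List<double> b = new List<double>();
    List<double> a = new List<double>();
    string line;
    while ((line = reader.ReadLine()) != null) {
      if (line.Trim().Length == 0) continue;
      string[] tokens = line.Split(',');
      if (tokens.Length != 2)
        throw new FormatException("Expected 'before,after' but got: " + line);
      b.Add(double.Parse(tokens[0], CultureInfo.InvariantCulture));
      a.Add(double.Parse(tokens[1], CultureInfo.InvariantCulture));
    }
    before = b.ToArray();
    after = a.ToArray();
  }

  static void Main()
  {
    // In-memory text stands in for a file here. For a real file,
    // pass a StreamReader opened on your own data file instead.
    double[] before, after;
    LoadPairs(new StringReader("70,65\n80,78\n75,72"), out before, out after);
    Console.WriteLine("Read " + before.Length + " pairs");
  }
}
```

Accepting a TextReader rather than a file path keeps the helper easy to test, because a StringReader and a StreamReader can be used interchangeably.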

Next, the demo uses method DoCounts to count the number of item pairs where there was a decrease in weight, a “success,” and the number of weight increases, a “failure”:

int[] counts = DoCounts(before, after);
Console.WriteLine("Num success = " + counts[2]);
Console.WriteLine("Num failure = " + counts[0]);

The return value is an array where cell 0 holds the count of fails (weight increase), cell 1 holds the count where there was no change and cell 2 holds the count of successes (weight decrease). In the days before computers were easily available, the counts were done manually by putting a “+” sign next to successes and a “-” sign next to failures. This is why the sign test is named as it is. For the demo data, the manual approach would look like:

Before:  70 80 75 85 70 75 50 60
After :  65 78 72 87 68 74 48 63
         +  +  +  -  +  +  +  -

Notice the sign test doesn’t take into account the magnitude of a weight increase or decrease. Next, the demo prepares the call to the sign test like this:

int k = counts[2];
int n = counts[0] + counts[2];
Console.WriteLine("k = " + k + " n = " + n + " p = 0.5");

Variable k holds the count of successes. Variable n holds the total count of data pairs, excluding ties. In the demo data, there are no instances where the before-and-after weights were equal. When ties do occur, the most common approach is to toss them out. However, in some situations you might want to include ties as either successes or failures. For example, in a weight loss program, no change in weight would likely be considered a failure.
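To make the effect of the tie policy concrete, the sketch below computes k and n for the Web server example from the introduction (32 better, two unchanged, six worse) under two policies. The class and method names are hypothetical; only the layout of the counts array follows the demo.

```csharp
using System;

static class TieHandlingSketch
{
  // counts[0] = failures, counts[1] = ties, counts[2] = successes,
  // the same layout that the demo's DoCounts method returns.

  // Returns { k, n } with ties tossed out.
  public static int[] DropTies(int[] counts)
  {
    return new int[] { counts[2], counts[0] + counts[2] };
  }

  // Returns { k, n } with ties counted as failures.
  public static int[] TiesAsFailures(int[] counts)
  {
    return new int[] { counts[2], counts[0] + counts[1] + counts[2] };
  }

  static void Main()
  {
    int[] counts = new int[] { 6, 2, 32 }; // worse, unchanged, better
    int[] a = DropTies(counts);
    int[] b = TiesAsFailures(counts);
    Console.WriteLine("Drop ties:        k = " + a[0] + " n = " + a[1]);
    Console.WriteLine("Ties as failures: k = " + b[0] + " n = " + b[1]);
  }
}
```

Dropping ties gives k = 32 and n = 38, while counting ties as failures gives k = 32 and n = 40. The larger n makes the same number of successes slightly less surprising, so the second policy is the more conservative of the two.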

The Main method concludes with:

double p_value = BinomRightTail(k, n, 0.5);
Console.WriteLine("\nProbability of 'no effect' is " + p_value.ToString("F4"));
Console.WriteLine("Probability of 'an effect' is " + (1 - p_value).ToString("F4"));

The sign test is actually a specific example of the more general binomial test. Program-defined function BinomRightTail accepts the number of successes, the number of data pairs and a probability value, 0.5 in this case. When a binomial test uses 0.5 for the probability parameter, it’s a sign test, as I’ll explain shortly.

Understanding the Choose Function

The sign test uses the binomial distribution, which in turn uses the Choose function. The Choose(n, k) function returns the number of ways to select k items from n items. For example, Choose(5, 3) is the number of ways you can select three items from five items. Suppose the five items are (A, B, C, D, E). There are 10 ways to select three of the items:

(A, B, C), (A, B, D), (A, B, E), (A, C, D), (A, C, E),
(A, D, E), (B, C, D), (B, C, E), (B, D, E), (C, D, E)

The Choose function is defined Choose(n, k) = n! / [k! * (n-k)!] where the “!” character means factorial. So:

Choose(5, 3) = 5! / (3! * 2!) = (5 * 4 * 3 * 2 * 1) / [(3 * 2 * 1) *
  (2 * 1)] = 120 / 12 = 10

Implementing the Choose function is tricky because the return value can be astronomically large for even moderate values of n and k. For example:

Choose(100, 25) = 242,519,269,720,337,121,015,504

In order to return the very large values that can occur in the sign test, the demo program uses the BigInteger type in the System.Numerics namespace. The demo implementation of Choose uses two math tricks for efficiency. First, as it turns out, Choose(n, k) = Choose(n, n-k). For example:

Choose(10, 7) = Choose(10, 3)

By using the smaller value of k you can do fewer calculations. Second, there’s an alternative definition of Choose that’s best explained by an example:

Choose(10, 3) = (10 * 9 * 8) / (3 * 2 * 1)

In words, the denominator is just k! and the numerator uses just the first k terms of the n! equation and many terms cancel out. Putting these ideas together, the demo implementation of Choose is presented in Figure 3.

Figure 3 The Choose Function

static BigInteger Choose(int n, int k)
{
  if (k == 0 || k == n) return 1; // Required special cases
  int delta, iMax;
  if (k < n - k) { // Ex: Choose(100,3)
    delta = n - k;
    iMax = k;
  }
  else { // Ex: Choose(100,97)
    delta = k;
    iMax = n - k;
  }
  BigInteger ans = delta + 1;
  for (int i = 2; i <= iMax; ++i)
    ans = (ans * (delta + i)) / i;
  return ans;
}
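One way to gain confidence in an optimized Choose is to compare it against the direct factorial definition for many small inputs. The sketch below does that. ChooseNaive and ChooseFast are hypothetical names, and ChooseFast is a variant of the multiplicative approach written for this check, not the demo's exact code.

```csharp
using System;
using System.Numerics;

static class ChooseCheck
{
  static BigInteger Factorial(int n)
  {
    BigInteger f = BigInteger.One;
    for (int i = 2; i <= n; ++i)
      f *= i;
    return f;
  }

  // Direct definition: n! / (k! * (n-k)!)
  public static BigInteger ChooseNaive(int n, int k)
  {
    return Factorial(n) / (Factorial(k) * Factorial(n - k));
  }

  // Multiplicative form, using the smaller of k and n-k.
  // After step i the running value equals Choose(n-k+i, i),
  // so every division is exact.
  public static BigInteger ChooseFast(int n, int k)
  {
    if (k > n - k) k = n - k; // Choose(n, k) = Choose(n, n-k)
    BigInteger ans = BigInteger.One;
    for (int i = 1; i <= k; ++i)
      ans = (ans * (n - k + i)) / i;
    return ans;
  }

  static void Main()
  {
    bool allMatch = true;
    for (int n = 0; n <= 30; ++n)
      for (int k = 0; k <= n; ++k)
        if (ChooseNaive(n, k) != ChooseFast(n, k))
          allMatch = false;
    Console.WriteLine("All match: " + allMatch);
    Console.WriteLine("Choose(5, 3) = " + ChooseFast(5, 3));
    // Compare with the Choose(100, 25) value quoted earlier
    Console.WriteLine("Choose(100, 25) = " + ChooseFast(100, 25));
  }
}
```

The check includes the edge cases k = 0 and k = n, which are easy to get wrong in a hand-optimized implementation.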

Understanding the Binomial Distribution

The key to understanding how to implement and interpret the sign test is understanding the binomial distribution. It’s best explained by example. Imagine you have a biased coin where, when flipped, the probability of getting a head is 0.6 and the probability of getting a tail is 0.4, and suppose you define a success as getting a head. If you flip the coin n = 8 times, the binomial distribution gives you the probability of getting exactly k successes in n trials where the probability of a success in a single trial is p = 0.6 in this example.

The probability of getting exactly eight heads and zero tails in eight flips is the probability of getting eight consecutive heads, which is:

Pr(X = 8) = 0.6 * 0.6 * 0.6 * 0.6 * 0.6 * 0.6 * 0.6 * 0.6 = (0.6)^8 * (0.4)^0 = 0.0168

To get exactly seven heads in eight flips you can get seven heads plus one tail on any of the eight flips. There are eight combinations:

Pr(X = 7) = Choose(8, 1) * [ (0.6)^7 * (0.4)^1 ] = 8 * 0.0280 * 0.4 = 0.0896

The general equation for the probability of getting exactly k successes in n trials where p is the probability of a success in a single trial is:

P(X = k) = Choose(n, k) * p^k * (1-p)^(n-k)

In the case of the sign test, p is always 0.5 so 1-p is also 0.5 and the equation simplifies to:

P(X = k) = Choose(n, k) * (0.5)^n

So, for the demo data, there are n = 8 trials (pairs of data) and there are k = 6 successes (weight losses), so the probability of getting exactly six successes is:

P(X = 6) = Choose(8, 6) * (0.5)^8 = 28 * 0.0039 = 0.1094

The probabilities of getting exactly zero through eight successes in eight trials when p = 0.5 are shown in the graph in Figure 4.
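The values behind such a graph can be generated with a few lines of code. The following is a minimal sketch with hypothetical names. It uses type long for Choose, which is adequate only for small n, whereas the demo uses BigInteger.

```csharp
using System;

static class BinomialTable
{
  // Choose(n, k) using type long. Suitable for small n only.
  public static long Choose(int n, int k)
  {
    if (k > n - k) k = n - k; // Choose(n, k) = Choose(n, n-k)
    long ans = 1;
    for (int i = 1; i <= k; ++i)
      ans = (ans * (n - k + i)) / i;
    return ans;
  }

  // P(X = k) for k = 0..n when p = 0.5
  public static double[] Distribution(int n)
  {
    double[] probs = new double[n + 1];
    for (int k = 0; k <= n; ++k)
      probs[k] = Choose(n, k) * Math.Pow(0.5, n);
    return probs;
  }

  static void Main()
  {
    double[] probs = Distribution(8);
    double total = 0.0;
    for (int k = 0; k < probs.Length; ++k) {
      Console.WriteLine("P(X = " + k + ") = " + probs[k].ToString("F4"));
      total += probs[k];
    }
    // The probabilities over all possible k must sum to 1
    Console.WriteLine("Sum = " + total.ToString("F4"));
  }
}
```

Because p = 0.5, the distribution is symmetric: the probability of k successes equals the probability of n-k successes.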

Implementing a function that returns the binomial probability is straightforward:

static double BinomProb(int k, int n, double p)
{
  // Probability of k "successes" in n trials
  // if p is prob of success on a single trial
  BigInteger c = Choose(n, k);
  double left = Math.Pow(p, k);
  double right = Math.Pow(1.0 - p, n - k);
  return (double)c * left * right;
}

Figure 4 The Binomial Distribution for n = 8 and p = 0.5

The demo defines a general binomial function that accepts p as a parameter. An alternative is to define a version that assumes p = 0.5 and simplify the calculation as described earlier. The demo has no error checking. For example, in a production environment you’d likely want to make sure that k <= n, that neither k nor n is negative, and that p is between 0.0 and 1.0.
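A version with the argument checks just described, plus the p = 0.5 shortcut, might look like the following. This is a sketch under assumptions rather than the demo's code: the class name, the method name BinomProbHalf and the choice of exception type are all illustrative.

```csharp
using System;
using System.Numerics;

static class CheckedBinomial
{
  static BigInteger Choose(int n, int k)
  {
    if (k > n - k) k = n - k; // Choose(n, k) = Choose(n, n-k)
    BigInteger ans = BigInteger.One;
    for (int i = 1; i <= k; ++i)
      ans = (ans * (n - k + i)) / i;
    return ans;
  }

  static void CheckCounts(int k, int n)
  {
    if (n < 0)
      throw new ArgumentOutOfRangeException("n", "n must not be negative");
    if (k < 0 || k > n)
      throw new ArgumentOutOfRangeException("k", "k must be between 0 and n");
  }

  // General binomial probability with argument validation
  public static double BinomProb(int k, int n, double p)
  {
    CheckCounts(k, n);
    if (double.IsNaN(p) || p < 0.0 || p > 1.0)
      throw new ArgumentOutOfRangeException("p", "p must be between 0.0 and 1.0");
    return (double)Choose(n, k) * Math.Pow(p, k) * Math.Pow(1.0 - p, n - k);
  }

  // Sign test special case: p = 0.5, so p^k * (1-p)^(n-k) = (0.5)^n
  public static double BinomProbHalf(int k, int n)
  {
    CheckCounts(k, n);
    return (double)Choose(n, k) * Math.Pow(0.5, n);
  }

  static void Main()
  {
    Console.WriteLine(BinomProb(6, 8, 0.5).ToString("F4"));
    Console.WriteLine(BinomProbHalf(6, 8).ToString("F4"));
  }
}
```

Both calls in Main compute 28 / 256, the probability of exactly six successes in eight trials. For very large n the conversion of the BigInteger to double can overflow, so this sketch is intended for moderate data set sizes.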

Implementing the Sign Test

The idea of the sign test is to calculate the probability that there’s been no effect. Conceptually this means any difference between a before-value and an after-value has happened purely by chance. Mathematically, this means that the probability of an increase or decrease is 0.5.

The sign test assumes there’s no effect, then calculates the probability that the observed number of successes could’ve happened under this assumption. For the case of the demo data where there were six successes (weight losses) in eight trials, rather than calculate the probability of exactly six successes as you might guess, you calculate the probability of six or more successes. This idea is rather subtle.

Calculating the probability of k or more successes is sometimes called a right-tail test. So to implement the sign test, you calculate the probability of k or more successes by calculating the probability of exactly k successes plus k+1 successes, plus k+2 successes and so on. The demo implements this as:

static double BinomRightTail(int k, int n, double p)
{
  // Probability of k or more successes in n trials
  double sum = 0.0;
  for (int i = k; i <= n; ++i)
    sum += BinomProb(i, n, p);
  return sum;
}
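For the demo data (k = 6, n = 8, p = 0.5), the right-tail sum is small enough to work by hand, which makes a useful check on the code:

```latex
\[
\begin{aligned}
P(X \ge 6) &= P(X = 6) + P(X = 7) + P(X = 8) \\
           &= \left[ \binom{8}{6} + \binom{8}{7} + \binom{8}{8} \right] (0.5)^8 \\
           &= \frac{28 + 8 + 1}{256} = \frac{37}{256} \approx 0.1445
\end{aligned}
\]
```

This agrees with the 0.1445 value displayed by the demo program.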

All that’s needed to complete a sign test are optional functions to count the number of successes and to display values. The demo defines the counting method as:

static int[] DoCounts(double[] before, double[] after)
{
  int[] result = new int[3];
  for (int i = 0; i < before.Length; ++i) {
    if (after[i] > before[i])
      ++result[0];  // Fail
    else if (after[i] < before[i])
      ++result[2]; // Success
    else
      ++result[1]; // Neither
  }
  return result;
}

The helper display method is:

static void ShowVector(string pre, double[] v, int dec, string post)
{
  Console.Write(pre);
  for (int i = 0; i < v.Length; ++i)
    Console.Write(v[i].ToString("F" + dec) + " ");
  Console.WriteLine(post);
}

An alternative design is to combine the success-failure counting and binomial calculations into a larger meta-method.
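Such a combined method might be sketched as follows. The name SignTestDecrease is hypothetical, and the sketch bakes in two assumptions: a success is a decrease in the after-value, and ties are tossed out.

```csharp
using System;
using System.Numerics;

static class SignTestSketch
{
  static BigInteger Choose(int n, int k)
  {
    if (k > n - k) k = n - k; // Choose(n, k) = Choose(n, n-k)
    BigInteger ans = BigInteger.One;
    for (int i = 1; i <= k; ++i)
      ans = (ans * (n - k + i)) / i;
    return ans;
  }

  // One-sided sign test where a success is a decrease.
  // Returns the probability of k or more successes in n
  // non-tied pairs when p = 0.5.
  public static double SignTestDecrease(double[] before, double[] after)
  {
    if (before.Length != after.Length)
      throw new ArgumentException("Arrays must have the same length");
    int k = 0; // successes
    int n = 0; // non-tied pairs
    for (int i = 0; i < before.Length; ++i) {
      if (after[i] < before[i]) { ++k; ++n; }
      else if (after[i] > before[i]) ++n;
      // Equal values are ties and are ignored
    }
    double sum = 0.0;
    for (int i = k; i <= n; ++i)
      sum += (double)Choose(n, i) * Math.Pow(0.5, n);
    return sum;
  }

  static void Main()
  {
    double[] before = new double[] { 70, 80, 75, 85, 70, 75, 50, 60 };
    double[] after  = new double[] { 65, 78, 72, 87, 68, 74, 48, 63 };
    double pValue = SignTestDecrease(before, after);
    Console.WriteLine("Probability of 'no effect' is " + pValue.ToString("F4"));
  }
}
```

For the demo data the method returns 37 / 256, or about 0.1445. The trade-off is flexibility: with the definition of success and the tie policy fixed inside the method, a caller can no longer choose them.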

Wrapping Up

You should always interpret the results of a sign test cautiously. It’s better to say, “The sign test suggests that there is an effect,” rather than, “There is an effect.”

The example problem is called a one-sided, or one-tailed, test. Because this example involved a weight-loss experiment, an effect is more weight losses (successes) than you’d get by chance. You can also perform a two-sided, also called two-tailed, sign test. For example, suppose you’re doing an experiment with some sort of pain medication. As part of your experiment, you weigh your test subjects before and after the experiment. You have no reason to believe that the pain medication will affect weight. In other words, an effect would be either a weight loss or a weight gain.
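The article doesn't spell out how to compute a two-sided result. One common convention, assumed here, is to double the smaller of the two tail probabilities and cap the result at 1.0. The sketch below uses hypothetical names and is only one of several reasonable definitions.

```csharp
using System;
using System.Numerics;

static class TwoSidedSignTest
{
  static BigInteger Choose(int n, int k)
  {
    if (k > n - k) k = n - k; // Choose(n, k) = Choose(n, n-k)
    BigInteger ans = BigInteger.One;
    for (int i = 1; i <= k; ++i)
      ans = (ans * (n - k + i)) / i;
    return ans;
  }

  // Sum of P(X = i) for i = lo..hi when p = 0.5
  static double SumProbs(int lo, int hi, int n)
  {
    double sum = 0.0;
    for (int i = lo; i <= hi; ++i)
      sum += (double)Choose(n, i) * Math.Pow(0.5, n);
    return sum;
  }

  // Assumed convention: twice the smaller tail, capped at 1.0
  public static double TwoSided(int k, int n)
  {
    double right = SumProbs(k, n, n); // k or more successes
    double left = SumProbs(0, k, n);  // k or fewer successes
    return Math.Min(1.0, 2.0 * Math.Min(left, right));
  }

  static void Main()
  {
    Console.WriteLine("Two-sided result for k = 6, n = 8 is " +
      TwoSided(6, 8).ToString("F4"));
  }
}
```

For k = 6 and n = 8 the smaller tail is 37 / 256, so the two-sided result is 74 / 256, or about 0.289. That is twice the one-sided value, reflecting that a result this lopsided in either direction would count as an effect.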

The trickiest part of the sign test is keeping your definitions clear. There’s potential confusion because there are multiple symmetries in every problem. You can define a success as an increase or decrease in an after-value. For example, in the weight-loss example, a success is a decrease in the after-value. But if your data represents test scores on some kind of exam before and after studying, a success would likely be defined as an increase in the after-value.

The sign test is an example of what’s called a non-parametric statistical test. This means, in part, that the sign test does not make any assumptions about the distribution of the data being studied. Instead of using a sign test, it’s possible to use what’s called a paired t-test. However, the t-test assumes that the population data has a normal (Gaussian, bell-shaped) distribution, which would be almost impossible to verify with a small data set size. Because of this, when I want to investigate before and after data, I’ll usually use a sign test instead of a paired t-test.


Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Internet Explorer and Bing. Dr. McCaffrey can be reached at jammc@microsoft.com.

Thanks to the Microsoft technical experts who reviewed this article: Chris Lee and Kirk Olynyk

