Quality Improvement Charts

An implementation of statistical process control charts for R

Jacob Anhøj


The qicharts2 package contains two main functions for analysis and visualisation of data for continuous quality improvement: qic() and paretochart().

qic() provides a simple interface for constructing statistical process control (SPC) charts for time series data.

paretochart() constructs Pareto charts from categorical variables.

Statistical Process Control is not about statistics, it is not about “process-hyphen-control”, and it is not about conformance to specifications. […] It is about the continual improvement of processes and outcomes. And it is, first and foremost, a way of thinking with some tools attached.

Donald J. Wheeler. Understanding Variation, 2. ed., p 152

This vignette will teach you the easy part of SPC, the tools, as implemented by qicharts2 for R. I strongly recommend that you also study the way of thinking, for example by reading Wheeler’s excellent book.


Install stable version from CRAN:


Or install development version from GitHub:

devtools::install_github('anhoej/qicharts2', build_vignettes = TRUE)

Your first run and control charts

Load qicharts2 and generate some random numbers to play with:

# Load qicharts2

# Lock random number generator to make examples reproducible.

# Generate 24 random numbers from a normal distribution.
y <- rnorm(24)

Make a run chart:


Make an I control chart:

qic(y, chart = 'i')

Run charts and control charts are point-and-line graphs showing measures or counts over time. A horizontal line represents the process centre expressed by either the median (run charts) or the mean (control charts). In addition, control charts visualise the limits of the natural variation inherent in the process. Because of the way these limits are computed, they are often referred to as 3-sigma limits (see Appendix 1). Other terms for 3-sigma limits are control limits and natural process limits.

Understanding variation

SPC is the application of statistical thinking and statistical tools to continuous quality improvement. Central to SPC is the understanding of process variation over time.

The purpose of analysing process data is to know when a change occurs so that we can take appropriate action. However, numbers may change even if the process stays the same (and vice versa). So how do we distinguish changes in numbers that represent change of the underlying process from those that are essentially noise?

Walther A. Shewhart, who founded SPC, described two types of variation, chance cause variation and assignable cause variation. Today, the terms common cause and special cause variation are commonly used.

Common cause variation

Special cause variation

A process will be said to be predictable when, through the use of past experience, we can describe, at least within limits, how the process will behave in the future.

Donald J. Wheeler. Understanding Variation, 2. ed., p 153

The overall purpose of SPC charts is to tell the two types of variation apart.

Testing for special cause variation

As expected from random numbers, the previous run and control charts exhibit common cause variation. Special cause variation, if present, may be found by statistical tests to identify patterns in data that are unlikely to be the result of pure chance.

Shewhart’s 3-sigma rule

Shewhart devised one test to identify special cause variation, the 3-sigma rule, which signals if one or more data points fall outside the 3-sigma limits. The 3-sigma rule is effective in detecting larger (> 2SD) shifts in data.

# Simulate a special cause to y in the form of a large, transient shift of 4 standard deviations.
y[22] <- 4

qic(y, chart = 'i')

However, minor to moderate shifts may go unnoticed by the 3-sigma rule for long periods of time.

# Simulate a special cause to y in the form of a moderate, persistent shift of 2 standard deviations.
y[13:24] <- rnorm(12, mean = 2)

qic(y, chart = 'i')

Therefore, many additional tests have been proposed to increase the sensitivity to non-random variation.

It is important to realise that while it may be tempting to apply more tests in order to increase the sensitivity of the chart, more test will increase the risk of false signals, thereby reducing the specificity of the chart. The choice of which and how many tests to apply is therefore critical.

The Western Electric rules

The best known tests for non-random variation are probably the Western Electric Rules described in the Statistical Quality Control Handbook (Western Electric Company 1956). The WE rules consist of four simple tests that can be applied to control charts by visual inspection and are based on the identification of unusual patterns in the distribution of data points relative to the control and centre lines.

  1. One or more points outside of the 3-sigma limits (Shewhart’s original 3-sigma rule).

  2. Two out of three successive points beyond a 2-sigma limit (two thirds of the distance between the centre line and the control line).

  3. Four out of five successive points beyond a 1-sigma limit.

  4. A run of eight successive points on one side of the centre line.

The WE rules have proven their worth during half a century. One thing to notice is that the WE rules are most effective with control charts that have between 20 and 30 data points. With fewer data points, they lose sensitivity (more false negatives), and with more data points they lose specificity (more false positives).

The Anhøj rules

The least known tests for non-random variation are probably the Anhøj rules, proposed and validated by me in two recent publications (Anhøj 2014, Anhøj 2015). The Anhøj rules are the core of qicharts2, and consist of two tests that are based solely on the distribution of data points in relation to the centre line:

Critical values for longest runs and number of crossings for 10 – 100 data points are tabulated in Appendix 2.

Apart from being comparable in sensitivity and specificity to WE rules 2-4 with 20-30 data points, the Anhøj rules have some advantages:

In the previous control chart, a shift of 2 standard deviations was introduced halfway through the process. While the shift is too small to be picked up by the 3-sigma rule, both Anhøj rules detect the signal. The longest run includes 13 data points (upper limit = 8), and the number of crossings is 4 (lower limit = 8). Note that the shift is also picked up by WE rule 4.

Non-random variation found by the Anhøj rules is visualised by qic() using a red and dashed centre line.

The parameters of the analysis are available using the summary() function on a qic object. See ?summary.qic for details.

# Save qic object to o
o <- qic(y, chart = 'i')

# Print summary data frame
#>   facet1 facet2 part n.obs n.useful longest.run longest.run.max n.crossings
#> 1      1      1    1    24       24          13               8           4
#>   n.crossings.min runs.signal      aLCL   aLCL.95       CL  aUCL.95     aUCL
#> 1               8           1 -2.114884 -1.057559 1.057091 3.171742 4.229067
#>   sigma.signal
#> 1            0

Many additional tests for special cause variation have been proposed since Shewhart invented the 3-sigma rule, and many tests are currently being used. However, the more tests that are applied to the analysis, the higher the risk of false signals. qicharts2 takes a conservative approach by applying only three tests, the 3-sigma rule and the two Anhøj rules. That means that a signal from qic() most likely represents a significant change in the underlying process.

Choosing the right control chart

By default, qic() produces run charts. To make a control chart, the chart argument must be specified.

While there is only one type of run chart, there are many types of control charts. Each type uses an assumed probability model to compute the 3-sigma limits.

qic() includes eleven control charts.

Chart Description Assumed distribution
Charts for measures
I Individual measurements Gaussian
MR Moving ranges of individual measurements Gaussian
Xbar Sample means Gaussian
S Sample standard deviations Gaussian
T Time between rare events Exponential
Charts for counts
C Defect counts Poisson
U Defect rates Poisson
U’ U chart using Laney’s correction for large sample sizes Mixed
P Proportion of defectives Binomial
P’ P chart using Laney’s correction for large sample sizes Mixed
G Opportunities between rare events Geometric

I recommend to always begin any analysis with the “assumption free” run chart. If the Anhøj rules find non-random variation, it makes no sense to compute 3-sigma limits based on assumptions regarding the location and dispersion of data, because these figures are meaningless when one or more shifts have already been identified.

Over the years, I have developed an increasing affection for the much-neglected run chart: a time plot of your process data with the median drawn in as a reference (yes, the median – not the average). It is “filter No. 1” for any process data and answers the question: “Did this process have at least one shift during this time period?” (This is generally signaled by a clump of eight consecutive data points either all above or below the median.)

If it did, then it makes no sense to do a control chart at this time because the overall average of all these data doesn’t exist. (Sort of like: If I put my right foot in a bucket of boiling water and my left foot in a bucket of ice water, on average, I’m pretty comfortable.)

Davis Balestracci. Quality Digest 2014

Case 1: Patient harm

The gtt dataset contains data on patient harm during hospitalisation. Harm is defined as any unintended physical injury resulting from treatment or care during hospitalisation and not a result of the disease itself. The Global Trigger Tool is a methodology for detection of adverse events in hospital patients using a retrospective record review approach. Each month a random sample of 20 patient records is reviewed and the number of adverse events and patient days found in the sample are counted. See ?gtt for details.

#>   admission_id admission_dte discharge_dte      month days harms              E
#> 1            1    2010-01-02    2010-01-03 2010-01-01    2     0           <NA>
#> 2            2    2009-12-24    2010-01-06 2010-01-01   14     2 Pressure ulcer
#> 3            3    2009-12-28    2010-01-07 2010-01-01   11     0           <NA>
#> 4            4    2010-01-05    2010-01-08 2010-01-01    4     0           <NA>
#> 5            5    2010-01-05    2010-01-08 2010-01-01    4     0           <NA>
#> 6            6    2010-01-05    2010-01-11 2010-01-01    7     0           <NA>
#>                  F    G    H    I
#> 1             <NA> <NA> <NA> <NA>
#> 2 Gastrointestinal <NA> <NA> <NA>
#> 3             <NA> <NA> <NA> <NA>
#> 4             <NA> <NA> <NA> <NA>
#> 5             <NA> <NA> <NA> <NA>
#> 6             <NA> <NA> <NA> <NA>

Run chart of adverse events per 1000 patient days

qic(month, harms, days,
    data     = gtt,
    multiply = 1000,
    title    = 'Patient harm',
    ylab     = 'Adverse events per 1000 patient days',
    xlab     = 'Month')

Since only random variation is present in the run chart, a U control chart for harm rate is appropriate for computing the limits of the common cause variation.

U chart of adverse events per 1000 patient days

qic(month, harms, days,
    data     = gtt,
    chart    = 'u',
    multiply = 1000,
    title    = 'Patient harm (U chart)',
    ylab     = 'Adverse events per 1000 patient days',
    xlab     = 'Month')

Because the sample sizes (number of patient days) are not the same each month, the 3-sigma limits vary. Large samples give narrow limits and vice versa. Varying limits are also seen in P and Xbar charts.

Run and P control charts of percent harmed patients

To plot the percentage of harmed patients (patients with 1 or more harms) we need to add a variable and to aggregate data by month.


gtt_by_month <- gtt %>%
  mutate(harmed = harms > 0) %>% 
  group_by(month) %>% 
  summarise(harms    = sum(harms),
            days     = sum(days),
            n.harmed = sum(harmed),
            n        = n())

#> # A tibble: 6 × 5
#>   month      harms  days n.harmed     n
#>   <date>     <dbl> <dbl>    <int> <int>
#> 1 2010-01-01     5   113        3    20
#> 2 2010-02-01     7   132        6    20
#> 3 2010-03-01     5   121        4    20
#> 4 2010-04-01     7   134        5    20
#> 5 2010-05-01     7   116        7    20
#> 6 2010-06-01     5   131        5    20
qic(month, n.harmed,
    n         = n,
    data      = gtt_by_month,
    y.percent = TRUE,
    title     = 'Harmed patients',
    ylab      = NULL,
    xlab      = 'Month')

Again, since the run chart finds random variation, a control chart may be applied. For proportion (or percent) harmed patients we use a P chart.

qic(month, n.harmed, n,
    data      = gtt_by_month,
    chart     = 'p',
    y.percent = TRUE,
    title     = 'Harmed patients (P chart)',
    ylab      = NULL,
    xlab      = 'Month')

Since both the harm rate and the proportion of harmed patients exhibit common cause variation only, we conclude that the risk of adverse events during hospitalisation is predictable: we in the future should expect around 60 adverse events per 1000 patient days, harming around 33% of patients (regardless of their length of stay), although harm rates anywhere between ~0 – 60% would be considered within common cause variation.

Actually, we often do not need to aggregate data in advance. With a little creativity we can make qic do the hard work. The previous plot could have been produced with this code. Note the use of the original gtt dataset and the number of harmed and total number of patients created on the fly.

    harms > 0,        # TRUE if patient had one or more harms
    days > 0,         # Always TRUE 
    data      = gtt,
    chart     = 'p',
    y.percent = TRUE,
    title     = 'Harmed patients (P chart)',
    ylab      = NULL,
    xlab      = 'Month')

Pareto analysis of adverse event types and severity

The Pareto chart, named after Vilfred Pareto, was invented by Joseph M. Juran as a tool to identify the most important causes of a problem.

Types and severity of adverse events in gtt are organised in an untidy wide format. Since paretochart() needs a vector of categorical data, we need to rearrange harm severity and harm category into separate columns.


gtt_ae_types <- gtt %>%
  gather(severity, category, E:I) %>% 

#>   admission_id admission_dte discharge_dte      month days harms severity
#> 1            2    2009-12-24    2010-01-06 2010-01-01   14     2        E
#> 2           11    2010-01-08    2010-01-18 2010-01-01   11     2        E
#> 3           12    2010-01-11    2010-01-19 2010-01-01    9     1        E
#> 4           29    2010-01-27    2010-02-10 2010-02-01   15     1        E
#> 5           41    2010-02-24    2010-03-01 2010-03-01    6     1        E
#> 6           45    2010-02-22    2010-03-06 2010-03-01   13     1        E
#>           category
#> 1   Pressure ulcer
#> 2        Infection
#> 3   Pressure ulcer
#> 4   Pressure ulcer
#> 5 Gastrointestinal
#> 6 Gastrointestinal
            title = 'Pareto chart of harm category')

The bars show the number in each category, and the curve shows the cumulated percentage over categories. Almost 80% of harms come from 3 categories: gastrointestinal, infection, and procedure.

            title = 'Pareto chart of harm severity')

Almost all harms are temporary (E-F).

Case 2: Clostridium difficile infections

The cdi dataset contains data on hospital acquired Clostridium difficile infections (CDI) before and after an intervention to reduce the risk of CDI. See ?cdi for details.

#>        month  n     days period notes
#> 1 2012-11-01 17 14768.42    pre  <NA>
#> 2 2012-12-01 12 14460.33    pre  <NA>
#> 3 2013-01-01 27 16176.92    pre  <NA>
#> 4 2013-02-01 20 14226.50    pre  <NA>
#> 5 2013-03-01 20 14816.25    pre  <NA>
#> 6 2013-04-01 18 14335.29    pre  <NA>

Run charts for number of infections

We begin by plotting a run chart of the number of CDIs per month.

qic(month, n,
    notes    = notes,
    data     = cdi,
    decimals = 0,
    title    = 'Hospital acquired Clostridium difficile infections',
    ylab     = 'Count',
    xlab     = 'Month')

A downward shift in the number of CDIs is clearly visible to the naked eye and is supported by the runs analysis resulting in the centre line being dashed red. The shift seem to begin around the time of the intervention.

Even though it is not necessary in this case, we can strengthen the analysis by using the median of the before-intervention period to test for non-random variation in the after period.

qic(month, n,
    data        = cdi,
    decimals    = 0,
    freeze      = 24,
    part.labels = c('Baseline', 'Intervention'),
    title       = 'Hospital acquired Clostridium difficile infections',
    ylab        = 'Count',
    xlab        = 'Month')

The median number of CDIs per month in the before period is 19. An unusually long run of 15 data points below the median proves that the CDI rate is reduced.

When a shift in the desired direction is the result of a deliberate change to the process, we may split the chart to compare the numbers before and after the intervention.

qic(month, n,
    data     = cdi,
    decimals = 0,
    part     = 24,
    title    = 'Hospital acquired Clostridium difficile infections',
    ylab     = 'Count',
    xlab     = 'Month')

C chart for number of infections

Since both periods show only random variation, a control chart may be applied to test for larger transient shifts in data and to establish the natural limits of the current process. The correct chart in this case is the C chart for number of infections.

qic(month, n,
    data  = cdi,
    chart = 'c',
    part  = 24,
    title = 'Hospital acquired Clostridium difficile infections (C chart)',
    ylab  = 'Count',
    xlab  = 'Month')

The split C chart shows that the CDI rate has dropped from an average of 19 to 7 per month. Furthermore, the 3-sigma limits tell that the current process is predictable and that we in the future should expect between 0 and 15 CDIs per month.

U chart for infection rate

Until now we have assumed that the area of opportunity, or exposure, is constant, i.e., the number of patients or patient days have not changed significantly over time. When the area of opportunity varies, a U chart (U for unequal area of opportunity) is more appropriate.

qic(month, n, days,
    data     = cdi,
    chart    = 'u',
    part     = 24,
    multiply = 10000, 
    title    = 'Hospital acquired Clostridium difficile infections (U chart)',
    ylab     = 'Cases per 10,000 risk days',
    xlab     = 'Month')

In fact, U charts can always be used instead of C charts. When the area of opportunity is constant the two charts will reach the same conclusion. But the raw numbers in a C chart may be easier to communicate than counts per something.

It is very important to recognise that while run and control charts can identify non-random variation, they cannot identify its causes. The analysis only tells that the CDI rate was reduced after an intervention. The causal relationship between the intervention and the shift must be established using other methods.

When should you recompute the limits?

When splitting a chart as in the example above you make a deliberate decision based on your understanding of the process. Some software packages offer automated recalculation of limits whenever a shift is detected. I find this approach very wrong. Limits are the voice of the process and recomputing limits without understanding the context, which computers don’t, is like telling someone to shut up, when they are really trying to tell you something important.

Wheeler suggests that limits may be recomputed when the answers to all four of these question is “yes” (Wheeler 2010, p. 224):

If the answer to any of these question is “no”, you should rather seek to find the assignable cause of the shift and, if need be, eliminate it.

Thus, when a deliberate change has been made to the process and the chart finds a shift attributable to this change, it may be time to recompute the limits.

Case 3: Coronary artery bypass graft

The cabg dataset contains data on individual coronary artery bypass operations. See ?cabg for details.

#>         date    age gender los death readmission
#> 1 2011-07-01 72.534   Male   8 FALSE        TRUE
#> 2 2011-07-01 59.225   Male   9 FALSE       FALSE
#> 3 2011-07-04 66.841   Male  15 FALSE       FALSE
#> 4 2011-07-04 70.337   Male   9 FALSE       FALSE
#> 5 2011-07-04 60.816   Male   6 FALSE        TRUE
#> 6 2011-07-05 57.027   Male   9 FALSE       FALSE

I and MR charts for age of individual patients

In this and the following cases we skip the obligatory first step of using a run chart.

First, we will make a control chart of the age of the last 100 patients. Since we are plotting individual measurements, we use an I chart.

    data      = tail(cabg, 100), 
    chart     = 'i',
    show.grid = TRUE,
    title     = 'Age of the last 100 patients (I chart)',
    ylab      = 'Years',
    xlab      = 'Patient #')

Two data points (patients number 45 and 70) are below the lower 3-sigma limit indicating that these patients are “unusually” young, i.e. representative of phenomena that are not normally present in the process.

I charts are often displayed along moving range charts that show the ranges of neighbouring data points.

    data      = tail(cabg, 100), 
    chart     = 'mr',
    show.grid = TRUE,
    title     = 'Age of the last 100 patients (MR chart)',
    ylab      = 'Years',
    xlab      = 'Patient #')

The MR chart identifies three unusually large ranges, which are clearly produced by the two unusually young patients also found in the I chart. Note that each data point in an I chart is contained in two moving ranges in the MR chart.

We should seek the cause(s) of these special causes. We may then chose to eliminate the outliers from the computations of the process centre and 3-sigma limits in order to predict the expected age range of future patients.

    data    = tail(cabg, 100), 
    chart   = 'i',
    exclude = c(45, 70),
    title   = 'Age of the last 100 patients (I chart)',
    ylab    = 'Years',
    xlab    = 'Patient #')

Thus, on average we should expect future patients to be 67 years of age, and patients younger than 43 years or older than 91 years would be unusual.

Xbar and S charts for average and standard deviation of patient age

We could also plot the mean age of patients by month. For this we choose an Xbar chart for multiple measurement per subgroup. First, we need to add month as the grouping variable.

cabg <- cabg %>% 
  mutate(month = as.Date(cut(date, 'month')))

Using month as the x axis variable, qic() automatically calculates the average age per month.

qic(month, age,
    data  = cabg,
    chart = 'xbar',
    title = 'Average age (Xbar chart)',
    ylab  = 'Years',
    xlab  = 'Month')

Xbar charts are usually displayed along S charts displaying the within subgroup standard deviation.

qic(month, age,
    data  = cabg,
    chart = 's',
    title = 'Age standard deviation (S chart)',
    ylab  = 'Years',
    xlab  = 'Month')

Since both the Xbar and the S chart display common cause variation, we may conclude that the monthly average and standard deviation of patient age is predictable. Note that the mean of the Xbar chart is close to the mean of the individuals chart but that the 3-sigma limits are narrower.

P charts for proportion of patients who were readmitted or died within 30 after surgery

To plot readmissions and mortality we first need to summarise these counts by month.

cabg_by_month <- cabg %>% 
  group_by(month) %>% 
  summarise(deaths       = sum(death),
            readmissions = sum(readmission),
            n            = n())

#> # A tibble: 6 × 4
#>   month      deaths readmissions     n
#>   <date>      <int>        <int> <int>
#> 1 2011-07-01      1           14    52
#> 2 2011-08-01      3           12    64
#> 3 2011-09-01      2           15    70
#> 4 2011-10-01      1            8    60
#> 5 2011-11-01      1           16    67
#> 6 2011-12-01      3           11    69

Then we use P charts for proportion of readmissions.

qic(month, readmissions, n,
    data      = cabg_by_month,
    chart     = 'p',
    y.percent = TRUE,
    title     = 'Readmissions within 30 days (P chart)',
    ylab      = NULL,
    xlab      = 'Month')

On average, 22% of patients are readmitted within 30 days after surgery, and if nothing changes we would expect this to continue.

What about mortality?

qic(month, deaths, n,
    data      = cabg_by_month,
    chart     = 'p',
    y.percent = TRUE,
    title     = '30 days mortality (P chart)',
    ylab      = NULL,
    xlab      = 'Month')

Thankfully, post-surgery mortality is a rare event, and the P chart suggests common cause variation. However, it seems curious that only one patient died during the summer of 2012. To investigate this further, G or T charts for rare events are useful.

T and G charts for rare events

Rare events are often better analysed as time or opportunities between events. First, we need to calculate the number of surgeries and days between fatalities.

fatalities <- cabg %>% 
  mutate(x = row_number()) %>% 
  filter(death) %>% 
  mutate(dd = x - lag(x),
         dt = date - lag(date))

#>         date    age gender los death readmission      month   x dd      dt
#> 1 2011-07-21 91.227 Female   5  TRUE       FALSE 2011-07-01  37 NA NA days
#> 2 2011-08-06 71.773   Male   2  TRUE       FALSE 2011-08-01  60 23 16 days
#> 3 2011-08-26 73.942   Male   1  TRUE       FALSE 2011-08-01  99 39 20 days
#> 4 2011-08-31 80.293   Male   3  TRUE       FALSE 2011-08-01 114 15  5 days
#> 5 2011-09-13 75.359   Male  12  TRUE       FALSE 2011-09-01 148 34 13 days
#> 6 2011-09-14 72.019 Female  19  TRUE       FALSE 2011-09-01 149  1  1 days

Then we use a G chart for opportunities (surgeries) between events (deaths).

    data  = fatalities,
    chart = 'g',
    title = 'Surgeries between deaths (G chart)',
    ylab  = 'Count',
    xlab  = 'Death #')

If we do not have information on the opportunities (surgeries with survival), we may instead use a T chart for time between events.

    data  = fatalities,
    chart = 't',
    title = 'Days between deaths (T chart)',
    ylab  = 'Days',
    xlab  = 'Death #')

The conclusions from the two charts are the same: Deaths number 24 and 25 are unusually far apart. We should look for an assignable cause to explain this as it may help us get ideas to reduce postoperative mortality in the future.

Faceting readmission rates by gender

To demonstrate the use of faceted charts, we summarise readmissions by month and gender.

cabg_by_month_gender <- cabg %>% 
  group_by(month, gender) %>% 
  summarise(readmissions = sum(readmission),
            n            = n())
#> `summarise()` has grouped output by 'month'. You can override using the
#> `.groups` argument.

#> # A tibble: 6 × 4
#> # Groups:   month [3]
#>   month      gender readmissions     n
#>   <date>     <fct>         <int> <int>
#> 1 2011-07-01 Female            2     9
#> 2 2011-07-01 Male             12    43
#> 3 2011-08-01 Female            0     9
#> 4 2011-08-01 Male             12    55
#> 5 2011-09-01 Female            3    18
#> 6 2011-09-01 Male             12    52

Then we use the facets argument to make a control chart for each gender while keeping the x and y axes the same.

qic(month, readmissions, n,
    data      = cabg_by_month_gender,
    facets    = ~ gender, 
    chart     = 'p',
    y.percent = TRUE,
    title     = 'Readmissions within 30 days (P chart)',
    ylab      = '',
    xlab      = 'Month')