This vignette provides an introduction to cregg, a package for analyzing and visualizing the results of conjoint experiments, which are factorial discrete choice experiments that are increasingly popular in the political and social sciences for studying decision making and preferences over multidimensional issues. cregg provides functionality that is useful for analyzing and otherwise examining data from these designs, namely:

  • Estimation of average marginal component effects (AMCEs) for fully randomized conjoint designs (as well as designs involving an unlimited number of two-way constraints between features) and munging of AMCE estimates into tidy data frames, via amce()
  • Calculation of marginal means (MMs) for conjoint designs and munging them into tidy data frames via mm()
  • Tabulation of display frequencies of feature attributes via cj_table() and cj_freqs() and cross-tabulation of feature restrictions using cj_props()
  • Tidying of raw “wide”-format conjoint survey datasets into “long” or “tidy” datasets using cj_tidy()

In addition, the package provides a number of tools that are likely useful to conjoint analysts:

  • ggplot2-based visualizations of AMCEs and MMs, via plot() methods for all of the above
  • Tidying of raw “wide”-format conjoint survey datasets into “long” or “tidy” datasets using cj_tidy()
  • Diagnostics to choose feature reference categories, via amce_by_reference()

To demonstrate package functionality, the package includes three example datasets:

  • taxes, a full randomized choice task conjoint experiment conducted by Ballard-Rosa et al. (2016)
  • immigration, a partial factorial conjoint experiment with several restrictions between features conducted by Hainmueller, Hopkins, and Yamamoto (2014)
  • conjoint_wide, a simulated “wide”-format conjoint dataset that is used to demonstrate functionality of cj_tidy()

The design of cregg follows a few key principles:

  • Following tidy data principles throughout, so that all of the main functions produce consistently structured, metadata-rich data frames. Thus the response from any function is a tidy data frame that can easily be stacked with others (e.g., for computing AMCEs for subsets of respondents) and then producing ggplot2 visualizations.
  • A formula-based interface that meshes well with the underlying survey-based effect estimation API.
  • A consistent interface for both unconstrained and two-way constrained designs that relies only on formula notation without any package-specific “design” specification. Conjoint designs involving two-way constraints between features are easily supported using simple formula notation: Y ~ A + B + C implies an unconstrained design, while Y ~ A * B + C implies a constraint between levels of features A and B. cregg figures out the constrained level pairs automatically without needing to further specify them explicitly.

cregg also provides some sugar:

  • Using “label” attributes on variables to provide pretty printing, with options to relabel features or plots on the fly
  • Using factor base levels rather than trying to set baseline levels atomically
  • A convenient API (via the cj(..., by = ~ group) idiom) for repeated, subgroup operations without the need for lapply() or for loops
  • All functions have arguments in data-formula order, making it simple to pipe into them via the magrittr pipe (%>%).

The package, whose primary point of contact is cj(), takes its name from the surname of a famous White House Press Secretary.

Contributions and feedback are welcome on GitHub.

Code Examples

The package includes several example conjoint datasets, which are used here and in examples:


Marginal Means

The package provides straightforward calculation and visualization of descriptive marginal means (MMs). These represent the mean outcome across all appearances of a particular conjoint feature level, averaging across all other features. In forced choice conjoint designs with two profiles per choice task, MMs by definition average 0.5 with values above 0.5 indicating features that increase profile favorability and values below 0.5 indicating features that decrease profile favorability. (They will average 0.33 in designs with three profiles, 0.25 with four profiles, etc.) For continuous/ordinal outcomes, MMs can take any value in the full range of the outcome. Calculation of MMs entail no modelling assumptions are simply descriptive quantities of interest:

# descriptive plotting
f1 <- ChosenImmigrant ~ Gender + LanguageSkills + PriorEntry + Education * Job + CountryOfOrigin * ReasonForApplication + 
    JobExperience + JobPlans
plot(mm(immigration, f1, id = ~CaseID), vline = 0.5)

cregg functions use attr(data$feature, "label") to provide pretty printing of feature labels, so that variable names can be arbitrary. These can be overwritten using the feature_labels argument to override these settings within cregg functions. (To overwrite/create feature labels permanently, use for example attr(immigration$LanguageSkills, "label") <- "English Proficiency".) Feature levels are always deduced from the levels() of righthand-side variables in the model specification. All variables should be factors with levels in desired display order. Similarly, the plotted order of features is given by the order of terms in the RHS formula unless overridden by the order of variable names given in feature_order.

Average Marginal Component Effects

A more common analytic approach for conjoints is to estimate average marginal component effects (AMCEs) using some form of regression analysis. cregg uses glm() and svyglm() to perform estimation and margins to generate average marginal effect estimates. Designs can be specified with any interactions between conjoint features but only AMCEs are returned. Any terms that linked by a * in the formula are treated as design constraints and AMCEs are estimated cognizant of these constraints; only two-way interactions are supported, however. Just like for mm(), the output of cj() (or its alias, amce()) is a tidy data frame:

# estimation
amces <- cj(immigration, f1, id = ~CaseID)
head(amces[c("feature", "level", "estimate", "std.error")], 20L)
                  feature                    level     estimate   std.error
1                  Gender                   Female  0.000000000          NA
2                  Gender                     Male -0.026023159 0.008012413
3         Language Skills           Fluent English  0.000000000          NA
4         Language Skills           Broken English -0.056319723 0.011312971
5         Language Skills Tried English but Unable -0.126359527 0.011370259
6         Language Skills         Used Interpreter -0.159740917 0.011589128
7             Prior Entry                    Never  0.000000000          NA
8             Prior Entry          Once as Tourist  0.055954954 0.012463168
9             Prior Entry    Many Times as Tourist  0.054748425 0.012912603
10            Prior Entry   Six Months with Family  0.075317887 0.012603718
11            Prior Entry   Once w/o Authorization -0.110084275 0.013026767
12 Educational Attainment                No Formal  0.000000000          NA
13 Educational Attainment                4th Grade  0.033068508 0.014957991
14 Educational Attainment                8th Grade  0.057744013 0.014961024
15 Educational Attainment              High School  0.119483476 0.015101284
16 Educational Attainment         Two-Year College  0.163405352 0.023021124
17 Educational Attainment           College Degree  0.190036705 0.023063509
18 Educational Attainment          Graduate Degree  0.176068029 0.016690780
19                    Job                  Janitor  0.000000000          NA
20                    Job                   Waiter -0.006814709 0.016856109

This makes it very easy to modify, combine, print, etc. the resulting output. It also makes it easy to visualize using ggplot2. A convenience visualization function is provided:

# plotting of AMCEs

Reference Category Diagnostics for AMCEs

Reference categories for AMCEs are often arbitrary and can affect intuitions about results, so the package also provides a diagnostic tool for helping to decide on an appropriate reference category:

amce_diagnostic <- amce_by_reference(immigration, ChosenImmigrant ~ LanguageSkills, ~LanguageSkills, id = ~CaseID)
plot(amce_diagnostic, group = "REFERENCE", legend_title = "Reference Category")