New function permshap() to calculate exact permutation SHAP values. The function currently works for up to 14 features.
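A minimal usage sketch (fit, X, and bg_X are placeholders as in the kernelshap() examples further below; the argument order and the S component are assumed to mirror kernelshap()):
ps <- permshap(fit, X, bg_X)  # exact permutation SHAP values
ps$S                          # matrix (or list) of SHAP values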
The S and SE lists now use feature_names as dimnames (https://github.com/ModelOriented/kernelshap/issues/96).
Removed the ks_extract() function. It was designed to extract objects like the matrix S of SHAP values from the resulting “kernelshap” object x. We feel that the standard extraction options (x$S, x[["S"]], or getElement(x, "S")) are sufficient.
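For illustration, assuming ks is the object returned by kernelshap(), the three options are equivalent:
S <- ks$S                 # dollar extraction
S <- ks[["S"]]            # double-bracket extraction
S <- getElement(ks, "S")  # getElement()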
[…] X, and \(K\) is the dimension of a single prediction (usually 1).
verbose = FALSE no longer suppresses the warning on too large background data. Use suppressWarnings() instead.
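A hedged sketch (fit, X, and the large background data bg_big are placeholders):
ks <- suppressWarnings(kernelshap(fit, X, bg_X = bg_big))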
Bug fix: when bg_X contained more columns than X, inflexible prediction functions could fail when applied to bg_X.
New argument feature_names allows specifying the features to calculate SHAP values for. The default equals colnames(X). This should be changed only in situations when X (the dataset to be explained) contains non-feature columns (see the sketch below).
Thanks to David Watson, exact calculations are now also possible for \(p > 5\) features. By default, the algorithm uses exact calculations for \(p \le 8\) and a hybrid strategy otherwise, see the next section. At the same time, the exact algorithm became much more efficient.
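Relating to the feature_names argument above, a hedged sketch (the non-feature column "id" is hypothetical):
ks <- kernelshap(fit, X, bg_X, feature_names = setdiff(colnames(X), "id"))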
A word of caution: Exact calculation means creating \(2^p-2\) on-off vectors \(z\) (cheap step) and evaluating the model on a whopping \((2^p-2)N\) rows, where \(N\) is the number of rows of the background data (expensive step). As this explodes with large \(p\), we do not recommend the exact strategy for \(p > 10\).
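To make the explosion concrete, a quick back-of-the-envelope calculation (the background size N = 100 is an arbitrary choice):
p <- c(5, 10, 15)
N <- 100
(2^p - 2) * N  # rows to predict: 3000, 102200, 3276600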
The iterative Kernel SHAP sampling algorithm of Covert and Lee (2021) [1] works by randomly sampling \(m\) on-off vectors \(z\) so that their sum follows the SHAP Kernel weight distribution (renormalized to the range from \(1\) to \(p-1\)). Based on these vectors, many predictions are formed. Then, Kernel SHAP values are derived as the solution of a constrained linear regression, see [1] for details. This is done multiple times until convergence.
A drawback of this strategy is that many (at least 75%) of the \(z\) vectors will have \(\sum z \in \{1, p-1\}\), producing many duplicates. Similarly, at least 92% of the mass will be used for the \(p(p+1)\) possible vectors with \(\sum z \in \{1, 2, p-2, p-1\}\) etc. This inefficiency can be fixed by a hybrid strategy, combining exact calculations with sampling. The hybrid algorithm has two steps: an exact part covering the on-off vectors with the largest weights, and a sampling part for the remaining mass.
The default behaviour of kernelshap() is as follows: exact calculations for \(p \le 8\), a hybrid of degree 2 for \(9 \le p \le 16\), and a hybrid of degree 1 for larger \(p\).
It is also possible to use a pure sampling strategy, see Section “User visible changes” below. While this is usually not advisable compared to a hybrid approach, the options of kernelshap() allow studying different properties of Kernel SHAP and doing empirical research on the topic.
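For instance, a pure sampling run could be requested like this (a hedged sketch; fit, X, and bg_X are placeholders):
ks_sampling <- kernelshap(fit, X, bg_X, exact = FALSE, hybrid_degree = 0)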
Kernel SHAP in the Python implementation “shap” uses a quite similar hybrid strategy, but without iterating. The new logic in the R package thus combines the efficiency of the Python implementation with the convergence monitoring of [1].
[1] Ian Covert and Su-In Lee. Improving KernelSHAP: Practical Shapley Value Estimation Using Linear Regression. Proceedings of The 24th International Conference on Artificial Intelligence and Statistics, PMLR 130:3457-3465, 2021.
The default value of m is reduced from \(8p\) to \(2p\), except when hybrid_degree = 0 (pure sampling).
The default of exact is now TRUE for \(p \le 8\) instead of \(p \le 5\).
A new argument hybrid_degree is introduced to control the exact part of the hybrid algorithm. The default is 2 for \(4 \le p \le 16\) and 1 otherwise. Set it to 0 to force a pure sampling strategy (not recommended, but useful to demonstrate the superiority of hybrid approaches).
The default tol was reduced from 0.01 to 0.005.
max_iter was reduced from 250 to 100.
[…] m.
print() is now slimmer.
A new summary() function shows more info.
The resulting object now contains m_exact (the number of on-off vectors used for the exact part), prop_exact (the proportion of mass treated in exact fashion), the exact flag, and txt (the info message shown when starting the algorithm).
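Assuming ks is such a resulting object (the name is a placeholder), the new components can be inspected directly:
ks$m_exact     # number of on-off vectors used for the exact part
ks$prop_exact  # proportion of mass treated in exact fashion
ks$exact       # exact flag
ks$txt         # info message shown when the algorithm starts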
Bug fix: predictions of mgcv::gam() would cause an error in check_pred() (they are 1D-arrays).
The interface of kernelshap() has been revised. Instead of specifying a prediction function, it now suffices to pass the fitted model object. The default pred_fun is now stats::predict, which works in most cases. Some other cases are caught via the model class (“ranger” and mlr3 “Learner”). The pred_fun can be overwritten by a function of the form function(object, X, ...). Additional arguments to the prediction function are passed via ... of kernelshap().
Some examples:
kernelshap(fit, X, bg_X)  # default: stats::predict() is used on the fitted model
kernelshap(fit, X, bg_X, type = "response")  # extra arguments are passed on via ...
kernelshap(fit, X, bg_X, pred_fun = function(m, X) exp(predict(m, X)))  # custom prediction function
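As a further illustration of the class-based dispatch mentioned above (“ranger”, mlr3 “Learner”), a hedged sketch with a ranger model; the dataset and settings are arbitrary:
library(ranger)
library(kernelshap)
fit <- ranger(Sepal.Length ~ ., data = iris, num.trees = 50)
ks <- kernelshap(fit, X = iris[, -1], bg_X = iris[, -1])  # no pred_fun needed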
kernelshap() has received a more intuitive interface, see the breaking change above.
Parallel computing is now supported: register a parallel backend before calling kernelshap(), e.g., using the “doFuture” package, and then set parallel = TRUE. Especially on Windows, sometimes not all global variables or packages are loaded in the parallel instances. These can be specified by parallel_args, a list of arguments passed to foreach().
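A hedged sketch of such a setup (the backend choice and the .packages argument are illustrative only):
library(doFuture)
registerDoFuture()
plan(multisession)  # parallel backend from the future package
ks <- kernelshap(
  fit, X, bg_X,
  parallel = TRUE,
  parallel_args = list(.packages = "ranger")  # e.g., load ranger on the workers
)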
kernelshap() has become much faster.
Besides matrix, data.frames, and tibbles, the package now also accepts data.tables (if the prediction function can deal with them).
kernelshap() is less picky regarding the output structure of pred_fun().
kernelshap() is less picky about the column structure of the background data bg_X. It should simply contain the columns of X (but can have more, or in a different order). The old behaviour was to throw an error if colnames(X) != colnames(bg_X).
The default m = "auto" has been changed from trunc(20 * sqrt(p)) to max(trunc(20 * sqrt(p)), 5 * p). This will have an effect in cases where the number of features \(p > 16\). The change implies more robust results for large \(p\).
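To see the effect for larger p, compare the old and the new default (pmax() is the vectorized max()):
p <- c(10, 16, 20, 40)
trunc(20 * sqrt(p))               # old default: 63 80 89 126
pmax(trunc(20 * sqrt(p)), 5 * p)  # new default: 63 80 100 200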
SHAP values can be extracted via ks_extract(, what = "S").
[…] MASS::ginv(), the Moore-Penrose pseudoinverse using svd().
This is the initial release.