SNSeg supports change-points estimation for both univariate and multivariate time series (including high-dimensional time series with dimension greater than 10.) using Self-Normalization (SN) based framework. Please read Zhao, Jiang and Shao (2022) <doi.org/10.1111/rssb.12552> for details of the SN-based algorithms.

The package contain three functions for change-points estimation:

- The function
`SNSeg_Uni`

works for a univariate time series. - The function
`SNSeg_Multi`

works for multivariate time series with dimension up to ten. - The function
`SNSeg_HD`

works for high-dimensional time series with dimension above ten.

All functions contain two important input arguments:
`grid_size_scale`

and `grid_size`

.

`grid_size_scale`

: the trimming parameter \(\epsilon\), a parameter to control the local window size`grid_size`

. The range of`grid_size_scale`

should be between 0.05 and 0.5. Any input`grid_size_scale`

smaller than 0.05 will be automatically changed to 0.05, and any input`grid_size_scale`

greater than 0.5 will be automatically changed to 0.5.`grid_size`

: the local window size \(h\) for local scanning for each time point. According to Zhao et al. (2022),`grid_size`

is a product of`grid_size_scale`

and the length of time.

Users can set their own `grid_size`

or leave it as
`NULL`

in the input arguments. If `grid_size`

is
`NULL`

, these functions will compute the SN test statistic
using the value of `grid_size_scale`

. If
`grid_size`

is set by users, the functions will first
calculate `grid_size_scale`

by diving the length of time, and
then compute the SN test statistic.

For the other input arguments: * `ts`

: Users should enter
a time series for this argument. For `SNSeg_Uni`

, the
dimension of `ts`

should be exactly one in most cases (or two
when `paras_to_test = 'bivcor'`

); for
`SNSeg_Multi`

, the dimension must be at least two but no more
than ten; for `SNSeg_HD`

, the dimension must be greater than
ten. * `paras_to_test`

: This argument in functions
`SNSeg_Uni`

and `SNSeg_Multi`

allows users to
enter the parameter(s) they would like to test the change in. *
`confidence`

: Users should choose a confidence level among
0.9, 0.95, 0.99, 0.995 and 0.995. A smaller confidence level is easier
to reject the null hypothesis and detect a change-point.

The package also offers an option to plot the time series (this only
works for the univariate time series cases!) Users need to set
`plot_SN = TRUE`

to visualize the time series, and
`est_cp_loc = TRUE`

to add the estimated change-point
locations in the plot.

`max_SNsweep`

To visualize the computed SN-based test statistic at each time point,
users can apply the function `max_SNsweep`

by plugging in the
output object from one of the functions `SNSeg_Uni`

,
`SNSeg_Multi`

and `SNSeg_HD`

. The options
`est_cp_loc = TRUE`

and `critical_loc = TRUE`

are
provided to draw the estimated change-point locations and the critical
value threshold inside the test statistic plot.

`max_SNsweep`

also returns the SN-based test statistic for
each time point. A large number of test statistics in the output can be
messy for users who only seek to generate a plot. To hide the test
statistics output, users can create an arbitrary variable name and set
it to the `max_SNsweep`

function, e.g.,
`SN_stat <- max_SNsweep(...)`

.

`SNSeg_estimate`

The function `SNSeg_estimate`

allows users to compute the
parameter estimates (e.g., mean, variance, acf, quantile, etc.) of each
of the segments separated by the estimated change-points. To use this
function, users should use the output of the functions
`SNSeg_Uni`

, `SNSeg_Multi`

and
`SNSeg_HD`

as the input of `SNSeg_estimate`

.

The typical S3 methods `summary`

, `print`

and
`plot`

are available to `SNSeg_Uni`

,
`SNSeg_Multi`

and `SNSeg_HD`

objects. The
`summary`

method displays the parameter to be tested, the
estimated change-point amount and locations, the `grid_size`

,
confidence level as well as the critical value of the SN-based test. The
`print`

method shows the change-point locations. The
`plot`

method plots the time series, and similar to the
argument `plot_SN = TRUE`

, the `plot`

method
allows users to generate time series segmentation plot(s) with the
estimated change-point locations. It also provides `ts_index`

option to allow users to plot any individual time series they want.
Users can apply their preferred color of the change-point(s) within the
plot(s) by setting `cpts.col`

to any color.

We then provide the examples of the functions `SNSeg_Uni`

,
`SNSeg_Multi`

and `SNSeg_HD`

.

`SNSeg_Uni`

:The function `SNSeg_Uni`

detect change-points for a
univariate time series based on the change in a single or
parameters.

- For a univariate time series, the input argument
`paras_to_test`

allows one or a combination of multiple parameters from “mean”, “variance”, “acf” and a numeric quantile value between 0 and 1. For instance, to test the change in autocorrelation and 80th quantile, users should set`paras_to_test`

to c(‘acf’, 0.8). - To test the change in correlation between two univariate time
series, the input time series must be two-dimensional and
`paras_to_test`

should be set to`bivcor`

. - Users can also set
`paras_to_test`

using their own parameters. In this case, the input of`paras_to_test`

needs to be a function which returns a numeric value.

We provide examples for different cases:

```
# Please run the following function before running examples:
<- function(u,p,trunc_r,gpd_scale,gpd_shape){
mix_GauGPD <- u<p
indicator <- rep(0, length(u))
rv >0] <- qtruncnorm(u[indicator>0]/p,a=-Inf,b=trunc_r)
rv[indicator<=0] <- qgpd((u[indicator<=0]-p)/(1-p), loc=trunc_r, scale=gpd_scale,shape=gpd_shape)
rv[indicatorreturn(rv)
}
```

```
set.seed(7)
<- 2000
n <- 2
reptime <- round(n*c(0,cumsum(c(0.5,0.25)),1))
cp_sets <- c(0.4,0,0.4)
mean_shift <- -0.7
rho <- MAR(n, reptime, rho)
ts <- length(cp_sets)-1
no_seg for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2,] <- ts[tau1:tau2,] + mean_shift[index]
ts[tau1
}<- ts[,2]
ts # grid_size undefined
<- SNSeg_Uni(ts, paras_to_test = "mean", confidence = 0.9,
result grid_size_scale = 0.05, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = FALSE)
# grid_size defined & generate time series segmentation plot
<- SNSeg_Uni(ts, paras_to_test = "mean", confidence = 0.9,
result grid_size_scale = 0.05, grid_size = 116,
plot_SN = TRUE, est_cp_loc = TRUE)
```

```
# Estimated change-point locations
$est_cp
result#> [1] 1018 1487
# Parameter estimates (mean) of each segment
SNSeg_estimate(result)
#> $mean
#> [1] 0.39444849 0.02107613 0.38362925
#>
#> attr(,"class")
#> [1] "SNSeg_estimate"
# plot the SN-based test statistic
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
```

We then show how to use the S3 methods `summary`

,
`print`

and `plot`

.

```
summary(result)
#> There are 2 change-point(s) detected at 90th confidence level based on the change in the single mean parameter.
#>
#> The critical value of SN-based test is 135.39283594
#>
#> The detected change-point location(s) are 1018,1487 with a grid_size of 116
```

```
print(result)
#> The detected change-point location(s) are 1018,1487
```

`plot(result, cpts.col = 'red')`

We can see both the `plot_SN = TRUE`

option and the
`plot.SN`

function generates the same plot. Users can select
any choice based on their preferences. The class of the argument
`result`

is `SNSeg_Uni`

.

```
set.seed(7)
<- MAR_Variance(2, "V1")
ts <- ts[,2]
ts # grid_size defined
<- SNSeg_Uni(ts, paras_to_test = "variance", confidence = 0.9,
result grid_size_scale = 0.05, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (variance) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
```

```
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
```

```
set.seed(7)
<- MAR_Variance(2, "A3")
ts <- ts[,2]
ts # grid_size defined
<- SNSeg_Uni(ts, paras_to_test = "acf", confidence = 0.9,
result grid_size_scale = 0.05, grid_size = 92, plot_SN = FALSE,
est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (acf) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
```

```
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
```

```
library(mvtnorm)
set.seed(7)
<- 1000
n <- list(4*matrix(c(1,0.8,0.8,1), nrow=2),
sigma_cross matrix(c(1,0.2,0.2,1), nrow=2),
matrix(c(1,0.8,0.8,1), nrow=2))
<- round(c(0,n/3,2*n/3,n))
cp_sets <- length(cp_sets)-2
noCP <- rep(0.5, noCP+1)
rho_sets <- MAR_MTS_Covariance(n, 2, rho_sets, cp_sets, sigma_cross)
ts <- ts[1][[1]]
ts # grid_size defined
<- SNSeg_Uni(ts, paras_to_test = "bivcor", confidence = 0.9,
result grid_size_scale = 0.05, grid_size = 77,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (bivariate correlation) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
```

```
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
```

Please download the packages `truncnorm`

and
`evd`

before running the following code.

```
library(truncnorm)
library(evd)
set.seed(7)
<- 1000
n <- c(0,n/2,n)
cp_sets <- length(cp_sets)-2
noCP <- 2
reptime <- 0.2
rho # AR time series with no change-point (mean, var)=(0,1)
<- MAR(n, reptime, rho)*sqrt(1-rho^2)
ts <- 0
trunc_r <- pnorm(trunc_r)
p <- 2
gpd_scale <- 0.125
gpd_shape for(ts_index in 1:reptime){
2]+1):n, ts_index] <- mix_GauGPD(pnorm(ts[(cp_sets[2]+1):n, ts_index]),
ts[(cp_sets[
p,trunc_r,gpd_scale,gpd_shape)
}<- ts[,2]
ts # grid_size undefined
# test in 90% quantile
<- SNSeg_Uni(ts, paras_to_test = 0.9, confidence = 0.9,
result grid_size_scale = 0.066, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (90th quantile) of each segment
SNSeg_estimate(result)
# plot the SN-based test statistic
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
```

```
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
```

To perform the SN-based test using any general functional, users
should define their own function as the input of
`paras_to_test`

. The input of this function should consist of
the followings:

`ts`

: A univariate time series provided by the user.

The user-defined function should return a numeric value as the output.

```
set.seed(7)
<- 500
n <- 2
reptime <- round(n*c(0,cumsum(c(0.5,0.25)),1))
cp_sets <- c(0.4,0,0.4)
mean_shift <- -0.7
rho <- MAR(n, reptime, rho)
ts <- length(cp_sets)-1
no_seg for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2,] <- ts[tau1:tau2,] + mean_shift[index]
ts[tau1
}<- ts[,2]
ts # Define a general functional for the input 'paras_to_test'
= function(ts){
paras_to_test mean(ts)
}<- SNSeg_Uni(ts, paras_to_test = paras_to_test,
result.SNCP.general confidence = 0.9, grid_size_scale = 0.05,
grid_size = NULL, plot_SN = FALSE,
est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result.SNCP.general# Parameter estimates (general functional) of each segment
SNSeg_estimate(result.SNCP.general)
# plot the SN-based test statistic
<- max_SNsweep(result.SNCP.general, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
```

```
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
```

```
set.seed(7)
<- 1000
n <- c(0,333,667,1000)
cp_sets <- length(cp_sets)-1
no_seg <- 0
rho # AR time series with no change-point (mean, var)=(0,1)
<- MAR(n, 2, rho)*sqrt(1-rho^2)
ts <- length(cp_sets)-1
no_seg <- c(1,1.6,1)
sd_shift for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2,] <- ts[tau1:tau2,]*sd_shift[index]
ts[tau1
}<- 2
d <- ts[,2]
ts
# Test in 90th and 95th quantile with grid_size undefined
<- SNSeg_Uni(ts, paras_to_test = c(0.9, 0.95), confidence = 0.9,
result grid_size_scale = 0.05, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = FALSE)
# Test in 90th quantile and the variance with grid_size undefined
<- SNSeg_Uni(ts, paras_to_test = c(0.9, 'variance'),
result confidence = 0.95, grid_size_scale = 0.078,
grid_size = NULL, plot_SN = FALSE,
est_cp_loc = FALSE)
# Test in 90th quantile, variance and acf with grid_size undefined
<- SNSeg_Uni(ts, paras_to_test = c(0.9,'variance', "acf"),
result confidence = 0.9, grid_size_scale = 0.064,
grid_size = NULL, plot_SN = TRUE,
est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp result
```

```
# Test in 60th quantile, mean, variance and acf with grid_size defined
<- SNSeg_Uni(ts, paras_to_test = c(0.6, 'mean', "variance",
result.last "acf"), confidence = 0.9, grid_size_scale = 0.05,
grid_size = 83, plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result.last# Parameter estimates (of the last example) of each segment
SNSeg_estimate(result.last)
# SN-based test statistic segmentation plot
<- max_SNsweep(result.last, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
```

```
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red')
```

`SNSeg_Multi`

The function `SNSeg_Multi`

detects change-points for
multivariate time series based on the change in multivariate means or
covariances. Users can set `paras_to_test`

to
`mean`

or `covariance`

for change-points
estimation.

- The dimension of parameters to be tested must be at most ten. If the
parameter is
`mean`

, the dimension is equivalent to the number of time series. If the parameter is`covariance`

, the dimension is equivalent to the number of unknown parameters within the covariance matrix.

For examples of multivariate means and covariances, please check the following commands:

```
# Please run this function before running examples:
<- function(d, rho){
exchange_cor_matrix <- matrix(rho, d, d)
tmp diag(tmp) <- 1
return(tmp)
}
```

```
library(mvtnorm)
set.seed(10)
<- 5
d <- 1000
n <- round(n*c(0,cumsum(c(0.075,0.3,0.05,0.1,0.05)),1))
cp_sets <- c(-3,0,3,0,-3,0)/sqrt(d)
mean_shift <- sign(mean_shift)*ceiling(abs(mean_shift)*10)/10
mean_shift <- 0.5
rho_sets <- list(exchange_cor_matrix(d,0))
sigma_cross <- MAR_MTS_Covariance(n, 2, rho_sets, cp_sets=c(0,n), sigma_cross)
ts <- length(cp_sets)-2
noCP <- length(cp_sets)-1
no_seg for(rep_index in 1:2){
for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2] <- ts[[rep_index]][,tau1:tau2] + mean_shift[index]
ts[[rep_index]][,tau1
}
}<- ts[1][[1]]
ts
# grid_size undefined
<- SNSeg_Multi(ts, paras_to_test = "mean", confidence = 0.95,
result grid_size_scale = 0.079, grid_size = NULL,
plot_SN = FALSE, est_cp_loc = TRUE)
# grid_size defined
<- SNSeg_Multi(ts, paras_to_test = "mean", confidence = 0.99,
result grid_size_scale = 0.05, grid_size = 65,
plot_SN = FALSE, est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (multivariate mean) of each segment
SNSeg_estimate(result)
# SN-based test statistic segmentation plot
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
```

```
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
par(mfrow=c(2,3))
plot(result, cpts.col = 'red')
```

```
library(mvtnorm)
set.seed(10)
<- 2
reptime <- 4
d <- 1000
n <- list(exchange_cor_matrix(d,0.2),
sigma_cross 2*exchange_cor_matrix(d,0.5),
4*exchange_cor_matrix(d,0.5))
<- c(0.3,0.3,0.3)
rho_sets <- c(0,0,0) # with mean change
mean_shift <- round(c(0,n/3,2*n/3,n))
cp_sets <- MAR_MTS_Covariance(n, reptime, rho_sets, cp_sets, sigma_cross)
ts <- length(cp_sets)-2
noCP <- length(cp_sets)-1
no_seg for(rep_index in 1:reptime){
for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2] <- ts[[rep_index]][,tau1:tau2] +
ts[[rep_index]][,tau1
mean_shift[index]
}
}<- ts[[1]]
ts
# grid_size undefined
<- SNSeg_Multi(ts, paras_to_test = "covariance",
result confidence = 0.9, grid_size_scale = 0.05,
grid_size = NULL, plot_SN = FALSE, est_cp_loc = FALSE)
# grid_size defined
<- SNSeg_Multi(ts, paras_to_test = "covariance",
result confidence = 0.9, grid_size_scale = 0.05,
grid_size = 81, plot_SN = FALSE,
est_cp_loc = TRUE)
# Estimated change-point locations
$est_cp
result# Parameter estimates (covariance estimate) of each segment
SNSeg_estimate(result)
# SN-based test statistic segmentation plot
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
```

```
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
par(mfrow=c(1,1))
plot(result, cpts.col = 'red')
```

`SNSeg_HD`

The function `SNSeg_HD`

performs change-points estimation
for high-dimensional time series (dimension greater than 10) using the
change in high-dimensional means. For usage examples of
`SNSeg_HD`

, we provide the followig code:

```
<- 600
n <- 100
p <- 5
nocp <- round(seq(0,nocp+1,1)/(nocp+1)*n)
cp_sets <- 5
num_entry <- sqrt(4/5) # Wang et al(2020)
kappa <- rep(c(0,kappa),100)[1:(length(cp_sets)-1)]
mean_shift set.seed(1)
<- matrix(rnorm(n*p,0,1),n,p)
ts <- length(cp_sets)-1
no_seg for(index in 1:no_seg){ # Mean shift
<- cp_sets[index]+1
tau1 <- cp_sets[index+1]
tau2 :tau2,1:num_entry] <- ts[tau1:tau2,1:num_entry] +
ts[tau1# sparse change
mean_shift[index]
}# SN segmentation plot
# grid_size undefined (plot the first 3 time series)
<- SNSeg_HD(ts, confidence = 0.9, grid_size_scale = 0.05,
result grid_size = NULL, plot_SN = FALSE, est_cp_loc = TRUE,
ts_index = c(1:3))
# grid_size defined (plot the 1st, 3rd and 5th time series)
<- SNSeg_HD(ts, confidence = 0.9, grid_size_scale = 0.05,
result grid_size = 52, plot_SN = FALSE, est_cp_loc = TRUE,
ts_index = c(1,3,5))
# Estimated change-point locations
$est_cp
result# Parameter estimates (high-dimensional means) of each segment
<- SNSeg_estimate(result)
summary.stat # SN-based test statistic segmentation plot
<- max_SNsweep(result, plot_SN = TRUE, est_cp_loc = TRUE,
SN_stat critical_loc = TRUE)
```

```
# built-in functions
# summary statistic
summary(result)
# print
print(result)
# plot
plot(result, cpts.col = 'red', ts_index = c(1:3))
```

We note that only the selected time series were plotted by setting
values for `ts_index`

.