Introduction to gkrreg: Gaussian Kernel Robust Regression

Overview

The gkrreg package implements the Gaussian Kernel Robust Regression (GKRReg, or GKRR) method proposed by De Carvalho et al. (2017). The method fits a linear regression model by iteratively re-weighting observations using the Gaussian kernel, so that outliers and leverage points are automatically down-weighted. Convergence is theoretically guaranteed (Propositions 4.1 and 4.2 of the paper).

The package provides:

gkrr() — model fitting with four width hyper-parameter estimators ("s1", "s2", "s3", "s4") plus automatic data-driven selection ("auto");
gkrr_boot() — bootstrap inference (standard errors, confidence intervals and p-values);
summary.gkrr() — a coefficient table modelled after summary.lm(), with sandwich inference by default and bootstrap inference when available;
plot.gkrr() — six diagnostic panels emphasising kernel weights;
Six real datasets from the robust regression literature.

The GKRR method

Model and objective function

GKRReg minimises

\[S(\boldsymbol{\beta}) = 2\sum_{i=1}^n \bigl[1 - G(y_i,\, \hat{\mu}_i)\bigr], \qquad G(a,b) = \exp\!\bigl(-{(a-b)^2}/{\gamma^2}\bigr),\]

where \(\hat{\mu}_i = \mathbf{x}_i^\top\boldsymbol{\beta}\) and \(\gamma^2 > 0\) is the kernel width hyper-parameter. An observation with a large residual contributes close to 2 to \(S\); a perfectly fitted observation contributes 0.

Estimation algorithm

Differentiating \(S\) yields an IRWLS problem with kernel weights \(k_{ii}^{(t)} = G(y_i, \hat{\mu}_i^{(t-1)}) \in (0,1]\). The algorithm starts from OLS and alternates between WLS and weight updates until convergence (Propositions 4.1–4.2 of De Carvalho et al. (2017)).

Width hyper-parameter \(\gamma^2\)

Five options are available via the sigma_method argument:

Method	Description	Recommended scenario
`"s1"`	Mean of the 0.1 and 0.9 quantiles of OLS squared residuals on a sub-sample (Caputo estimator)	Clean data
`"s2"`	Pairwise median of \((y_i - \hat{\mu}_j^{\text{OLS}})^2\), \(i \neq j\)	Y-space outliers \(\leq 10\%\); X-space outliers \(\leq 15\%\)
`"s3"`	Residual variance \(\sum(y_i - \hat{\mu}_i)^2 / (n - p - 1)\)	Y-space outliers \(\geq 15\%\); leverage points
`"s4"`	AICc-selected bandwidth \(h^2\) via `sm::h.select()`	Large samples (\(n \geq 200\))
`"auto"`	Selects among S1–S4 by out-of-bag bootstrap MSE	When the outlier scenario is unknown

When "auto" is used, a message is emitted showing which method was selected and the comparative OOB MSEs. The selection uses \(B = 99\) replicates by default, configurable via auto_args = list(B = ...).

Basic usage

library(gkrreg)

data(belgium_calls)

fit <- gkrr(calls ~ year, data = belgium_calls, sigma_method = "s3")
print(fit)
#> 
#> GKRReg -- Gaussian Kernel Robust Regression
#> --------------------------------------------
#> gamma^2 : 31.61  (method: s3)
#> Iterations: 9  [converged]
#> 
#> Coefficients:
#> (Intercept)        year 
#>     -6.5764      0.1351 
#> 
#> (Sandwich inference available -- use summary() for the full table)

summary() uses the sandwich variance estimator by default, producing standard errors, confidence intervals and Wald z-test p-values with no additional computation:

summary(fit)
#> 
#> Call:
#> gkrr(formula = calls ~ year, data = belgium_calls, sigma_method = "s3")
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -0.6165 -0.1917 -0.0271  3.5213 18.4537 
#> 
#> Coefficients:
#>             Estimate Std. Error CI 95% lower CI 95% upper   p-value    
#> (Intercept)  -6.5764     1.1145      -8.7609      -4.3920 3.621e-09 ***
#> year          0.1351     0.0203       0.0954       0.1748 2.529e-11 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> (Sandwich SE (asymptotic Wald z-test))
#> 
#> gamma^2: 31.61  |  method: s3  |  iterations: 9  |  converged: TRUE
#> R-squared: -0.1219  |  Weighted R-squared: 0.5499
#> Kernel weights -- min: 0.0000  mean: 0.7501  max: 1.0000
#> Note: sandwich inference may be less reliable here (n = 24 (small sample); 12% of observations received near-zero kernel weight (< 0.01)).
#>   Consider bootstrap inference via boot = TRUE in gkrr() or
#>   summary(fit, boot = gkrr_boot(fit)).

The sandwich covariance matrix is accessible via vcov():

round(vcov(fit), 6)
#>             (Intercept)      year
#> (Intercept)    1.242183 -0.022546
#> year          -0.022546  0.000410

Statistical inference

Sandwich variance estimator (default)

The original paper does not provide a sampling distribution for \(\hat{\boldsymbol{\beta}}\). The WLS covariance matrix \((X^\top \hat{K} X)^{-1}\) from the final IRWLS step treats \(\hat{K}\) as fixed and therefore underestimates the true variance.

gkrreg implements an analytic sandwich variance estimator based on the theory of generalised \(M\)-estimators. At convergence the GKRReg estimating equations are:

\[\sum_{i=1}^n k_{ii}(\boldsymbol{\beta})\,\mathbf{x}_i (y_i - \mathbf{x}_i^\top\boldsymbol{\beta}) = \mathbf{0},\]

and the asymptotic sandwich covariance is \(n^{-1}\hat{A}^{-1}\hat{B}\hat{A}^{-1}\), where

\[\hat{A} = \frac{1}{n}\,\mathbf{X}^\top \operatorname{diag}\!\left[k_{ii}\!\left(1 - \frac{2\hat{e}_i^2}{\hat{\gamma}^2}\right)\right] \mathbf{X}, \qquad \hat{B} = \frac{1}{n}\,\mathbf{X}^\top \operatorname{diag}\!\left(k_{ii}^2\,\hat{e}_i^2\right)\mathbf{X}.\]

This corresponds to the HC0 class of heteroskedasticity-robust covariance estimators (white1982?). The correction term \((1 - 2\hat{e}_i^2/\hat{\gamma}^2)\) in \(\hat{A}\) arises directly from differentiating the Gaussian kernel with respect to \(\boldsymbol{\beta}\).

When is the sandwich reliable? The sandwich is consistent and efficient asymptotically, but treats \(\hat{\gamma}^2\) as fixed. For small samples (\(n < 50\)) or heavy contamination, bootstrap inference is recommended. summary() emits a note when these conditions are detected automatically.

Bootstrap inference

gkrr_boot() runs a pairs bootstrap, re-fitting the complete GKRReg algorithm — including re-estimating \(\gamma^2\) — on each replicate. This captures all sources of variability and is more reliable than the sandwich for small samples or heavy contamination.

Three CI types are available: "percentile", "normal", and "bca" (bias-corrected and accelerated, Efron (1987); the default and recommended). Bootstrap p-values use the centred-t approach:

\[p_j = \frac{2}{B}\,\#\!\left\{|\hat\beta_j^* - \hat\beta_j| \geq |\hat\beta_j|\right\}.\]

Option A — boot = TRUE inside gkrr()

fit_b <- gkrr(calls ~ year, data = belgium_calls,
              sigma_method = "s3",
              boot      = TRUE,
              boot_args = list(B = 999, type = "bca", seed = 1))
summary(fit_b)

Option B — separate gkrr_boot() call

boot <- gkrr_boot(fit, B = 999, type = "bca", seed = 1)
summary(fit, boot = boot)
plot(boot, which = 1)   # histogram + shaded CI per coefficient
plot(boot, which = 2)   # bootstrap scatter-plot matrix

Choosing between sandwich and bootstrap

Scenario	Recommended
\(n \geq 50\), mild contamination	Sandwich (fast, deterministic)
\(n < 50\) or heavy contamination	Bootstrap
Both available and SEs diverge by > 25%	Bootstrap

When both are available, summary() automatically warns if the relative discrepancy in standard errors exceeds 25% for any coefficient.

Diagnostic plots

plot.gkrr() provides six panels. Point size is inversely proportional to \(k_{ii}\): outliers appear large and red; well-fitted observations appear small and blue.

oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))
par(mfrow = c(1, 3))
plot(fit, which = 1, ask = FALSE)   # residuals vs. fitted
plot(fit, which = 3, ask = FALSE)   # weight vs. residual + kernel curve
plot(fit, which = 4, ask = FALSE)   # weight vs. index

par(oldpar)

Panel 3 overlays the theoretical curve \(G(e) = \exp(-e^2/\hat{\gamma}^2)\), making the down-weighting mechanism directly visible. Observations 15–20 in belgium_calls (a recording error) receive near-zero weights and are effectively excluded from the fit.

`which`	Description
1	Residuals vs. fitted values
2	Observed vs. fitted (\(y\) vs. \(\hat{\mu}\))
3	Kernel weight vs. residual with theoretical curve
4	Kernel weight vs. observation index
5	Normal QQ-plot of residuals, coloured by weight
6	Convergence of \(S(\boldsymbol{\beta})\)

Comparison with other methods

data(kootenay)

fit_ols  <- lm(newgate ~ libby, data = kootenay)
fit_gkrr <- gkrr(newgate ~ libby, data = kootenay, sigma_method = "s1")

if (requireNamespace("MASS", quietly = TRUE)) {
  fit_m  <- MASS::rlm(newgate ~ libby, data = kootenay, method = "M")
  fit_mm <- MASS::rlm(newgate ~ libby, data = kootenay, method = "MM")
} else { fit_m <- fit_mm <- NULL }

tab <- rbind(OLS  = coef(fit_ols),
             GKRR = coef(fit_gkrr))
if (!is.null(fit_m))
  tab <- rbind(tab, M = coef(fit_m), MM = coef(fit_mm))
print(round(tab, 4))
#>      (Intercept)   libby
#> OLS      23.1649 -0.0138
#> GKRR      5.4561  0.6216
#> M        23.1942 -0.0186
#> MM        5.4667  0.6209

plot(kootenay$libby, kootenay$newgate,
     xlab = "Libby flow", ylab = "Newgate flow",
     main = "Kootenay River -- X-space outlier (1934)",
     pch = 19, col = "grey60")
points(kootenay["1934","libby"], kootenay["1934","newgate"],
       col = "red", pch = 17, cex = 1.6)
abline(fit_ols,  col = "black",     lwd = 2, lty = 2)
abline(fit_gkrr, col = "firebrick", lwd = 2)
if (!is.null(fit_m)) {
  abline(fit_m,  col = "darkorange", lwd = 2, lty = 3)
  abline(fit_mm, col = "purple",     lwd = 2, lty = 4)
  legend("topleft", c("OLS","GKRR","M","MM","1934"),
         col = c("black","firebrick","darkorange","purple","red"),
         lty = c(2,1,3,4,NA), pch = c(NA,NA,NA,NA,17), lwd = 2, bty = "n")
} else {
  legend("topleft", c("OLS","GKRR","1934"),
         col = c("black","firebrick","red"),
         lty = c(2,1,NA), pch = c(NA,NA,17), lwd = 2, bty = "n")
}

Real-data applications

Belgium international calls — Y-space outliers

fit_ols  <- lm(calls ~ year, data = belgium_calls)
fit_gkrr <- gkrr(calls ~ year, data = belgium_calls, sigma_method = "s3")

plot(belgium_calls$year + 1900, belgium_calls$calls,
     xlab = "Year", ylab = "Calls (tens of millions)",
     main = "Belgium International Calls",
     pch = 19, col = "grey60")
points(belgium_calls$year[15:20] + 1900, belgium_calls$calls[15:20],
       col = "red", pch = 17, cex = 1.4)
abline(fit_ols,  col = "black",     lwd = 2, lty = 2)
abline(fit_gkrr, col = "firebrick", lwd = 2)
legend("topleft", c("OLS","GKRR (S3)","Outliers (1964-69)"),
       col = c("black","firebrick","red"),
       lty = c(2,1,NA), pch = c(NA,NA,17), lwd = 2, bty = "n")

oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))
par(mfrow = c(1, 3))
plot(fit_gkrr, which = 1, ask = FALSE)
plot(fit_gkrr, which = 3, ask = FALSE)
plot(fit_gkrr, which = 4, ask = FALSE)

par(oldpar)

Delivery time — leverage points in multiple regression

data(delivery)
fit_ols  <- lm(delivery_time ~ n_products + distance, data = delivery)
fit_gkrr <- gkrr(delivery_time ~ n_products + distance, data = delivery,
                 sigma_method = "s3")
round(rbind(OLS  = coef(fit_ols),
            GKRR = coef(fit_gkrr)), 4)
#>      (Intercept) n_products distance
#> OLS       2.3412     1.6159   0.0144
#> GKRR      4.0607     1.3891   0.0145

oldpar <- par(no.readonly = TRUE)
on.exit(par(oldpar))
par(mfrow = c(1, 3))
plot(fit_gkrr, which = 2, ask = FALSE)
plot(fit_gkrr, which = 4, ask = FALSE)
plot(fit_gkrr, which = 5, ask = FALSE)

par(oldpar)

Mammals — leverage on the log scale

data(mammals)
fit_ols  <- lm(log_brain ~ log_body, data = mammals)
fit_gkrr <- gkrr(log_brain ~ log_body, data = mammals, sigma_method = "s3")

plot(mammals$log_body, mammals$log_brain,
     xlab = "log(body mass, kg)", ylab = "log(brain mass, g)",
     main = "Brain vs. Body Mass (62 Mammal Species)",
     pch = 19, col = "grey60", cex = 0.8)
elephants <- mammals$species %in% c("African elephant","Asian elephant")
points(mammals$log_body[elephants], mammals$log_brain[elephants],
       col = "red", pch = 17, cex = 1.5)
abline(fit_ols,  col = "black",     lwd = 2, lty = 2)
abline(fit_gkrr, col = "firebrick", lwd = 2)
legend("topleft", c("OLS","GKRR (S3)","Elephants"),
       col = c("black","firebrick","red"),
       lty = c(2,1,NA), pch = c(NA,NA,17), lwd = 2, bty = "n")

Session information

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 24.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
#> LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Etc/UTC
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] gkrreg_0.4.0   rmarkdown_2.31
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39    R6_2.6.1         fastmap_1.2.0    xfun_0.58       
#>  [5] maketools_1.3.2  cachem_1.1.0     knitr_1.51       htmltools_0.5.9 
#>  [9] buildtools_1.0.0 lifecycle_1.0.5  cli_3.6.6        sass_0.4.10     
#> [13] jquerylib_0.1.4  sm_2.2-6.0       compiler_4.6.0   sys_3.4.3       
#> [17] tools_4.6.0      evaluate_1.0.5   bslib_0.11.0     yaml_2.3.12     
#> [21] otel_0.2.0       jsonlite_2.0.0   rlang_1.2.0      MASS_7.3-65

References

De Carvalho, Francisco de A. T., Eufrásio de A. Lima Neto, and Marcelo R. P. Ferreira. 2017. “A Robust Regression Method Based on Exponential-Type Kernel Functions.” Neurocomputing 234: 58–74. https://doi.org/10.1016/j.neucom.2016.12.035.

Efron, Bradley. 1987. “Better Bootstrap Confidence Intervals.” Journal of the American Statistical Association 82 (397): 171–85. https://doi.org/10.1080/01621459.1987.10478410.