| Title: | Gaussian Kernel Robust Regression (GKRReg) |
|---|---|
| Description: | Implements the Gaussian Kernel Robust Regression (GKRReg / GKRR) method proposed by De Carvalho, Lima Neto and Ferreira (2017) <doi:10.1016/j.neucom.2016.12.035>. The method re-weights observations iteratively using the Gaussian kernel so that poorly-fitted observations (outliers, leverage points) receive small weights, yielding resistance to Y-space outliers, X-space outliers and leverage points. Convergence is guaranteed by Propositions 4.1 and 4.2 of the original paper. Three estimators for the kernel width hyper-parameter are provided (S1: Caputo, S2: pairwise median, S3: residual variance). Inference is provided via an analytic sandwich variance estimator (default) or via bootstrap (percentile, normal and BCa intervals with p-values) through gkrr_boot(). Six real datasets from the robust regression literature are included to facilitate reproducible comparisons. |
| Authors: | Eufrásio de Andrade Lima Neto [aut], Marcelo Rodrigo Portela Ferreira [aut, cre] |
| Maintainer: | Marcelo Rodrigo Portela Ferreira <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 0.4.0 |
| Built: | 2026-06-18 09:06:28 UTC |
| Source: | https://github.com/marcelorpf/gkrreg |
Number of international telephone calls made from Belgium between 1950 and 1973, taken from the Belgian Statistical Survey published by the Ministry of Economy. The data contain a block of Y-space outliers (observations 15–20) caused by a recording error in the original source: calls were recorded in total minutes instead of number of calls for those years.
belgium_callsbelgium_calls
A data frame with 24 rows and 2 columns:
Last two digits of the calendar year (50 = 1950, ..., 73 = 1973).
Number of international calls originating from Belgium, in tens of millions.
This dataset is used in De Carvalho et al. (2017), Section 6.2, to
illustrate the robustness of GKRR to Y-space outliers. The recommended
width estimator for this scenario is sigma_method = "s3".
Observations 15–20 (years 1964–1969) are Y-space outliers. Their recorded values are anomalously large relative to the linear trend of the remaining years because calls were measured in total minutes rather than number of calls.
P.J. Rousseeuw and A.M. Leroy (1987). Robust Regression and Outlier Detection. John Wiley & Sons, Table 6, p. 26.
Available in robustbase as telef.
De Carvalho, F.A.T., Lima Neto, E.A., Ferreira, M.R.P. (2017). A robust regression method based on exponential-type kernel functions. Neurocomputing, 234, 58–74. doi:10.1016/j.neucom.2016.12.035
data(belgium_calls) # Fit competing methods and compare fit_ols <- lm(calls ~ year, data = belgium_calls) fit_gkrr <- gkrr(calls ~ year, data = belgium_calls, sigma_method = "s3") plot(belgium_calls$year + 1900, belgium_calls$calls, xlab = "Year", ylab = "Calls (tens of millions)", main = "Belgium International Calls", pch = 19, col = "grey60") abline(fit_ols, col = "black", lwd = 2, lty = 2) abline(fit_gkrr, col = "firebrick", lwd = 2) legend("topleft", c("OLS", "GKRR"), col = c("black","firebrick"), lty = c(2,1), lwd = 2, bty = "n")data(belgium_calls) # Fit competing methods and compare fit_ols <- lm(calls ~ year, data = belgium_calls) fit_gkrr <- gkrr(calls ~ year, data = belgium_calls, sigma_method = "s3") plot(belgium_calls$year + 1900, belgium_calls$calls, xlab = "Year", ylab = "Calls (tens of millions)", main = "Belgium International Calls", pch = 19, col = "grey60") abline(fit_ols, col = "black", lwd = 2, lty = 2) abline(fit_gkrr, col = "firebrick", lwd = 2) legend("topleft", c("OLS", "GKRR"), col = c("black","firebrick"), lty = c(2,1), lwd = 2, bty = "n")
Measurements on the cloud point of a liquid, which is a measure of the
degree of crystallization in a stock. The dataset contains three leverage
points (baseline measurements at percentage_i8 = 0) that pull
poorly-specified models away from the underlying trend.
cloud_pointcloud_point
A data frame with 19 rows and 2 columns:
Percentage of I-8 in the liquid mixture (0–10).
Cloud point temperature of the liquid (degrees Celsius).
This dataset is used in De Carvalho et al. (2017), Section 6.3, to
illustrate robustness to leverage points. The recommended width estimator
is sigma_method = "s3".
Observations 1, 10 and 16 have percentage_i8 = 0 and act as leverage
points because they correspond to a baseline condition that is far from the
centroid of the predictor space.
N.R. Draper and R. Craig Smith (1998). Applied Regression Analysis, 3rd edition. John Wiley & Sons, Exercise 4, p. 629.
Available in robustbase as cloud.
De Carvalho, F.A.T., Lima Neto, E.A., Ferreira, M.R.P. (2017). doi:10.1016/j.neucom.2016.12.035
data(cloud_point) fit_ols <- lm(cloud_point ~ percentage_i8, data = cloud_point) fit_gkrr <- gkrr(cloud_point ~ percentage_i8, data = cloud_point, sigma_method = "s3") plot(cloud_point$percentage_i8, cloud_point$cloud_point, xlab = "Percentage I-8", ylab = "Cloud point (C)", main = "Cloud Point Data", pch = 19, col = "grey60") abline(fit_ols, col = "black", lwd = 2, lty = 2) abline(fit_gkrr, col = "firebrick", lwd = 2) legend("topleft", c("OLS", "GKRR"), col = c("black","firebrick"), lty = c(2,1), lwd = 2, bty = "n")data(cloud_point) fit_ols <- lm(cloud_point ~ percentage_i8, data = cloud_point) fit_gkrr <- gkrr(cloud_point ~ percentage_i8, data = cloud_point, sigma_method = "s3") plot(cloud_point$percentage_i8, cloud_point$cloud_point, xlab = "Percentage I-8", ylab = "Cloud point (C)", main = "Cloud Point Data", pch = 19, col = "grey60") abline(fit_ols, col = "black", lwd = 2, lty = 2) abline(fit_gkrr, col = "firebrick", lwd = 2) legend("topleft", c("OLS", "GKRR"), col = c("black","firebrick"), lty = c(2,1), lwd = 2, bty = "n")
Data on the time required by a route driver to service vending machines at 25 stops. Two predictors (number of products delivered and distance travelled) are used to model delivery time. Observation 9 is a bad leverage point: it has both a very large number of products (30) and a very large distance (1460 feet), pulling regression fits away from the main trend.
deliverydelivery
A data frame with 25 rows and 3 columns:
Number of products stocked (cases).
Distance walked by the driver (feet).
Total service time at the stop (minutes).
This dataset is used in De Carvalho et al. (2017), Section 6.4, to
illustrate robustness to leverage points in multiple regression. The
recommended width estimator is sigma_method = "s3".
Observation 9 (n_products = 30, distance = 1460) is a
leverage point that simultaneously deviates in the predictor space and
has an unusually large response, making it a bad leverage point.
D.C. Montgomery and E.A. Peck (1992). Introduction to Linear Regression Analysis, 2nd edition. John Wiley & Sons, Table B.7.
Available in robustbase as delivery.
De Carvalho, F.A.T., Lima Neto, E.A., Ferreira, M.R.P. (2017). doi:10.1016/j.neucom.2016.12.035
data(delivery) fit_ols <- lm(delivery_time ~ n_products + distance, data = delivery) fit_gkrr <- gkrr(delivery_time ~ n_products + distance, data = delivery, sigma_method = "s3") # Compare coefficients rbind(OLS = coef(fit_ols), GKRR = coef(fit_gkrr)) # Diagnostic plot: kernel weights highlight observation 9 plot(fit_gkrr, which = 4, ask = FALSE)data(delivery) fit_ols <- lm(delivery_time ~ n_products + distance, data = delivery) fit_gkrr <- gkrr(delivery_time ~ n_products + distance, data = delivery, sigma_method = "s3") # Compare coefficients rbind(OLS = coef(fit_ols), GKRR = coef(fit_gkrr)) # Diagnostic plot: kernel weights highlight observation 9 plot(fit_gkrr, which = 4, ask = FALSE)
Fits a robust linear regression model using the Gaussian kernel to
down-weight poorly fitted observations (outliers and leverage points).
Weights are updated at each
iteration of an IRWLS until the objective function
converges.
gkrr( formula, data = NULL, sigma_method = c("s1", "s2", "s3", "s4", "auto"), alpha = 0.5, tol = 1e-10, maxit = 100L, boot = FALSE, boot_args = list(), auto_args = list(), conf = 0.95 )gkrr( formula, data = NULL, sigma_method = c("s1", "s2", "s3", "s4", "auto"), alpha = 0.5, tol = 1e-10, maxit = 100L, boot = FALSE, boot_args = list(), auto_args = list(), conf = 0.95 )
formula |
Model formula, e.g. |
data |
Optional |
sigma_method |
Estimator for |
alpha |
Fraction of the sample used in estimator S1
(default |
tol |
Convergence tolerance (default |
maxit |
Maximum number of iterations (default |
boot |
Logical or a
|
boot_args |
Named list of extra arguments passed to
|
auto_args |
Named list of arguments controlling the |
conf |
Confidence level for sandwich intervals (default
|
Convergence is guaranteed by Propositions 4.1 and 4.2 of De Carvalho et al. (2017).
When boot = TRUE a gkrr_boot object is computed and
stored inside the fitted model, making it available automatically to
summary.gkrr for inference (confidence intervals and
bootstrap p-values).
An object of class "gkrr" containing:
coefficientsNamed vector of estimated coefficients.
fitted.valuesFitted values .
residualsResiduals .
weightsFinal kernel weights .
gamma2Value of used.
criterionSequence of values across iterations.
iterationsNumber of iterations performed.
convergedLogical; was convergence achieved?
sigma_methodLabel of the estimator.
sandwichA list with sandwich inference components:
vcov, se, tval, pval, ci_lo,
ci_hi, conf. Always computed.
bootA "gkrr_boot" object if boot != FALSE,
otherwise NULL.
call, terms, model
Model metadata.
De Carvalho, F.A.T., Lima Neto, E.A., Ferreira, M.R.P. (2017). A robust regression method based on exponential-type kernel functions. Neurocomputing, 234, 58–74. doi:10.1016/j.neucom.2016.12.035
gkrr_boot for bootstrap inference,
summary.gkrr for the inference table.
set.seed(1) n <- 80 x <- runif(n, 0, 10) y <- 2 + 3 * x + rnorm(n) y[1:12] <- y[1:12] + 25 # 15% Y-space outliers # Basic fit with sandwich inference (default) fit <- gkrr(y ~ x, sigma_method = "s3") summary(fit) # Using S4 estimator (AICc bandwidth) fit_s4 <- gkrr(y ~ x, sigma_method = "s4") summary(fit_s4) # Automatic estimator selection fit_auto <- gkrr(y ~ x, sigma_method = "auto", auto_args = list(B = 49, seed = 1)) summary(fit_auto) # Fit with bootstrap inference (BCa, B = 999 by default) fit_b <- gkrr(y ~ x, sigma_method = "s3", boot = TRUE, boot_args = list(B = 499, seed = 1)) summary(fit_b) plot(fit_b)set.seed(1) n <- 80 x <- runif(n, 0, 10) y <- 2 + 3 * x + rnorm(n) y[1:12] <- y[1:12] + 25 # 15% Y-space outliers # Basic fit with sandwich inference (default) fit <- gkrr(y ~ x, sigma_method = "s3") summary(fit) # Using S4 estimator (AICc bandwidth) fit_s4 <- gkrr(y ~ x, sigma_method = "s4") summary(fit_s4) # Automatic estimator selection fit_auto <- gkrr(y ~ x, sigma_method = "auto", auto_args = list(B = 49, seed = 1)) summary(fit_auto) # Fit with bootstrap inference (BCa, B = 999 by default) fit_b <- gkrr(y ~ x, sigma_method = "s3", boot = TRUE, boot_args = list(B = 499, seed = 1)) summary(fit_b) plot(fit_b)
Runs a pairs bootstrap to estimate standard errors, confidence intervals
and (optionally) p-values for the coefficients of a gkrr
model. The full fitting algorithm is re-executed on every replicate,
including re-estimation of .
gkrr_boot( object, B = 999L, type = c("bca", "percentile", "normal"), conf = 0.95, seed = NULL, verbose = FALSE )gkrr_boot( object, B = 999L, type = c("bca", "percentile", "normal"), conf = 0.95, seed = NULL, verbose = FALSE )
object |
A |
B |
Number of bootstrap replicates (default |
type |
CI type: |
conf |
Confidence level, scalar in (0, 1) (default |
seed |
Integer seed for reproducibility (optional). |
verbose |
Logical. If |
An object of class "gkrr_boot" containing:
t0Original coefficient estimates (length ).
tMatrix of bootstrap
estimates.
seBootstrap standard errors.
biasBootstrap bias .
ciMatrix of CIs with rows
"lower" and "upper".
pvalBootstrap p-values (centred-t, two-sided).
confConfidence level used.
typeCI type used.
BNumber of successful replicates.
B_failedReplicates discarded due to non-convergence.
callThe matched call.
Efron, B. (1987). Better bootstrap confidence intervals. Journal of the American Statistical Association, 82(397), 171–185.
De Carvalho, F.A.T., Lima Neto, E.A., Ferreira, M.R.P. (2017). doi:10.1016/j.neucom.2016.12.035
set.seed(42) n <- 80 x <- runif(n, 0, 10) y <- 2 + 3 * x + rnorm(n) y[1:12] <- y[1:12] + 25 # 15% Y-space outliers fit <- gkrr(y ~ x, sigma_method = "s3") boot <- gkrr_boot(fit, B = 499, seed = 1) print(boot) summary(boot) plot(boot)set.seed(42) n <- 80 x <- runif(n, 0, 10) y <- 2 + 3 * x + rnorm(n) y[1:12] <- y[1:12] + 25 # 15% Y-space outliers fit <- gkrr(y ~ x, sigma_method = "s3") boot <- gkrr_boot(fit, B = 499, seed = 1) print(boot) summary(boot) plot(boot)
Annual water-flow measurements (in units of cubic feet) at two
gauging stations on the Kootenay River: Libby (Montana, USA) and Newgate
(British Columbia, Canada). The dataset contains a single X-space outlier
(year 1934) where the Libby measurement is anomalously large, likely due to
an upstream dam operation.
kootenaykootenay
A data frame with 13 rows and 2 columns, with row names giving the year of measurement (1931–1943):
Water flow at Libby, Montana ( cubic feet).
Water flow at Newgate, British Columbia
( cubic feet).
This dataset is used in De Carvalho et al. (2017), Section 6.5, to
illustrate robustness to X-space outliers. The recommended width estimator
is sigma_method = "s1".
The observation for year 1934 has an anomalously large libby value
(77.6 vs. a typical range of 17–39), making it an X-space outlier that
strongly influences naive regression fits.
J. Neter, M.H. Kutner, C.J. Nachtsheim, and W. Wasserman (1996). Applied Linear Statistical Models, 4th edition. Irwin/McGraw-Hill, p. 414.
Available in robustbase as kootenay.
De Carvalho, F.A.T., Lima Neto, E.A., Ferreira, M.R.P. (2017). doi:10.1016/j.neucom.2016.12.035
data(kootenay) fit_ols <- lm(newgate ~ libby, data = kootenay) fit_gkrr <- gkrr(newgate ~ libby, data = kootenay, sigma_method = "s1") plot(kootenay$libby, kootenay$newgate, xlab = "Libby flow (10^6 ft^3)", ylab = "Newgate flow (10^6 ft^3)", main = "Kootenay River Water Flow", pch = 19, col = "grey60") # Highlight the X-space outlier (1934) points(kootenay["1934", "libby"], kootenay["1934", "newgate"], col = "red", pch = 17, cex = 1.5) abline(fit_ols, col = "black", lwd = 2, lty = 2) abline(fit_gkrr, col = "firebrick", lwd = 2) legend("topleft", c("OLS", "GKRR", "1934 (outlier)"), col = c("black","firebrick","red"), lty = c(2,1,NA), pch = c(NA,NA,17), lwd = 2, bty = "n")data(kootenay) fit_ols <- lm(newgate ~ libby, data = kootenay) fit_gkrr <- gkrr(newgate ~ libby, data = kootenay, sigma_method = "s1") plot(kootenay$libby, kootenay$newgate, xlab = "Libby flow (10^6 ft^3)", ylab = "Newgate flow (10^6 ft^3)", main = "Kootenay River Water Flow", pch = 19, col = "grey60") # Highlight the X-space outlier (1934) points(kootenay["1934", "libby"], kootenay["1934", "newgate"], col = "red", pch = 17, cex = 1.5) abline(fit_ols, col = "black", lwd = 2, lty = 2) abline(fit_gkrr, col = "firebrick", lwd = 2) legend("topleft", c("OLS", "GKRR", "1934 (outlier)"), col = c("black","firebrick","red"), lty = c(2,1,NA), pch = c(NA,NA,17), lwd = 2, bty = "n")
Body mass (kg) and brain mass (g) for 62 species of land mammals. On the natural logarithm scale the relationship is approximately linear, but the dataset contains several leverage points (very large mammals such as African and Asian elephants) that strongly influence OLS estimates.
mammalsmammals
A data frame with 62 rows and 5 columns:
Common name of the mammal species (character).
Body mass in kilograms.
Brain mass in grams.
Natural logarithm of body mass.
Natural logarithm of brain mass.
On the log scale, African elephant (log_body 8.8) and
Asian elephant (log_body 7.8) are high-leverage
observations. Additionally, some primates (e.g., humans) deviate from the
overall trend (Y-space outliers relative to the bulk regression line).
P.J. Rousseeuw and A.M. Leroy (1987). Robust Regression and Outlier Detection. John Wiley & Sons.
Originally from: T. Allison and D.V. Cicchetti (1976). Sleep in mammals: ecological and constitutional correlates. Science, 194, 732–734.
Available in MASS as mammals.
data(mammals) fit_ols <- lm(log_brain ~ log_body, data = mammals) fit_gkrr <- gkrr(log_brain ~ log_body, data = mammals, sigma_method = "s3") plot(mammals$log_body, mammals$log_brain, xlab = "log(body mass, kg)", ylab = "log(brain mass, g)", main = "Brain vs. Body Mass (62 Mammal Species)", pch = 19, col = "grey60", cex = 0.8) # Label the leverage points elephants <- mammals$species %in% c("African elephant","Asian elephant") points(mammals$log_body[elephants], mammals$log_brain[elephants], col = "red", pch = 17, cex = 1.4) abline(fit_ols, col = "black", lwd = 2, lty = 2) abline(fit_gkrr, col = "firebrick", lwd = 2) legend("topleft", c("OLS","GKRR","Elephants"), col = c("black","firebrick","red"), lty = c(2,1,NA), pch = c(NA,NA,17), lwd = 2, bty = "n")data(mammals) fit_ols <- lm(log_brain ~ log_body, data = mammals) fit_gkrr <- gkrr(log_brain ~ log_body, data = mammals, sigma_method = "s3") plot(mammals$log_body, mammals$log_brain, xlab = "log(body mass, kg)", ylab = "log(brain mass, g)", main = "Brain vs. Body Mass (62 Mammal Species)", pch = 19, col = "grey60", cex = 0.8) # Label the leverage points elephants <- mammals$species %in% c("African elephant","Asian elephant") points(mammals$log_body[elephants], mammals$log_brain[elephants], col = "red", pch = 17, cex = 1.4) abline(fit_ols, col = "black", lwd = 2, lty = 2) abline(fit_gkrr, col = "firebrick", lwd = 2) legend("topleft", c("OLS","GKRR","Elephants"), col = c("black","firebrick","red"), lty = c(2,1,NA), pch = c(NA,NA,17), lwd = 2, bty = "n")
Produces up to 6 diagnostic panels for a "gkrr" object.
Point size is inversely proportional to the kernel weight ,
so outliers (small weights) appear large and red while well-fitted
observations appear small and blue.
## S3 method for class 'gkrr' plot(x, which = 1:5, n_id = 3L, ask = length(which) > 1L, ...)## S3 method for class 'gkrr' plot(x, which = 1:5, n_id = 3L, ask = length(which) > 1L, ...)
x |
A |
which |
Integer vector selecting panels to draw (default |
n_id |
Number of extreme observations to label in panels 1–5
(default |
ask |
Logical; if |
... |
Additional arguments (ignored; included for S3 compatibility). |
which = 1Residuals vs. fitted values.
which = 2Observed vs. fitted ( vs. ).
which = 3Kernel weight vs. residual, with the theoretical
curve overlaid.
which = 4Kernel weight vs. observation index.
which = 5Normal QQ-plot of residuals, coloured by weight.
which = 6Objective function by iteration.
Invisibly returns x.
Produces up to 2 panels for a "gkrr_boot" object.
## S3 method for class 'gkrr_boot' plot(x, which = 1:2, ask = length(which) > 1L, ...)## S3 method for class 'gkrr_boot' plot(x, which = 1:2, ask = length(which) > 1L, ...)
x |
A |
which |
Integer vector: |
ask |
Logical; waits for user input between panels when |
... |
Additional arguments (ignored). |
which = 1For each coefficient: histogram of bootstrap replicates with smoothed density, original estimate and shaded CI region.
which = 2Scatter-plot matrix of bootstrap replicates across all pairs of coefficients, with a 95% confidence ellipse and Pearson correlation in each off-diagonal cell. Only available when the model has at least 2 coefficients.
Invisibly returns x.
Four estimators for the Gaussian kernel width parameter .
All functions receive y (observed response) and yhat
(OLS fitted values) and return a positive scalar .
sigma2_s1(y, yhat, alpha = 0.5) sigma2_s2(y, yhat) sigma2_s3(y, yhat, p) sigma2_s4(y, yhat)sigma2_s1(y, yhat, alpha = 0.5) sigma2_s2(y, yhat) sigma2_s3(y, yhat, p) sigma2_s4(y, yhat)
y |
Numeric vector of observed responses (S4 only). |
yhat |
Numeric vector of OLS fitted values (S4 only). |
alpha |
Fraction of the sample used in S1 (default |
p |
Number of predictors excluding the intercept (used in S3 only). |
Mean of the 0.1 and 0.9 quantiles of the
squared residuals on a sub-sample of size
.
Recommended for clean data.
Median of ,\ .
Recommended for Y-space outliers up to 10% and X-space outliers
up to 15%.
.
Recommended for Y-space outliers and leverage points.
Squares the bandwidth
selected by minimising the corrected Akaike information criterion in
a nonparametric regression of on ,
via sm::h.select(..., method = "aicc").
Recommended for large samples ().
Requires the sm package.
A positive scalar .
De Carvalho et al. (2017) doi:10.1016/j.neucom.2016.12.035
The Hertzsprung-Russell diagram plots the log-luminosity against the log-effective-temperature of 47 stars in the star cluster CYG OB1. The dataset contains 4 giant stars (observations 11, 20, 30 and 34) that are well off the main sequence and act as leverage points, severely distorting OLS estimates of the linear trend in the main-sequence stars.
stars_cygstars_cyg
A data frame with 47 rows and 2 columns:
Logarithm (base 10) of the effective surface temperature of the star (Kelvin). Higher values correspond to hotter, bluer stars.
Logarithm (base 10) of the light intensity of the star relative to that of the Sun.
Observations 11, 20, 30 and 34 are giant stars that lie far from the main
sequence (low temperature, high luminosity). They are leverage points
because their log_temp values are much lower than the bulk of the
data, and they also deviate from the linear relationship of the main sequence
(bad leverage points).
Humphreys, R.M. (1978). Studies of luminous stars in nearby galaxies. Astrophysical Journal Supplement, 38, 309–350.
Cited by: P.J. Rousseeuw and A.M. Leroy (1987). Robust Regression and Outlier Detection. John Wiley & Sons, p. 27.
Available in robustbase as starsCYG.
Rousseeuw, P.J. and Leroy, A.M. (1987). Robust Regression and Outlier Detection. John Wiley & Sons.
data(stars_cyg) fit_ols <- lm(log_light ~ log_temp, data = stars_cyg) fit_gkrr <- gkrr(log_light ~ log_temp, data = stars_cyg, sigma_method = "s3") plot(stars_cyg$log_temp, stars_cyg$log_light, xlab = "log10(Temperature, K)", ylab = "log10(Luminosity, Sun = 1)", main = "Hertzsprung-Russell Diagram (CYG OB1)", pch = 19, col = "grey60") # Highlight the 4 giant stars giants <- c(11, 20, 30, 34) points(stars_cyg$log_temp[giants], stars_cyg$log_light[giants], col = "red", pch = 17, cex = 1.5) abline(fit_ols, col = "black", lwd = 2, lty = 2) abline(fit_gkrr, col = "firebrick", lwd = 2) legend("topleft", c("OLS","GKRR","Giant stars"), col = c("black","firebrick","red"), lty = c(2,1,NA), pch = c(NA,NA,17), lwd = 2, bty = "n")data(stars_cyg) fit_ols <- lm(log_light ~ log_temp, data = stars_cyg) fit_gkrr <- gkrr(log_light ~ log_temp, data = stars_cyg, sigma_method = "s3") plot(stars_cyg$log_temp, stars_cyg$log_light, xlab = "log10(Temperature, K)", ylab = "log10(Luminosity, Sun = 1)", main = "Hertzsprung-Russell Diagram (CYG OB1)", pch = 19, col = "grey60") # Highlight the 4 giant stars giants <- c(11, 20, 30, 34) points(stars_cyg$log_temp[giants], stars_cyg$log_light[giants], col = "red", pch = 17, cex = 1.5) abline(fit_ols, col = "black", lwd = 2, lty = 2) abline(fit_gkrr, col = "firebrick", lwd = 2) legend("topleft", c("OLS","GKRR","Giant stars"), col = c("black","firebrick","red"), lty = c(2,1,NA), pch = c(NA,NA,17), lwd = 2, bty = "n")
Prints a coefficient table modelled after summary.lm. By default
inference is based on the sandwich variance estimator (always computed at
fit time). When a gkrr_boot object is available it takes
precedence, replacing the sandwich standard errors and p-values with their
bootstrap counterparts.
## S3 method for class 'gkrr' summary( object, boot = NULL, digits = 4L, signif_stars = TRUE, se_tol = 0.25, ... )## S3 method for class 'gkrr' summary( object, boot = NULL, digits = 4L, signif_stars = TRUE, se_tol = 0.25, ... )
object |
A |
boot |
Optional |
digits |
Number of significant digits (default |
signif_stars |
Logical; print significance stars (default |
se_tol |
Relative divergence threshold between sandwich and
bootstrap standard errors. When both are available and
|
... |
Currently ignored. |
Sandwich inference (default, boot = FALSE in
gkrr):
Standard errors are derived from the asymptotic sandwich covariance matrix
, where
and .
P-values use the two-sided Wald z-test
.
Confidence intervals are .
Note: the sandwich estimator is asymptotically valid but does not
account for the variability introduced by estimating .
For small samples or when estimation has high variance,
bootstrap inference (boot = TRUE) is more reliable.
Bootstrap inference (when boot != FALSE):
Bootstrap p-values use the centred-t approach:
which counts how many bootstrap replicates are at least as extreme as the observed estimate relative to zero.
Invisibly returns a list with components coefficients
(the printed table as a matrix) and r_squared.