| Title: | Semiparametric Likelihood Estimation with Errors in Variables |
|---|---|
| Description: | Efficient regression analysis under general two-phase sampling, where Phase I includes error-prone data and Phase II contains validated data on a subset. |
| Authors: | Sarah Lotspeich [aut], Ran Tao [aut, cre], Joey Sherrill [prg], Jiangmei Xiong [ctb], Shawn Garbett [prg] |
| Maintainer: | Ran Tao <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 1.2.0 |
| Built: | 2026-06-03 08:35:59 UTC |
| Source: | https://github.com/dragontaoran/sleev |
Extracts estimated coefficients from a two-phase linear regression model
of class linear2ph.
    ## S3 method for class 'linear2ph'
    coefficients(object, ...)
object |
An object of class "linear2ph". |
... |
Additional arguments passed to other methods. |
A numeric vector of coefficients if the model converged,
otherwise NULL with a warning.
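The return behaviour described above can be sketched with a stand-in class. The class name `toy2ph` and the list fields used below are hypothetical; they only mimic the documented contract (coefficients when the model converged, otherwise NULL with a warning) and are not the package's source code.

```r
# Hypothetical stand-in class "toy2ph"; this is NOT the sleev implementation.
coef.toy2ph <- function(object, ...) {
  if (!isTRUE(object$converge)) {
    warning("Model did not converge; returning NULL.")
    return(NULL)
  }
  object$coefficients
}

# Two made-up fitted objects: one converged, one not
fit_ok <- structure(
  list(coefficients = c(Intercept = 1.5, x = -0.3), converge = TRUE),
  class = "toy2ph"
)
fit_bad <- structure(
  list(coefficients = c(Intercept = NA_real_, x = NA_real_), converge = FALSE),
  class = "toy2ph"
)

coef(fit_ok)                                 # named numeric vector
res_bad <- suppressWarnings(coef(fit_bad))   # NULL (warning suppressed here)
```

Checking `is.null()` on the result is one simple way to guard downstream code against a non-converged fit.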
Extracts estimated coefficients from a two-phase logistic regression model
of class logistic2ph.
    ## S3 method for class 'logistic2ph'
    coefficients(object, ...)
object |
An object of class "logistic2ph". |
... |
Additional arguments passed to other methods. |
A numeric vector of coefficients if the model converged,
otherwise NULL with a warning.
linear2ph
Performs cross-validation to calculate the average predicted log likelihood for the linear2ph function. This function can be used to select the B-spline basis that yields the largest average predicted log likelihood. See the package vignette for code examples.
    cv_linear2ph(
      y_unval = NULL,
      y = NULL,
      x_unval = NULL,
      x = NULL,
      z = NULL,
      data = NULL,
      nfolds = 5,
      max_iter = 2000,
      tol = 1e-04,
      verbose = FALSE
    )
y_unval |
Specifies the column of the error-prone outcome that is continuous. Subjects with missing values of |
y |
Specifies the column that stores the validated value of |
x_unval |
Specifies the columns of the error-prone covariates. Subjects with missing values of |
x |
Specifies the columns that store the validated values of |
z |
Specifies the columns of the accurately measured covariates. Subjects with missing values of |
data |
Specifies the name of the dataset. This argument is required. |
nfolds |
Specifies the number of cross-validation folds. The default value is 5. |
max_iter |
Specifies the maximum number of iterations in the EM algorithm. The default number is 2000. |
tol |
Specifies the convergence criterion in the EM algorithm. The default value is 1e-04. |
verbose |
If |
cv_linear2ph gives log-likelihood predictions for models and data like those in linear2ph. Therefore, the arguments of cv_linear2ph are analogous to those of linear2ph.
cv_linear2ph() returns a list that includes the following components:
avg_pred_loglike |
The average predicted log likelihood across folds. |
pred_loglike |
The predicted log likelihood in each fold. |
converge |
The convergence status of the EM algorithm in each run. |
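To make the idea of an average predicted log likelihood concrete, here is a minimal sketch of K-fold cross-validation for an ordinary Gaussian linear model on simulated data. It uses `lm()` rather than the package's EM algorithm, and summing the held-out log density within each fold is an assumption made for illustration; cv_linear2ph() may define the fold-level quantity differently.

```r
set.seed(1)

# Simulated data (illustrative only; unrelated to mock.vccc)
n <- 200
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

nfolds <- 5
fold <- sample(rep(seq_len(nfolds), length.out = n))  # random fold labels

pred_loglike <- numeric(nfolds)
for (k in seq_len(nfolds)) {
  train <- dat[fold != k, ]
  test  <- dat[fold == k, ]

  # Fit on the training folds
  fit <- lm(y ~ x, data = train)
  sigma_hat <- sqrt(mean(residuals(fit)^2))  # ML estimate of the residual SD

  # Evaluate the Gaussian log density on the held-out fold
  mu_test <- predict(fit, newdata = test)
  pred_loglike[k] <- sum(dnorm(test$y, mean = mu_test, sd = sigma_hat, log = TRUE))
}

avg_pred_loglike <- mean(pred_loglike)
```

Larger (less negative) values indicate that the fitted model predicts the held-out data better, which is why the largest average is used for model selection.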
    ## Not run: 
    data("mock.vccc")
    # different B-spline sizes
    sns <- c(15, 20, 25, 30, 35, 40)
    # vector to hold mean log-likelihood
    pred_loglike.1 <- rep(NA, length(sns))
    # specify number of folds in the cross validation
    k <- 5
    for (i in seq_along(sns)) {
      # constructing B-spline basis using the same process as in Section 4.3.1
      sn <- sns[i]
      data.sieve <- spline2ph(x = "VL_unval", data = mock.vccc, size = sn,
                              degree = 3, group = "Sex")
      # cross validation, produce mean log-likelihood
      res.1 <- cv_linear2ph(y = "CD4_val", y_unval = "CD4_unval",
                            x = "VL_val", x_unval = "VL_unval", z = "Sex",
                            data = data.sieve, nfolds = k, max_iter = 2000,
                            tol = 1e-04, verbose = FALSE)
      # save mean log-likelihood result (component name as documented above)
      pred_loglike.1[i] <- res.1$avg_pred_loglike
    }
    # Print predicted log-likelihood for different B-spline sizes
    print(pred_loglike.1)
    ## End(Not run)
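Once the loop above has filled `pred_loglike.1`, the B-spline size can be chosen by taking the largest value. The numbers below are placeholders invented for illustration, not output from cv_linear2ph(); the `NA` stands in for a run that produced no usable value.

```r
sns <- c(15, 20, 25, 30, 35, 40)

# Placeholder values for illustration only (NOT real package output)
pred_loglike.1 <- c(-1250.4, -1238.9, -1241.2, NA, -1247.7, -1255.0)

# which.max() discards NA entries and returns the index of the largest value
best <- which.max(pred_loglike.1)
best_size <- sns[best]
best_size
```

With real results, it may also be worth inspecting the `converge` component before trusting a fold's predicted log likelihood.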
logistic2ph
Performs cross-validation to calculate the average predicted log likelihood for the logistic2ph function. This function can be used to select the B-spline basis that yields the largest average predicted log likelihood. See the package vignette for code examples.
    cv_logistic2ph(
      y_unval = NULL,
      y = NULL,
      x_unval = NULL,
      x = NULL,
      z = NULL,
      data,
      nfolds = 5,
      tol = 1e-04,
      max_iter = 1000,
      verbose = FALSE
    )
y_unval |
Column name of the error-prone or unvalidated binary outcome. This argument is optional. If |
y |
Column name that stores the validated value of |
x_unval |
Specifies the columns of the error-prone covariates. This argument is required. |
x |
Specifies the columns that store the validated values of |
z |
Specifies the columns of the accurately measured covariates. Subjects with missing values of |
data |
Specifies the name of the dataset. This argument is required. |
nfolds |
Specifies the number of cross-validation folds. The default value is 5. |
tol |
Specifies the convergence criterion in the EM algorithm. The default value is 1e-04. |
max_iter |
Specifies the maximum number of iterations in the EM algorithm. The default number is 1000. |
verbose |
If |
cv_logistic2ph gives log-likelihood predictions for models and data like those in logistic2ph. Therefore, the arguments of cv_logistic2ph are analogous to those of logistic2ph.
cv_logistic2ph() returns a list that includes the following components:
avg_pred_loglike |
Stores the average predicted log likelihood. |
pred_loglike |
Stores the predicted log likelihood in each fold. |
converge |
Stores the convergence status of the EM algorithm in each run. |
    ## Not run: 
    data("mock.vccc")
    # different B-spline sizes
    sns <- c(15, 20, 25, 30, 35, 40)
    # vector to hold mean log-likelihood
    pred_loglike.1 <- rep(NA, length(sns))
    # specify number of folds in the cross validation
    k <- 5
    for (i in seq_along(sns)) {
      # constructing B-spline basis using the same process as in Section 4.3.1
      sn <- sns[i]
      data.sieve <- spline2ph(x = "CD4_unval", size = sn, degree = 3,
                              data = mock.vccc, group = "Prior_ART",
                              split_group = TRUE)
      # cross validation, produce mean log-likelihood
      res.1 <- cv_logistic2ph(y = "ADE_val", y_unval = "ADE_unval",
                              x = "CD4_val", x_unval = "CD4_unval",
                              z = "Prior_ART", data = data.sieve, nfolds = k,
                              tol = 1e-04, max_iter = 1000, verbose = FALSE)
      # save mean log-likelihood result (component name as documented above)
      pred_loglike.1[i] <- res.1$avg_pred_loglike
    }
    # Print predicted log-likelihood for different B-spline sizes
    print(pred_loglike.1)
    ## End(Not run)
Performs efficient semiparametric estimation for general two-phase measurement error models when there are errors in both the outcome and covariates. See the package vignette for code examples.
    linear2ph(
      y_unval = NULL,
      y = NULL,
      x_unval = NULL,
      x = NULL,
      z = NULL,
      data = NULL,
      hn_scale = 1,
      se = TRUE,
      tol = 1e-04,
      max_iter = 1000,
      verbose = FALSE
    )
y_unval |
Column name of the error-prone or unvalidated continuous outcome. Subjects with missing values of |
y |
Column name that stores the validated value of |
x_unval |
Specifies the columns of the error-prone covariates. Subjects with missing values of |
x |
Specifies the columns that store the validated values of |
z |
Specifies the columns of the accurately measured covariates. Subjects with missing values of |
data |
Specifies the name of the dataset. This argument is required. |
hn_scale |
Specifies the scale of the perturbation constant in the variance estimation. For example, if |
se |
If |
tol |
Specifies the convergence criterion in the EM algorithm. The default value is 1e-04. |
max_iter |
Maximum number of iterations in the EM algorithm. The default number is 1000. |
verbose |
If |
Models for linear2ph() are specified through the arguments. The input dataset should contain at least the following columns: the unvalidated (error-prone) outcome, the validated outcome, the unvalidated (error-prone) covariate(s), the validated covariate(s), and the B-spline basis. The B-spline basis can be generated with the splines::bs() function, taking the unvalidated error-prone covariate(s) as its argument x. See the vignette for options in tuning the B-spline basis.
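As a rough sketch of generating a B-spline basis with splines::bs() and attaching it to a dataset: the toy data, the column names `bs1`, `bs2`, ... and the choices `df = 20` and `intercept = TRUE` are assumptions made for illustration, and this is not necessarily how spline2ph() builds its basis.

```r
library(splines)
set.seed(2)

# Toy dataset with one error-prone covariate (illustrative only)
toy <- data.frame(x_unval = rnorm(100))

# Cubic B-spline basis with 20 columns evaluated at the unvalidated covariate
B <- bs(toy$x_unval, df = 20, degree = 3, intercept = TRUE)

# Strip the spline attributes and name the columns (hypothetical naming scheme)
B_mat <- matrix(as.vector(B), nrow = nrow(B), ncol = ncol(B),
                dimnames = list(NULL, paste0("bs", seq_len(ncol(B)))))

# Dataset with the basis columns appended
toy_sieve <- cbind(toy, as.data.frame(B_mat))
dim(toy_sieve)
```

Changing `df` changes the number of basis columns, which is the quantity tuned by cross-validation.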
linear2ph() returns an object of class "linear2ph". The function coef() is used to obtain the coefficients of the fitted model. The function summary() is used to obtain and print a summary of results.
An object of class "linear2ph" is a list containing at least the following components:
call |
the matched call. |
coefficients |
A named vector of the linear regression coefficient estimates. |
sigma |
The residual standard error. |
covariance |
The covariance matrix of the linear regression coefficient estimates. |
converge |
In parameter estimation, if the EM algorithm converges, then |
converge_cov |
In variance estimation, if the EM algorithm converges, then |
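The coefficients and covariance components listed above are enough to compute standard errors and Wald-type confidence intervals. In this sketch `fit` is a hand-made list with invented numbers standing in for a fitted object; only the component names `coefficients` and `covariance` come from the list above.

```r
# Hypothetical stand-in for a fitted object; all values are made up
fit <- list(
  coefficients = c(Intercept = 2.0, VL_val = -0.5, Sex = 0.1),
  covariance   = diag(c(0.04, 0.01, 0.09))
)

est <- fit$coefficients
se  <- sqrt(diag(fit$covariance))   # standard errors from the diagonal
z   <- est / se                     # Wald statistics
p   <- 2 * pnorm(-abs(z))           # two-sided p-values

# 95% Wald confidence intervals
ci <- cbind(lower = est - qnorm(0.975) * se,
            upper = est + qnorm(0.975) * se)

cbind(estimate = est, se = se, z = z, p = p, ci)
```

These quantities are only meaningful when both the parameter estimation and the variance estimation converged.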
Tao, R., Mercaldo, N. D., Haneuse, S., Maronge, J. M., Rathouz, P. J., Heagerty, P. J., & Schildcrout, J. S. (2021). Two-wave two-phase outcome-dependent sampling designs, with applications to longitudinal binary data. Statistics in Medicine, 40(8), 1863–1876. https://doi.org/10.1002/sim.8876
See also cv_linear2ph() to calculate the average predicted log likelihood of this function.
    ## Not run: 
    # Regression model: CD4 ~ VL + Sex. CD4 and VL are partially validated.
    data("mock.vccc")
    sn <- 20
    data.linear <- spline2ph(x = "VL_unval", data = mock.vccc, size = sn,
                             degree = 3, group = "Sex")
    res_linear <- linear2ph(y_unval = "CD4_unval", y = "CD4_val",
                            x_unval = "VL_unval", x = "VL_val", z = "Sex",
                            data = data.linear, hn_scale = 1, se = TRUE,
                            tol = 1e-04, max_iter = 1000, verbose = FALSE)
    ## End(Not run)
This function returns the sieve maximum likelihood estimators (SMLE) for the logistic regression model from Lotspeich et al. (2021). See the package vignette for code examples.
    logistic2ph(
      y_unval = NULL,
      y = NULL,
      x_unval = NULL,
      x = NULL,
      z = NULL,
      data = NULL,
      hn_scale = 1,
      se = TRUE,
      tol = 1e-04,
      max_iter = 1000,
      verbose = FALSE
    )
y_unval |
Column name of the error-prone or unvalidated binary outcome. This argument is optional. If |
y |
Column name that stores the validated value of |
x_unval |
Specifies the columns of the error-prone covariates. This argument is required. |
x |
Specifies the columns that store the validated values of |
z |
Specifies the columns of the accurately measured covariates. This argument is optional. |
data |
Specifies the name of the dataset. This argument is required. |
hn_scale |
Specifies the scale of the perturbation constant in the variance estimation. For example, if |
se |
If |
tol |
Specifies the convergence criterion in the EM algorithm. The default value is 1e-04. |
max_iter |
Maximum number of iterations in the EM algorithm. The default number is 1000. |
verbose |
If |
Models for logistic2ph() are specified through the arguments. The input dataset should contain at least the following columns: the unvalidated (error-prone) outcome, the validated outcome, the unvalidated (error-prone) covariate(s), the validated covariate(s), and the B-spline basis. The B-spline basis can be generated with the splines::bs() function, taking the unvalidated error-prone covariate(s) as its argument x. See the vignette for options in tuning the B-spline basis.
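The two-phase layout of the input dataset can be illustrated with a small simulated example. It assumes that subjects not selected for Phase II carry `NA` in the validated column; the variable names and the simulation itself are hypothetical.

```r
set.seed(3)

n <- 10
x_true <- rnorm(n)            # true covariate, observed only in Phase II
phase2 <- sample(n, 4)        # indices of the validated subset

toy <- data.frame(
  x_unval = x_true + rnorm(n, sd = 0.5),  # error-prone value, available for everyone
  x_val   = NA_real_                      # validated value, missing by default
)
toy$x_val[phase2] <- x_true[phase2]       # fill in Phase II subjects only

sum(!is.na(toy$x_val))   # number of validated subjects
```

Here every row contributes its error-prone measurement, while only the four sampled rows contribute a validated one.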
logistic2ph() returns an object of class "logistic2ph". The function coef() is used to obtain the coefficients of the fitted model. The function summary() is used to obtain and print a summary of results.
An object of class "logistic2ph" is a list containing at least the following components:
call |
the matched call. |
coefficients |
A named vector of the logistic regression coefficient estimates. |
covariance |
The covariance matrix of the logistic regression coefficient estimates. |
converge |
In parameter estimation, if the EM algorithm converges, then |
converge_cov |
In variance estimation, if the EM algorithm converges, then |
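Because the model is a logistic regression, the coefficient estimates are log odds ratios and can be exponentiated together with their Wald limits. The object `fit` below is a hand-made list with invented numbers; only the component names `coefficients` and `covariance` are taken from the list above.

```r
# Hypothetical stand-in for a fitted object; all values are made up
fit <- list(
  coefficients = c(Intercept = -1.0, CD4_val = -0.4, Prior_ART = 0.2),
  covariance   = diag(c(0.0400, 0.0025, 0.0100))
)

se <- sqrt(diag(fit$covariance))

# Odds ratios with 95% Wald confidence limits, computed on the log scale
or_table <- cbind(
  OR    = exp(fit$coefficients),
  lower = exp(fit$coefficients - qnorm(0.975) * se),
  upper = exp(fit$coefficients + qnorm(0.975) * se)
)
round(or_table, 3)
```

The intercept row is usually dropped when reporting, since it is an odds rather than an odds ratio.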
Lotspeich, S. C., Shepherd, B. E., Amorim, G. G. C., Shaw, P. A., & Tao, R. (2021). Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort. Biometrics, biom.13512. https://doi.org/10.1111/biom.13512
    ## Not run: 
    # Regression model: ADE ~ CD4 + Prior_ART. ADE and CD4 are partially validated.
    data("mock.vccc")
    sn <- 20
    data.logistic <- spline2ph(x = "CD4_unval", size = sn, degree = 3,
                               data = mock.vccc, group = "Prior_ART",
                               split_group = TRUE)
    res_logistic <- logistic2ph(y = "ADE_val", y_unval = "ADE_unval",
                                x = "CD4_val", x_unval = "CD4_unval",
                                z = "Prior_ART", data = data.logistic,
                                hn_scale = 1/2, se = TRUE, tol = 1e-04,
                                max_iter = 1000, verbose = FALSE)
    ## End(Not run)
A simulated dataset constructed to imitate the Vanderbilt Comprehensive Care Clinic (VCCC) patient records, which have been fully validated and therefore contain validated and unvalidated versions of all variables. The VCCC cohort is a good candidate for the purpose of illustration. The data presented in this section are a mocked-up version of the actual data due to confidentiality, but the data structure and features, such as mean and variability, closely resemble the real dataset.
    mock.vccc
A data frame with 2087 rows and 8 variables:
patient ID
VL_unval: viral load at antiretroviral therapy (ART) initiation, error-prone outcome, continuous
VL_val: viral load at antiretroviral therapy (ART) initiation, validated outcome, continuous
ADE_unval: having an AIDS-defining event (ADE) within one year of ART initiation, error-prone outcome, binary
ADE_val: having an AIDS-defining event (ADE) within one year of ART initiation, validated outcome, binary
CD4_unval: CD4 count at ART initiation, error-prone covariate, continuous
CD4_val: CD4 count at ART initiation, validated covariate, continuous
Prior_ART: whether patient is ART naive at enrollment, error-free covariate, binary
Sex: sex of patient (1 indicates male, 0 indicates female), error-free covariate, binary
age of patient, error-free covariate, continuous
https://www.vanderbilthealth.com/clinic/comprehensive-care-clinic
Prints the details of a linear2ph object.
    ## S3 method for class 'linear2ph'
    print(x, ...)
x |
An object of class "linear2ph". |
... |
Additional arguments passed to methods |
Prints the details of a logistic2ph object.
    ## S3 method for class 'logistic2ph'
    print(x, ...)
x |
An object of class "logistic2ph". |
... |
Additional arguments passed to methods |
Prints a structured summary of a linear2ph model.
    ## S3 method for class 'summary.linear2ph'
    print(x, ...)
x |
An object of class "summary.linear2ph". |
... |
Additional arguments passed to methods |
Invisibly returns x.
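Returning the object invisibly is the usual convention for print methods in R. A minimal sketch with a hypothetical class `toy_summary` (not the package's own method) shows the pattern:

```r
# Hypothetical print method; the class and fields are invented for illustration
print.toy_summary <- function(x, ...) {
  cat("Call:\n")
  print(x$call)
  cat("\nCoefficients:\n")
  print(x$coefficients)
  invisible(x)   # return the object without auto-printing it again
}

s <- structure(
  list(call = quote(toy_fit(y ~ x)),
       coefficients = c(Intercept = 1.5, x = -0.3)),
  class = "toy_summary"
)

out <- capture.output(res <- withVisible(print(s)))
res$visible   # FALSE: the value was returned invisibly
```

The invisible return value allows the object to be passed along, for example `s2 <- print(s)`, without printing it twice.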
Prints a structured summary of a logistic2ph model.
    ## S3 method for class 'summary.logistic2ph'
    print(x, ...)
x |
An object of class "summary.logistic2ph". |
... |
Additional arguments passed to methods |
Invisibly returns x.
Creates splines for the two-phase regression functions in this package, including linear2ph, logistic2ph, cv_linear2ph, and cv_logistic2ph.
    spline2ph(
      x,
      data,
      size = 20,
      degree = 3,
      bs_names = NULL,
      group = NULL,
      split_group = TRUE
    )
x |
Column name(s) of the covariate(s) in the dataset. |
data |
Specifies the name of the dataset. This argument is required. |
size |
Pass on to the |
degree |
Pass on to the |
bs_names |
Optional. Vector of column names of the output B-spline basis matrix. When not specified, default names are provided. |
group |
Optional. Column name of the categorical variable whose groups might have heterogeneous errors. |
split_group |
Optional. Whether to split by group proportion for the group with B-spline size if the |
This function can be applied directly to regression models with one or more error-prone continuous covariates.
A data.frame object containing the original dataset and the B-spline bases.
    # example code
    data("mock.vccc")
    sn <- 20
    data.linear <- spline2ph(x = "VL_unval", data = mock.vccc, size = sn,
                             degree = 3, group = "Sex")
Summarizes the details of a linear2ph object.
    ## S3 method for class 'linear2ph'
    summary(object, ...)
object |
An object of class "linear2ph". |
... |
Additional arguments passed to methods |
An object of class summary.linear2ph, containing the call, coefficients, and covariance.
Summarizes the details of a logistic2ph object.
    ## S3 method for class 'logistic2ph'
    summary(object, ...)
object |
An object of class "logistic2ph". |
... |
Additional arguments passed to methods |
An object of class summary.logistic2ph, containing the call, coefficients, and covariance.