generate_y
Generate simulated data according to the model f(X) = f_prog(X) + trt*(b0 + b1*f_pred(X)) for different outcome distributions. For continuous data f(X) is the conditional mean, for binary data the logit of the response probability, for count data the log-rate and for survival data the log-hazard rate.
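As a rough illustration of the model above, the following base R sketch builds f(X) by hand and maps it to the natural scale of each outcome type. It does not call generate_y; the covariates, the prog and pred expressions and the coefficient values are made up for illustration only.

```r
set.seed(1)
n <- 1000

# Hypothetical covariates and treatment indicator (illustrative only)
X <- data.frame(X1 = rnorm(n), X2 = rnorm(n))
trt <- rbinom(n, size = 1, prob = 0.5)

# Prognostic and predictive parts, written as expressions in the column names of X
prog <- "0.5 * X1"
pred <- "X2 > 0"
f_prog <- as.numeric(eval(parse(text = prog), envir = X))
f_pred <- as.numeric(eval(parse(text = pred), envir = X))

# f(X) = f_prog(X) + trt * (b0 + b1 * f_pred(X))
b0 <- 0
b1 <- 1
f <- f_prog + trt * (b0 + b1 * f_pred)

# Reading f(X) on the scale of each outcome type
mean_continuous <- f      # continuous: conditional mean
prob_binary <- plogis(f)  # binary: inverse logit gives the response probability
rate_count <- exp(f)      # count: exponentiated log-rate
hazard <- exp(f)          # survival: hazard rate, up to any baseline intercept
```

For untreated subjects (trt = 0) the treatment term drops out and f(X) reduces to f_prog(X).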
generate_y(
X,
trt,
prog,
pred,
b0,
b1 = NULL,
sd_te = NULL,
type = c("binary", "continuous", "count", "survival"),
sigma_error = 1,
theta = 1,
cens_time = NULL,
lambda0 = NULL,
include_truth = FALSE,
sign_better = 1
)
X: Matrix or data frame of predictor variables
trt: Binary treatment indicator variable
prog: Character string giving the expression for the prognostic effects (defined in terms of the column names of X)
pred: Character string giving the expression for the predictive effects (defined in terms of the column names of X)
b0: Treatment main effect
b1: Coefficient of the predictive effects defined in pred
sd_te: Standard deviation of the treatment effects defined via pred. If given, b1 is ignored. For binary data this is assumed to be on the log-odds ratio scale. For survival and count data this is assumed to be on the log scale.
type: Outcome data type to generate ("continuous", "binary", "count" and "survival" are allowed)
sigma_error: Residual error, only needed for type = "continuous"
theta: Overdispersion parameter, only needed for type = "count" (the variance of the negative binomial in this parameterization is mu + mu^2/theta)
cens_time: Function to generate the censoring time, only needed for type = "survival"
lambda0: Intercept of the exponential regression (on the non-log scale)
include_truth: Logical; should the true prognostic and predictive effects be included in the outcome data set?
sign_better: Whether a larger response is better (used to determine whether b1 is negative or positive if it is not given)
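One way to read the relationship between sd_te and b1: under the model, the treatment effect for a subject is b0 + b1*f_pred(X), so its standard deviation across subjects is |b1| * sd(f_pred(X)). The sketch below solves this for b1 and takes the sign from sign_better, assuming it is coded as 1 or -1. This is an assumption about how the arguments relate, not the package's actual code, and `b1_from_sd_te` is a hypothetical helper.

```r
# Hypothetical helper: choose b1 so that the treatment effects
# b0 + b1 * f_pred have standard deviation sd_te.
b1_from_sd_te <- function(f_pred, sd_te, sign_better = 1) {
  s <- sd(f_pred)
  if (s == 0) stop("f_pred is constant, so the treatment effect cannot vary")
  sign_better * sd_te / s
}

set.seed(2)
f_pred <- as.numeric(rnorm(500) > 0)  # illustrative predictive term
b0 <- 0.2
b1 <- b1_from_sd_te(f_pred, sd_te = 0.5)
te <- b0 + b1 * f_pred                # per-subject treatment effect
sd(te)                                # equals 0.5 up to floating point error
```

Note that b0 shifts every treatment effect by the same amount and therefore does not change the standard deviation.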
Data set
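The count parameterization described for theta, with variance mu + mu^2/theta, matches base R's rnbinom when it is called with mu and size = theta. The quick simulation check below uses arbitrary values for mu and theta; whether generate_y calls rnbinom internally is not stated here.

```r
set.seed(3)
mu <- 4
theta <- 2

# Negative binomial draws with mean mu and dispersion (size) theta
y <- rnbinom(1e5, size = theta, mu = mu)

mean(y)              # close to mu
var(y)               # close to the theoretical variance below
mu + mu^2 / theta    # theoretical variance: 4 + 16 / 2 = 12
```

Larger values of theta shrink the mu^2/theta term, so the distribution approaches a Poisson with variance mu.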
X <- generate_X_dist(n = 10000, p = 10, rho = 0.5)
## observational data set
trt <- generate_trt(n = nrow(X), type = "random", X = X, prop = "X2")
dat <- generate_y(X, trt,
  prog = "0.5*((X1=='Y')+X3)",
  pred = "X3>0", b0 = 0, b1 = 1,
  type = "continuous", sigma_error = 3
)
#### generate data from user-specified covariates X
X <- sapply(1:10, function(ii) {
  rnorm(500)
})
X <- as.data.frame(X)
colnames(X) <- paste0("Z", 1:10)
trt <- generate_trt(nrow(X), p_trt = 0.5)
dat <- generate_y(X, trt,
  prog = "0.5*Z1",
  pred = "(Z5>0)", b0 = 0, b1 = 1,
  type = "binary"
)
glm(Y ~ trt * ., data = dat, family = binomial)
#>
#> Call: glm(formula = Y ~ trt * ., family = binomial, data = dat)
#>
#> Coefficients:
#> (Intercept)          trt           Z1           Z2           Z3           Z4
#>     0.04097      0.66630      0.48682     -0.27904     -0.10889     -0.18297
#>          Z5           Z6           Z7           Z8           Z9          Z10
#>    -0.13666      0.06119     -0.01345     -0.21496      0.14267     -0.11297
#>      trt:Z1       trt:Z2       trt:Z3       trt:Z4       trt:Z5       trt:Z6
#>    -0.06650      0.09557     -0.01194      0.06109      0.48794     -0.10708
#>      trt:Z7       trt:Z8       trt:Z9      trt:Z10
#>    -0.06243      0.24723     -0.09472     -0.07955
#>
#> Degrees of Freedom: 499 Total (i.e. Null); 478 Residual
#> Null Deviance: 679.6
#> Residual Deviance: 625.7 AIC: 669.7