---
title: "Ex. 2 - Understanding the elements in output"
author: Yuan-Ling Liaw and Waldir Leoncio
header-includes:
- \usepackage{setspace}\onehalfspacing
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Ex. 2 - Understanding the elements in output}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE, warning = FALSE}
library(knitr)
options(width = 90, tidy = TRUE, warning = FALSE, message = FALSE)
opts_chunk$set(comment = "", warning = FALSE, message = FALSE, echo = TRUE, tidy = TRUE)
```

```{r load}
library(lsasim)
```

```{r packageVersion}
packageVersion("lsasim")
```

```{r equation, eval=FALSE}
questionnaire_gen(n_obs, cat_prop = NULL, n_vars = NULL, n_X = NULL, n_W = NULL,
                  cor_matrix = NULL, cov_matrix = NULL, c_mean = NULL, c_sd = NULL,
                  theta = FALSE, family = NULL, full_output = FALSE, verbose = TRUE)
```

By default, the function returns a `data.frame` whose first column ("subject") numbers the $n$ observations from $1$ to $n$ and whose remaining columns hold the questionnaire answers. If `theta = TRUE`, the first column after "subject" is the latent variable theta; in any case, the continuous variables always come before the categorical ones.

If the logical argument `full_output` is `TRUE`, the output is instead a list containing the questionnaire data together with several objects that may be of interest for further analysis of the data, listed below:

- `bg`: a data frame containing the background questionnaire answers (i.e., the same object that is returned if `full_output = FALSE`).
- `c_mean`: a vector of population means for each continuous variable ($Y$ and $X$).
- `c_sd`: a vector of population standard deviations for each continuous variable ($Y$ and $X$).
- `cat_prop`: a list of cumulative proportions for each item. If `theta = TRUE`, the first element of `cat_prop` must be a scalar 1, which corresponds to `theta`.
- `cat_prop_W_p`: a list containing the probabilities for each category of the categorical variables (`cat_prop_W` contains the cumulative probabilities).
- `cor_matrix`: latent correlation matrix. The first row/column corresponds to the latent trait ($Y$). The other rows/columns correspond to the continuous ($X$ or $Z$) or the discrete ($W$) background variables, in the same order as `cat_prop`.
- `cov_matrix`: latent covariance matrix, with the same layout as `cor_matrix`.
- `family`: distribution of the background variables. Can be `NULL` (default) or `"gaussian"`.
- `n_obs`: number of observations to generate.
- `n_tot`: named vector containing the number of total variables, the number of continuous background variables (i.e., the total number of background variables except theta) and the number of categorical variables.
- `n_W`: vector containing the number of categorical variables.
- `n_X`: vector containing the number of continuous variables (except theta).
- `sd_YXW`: vector containing the standard deviations of all the variables.
- `sd_YXZ`: vector containing the standard deviations of theta, the continuous background variables ($X$) and the normally distributed variables $Z$ from which the categorical background variables ($W$) are generated.
- `theta`: if `TRUE`, the first continuous variable will be labeled "theta". Otherwise, it will be labeled `q1`.
- `var_W`: list containing the variances of the categorical variables.
- `var_YX`: list containing the variances of the continuous variables (including theta).
- `linear_regression`: this list is returned only if `theta = TRUE`, `family = "gaussian"` and `full_output = TRUE`. It contains one vector named `betas` and one table named `cov_YXW`. The former holds the true linear regression coefficients of theta on the background questionnaire answers; the latter contains the covariance matrix between all these variables.

---

We generate one continuous and two ordinal covariates.
We specify the covariance matrix between the numeric and ordinal variables. The data are generated from a multivariate normal distribution, and we set the logical argument `full_output = TRUE`.

```{r, include = FALSE}
set.seed(1234)
(props <- list(1, c(.25, 1), c(.2, .8, 1)))
(yw_cov <- matrix(c(1, .5, .5,
                    .5, 1, .8,
                    .5, .8, 1), nrow = 3))
bg <- questionnaire_gen(n_obs = 10, cat_prop = props, cov_matrix = yw_cov,
                        theta = TRUE, family = "gaussian", full_output = TRUE)
names(bg)
```

The output is a list containing the following elements: `r names(bg)`.

```{r, eval=FALSE}
?questionnaire_gen
```

```{r ex 1a}
set.seed(1234)
(props <- list(1, c(.25, 1), c(.2, .8, 1)))
(yw_cov <- matrix(c(1, .5, .5,
                    .5, 1, .8,
                    .5, .8, 1), nrow = 3))
```

```{r}
questionnaire_gen(n_obs = 10, cat_prop = props, cov_matrix = yw_cov,
                  theta = TRUE, family = "gaussian", full_output = TRUE)
```
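
The list of output elements distinguishes cumulative proportions (`cat_prop`) from per-category probabilities (`cat_prop_W_p`). The sketch below illustrates how the two relate, using only base R and the `props` object from this example. The helper `cum_to_prob` is a hypothetical name introduced here for illustration; it is not part of lsasim, and this is not necessarily how the package computes `cat_prop_W_p` internally.

```{r cumulative-to-probabilities}
# Cumulative proportions used in the example above:
# theta (scalar 1), a two-category item and a three-category item.
props <- list(1, c(.25, 1), c(.2, .8, 1))

# Hypothetical helper (not part of lsasim): successive differences of the
# cumulative proportions give the probability of each category.
cum_to_prob <- function(cum_prop) diff(c(0, cum_prop))

# Drop the first element, which corresponds to theta, and convert the rest.
cat_probs <- lapply(props[-1], cum_to_prob)
cat_probs
```

For the three-category item, the cumulative proportions 0.2, 0.8 and 1 correspond to category probabilities of 0.2, 0.6 and 0.2 (up to floating-point rounding). When `full_output = TRUE`, the stored elements can be extracted from the returned list by name with `$`.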