The following steps describe a simple procedure for simulating synthetic data, assuming that the continuous variables follow a multivariate normal distribution:
  • Read the original data, create a subset table that includes only the columns for the variables you want to simulate, and remove any rows that contain missing values

  • Calculate the mean values and the variance-covariance matrix (using the cov() function) of the independent (explanatory) variables

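  • Generate random independent variables from a multivariate normal distribution (using the mvrnorm() function from the MASS package) whose means and variance-covariance matrix are those estimated from the original independent variables, so that the simulated variables have approximately the same means and the same correlations between them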

  • Apply logistic regression (using the glm() function) to estimate the coefficients that represent the relationship between each dependent variable and the independent variables of the original dataset

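  • Use the estimated coefficients to calculate the log odds of each dependent variable for the simulated independent variables, convert each log odds value to a probability with the inverse logit, p = 1 / (1 + exp(-log odds)), and generate random binary variables (using the rbinom() function) with those probabilities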

  • Combine the simulated dependent and independent variables in one new table
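
The steps above can be sketched in R roughly as follows. This is a minimal illustration, not the script used for the HOP data: the data frame `original` and the column names `age`, `bmi` and `outcome` are hypothetical stand-ins invented for the example, and a single binary dependent variable is assumed.

```r
library(MASS)  # provides mvrnorm()

set.seed(42)

# Hypothetical stand-in for the original data (all names and values are invented).
n <- 500
original <- data.frame(
  age = rnorm(n, mean = 60, sd = 10),
  bmi = rnorm(n, mean = 27, sd = 4)
)
original$outcome <- rbinom(
  n, size = 1,
  prob = plogis(-8 + 0.1 * original$age + 0.08 * original$bmi)
)
original$bmi[c(3, 17)] <- NA  # a few missing values, to illustrate the first step

# Step 1: keep only the columns to simulate and drop rows with missing values.
vars_x <- c("age", "bmi")
vars_y <- "outcome"
dat <- na.omit(original[, c(vars_x, vars_y)])

# Step 2: means and variance-covariance matrix of the explanatory variables.
mu    <- colMeans(dat[, vars_x])
sigma <- cov(dat[, vars_x])

# Step 3: draw synthetic explanatory variables from N(mu, sigma).
n_sim <- nrow(dat)
sim_x <- as.data.frame(mvrnorm(n = n_sim, mu = mu, Sigma = sigma))

# Step 4: logistic regression on the original data.
fit <- glm(outcome ~ age + bmi, data = dat, family = binomial)

# Step 5: log odds for the synthetic rows, converted to probabilities
# with the inverse logit (plogis) before drawing the binary outcome.
log_odds <- predict(fit, newdata = sim_x, type = "link")
prob     <- plogis(log_odds)
sim_y    <- rbinom(n_sim, size = 1, prob = prob)

# Step 6: combine simulated dependent and independent variables.
synthetic <- cbind(sim_x, outcome = sim_y)
head(synthetic)
```

With the default settings, mvrnorm() draws from a distribution with the given means and covariances, so the sample moments of the synthetic data match the originals only approximately. Passing `empirical = TRUE` makes the simulated sample reproduce them exactly, which may or may not be desirable depending on the use case.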


An example of simulating HOP-related data:

Example R Script


