alexandria-cockcroft
Why?
• I was producing graphs for a SAS Graphics Training Course that will be rolled out soon, and I wanted to control the correlation between the variables.
Previous Method
• Use Excel to fill down and then generate another column that was fairly correlated.
Generating Correlated Random Variables using the SAS Datastep
data bivariate_final;
  mean1 = 0;     * mean for y1;
  mean2 = 10;    * mean for y2;
  sig1  = 2;     * SD for y1;
  sig2  = 5;     * SD for y2;
  rho   = 0.90;  * correlation between y1 and y2;
  do i = 1 to 100;
    r1 = rannor(1245);
    r2 = rannor(2923);
    y1 = mean1 + sig1*r1;
    y2 = mean2 + rho*sig2*r1 + sqrt(sig2**2 - sig2**2*rho**2)*r2;
    output;
  end;
run;
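Why the y2 formula produces the requested correlation can be checked with a short derivation. This is a sketch that assumes r1 and r2 are independent standard normal draws, and it writes σ1, σ2 and ρ for sig1, sig2 and rho:

$$
\begin{aligned}
\operatorname{Var}(y_1) &= \sigma_1^2 \\
\operatorname{Var}(y_2) &= \rho^2\sigma_2^2 + \left(\sigma_2^2 - \sigma_2^2\rho^2\right) = \sigma_2^2 \\
\operatorname{Cov}(y_1, y_2) &= \operatorname{Cov}(\sigma_1 r_1,\ \rho\,\sigma_2 r_1) = \rho\,\sigma_1\sigma_2 \\
\operatorname{Corr}(y_1, y_2) &= \frac{\rho\,\sigma_1\sigma_2}{\sigma_1\sigma_2} = \rho
\end{aligned}
$$

With the values used here, the population SDs are 2 and 5 and the population correlation is 0.90. A sample of 100 rows will only be close to these values, not equal to them.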
[Figure: y plotted against x for different correlation coefficients]
Generating Correlated Random Variables using Proc IML
• To generate more than two correlated random variables, it is easier to use the Cholesky decomposition method in Proc IML.
• IML = Interactive Matrix Language
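The idea behind the Cholesky method can be sketched as follows. The symbols Z (an n × k matrix of independent standard normal draws), μ (the matrix of means) and Y (the simulated data) are notation introduced for this sketch; C is the target variance-covariance matrix and U its upper-triangular Cholesky factor:

$$
C = U^{\top}U, \qquad Y = \mu + ZU \quad\Longrightarrow\quad \operatorname{Cov}(Y_{i\cdot}) = U^{\top} I\, U = C
$$

For two variables the factor can be written out by hand, and with sig1 = 2, sig2 = 5 and rho = 0.9 it becomes:

$$
U = \begin{pmatrix} \sigma_1 & \rho\,\sigma_2 \\ 0 & \sigma_2\sqrt{1-\rho^2} \end{pmatrix}
  = \begin{pmatrix} 2 & 4.5 \\ 0 & \sqrt{4.75} \end{pmatrix},
\qquad
U^{\top}U = \begin{pmatrix} 4 & 9 \\ 9 & 25 \end{pmatrix}
$$

Multiplying a row (r1, r2) by U gives y1 = mean1 + sig1*r1 and y2 = mean2 + rho*sig2*r1 + sqrt(sig2**2 - sig2**2*rho**2)*r2, which is the same formula used in the DATA step, so the two approaches agree when k = 2.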
proc iml;
  /* Read the standard normal draws and the means created in the DATA step */
  use bivariate_final;
  read all var {r1} into x3;
  read all var {r2} into x4;
  read all var {mean1} into mean1;
  read all var {mean2} into mean2;

  x = {4 9, 9 25};                                /* C: variance-covariance matrix */
  mattrib x rowname=({y3 y4}) colname=({y3 y4});  /* labels used when printing x */
  Cholesky_decomp = root(x);                      /* U: upper triangular, U`*U = C */

  matrix_con = x3 || x4;                          /* n x 2 matrix of draws */
  mean = mean1 || mean2;                          /* n x 2 matrix of means */

  final_simulated = mean + matrix_con * Cholesky_decomp;  /* RC */
  varnames = {y3 y4};
  create Cholesky_correlation from final_simulated [colname = varnames];
  append from final_simulated;
quit;
• USE is similar to SET in a DATA step: together with the READ statements it reads in the simulated data (r1, r2) and the means.
• x is the variance-covariance matrix (C).
• root(x) applies the Cholesky decomposition to x, giving U.
• The || operator concatenates the variables column by column (x3||x4 and mean1||mean2).
• final_simulated holds the correlated variables (RC).
• CREATE and APPEND output the variables y3 and y4 to the data set Cholesky_correlation.
References
• Lingling Han, University of Georgia, Athens, GA. Generating Multivariate Normal Data by using Proc IML.
Appendix
• Correlation Coefficient = rho = Cov(y1, y2) / (sig1 × sig2)
R Code - Generating Correlated Random Variables
mean1 = 0
mean2 = 10
sig1 = 2
sig2 = 5
rho = 0.9

r1 = rnorm(100, 0, 1)
r2 = rnorm(100, 0, 1)

y1 = mean1 + sig1*r1
y2 = mean2 + rho*sig2*r1 + sqrt(sig2^2 - sig2^2*rho^2)*r2
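One way to confirm that the formula behaves as intended is to simulate a larger sample and compare the sample statistics with the targets. The sketch below is an illustration only: the seed, the sample size n = 10000 and the names sample_rho and sample_sd are assumptions made for this check and are not part of the original slides.

```r
# Illustrative check (seed, n and result names are assumptions, not from the slides)
set.seed(1245)
n <- 10000

mean1 <- 0
mean2 <- 10
sig1 <- 2
sig2 <- 5
rho <- 0.9

# Independent standard normal draws
r1 <- rnorm(n)
r2 <- rnorm(n)

# Same construction as on the slide
y1 <- mean1 + sig1 * r1
y2 <- mean2 + rho * sig2 * r1 + sqrt(sig2^2 - sig2^2 * rho^2) * r2

# Sample statistics to compare against rho, sig1 and sig2
sample_rho <- cor(y1, y2)
sample_sd <- c(sd(y1), sd(y2))
print(sample_rho)
print(sample_sd)
```

With a sample this large the printed correlation should sit close to 0.9 and the SDs close to 2 and 5. With only 100 rows, as in the slides, noticeably larger deviations are normal.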
R Code - Generating Correlated Random Variables using Matrices
C = matrix(c(4, 9, 9, 25), nrow = 2, ncol = 2)
cholc = chol(C)
R = matrix(c(r1, r2), nrow = 100, ncol = 2, byrow = FALSE)
mean = matrix(c(mean1, mean2), nrow = 100, ncol = 2, byrow = TRUE)
RC = mean + R %*% cholc
• This uses the previous values of r1 and r2.
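The matrix version and the two-variable formula should give identical numbers when they start from the same r1 and r2. The following self-contained sketch checks this. The seed and the names U, Z, mu, same_factor and same_values are assumptions introduced for the illustration.

```r
# Illustrative comparison of the matrix method and the explicit formula
# (seed and helper names are assumptions, not from the slides)
set.seed(2923)
n <- 100
r1 <- rnorm(n)
r2 <- rnorm(n)

# Target variance-covariance matrix and its upper-triangular Cholesky factor
C <- matrix(c(4, 9, 9, 25), nrow = 2, ncol = 2)
U <- chol(C)

# Matrix method: means plus draws times the Cholesky factor
Z <- cbind(r1, r2)
mu <- matrix(c(0, 10), nrow = n, ncol = 2, byrow = TRUE)
RC <- mu + Z %*% U

# Explicit two-variable formula with sig1 = 2, sig2 = 5, rho = 0.9
y1 <- 0 + 2 * r1
y2 <- 10 + 0.9 * 5 * r1 + sqrt(5^2 - 5^2 * 0.9^2) * r2

# chol() returns U with t(U) %*% U equal to C
same_factor <- isTRUE(all.equal(t(U) %*% U, C))
# Both methods produce the same simulated values
same_values <- isTRUE(all.equal(unname(RC), unname(cbind(y1, y2))))
print(c(same_factor, same_values))
```

The advantage of the matrix form is that it extends to more than two variables by enlarging C, whereas the explicit formula would have to be derived again for each extra variable.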