Effect sizes are indices that help researchers, stakeholders, and policymakers understand the relationship between variables and draw meaningful conclusions from data. However, an effect size is only as useful as its estimate is accurate. To ensure the accuracy of effect size estimates, it is important to mitigate the attenuation induced by various statistical artifacts, such as measurement error and range restriction. This page documents these artifacts and provides corrections that can be applied to attenuated effect sizes to obtain unbiased estimates of the true population effect size. Equations and R code are provided for each correction. For additional information on these corrections and their application in meta-analysis, consult the book by Hunter and Schmidt (1990) and the paper by Wiernik and Dahlke (2020). The {psychmeta} package (Dahlke and Wiernik 2019) implements many of these corrections conveniently in R.
Interactive Visualization
To visualize how various statistical artifacts can bias effect size estimates, play around with this shiny app!
Small Samples
Effect sizes estimated from small samples often show systematic bias. The following corrections for r and d are both very close approximations to the more complex unbiased estimators, which are somewhat more computationally expensive. It is often suggested that they be applied only when sample sizes are small (e.g., \(n < 20\)); however, these corrections can be applied at any sample size. Because \(n\) is built into the correction factor, the magnitude of the correction is negligible in large samples. The following corrections can be applied prior to any of the other corrections on this page.
Below are the correction factors that may be applied to the correlation coefficient (r) and the standardized mean difference (d). Note that the correction for r is not conventionally used as it tends to be a very small adjustment.
Correlation Coefficient (r)
Point Estimate\[\displaystyle{ \hat{\rho} = r \cdot \left( 1 + \frac{1-r^2}{2(n-4)} \right) }\]
Standard Error\[\displaystyle{ se_{\hat{\rho}} = se_{r}\cdot \left( 1 + \frac{1-r^2}{2(n-4)} \right)}\]
# Parameters needed
r = 0.50 # observed correlation between x and y
n = 20 # sample size
SEr = (1 - r^2) / sqrt(n - 1) # standard error of observed correlation between x and y
# Point Estimate
rho = r * (1 + (1 - r^2) / (2 * (n - 4)))
# Standard Error
SErho = SEr * (1 + (1 - r^2) / (2 * (n - 4)))
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.51, SE = 0.18"
Standardized Mean Difference (d)
Point Estimate\[\displaystyle{ \hat{\delta} = d \left( 1-\frac{3}{4n-9}\right) }\]
Standard Error\[\displaystyle{ se_{\hat{\delta}} = se_{d}\left( 1-\frac{3}{4n-9}\right) }\]
# Parameters needed
d = 0.50 # observed standardized mean difference
n = 20 # total sample size
SEd = 0.10 # standard error of observed standardized mean difference
# Point Estimate (correction factor is 1 - 3 / (4n - 9), as in the equation above)
delta = d * (1 - 3 / (4 * n - 9))
# Standard Error
SEdelta = SEd * (1 - 3 / (4 * n - 9))
# Print Results
paste0('delta = ', round(delta, 2), ', SE = ', round(SEdelta, 2))
[1] "delta = 0.48, SE = 0.1"
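As a rough illustration of why these corrections are harmless in large samples, the sketch below tabulates both correction factors over a few sample sizes. The helper names `d_factor` and `r_factor` are hypothetical and only wrap the two equations given above.

```r
# Small-sample correction factor for d: 1 - 3 / (4n - 9)
d_factor <- function(n) 1 - 3 / (4 * n - 9)

# Small-sample correction factor for r: 1 + (1 - r^2) / (2(n - 4))
r_factor <- function(r, n) 1 + (1 - r^2) / (2 * (n - 4))

# Evaluate both factors over a range of sample sizes (r fixed at 0.50)
n <- c(10, 20, 50, 100, 1000)
factors <- data.frame(
  n = n,
  d_factor = round(d_factor(n), 3),
  r_factor = round(r_factor(0.50, n), 3)
)
print(factors)
```

Both factors move toward 1 as \(n\) grows, so applying them at large \(n\) leaves the estimate essentially unchanged.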
Measurement Error
Measurement error is ubiquitous in science and can cause severe bias in standardized parameter estimates. Measurement error is often thought of as purely random noise; however, this is generally not the case. Measurement error variance can be attributed to many factors, such as transient error (e.g., fluctuations in fatigue or mood), test item interpretations (e.g., varied interpretation of the levels in a Likert scale), and extraneous response biases (e.g., participants’ tendency to select socially desirable options). Different estimators of reliability coefficients will capture these different sources of measurement error. For example, test-retest reliability will capture various transient errors that will not be captured by an internal consistency reliability coefficient (e.g., Cronbach’s alpha, McDonald’s omega). To avoid over- or under-correcting effect size estimates, researchers should take great care in selecting the appropriate reliability coefficient for their research design.
Univariate measurement error in continuous variables
Measurement error will likely exist in both variables under investigation (i.e., \(x\) and \(y\)); however, whether to apply a correction depends on the research question. For example, if you would like to know how related two psychological constructs are, correcting both variables for measurement error is appropriate. It may not be appropriate if you would like to use one variable to predict the other (e.g., exam scores to predict college grades): since observed scores are all that is available in practice, correcting the predictor variable for measurement error will not capture its real-world predictive utility.
Correlation Coefficient (r)
Point Estimate\[\displaystyle{ \hat{\rho} = \frac{r}{\sqrt{r_{xx'}}} }\]
Standard Error\[\displaystyle{ se_{\hat{\rho}} = \frac{se_{r}}{\sqrt{r_{xx'}}} }\]
# Parameters needed
r = 0.50 # observed correlation between x and y
SEr = 0.10 # standard error of observed correlation between x and y
rxx = 0.80 # reliability of x within sample
# Point Estimate
rho = r / sqrt(rxx)
# Standard Error
SErho = SEr / sqrt(rxx)
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.56, SE = 0.11"
Standardized Mean Difference (d)
Point Estimate \[\displaystyle{ \hat{\delta} = \frac{d}{\sqrt{r_{yy'}}} }\]
Standard Error \[\displaystyle{ se_{\hat{\delta}} = \frac{se_{d}}{\sqrt{r_{yy'}}} }\]
# Parameters needed
d = 0.50 # observed standardized mean difference
SEd = 0.50 # standard error of observed standardized mean difference
ryy = 0.80 # reliability of y within sample
# Point Estimate
delta = d / sqrt(ryy)
# Standard Error
SEdelta = SEd / sqrt(ryy)
# Print Results
paste0('delta = ', round(delta, 2), ', SE = ', round(SEdelta, 2))
[1] "delta = 0.56, SE = 0.56"
Bivariate measurement error for continuous variables
Unreliability of \(x\) and \(y\) will bias a Pearson correlation coefficient by adding random, uncorrelated noise to the bivariate relationship. If the goal is to understand the relationship between the true, uncontaminated scores on two variables, then measurement error in both variables should be corrected for.
Correlation Coefficient (r)
Point Estimate \[\displaystyle{ \hat{\rho} = \frac{r}{\sqrt{r_{xx'} r_{yy'} }} }\]
Standard Error \[\displaystyle{ se_{\hat{\rho}} = \frac{se_{r}}{\sqrt{r_{xx'} r_{yy'} }} }\]
# Parameters needed
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
rxx = 0.80 # reliability of x within sample
ryy = 0.70 # reliability of y within sample
# Point Estimate
rho = r / sqrt(rxx * ryy)
# Standard Error
SErho = SEr / sqrt(rxx * ryy)
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.67, SE = 0.13"
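To see the attenuation and its correction at work, a small simulation can help. The sketch below is only illustrative: it assumes normally distributed true scores, independent normal errors, and known reliabilities (0.80 and 0.70, matching the example above); all variable names are hypothetical.

```r
set.seed(3)
n <- 100000
rho_true <- 0.50          # correlation between true scores
rxx <- 0.80               # reliability of x
ryy <- 0.70               # reliability of y

# True scores with correlation rho_true
tx <- rnorm(n)
ty <- rho_true * tx + sqrt(1 - rho_true^2) * rnorm(n)

# Add independent error so that var(true) / var(observed) equals the reliability
x <- tx + rnorm(n, sd = sqrt((1 - rxx) / rxx))
y <- ty + rnorm(n, sd = sqrt((1 - ryy) / ryy))

r_obs <- cor(x, y)                   # attenuated correlation
rho_hat <- r_obs / sqrt(rxx * ryy)   # corrected for unreliability in x and y

round(c(observed = r_obs, corrected = rho_hat), 2)
```

Under these assumptions the observed correlation falls well below 0.50, while the corrected value should land close to it.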
Misclassification
Misclassification encompasses measurement errors for categorical variables. For example, our ability to identify individuals with major depressive disorder is only as accurate as our measurement of depression, therefore if the measure for depression contains measurement error then so will the assignment of individuals in the major depressive group and the control group (i.e., some people with major depressive disorder will be mis-labeled as a control and vice versa). To correct for group misclassification, you must first have an estimate of the phi coefficient (\(\phi_{gG}\)) from the 2x2 contingency table comparing actual vs observed group membership. Phi can also be approximated directly from the misclassification rate (\(p_{mis}\)), however this may cause undercorrections when misclassification rates differ between groups.
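As a sketch of how \(\phi_{gG}\) might be obtained in practice, the snippet below computes it from a 2x2 table of actual versus observed group membership, and compares it with the approximation based on the misclassification rate. The counts are hypothetical and chosen so that both groups are misclassified at the same rate.

```r
# Hypothetical counts: rows = actual group (G), columns = observed group (g)
tab <- matrix(c(40, 10,
                10, 40),
              nrow = 2, byrow = TRUE,
              dimnames = list(actual = c("case", "control"),
                              observed = c("case", "control")))
n <- sum(tab)

# Pearson chi-squared statistic without continuity correction
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)

# Phi coefficient from the contingency table
phi <- sqrt(chi2 / n)

# Approximation from the overall misclassification rate
p_mis <- (tab[1, 2] + tab[2, 1]) / n
phi_approx <- 1 - 2 * p_mis

c(phi = phi, phi_approx = phi_approx)
```

With equal misclassification rates in both groups, as here, the two values coincide; when the rates differ between groups, the approximation can diverge from the table-based value.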
Standardized Mean Difference (d)
Point Estimate
-Step 1. Calculate \(\phi\) coefficient from the contingency table between actual group membership (G) and observed group membership (g): \[\displaystyle{ \phi_{gG} = \sqrt{\frac{\chi_{gG}^{2}}{n}} }\]or\[\displaystyle{\phi_{gG} \approx 1 - 2 \cdot p_{mis} }\]
-Step 2. Transform d to r using probability of observed group membership (\(p_g\)): \[\displaystyle{ r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g )} + d^2}} }\]
-Step 3. Correct r for group misclassification: \[\displaystyle{ \hat{\rho} = \frac{r}{\phi_{gG}} }\]
-Step 4. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups: \[\displaystyle{ \hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} }\]
# Parameters needed
d = 0.50 # observed standardized mean difference
SEd = 0.50 # standard error of observed standardized mean difference
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)
n = 50 # sample size
chi2 = 32 # chi squared statistic for actual vs observed group contingency table
# p_mis = .20 # misclassification rate (only if alternative step 1 is used)
# Point Estimate
phi = sqrt(chi2 / n) # step 1
# phi = 1 - 2 * p_mis # step 1 (assume equal misclassification)
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 2
rho = r / phi # step 3
delta = rho / sqrt(pG * (1 - pG) * (1 - rho^2)) # step 4
# Standard Error
SEdelta = SEd * (rho / r) / ((1 + d^2 * pg * (1 - pg)) * sqrt((d^2 + 1 / (pg * (1 - pg))) * (pG * (1 - pG) * (1 - rho^2)^3)))
# Print Results
paste0('delta = ', round(delta, 2), ', SE = ', round(SEdelta, 2))
[1] "delta = 0.64, SE = 0.66"
Univariate dichotomization of naturally continuous variables
In some cases, researchers may categorize naturally continuous variables into two groups in order to facilitate interpretation or to perform specific statistical analyses. For example, congenital amusia, also known as tone deafness, is often diagnosed by scoring below a predetermined cutoff in a pitch or melodic discrimination task. This cutoff is arbitrary (although potentially useful) since pitch discrimination ability typically follows a normal distribution. Before someone can adjust the effect sizes for this artificial dichotomization, it is necessary to determine the proportions of participants above or below the cutoff (\(p_x\)) as well as the cutoff value (\(c_{x}\)), which can be estimated using the quantile function of a standard normal distribution at \(p_{x}\).
Correlation Coefficient (r)
Point Estimate Note: \(\Phi\) indicates the normal ordinate of a standard normal distribution \[\displaystyle{ \hat{\rho} = \frac{r\sqrt{p_x(1-p_x)}}{\Phi(c_x)} }\]
Standard Error \[\displaystyle{ se_{\hat{\rho}} = \frac{se_{r}\sqrt{p_x(1-p_x)}}{\Phi(c_x)} }\]
# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
px = 0.50 # proportion of individuals in upper or lower group of X
# Point Estimate
cx = qnorm(px) # Find cut point
rho = r * sqrt(px * (1 - px)) / dnorm(cx)
# Standard Error
SErho = SEr * sqrt(px * (1 - px)) / dnorm(cx)
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.63, SE = 0.13"
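A quick simulation can illustrate how much a median split attenuates a correlation and how the correction above recovers it. This is a minimal sketch assuming bivariate normal data; the variable names are hypothetical.

```r
set.seed(1)
n <- 100000
rho_true <- 0.50

# Bivariate normal data with correlation rho_true
x <- rnorm(n)
y <- rho_true * x + sqrt(1 - rho_true^2) * rnorm(n)

# Artificially dichotomize x with a median split
x_split <- as.numeric(x > median(x))

r_obs <- cor(x_split, y)   # attenuated (point-biserial) correlation
px <- mean(x_split)        # proportion in the upper group
cx <- qnorm(px)            # cut point on the standard normal scale

# Correction for univariate dichotomization
rho_hat <- r_obs * sqrt(px * (1 - px)) / dnorm(cx)

round(c(observed = r_obs, corrected = rho_hat), 2)
```

Under bivariate normality, the corrected value should sit close to the correlation between the original continuous variables.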
Standardized Mean Difference (d)
Point Estimate
-Step 1. Transform d to r using probability of observed group membership (\(p_g\)): \[\displaystyle{ r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g )} + d^2}} }\]
-Step 2. Correct r for artificial dichotomization: \[\displaystyle{ \hat{\rho} = \frac{r\sqrt{p_x(1-p_x)}}{\Phi(c_x)} }\]
-Step 3. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups: \[\displaystyle{ \hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} }\]
# Parameters
d = 0.50 # observed standardized mean difference
SEd = 0.10 # standard error of observed standardized mean difference
px = 0.50 # proportion of individuals in upper or lower group of X
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)
# Point Estimate
cx = qnorm(px) # Find cut point
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 1
rho = r * sqrt(px * (1 - px)) / dnorm(cx) # step 2
delta = rho / sqrt(pG * (1 - pG) * (1 - rho^2)) # step 3
# Standard Error
SEdelta = SEd * (rho / r) / ((1 + d^2 * pg * (1 - pg)) * sqrt((d^2 + 1 / (pg * (1 - pg))) * (pG * (1 - pG) * (1 - rho^2)^3)))
# Print Results
paste0('delta = ', round(delta, 2), ', SE = ', round(SEdelta, 2))
[1] "delta = 0.64, SE = 0.13"
Bivariate dichotomization of naturally continuous variables
In somewhat rarer circumstances, researchers may dichotomize both the x and y variables. This attenuates the correlation coefficient more than dichotomizing a single variable does. If a tetrachoric correlation is available, it is more analogous to a Pearson correlation coefficient than the correction below.
Correlation Coefficient (r)
Point Estimate Note: \(\Phi\) indicates the normal ordinate of a standard normal distribution \[\displaystyle{ \hat{\rho} = \frac{r\sqrt{p_x p_y (1-p_x) (1-p_y)}}{\Phi(c_x)\Phi(c_y)} }\]
Standard Error \[\displaystyle{ se_{\hat{\rho}} = \frac{se_{r}\sqrt{p_x p_y(1-p_x)(1-p_y)}}{\Phi(c_x)\Phi(c_y)} }\]
# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
px = 0.50 # proportion of individuals in upper or lower group of X
py = 0.50 # proportion of individuals in upper or lower group of Y
# Point Estimate
cx = qnorm(px) # Find cut point on X variable
cy = qnorm(py) # Find cut point on Y variable
rho = r * sqrt(px * py * (1 - px) * (1 - py)) / (dnorm(cx) * dnorm(cy))
# Standard Error
SErho = SEr * sqrt(px * py * (1 - px) * (1 - py)) / (dnorm(cx) * dnorm(cy))
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.79, SE = 0.16"
Range Restriction
Range restriction is a phenomenon that occurs when the variability in a sample is limited compared to the larger population it aims to represent. This limitation can lead to biased estimates of effect size. For example, in the context of college admissions, the SAT is frequently utilized as a factor in the selection process. Consequently, a sample of admitted college students may exhibit less variation in SAT scores compared to the entire pool of applicants. When studying the SAT’s predictive ability for first-year GPA, this range restriction can cause an underestimation of the correlation. Correction formulas have been developed to mitigate this issue.
Univariate Direct Range Restriction
In certain situations, the selection of participants can be identical to one of the variables of interest. For instance, if a study is investigating the correlation between school grades and IQ in students with an intellectual disability, and the diagnosis is defined as having an IQ score of less than 70, then the sample would exhibit direct range restriction.
# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
ux = 0.85 # ratio of observed standard deviation to reference standard deviation (ux = SDsample/SDreference)
# Point Estimate
rho = r / (ux * sqrt(r^2 * (1 / ux^2 - 1) + 1))
# Standard Error
SErho = SEr / (ux * sqrt(r^2 * (1 / ux^2 - 1) + 1))
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.56, SE = 0.11"
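The effect of direct selection, and of the correction, can be checked with a small simulation. The sketch below assumes bivariate normal data and selection of only those cases with above-average x; the cutoff and variable names are hypothetical.

```r
set.seed(2)
n <- 200000
rho_true <- 0.50

# Unrestricted (reference) population
x <- rnorm(n)
y <- rho_true * x + sqrt(1 - rho_true^2) * rnorm(n)

# Direct selection on x: keep only cases above the population mean
keep <- x > 0

r_obs <- cor(x[keep], y[keep])   # correlation in the restricted sample
ux <- sd(x[keep]) / sd(x)        # u-ratio: SDsample / SDreference

# Correction for univariate direct range restriction
rho_hat <- r_obs / (ux * sqrt(r_obs^2 * (1 / ux^2 - 1) + 1))

round(c(observed = r_obs, ux = ux, corrected = rho_hat), 2)
```

In this setup the restricted correlation is noticeably smaller than 0.50, and the corrected estimate should come back close to the unrestricted value.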
Standardized Mean Difference (d)
Point Estimate
-Step 1. Transform d to r using probability of observed group membership (\(p_g\)): \[\displaystyle{ r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g )} + d^2}} }\]
-Step 2. Correct r for direct range restriction: \[\displaystyle{ \hat{\rho} = \frac{r}{u_x \sqrt{r^2 \left(\frac{1}{u_x^2} - 1\right) +1}} }\]
-Step 3. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups: \[\displaystyle{ \hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} }\]
# Parameters
d = 0.50 # observed standardized mean difference
SEd = 0.50 # standard error of observed standardized mean difference
ux = 0.85 # ratio of observed standard deviation to reference standard deviation (ux = SDsample/SDreference)
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)
# Point Estimate
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 1
rho = r / (ux * sqrt(r^2 * (1 / ux^2 - 1) + 1)) # step 2
delta = rho / sqrt(pG * (1 - pG) * (1 - rho^2)) # step 3
# Standard Error
SEdelta = SEd * (rho / r) / ((1 + d^2 * pg * (1 - pg)) * sqrt((d^2 + 1 / (pg * (1 - pg))) * (pG * (1 - pG) * (1 - rho^2)^3)))
Univariate Indirect Range Restriction
When the selection process is correlated with one of the variables of interest, the resulting sample will have a reduced variance due to indirect range restriction. For example, suppose a company is hiring employees based on their performance in a test that is correlated with their IQ. If the company only hires employees who score above a certain threshold on the test, then the range of IQ scores in the selected sample will be indirectly restricted. This is because the IQ scores of the selected employees will be higher than the IQ scores of the general population due to the correlation between the test scores and IQ.
Standard Error\[\displaystyle{se_{\rho} = \frac{se_{r}}{\sqrt{r^2 + u_x^2 (1 - r^2)}} }\]
# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
ux = 0.85 # ratio of observed standard deviation to reference standard deviation (ux = SDsample/SDreference)
# Point Estimate
rho = r / sqrt(r^2 + ux^2 * (1 - r^2))
# Standard Error
SErho = SEr / sqrt(r^2 + ux^2 * (1 - r^2))
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.56, SE = 0.11"
Standardized Mean Difference (d)
Point Estimate
-Step 1. Transform d to r using probability of observed group membership (\(p_g\)): \[r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g )} + d^2}} \]
-Step 2. Correct r for indirect range restriction: \[\hat{\rho} = \frac{r}{\sqrt{r^2 + u_x^2 (1 - r^2)}} \]
-Step 3. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups: \[\hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} \]
# Parameters
d = 0.50 # observed standardized mean difference
SEd = 0.10 # standard error of observed standardized mean difference
ux = 0.85 # ratio of observed standard deviation to reference standard deviation (ux = SDsample/SDreference)
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)
# Point Estimate
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 1
rho = r / sqrt(r^2 + ux^2 * (1 - r^2)) # step 2
delta = rho / sqrt(pG * (1 - pG) * (1 - rho^2)) # step 3
# Standard Error
SEdelta = SEd * (rho / r) / ((1 + d^2 * pg * (1 - pg)) * sqrt((d^2 + 1 / (pg * (1 - pg))) * (pG * (1 - pG) * (1 - rho^2)^3)))
# Print Results
paste0('delta = ', round(delta, 2), ', SE = ', round(SEdelta, 2))
[1] "delta = 0.59, SE = 0.12"
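Because the d-to-r and r-to-d transformations recur in every standardized mean difference correction on this page, it can be convenient to wrap them in small helpers. The function names `d_to_r` and `r_to_d` below are hypothetical; the sketch simply checks that, with no correction applied in between and the same group proportion, the round trip returns the original d.

```r
# Transform d to r given the proportion of cases in one group (p)
d_to_r <- function(d, p) d / sqrt(1 / (p * (1 - p)) + d^2)

# Back-transform r to d given the proportion of cases in one group (p)
r_to_d <- function(r, p) r / sqrt(p * (1 - p) * (1 - r^2))

d <- 0.50
p <- 0.50

r <- d_to_r(d, p)        # point-biserial correlation implied by d
d_back <- r_to_d(r, p)   # should reproduce d when nothing is corrected

c(r = r, d_back = d_back)
```

Any artifact correction is then applied to `r` between the two calls.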
Bivariate Direct Range Restriction
In some instances, direct selection can be placed on both x and y variables. This may happen in instances where a researcher requires subjects to be within the “normal” range of x and y, which tends to restrict the range by excluding individuals at the tails of x and y.
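-Step 1. Calculate \(\Gamma\) from the observed correlation and the u-ratios of x and y: \[\displaystyle{ \Gamma = \frac{1-r^2}{2r} u_x u_y }\]
-Step 2. Correct r for bivariate direct range restriction: \[\displaystyle{ \hat{\rho} = -\Gamma + \mathrm{sign} (r) \sqrt{\Gamma^2 + 1} }\]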
Standard Error \[\displaystyle{se_{\rho} = se_{r} \left( \frac{\hat{\rho}}{r} \right) }\]
# Parameters
r = 0.50 # observed correlation between x and y
SEr = 0.10 # standard error of observed correlation between x and y
ux = 0.85 # ratio of observed standard deviation of x to reference standard deviation of x (ux = SDsample/SDreference)
uy = 0.80 # ratio of observed standard deviation of y to reference standard deviation of y (uy = SDsample/SDreference)
# Point Estimate
Gamma = (1 - r^2) / (2 * r) * ux * uy # step 1
rho = -Gamma + sign(r) * sqrt(Gamma^2 + 1) # step 2
# Standard Error
SErho = SEr * (rho / r)
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.61, SE = 0.12"
Bivariate Indirect Range Restriction
When the selection process is correlated with both of the variables of interest, the resulting sample will have a reduced variance due to indirect range restriction. This is often the case in college admissions testing where both the predictor (e.g., SAT scores) and the outcome variable (e.g., first year GPA) are correlated with the college admissions process.
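-Step 1. Calculate \(\lambda\), where \(\rho_{sx}\) and \(\rho_{sy}\) give the direction (-1, 0, or 1) of the correlation between the selection variable and x and y, respectively: \[\displaystyle{ \lambda = \mathrm{sign}\left[\rho_{sx}\rho_{sy}(1-u_x)(1-u_y)\right] \frac{\mathrm{sign}(1-u_x)\min\left(u_x,\frac{1}{u_x}\right) + \mathrm{sign}(1-u_y)\min\left(u_y,\frac{1}{u_y}\right)}{\min\left(u_x,\frac{1}{u_x}\right) + \min\left(u_y,\frac{1}{u_y}\right)} }\]
-Step 2. Correct r for bivariate indirect range restriction: \[\displaystyle{ \hat{\rho} = r u_x u_y + \lambda \sqrt{\left|1-u_x^2 \right|\left|1-u_y^2 \right|} }\]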
Standard Error
-Step 1. Calculate first order partial derivative of \(\hat{\rho}\) with respect to \(u_x\): \[\displaystyle{ \beta_1 = u_y r - \frac{\lambda u_x (1 - u_x^2) \sqrt{\left|1 - u_y^2 \right|}}{\sqrt{\left|1 - u_x^2 \right|^3}} }\]
-Step 2. Calculate first order partial derivative of \(\hat{\rho}\) with respect to \(u_y\): \[\displaystyle{ \beta_2 = u_x r - \frac{\lambda u_y (1 - u_y^2) \sqrt{\left|1 - u_x^2 \right|}}{\sqrt{\left|1 - u_y^2 \right|^3}} }\]
-Step 3. Calculate first order partial derivative of \(\hat{\rho}\) with respect to \(r\): \[\displaystyle{ \beta_3 = u_x u_y }\]
-Step 4. Calculate standard error for \(u_x\) (note: sample size for reference sample is \(n_\mathrm{ref}\)): \[\displaystyle{ se_{u_x} = u_x \sqrt{ \frac{1}{2(n-1)} + \frac{1}{2(n_{\mathrm{ref}} -1)}} }\]
-Step 5. Calculate standard error for \(u_y\) (note: sample size for reference sample is \(n_\mathrm{ref}\)): \[\displaystyle{ se_{u_y} = u_y \sqrt{ \frac{1}{2(n-1)} + \frac{1}{2(n_{\mathrm{ref}} -1)}} }\]
-Step 6. Calculate standard error for \(r\): \[\displaystyle{ se_r = \frac{1-r^2}{\sqrt{n-1}} }\]
-Step 7. Calculate standard error of \(\hat{\rho}\) using a Taylor Series Approximation: \[\displaystyle{se_{\hat{\rho}} \approx \sqrt{\beta_1^2 se_{u_x}^2 + \beta_2^2 se_{u_y}^2 + \beta_3^2 se_{r}^2 } }\]
# Parameters
r = 0.50 # observed correlation between x and y
SEr = 0.10 # standard error for observed correlation between x and y
ux = 0.85 # observed u-ratio of x
uy = 0.80 # observed u-ratio of y
rsy = 1 # direction of correlation between selector and y (-1 = negative, 0 = no correlation, 1 = positive)
rsx = 1 # direction of correlation between selector and x (-1 = negative, 0 = no correlation, 1 = positive)
na = 200 # sample size of reference sample
n = 100 # sample size of study sample
# Point Estimate
lambda = sign(rsx * rsy * (1 - ux) * (1 - uy)) * (sign(1 - ux) * min(c(ux, 1/ux)) + sign(1 - uy) * min(c(uy, 1/uy))) / (min(c(ux, 1/ux)) + min(c(uy, 1/uy)))
rho = r * ux * uy + lambda * sqrt(abs(1 - ux^2) * abs(1 - uy^2))
# Standard Error (Taylor Series Approximation)
b1 = r * uy - (lambda * ux * (1 - ux^2) * sqrt(abs(1 - uy^2))) / sqrt(abs(1 - ux^2)^3) # first order partial derivative with respect to ux
b2 = r * ux - (lambda * uy * (1 - uy^2) * sqrt(abs(1 - ux^2))) / sqrt(abs(1 - uy^2)^3) # first order partial derivative with respect to uy
b3 = ux * uy # first order partial derivative with respect to r
SEux = ux * sqrt(1 / (2 * (n - 1)) + 1 / (2 * (na - 1)))
SEuy = uy * sqrt(1 / (2 * (n - 1)) + 1 / (2 * (na - 1)))
SErho = sqrt(b1^2 * SEux^2 + b2^2 * SEuy^2 + b3^2 * SEr^2)
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.66, SE = 0.08"
Range Restriction and Measurement Error
Range restriction and measurement error can act in tandem to bias effect size estimates. When correcting for both range restriction and measurement error, reliability and u-ratios are affected by each other, that is, measurement error variance contributes to the overall variance affecting u and restricted range affects the reliability coefficients. Therefore reliability and u-ratios must be first disattenuated for each correction procedure.
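To make the interplay concrete, the sketch below converts a reliability coefficient observed in a range-restricted sample into an estimate for the unrestricted population, using the expression \(1 - u_x^2 (1 - r_{xx'})\) that appears in the corrections in this section. It assumes that error variance is the same in the restricted and unrestricted groups; the function name is hypothetical.

```r
# Estimate unrestricted reliability from restricted-sample reliability and the u-ratio,
# assuming measurement error variance is unaffected by selection
unrestricted_reliability <- function(rxx, ux) 1 - ux^2 * (1 - rxx)

rxx <- 0.80   # reliability of x observed in the restricted sample
ux  <- 0.85   # u-ratio of x (SDsample / SDreference)

rxx_unrestricted <- unrestricted_reliability(rxx, ux)
round(rxx_unrestricted, 4)
```

When the sample variance is restricted (\(u_x < 1\)), the reliability observed in the sample understates the reliability in the reference population, which is why the two artifacts cannot be corrected in isolation.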
Univariate Direct Range Restriction and Measurement Error
In certain situations, the selection of participants can be identical to one of the variables of interest. For instance, if a study is investigating the correlation between school grades and IQ in students with an intellectual disability, and the diagnosis is defined as having an IQ score of less than 70, then the sample would exhibit direct range restriction.
Standard Error \[\displaystyle{se_{\hat{\rho}} = se_r \left( \frac{\hat{\rho}}{r} \right)}\]
# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
rxx = 0.80 # reliability of x
ryy = 0.70 # reliability of y
ux = 0.85 # ratio of observed standard deviation of x to reference standard deviation of x (ux = SDsample/SDreference)
# Point Estimate
rho = r / (ux * sqrt(1 - ux^2 * (1 - rxx)) * sqrt(r^2 * (1 / ux^2 - 1) + ryy))
# Standard Error
SErho = SEr * (rho / r)
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.71, SE = 0.14"
Standardized Mean Difference (d)
Point Estimate
-Step 1. Calculate \(\phi_{gG}\) from the contingency table between observed and actual group membership: \[ \phi_{gG} = \sqrt{\frac{\chi^2}{n}} \]or\[\phi_{gG} \approx 1 - 2 \cdot p_{mis} \]
-Step 2. Transform d to r using probability of observed group membership (\(p_g\)): \[r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g ) }+d^2}} \]
-Step 3. Correct r for direct range restriction: \[\hat{\rho} = \frac{r}{u_x \sqrt{1 - u_x^2 (1-\phi_{gG}) } \sqrt{r^2 \left(\frac{1}{u_x^2} - 1\right)+r_{yy'} } } \]
-Step 4. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups: \[\displaystyle{ \hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} }\]
# Parameters
d = 0.50 # observed standardized mean difference
SEd = 0.10 # standard error of observed standardized mean difference
n = 100 # sample size
ryy = 0.80 # reliability of y
uy = 0.85 # ratio of observed standard deviation of y to reference standard deviation of y (uy = SDsample/SDreference)
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)
chi2 = 36 # chi squared statistic for actual vs observed group contingency table
# p_mis = 0.20 # proportion of individuals misclassified (only needed if equal misclassification rate is assumed, and alternative phi calculation)
# Point Estimate
phi = sqrt(chi2 / n) # step 1
# phi = 1 - 2 * p_mis # step 1 (alternative)
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 2
rho = r / (uy * sqrt(1 - uy^2 * (1 - phi)) * sqrt(r^2 * (1 / uy^2 - 1) + ryy)) # step 3
delta = rho / sqrt(pG * (1 - pG) * (1 - rho^2)) # step 4
# Standard Error
SEdelta = SEd * (rho / r) / ((1 + d^2 * pg * (1 - pg)) * sqrt((d^2 + 1 / (pg * (1 - pg))) * (pG * (1 - pG) * (1 - rho^2)^3)))
# Print Results
paste0('delta = ', round(delta, 2), ', SE = ', round(SEdelta, 2))
[1] "delta = 0.8, SE = 0.18"
Univariate Indirect Range Restriction and Measurement Error
When the selection process is correlated with one of the variables of interest, the resulting sample will have a reduced variance due to indirect range restriction. For example, suppose a company is hiring employees based on their performance in a test that is correlated with their IQ. If the company only hires employees who score above a certain threshold on the test, then the range of IQ scores in the selected sample will be indirectly restricted. This is because the IQ scores of the selected employees will be higher than the IQ scores of the general population due to the correlation between the test scores and IQ.
Standard Error \[\displaystyle{se_{\hat{\rho}} = se_r \left( \frac{\hat{\rho}}{r} \right)}\]
# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
rxx = 0.70 # reliability of x
ryy = 0.80 # reliability of y
ux = 0.85 # ratio of observed standard deviation of x to reference standard deviation of x (ux = SDsample/SDreference)
uy = 0.80 # ratio of observed standard deviation of y to reference standard deviation of y (uy = SDsample/SDreference)
# Point Estimate
rho = r / sqrt(r^2 + ux^2 * rxx * (rxx * ryy - r^2) / (1 - ux^2 * (1 - rxx)))
# Standard Error
SErho = SEr * (rho / r)
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.75, SE = 0.15"
Standardized Mean Difference (d)
Point Estimate
-Step 1. Calculate \(\phi_{gG}\) from the contingency table between observed and actual group membership: \[\phi_{gG} = \sqrt{\frac{\chi^2}{n}} \]or\[\phi_{gG} \approx 1 - 2 \cdot p_{mis} \]
-Step 2. Transform d to r using probability of observed group membership (\(p_g\)): \[ r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g )} + d^2}} \]
-Step 3. Correct r for indirect range restriction: \[\hat{\rho} = \frac{r}{\sqrt{r^2 + \frac{u_x^2 r_{xx'}(r_{xx'}r_{yy'}-r^2)}{1 - u_x^2 (1-r_{xx'})}}}\]
-Step 4. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups: \[\hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} \]
# Parameters
d = 0.50 # observed standardized mean difference
SEd = 0.10 # standard error of observed standardized mean difference
rxx = 0.70 # reliability of x (assumed value; needed in step 3 but not set in the original example)
ryy = 0.80 # reliability of y
ux = 0.85 # ratio of observed standard deviation of x to reference standard deviation of x (ux = SDsample/SDreference)
uy = 0.80 # ratio of observed standard deviation of y to reference standard deviation of y (uy = SDsample/SDreference)
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)
chi2 = 36 # chi squared statistic for actual vs observed group contingency table
n = 100 # sample size
# Point Estimate
phi = sqrt(chi2 / n) # step 1
# phi = 1 - 2 * p_mis # step 1 (alternative)
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 2
rho = r / sqrt(r^2 + ux^2 * rxx * (rxx * ryy - r^2) / (1 - ux^2 * (1 - rxx))) # step 3
delta = rho / sqrt(pG * (1 - pG) * (1 - rho^2)) # step 4
# Standard Error
SEdelta = SEd * (rho / r) / ((1 + d^2 * pg * (1 - pg)) * sqrt((d^2 + 1 / (pg * (1 - pg))) * (pG * (1 - pG) * (1 - rho^2)^3)))
# Print Results
paste0('delta = ', round(delta, 2), ', SE = ', round(SEdelta, 2))
[1] "delta = 0.85, SE = 0.19"
Bivariate Direct Range Restriction and Measurement Error
In some instances, direct selection can be placed on both x and y variables. This may happen in instances where a researcher requires subjects to be within the “normal” range of x and y, which tends to restrict the range by excluding individuals at the tails of x and y.
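Step 1. Calculate \(\Gamma\) from the observed correlation and the u-ratios of x and y: \[\displaystyle{ \Gamma = \frac{1-r^2}{2r} u_x u_y }\]
Step 2. Correct r for bivariate direct range restriction and measurement error: \[\displaystyle{\hat{\rho} = \frac{-\Gamma + \mathrm{sign}(r)\sqrt{\Gamma^2 + 1} }{\sqrt{1-u_x^2(1-r_{xx'})}\sqrt{1-u_y^2(1-r_{yy'})}} }\]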
Standard Error\[\displaystyle{se_{\hat{\rho}} = se_r \left( \frac{\hat{\rho}}{r} \right)}\]
# Parameters
r = 0.50 # observed correlation between x and y
SEr = 0.10 # standard error of observed correlation between x and y
rxx = 0.80 # reliability of x
ryy = 0.70 # reliability of y
ux = 0.85 # ratio of observed standard deviation of x to reference standard deviation of x (ux = SDsample/SDreference)
uy = 0.80 # ratio of observed standard deviation of y to reference standard deviation of y (uy = SDsample/SDreference)
# Point Estimate
Gamma = (1 - r^2) / (2 * r) * ux * uy # step 1
rho = (-Gamma + sign(r) * sqrt(Gamma^2 + 1)) / (sqrt(1 - ux^2 * (1 - rxx)) * sqrt(1 - uy^2 * (1 - ryy))) # step 2
# Standard Error
SErho = SEr * (rho / r)
# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.74, SE = 0.15"
Bivariate Indirect Range Restriction and Measurement Error
When the selection process is correlated with both of the variables of interest, the resulting sample will have a reduced variance due to indirect range restriction. This is often the case in college admissions testing where both the predictor (e.g., SAT scores) and the outcome variable (e.g., first year GPA) are correlated with the college admissions process.
-Step 2. Correct r for bivariate indirect range restriction: \[\hat{\rho} = \frac{r u_x u_y + \lambda \sqrt{\left|1-u_x^2 \right|\left|1-u_y^2 \right|} }{\sqrt{1 - u_x^2 (1-r_{xx'})} \sqrt{1 - u_y^2 (1 - r_{yy'})}}\]
Standard Error
Step 1. Calculate the measurement quality index for X in the restricted sample: \[ q_x = \sqrt{r_{xx'}}\]
Step 2. Calculate the measurement quality index for Y in the restricted sample: \[ q_y = \sqrt{r_{yy'}}\]
Step 3. Estimate the measurement quality index for X in the unrestricted population: \[q_X = \sqrt{1 - u_x^2 (1 - r_{xx'})} \]
Step 4. Estimate the measurement quality index for Y in the unrestricted population: \[ q_Y = \sqrt{1 - u_y^2 (1 - r_{yy'})} \]
Step 5. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(q_X\): \[ \beta_1 = -\frac{u_x u_y r + \lambda \sqrt{\left|1 - u_x^2 \right|\left|1 - u_y^2 \right|} }{q_X^2 q_Y} \]
Step 6. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(q_Y\): \[ \beta_2 = -\frac{u_x u_y r + \lambda \sqrt{\left|1 - u_x^2 \right|\left|1 - u_y^2 \right|} }{q_Y^2 q_X} \]
Step 7. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(u_x\): \[ \beta_3 = \frac{u_y r}{q_X q_Y} - \frac{\lambda u_x (1 - u_x^2) \sqrt{\left|1 - u_y^2 \right|}}{q_X q_Y\sqrt{\left|1 - u_x^2 \right|^3}}\]
Step 8. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(u_y\): \[\beta_4 = \frac{u_x r}{q_X q_Y} - \frac{\lambda u_y (1 - u_y^2) \sqrt{\left|1 - u_x^2 \right|}}{q_X q_Y\sqrt{\left|1 - u_y^2 \right|^3}} \]
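Step 9. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(r\): \[\beta_5 = \frac{u_x u_y}{q_X q_Y} \]
Step 10. Calculate standard error for \(q_X\), as implemented in the code for this section: \[\displaystyle{ se_{q_X} = \sqrt{ \frac{u_x^4 (1-q_x^2)^2}{2\left[1 - u_x^2(1-q_x^2)\right]} \left( \frac{1}{n-1} + \frac{1}{n_{\mathrm{ref}}-1} \right) + \frac{u_x^2 q_x^2 (1-q_x^2)^2}{\left[1 - u_x^2(1-q_x^2)\right](n-1)} } }\]
Step 11. Calculate standard error for \(q_Y\), as implemented in the code for this section: \[\displaystyle{ se_{q_Y} = \sqrt{ \frac{u_y^4 (1-q_y^2)^2}{2\left[1 - u_y^2(1-q_y^2)\right]} \left( \frac{1}{n-1} + \frac{1}{n_{\mathrm{ref}}-1} \right) + \frac{u_y^2 q_y^2 (1-q_y^2)^2}{\left[1 - u_y^2(1-q_y^2)\right](n-1)} } }\]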
Step 12. Calculate standard error for \(u_x\) (note: sample size for reference sample is \(n_\mathrm{ref}\)): \[\displaystyle{ se_{u_x} = u_x \sqrt{ \frac{1}{2(n-1)} + \frac{1}{2(n_{\mathrm{ref}} -1)}} }\]
Step 13. Calculate standard error for \(u_y\) (note: sample size for reference sample is \(n_\mathrm{ref}\)): \[\displaystyle{ se_{u_y} = u_y \sqrt{ \frac{1}{2(n-1)} + \frac{1}{2(n_{\mathrm{ref}} -1)}} }\]
Step 14. Calculate standard error for \(r\): \[\displaystyle{ se_r = \frac{1-r^2}{\sqrt{n-1}} }\]
Step 15. Calculate standard error of \(\hat{\rho}\) using a Taylor Series Approximation \[\displaystyle{se_{\hat{\rho}} \approx \sqrt{\beta_1^2 se_{q_X}^2 + \beta_2^2 se_{q_Y}^2 + \beta_3^2 se_{u_x}^2 + \beta_4^2 se_{u_y}^2 + \beta_5^2 se_{r}^2 } }\]
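One way to sanity-check a partial derivative is to compare an analytic expression with a central finite difference of the Step 2 point-estimate formula. The sketch below does this for the derivative with respect to \(u_x\), holding \(q_X\) and \(q_Y\) fixed as separate inputs, which is how the Taylor series approximation treats them. The helper `rho_fun`, the step size `h`, and the input values are assumptions chosen for illustration.

```r
# Corrected correlation as a function of its inputs (hypothetical helper)
rho_fun <- function(r, ux, uy, qX, qY, lambda) {
  (r * ux * uy + lambda * sqrt(abs(1 - ux^2) * abs(1 - uy^2))) / (qX * qY)
}

r <- 0.50; ux <- 0.80; uy <- 0.85; lambda <- 1
qX <- sqrt(1 - ux^2 * (1 - 0.80))  # evaluated once, then held fixed
qY <- sqrt(1 - uy^2 * (1 - 0.70))

# Analytic derivative with respect to ux, from differentiating rho_fun by hand
b3_analytic <- (r * uy) / (qX * qY) -
  (lambda * ux * (1 - ux^2) * sqrt(abs(1 - uy^2))) /
  (qX * qY * sqrt(abs(1 - ux^2)^3))

# Central finite difference approximation of the same derivative
h <- 1e-6
b3_numeric <- (rho_fun(r, ux + h, uy, qX, qY, lambda) -
               rho_fun(r, ux - h, uy, qX, qY, lambda)) / (2 * h)

c(analytic = b3_analytic, numeric = b3_numeric)
```

If the two values disagree by more than rounding error, the analytic expression (or its transcription into code) is worth rechecking. The same pattern applies to the other four derivatives.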
# Parameters needed
r = 0.50    # observed correlation between x and y
rxx = 0.80  # reliability of x within study sample
ryy = 0.70  # reliability of y within study sample
n = 200     # sample size
na = 100    # sample size of reference sample
ux = 0.80   # observed u-ratio of x
uy = 0.85   # observed u-ratio of y
rsy = 1     # direction of correlation between selector and y (-1 = negative, 0 = no correlation, 1 = positive)
rsx = 1     # direction of correlation between selector and x (-1 = negative, 0 = no correlation, 1 = positive)

# Point Estimate
lambda = sign(rsx * rsy * (1 - ux) * (1 - uy)) *
  (sign(1 - ux) * min(c(ux, 1/ux)) + sign(1 - uy) * min(c(uy, 1/uy))) /
  (min(c(ux, 1/ux)) + min(c(uy, 1/uy)))  # step 1
qX = sqrt(1 - ux^2 * (1 - rxx))  # measurement quality index of x, unrestricted population
qY = sqrt(1 - uy^2 * (1 - ryy))  # measurement quality index of y, unrestricted population
rho = (r * ux * uy + lambda * sqrt(abs(1 - ux^2) * abs(1 - uy^2))) / (qX * qY)  # step 2

# Standard Error (Taylor Series Approximation)
qx = sqrt(rxx)  # measurement quality index of x, restricted sample
qy = sqrt(ryy)  # measurement quality index of y, restricted sample

# First-order partial derivative with respect to qX
b1 = -(ux * uy * r + lambda * sqrt(abs(1 - ux^2) * abs(1 - uy^2))) / (qX^2 * qY)
# First-order partial derivative with respect to qY
b2 = -(ux * uy * r + lambda * sqrt(abs(1 - ux^2) * abs(1 - uy^2))) / (qY^2 * qX)
# First-order partial derivative with respect to ux
b3 = (r * uy) / (qX * qY) - (lambda * ux * (1 - ux^2) * sqrt(abs(1 - uy^2))) / (qX * qY * sqrt(abs(1 - ux^2)^3))
# First-order partial derivative with respect to uy
b4 = (r * ux) / (qX * qY) - (lambda * uy * (1 - uy^2) * sqrt(abs(1 - ux^2))) / (qX * qY * sqrt(abs(1 - uy^2)^3))
# First-order partial derivative with respect to r
b5 = (ux * uy) / (qX * qY)

# Standard error of qX
SEqX = sqrt(.5 * ux^4 * ((1 - qx^2)^2 / (1 - ux^2 * (1 - qx^2))) * (1/(n - 1) + 1/(na - 1)) +
            (ux^2 * qx^2 * (1 - qx^2)^2) / ((1 - ux^2 * (1 - qx^2)) * (n - 1)))
# Standard error of qY
SEqY = sqrt(.5 * uy^4 * ((1 - qy^2)^2 / (1 - uy^2 * (1 - qy^2))) * (1/(n - 1) + 1/(na - 1)) +
            (uy^2 * qy^2 * (1 - qy^2)^2) / ((1 - uy^2 * (1 - qy^2)) * (n - 1)))
# Standard error of ux
SEux = ux * sqrt(1 / (2 * (n - 1)) + 1 / (2 * (na - 1)))
# Standard error of uy
SEuy = uy * sqrt(1 / (2 * (n - 1)) + 1 / (2 * (na - 1)))
# Standard error of r
SEr = (1 - r^2) / sqrt(n - 1)

# Taylor series approximation
SErho = sqrt(b1^2 * SEqX^2 + b2^2 * SEqY^2 + b3^2 * SEux^2 + b4^2 * SEuy^2 + b5^2 * SEr^2)

# Print Results
paste0('rho = ', round(rho, 2), ', SE = ', round(SErho, 2))
[1] "rho = 0.79, SE = 0.08"
References
Cite This Page
APA: Jané, M. B. (2023). Artifact corrections for effect sizes: equations, code, and interactive app. https://www.matthewbjane.com/ArtifactCorrections/, accessed DD-MM-YYYY
BibTeX:
@misc{,
title = "Artifact corrections for effect sizes: equations, code, and interactive app",
author = "Matthew B. Jan{\'{e}}",
howpublished = "\url{https://matthewbjane.github.io/artifact-corrections/}",
year= 2023,
note= "accessed DD-MM-YYYY"
}
References
Dahlke, Jeffrey A., and Brenton M. Wiernik. 2019. “Psychmeta: An R Package for Psychometric Meta-Analysis.” Applied Psychological Measurement 43 (5): 415–16. https://doi.org/10.1177/0146621618795933.
Hunter, John E., and Frank L. Schmidt. 1990. Methods of Meta-Analysis: Correcting Error and Bias in Research Findings. Newbury Park: Sage Publications.
Wiernik, Brenton M., and Jeffrey A. Dahlke. 2020. “Obtaining Unbiased Results in Meta-Analysis: The Importance of Correcting for Statistical Artifacts.” Advances in Methods and Practices in Psychological Science 3 (1): 94–123. https://doi.org/10.1177/2515245919885611.