Artifact Corrections for Effect Sizes

Author

Matthew B. Jané

Published

August 2, 2023

Effect sizes are indices that help researchers, stakeholders, and policymakers understand the relationship between variables and draw meaningful conclusions from data. However, an effect size is only as useful as its estimate is accurate. To ensure the accuracy of effect size estimates, it is important to mitigate the attenuation induced by various statistical artifacts, such as measurement error and range restriction. This page documents these artifacts and provides corrections that can be applied to attenuated effect sizes to obtain unbiased estimates of the true population effect size. Equations and R code are provided for each correction. For additional information on these corrections and their application in meta-analysis, consult the book by Hunter and Schmidt (1990) and the paper by Wiernik and Dahlke (2020). The {psychmeta} package (Dahlke and Wiernik 2019) contains many of these corrections with convenient implementation in R.

Interactive Visualization

To visualize how various statistical artifacts can bias effect size estimates, try the Shiny app below.

Artifact Simulator Shiny App

Small Samples

Effect sizes estimated from small samples often show systematic bias. The following corrections for r and d are both very close approximations to the more complex unbiased estimators, which are a bit more computationally expensive. It is often suggested that they should only be applied when sample sizes are small (e.g., \(n < 20\)); however, these corrections can be applied at any sample size. Because \(n\) is built into the correction factor, the magnitude of the correction is negligible in large samples. The following corrections can be applied prior to any of the other corrections on this page.

Below are the correction factors that may be applied to the correlation coefficient (r) and the standardized mean difference (d). Note that the correction for r is not conventionally used as it tends to be a very small adjustment.

Correlation Coefficient (r)

Point Estimate \[\displaystyle{ \hat{\rho} = r \cdot \left( 1 + \frac{1-r^2}{2(n-4)} \right) }\]

Standard Error \[\displaystyle{ se_{\hat{\rho}} = se_{r}\cdot \left( 1 + \frac{1-r^2}{2(n-4)} \right)}\]

# Parameters needed 
r =  0.50 # observed correlation between x and y 
n =  20 # sample size
SEr = (1 - r^2) / sqrt(n - 1)  # standard error of observed correlation between x and y 

#Point Estimate 
rho = r * (1 + (1 - r^2) / (2 * (n - 4))) 

#Standard Error
SErho = SEr *  (1 + (1 - r^2) / (2 * (n - 4)))

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.51, SE = 0.18"

Standardized Mean Difference (d)

Point Estimate \[\displaystyle{ \hat{\delta} = d \left( 1-\frac{3}{4n-9}\right) }\]

Standard Error \[\displaystyle{ se_{\hat{\delta}} = se_{d}\left( 1-\frac{3}{4n-9}\right) }\]

# Parameters needed 
d = 0.50 # observed standardized mean difference 
n = 20 # total sample size
SEd = 0.10 # standard error of observed standardized mean difference 

# Point Estimate
delta = d * ( 1 - 3 / (4 * n - 9) ) 

# Standard Error
SEdelta = SEd * ( 1 - 3 / (4 * n - 9) ) 

# Print Results
paste0('delta = ',round(delta,2),', SE = ',round(SEdelta,2) )

[1] "delta = 0.48, SE = 0.1"
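To see why these corrections are harmless in large samples, the two correction factors can be tabulated across sample sizes. This is only a sketch: the helper names `small_sample_factor_r` and `small_sample_factor_d` are hypothetical and simply wrap the two formulas above.

```r
# Hypothetical helpers wrapping the two small-sample correction factors above
small_sample_factor_r <- function(r, n) 1 + (1 - r^2) / (2 * (n - 4))
small_sample_factor_d <- function(n) 1 - 3 / (4 * n - 9)

# Evaluate both factors over a range of sample sizes
n_grid <- c(10, 20, 50, 100, 1000)
factors <- data.frame(
  n = n_grid,
  factor_r = small_sample_factor_r(r = 0.50, n = n_grid),
  factor_d = small_sample_factor_d(n = n_grid)
)

# Both factors move toward 1 (no correction) as n grows
print(round(factors, 3))
```

The factor for r approaches 1 from above and the factor for d approaches 1 from below, so applying either correction to a large sample changes the estimate very little.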

Measurement Error

Measurement error is ubiquitous in science and can cause severe bias in standardized parameter estimates. It is often thought of as purely random noise; however, this is generally not the case. Measurement error variance can be attributed to many factors, such as transient error (e.g., fluctuations in fatigue or mood), test item interpretations (e.g., varied interpretation of the levels in a Likert scale), and extraneous response biases (e.g., participants’ tendency to select socially desirable options). Different estimators of reliability capture different sources of measurement error. For example, test-retest reliability captures various transient errors that are not captured by an internal consistency reliability coefficient (e.g., Cronbach’s alpha, McDonald’s omega). To avoid over- or under-correcting effect size estimates, researchers should take great care in selecting the appropriate reliability coefficient for their research design.

Univariate measurement error in continuous variables

Measurement error will likely exist in both variables under investigation (i.e., \(x\) and \(y\)); however, whether to apply a correction depends on the research question. For example, if you would like to know how related two psychological constructs are, correcting both variables for measurement error is appropriate. It may not be appropriate if you would like to use one variable to predict the other (e.g., exam scores to predict college grades). Because observed scores are all that is available when making predictions, correcting the predictor variable for measurement error will not capture its real-world predictive utility.

Correlation Coefficient (r)

Point Estimate \[\displaystyle{ \hat{\rho} = \frac{r}{\sqrt{r_{xx'}}} }\]

Standard Error \[\displaystyle{ se_{\hat{\rho}} = \frac{se_{r}}{\sqrt{r_{xx'}}} }\]

# Parameters needed
r = 0.50 # observed correlation between x and y
SEr = 0.10 # standard error of observed correlation between x and y
rxx = 0.80 # reliability of x within sample

# Point Estimate
rho = r / sqrt(rxx)

# Standard Error
SErho = SEr / sqrt(rxx)

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.56, SE = 0.11"

Standardized Mean Difference (d)

Point Estimate
\[\displaystyle{ \hat{\delta} = \frac{d}{\sqrt{r_{yy'}}} }\]

Standard Error
\[\displaystyle{ se_{\hat{\delta}} = \frac{se_{d}}{\sqrt{r_{yy'}}} }\]

# Parameters needed
d = 0.50 # observed standardized mean difference
SEd = 0.50 # standard error of observed standardized mean difference
ryy = 0.80 # reliability of y within sample

# Point Estimate
delta = d / sqrt(ryy)

# Standard Error
SEdelta = SEd / sqrt(ryy)

# Print Results
paste0('delta = ',round(delta,2),', SE = ',round(SEdelta,2) )

[1] "delta = 0.56, SE = 0.56"

Bivariate measurement error for continuous variables

Unreliability of \(x\) and \(y\) will bias a Pearson correlation coefficient by adding random, uncorrelated noise to the bivariate relationship. If the goal is to understand the relationship between the true, uncontaminated scores on the two variables, then measurement error in both variables should be corrected for.

Correlation Coefficient (r)

Point Estimate
\[\displaystyle{ \hat{\rho} = \frac{r}{\sqrt{r_{xx'} r_{yy'} }} }\]

Standard Error
\[\displaystyle{ se_{\hat{\rho}} = \frac{se_{r}}{\sqrt{r_{xx'} r_{yy'} }} }\]

# Parameters needed
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
rxx = 0.80 # reliability of x within sample
ryy = 0.70 # reliability of y within sample

# Point Estimate
rho = r / sqrt(rxx * ryy)

# Standard Error
SErho = SEr / sqrt(rxx * ryy)

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.67, SE = 0.13"
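The univariate and bivariate corrections above differ only in which reliabilities enter the attenuation factor, so they can be sketched as one function. The name `correct_r_me` is hypothetical and not part of any package; leaving a reliability at its default of 1 means that variable is not corrected.

```r
# Hypothetical helper: correct r for unreliability in x, y, or both.
# A reliability of 1 means "do not correct for that variable".
correct_r_me <- function(r, se_r, rxx = 1, ryy = 1) {
  a <- sqrt(rxx * ryy)  # attenuation factor
  c(rho = r / a, se = se_r / a)
}

# Correct for measurement error in x only
correct_r_me(r = 0.50, se_r = 0.10, rxx = 0.80)

# Correct for measurement error in both x and y
correct_r_me(r = 0.50, se_r = 0.10, rxx = 0.80, ryy = 0.70)

# Round trip: attenuate an assumed true correlation, then correct it
rho_true <- 0.60
r_obs <- rho_true * sqrt(0.80 * 0.70)
rho_back <- unname(correct_r_me(r_obs, se_r = 0.10, rxx = 0.80, ryy = 0.70)["rho"])
rho_back
```

The round trip at the end illustrates the logic of the correction: a true correlation that is attenuated by the factor \(\sqrt{r_{xx'} r_{yy'}}\) is recovered by dividing by that same factor.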

Misclassification

Misclassification encompasses measurement errors for categorical variables. For example, our ability to identify individuals with major depressive disorder is only as accurate as our measurement of depression. If the measure of depression contains measurement error, then so will the assignment of individuals to the major depressive group and the control group (i.e., some people with major depressive disorder will be mislabeled as controls and vice versa). To correct for group misclassification, you must first have an estimate of the phi coefficient (\(\phi_{gG}\)) from the 2x2 contingency table comparing actual vs observed group membership. Phi can also be approximated directly from the misclassification rate (\(p_{mis}\)); however, this may cause undercorrection when misclassification rates differ between groups.
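As a rough illustration of the two routes to phi, the sketch below computes it from a made-up 2x2 table of actual versus observed group membership and compares it with the approximation based on the overall misclassification rate. The counts are hypothetical, and the chi-squared statistic is computed without the continuity correction.

```r
# Hypothetical 2x2 table: rows = actual group (G), columns = observed group (g)
tab <- matrix(c(20, 5,
                5, 20),
              nrow = 2, byrow = TRUE,
              dimnames = list(actual = c("case", "control"),
                              observed = c("case", "control")))

n <- sum(tab)

# Pearson chi-squared statistic without the continuity correction
chi2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
phi <- sqrt(chi2 / n)

# Approximation from the overall misclassification rate
p_mis <- (tab[1, 2] + tab[2, 1]) / n
phi_approx <- 1 - 2 * p_mis

c(phi = phi, phi_approx = phi_approx)
```

In this table both groups are misclassified at the same rate, so the two values agree. With unequal misclassification rates across groups they will generally differ.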

Standardized Mean Difference (d)

Point Estimate

-Step 1. Calculate the \(\phi\) coefficient from the contingency table between actual group membership (G) and observed group membership (g):
\[\displaystyle{ \phi_{gG} = \sqrt{\frac{\chi_{gG}^{2}}{n}} }\] or \[\displaystyle{\phi_{gG} \approx 1 - 2 \cdot p_{mis} }\]

-Step 2. Transform d to r using probability of observed group membership (\(p_g\)): \[\displaystyle{ r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g )} + d^2}} }\]

-Step 3. Disattenuate correlation for misclassification:
\[\displaystyle{ \hat{\rho} = \frac{r}{\phi_{gG}} }\]

-Step 4. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups:
\[\displaystyle{ \hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} }\]

Standard Error

\[\displaystyle{ se_{\hat{\delta}} = \frac{se_d \left( \frac{\hat{\rho}}{r} \right) }{\left( 1+d^2 p_g[1-p_g] \right)\sqrt{\left(d^2 + \frac{1}{p_g(1-p_g)}\right) p_G (1-p_G)(1-\hat{\rho}^2)^3} } }\]

# Parameters needed
d = 0.50 # observed standardized mean difference
SEd = 0.50 # standard error of observed standardized mean difference
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)
n = 50 # sample size
chi2 = 32 # chi squared statistic for actual vs observed group contingency table
# p_mis = .20 # misclassification rate (only if alternative step 1 is used)

# Point Estimate
phi = sqrt(chi2 / n) # step 1
# phi = 1 - 2 * p_mis  # step 1 (assume equal misclassification)
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 2
rho = r / phi # step 3
delta = rho / sqrt( pG * (1 - pG) * (1 - rho^2) )  # step 4

# Standard Error
SEdelta = SEd * (rho / r) / ( (1 + d^2 * pg * (1 - pg)) * sqrt( (d^2 + 1/(pg*(1-pg))) * ( pG * (1 - pG) * (1 - rho^2)^3) ) )

# Print Results
paste0('delta = ',round(delta,2),', SE = ',round(SEdelta,2) )

[1] "delta = 0.64, SE = 0.66"
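The d-to-r and r-to-d conversions recur in every correction for d on this page, so it can help to wrap them in small helpers. The sketch below uses the standard point-biserial conversions; the names `d_to_r` and `r_to_d` are hypothetical.

```r
# Hypothetical helpers for the conversions between d and r
# p is the proportion of individuals in one of the two groups
d_to_r <- function(d, p) d / sqrt(1 / (p * (1 - p)) + d^2)
r_to_d <- function(r, p) r / sqrt(p * (1 - p) * (1 - r^2))

d <- 0.50
r <- d_to_r(d, p = 0.50)
r

# Converting back with the same proportion returns the original d
d_back <- r_to_d(r, p = 0.50)
d_back
```

When the same proportion is used in both directions the conversions are exact inverses, which gives a quick sanity check on any implementation of the multi-step corrections.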

Univariate dichotomization of naturally continuous variables

In some cases, researchers may categorize naturally continuous variables into two groups in order to facilitate interpretation or to perform specific statistical analyses. For example, congenital amusia, also known as tone deafness, is often diagnosed by scoring below a predetermined cutoff in a pitch or melodic discrimination task. This cutoff is arbitrary (although potentially useful) since pitch discrimination ability typically follows a normal distribution. Before someone can adjust the effect sizes for this artificial dichotomization, it is necessary to determine the proportions of participants above or below the cutoff (\(p_x\)) as well as the cutoff value (\(c_{x}\)), which can be estimated using the quantile function of a standard normal distribution at \(p_{x}\).

Correlation Coefficient (r)

Point Estimate (note: \(\Phi\) indicates the normal ordinate, i.e., the density, of a standard normal distribution)
\[\displaystyle{ \hat{\rho} = \frac{r\sqrt{p_x(1-p_x)}}{\Phi(c_x)} }\]

Standard Error
\[\displaystyle{ se_{\hat{\rho}} = \frac{se_{r}\sqrt{p_x(1-p_x)}}{\Phi(c_x)} }\]

# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
px = 0.50 # proportion of individuals in upper or lower group of X


# Point Estimate
cx = qnorm(px) # Find cut point 
rho = r * sqrt(px * (1 - px)) / dnorm(cx)

# Standard Error
SErho = SEr * sqrt(px * (1 - px)) / dnorm(cx)

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.63, SE = 0.13"
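The size of this correction depends on where the cut point falls. The sketch below, using only base R, evaluates the correction factor \(\sqrt{p_x(1-p_x)}/\Phi(c_x)\) for a few illustrative splits; the chosen values of `px` are arbitrary.

```r
# Illustrative proportions below the cut point
px <- c(0.50, 0.30, 0.10)

# Cut points on the standard normal scale
cx <- qnorm(px)

# Multiplicative correction applied to r (and to its standard error)
correction <- sqrt(px * (1 - px)) / dnorm(cx)

round(data.frame(px, cx, correction), 3)
```

A median split (`px = 0.50`) requires the smallest correction, and the factor grows as the split becomes more extreme, reflecting the greater loss of information from uneven dichotomization.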

Standardized Mean Difference (d)

Point Estimate

-Step 1. Transform d to r using probability of observed group membership (\(p_g\)):
\[\displaystyle{ r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g )} + d^2}} }\]

-Step 2. Disattenuate correlation for artificial dichotomization:
\[\displaystyle{ \hat{\rho} = \frac{r\sqrt{p_x(1-p_x)}}{\Phi(c_x)} }\]

-Step 3. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups:
\[\displaystyle{ \hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} }\]

Standard Error

\[\displaystyle{ se_{\hat{\delta}} = \frac{se_d \left( \frac{\hat{\rho}}{r} \right) }{\left( 1+d^2 p_g[1-p_g] \right)\sqrt{\left(d^2 + \frac{1}{p_g(1-p_g)}\right) p_G (1-p_G)(1-\hat{\rho}^2)^3} } }\]

# Parameters
d = 0.50 # observed standardized mean difference
SEd = 0.10 # standard error of observed standardized mean difference
px = 0.50 # proportion of individuals in upper or lower group of X
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)

# Point Estimate
cx = qnorm(px) # Find cut point
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 1
rho = r * sqrt(px * (1 - px)) / dnorm(cx) # step 2
delta = rho / sqrt( pG * (1 - pG) * (1 - rho^2) )  # step 3

# Standard Error
SEdelta = SEd * (rho / r) / ( (1 + d^2 * pg * (1 - pg)) * sqrt( (d^2 + 1/(pg*(1-pg))) * ( pG * (1 - pG) * (1 - rho^2)^3) ) )

# Print Results
paste0('delta = ',round(delta,2),', SE = ',round(SEdelta,2) )

[1] "delta = 0.64, SE = 0.13"

Bivariate dichotomization of naturally continuous variables

In somewhat rarer circumstances, researchers may dichotomize both the x and y variables. This attenuates the correlation coefficient more than dichotomizing a single variable. If a tetrachoric correlation is available, it is more analogous to a Pearson correlation coefficient than the correction below.

Correlation Coefficient (r)

Point Estimate (note: \(\Phi\) indicates the normal ordinate, i.e., the density, of a standard normal distribution)
\[\displaystyle{ \hat{\rho} = \frac{r\sqrt{p_x p_y (1-p_x) (1-p_y)}}{\Phi(c_x)\Phi(c_y)} }\]

Standard Error
\[\displaystyle{ se_{\hat{\rho}} = \frac{se_{r}\sqrt{p_x p_y(1-p_x)(1-p_y)}}{\Phi(c_x)\Phi(c_y)} }\]

# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
px = 0.50 # proportion of individuals in upper or lower group of X
py = 0.50 # proportion of individuals in upper or lower group of Y

# Point Estimate
cx = qnorm(px) # Find cut point on X variable
cy = qnorm(py)# Find cut point on Y variable
rho = r * sqrt(px * py * (1 - px) * (1 - py) ) / (dnorm(cx)*dnorm(cy))

# Standard Error
SErho = SEr * sqrt(px * py * (1 - px) * (1 - py) ) / (dnorm(cx)*dnorm(cy))

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.79, SE = 0.16"

Range Restriction

Range restriction is a phenomenon that occurs when the variability in a sample is limited compared to the larger population it aims to represent. This limitation can lead to biased estimates of effect size. For example, in the context of college admissions, the SAT is frequently utilized as a factor in the selection process. Consequently, a sample of admitted college students may exhibit less variation in SAT scores compared to the entire pool of applicants. When studying the SAT’s predictive ability for first-year GPA, this range restriction can cause an underestimation of the correlation. Correction formulas have been developed to mitigate this issue.

Univariate Direct Range Restriction

In certain situations, the selection of participants can be identical to one of the variables of interest. For instance, if a study is investigating the correlation between school grades and IQ in students with an intellectual disability, and the diagnosis is defined as having an IQ score of less than 70, then the sample would exhibit direct range restriction.

Correlation Coefficient (r)

Point Estimate
\[\displaystyle{ \hat{\rho} = \frac{r}{u_x \sqrt{r^2 \left(\frac{1}{u_x^2} - 1\right) +1}} }\]

Standard Error
\[\displaystyle{ se_{\hat{\rho}} = \frac{se_r}{u_x \sqrt{r^2 \left(\frac{1}{u_x^2} - 1\right) +1}} }\]

# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
ux = 0.85 # ratio of observed standard deviation to reference standard deviation (ux = SDsample/SDreference)

# Point Estimate
rho = r / (ux * sqrt( r^2 * (1/ux^2 - 1) + 1))

# Standard Error
SErho = SEr / (ux * sqrt( r^2 * (1/ux^2 - 1) + 1))

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.56, SE = 0.11"
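In practice the u-ratio is computed from two standard deviations before the correction is applied. The sketch below shows that step with made-up standard deviations and wraps the correction in a hypothetical helper, `correct_r_uvdrr`, which implements the point-estimate formula above.

```r
# Hypothetical standard deviations (e.g., of a selection test)
sd_sample <- 170     # SD in the restricted study sample
sd_reference <- 200  # SD in the unrestricted reference population
ux <- sd_sample / sd_reference

# Hypothetical helper implementing the point-estimate formula above
correct_r_uvdrr <- function(r, ux) {
  r / (ux * sqrt(r^2 * (1 / ux^2 - 1) + 1))
}

# Corrected correlation for the observed u-ratio
rho_restricted <- correct_r_uvdrr(r = 0.50, ux = ux)
rho_restricted

# With no restriction (ux = 1) the correction leaves r unchanged
rho_unrestricted <- correct_r_uvdrr(r = 0.50, ux = 1)
rho_unrestricted
```

The second call is a useful sanity check: when the sample and reference standard deviations are equal, the formula reduces to the observed correlation.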

Standardized Mean Difference (d)

Point Estimate

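-Step 1. Transform d to r using probability of observed group membership (\(p_g\)): \[\displaystyle{ r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g )} + d^2}} }\]

-Step 2. Correct r for direct range restriction:
\[\displaystyle{ \hat{\rho} = \frac{r}{u_x \sqrt{r^2 \left(\frac{1}{u_x^2} - 1\right) +1}} }\]

-Step 3. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups:
\[\displaystyle{ \hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} }\]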

Standard Error

\[\displaystyle{ se_{\hat{\delta}} = \frac{se_d \left( \frac{\hat{\rho}}{r} \right) }{\left( 1+d^2 p_g[1-p_g] \right)\sqrt{\left(d^2 + \frac{1}{p_g(1-p_g)}\right) p_G (1-p_G)(1-\hat{\rho}^2)^3} } }\]

# Parameters
d = 0.50 # observed standardized mean difference
SEd = 0.50 # standard error of observed standardized mean difference
ux = 0.85 # ratio of observed standard deviation to reference standard deviation (ux = SDsample/SDreference)
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)

# Point Estimate
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 1
rho = r / (ux * sqrt( r^2 * (1/ux^2 - 1) + 1)) # step 2
delta = rho / sqrt( pG * (1 - pG) * (1 - rho^2) )  # step 3

# Standard Error
SEdelta = SEd * (rho / r) / ( (1 + d^2 * pg * (1 - pg)) * sqrt( (d^2 + 1/(pg*(1-pg))) * ( pG * (1 - pG) * (1 - rho^2)^3) ) )

Univariate Indirect Range Restriction

When the selection process is correlated with one of the variables of interest, the resulting sample will have a reduced variance due to indirect range restriction. For example, suppose a company is hiring employees based on their performance in a test that is correlated with their IQ. If the company only hires employees who score above a certain threshold on the test, then the range of IQ scores in the selected sample will be indirectly restricted. This is because the IQ scores of the selected employees will be higher than the IQ scores of the general population due to the correlation between the test scores and IQ.

Correlation Coefficient (r)

Point Estimate \[\displaystyle{ \hat{\rho} = \frac{r}{\sqrt{r^2 + u_x^2 (1 - r^2)}} }\]

Standard Error \[\displaystyle{se_{\hat{\rho}} = \frac{se_{r}}{\sqrt{r^2 + u_x^2 (1 - r^2)}} }\]

# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
ux = 0.85 # ratio of observed standard deviation to reference standard deviation (ux = SDsample/SDreference)

# Point Estimate
rho = r / sqrt( r^2 + ux^2 * (1 - r^2))

# Standard Error
SErho = SEr / sqrt( r^2 + ux^2 * (1 - r^2))

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.56, SE = 0.11"

Standardized Mean Difference (d)

Point Estimate

-Step 1. Transform d to r using probability of observed group membership (\(p_g\)): \[r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g )} + d^2}} \]

-Step 2. Correct r for indirect range restriction:
\[\hat{\rho} = \frac{r}{\sqrt{r^2 + u_x^2 (1 - r^2)}} \]

-Step 3. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups:
\[\hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} \]

Standard Error
\[se_{\hat{\delta}} = \frac{se_d \left( \frac{\hat{\rho}}{r} \right) }{\left( 1+d^2 p_g[1-p_g] \right)\sqrt{\left(d^2 + \frac{1}{p_g(1-p_g)}\right) p_G (1-p_G)(1-\hat{\rho}^2)^3} }\]

# Parameters
d = 0.50 # observed standardized mean difference
SEd = 0.10 # standard error of observed standardized mean difference
ux = 0.85 # ratio of observed standard deviation to reference standard deviation (ux = SDsample/SDreference)
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)

# Point Estimate
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 1
rho = r / sqrt( r^2 + ux^2 * (1 - r^2))  # step 2
delta = rho / sqrt( pG * (1 - pG) * (1 - rho^2) )  # step 3

# Standard Error
SEdelta = SEd * (rho / r) / ( (1 + d^2 * pg * (1 - pg)) * sqrt( (d^2 + 1/(pg*(1-pg))) * ( pG * (1 - pG) * (1 - rho^2)^3) ) )

# Print Results
paste0('delta = ',round(delta,2),', SE = ',round(SEdelta,2) )

[1] "delta = 0.59, SE = 0.12"

Bivariate Direct Range Restriction

In some instances, direct selection can be placed on both x and y variables. This may happen in instances where a researcher requires subjects to be within the “normal” range of x and y, which tends to restrict the range by excluding individuals at the tails of x and y.

Correlation Coefficient (r)

Point Estimate

-Step 1. Define gamma:
\[\displaystyle{ \Gamma = u_x u_y \frac{1 - r^2}{2r}}\]

-Step 2. Correct r for bivariate direct range restriction:
\[\displaystyle{ \hat{\rho} = -\Gamma + \mathrm{sign} (r) \sqrt{\Gamma^2 + 1} }\]

Standard Error
\[\displaystyle{se_{\hat{\rho}} = se_{r} \left( \frac{\hat{\rho}}{r} \right) }\]

# Parameters
r = 0.50 # observed correlation between x and y
SEr = 0.10 # standard error of observed correlation between x and y
ux = 0.85 # ratio of observed standard deviation of x to reference standard deviation of x (ux = SDsample/SDreference)
uy = 0.80  # ratio of observed standard deviation of y to reference standard deviation of y (uy = SDsample/SDreference)

# Point Estimate
Gamma = (1 - r^2) / (2*r) * ux * uy  # step 1
rho = -Gamma + sign(r) * sqrt(Gamma^2 + 1)# step 2

# Standard Error
SErho = SEr * (rho / r)

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.61, SE = 0.12"

Bivariate Indirect Range Restriction

When the selection process is correlated with both of the variables of interest, the resulting sample will have a reduced variance due to indirect range restriction. This is often the case in college admissions testing, where both the predictor (e.g., SAT scores) and the outcome variable (e.g., first year GPA) are correlated with the college admissions process.

Correlation Coefficient (r)

Point Estimate

-Step 1. Define lambda:
\[\displaystyle{ \lambda = \mathrm{sign} (r_{sx} r_{sy} [1 - u_x] [1 - u_y ])\frac{\mathrm{sign} (1-u_x) \mathrm{min} (u_x, 1/u_x) + \mathrm{sign} (1-u_y) \mathrm{min} (u_y, 1/u_y) }{\mathrm{min} (u_x, 1/u_x) + \mathrm{min} (u_y, 1/u_y)} }\]

-Step 2. Correct r for bivariate indirect range restriction:
\[\displaystyle{ \hat{\rho} = r u_x u_y + \lambda \sqrt{\left|1-u_x^2 \right|\left|1-u_y^2 \right|} }\]

Standard Error

-Step 1. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(u_x\): \[\displaystyle{ b_1 = u_y r - \frac{\lambda u_x (1 - u_x^2) \sqrt{\left|1 - u_y^2 \right|}}{\sqrt{\left|1 - u_x^2 \right|^3}} }\]

-Step 2. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(u_y\): \[\displaystyle{ b_2 = u_x r - \frac{\lambda u_y (1 - u_y^2) \sqrt{\left|1 - u_x^2 \right|}}{\sqrt{\left|1 - u_y^2 \right|^3}} }\]

-Step 3. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(r\): \[\displaystyle{ b_3 = u_x u_y }\]

-Step 4. Calculate standard error for \(u_x\) (note: sample size for reference sample is \(n_\mathrm{ref}\)):
\[\displaystyle{ se_{u_x} = u_x \sqrt{ \frac{1}{2(n-1)} + \frac{1}{2(n_{\mathrm{ref}} -1)}} }\]

-Step 5. Calculate standard error for \(u_y\): \[\displaystyle{ se_{u_y} = u_y \sqrt{ \frac{1}{2(n-1)} + \frac{1}{2(n_{\mathrm{ref}} -1)}} }\]

-Step 6. Calculate standard error for \(r\):
\[\displaystyle{ se_r = \frac{1-r^2}{\sqrt{n-1}} }\]

-Step 7. Calculate standard error of \(\hat{\rho}\) using a Taylor Series Approximation: \[\displaystyle{se_{\hat{\rho}} \approx \sqrt{b_1^2 se_{u_x}^2 + b_2^2 se_{u_y}^2 + b_3^2 se_{r}^2 } }\]

# Parameters
r = 0.50 # observed correlation between x and y
SEr = 0.10 # standard error for observed correlation between x and y
rxx = 0.70  # reliability of x within study sample
ryy = 0.80 # reliability of y within study sample
ux  = 0.85 # observed u-ratio of x 
uy  = 0.80 # observed u-ratio of y
rsy = 1 # direction of correlation between selector and y (-1 = negative, 0 = no correlation, 1 = positive)
rsx = 1 # direction of correlation between selector and x (-1 = negative, 0 = no correlation, 1 = positive)
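na = 200 # sample size of the reference (unrestricted) sample
n = 100 # sample size of the study (restricted) sample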

# Point Estimate
lambda = sign( rsx * rsy * (1-ux) * (1-uy) ) * ( sign(1 - ux) * min(c(ux,1/ux)) + sign(1 - uy) * min(c(uy,1/uy)) ) / ( min(c(ux,1/ux)) + min(c(uy,1/uy)) )
rho = r * ux * uy + lambda * sqrt( abs(1 - ux^2) * abs(1 - uy^2) )

# Standard Error (Taylor Series Approximation)
b1  = r * uy - ( lambda * ux *(1 - ux^2) * sqrt( abs(1 - uy^2) ) ) / sqrt(abs(1 - ux^2)^3) # First-order partial derivative with respect to ux
b2  = r * ux - ( lambda * uy *(1 - uy^2) * sqrt( abs(1 - ux^2) ) ) / sqrt(abs(1 - uy^2)^3) # First-order partial derivative with respect to uy
b3  = ux*uy # First-order partial derivative with respect to r

SEux = ux * sqrt( 1 / (2*(n-1)) + 1 / (2*(na-1)) )
SEuy = uy * sqrt( 1 / (2*(n-1)) + 1 / (2*(na-1)) )

SErho = sqrt( b1^2 * SEux^2 + b2^2 * SEuy^2 + b3^2 * SEr^2 )

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.66, SE = 0.08"
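The two point-estimate steps can be sketched as a pair of small functions, which makes it easier to see how \(\lambda\) behaves under different selection scenarios. The names `bvirr_lambda` and `correct_r_bvirr` are hypothetical wrappers around the formulas above, not the {psychmeta} API.

```r
# Hypothetical helper for step 1: the lambda term
# rsx and rsy give the sign of the selector's correlation with x and y
bvirr_lambda <- function(ux, uy, rsx = 1, rsy = 1) {
  mx <- min(ux, 1 / ux)
  my <- min(uy, 1 / uy)
  sign(rsx * rsy * (1 - ux) * (1 - uy)) *
    (sign(1 - ux) * mx + sign(1 - uy) * my) / (mx + my)
}

# Hypothetical helper for step 2: the corrected correlation
correct_r_bvirr <- function(r, ux, uy, rsx = 1, rsy = 1) {
  lambda <- bvirr_lambda(ux, uy, rsx, rsy)
  r * ux * uy + lambda * sqrt(abs(1 - ux^2) * abs(1 - uy^2))
}

# Same inputs as the example above
rho_bvirr <- correct_r_bvirr(r = 0.50, ux = 0.85, uy = 0.80)
rho_bvirr

# With no restriction on either variable, lambda is 0 and r is returned
rho_none <- correct_r_bvirr(r = 0.50, ux = 1, uy = 1)
rho_none
```

Separating \(\lambda\) from the correction also makes it straightforward to check the effect of assuming a negative selector correlation, by passing `rsx = -1` or `rsy = -1`.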

Range Restriction and Measurement Error

Range restriction and measurement error can act in tandem to bias effect size estimates. When correcting for both range restriction and measurement error, reliability and u-ratios are affected by each other, that is, measurement error variance contributes to the overall variance affecting u and restricted range affects the reliability coefficients. Therefore reliability and u-ratios must be first disattenuated for each correction procedure.

Univariate Direct Range Restriction and Measurement Error

Correlation Coefficient (r)

Point Estimate
\[\displaystyle{ \hat{\rho} = \frac{r}{u_x \sqrt{1 - u_x^2 (1-r_{xx'}) } \sqrt{r^2 \left(\frac{1}{u_x^2} - 1\right)+r_{yy'} } } }\]

Standard Error
\[\displaystyle{se_{\hat{\rho}} = se_r \left( \frac{\hat{\rho}}{r} \right)}\]

# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
rxx = 0.80 # reliability of x
ryy = 0.70 # reliability of y
ux = 0.85 # ratio of observed standard deviation of x to reference standard deviation of x (ux = SDsample/SDreference)

# Point Estimate
rho = r / ( ux * sqrt(1 - ux^2 * (1 - rxx)) * sqrt( r^2 * (1/ux^2 - 1) + ryy) )

# Standard Error
SErho = SEr * (rho / r)

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.71, SE = 0.14"

Standardized Mean Difference (d)

Point Estimate

-Step 1. Calculate \(\phi_{gG}\) from the contingency table between observed and actual group membership:
\[ \phi_{gG} = \sqrt{\frac{\chi^2}{n}} \] or \[\phi_{gG} \approx 1 - 2 \cdot p_{mis} \]

-Step 2. Transform d to r using probability of observed group membership (\(p_g\)): \[r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g ) }+d^2}} \]

-Step 3. Correct r for direct range restriction: \[\hat{\rho} = \frac{r}{u_x \sqrt{1 - u_x^2 (1-\phi_{gG}) } \sqrt{r^2 \left(\frac{1}{u_x^2} - 1\right)+r_{yy'} } } \]

-Step 4. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups: \[\displaystyle{ \hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} }\]

# Parameters
d = 0.50 # observed standardized mean difference
SEd = 0.10# standard error of observed standardized mean difference
n = 100 # sample size
ryy = 0.80 # reliability of y
uy = 0.85 # ratio of observed standard deviation of y to reference standard deviation of y (uy = SDsample/SDreference)
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)
chi2 = 36 # chi squared statistic for actual vs observed group contingency table 
#p_mis = 0.20 # proportion of individuals misclassified (only needed if equal misclassification rate is assumed, and alternative phi calculation)

# Point Estimate
phi =  sqrt(chi2 / n)  # step 1
# phi = 1 - 2 * p_mis  # step 1 (alternative)
r = d / sqrt(1 / (pg * (1 - pg)) + d^2) # step 2
rho = r / ( uy * sqrt(1 - uy^2 * (1 - phi)) * sqrt( r^2 * (1/uy^2 - 1) + ryy) )  # step 3
delta = rho / sqrt( pG * (1 - pG) * (1 - rho^2) )  # step 4

# Standard Error
SEdelta = SEd * (rho / r) / ( (1 + d^2 * pg * (1 - pg)) * sqrt( (d^2 + 1/(pg*(1-pg))) * ( pG * (1 - pG) * (1 - rho^2)^3) ) )

# Print Results
paste0('delta = ',round(delta,2),', SE = ',round(SEdelta,2) )

[1] "delta = 0.8, SE = 0.18"

Univariate Indirect Range Restriction and Measurement Error

Correlation Coefficient (r)

Point Estimate
\[\displaystyle{\hat{\rho} = \frac{r}{\sqrt{r^2 + \frac{u_x^2 r_{xx'}(r_{xx'}r_{yy'}-r^2)}{1 - u_x^2 (1-r_{xx'})}}}}\]

Standard Error
\[\displaystyle{se_{\hat{\rho}} = se_r \left( \frac{\hat{\rho}}{r} \right)}\]

# Parameters
r = 0.50 # observed correlation
SEr = 0.10 # standard error of observed correlation
rxx = 0.70 # reliability of x
ryy = 0.80 # reliability of y
ux = 0.85 # ratio of observed standard deviation of x to reference standard deviation of x (ux = SDsample/SDreference)
uy = 0.80  # ratio of observed standard deviation of y to reference standard deviation of y (uy = SDsample/SDreference)

# Point Estimate
rho = r / sqrt(r^2 +  ux^2 * rxx * (rxx * ryy - r^2) / (1 - ux^2 * (1-rxx))  )

# Standard Error
SErho = SEr * (rho / r)

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.75, SE = 0.15"

Standardized Mean Difference (d)

Point Estimate

-Step 1. Calculate \(\phi_{gG}\) from the contingency table between observed and actual group membership: \[\phi_{gG} = \sqrt{\frac{\chi^2}{n}} \] or \[\phi_{gG} \approx 1 - 2 \cdot p_{mis} \]

-Step 2. Transform d to r using probability of observed group membership (\(p_g\)): \[ r = \frac{d}{ \sqrt{\frac{1}{p_g ( 1-p_g )} + d^2}} \]

-Step 3. Correct r for indirect range restriction and measurement error: \[\hat{\rho} = \frac{r}{\sqrt{r^2 + \frac{u_x^2 r_{xx'}(r_{xx'}r_{yy'}-r^2)}{1 - u_x^2 (1-r_{xx'})}}}\]

-Step 4. Back-transform \(\hat{\rho}\) to \(\hat{\delta}\) using probability of actual group membership (\(p_G\)). Observed group membership (\(p_g\)) can be used instead assuming equal misclassification rate between groups: \[\hat{\delta} = \frac{\hat{\rho}}{\sqrt{p_G (1-p_G )(1-\hat{\rho}^2)}} \]

Standard Error \[\displaystyle{ se_{\hat{\delta}} = \frac{se_d \left( \frac{\hat{\rho}}{r} \right) }{\left( 1+d^2 p_g[1-p_g] \right)\sqrt{\left(d^2 + \frac{1}{p_g(1-p_g)}\right) p_G (1-p_G)(1-\hat{\rho}^2)^3} } }\]

# Parameters
d = 0.50 # observed standardized mean difference
SEd = 0.10 # standard error of observed standardized mean difference
rxx = 0.70 # reliability of x (required in step 3)
ryy = 0.80 # reliability of y
ux = 0.85 # ratio of observed standard deviation of x to reference standard deviation of x (ux = SDsample/SDreference)
uy = 0.80 # ratio of observed standard deviation of y to reference standard deviation of y (uy = SDsample/SDreference)
pg = 0.50 # observed proportion of individuals in group 1 or 2
pG = 0.50 # actual proportion of individuals in group 1 or 2 (pG=pg when misclassification rate is equal among groups)
chi2 = 36 # chi squared statistic for actual vs observed group contingency table 
n = 100 # sample size

# Point Estimate
phi =  sqrt(chi2 / n) # step 1
# phi = 1 - 2 * p_mis # step 1 (alternative)
r = d / sqrt(1 / (pg * (1 - pg)) + d^2)# step 2
rho = r / sqrt(r^2 +  ux^2 * rxx * (rxx * ryy - r^2) / (1 - ux^2 * (1-rxx))  ) # step 3
delta = rho / sqrt( pG * (1 - pG) * (1 - rho^2) ) # step 4

# Standard Error
SEdelta = SEd * (rho / r) / ( (1 + d^2 * pg * (1 - pg)) * sqrt( (d^2 + 1/(pg*(1-pg))) * ( pG * (1 - pG) * (1 - rho^2)^3) ) )

# Print Results
paste0('delta = ',round(delta,2),', SE = ',round(SEdelta,2) )

[1] "delta = 0.85, SE = 0.19"
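Step 1 can be illustrated with a small, made-up classification table. The sketch below assumes a hypothetical 2×2 table of observed versus actual group membership (the counts are illustrative, not data from any study) and compares the chi-squared-based \(\phi_{gG}\) with the \(1 - 2 \cdot p_{mis}\) approximation.

```r
# Hypothetical counts (illustrative only): rows = observed group, columns = actual group
tab = matrix(c(40, 10,
               10, 40), nrow = 2, byrow = TRUE)
n = sum(tab) # sample size

# Chi-squared statistic for the table, without continuity correction
chi2 = unname(chisq.test(tab, correct = FALSE)$statistic)

# Step 1
phi = sqrt(chi2 / n)

# Step 1 (alternative): proportion misclassified is the off-diagonal share
p_mis = (tab[1, 2] + tab[2, 1]) / n
phi_approx = 1 - 2 * p_mis

c(chi2 = chi2, phi = phi, phi_approx = phi_approx)
```

With these balanced, symmetric counts both routes give the same value. In less balanced tables the second expression is only an approximation.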

Bivariate Direct Range Restriction and Measurement Error

Correlation Coefficient (r)

Point Estimate

-Step 1. Define \(\Gamma\): \[\displaystyle{ \Gamma = u_x u_y \frac{1 - r^2}{2r}}\]

-Step 2. Correct r for bivariate direct range restriction and measurement error: \[\displaystyle{\hat{\rho} = \frac{-\Gamma + \mathrm{sign}(r)\sqrt{\Gamma^2 + 1} }{\sqrt{1-u_x^2(1-r_{xx'})}\sqrt{1-u_y^2(1-r_{yy'})}} }\]

Standard Error \[\displaystyle{se_{\hat{\rho}} = se_r \left( \frac{\hat{\rho}}{r} \right)}\]

# Parameters
r = 0.50 # observed correlation between x and y
SEr = 0.10 # standard error of observed correlation between x and y
rxx = 0.80 # reliability of x
ryy = 0.70 # reliability of y
ux = 0.85 # ratio of observed standard deviation of x to reference standard deviation of x (ux = SDsample/SDreference)
uy = 0.80 # ratio of observed standard deviation of y to reference standard deviation of y (uy = SDsample/SDreference)

# Point Estimate
Gamma = (1 - r^2) / (2*r) * ux * uy # step 1
rho = (-Gamma + sign(r) * sqrt(Gamma^2 + 1)) / (sqrt(1 - ux^2 * (1 - rxx)) * sqrt(1 - uy^2 * (1 - ryy)))# step 2

# Standard Error
SErho = SEr * (rho / r)

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.74, SE = 0.15"
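As a quick check on the bivariate direct range restriction correction, the two steps can be wrapped in a function. The name `correct_r_bvdrr_sketch` is a hypothetical helper used for illustration, not a function from {psychmeta}. With no range restriction (\(u_x = u_y = 1\)) and perfect reliability, the correction should return the observed correlation unchanged.

```r
# Hypothetical helper (illustration only): bivariate direct range restriction
# and measurement error correction for a correlation
correct_r_bvdrr_sketch = function(r, rxx, ryy, ux, uy) {
  Gamma = (1 - r^2) / (2 * r) * ux * uy # step 1
  (-Gamma + sign(r) * sqrt(Gamma^2 + 1)) /
    (sqrt(1 - ux^2 * (1 - rxx)) * sqrt(1 - uy^2 * (1 - ryy))) # step 2
}

# Same inputs as the example above
rho_example = correct_r_bvdrr_sketch(r = 0.50, rxx = 0.80, ryy = 0.70, ux = 0.85, uy = 0.80)

# No range restriction and perfect reliability: nothing to correct
rho_no_artifacts = correct_r_bvdrr_sketch(r = 0.50, rxx = 1, ryy = 1, ux = 1, uy = 1)

round(c(rho_example, rho_no_artifacts), 2)
```

Because \(\Gamma\) divides by \(r\), the sketch is not defined at \(r = 0\).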

Bivariate Indirect Range Restriction and Measurement Error

Correlation Coefficient (r)

Point Estimate

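-Step 1. Calculate \(\lambda\), where \(\rho_{sx}\) and \(\rho_{sy}\) are the correlations of the selection variable with x and y (only their signs are needed): \[\lambda = \mathrm{sign}\left(\rho_{sx}\rho_{sy}(1-u_x)(1-u_y)\right) \frac{\mathrm{sign}(1-u_x)\min\left(u_x,\frac{1}{u_x}\right) + \mathrm{sign}(1-u_y)\min\left(u_y,\frac{1}{u_y}\right)}{\min\left(u_x,\frac{1}{u_x}\right) + \min\left(u_y,\frac{1}{u_y}\right)}\]

-Step 2. Correct r for bivariate indirect range restriction and measurement error:
\[\hat{\rho} = \frac{r u_x u_y + \lambda \sqrt{\left|1-u_x^2 \right|\left|1-u_y^2 \right|} }{\sqrt{1 - u_x^2 (1-r_{xx'})} \sqrt{1 - u_y^2 (1 - r_{yy'})}}\]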
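The sign term \(\lambda\) depends on the direction of the selection variable's correlations with x and y, and on whether each variable is restricted (\(u < 1\)) or enhanced (\(u > 1\)). The sketch below mirrors the expression used in this section's R code. The function name `lambda_bvirr_sketch` is a hypothetical helper, and the input values are illustrative.

```r
# Hypothetical helper (illustration only): sign term for bivariate indirect range restriction
# rsx, rsy: direction of the selector's correlation with x and y (-1, 0, or 1)
lambda_bvirr_sketch = function(ux, uy, rsx = 1, rsy = 1) {
  mx = min(c(ux, 1 / ux))
  my = min(c(uy, 1 / uy))
  sign(rsx * rsy * (1 - ux) * (1 - uy)) *
    (sign(1 - ux) * mx + sign(1 - uy) * my) / (mx + my)
}

# Both variables restricted, selector related to x and y in the same direction
lambda_same = lambda_bvirr_sketch(ux = 0.80, uy = 0.85)

# Selector related to x and y in opposite directions
lambda_opposite = lambda_bvirr_sketch(ux = 0.80, uy = 0.85, rsy = -1)

# No restriction on x (ux = 1): the sign term drops out
lambda_none = lambda_bvirr_sketch(ux = 1, uy = 0.85)

c(lambda_same, lambda_opposite, lambda_none)
```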

Standard Error

Step 1. Calculate the measurement quality index for X in the restricted sample: \[ q_x = \sqrt{r_{xx'}}\]
Step 2. Calculate the measurement quality index for Y in the restricted sample: \[ q_y = \sqrt{r_{yy'}}\]
Step 3. Estimate the measurement quality index for X in the unrestricted population:
\[q_X = \sqrt{1 - u_x^2 (1 - r_{xx'})} \]
Step 4. Estimate the measurement quality index for Y in the unrestricted population: \[ q_Y = \sqrt{1 - u_y^2 (1 - r_{yy'})} \]
Step 5. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(q_X\): \[ \beta_1 = -\frac{u_x u_y r + \lambda \sqrt{\left|1 - u_x^2 \right|\left|1 - u_y^2 \right|} }{q_X^2 q_Y} \]
Step 6. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(q_Y\): \[ \beta_2 = -\frac{u_x u_y r + \lambda \sqrt{\left|1 - u_x^2 \right|\left|1 - u_y^2 \right|} }{q_Y^2 q_X} \]
Step 7. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(u_x\), treating \(q_X\) and \(q_Y\) as separate inputs: \[ \beta_3 = \frac{u_y r}{q_X q_Y} - \frac{\lambda u_x (1 - u_x^2) \sqrt{\left|1 - u_y^2 \right|}}{q_X q_Y\sqrt{\left|1 - u_x^2 \right|^3}}\]
Step 8. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(u_y\), again treating \(q_X\) and \(q_Y\) as separate inputs: \[\beta_4 = \frac{u_x r}{q_X q_Y} - \frac{\lambda u_y (1 - u_y^2) \sqrt{\left|1 - u_x^2 \right|}}{q_X q_Y\sqrt{\left|1 - u_y^2 \right|^3}} \]
Step 9. Calculate the first-order partial derivative of \(\hat{\rho}\) with respect to \(r\): \[\beta_5 = \frac{u_x u_y}{q_X q_Y} \]
Step 10. Calculate standard error for unrestricted measure quality index (\(q_X\)) \[se_{q_X} = \sqrt{\frac{1}{2} u_x^4 \left[ \frac{(1-q_x^2)^2}{1-u_x^2 (1-q_x^2)} \right] \left[ \frac{1}{n-1} + \frac{1}{n_{\mathrm{ref}}-1} \right] + \frac{u_x^2 q_x^2(1 - q_x^2)^2}{\left[1-u_x^2 (1-q_x^2)\right] (n-1)} } \]
Step 11. Calculate standard error for unrestricted measure quality index (\(q_Y\)) \[se_{q_Y} = \sqrt{\frac{1}{2} u_y^4 \left[ \frac{(1-q_y^2)^2}{1-u_y^2 (1-q_y^2)} \right] \left[ \frac{1}{n-1} + \frac{1}{n_{\mathrm{ref}}-1} \right] + \frac{u_y^2 q_y^2(1 - q_y^2)^2}{\left[1-u_y^2 (1-q_y^2)\right] (n-1)} } \]
Step 12. Calculate standard error for \(u_x\) (note: sample size for reference sample is \(n_\mathrm{ref}\)): \[\displaystyle{ se_{u_x} = u_x \sqrt{ \frac{1}{2(n-1)} + \frac{1}{2(n_{\mathrm{ref}} -1)}} }\]
Step 13. Calculate standard error for \(u_y\) (note: sample size for reference sample is \(n_\mathrm{ref}\)): \[\displaystyle{ se_{u_y} = u_y \sqrt{ \frac{1}{2(n-1)} + \frac{1}{2(n_{\mathrm{ref}} -1)}} }\]
Step 14. Calculate standard error for \(r\) \[\displaystyle{ se_r = \frac{1-r^2}{\sqrt{n-1}} }\]
Step 15. Calculate standard error of \(\hat{\rho}\) using a Taylor Series Approximation \[\displaystyle{se_{\hat{\rho}} \approx \sqrt{\beta_1^2 se_{q_X}^2 + \beta_2^2 se_{q_Y}^2 + \beta_3^2 se_{u_x}^2 + \beta_4^2 se_{u_y}^2 + \beta_5^2 se_{r}^2 } }\]
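The \(\beta\) coefficients are partial derivatives of the corrected correlation, with \(q_X\) and \(q_Y\) treated as separate inputs. As a sanity check, two of them (\(\beta_1\) and \(\beta_5\)) can be compared against central finite differences. This is only a sketch under stated assumptions: `rho_hat` is a hypothetical helper, \(\lambda = 1\) is held fixed, and the input values are illustrative.

```r
# Hypothetical helper (illustration only): corrected correlation as a function of its inputs
rho_hat = function(r, ux, uy, qX, qY, lambda) {
  (r * ux * uy + lambda * sqrt(abs(1 - ux^2) * abs(1 - uy^2))) / (qX * qY)
}

# Illustrative inputs
r = 0.50; ux = 0.80; uy = 0.85; lambda = 1
qX = sqrt(1 - ux^2 * (1 - 0.80)) # reliability of x = 0.80
qY = sqrt(1 - uy^2 * (1 - 0.70)) # reliability of y = 0.70
h = 1e-6 # step size for the finite differences

# Analytic partial derivatives (steps 5 and 9)
b1 = -(ux * uy * r + lambda * sqrt(abs(1 - ux^2) * abs(1 - uy^2))) / (qX^2 * qY)
b5 = (ux * uy) / (qX * qY)

# Central finite differences with respect to qX and r
b1_num = (rho_hat(r, ux, uy, qX + h, qY, lambda) - rho_hat(r, ux, uy, qX - h, qY, lambda)) / (2 * h)
b5_num = (rho_hat(r + h, ux, uy, qX, qY, lambda) - rho_hat(r - h, ux, uy, qX, qY, lambda)) / (2 * h)

round(c(b1 = b1, b1_num = b1_num, b5 = b5, b5_num = b5_num), 4)
```

The analytic and numerical values should agree to several decimal places; a mismatch would point to an algebra or transcription error in a derivative.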

# Parameters needed
r = 0.50 # observed correlation between x and y
rxx = 0.80 # reliability of x within study sample
ryy = 0.70 # reliability of y within study sample
n = 200 # sample size
na = 100 # sample size of reference sample (n_ref in the equations)
ux = 0.80 # observed u-ratio of x
uy = 0.85 # observed u-ratio of y
rsy = 1 # direction of correlation between selector and y (-1 = negative, 0 = no correlation, 1 = positive)
rsx = 1 # direction of correlation between selector and x (-1 = negative, 0 = no correlation, 1 = positive)

# Point Estimate
lambda = sign( rsx * rsy * (1-ux) * (1-uy) ) * ( sign(1 - ux) * min(c(ux,1/ux)) + sign(1 - uy) * min(c(uy,1/uy)) ) / ( min(c(ux,1/ux)) + min(c(uy,1/uy)) ) # step 1
qX = sqrt(1 - ux^2 * (1 - rxx)) # measurement quality index of x in the unrestricted population
qY = sqrt(1 - uy^2 * (1 - ryy)) # measurement quality index of y in the unrestricted population
rho = ( r * ux * uy + lambda * sqrt( abs(1 - ux^2) * abs(1 - uy^2) ) ) / (qX * qY) # step 2

# Standard Error (Taylor Series Approximation)
qx = sqrt(rxx) # measurement quality index of x in the restricted sample
qy = sqrt(ryy) # measurement quality index of y in the restricted sample

# First order partial derivative with respect to qX
b1  = - ( ux * uy * r + lambda * sqrt(abs(1 - ux^2) * abs(1 - uy^2)) ) / (qX^2 * qY)
# First order partial derivative with respect to qY
b2  = - ( ux * uy * r + lambda * sqrt(abs(1 - ux^2) * abs(1 - uy^2)) ) / (qY^2 * qX)
# First order partial derivative with respect to ux
b3  = (r * uy) / (qX * qY) - ( lambda * ux * (1 - ux^2) * sqrt( abs(1 - uy^2) ) ) / (qX * qY * sqrt(abs(1 - ux^2)^3))
# First order partial derivative with respect to uy
b4  = (r * ux) / (qX * qY) - ( lambda * uy * (1 - uy^2) * sqrt( abs(1 - ux^2) ) ) / (qX * qY * sqrt(abs(1 - uy^2)^3))
# First order partial derivative with respect to r
b5  = (ux * uy) / (qX * qY)

# Standard error of qX
SEqX = sqrt( .5 * ux^4 * ( (1-qx^2)^2 / (1-ux^2 * (1-qx^2)) ) * (1/(n-1) + 1/(na-1)) + ( (ux^2 * qx^2 * (1-qx^2)^2) ) / ((1 - ux^2 * (1-qx^2)) * (n-1)) )

# Standard error of qY
SEqY = sqrt( .5 * uy^4 * ( (1-qy^2)^2 / (1-uy^2 * (1-qy^2)) ) * (1/(n-1) + 1/(na-1)) + ( (uy^2 * qy^2 * (1-qy^2)^2) ) / ((1 - uy^2 * (1-qy^2)) * (n-1)) )

# Standard error of ux
SEux = ux * sqrt( 1 / (2*(n-1)) + 1 / (2*(na-1)) )

# Standard error of uy
SEuy = uy * sqrt( 1 / (2*(n-1)) + 1 / (2*(na-1)) )

# Standard error of r
SEr =  (1 - r^2) / sqrt(n - 1)

# Taylor series approximation (all five terms)
SErho = sqrt( b1^2 * SEqX^2 + b2^2 * SEqY^2 + b3^2 * SEux^2 + b4^2 * SEuy^2 + b5^2 * SEr^2 )

# Print Results
paste0('rho = ',round(rho,2),', SE = ',round(SErho,2) )

[1] "rho = 0.79, SE = 0.08"

References

Cite This Page

APA: Jané, M. B. (2023). Artifact corrections for effect sizes: equations, code, and interactive app. https://www.matthewbjane.com/ArtifactCorrections/, accessed DD-MM-YYYY

BibTeX:

@misc{,
  title  = "Artifact corrections for effect sizes: equations, code, and interactive app",
  author = "Matthew B. Jan{\'{e}}",
  howpublished = "\url{https://matthewbjane.github.io/artifact-corrections/}",
  year= 2023,
  note= "accessed DD-MM-YYYY"
}

References

Dahlke, Jeffrey A., and Brenton M. Wiernik. 2019. “Psychmeta: An R Package for Psychometric Meta-Analysis.” Applied Psychological Measurement 43 (5): 415–16. https://doi.org/10.1177/0146621618795933.

Hunter, John E., and Frank L. Schmidt. 1990. Methods of meta-analysis: correcting error and bias in research findings. Newbury Park: Sage Publications.

Wiernik, Brenton M., and Jeffrey A. Dahlke. 2020. “Obtaining Unbiased Results in Meta-Analysis: The Importance of Correcting for Statistical Artifacts.” Advances in Methods and Practices in Psychological Science 3 (1): 94–123. https://doi.org/10.1177/2515245919885611.