Friday, August 18, 2017

Using Effects Coding to Scale Latent Constructs in Multigroup Analyses with MPlus

Hi folks! Today’s post is about the metric of latent variables in confirmatory factor analysis (CFA). As a quick reminder, a latent variable is a variable that is not directly observed but rather inferred from other variables that are observed. E.g., intelligence (latent) is inferred from the observed performance on several tasks aka items (the manifest indicators). But since the latent variable is not observed, it has no metric (units of measurement) to start with. Within the CFA framework this means that the latent variable is not identified because it has no scale. This is normally fixed in one of two ways (although there is actually a third way, which I will describe later on):

1. Use a Marker Variable. In this case, the latent variable is set to the same metric as one of the indicators by fixing that indicator’s loading to one and its intercept to zero. MPlus uses this method with the first indicator by default.

2. Use Latent Standardization. This method defines the scale and provides the identification by fixing the variance of the latent variable to one and its mean to zero.

Both methods are equivalent, so it is up to you which one to use. However, I use the marker variable method only if I have a good marker variable, i.e., one for which I or my client have a good understanding of what a change of one measurement unit means. Otherwise, I prefer latent standardization. In this case one standard deviation is the unit of measurement, which is explainable to (most) clients, and you can perform analyses on all indicators (none is fixed).
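
In equations (a sketch of the standard single-factor measurement model), the two options look like this:

```latex
% Measurement model for indicator j:
y_j = \tau_j + \lambda_j \, \eta + \varepsilon_j

% 1. Marker variable: borrow the metric of one indicator
\lambda_1 = 1, \qquad \tau_1 = 0

% 2. Latent standardization: fix the latent moments
\operatorname{Var}(\eta) = 1, \qquad \operatorname{E}(\eta) = 0
```

Either choice removes the same two indeterminacies (the location and scale of η), which is why both methods yield identical model fit.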

Unfortunately, there is one very important use case in which both approaches are somewhat unsatisfying in my view: measurement invariance analysis. In a nutshell, measurement invariance is a statistical property indicating that the same construct is measured in the same way across groups.

There are several increasingly strict levels of measurement invariance: configural, weak, strong, and strict invariance (Meredith, 1993). The most lenient level is configural invariance, which requires only that the same corresponding indicators and constructs are present in all groups, providing a basis for merely qualitative cross-group comparisons. If the next level, weak invariance, holds true, then the respective loadings (λ) are equal across groups and one can assume that the assessed constructs are qualitatively equal across groups. However, only if strong invariance is established, with both the loadings and the intercepts (τ) of the indicators equal across groups, can quantitative comparisons across groups be made in a meaningful and truly comparable manner. Strict invariance further assumes that the residual variances (θ) are equal, but it is often considered too restrictive and is not necessary for mean comparisons. To sum this up: If you want to compare a latent variable across groups, then strong measurement invariance is required. And this is something we (researchers in the social sciences) want to do quite often.
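
Written as constraints on the group-specific measurement model, the hierarchy looks like this (g indexes groups, j indicators):

```latex
% Group-specific measurement model (group g, indicator j):
y_j^{(g)} = \tau_j^{(g)} + \lambda_j^{(g)} \, \eta^{(g)} + \varepsilon_j^{(g)}

% Configural: same pattern of loadings, all parameters free per group
% Weak:   \lambda_j^{(1)} = \lambda_j^{(2)} \quad \text{for all } j
% Strong: additionally \; \tau_j^{(1)} = \tau_j^{(2)}
% Strict: additionally \; \theta_j^{(1)} = \theta_j^{(2)}
```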

So, what is my problem with the metric of the latent variable in measurement invariance analysis within the CFA framework? When using a marker variable you have to fix the loading and intercept of one indicator in all groups. Consequently, this indicator cannot be tested for equivalence. But testing equivalence is the whole point of the analysis, so this approach is not useful here. The reason why I do not like the latent standardization approach for measurement invariance analysis is a little harder to describe. Mainly two things are bugging me: 1.) Unless the variance of the latent variable is also equivalent across groups (in addition to measurement invariance), the interpretation of results in all but the first group gets difficult, because the standard deviation of the first group is used as the metric for all groups. Hence, you don’t have a correlation but a covariance structure between the latent variables. If you are willing to go the extra mile, you can bypass this problem with phantom variables (see Little, 1997), but somehow this approach does not sit well with me. 2.) Evaluation is normally based on χ² difference tests. However, χ² difference tests require the models to be nested, and with latent standardization they are not. It is probably best to demonstrate what I mean with a little example.

For the example we will use simulated data which you can download from here.

Measurement Invariance with Latent standardization

data:

file = cfa_data.dat;

variable:

names =
y1-y5
groupid
;

usevariable =
y1-y5
;

group =
groupid (1 = group_a 2 = group_b)
;

analysis:
estimator = ml;

model:

factor by
y1*       !free - default is 1
y2
y3
y4
y5
;

!!estimate intercepts
!free all intercepts [not necessary, because default for first group]

[y1*];
[y2*];
[y3*];
[y4*];
[y5*];

!!latent standardization

!fix variance to 1
factor@1;

!fix mean to 0
[factor@0];

model group_b:

factor by
y1*
y2
y3
y4
y5
;

!!estimate intercepts
!free all intercepts

[y1*];
[y2*];
[y3*];
[y4*];
[y5*];

!!latent standardization

!fix variance to 1
factor@1;

!fix mean to 0
[factor@0];


A short walk through the syntax:

1. data: we tell MPlus where to find the data
2. variable: we declare the structure of the data and which data to use
3. analysis: we choose an estimator (Note: normally I would choose a robust estimator, but this complicates the χ² difference test a little, so I stick with good old maximum likelihood for this example)
4. model: we define the model. To separately estimate the model within the second group we have to repeat the model definition (otherwise MPlus’ default level of measurement invariance would be strong invariance).

Running the model gives us (abbr.):

THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL FIT INFORMATION

Number of Free Parameters                       30

Loglikelihood

H0 Value                       -4789.456
H1 Value                       -4784.920

Information Criteria

Akaike (AIC)                    9638.912
Bayesian (BIC)                  9770.820
(n* = (n + 2) / 24)

Chi-Square Test of Model Fit

Value                              9.073
Degrees of Freedom                    10
P-Value                           0.5252

Chi-Square Contributions From Each Group

GROUP_A                            4.589
GROUP_B                            4.484

RMSEA (Root Mean Square Error Of Approximation)

Estimate                           0.000
90 Percent C.I.                    0.000  0.058
Probability RMSEA <= .05           0.903

CFI/TLI

CFI                                1.000
TLI                                1.002

Chi-Square Test of Model Fit for the Baseline Model

Value                            968.241
Degrees of Freedom                    20
P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)

Value                              0.014

MODEL RESULTS

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

Group GROUP_A

FACTOR   BY
Y1                 0.906      0.076     11.948      0.000
Y2                 1.082      0.080     13.498      0.000
Y3                 1.016      0.080     12.652      0.000
Y4                 1.109      0.078     14.130      0.000
Y5                 0.942      0.077     12.304      0.000

Means
FACTOR             0.000      0.000    999.000    999.000

Intercepts
Y1                -0.316      0.079     -4.009      0.000
Y2                -0.814      0.086     -9.504      0.000
Y3                 0.680      0.084      8.068      0.000
Y4                 0.847      0.085     10.001      0.000
Y5                 0.052      0.080      0.655      0.513

Variances
FACTOR             1.000      0.000    999.000    999.000

Residual Variances
Y1                 1.039      0.101     10.288      0.000
Y2                 1.029      0.109      9.410      0.000
Y3                 1.099      0.111      9.856      0.000
Y4                 0.921      0.104      8.872      0.000
Y5                 1.030      0.102     10.079      0.000

Group GROUP_B

FACTOR   BY
Y1                 0.956      0.076     12.589      0.000
Y2                 0.932      0.076     12.304      0.000
Y3                 0.958      0.081     11.806      0.000
Y4                 0.959      0.074     12.972      0.000
Y5                 0.932      0.076     12.316      0.000

Means
FACTOR             0.000      0.000    999.000    999.000

Intercepts
Y1                -0.112      0.079     -1.418      0.156
Y2                -0.527      0.078     -6.732      0.000
Y3                 0.926      0.083     11.131      0.000
Y4                 1.299      0.077     16.786      0.000
Y5                 0.309      0.078      3.947      0.000

Variances
FACTOR             1.000      0.000    999.000    999.000

Residual Variances
Y1                 0.958      0.100      9.603      0.000
Y2                 0.973      0.100      9.750      0.000
Y3                 1.161      0.115     10.063      0.000
Y4                 0.876      0.094      9.310      0.000
Y5                 0.975      0.100      9.803      0.000


According to all fit criteria (Chi-Square Test of Model Fit, RMSEA, CFI, …) the fit is excellent. Looking a little closer we see that the factor has the same indicators in both groups but that their parameters (loadings, intercepts, residual variances) differ slightly.

Next, let’s fix the loadings to achieve weak invariance (only the model part changes, so I spare you the rest for brevity):

model:

factor by
y1* (l1)
y2  (l2)
y3  (l3)
y4  (l4)
y5  (l5)
;

!!estimate intercepts
!free all intercepts [not necessary, because default for first group]

[y1*];
[y2*];
[y3*];
[y4*];
[y5*];

!!latent standardization

!fix variance to 1
factor@1;

!fix mean to 0
[factor@0];

model group_b:

factor by
y1* (l1)
y2  (l2)
y3  (l3)
y4  (l4)
y5  (l5)
;

!!estimate intercepts
!free all intercepts

[y1*];
[y2*];
[y3*];
[y4*];
[y5*];

!!latent standardization

!free variance <- that's bugging me
factor*;

!fix mean to 0
[factor@0];

By placing the same label in parentheses behind corresponding parameters we force them to be equal across groups. Further, we have to free the variance of the factor in group B. Otherwise, we would not only test invariance of the loadings, but equivalence of the factor variance, too. And this is what is bugging me: this model should be nested within the previous model, so additional restrictions are fine, but new free parameters are not (Note: freeing the variance in the configural invariance model is not an option due to identification issues).

Anyways, running the model gives us (abbr.):

THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL FIT INFORMATION

Number of Free Parameters                       26

Loglikelihood

H0 Value                       -4791.119
H1 Value                       -4784.920

Information Criteria

Akaike (AIC)                    9634.238
Bayesian (BIC)                  9748.558
(n* = (n + 2) / 24)

Chi-Square Test of Model Fit

Value                             12.398
Degrees of Freedom                    14
P-Value                           0.5743

Chi-Square Contributions From Each Group

GROUP_A                            6.190
GROUP_B                            6.208

RMSEA (Root Mean Square Error Of Approximation)

Estimate                           0.000
90 Percent C.I.                    0.000  0.050
Probability RMSEA <= .05           0.950

CFI/TLI

CFI                                1.000
TLI                                1.002

Chi-Square Test of Model Fit for the Baseline Model

Value                            968.241
Degrees of Freedom                    20
P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)

Value                              0.023

MODEL RESULTS

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

Group GROUP_A

FACTOR   BY
Y1                 0.960      0.064     14.969      0.000
Y2                 1.039      0.068     15.251      0.000
Y3                 1.023      0.068     14.997      0.000
Y4                 1.069      0.068     15.806      0.000
Y5                 0.969      0.065     15.004      0.000

Means
FACTOR             0.000      0.000    999.000    999.000

Intercepts
Y1                -0.316      0.080     -3.923      0.000
Y2                -0.814      0.084     -9.646      0.000
Y3                 0.680      0.084      8.051      0.000
Y4                 0.847      0.083     10.148      0.000
Y5                 0.052      0.081      0.648      0.517

Variances
FACTOR             1.000      0.000    999.000    999.000

Residual Variances
Y1                 1.019      0.099     10.243      0.000
Y2                 1.055      0.108      9.801      0.000
Y3                 1.092      0.109     10.009      0.000
Y4                 0.945      0.101      9.340      0.000
Y5                 1.016      0.100     10.144      0.000

Group GROUP_B

FACTOR   BY
Y1                 0.960      0.064     14.969      0.000
Y2                 1.039      0.068     15.251      0.000
Y3                 1.023      0.068     14.997      0.000
Y4                 1.069      0.068     15.806      0.000
Y5                 0.969      0.065     15.004      0.000

Means
FACTOR             0.000      0.000    999.000    999.000

Intercepts
Y1                -0.112      0.077     -1.445      0.149
Y2                -0.527      0.080     -6.623      0.000
Y3                 0.927      0.083     11.160      0.000
Y4                 1.299      0.079     16.533      0.000
Y5                 0.310      0.078      3.993      0.000

Variances
FACTOR             0.870      0.122      7.123      0.000

Residual Variances
Y1                 0.990      0.098     10.109      0.000
Y2                 0.960      0.098      9.820      0.000
Y3                 1.158      0.113     10.286      0.000
Y4                 0.857      0.091      9.373      0.000
Y5                 0.989      0.097     10.143      0.000

Now, the loadings are equal across groups and the fit is still excellent. Further, we see that the factor variance in group B now differs from one (in the previous model it was 1 in both groups, but that comparison lacked a foundation and therefore was not sensible). To test whether weak invariance holds true we can test the χ² difference of both models for significance.

This can easily be done in R:

chi2_configural <- 9.073
df_configural <- 10

chi2_weak <- 12.398
df_weak <- 14

pchisq(chi2_weak - chi2_configural, df_weak - df_configural, lower.tail = F)
#> [1] 0.504981

The χ² difference test is not significant (p > .05), hence the weak invariance model does not fit significantly worse than the configural model.
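
If you want to double-check such p-values without R: for even degrees of freedom the χ² survival function has a closed form, so a few lines of Python suffice (a sketch; the numeric values are taken from the output above):

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for a chi-square distribution with even df, via the
    closed form exp(-x/2) * sum_{i < df/2} (x/2)**i / i!."""
    assert df > 0 and df % 2 == 0, "closed form only holds for even df"
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))

# weak vs. configural: chi2 = 12.398 (df 14) vs. 9.073 (df 10)
p = chi2_sf_even_df(12.398 - 9.073, 14 - 10)
print(round(p, 6))  # ~0.504981, matching R's pchisq() result
```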

Let’s move on to strong invariance (again, only the model part changes):

model:

factor by
y1* (l1)
y2  (l2)
y3  (l3)
y4  (l4)
y5  (l5)
;

!!estimate intercepts
!fix intercepts across groups

[y1*]   (t1);
[y2*]   (t2);
[y3*]   (t3);
[y4*]   (t4);
[y5*]   (t5);

!!latent standardization

!fix variance to 1
factor@1;

!fix mean to 0
[factor@0];

model group_b:

factor by
y1* (l1)
y2  (l2)
y3  (l3)
y4  (l4)
y5  (l5)
;

!!estimate intercepts
!fix intercepts across groups

[y1*]   (t1);
[y2*]   (t2);
[y3*]   (t3);
[y4*]   (t4);
[y5*]   (t5);

!!latent standardization

!free variance
factor*;

!free mean <- that's bugging me
[factor*];

Now, the intercepts are constrained to be equal across groups. Again, this model is not nested within the previous one, because this time we had to free the mean in group B to avoid confounding our measurement invariance hypothesis of equal intercepts with the structural hypothesis of equal latent means across groups.

Running the model gives us (abbr.):

MODEL FIT INFORMATION

Number of Free Parameters                       22

Loglikelihood

H0 Value                       -4793.372
H1 Value                       -4784.920

Information Criteria

Akaike (AIC)                    9630.745
Bayesian (BIC)                  9727.477
(n* = (n + 2) / 24)

Chi-Square Test of Model Fit

Value                             16.905
Degrees of Freedom                    18
P-Value                           0.5296

Chi-Square Contributions From Each Group

GROUP_A                            8.067
GROUP_B                            8.839

RMSEA (Root Mean Square Error Of Approximation)

Estimate                           0.000
90 Percent C.I.                    0.000  0.048
Probability RMSEA <= .05           0.959

CFI/TLI

CFI                                1.000
TLI                                1.001

Chi-Square Test of Model Fit for the Baseline Model

Value                            968.241
Degrees of Freedom                    20
P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)

Value                              0.028

MODEL RESULTS

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

Group GROUP_A

FACTOR   BY
Y1                 0.953      0.064     14.979      0.000
Y2                 1.038      0.068     15.335      0.000
Y3                 1.018      0.068     15.039      0.000
Y4                 1.083      0.068     16.009      0.000
Y5                 0.967      0.064     15.078      0.000

Means
FACTOR             0.000      0.000    999.000    999.000

Intercepts
Y1                -0.354      0.071     -4.976      0.000
Y2                -0.823      0.075    -10.904      0.000
Y3                 0.655      0.075      8.674      0.000
Y4                 0.917      0.077     11.927      0.000
Y5                 0.039      0.072      0.545      0.586

Variances
FACTOR             1.000      0.000    999.000    999.000

Residual Variances
Y1                 1.023      0.100     10.278      0.000
Y2                 1.055      0.108      9.815      0.000
Y3                 1.096      0.109     10.036      0.000
Y4                 0.944      0.102      9.237      0.000
Y5                 1.017      0.100     10.157      0.000

Group GROUP_B

FACTOR   BY
Y1                 0.953      0.064     14.979      0.000
Y2                 1.038      0.068     15.335      0.000
Y3                 1.018      0.068     15.039      0.000
Y4                 1.083      0.068     16.009      0.000
Y5                 0.967      0.064     15.078      0.000

Means
FACTOR             0.293      0.088      3.332      0.001

Intercepts
Y1                -0.354      0.071     -4.976      0.000
Y2                -0.823      0.075    -10.904      0.000
Y3                 0.655      0.075      8.674      0.000
Y4                 0.917      0.077     11.927      0.000
Y5                 0.039      0.072      0.545      0.586

Variances
FACTOR             0.869      0.122      7.120      0.000

Residual Variances
Y1                 0.996      0.098     10.151      0.000
Y2                 0.961      0.098      9.827      0.000
Y3                 1.160      0.113     10.309      0.000
Y4                 0.858      0.093      9.268      0.000
Y5                 0.990      0.097     10.158      0.000

Now, loadings and intercepts are equal while factor variance and mean differ between groups. Again, the model is not nested in the previous one, but we use the χ² difference test anyway:

chi2_strong <- 16.905
df_strong <- 18

pchisq(chi2_strong - chi2_weak, df_strong - df_weak, lower.tail = F)
#> [1] 0.3417183

The test result is p > .05, so we assume strong invariance to hold true. Hence, comparisons of the latent means are justifiable. We could do that in the next step by fixing the mean in group B to zero and comparing the resulting model χ² to the current value. However, we will skip that for now and instead take a look at an alternative parametrization that makes me feel more at ease. I will also skip testing strict measurement invariance, because strong invariance is sufficient in most scenarios.
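
Before moving on, a quick sanity check on the degrees of freedom reported above; they follow from simple parameter bookkeeping (a sketch based on the model setups, not on any MPlus-specific API):

```python
# Two groups, five indicators: the observed moments per group are
# 5 means + 15 variances/covariances = 20, i.e. 40 in total.
n_ind = 5
total_moments = 2 * (n_ind + n_ind * (n_ind + 1) // 2)   # 40

# Configural: per group 5 loadings + 5 intercepts + 5 residual variances;
# factor mean and variance are fixed by latent standardization.
free_configural = 2 * 3 * n_ind                          # 30
# Weak: group-B loadings tied to group A (-5), factor variance freed (+1).
free_weak = free_configural - n_ind + 1                  # 26
# Strong: group-B intercepts tied (-5), factor mean freed (+1).
free_strong = free_weak - n_ind + 1                      # 22

dfs = [total_moments - f for f in (free_configural, free_weak, free_strong)]
print(dfs)  # [10, 14, 18], matching the MPlus output
```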

Measurement Invariance with Effects Coding

Effects coding is an alternative scaling approach by Little et al. (2006) that is a little laborious but really shines in the context of measurement invariance testing. The basic idea is to constrain whole parameter classes to have a certain value on average, e.g., 1 for factor loadings and 0 for intercepts.
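
In equations, for p indicators the effects-coding constraints read:

```latex
% Effects coding: constrain averages instead of fixing single parameters
\frac{1}{p} \sum_{j=1}^{p} \lambda_j = 1
\qquad \text{and} \qquad
\frac{1}{p} \sum_{j=1}^{p} \tau_j = 0

% Equivalently, one parameter is expressed through the others:
\lambda_1 = p - \lambda_2 - \dots - \lambda_p,
\qquad
\tau_1 = - \tau_2 - \dots - \tau_p
```

The latent mean and variance are then freely estimated, in a metric defined by the average of the indicators.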

Taking a look at the according model probably makes it more understandable. Let’s start with configural invariance:

model:

factor by
y1* (l1a)
y2  (l2a)
y3  (l3a)
y4  (l4a)
y5  (l5a)
;

!!estimate intercepts

[y1*]   (t1a);
[y2*]   (t2a);
[y3*]   (t3a);
[y4*]   (t4a);
[y5*]   (t5a);

!free mean
[factor*];

model group_b:

factor by
y1* (l1b)
y2  (l2b)
y3  (l3b)
y4  (l4b)
y5  (l5b)
;

!!estimate intercepts

[y1*]   (t1b);
[y2*]   (t2b);
[y3*]   (t3b);
[y4*]   (t4b);
[y5*]   (t5b);

!free mean
[factor*];

model constraint:
0 = l1a + l2a + l3a + l4a - 4;
0 = l1b + l2b + l3b + l4b - 4;

0 = t1a + t2a + t3a + t4a;
0 = t1b + t2b + t3b + t4b;

We see that the model now contains an additional model constraint block. In it we set the average loading to 1 and the average intercept to 0 (here across the indicators y1 to y4). We do this for both groups separately, allowing them to differ.

Running the model gives us (abbr.):

THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL FIT INFORMATION

Number of Free Parameters                       29

Loglikelihood

H0 Value                       -4790.579
H1 Value                       -4784.920

Information Criteria

Akaike (AIC)                    9639.157
Bayesian (BIC)                  9766.668
(n* = (n + 2) / 24)

Chi-Square Test of Model Fit

Value                             11.318
Degrees of Freedom                    11
P-Value                           0.4170

Chi-Square Contributions From Each Group

GROUP_A                            6.835
GROUP_B                            4.484

RMSEA (Root Mean Square Error Of Approximation)

Estimate                           0.010
90 Percent C.I.                    0.000  0.062
Probability RMSEA <= .05           0.868

CFI/TLI

CFI                                1.000
TLI                                0.999

Chi-Square Test of Model Fit for the Baseline Model

Value                            968.241
Degrees of Freedom                    20
P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)

Value                              0.027

MODEL RESULTS

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

Group GROUP_A

FACTOR   BY
Y1                 0.881      0.056     15.643      0.000
Y2                 1.052      0.057     18.500      0.000
Y3                 0.988      0.059     16.855      0.000
Y4                 1.078      0.056     19.293      0.000
Y5                 0.914      0.072     12.784      0.000

Means
FACTOR             0.000      0.000    999.000    999.000

Intercepts
Y1                -0.406      0.051     -7.933      0.000
Y2                -0.918      0.051    -18.041      0.000
Y3                 0.580      0.052     11.157      0.000
Y4                 0.743      0.049     15.115      0.000
Y5                -0.021      0.064     -0.330      0.741

Variances
FACTOR             1.067      0.109      9.823      0.000

Residual Variances
Y1                 1.039      0.101     10.285      0.000
Y2                 1.029      0.109      9.408      0.000
Y3                 1.099      0.112      9.854      0.000
Y4                 0.921      0.104      8.871      0.000
Y5                 1.030      0.102     10.084      0.000

Group GROUP_B

FACTOR   BY
Y1                 1.005      0.061     16.532      0.000
Y2                 0.980      0.061     16.113      0.000
Y3                 1.007      0.064     15.748      0.000
Y4                 1.009      0.060     16.904      0.000
Y5                 0.979      0.078     12.558      0.000

Means
FACTOR             0.397      0.062      6.400      0.000

Intercepts
Y1                -0.510      0.055     -9.310      0.000
Y2                -0.916      0.055    -16.639      0.000
Y3                 0.527      0.058      9.040      0.000
Y4                 0.899      0.053     16.842      0.000
Y5                -0.079      0.071     -1.114      0.265

Variances
FACTOR             0.905      0.095      9.539      0.000

Residual Variances
Y1                 0.958      0.100      9.602      0.000
Y2                 0.973      0.100      9.750      0.000
Y3                 1.161      0.115     10.062      0.000
Y4                 0.876      0.094      9.309      0.000
Y5                 0.976      0.100      9.803      0.000

We see that the model was identified in both groups without fixing any single parameter to a specific value.
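
We can also verify the effects-coding constraints directly against the printed estimates (group A values copied from the output above; y1 to y4 are the indicators entering the constraints):

```python
# Group A estimates from the configural effects-coding model above
loadings_a = [0.881, 1.052, 0.988, 1.078]       # y1-y4
intercepts_a = [-0.406, -0.918, 0.580, 0.743]   # y1-y4

# The model constraint block forces these sums to 4 and 0, respectively;
# the small deviations are rounding in the printed output.
print(round(sum(loadings_a), 3), round(sum(intercepts_a), 3))  # 3.999 -0.001
```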

To test weak invariance we reformulate the model to:

model:

factor by
y1* (l1)
y2  (l2)
y3  (l3)
y4  (l4)
y5  (l5)
;

!!estimate intercepts

[y1*]   (t1a);
[y2*]   (t2a);
[y3*]   (t3a);
[y4*]   (t4a);
[y5*]   (t5a);

!free mean
[factor*];

model group_b:

factor by
y1* (l1)
y2  (l2)
y3  (l3)
y4  (l4)
y5  (l5)
;

!!estimate intercepts

[y1*]   (t1b);
[y2*]   (t2b);
[y3*]   (t3b);
[y4*]   (t4b);
[y5*]   (t5b);

!free mean
[factor*];

model constraint:
0 = l1 + l2 + l3 + l4 - 4;

0 = t1a + t2a + t3a + t4a;
0 = t1b + t2b + t3b + t4b;

In contrast to the configural invariance model, the factor loadings are now constrained to be equal across groups (hence a single loading constraint suffices).

Running the model gives us (abbr.):

THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL FIT INFORMATION

Number of Free Parameters                       25

Loglikelihood

H0 Value                       -4792.249
H1 Value                       -4784.920

Information Criteria

Akaike (AIC)                    9634.498
Bayesian (BIC)                  9744.422
(n* = (n + 2) / 24)

Chi-Square Test of Model Fit

Value                             14.659
Degrees of Freedom                    15
P-Value                           0.4762

Chi-Square Contributions From Each Group

GROUP_A                            8.440
GROUP_B                            6.219

RMSEA (Root Mean Square Error Of Approximation)

Estimate                           0.000
90 Percent C.I.                    0.000  0.053
Probability RMSEA <= .05           0.930

CFI/TLI

CFI                                1.000
TLI                                1.000

Chi-Square Test of Model Fit for the Baseline Model

Value                            968.241
Degrees of Freedom                    20
P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)

Value                              0.033

MODEL RESULTS

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

Group GROUP_A

FACTOR   BY
Y1                 0.939      0.041     22.627      0.000
Y2                 1.016      0.042     24.342      0.000
Y3                 1.000      0.043     23.139      0.000
Y4                 1.045      0.041     25.502      0.000
Y5                 0.946      0.053     17.918      0.000

Means
FACTOR             0.000      0.000    999.000    999.000

Intercepts
Y1                -0.410      0.051     -8.087      0.000
Y2                -0.915      0.051    -17.854      0.000
Y3                 0.579      0.052     11.175      0.000
Y4                 0.745      0.049     15.068      0.000
Y5                -0.023      0.063     -0.367      0.713

Variances
FACTOR             1.056      0.106      9.943      0.000

Residual Variances
Y1                 1.019      0.100     10.237      0.000
Y2                 1.055      0.108      9.795      0.000
Y3                 1.092      0.109     10.005      0.000
Y4                 0.945      0.101      9.335      0.000
Y5                 1.016      0.100     10.146      0.000

Group GROUP_B

FACTOR   BY
Y1                 0.939      0.041     22.627      0.000
Y2                 1.016      0.042     24.342      0.000
Y3                 1.000      0.043     23.139      0.000
Y4                 1.045      0.041     25.502      0.000
Y5                 0.946      0.053     17.918      0.000

Means
FACTOR             0.397      0.062      6.385      0.000

Intercepts
Y1                -0.484      0.052     -9.232      0.000
Y2                -0.930      0.052    -17.900      0.000
Y3                 0.530      0.055      9.595      0.000
Y4                 0.885      0.050     17.581      0.000
Y5                -0.066      0.067     -0.983      0.326

Variances
FACTOR             0.911      0.093      9.798      0.000

Residual Variances
Y1                 0.990      0.098     10.108      0.000
Y2                 0.960      0.098      9.818      0.000
Y3                 1.158      0.113     10.284      0.000
Y4                 0.857      0.091      9.372      0.000
Y5                 0.989      0.097     10.146      0.000

Again, the χ² difference test in R

chi2_configural <- 11.318
df_configural <- 11

chi2_weak <- 14.659
df_weak <- 15

pchisq(chi2_weak - chi2_configural, df_weak - df_configural, lower.tail = F)
#> [1] 0.5024625

indicates that weak invariance can be assumed.

Lastly, we test strong invariance with effects coding:

model:

factor by
y1* (l1)
y2  (l2)
y3  (l3)
y4  (l4)
y5  (l5)
;

!!estimate intercepts

[y1*]   (t1);
[y2*]   (t2);
[y3*]   (t3);
[y4*]   (t4);
[y5*]   (t5);

!free mean
[factor*];

model group_b:

factor by
y1* (l1)
y2  (l2)
y3  (l3)
y4  (l4)
y5  (l5)
;

!!estimate intercepts

[y1*]   (t1);
[y2*]   (t2);
[y3*]   (t3);
[y4*]   (t4);
[y5*]   (t5);

!free mean
[factor*];

model constraint:
0 = l1 + l2 + l3 + l4 - 4;

0 = t1 + t2 + t3 + t4;

Now, the intercepts are also constrained to be equal across groups.

Running the model gives us (abbr.):

THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL FIT INFORMATION

Number of Free Parameters                       21

Loglikelihood

H0 Value                       -4794.508
H1 Value                       -4784.920

Information Criteria

Akaike (AIC)                    9631.016
Bayesian (BIC)                  9723.351
(n* = (n + 2) / 24)

Chi-Square Test of Model Fit

Value                             19.177
Degrees of Freedom                    19
P-Value                           0.4456

Chi-Square Contributions From Each Group

GROUP_A                           10.266
GROUP_B                            8.910

RMSEA (Root Mean Square Error Of Approximation)

Estimate                           0.006
90 Percent C.I.                    0.000  0.051
Probability RMSEA <= .05           0.945

CFI/TLI

CFI                                1.000
TLI                                1.000

Chi-Square Test of Model Fit for the Baseline Model

Value                            968.241
Degrees of Freedom                    20
P-Value                           0.0000

SRMR (Standardized Root Mean Square Residual)

Value                              0.037

MODEL RESULTS

Two-Tailed
Estimate       S.E.  Est./S.E.    P-Value

Group GROUP_A

FACTOR   BY
Y1                 0.931      0.041     22.817      0.000
Y2                 1.015      0.041     24.734      0.000
Y3                 0.995      0.042     23.419      0.000
Y4                 1.059      0.040     26.198      0.000
Y5                 0.942      0.052     18.169      0.000

Means
FACTOR             0.000      0.000    999.000    999.000

Intercepts
Y1                -0.446      0.037    -12.067      0.000
Y2                -0.923      0.037    -25.048      0.000
Y3                 0.556      0.038     14.517      0.000
Y4                 0.814      0.036     22.730      0.000
Y5                -0.044      0.047     -0.953      0.341

Variances
FACTOR             1.058      0.106      9.965      0.000

Residual Variances
Y1                 1.025      0.100     10.281      0.000
Y2                 1.056      0.107      9.822      0.000
Y3                 1.097      0.109     10.043      0.000
Y4                 0.939      0.102      9.241      0.000
Y5                 1.018      0.100     10.172      0.000

Group GROUP_B

FACTOR   BY
Y1                 0.931      0.041     22.817      0.000
Y2                 1.015      0.041     24.734      0.000
Y3                 0.995      0.042     23.419      0.000
Y4                 1.059      0.040     26.198      0.000
Y5                 0.942      0.052     18.169      0.000

Means
FACTOR             0.397      0.062      6.453      0.000

Intercepts
Y1                -0.446      0.037    -12.067      0.000
Y2                -0.923      0.037    -25.048      0.000
Y3                 0.556      0.038     14.517      0.000
Y4                 0.814      0.036     22.730      0.000
Y5                -0.044      0.047     -0.953      0.341

Variances
FACTOR             0.910      0.093      9.797      0.000

Residual Variances
Y1                 0.996      0.098     10.154      0.000
Y2                 0.961      0.098      9.825      0.000
Y3                 1.160      0.112     10.311      0.000
Y4                 0.857      0.093      9.258      0.000
Y5                 0.992      0.097     10.173      0.000

Accordingly, the intercepts are now equal across groups. The χ² difference test

# Fit statistics of the strong (scalar) invariance model
chi2_strong <- 19.177
df_strong <- 19

# chi2_weak and df_weak hold the fit of the weak (metric) invariance
# model fitted earlier in this post
pchisq(chi2_strong - chi2_weak, df_strong - df_weak, lower.tail = FALSE)
#> [1] 0.3404185

indicates that we can assume strong measurement invariance. We see that in measurement invariance analysis, scaling via effects coding yields the same results as latent standardization (or scaling via a marker variable). This is not unexpected, because all three scaling methods are statistically equivalent. Nonetheless, each has its merits and demerits, and in the context of measurement invariance analysis I like effects coding the most.
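As a small bonus: once strong invariance holds, the latent mean difference of 0.397 can also be expressed as a standardized effect size. Here is a minimal sketch (in Python, just for illustration) using the estimates from the output above; note that dividing by the pooled latent SD is only one common convention, not something MPlus reports itself:

```python
import math

# Estimates taken from the MODEL RESULTS output above
mean_diff = 0.397            # latent mean of GROUP_B (GROUP_A's mean is fixed to 0)
var_a, var_b = 1.058, 0.910  # latent factor variances in the two groups

# Standardize the mean difference by the pooled latent SD
# (a Cohen's-d-like measure for the latent construct)
pooled_sd = math.sqrt((var_a + var_b) / 2)
d = mean_diff / pooled_sd
print(f"{d:.2f}")  # 0.40
```

So the groups differ by roughly 0.4 latent standard deviations, a difference we may interpret thanks to the established strong invariance.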

Closing Remarks

Whatever scaling method you like best, please consider performing a measurement invariance analysis so that you don’t end up comparing apples with oranges. I hope my post gave you enough practical guidance to do that (in MPlus).

If you want to know more about why measurement invariance analysis is an important topic take a look at Denny Borsboom’s wonderful paper The attack of the psychometricians.

If you need more practical guidance conducting a measurement invariance analysis, then I can recommend Statistical Approaches to Measurement Invariance by Robert Millsap.

If you want to read more about scaling via effects coding, then I can recommend the chapter Representing Contextual Effects in Multiple-Group MACS Models in Modeling Contextual Effects in Longitudinal Studies.

1. One thing that bothers me about measurement invariance analysis using effects coding is the case of few items on multiple factors. Example:

Factor A: Item1, Item2, Item3
Factor B: Item4, Item5

Let’s imagine a case where metric invariance holds across all groups, but full scalar invariance does not. In that case we are potentially interested in partial scalar invariance, i.e., in freeing up the intercept for one or several items in some or all groups. However, due to the effects-coding restrictions 0=t1+t2+t3 and 0=t4+t5, freeing up, for example, t4 or t5 is equivalent to the full scalar invariance model. Actually, in this specific example, only three different partial scalar invariance models are possible to test, namely those where either Item1 OR Item2 OR Item3 is set equal across groups, letting all other intercepts vary.

Any thoughts?

1. Hm... partial invariance should be obtainable by adding constraints only to the parameters assumed to be invariant. E.g., in the model put no (t3) label behind [y3*], and in the constraints use just 0=t1+t2. This would result in invariance of intercepts 1 and 2, while intercept 3 can vary between groups.

Further, please keep in mind that at least two parameters of the same type (e.g., intercepts) need to be invariant to claim partial invariance. One is not sufficient.

2. While you are technically correct, the main argument (for me) for using effects coding in MGCFA is to obtain latent variable means that are on the same scale as the indicators, rather than fixing the latent mean for group 1 and letting the others vary.

If we remove the constraint for Item 3, i.e., set it free to vary such that 0=t1+t2, then the scale of the latent variable becomes somewhat arbitrary, which to me defeats the point of effects coding.

This would not be a direct issue if we had, e.g., 8 items on one factor and freed up two items (e.g., Item 2 and Item 4). Then we would be able to keep the constraint 0=t1+t2a+t3+t4a+t5+t6+t7+t8 for group A and just not 0=t1+t2b+t3+t4b+t5+t6+t7+t8 for group B.

In my first comment I forgot to praise your blog - nice work!

3. Edit:
...keep the constraint with 0=t1+t2a+t3+t4a+t5+t6+t7+t8 for group A and 0=t1+t2b+t3+t4b+t5+t6+t7+t8 for group B.

4. True, that would be an option for two varying intercepts. With just one varying item, averaging the intercepts to 0 in all groups is imho not possible with partial invariance. But you could stick with 0=t1+t2+t3 and remove the (t3) constraint in the varying group (MPlus accepts this even though not all constrained parameters are present in all groups). This way you have an average of 0 in the fully invariant groups and 0 + bias = the difference for this item in the partially invariant groups. The means should still be comparable, provided that the bias is removed. Does this make sense to you?

5. On second thought, I am no longer sure whether your solution for two items really works. The problem is that it assumes t2a+t4a == t2b+t4b. This allows the individual parameters to differ between groups, but it is still a hard constraint. E.g., if both Item 2 and Item 4 are harder in A than in B, then this can't be expressed under this constraint. Even when one item is easier and one is harder in one group, it would still be a great coincidence that the differences exactly cancel each other out.
Imho, constraining all intercepts to average 0 in one or multiple groups and omitting the problematic parameter constraints in the rest (as described in my previous comment) is the better solution for multiple items with DIF, too.

6. Hmm, I guess you are correct. How the removal of the constraint for non-invariant items/groups affects the whole estimation procedure and the interpretation of the latent means is another story that I will have to investigate further. Thanks for the input!
