Multiple Factor Analysis

Multiple Factor Analysis (MFA) is dedicated to data sets in which the variables are structured into groups. Several sets of variables (continuous or categorical) are thus studied simultaneously.

This specific method is useful in many fields where variables are structured into groups, for example:

  • Genomics: protein variables, DNA variables
  • Sensory analysis: sensory and physico-chemical variables
  • Questionnaires: student health (variables on addictive consumption, psychological condition, sleep, identification...)
  • Comparison of coding (continuous variables, categorical variables)

Taking the structure of the data into account makes it possible to:

  • Balance the influence of each group of variables
  • Study the links between the sets of variables
  • Produce the classical graphs as well as specific ones:
    • Partial representation (individuals seen by one group of variables)
    • Groups of variables

For further information on MFA, see the book:
Pagès J. (2015) Multiple Factor Analysis by Example Using R. Chapman & Hall/CRC.
or the following tutorials:
SFDS 2008 slides about FactoMineR
useR! 2007 slides about FactoMineR

The example illustrated here deals with sensory evaluation of red wines.
Load the data set as a text file by clicking here.

Presentation of the data

The data set is made of 21 rows (wines) and 31 columns.
The first two columns are categorical variables: label (Saumur, Bourgueil or Chinon) and soil (Reference, Env1, Env2 or Env4).
The next 29 columns are continuous sensory variables. For each wine, the value is the mean over all the judges.

To load the package and the data set, type the following lines of code:
library(FactoMineR)
data(wine)
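As a quick check before going further, the sketch below confirms the dimensions and column types described above. It only relies on base R functions applied to the wine data frame; indexing the columns by position is a choice made for this illustration.

```r
library(FactoMineR)
data(wine)

dim(wine)                               # number of rows (wines) and columns
sapply(wine[, 1:2], class)              # the first two columns should be factors
lapply(wine[, 1:2], levels)             # categories of the two categorical variables
all(sapply(wine[, 3:31], is.numeric))   # the remaining 29 columns should be numeric
rownames(wine)                          # wine identifiers used on the graphs
```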

Objectives

We want to characterize the wines. We are looking for a typology of the wines.

The appropriate principal component method to characterize the wines by continuous variables is Principal Component Analysis (PCA).
However, the data set is structured: the variables form different groups:

  • One categorical group (variables label and soil)
  • One group concerning the odor before shaking (variables Odor.Intensity.before.shaking, Aroma.quality.before.shaking, Fruity.before.shaking, Flower.before.shaking and Spice.before.shaking)
  • One group concerning visual evaluation (variables Visual.intensity, Nuance and Surface.feeling)
  • One group concerning the odor after shaking (variables Odor.Intensity, Quality.of.odour, Fruity, Flower, Spice, Plante, Phenolic, Aroma.intensity, Aroma.persistency and Aroma.quality)
  • One group concerning the taste (variables Attack.intensity, Acidity, Astringency, Alcohol, Balance, Smooth, Bitterness, Intensity and Harmony)
  • And one last group concerning an overall judgement (variables Overall.quality and Typical)
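The group structure listed above can be written down as a vector of group sizes and checked against the columns of the data set. This is a minimal sketch: the short group labels (origin, odor, visual, ...) are names chosen for the analysis, not column names of the data set.

```r
library(FactoMineR)
data(wine)

# Group sizes in column order, following the list above.
# The names are labels chosen for this analysis.
group_sizes <- c(origin = 2, odor = 5, visual = 3,
                 odor.after.shaking = 10, taste = 9, overall = 2)

# The sizes must add up to the number of columns of the data set.
stopifnot(sum(group_sizes) == ncol(wine))

# Assign each column to its group and list the columns group by group.
group_of_column <- factor(rep(names(group_sizes), times = group_sizes),
                          levels = names(group_sizes))
columns_by_group <- split(colnames(wine), group_of_column)
columns_by_group
```

Printing columns_by_group is a convenient way to verify that the group boundaries fall where the list says they do before running the analysis.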

The group structure brings new objectives. The first is to compare the groups of variables and highlight a typology of the groups: two sets of variables are close to each other if wines that are close according to the first set are also close according to the second. The second is to compare simultaneously the typologies of the wines as seen by each group of variables taken separately.

MFA

We are going to study the wines' profiles according to the sensory evaluation. The odor, visual, odor after shaking and taste groups are used as active groups; the origin and overall groups are used as supplementary groups.

Type:
res = MFA(wine, group=c(2,5,3,10,9,2), type=c("n",rep("s",5)), ncp=5, name.group=c("origin","odor","visual","odor.after.shaking","taste","overall"), num.group.sup=c(1,6))
#wine: the data set used
#group: a vector indicating the number of variables in each group
#type: the type of the variables in each group. "s" for scaled continuous variables, "c" for centered (unscaled) continuous variables and "n" for categorical variables
#ncp: number of dimensions kept in the result
#name.group: names of the groups
#num.group.sup: indexes of the supplementary groups
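To illustrate how MFA balances the influence of the groups, the sketch below computes the first eigenvalue of a separate scaled PCA of each active group; in MFA every variable of a group is weighted by the inverse of that eigenvalue. The column positions are assumed from the group sizes given in the call, and the final comparison with a weighted PCA is an illustration of the principle rather than a description of the package internals.

```r
library(FactoMineR)
data(wine)

# Column positions of the four active groups (assumed from the group
# sizes 2, 5, 3, 10, 9, 2 used in the MFA call).
active_cols <- list(odor = 3:7, visual = 8:10,
                    odor.after.shaking = 11:20, taste = 21:29)

# First eigenvalue of a separate scaled PCA of each active group.
first_eig <- sapply(active_cols, function(cols) {
  PCA(wine[, cols], scale.unit = TRUE, graph = FALSE)$eig[1, 1]
})

# Weight given to every variable of a group: 1 / first eigenvalue.
group_weight <- 1 / first_eig
round(cbind(first_eig, group_weight), 3)

# Same MFA as in the text, without drawing the graphs.
res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual", "odor.after.shaking",
                          "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Eigenvalues and percentages of variance of the first MFA dimensions.
head(res$eig, 5)

# PCA of the active variables with the group weights as column weights;
# its first eigenvalue is expected to agree with the one of the MFA.
col_w <- unname(rep(group_weight, times = lengths(active_cols)))
pca_w <- PCA(wine[, 3:29], scale.unit = TRUE, col.w = col_w, graph = FALSE)
c(MFA = res$eig[1, 1], weighted.PCA = pca_w$eig[1, 1])
```

With this weighting, no single group can generate the first dimension on its own: the first eigenvalue of the MFA lies between 1 and the number of active groups.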

Figure: Multiple Factor Analysis, scatterplot of variables
Figure: Multiple Factor Analysis, scatterplot of individuals and categories

These first results can be interpreted the same way as the ones of a PCA.

The representation of the variables shows that most of them are highly correlated with the first dimension, whatever group they belong to. This dimension represents "intensity" and "harmony", positive notions which are commonly used when speaking of wines.
The variables most correlated with the second dimension are Spice before shaking and Odor intensity before shaking for the odor group, Spice, Plante and Odor intensity for the odor after shaking group, and Bitterness for the taste group. This dimension represents a spicy, vegetal characteristic essentially due to olfaction.
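The reading of the variables graph can be backed up numerically. The following sketch, which refits the same MFA with graph=FALSE so that it runs on its own, sorts the active variables by the absolute value of their correlation with each of the first two dimensions; the cut-off of eight variables is arbitrary.

```r
library(FactoMineR)
data(wine)

res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual", "odor.after.shaking",
                          "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Correlations between the active continuous variables and dimensions 1-2.
cor_dims <- res$quanti.var$cor[, 1:2]

# Variables most correlated (in absolute value) with dimension 1, then 2.
round(head(cor_dims[order(-abs(cor_dims[, 1])), ], 8), 2)
round(head(cor_dims[order(-abs(cor_dims[, 2])), ], 8), 2)

# Redraw the graph of the variables.
plot(res, choix = "var")
```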

The coordinates of the individuals and categories can be linked to this interpretation of the first two principal components through the second graph.
The wine 1DAM was evaluated as the most "intense" and "harmonious", in contrast to wines 1VAU and 2ING, which are the least "intense" and "harmonious". The second axis is essentially due to wines T1 and T2. As these two wines were in fact the same wine evaluated twice by the assessors, the second dimension will be referred to as the "particular case of the wine T".
Most of the categories are close to the origin of the principal component map, which means that these categories are not related to "intensity", "harmony" or the "wine T". The category Env4 has a high coordinate on the second axis, but only because of T1 and T2. The category Reference, a priori related to an excellent wine-producing soil, has a high coordinate on the first axis and is thus positively associated with "intensity" and "harmony", which confirms the a priori.
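The positions discussed above can also be read from the result object. This sketch refits the same MFA without graphs, then lists the wines ordered along the first dimension, the wines contributing most to the second dimension, and the coordinates of the supplementary categories. Note that the sign of a dimension is arbitrary, so the ordering may appear reversed.

```r
library(FactoMineR)
data(wine)

res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual", "odor.after.shaking",
                          "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Wines ordered by their coordinate on the first dimension.
ind_coord <- res$ind$coord[, 1:2]
round(ind_coord[order(ind_coord[, 1]), ], 2)

# Five largest contributions (in %) of the wines to the second dimension.
round(sort(res$ind$contrib[, 2], decreasing = TRUE)[1:5], 1)

# Coordinates of the categories of the supplementary origin group.
round(res$quali.var.sup$coord[, 1:2], 2)
```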

Figure: Multiple Factor Analysis, scatterplot of partial individuals
Figure: Scatterplot of partial categories

The graph of partial individuals represents each wine as seen by each group, together with its barycenter. By default, only the two wines with the smallest within inertia and the two wines with the largest within inertia are drawn with their partial points. To plot all the partial points, use the following line of code:
plot(res,choix="ind",partial="all")

1DAM was evaluated as particularly "intense" and "harmonious", especially by the odor group: its coordinate on the first axis is more extreme from this group's point of view than from that of the other groups. From the odor group's point of view, 2ING was more "intense" and "harmonious" than 1VAU, but from the taste group's point of view, 1VAU was more "intense" and "harmonious" than 2ING.
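The within inertia mentioned above, which measures how much the groups disagree about a wine, is stored in the result object along with the partial coordinates. A minimal sketch, refitting the same MFA without graphs so that it runs on its own:

```r
library(FactoMineR)
data(wine)

res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual", "odor.after.shaking",
                          "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Within inertia of each wine on dimensions 1-2, sorted on dimension 1:
# small values mean that the groups see the wine in a similar way.
within <- res$ind$within.inertia[, 1:2]
round(within[order(within[, 1]), ], 2)

# Coordinates of the partial points (one row per wine and per group).
round(head(res$ind$coord.partiel[, 1:2], 8), 2)
```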

All the groups have nearly the same view of the categories, except for Env4, which is consistent with what was already observed on the map of the individuals.

Figure: Multiple Factor Analysis, scatterplot of groups

This graph shows the quality of representation of each group.
The four active groups have close coordinates on the first dimension which means that their contribution to the first principal component is quite the same. It also means that the first principal component of the MFA is common to all the groups.
On the second dimension, the two olfactory groups (odor and odor after shaking) have the highest coordinates. These two groups contribute the most to the second principal component.
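The coordinates and contributions of the groups, as well as the RV coefficients measuring the link between pairs of groups, can be extracted from the result object. The sketch below refits the same MFA without graphs so that it is self-contained.

```r
library(FactoMineR)
data(wine)

res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual", "odor.after.shaking",
                          "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Coordinates of the active groups on the first two dimensions.
round(res$group$coord[, 1:2], 2)

# Contributions (in %) of the active groups to the first two dimensions.
round(res$group$contrib[, 1:2], 1)

# RV coefficients between groups.
round(res$group$RV, 2)

# Redraw the graph of the groups.
plot(res, choix = "group")
```

A group coordinate close to 1 on a dimension indicates that this dimension is also a direction of high inertia for the group.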

Figure: Scatterplot of groups' dimensions

This graph shows the link between the principal components of the MFA and those of each group analysed separately.
Except for the origin group, the first dimension of each group is highly correlated to the MFA's first one.
The second dimension of the MFA is essentially correlated to the second dimension of the olfactory groups.
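The correlations displayed on this graph are available in the result object. A minimal sketch, again refitting the same MFA without graphs so that it runs on its own:

```r
library(FactoMineR)
data(wine)

res <- MFA(wine, group = c(2, 5, 3, 10, 9, 2), type = c("n", rep("s", 5)),
           ncp = 5,
           name.group = c("origin", "odor", "visual", "odor.after.shaking",
                          "taste", "overall"),
           num.group.sup = c(1, 6), graph = FALSE)

# Correlations between the dimensions of each separate analysis (rows)
# and the first two dimensions of the MFA (columns).
round(res$partial.axes$cor[, 1:2], 2)

# Redraw the graph of the partial axes.
plot(res, choix = "axes")
```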