F.A.Q

Here you will find answers to various questions about R and FactoMineR, followed by more specific ones about graphical options.

Various questions

How do I install the R software for the first time?

Click here to see an animated tutorial.

How do I install the FactoMineR Rcmdr plug-in with Rcmdr?

Install the RcmdrPlugin.FactoMineR package to add the FactoMineR GUI to Rcmdr:

  • install the FactoMineR package (from CRAN or from the FactoMineR website)
  • install the Rcmdr package (from CRAN)
  • install the RcmdrPlugin.FactoMineR package (from CRAN or from the FactoMineR website)
  • open an R session, then type: library(FactoMineR)
  • open an Rcmdr session: library(Rcmdr)
  • click on Tools -> Load Rcmdr plug-in(s) and choose RcmdrPlugin.FactoMineR
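
The installation steps above can also be sketched from the R console. This is a minimal sketch that assumes an internet connection and a configured CRAN mirror; the names pkgs and to_install are only illustrative.

```r
# Packages needed for the FactoMineR GUI in Rcmdr
pkgs <- c("FactoMineR", "Rcmdr", "RcmdrPlugin.FactoMineR")

# Keep only the packages that are not installed yet
to_install <- setdiff(pkgs, rownames(installed.packages()))

# Install from CRAN only in an interactive session (a CRAN mirror is assumed)
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}

# Then, in an interactive session:
# library(FactoMineR)
# library(Rcmdr)   # opens the Rcmdr window
```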

How are missing values taken into account?

By default, FactoMineR replaces missing values with the mean of each variable. This is a crude way to handle missing values, especially when the dataset contains many of them. We have implemented the missMDA package to handle missing values in PCA, CA, MCA, FAMD and MFA.
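
To see why mean imputation is crude, here is a small base-R sketch on toy data (the iris measurements with some values removed, chosen only for illustration). The last lines call imputePCA() from missMDA if that package is installed; ncp = 2 is an arbitrary choice, not a recommendation.

```r
# Toy data: the four numeric columns of iris with a few values removed
X <- iris[, 1:4]
set.seed(1)
X[sample(nrow(X), 20), "Sepal.Length"] <- NA

# Mean imputation: each missing value is replaced by its column mean
X_mean <- X
for (j in seq_along(X_mean)) {
  miss <- is.na(X_mean[[j]])
  X_mean[[j]][miss] <- mean(X_mean[[j]], na.rm = TRUE)
}

# The imputed column has a smaller variance than the observed values,
# which is one reason mean imputation distorts the analysis
var_observed <- var(X$Sepal.Length, na.rm = TRUE)
var_imputed  <- var(X_mean$Sepal.Length)

# Alternative with missMDA, if it is installed (ncp = 2 is arbitrary here)
if (requireNamespace("missMDA", quietly = TRUE)) {
  X_pca <- missMDA::imputePCA(X, ncp = 2)$completeObs
}
```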

How does PCA behave in high dimension?

For the moment, FactoMineR is not an efficient tool for very high-dimensional datasets. The graphical representations are not designed to cope with such datasets. However, it will soon be possible to retrieve only a few scores and loadings from big datasets, as a preprocessing step for big data.

What is a supplementary variable?

A supplementary variable is a variable that is not taken into account when the factorial axes are constructed, i.e. when the distances between individuals are calculated.
Whatever the method, only active variables are used to construct the factorial plane.
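
The idea can be sketched in base R with prcomp(), used here as a stand-in rather than FactoMineR itself. The split into active and supplementary columns is arbitrary; in FactoMineR the same role is played by arguments such as quanti.sup and quali.sup.

```r
# Active variables: the first three numeric columns of iris
# Supplementary variable: the fourth column (arbitrary choice for illustration)
active <- iris[, 1:3]
supp   <- iris[, 4]

# The axes are built from the active variables only
pca <- prcomp(active, scale. = TRUE)

# The supplementary variable is positioned afterwards, through its
# correlation with the principal components; it has no effect on the axes
supp_coord <- cor(supp, pca$x)
supp_coord
```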

Where do I find scores and loadings in res.pca?

Scores (i.e. principal coordinates) are in res.pca$ind$coord. The variance of the individuals' coordinates on a dimension equals the eigenvalue of that dimension.

Loadings (i.e. standard coordinates) are not returned by FactoMineR's methods, which return principal coordinates.
You can calculate them by dividing the variables' coordinates on each dimension by the square root of that dimension's eigenvalue.
Just type:

sweep(res.pca$var$coord,2,sqrt(res.pca$eig[1:ncol(res.pca$var$coord),1]),FUN="/")
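
The relationship between scores, variable coordinates, loadings and eigenvalues can be checked with a base-R sketch. It mimics a standardized PCA with uniform weights 1/n through a singular value decomposition; that this reproduces the output of PCA() exactly is an assumption, and all object names are illustrative.

```r
X <- as.matrix(iris[, 1:4])
n <- nrow(X)

# Centre and scale with the population standard deviation (divisor n)
Xc <- sweep(X, 2, colMeans(X))
s  <- sqrt(colMeans(Xc^2))
Z  <- sweep(Xc, 2, s, FUN = "/")

# Singular value decomposition of the weighted data
sv  <- svd(Z / sqrt(n))
eig <- sv$d^2                      # eigenvalues

scores    <- Z %*% sv$v            # principal coordinates of the individuals
var_coord <- sv$v %*% diag(sv$d)   # principal coordinates of the variables

# Loadings: variable coordinates divided by the square root of the eigenvalue
loadings <- sweep(var_coord, 2, sqrt(eig), FUN = "/")

# Variance (divisor n) of the scores on each dimension, to compare with eig
score_var <- colMeans(scores^2)
```

In this sketch each column of loadings has unit norm, which is what distinguishes standard coordinates from principal coordinates.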

What are contributions?

The contribution of a point to the inertia of an axis is the ratio of the inertia of its projection on this axis to the inertia of the whole cloud of points projected on this axis.
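
As an illustration, this ratio can be computed by hand in base R. The sketch uses prcomp() and uniform weights for the individuals, both of which are assumptions made for the example; contributions are expressed in percent.

```r
pca <- prcomp(iris[, 1:4], scale. = TRUE)
scores <- pca$x

# Inertia of each individual's projection on each axis (uniform weights)
proj_inertia <- scores^2

# Contribution (in %): share of the axis inertia due to each individual
contrib <- 100 * sweep(proj_inertia, 2, colSums(proj_inertia), FUN = "/")

# Individuals contributing most to the first axis
top5 <- head(sort(contrib[, 1], decreasing = TRUE), 5)
top5
```

On each axis, the contributions of all individuals sum to 100.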

I deleted some individuals, so some categories that were taken only by those individuals are now empty. But R still keeps these categories in memory with 0 individuals. How do I recode the variables?

Suppose your variable of interest is variable X with three levels: A, B and C. After you delete individuals, B has no individuals left.
To delete level B from R, type the following code:

dataset[,"X"] <- factor(as.character(dataset[,"X"]))
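
Here is a small self-contained sketch of the problem and two ways to fix it. The data frame and its columns are made up for the example; droplevels() is a base-R alternative that cleans every factor of the data frame at once.

```r
# Toy data: X has three levels, and only one individual takes level B
dataset <- data.frame(X = factor(c("A", "B", "C", "A", "C")),
                      y = 1:5)

# Delete the only individual taking level B
dataset <- dataset[dataset$X != "B", ]
table(dataset$X)          # B is still listed, with a count of 0

# Option 1: rebuild the factor from its character values
dataset[, "X"] <- factor(as.character(dataset[, "X"]))

# Option 2 (base R): droplevels() removes unused levels from every factor
dataset <- droplevels(dataset)
levels(dataset$X)
```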

Do I have to standardize the variables when doing a PCA?

If the variables do not have the same units, it is essential to standardize them.
If the variables have the same units, their influence on the computation is weighted according to their standard deviation. Standardizing them gives them all the same importance. With that in mind, whether to standardize or not is your choice.

This package only does PCA based on the correlation matrix. Is it possible to use the covariance matrix instead?

When you perform an unscaled PCA, the covariance matrix is used instead of the correlation matrix. Just set scale.unit=FALSE when calling PCA(...).
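
The link between scaling and the matrix being analysed can be checked with base R. The sketch below uses prcomp() rather than PCA(), so the argument is scale. instead of scale.unit; the dataset is chosen only for illustration.

```r
X <- iris[, 1:4]

# Unscaled PCA: the variances of the components are the
# eigenvalues of the covariance matrix
eig_cov <- eigen(cov(X))$values
pca_unscaled <- prcomp(X, scale. = FALSE)

# Scaled PCA: the variances of the components are the
# eigenvalues of the correlation matrix
eig_cor <- eigen(cor(X))$values
pca_scaled <- prcomp(X, scale. = TRUE)

# Compare the two sets of eigenvalues with the component variances
rbind(covariance = eig_cov, unscaled = pca_unscaled$sdev^2)
rbind(correlation = eig_cor, scaled = pca_scaled$sdev^2)
```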

In the MFA function, what do the different types "c", "n" and "s" mean?

"c" and "s" are for quantitative variables: with "s", variables are scaled to unit variance; with "c", they are only centered.
"n" is for qualitative variables.
By default, all quantitative variables are scaled to unit variance.
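
As a sketch of how the type argument is used, the call below runs an MFA on the wine dataset shipped with FactoMineR, with one qualitative group and five scaled quantitative groups. It only runs if FactoMineR is installed; the group sizes and names assume the column layout of that dataset (2 qualitative columns followed by 29 quantitative ones).

```r
if (requireNamespace("FactoMineR", quietly = TRUE)) {
  library(FactoMineR)
  data(wine)

  res.mfa <- MFA(wine,
                 group = c(2, 5, 3, 10, 9, 2),   # number of columns in each group
                 type  = c("n", rep("s", 5)),    # group 1 qualitative, the others scaled
                 name.group = c("orig", "olf", "vis", "olfag", "gust", "ens"),
                 num.group.sup = c(1, 6),        # groups 1 and 6 are supplementary
                 graph = FALSE)
}
```

Replacing an "s" with "c" in the type vector would keep the corresponding group centered but unscaled.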

Graphical options

How can I add a title to my graph? Can I change the range of the axes in my graph?

All the graphs are plotted with the functions plot.PCA(), plot.MCA(), plot.CA(), ... To change the graphical options, see the help pages of these functions.
For example, to add a title to a PCA graph and change the range of the x-axis, run the PCA with the option graph=FALSE, and then plot the graph with the plot.PCA() function:

res.pca = PCA(mydata, graph=FALSE)
plot(res.pca, main="Title of my graph", xlim=c(-2,3))

How do I gather several graphs into one single plot?

If you are doing a PCA, use the function plot.PCA (otherwise use the corresponding plotting function) with the argument new.plot = FALSE.
For example:

data(decathlon)
res.pca <- PCA(decathlon, quanti.sup = 11:12, quali.sup=13,graph=FALSE)
par(mfrow=c(1,2))
plot(res.pca,choix="ind",new.plot=FALSE)
plot(res.pca,choix="var",new.plot=FALSE)

I have got too many variables to represent and cannot see anything on my graph, how do I represent only the variables which are the best represented?

Use the lim.cos2.var option of the function graph.var(). It allows you to choose the value of the squared cosine below which the variables are not drawn.
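
To make the criterion concrete, here is a base-R sketch of what such a threshold selects. It uses prcomp() on standardized data, where the squared cosine of a variable on an axis is its squared correlation with that axis; the threshold of 0.9 is arbitrary and the object names are illustrative.

```r
pca <- prcomp(iris[, 1:4], scale. = TRUE)

# Coordinates of the variables: correlations with the principal components
var_coord <- cor(iris[, 1:4], pca$x)

# Quality of representation on the first plane:
# sum of the squared cosines on axes 1 and 2
cos2_plane <- rowSums(var_coord[, 1:2]^2)

# Keep only the variables above an arbitrary threshold
threshold <- 0.9
well_represented <- names(cos2_plane)[cos2_plane >= threshold]
well_represented
```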

I would like to represent only supplementary individuals on the graph, how do I remove active ones?

Use the invisible option of plot.PCA() (or plot.MCA(), ...):

plot.PCA(res.pca, choix="ind", invisible="ind")

For more details, see: help(plot.PCA)