Continuous variables description

The condes() function characterizes a continuous variable using other continuous variables, categorical variables, and the categories of those categorical variables.

Objectives

We are going to use the data set "wine" and characterize the "Overall quality" variable.

Which continuous variables, categorical variables, and categories best describe Overall quality?

condes

First load the package and the data set by typing:

library(FactoMineR)
data(wine)

Then launch the condes() function:

res = condes(wine, num.var=30, proba=0.05)
#wine: the data set used
#num.var: the index of the variable to characterize
#proba: the significance threshold used to characterize the variable (0.05 by default)
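Before reading the output, it can help to confirm that index 30 points at the intended column and to look at the result tables one at a time. The sketch below assumes that the returned list exposes the components quanti, quali and category, which correspond to the three descriptions discussed in the following sections; the variable name target is only illustrative.

```r
library(FactoMineR)
data(wine)

# Check which column sits at index 30 before characterizing it.
target <- colnames(wine)[30]
print(target)

res <- condes(wine, num.var = 30, proba = 0.05)

# Assumed component names of the result list:
print(res$quanti)    # description by continuous variables
print(res$quali)     # description by categorical variables
print(res$category)  # description by categories
```

Passing the column index found with which(colnames(wine) == "Overall.quality") instead of a hard-coded 30 is a reasonable alternative if the column order of the data set might change.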

Description by continuous variables

The correlation coefficient between each continuous variable and the Overall quality variable is calculated. Then, the correlation coefficients significantly different from zero are sorted and returned.
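As a rough illustration of this step, the following sketch recomputes the correlations by hand with base R's cor.test(). The helper describe_by_correlation is hypothetical and is not FactoMineR's internal code; the ordering (most positive first) and the strict comparison with the threshold are assumptions made for the example.

```r
library(FactoMineR)
data(wine)

# Hypothetical helper (not part of FactoMineR): correlate every numeric
# column with the target and keep the significant correlations.
describe_by_correlation <- function(data, target, proba = 0.05) {
  numeric_cols <- names(data)[sapply(data, is.numeric)]
  predictors <- setdiff(numeric_cols, target)
  rows <- lapply(predictors, function(v) {
    # Pearson test of the null hypothesis "correlation equals zero"
    test <- cor.test(data[[v]], data[[target]])
    data.frame(variable = v,
               correlation = unname(test$estimate),
               p.value = test$p.value)
  })
  out <- do.call(rbind, rows)
  out <- out[out$p.value < proba, ]
  # Assumed ordering: from the most positive to the most negative correlation
  out[order(out$correlation, decreasing = TRUE), ]
}

corr_table <- describe_by_correlation(wine, "Overall.quality")
print(corr_table)
```

Comparing corr_table with res$quanti is a simple way to check one's understanding of the output.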

[Figure: description by continuous variables]

Overall quality is best described by Balance, then Smooth, then Harmony, and so on. Wines with high scores for these variables tend to have high scores for Overall quality too.

Plante is significantly and negatively correlated with Overall quality: the more a wine smells of plants after shaking, the less it pleases the assessors.

Description by categorical variables and categories

A one-way ANOVA model is fitted for each categorical variable, with Overall quality as the response and the categorical variable as the single factor.
An F-test is used to see whether the variable has an influence on Overall quality, and t-tests are performed category by category (under the constraint that the category effects alpha_i sum to zero).
The variables and the categories are sorted by p-value and only the significant ones are kept.
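The model described above can be sketched with base R for the Soil variable. This is an illustration under the stated sum-to-zero constraint, not FactoMineR's own implementation; the object names fit, f_table, coef_table, soil_effects and last_effect are illustrative.

```r
library(FactoMineR)
data(wine)

# One-way ANOVA with sum-to-zero contrasts: each coefficient is the
# deviation of a category from the average of the category means.
fit <- lm(Overall.quality ~ Soil, data = wine,
          contrasts = list(Soil = "contr.sum"))

# Global F-test: does Soil have an influence on Overall.quality?
f_table <- anova(fit)
print(f_table)

# t-tests for the intercept and for all categories except the last one.
coef_table <- summary(fit)$coefficients
print(coef_table)

# With sum-to-zero contrasts the last category is not reported directly:
# its effect is minus the sum of the other effects.
soil_effects <- coef(fit)[-1]
last_effect <- -sum(soil_effects)
print(last_effect)
```

Because lm() only reports the first k - 1 category effects, the last one has to be reconstructed as shown, whereas the condes() output lists the categories it retains directly.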

[Figure: description by categorical variables]
[Figure: description by categories]

Soil is the only significant categorical variable for Overall quality.
Reference has a positive coefficient whereas Env4 has a negative one. This means that wines grown on Reference are more appreciated (higher Overall quality scores) and wines grown on Env4 are less appreciated (lower scores) than the average wine.