# Categories Description

Multidimensional analyses are often complemented by unidimensional ones to characterize particular variables.
To characterize a categorical variable and the groups of individuals its categories define, one can use continuous variables, categorical variables, or individual categories.

## Objectives

We are going to use the dataset "tea" and characterize the "age_Q" variable.
"age_Q" is a categorical variable corresponding to age groups. Its categories are "15-24", "25-34", "35-44", "45-59" and "+60".

The main question that arises here is: are these categories specifically linked to other variables or categories of the data set?

Each category of "age_Q" defines a sub-population: the group of individuals who possess that category. The catdes function lets us see whether each sub-population can be characterized by the categorical variables, categories and continuous variables of the data set.

## catdes

First load the package and the data set by typing:

```
library(FactoMineR)
data(tea)
```

Then launch the catdes() function:

```
res = catdes(tea, num.var=23, proba=0.05)
# tea: the data set used
# num.var: the index of the variable to characterize
# proba: the significance threshold used to characterize the categories (0.05 by default)
```
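Hard-coding a column index such as 23 is fragile if the columns are ever reordered. A minimal sketch of looking the index up by name instead, using a small hypothetical data frame (`toy`) that stands in for "tea":

```r
# Hypothetical stand-in for the "tea" data set (not the real data)
toy <- data.frame(
  Tea   = factor(c("black", "green", "black")),
  age   = c(20, 65, 40),
  age_Q = factor(c("15-24", "+60", "35-44"))
)

# Position of the column to characterize, found by name
idx <- which(colnames(toy) == "age_Q")
print(idx)
```

With the real data, the same idea would give `catdes(tea, num.var = which(colnames(tea) == "age_Q"))`.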

### Description by categorical variables

To evaluate the link between the "age_Q" variable and each of the other categorical variables, a chi-square test is performed. The more significant the test is (the smaller its p-value), the more strongly the two variables are linked.
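The idea can be sketched in base R as follows. This is an illustration under stated assumptions, not the code of catdes: the data frame `toy` and the helper `chi2_link` are hypothetical, and `group` plays the role of "age_Q".

```r
# Hypothetical toy data: 'status' depends on 'group', 'drink' does not
toy <- data.frame(
  group  = factor(rep(c("young", "old"), each = 50)),
  status = factor(c(rep("student", 40), rep("worker", 10),
                    rep("student", 5),  rep("worker", 45))),
  drink  = factor(rep(c("green", "black"), times = 50))
)

# Chi-square test of 'target' against every other factor,
# returning the p-values sorted from most to least significant
chi2_link <- function(data, target) {
  others <- setdiff(names(data)[sapply(data, is.factor)], target)
  p <- sapply(others, function(v) {
    chisq.test(table(data[[target]], data[[v]]), correct = FALSE)$p.value
  })
  sort(p)
}

pvals <- chi2_link(toy, "group")
print(pvals)
```

Variables at the top of the sorted vector are the ones most strongly linked to the target variable.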

The results of this test are in: `res$test.chi2`

The categorical variable most strongly linked to "age_Q" is "Socio-Professional Category", followed by "Tea", "sugar", "work" and so on.

### Description by categories

To study the link between one category of "age_Q" and another category of another categorical variable of the data set, the function compares two proportions:

• the proportion of individuals who possess the second category among those who possess the first
• the global percentage of individuals who possess the second category
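One way to compare these two proportions is a hypergeometric test, converted into a signed test value. The sketch below assumes that approach and is not taken from the catdes source code. The function `compare_proportions` and the counts passed to it are hypothetical.

```r
# n_both:   individuals who possess both categories
# n_first:  individuals who possess the first category
# n_second: individuals who possess the second category
# n_total:  size of the whole population
compare_proportions <- function(n_both, n_first, n_second, n_total) {
  prop_in_group <- n_both / n_first     # second category among the first
  prop_global   <- n_second / n_total   # second category overall

  # Hypergeometric tail probabilities: P(X >= n_both) and P(X <= n_both)
  p_upper <- phyper(n_both - 1, n_second, n_total - n_second, n_first,
                    lower.tail = FALSE)
  p_lower <- phyper(n_both, n_second, n_total - n_second, n_first)
  p_value <- min(1, 2 * min(p_upper, p_lower))

  # Signed test value: positive if over-represented, negative if under-represented
  v_test <- sign(prop_in_group - prop_global) *
    qnorm(p_value / 2, lower.tail = FALSE)

  list(prop_in_group = prop_in_group, prop_global = prop_global,
       p_value = p_value, v_test = v_test)
}

over  <- compare_proportions(n_both = 40, n_first = 50, n_second = 45, n_total = 100)
under <- compare_proportions(n_both = 5,  n_first = 50, n_second = 45, n_total = 100)
print(c(over = over$v_test, under = under$v_test))
```

A positive test value means the second category is more frequent in the sub-population than in the whole population, and a negative one means it is less frequent.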

The categories significantly linked to the categories of "age_Q" are in: `res$category`

Let's have a look at two sub-populations: the groups of individuals corresponding to categories "15-24" and "+60".

The category "student" is over-represented (v-test>0) among individuals aged between 15 and 24, whereas "senior" is under-represented (v-test<0).
Conversely, "senior" is over-represented among individuals aged over 60 and "student" is under-represented.

For the sub-population "15-24":

• 84.3% of the individuals who possess "student" belong to "15-24"
• 64.1% of the individuals who possess "15-24" possess "student"
• 23.3% of the whole population possess "student"
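These three percentages are simple ratios of counts. The sketch below uses counts that are an assumption: they were chosen because they reproduce the percentages quoted above, and they were not read from the catdes output. The labels "Cla/Mod", "Mod/Cla" and "Global" follow the column names used in `res$category`.

```r
# Assumed counts (chosen to match the quoted percentages)
n_total   <- 300  # whole population
n_student <- 70   # individuals who possess "student"
n_young   <- 92   # individuals who possess "15-24"
n_both    <- 59   # individuals who possess both

pct <- round(c(
  "Cla/Mod" = 100 * n_both / n_student,    # share of students who are "15-24"
  "Mod/Cla" = 100 * n_both / n_young,      # share of "15-24" who are students
  "Global"  = 100 * n_student / n_total    # share of students overall
), 1)
print(pct)
```

Comparing "Mod/Cla" with "Global" is what shows the over-representation: students are far more frequent among the "15-24" than in the whole population.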

### Description by continuous variables

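For each category of "age_Q" and each continuous variable, a test value (v-test) is calculated. It compares the mean of the variable within the category to its mean over the whole population.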

The results are in: `res$quanti`

Here are the results for categories "15-24" and "+60":

There is only one continuous variable in the data set: the "age" variable.
This variable is significantly linked to both "15-24" and "+60"; individuals aged between 15 and 24 are significantly younger than the whole population and those aged over 60 are significantly older.
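
The test value for a continuous variable can be sketched as a standardized difference between the sub-population mean and the overall mean. The version below is a common textbook form and an assumption about the details (in particular which variance estimate is used), not a copy of the catdes code. The helper `v_test` and the ages are hypothetical.

```r
# Standardized gap between the group mean and the overall mean,
# with a finite-population correction for sampling without replacement
v_test <- function(x, in_group) {
  N   <- length(x)        # size of the whole population
  n_q <- sum(in_group)    # size of the sub-population
  s2  <- var(x)           # overall variance (assumed estimator)
  se  <- sqrt((s2 / n_q) * (N - n_q) / (N - 1))
  (mean(x[in_group]) - mean(x)) / se
}

# Hypothetical ages, not taken from the "tea" data set
age <- c(16, 18, 20, 22, 24, 30, 35, 40, 45, 50, 55, 62, 65, 70, 75)

v_young <- v_test(age, age <= 24)   # plays the role of "15-24"
v_old   <- v_test(age, age >= 60)   # plays the role of "+60"
print(c(young = v_young, old = v_old))
```

A negative value indicates a sub-population mean below the overall mean, a positive value a mean above it. An absolute value above roughly 1.96 corresponds to significance at the 0.05 threshold.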