Basic data visualization in R

Generate the data to use

Define the data used in the examples that follow.

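set.seed(20)  # fix the seed so the random values are reproducible
# Four groups of points (13, 12, 15 and 10 observations), one colour each
x <- c(runif(13)*3+10, runif(12)*4+8, runif(15)*2+15, runif(10)*3+2)
y <- c(runif(13)*3+5, runif(12)*3+15, runif(15)*10+15, runif(10)*4+6)
color <- c(rep("red",13), rep("green",12), rep("blue",15), rep("orange",10))
size <- round(runif(length(x))*100, 0)
plot(x, y, col=color)
# stringsAsFactors=TRUE stores color as a factor (no longer the default since R 4.0)
df <- data.frame(x, y, color, size, stringsAsFactors=TRUE)
head(df, 10)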
           x         y  color size
1  12.632564  5.211506    red   86
2  12.305600  6.453315    red   95
3  10.836889  5.826603    red   49
4  11.587491  7.858635    red  100
5  12.888721  5.377258    red   58
6  12.941064  6.531751    red   67
7  10.273998  5.020006    red   91
8  10.212248  6.381184    red   74
9  10.982782  5.809310    red   77
10 11.110224  5.066758    red   93

 

Example: Inspect the structure of the data

str(df)
'data.frame':	50 obs. of  4 variables:
 $ x    : num  12.6 12.3 10.8 11.6 12.9 ...
 $ y    : num  5.21 6.45 5.83 7.86 5.38 ...
 $ color: Factor w/ 4 levels "blue","green",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ size : num  86 95 49 100 58 67 91 74 77 93 ...
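
As a complement to str(), base R offers a few other quick inspection functions. The sketch below uses a small stand-in data frame called demo, with made-up values rather than the data generated above, so that it runs on its own.

```r
# Small stand-in data frame (hypothetical values, not the article's data)
demo <- data.frame(
  x = c(12.6, 12.3, 10.8, 3.3),
  y = c(5.2, 6.5, 5.8, 9.5),
  color = factor(c("red", "red", "red", "orange")),
  size = c(86, 95, 49, 90)
)

dim(demo)      # number of rows and number of columns
names(demo)    # column names
summary(demo)  # quartiles for numeric columns, counts per level for factors
```

On the df object from this article, the same three calls would report 50 rows and 4 columns, as str() does.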

 

Example: Show the first N rows of the data frame

head(df, 5)
           x         y  color size
1  12.632564  5.211506    red   86
2  12.305600  6.453315    red   95
3  10.836889  5.826603    red   49
4  11.587491  7.858635    red  100
5  12.888721  5.377258    red   58

 

Example: Show the last N rows of the data frame

tail(df, 5)
          x        y  color size
46 3.307021 9.482327 orange   90
47 3.932822 9.390446 orange   25
48 4.246866 9.029016 orange   88
49 3.213093 7.405948 orange   87
50 3.973746 6.138242 orange    0

 

Example: Count the number of elements in each category

table(df$color)
 blue  green orange    red 
    15     12     10     13 
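
Counts are often easier to compare as proportions. The following sketch rebuilds only the colour labels, with the same group sizes as above (13, 12, 15 and 10), and converts the counts with prop.table(). The variable names color_labels, counts and props are illustrative.

```r
# Same group sizes as the article's color column
color_labels <- factor(c(rep("red", 13), rep("green", 12),
                         rep("blue", 15), rep("orange", 10)))

counts <- table(color_labels)   # absolute frequencies
props <- prop.table(counts)     # relative frequencies (they sum to 1)

round(100 * props, 1)           # percentages per colour
sort(counts, decreasing = TRUE) # most frequent colour first
```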

 

Example: Categorize a numeric variable

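install.packages("agricolae")  # only needed once
library("agricolae")
# Build a frequency table from a histogram of y with about 4 breaks
aux <- table.freq(hist(df$y, plot=FALSE, breaks=4))
# Cut y at the lower bounds of the classes plus the highest upper bound
df$category <- cut(df$y, c(aux$Lower, max(aux$Upper)))
df$category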
[1] (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10] 
[14] (15,20] (15,20] (15,20] (15,20] (15,20] (15,20] (15,20] (15,20] (15,20] (15,20] (15,20] (15,20] (15,20]
[27] (20,25] (15,20] (20,25] (15,20] (15,20] (20,25] (15,20] (20,25] (20,25] (15,20] (15,20] (20,25] (15,20]
[40] (15,20] (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10]  (5,10] 
Levels: (5,10] (10,15] (15,20] (20,25]
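
If installing agricolae is not an option, a similar result can probably be obtained with base R alone, since hist() already stores its class limits in the breaks component. This is a sketch under that assumption, not the method used above. The sample y here is a shorter, hypothetical vector, and include.lowest=TRUE is added so that a value equal to the lowest break is not turned into NA.

```r
set.seed(20)
# Hypothetical sample with two clusters (not the article's y)
y <- c(runif(13) * 3 + 5, runif(12) * 3 + 15)

# hist() chooses "pretty" break points; plot = FALSE skips the drawing
h <- hist(y, plot = FALSE, breaks = 4)

# Reuse those break points to turn y into an interval factor
category <- cut(y, breaks = h$breaks, include.lowest = TRUE)

table(category)  # number of values per interval
```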

 

Example: Create a contingency table

A contingency table shows the number of sample observations for each combination of two categorical variables.
Each cell holds the number of elements that fall in category X of one variable and category Y of the other.
con_table <- table(df$color,df$category)
con_table
         (5,10] (10,15] (15,20] (20,25]
  blue        0       0       9       6
  green       0       0      12       0
  orange     10       0       0       0
  red        13       0       0       0
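
Totals and row-wise proportions can be added to a contingency table with addmargins() and prop.table(). The sketch below uses a tiny hand-made pair of vectors (hypothetical values, chosen only so the example is self-contained) rather than the df built earlier.

```r
# Tiny illustrative data: one colour and one interval label per observation
color <- c("red", "red", "green", "blue", "blue", "orange")
category <- c("(5,10]", "(5,10]", "(15,20]", "(15,20]", "(20,25]", "(5,10]")

con_table <- table(color, category)

addmargins(con_table)              # append row and column totals
prop.table(con_table, margin = 1)  # share of each category within a colour
```

With margin = 1 every row sums to 1. Using margin = 2 would normalise by column instead.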


Author: Diego Calvo