R Factors

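Factors are data objects used to categorize data and store it as levels. They can store both strings and integers. They are useful for columns that have a limited number of unique values, such as "Male"/"Female" or TRUE/FALSE, and in data analysis for statistical modeling. A factor is created with the factor() function, which takes a vector as input.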

    # Create a vector as input.
    data <- c("East","West","East","North","North","East","West","West","West","East","North")
    print(data)
    print(is.factor(data))
    # Apply the factor function.
    factor_data <- factor(data)
    print(factor_data)
    print(is.factor(factor_data))


Output:

    [1] "East" "West" "East" "North" "North" "East" "West" "West" "West" "East" "North"
    [1] FALSE
    [1] East West East North North East West West West East North
    Levels: East North West
    [1] TRUE
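To make "stored as levels" concrete, the sketch below inspects the factor built above. It relies only on base R functions (levels(), nlevels(), as.integer(), table()); the variable name codes is introduced here for illustration and does not come from the original example.

```r
# Same input vector as in the example above.
data <- c("East","West","East","North","North","East","West","West","West","East","North")
factor_data <- factor(data)

# The distinct values; by default they are sorted.
print(levels(factor_data))

# How many levels the factor has.
print(nlevels(factor_data))

# Each element is stored as an integer code: its position in levels().
codes <- as.integer(factor_data)
print(codes)

# Number of observations per level.
print(table(factor_data))
```

Converting back with as.character(factor_data) recovers the original strings, which shows that the integer codes and the levels together carry the same information as the input vector.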



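A text column in a data frame can also be stored as a factor. In R 4.0.0 and later, data.frame() does this only when stringsAsFactors = TRUE is set; earlier versions converted text columns by default.

    # Create the vectors for the data frame.
    height <- c(132,151,162,139,166,147,122)
    weight <- c(48,49,66,53,67,52,40)
    gender <- c("male","male","female","female","male","female","male")
    # Create the data frame, converting text columns to factors.
    input_data <- data.frame(height, weight, gender, stringsAsFactors = TRUE)
    print(input_data)
    # Test if the gender column is a factor.
    print(is.factor(input_data$gender))
    # Print the gender column to see the levels.
    print(input_data$gender)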


Output:

      height weight gender
    1    132     48   male
    2    151     49   male
    3    162     66 female
    4    139     53 female
    5    166     67   male
    6    147     52 female
    7    122     40   male
    [1] TRUE
    [1] male male female female male female male
    Levels: female male
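Whether a text column becomes a factor depends on the R version: since R 4.0.0, data.frame() keeps text columns as character vectors unless stringsAsFactors = TRUE is passed. The sketch below shows one alternative, converting the column explicitly after the data frame is built. The names df and mean_height are hypothetical and used only for this illustration.

```r
height <- c(132,151,162,139,166,147,122)
weight <- c(48,49,66,53,67,52,40)
gender <- c("male","male","female","female","male","female","male")

# Keep text columns as character (the default from R 4.0.0 onward).
df <- data.frame(height, weight, gender, stringsAsFactors = FALSE)
print(is.factor(df$gender))

# Convert the column to a factor explicitly.
df$gender <- factor(df$gender)
print(is.factor(df$gender))
print(levels(df$gender))

# A factor column can serve as a grouping variable,
# for example to compute the mean height per level.
mean_height <- tapply(df$height, df$gender, mean)
print(mean_height)
```

Converting explicitly keeps the choice visible in the code and behaves the same way across R versions.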



The order of the levels can be changed by calling factor() again with the levels argument listing the levels in the required order.

    data <- c("East","West","East","North","North","East","West","West","West","East","North")
    # Create the factor.
    factor_data <- factor(data)
    print(factor_data)
    # Apply the factor function again with the required order of the levels.
    new_order_data <- factor(factor_data, levels = c("East","West","North"))
    print(new_order_data)


Output:

    [1] East West East North North East West West West East North
    Levels: East North West
    [1] East West East North North East West West West East North
    Levels: East West North
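The printed values look identical before and after reordering, so it may help to see what actually changes. The sketch below compares the underlying integer codes and the order used by table(). The names default_order, custom_order and partial are hypothetical, chosen for this illustration.

```r
data <- c("East","West","East","North","North","East","West","West","West","East","North")

default_order <- factor(data)
custom_order  <- factor(data, levels = c("East","West","North"))

# The values themselves are unchanged by reordering the levels.
print(all(as.character(default_order) == as.character(custom_order)))

# The integer codes follow the level order, so they differ.
print(as.integer(default_order)[1:4])
print(as.integer(custom_order)[1:4])

# table() reports counts in level order.
print(names(table(custom_order)))

# Caution: a value that is missing from levels becomes NA.
partial <- factor(data, levels = c("East","West"))
print(sum(is.na(partial)))
```

The last step is worth checking whenever levels are typed by hand: a misspelled or omitted level silently turns the matching observations into NA.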


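Factor levels can be generated with the gl() function. It takes two integers as input: the number of levels and the number of times each level is repeated.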


    gl(n, k, labels)

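The parameters are described below:

  • n is an integer giving the number of levels.

  • k is an integer giving the number of replications.

  • labels is a vector of labels for the resulting factor levels.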


    v <- gl(3, 4, labels = c("Tampa", "Seattle", "Boston"))
    print(v)


Output:

     [1] Tampa   Tampa   Tampa   Tampa   Seattle Seattle Seattle Seattle Boston
    [10] Boston  Boston  Boston
    Levels: Tampa Seattle Boston
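One way to read gl(n, k, labels) is as shorthand for repeating the integers 1 to n, each k times, and labelling the result. The sketch below checks that reading against factor() and rep(). It also uses the length argument of gl(), which the signature above does not mention. The names w and alt and the labels "a" and "b" are hypothetical.

```r
v <- gl(3, 4, labels = c("Tampa", "Seattle", "Boston"))

# A construction with the same values and levels, using rep() and factor().
w <- factor(rep(1:3, each = 4), labels = c("Tampa", "Seattle", "Boston"))
print(all(as.character(v) == as.character(w)))
print(identical(levels(v), levels(w)))

# With k = 1 and an explicit length, the levels alternate
# until the requested length is reached.
alt <- gl(2, 1, length = 6, labels = c("a", "b"))
print(alt)
```

gl() is convenient for balanced designs where every level appears the same number of times. For irregular data, building the vector first and calling factor() on it is usually simpler.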