Convert data.frame columns from factors to characters



Answers

To replace only factors:

i <- sapply(bob, is.factor)
bob[i] <- lapply(bob[i], as.character)

In package dplyr in version 0.5.0 new function mutate_if was introduced:

library(dplyr)
bob %>% mutate_if(is.factor, as.character) -> bob

Package purrr from RStudio gives another alternative:

library(purrr)
library(dplyr)
bob %>% map_if(is.factor, as.character) %>% as_data_frame -> bob

(keep in mind it's fresh package)

Question

I have a data frame. Let's call him bob:

> head(bob)
                 phenotype                         exclusion
GSM399350 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399351 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399352 3- 4- 8- 25- 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399353 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399354 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-
GSM399355 3- 4- 8- 25+ 44+ 11b- 11c- 19- NK1.1- Gr1- TER119-

I'd like to concatenate the rows of this data frame (this will be another question). But look:

> class(bob$phenotype)
[1] "factor"

Bob's columns are factors. So, for example:

> as.character(head(bob))
[1] "c(3, 3, 3, 6, 6, 6)"       "c(3, 3, 3, 3, 3, 3)"      
[3] "c(29, 29, 29, 30, 30, 30)"

I don't begin to understand this, but I guess these are indices into the levels of the factors of the columns (of the court of king caractacus) of bob? Not what I need.

Strangely I can go through the columns of bob by hand, and do

bob$phenotype <- as.character(bob$phenotype)

which works fine. And, after some typing, I can get a data.frame whose columns are characters rather than factors. So my question is: how can I do this automatically? How do I convert a data.frame with factor columns into a data.frame with character columns without having to manually go through each column?

Bonus question: why does the manual approach work?




At the beginning of your data frame include stringsAsFactors = FALSE to ignore all misunderstandings.




Update: Here's an example of something that doesn't work. I thought it would, but I think that the stringsAsFactors option only works on character strings - it leaves the factors alone.

Try this:

bob2 <- data.frame(bob, stringsAsFactors = FALSE)

Generally speaking, whenever you're having problems with factors that should be characters, there's a stringsAsFactors setting somewhere to help you (including a global setting).




I know this answer is a bit late, but if you understand how factors are stored, you can avoid using apply-based functions to accomplish this. Which isn't at all to imply that the apply solutions don't work well.

Factors are structured as numeric indices tied to a list of 'levels'. This can be seen if you convert a factor to numeric. So:

> fact <- as.factor(c("a","b","a","d")
> fact
[1] a b a d
Levels: a b d

> as.numeric(fact)
[1] 1 2 1 3

The numbers returned in the last line correspond to the levels of the factor.

> levels(fact)
[1] "a" "b" "d"

Notice that levels() returns an array of characters. You can use this fact to easily and compactly convert factors to strings or numerics like this:

> fact_character <- levels(fact)[as.numeric(fact)]
> fact_character
[1] "a" "b" "a" "d"

This also works for numeric values, provided you wrap your expression in as.numeric().

> num_fact <- factor(c(1,2,3,6,5,4))
> num_fact
[1] 1 2 3 6 5 4
Levels: 1 2 3 4 5 6
> num_num <- as.numeric(levels(num_fact)[as.numeric(num_fact)])
> num_num
[1] 1 2 3 6 5 4



I typically make this function apart of all my projects. Quick and easy.

unfactorize <- function(df){
  for(i in which(sapply(df, class) == "factor")) df[[i]] = as.character(df[[i]])
  return(df)
}



If you would use data.table package for the operations on data.frame then the problem is not present.

library(data.table)
dt = data.table(col1 = c("a","b","c"), col2 = 1:3)
sapply(dt, class)
#       col1        col2 
#"character"   "integer" 

If you have a factor columns in you dataset already and you want to convert them to character you can do the following.

library(data.table)
dt = data.table(col1 = factor(c("a","b","c")), col2 = 1:3)
sapply(dt, class)
#     col1      col2 
# "factor" "integer" 
upd.cols = sapply(dt, is.factor)
dt[, names(dt)[upd.cols] := lapply(.SD, as.character), .SDcols = upd.cols]
sapply(dt, class)
#       col1        col2 
#"character"   "integer" 



Links



Tags

r r   dataframe