I have the following data frame in R:

      c1 c2
    1 10  a
    2 20  a
    3 30  b
    4 40  b
I split it as follows:

    z = lapply(split(test$c1, test$c2), function(x) {cut(x,2)})

z is then:

    $a
    [1] (9.99,15] (15,20]
    Levels: (9.99,15] (15,20]

    $b
    [1] (30,35] (35,40]
    Levels: (30,35] (35,40]
I would then like to merge the factors back together by unsplitting the list:

    unsplit(z, test$c2)

This generates a warning:

    [1] (9.99,15] (15,20]   <NA>      <NA>
    Levels: (9.99,15] (15,20]
    Warning message:
    In `[<-.factor`(`*tmp*`, i, value = 1:2) :
      invalid factor level, NAs generated
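For context, the warning appears to come from how factors handle assignment: unsplit() builds its result from the first list element, so the result only knows the levels of z$a, and values carrying other labels become NA. A minimal sketch of the same behaviour on a toy factor (the labels "low", "high" and "other" are made up for illustration):

```r
# Toy factor with two known levels and one empty slot
x <- factor(c("low", "high", NA), levels = c("low", "high"))

# Assigning a label that is not among the levels warns and leaves NA behind
x[3] <- "other"
na_after_bad_assign <- is.na(x[3])

# Once the level exists, the same assignment succeeds
levels(x) <- c(levels(x), "other")
x[3] <- "other"
```

This is why giving the first element the full set of levels makes the problem go away.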
If I take the union of the factor levels first, the unsplit warning does not occur:

    z$a = factor(z$a, levels=c(levels(z$a), levels(z$b)))
    unsplit(z, test$c2)
    [1] (9.99,15] (15,20]   (30,35]   (35,40]
    Levels: (9.99,15] (15,20] (30,35] (35,40]
In my real data frame I have a big list, so I need to iterate over all of the list elements (not just two). What is the best way to do this?

If I understood your question properly, I think you are making this a bit more complicated than needed. Here's one solution using plyr. We group by the c2 variable:

    require(plyr)
    ddply(test, "c2", transform, newvar = cut(c1, 2))
which returns:

      c1 c2    newvar
    1 10  a (9.99,15]
    2 20  a   (15,20]
    3 30  b   (30,35]
    4 40  b   (35,40]
and has the structure:

    'data.frame':   4 obs. of  3 variables:
     $ c1    : num  10 20 30 40
     $ c2    : Factor w/ 2 levels "a","b": 1 1 2 2
     $ newvar: Factor w/ 4 levels "(9.99,15]","(15,20]",..: 1 2 3 4
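
If you would rather stay in base R and keep the split/unsplit approach from the question, the "union of levels" idea can be applied to every list element at once instead of only to z$a. This is only a sketch under the assumption that the data looks like the example above; the names all_levels and z_common are made up for illustration:

```r
# Example data from the question
test <- data.frame(c1 = c(10, 20, 30, 40),
                   c2 = factor(c("a", "a", "b", "b")))

# Cut c1 into two bins within each group of c2
z <- lapply(split(test$c1, test$c2), function(x) cut(x, 2))

# Union of the levels across every list element, in order of first appearance
all_levels <- unique(unlist(lapply(z, levels)))

# Rebuild each factor on the common level set, then unsplit as before
z_common <- lapply(z, function(f) factor(f, levels = all_levels))
test$newvar <- unsplit(z_common, test$c2)
```

Because every element now shares the same levels, unsplit() has nothing to turn into NA, and this works for any number of groups. Note that the levels end up ordered by group, which may or may not be the ordering you want.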