r - Breaking up a character string into multiple character strings on different lines
I have a data frame that contains a long character string, each associated with a 'sample':
    sample  data
    1       000000000000000000000000000n01000000000000n0n000000000n00n0000nn00n0n000000100000n00n0n0000000nnnn011111111111111111111111111111110000000000000000000n000000n0000000000n
    2       000000000000000000000000000n01000000000000n0n000000000n00n0000nn00n0n000000100000n00n0n0000000nnnn011111111111111111111111111111110000000000000000000n000000n0000000000n
I would like to code an easy way to break this string into 5 pieces in the following format:
    sample x
    cct6 - characters 1-33
    gat1 - characters 34-68
    imd3 - characters 69-99
    pdr3 - characters 100-130
    rim15 - characters 131-168
giving an output that looks like this for each sample:
    sample 1
    cct6 - 000000000000000000000000000n01000
    gat1 - 000000000n0n000000000n00n0000nn00n0
    imd3 - n000000100000n00n0n0000000nnnn0
    pdr3 - 1111111111111111111111111111111
    rim15 - 0000000000000000000n000000n0000000000n
I've been able to use the substr function to break the long string into individual pieces, but I'd like to be able to automate it so that I can get all 5 pieces in one output. Ideally the output would also be a data frame.
This is what ?read.fwf is for.
First, some data that looks like that in your question:
    x <- data.frame(sample = c(1, 2),
                    data = c("000000000000000000000000000n01000000000000n0n000000000n00n0000nn00n0n000000100000n00n0n0000000nnnn011111111111111111111111111111110000000000000000000n000000n0000000000n",
                             "000000000000000000000000000n01000000000000n0n000000000n00n0000nn00n0n000000100000n00n0n0000000nnnn011111111111111111111111111111110000000000000000000n000000n0000000000n"),
                    stringsAsFactors = FALSE)
Now use read.fwf, specifying the widths of each field and their names, and that they should be of mode character. We wrap the text column of the example data in textConnection so that it can be treated as a connection understood by read.* and other functions.
    (strs <- read.fwf(textConnection(x$data),
                      widths = c(33, 35, 31, 31, 38),
                      colClasses = "character",
                      col.names = c("cct6", "gat1", "imd3", "pdr3", "rim15")))

                                   cct6                                gat1                            imd3                            pdr3                                  rim15
    1 000000000000000000000000000n01000 000000000n0n000000000n00n0000nn00n0 n000000100000n00n0n0000000nnnn0 1111111111111111111111111111111 0000000000000000000n000000n0000000000n
    2 000000000000000000000000000n01000 000000000n0n000000000n00n0000nn00n0 n000000100000n00n0n0000000nnnn0 1111111111111111111111111111111 0000000000000000000n000000n0000000000n
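The widths passed to read.fwf correspond to the character ranges listed in the question (1-33, 34-68, and so on). As a cross-check, here is a minimal sketch of that mapping using base R's cumsum and substring. The variable names (widths, starts, ends, pieces) and the test string s are made up for illustration; s is not the data from the question.

```r
# Field widths from the read.fwf call above; the names label each piece.
widths <- c(cct6 = 33, gat1 = 35, imd3 = 31, pdr3 = 31, rim15 = 38)

# Derive the character ranges: the ends are running totals of the widths.
ends   <- cumsum(widths)      # 33 68 99 130 168
starts <- ends - widths + 1   # 1 34 69 100 131

# Made-up 168-character string: each field is filled with its own letter,
# so a wrong boundary would be easy to spot.
s <- paste(rep(letters[1:5], times = widths), collapse = "")

# substring() is vectorised over start and end, so one call gives all pieces.
pieces <- substring(s, starts, ends)
names(pieces) <- names(widths)
```

This is essentially the substr approach from the question with the positions computed rather than typed by hand; read.fwf remains the more direct route when a data frame is wanted.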
Now loop over the rows and print out each one as per your example:
    for (i in 1:nrow(strs)) {
        writeLines(paste("sample", i))
        writeLines(paste(names(strs), strs[i, ], sep = " - "))
    }
giving, for example:
    sample 2
    cct6 - 000000000000000000000000000n01000
    gat1 - 000000000n0n000000000n00n0000nn00n0
    imd3 - n000000100000n00n0n0000000nnnn0
    pdr3 - 1111111111111111111111111111111
    rim15 - 0000000000000000000n000000n0000000000n
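Since the question asks for a data frame as the final output, one possible way to keep the sample identifier next to the pieces is to cbind it onto the result of read.fwf. The sketch below uses short made-up strings and hypothetical field names (f1, f2, f3) so that it runs on its own; with the real data the widths and col.names would be those used above.

```r
# Toy stand-in for the real data: two samples, 9 characters each.
x <- data.frame(sample = c(1, 2),
                data = c("aaabbcccc", "dddeeffff"),
                stringsAsFactors = FALSE)

# Split each string into fixed-width fields of 3, 2 and 4 characters.
con <- textConnection(x$data)
strs <- read.fwf(con, widths = c(3, 2, 4),
                 colClasses = "character",
                 col.names = c("f1", "f2", "f3"))
close(con)  # tidy up the text connection explicitly

# Put the sample identifier back alongside the pieces.
out <- cbind(sample = x$sample, strs)
```

Here out has one row per sample, with the sample column first and one character column per piece.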