r - Breaking up a character string into multiple character strings on different lines
I have a data frame that contains a long character string, each associated with a 'sample':

    sample data
    1 000000000000000000000000000n01000000000000n0n000000000n00n0000nn00n0n000000100000n00n0n0000000nnnn011111111111111111111111111111110000000000000000000n000000n0000000000n
    2 000000000000000000000000000n01000000000000n0n000000000n00n0000nn00n0n000000100000n00n0n0000000nnnn011111111111111111111111111111110000000000000000000n000000n0000000000n

I would like to code an easy way to break each string into 5 pieces in the following format:

    sample x
    cct6 - characters 1-33
    gat1 - characters 34-68
    imd3 - characters 69-99
    pdr3 - characters 100-130
    rim15 - characters 131-168

giving output that looks like this for each sample:

    sample 1
    cct6 - 000000000000000000000000000n01000
    gat1 - 000000000n0n000000000n00n0000nn00n0
    imd3 - n000000100000n00n0n0000000nnnn0
    pdr3 - 1111111111111111111111111111111
    rim15 - 0000000000000000000n000000n0000000000n

I've been able to use the substr function to break the long string into individual pieces, but I'd like to be able to automate it so that I can get all 5 pieces in one output. Ideally the output would also be a data frame.
This is what ?read.fwf is for.
First, some data that looks like what is in your question:
    x <- data.frame(sample = c(1, 2),
                    data = c("000000000000000000000000000n01000000000000n0n000000000n00n0000nn00n0n000000100000n00n0n0000000nnnn011111111111111111111111111111110000000000000000000n000000n0000000000n",
                             "000000000000000000000000000n01000000000000n0n000000000n00n0000nn00n0n000000100000n00n0n0000000nnnn011111111111111111111111111111110000000000000000000n000000n0000000000n"),
                    stringsAsFactors = FALSE)

Now use read.fwf, specifying the widths of each field and their names, and that they should be of mode character. Wrap the text column of the example data in textConnection so that it can be treated as a connection, which is understood by the read.* and other functions.
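The field widths follow directly from the character ranges listed in the question. A minimal sketch of that arithmetic is below; the names starts, ends and widths are illustrative and not part of the original answer.

```r
# Character ranges quoted in the question (1-based, both ends inclusive).
starts <- c(cct6 = 1,  gat1 = 34, imd3 = 69, pdr3 = 100, rim15 = 131)
ends   <- c(cct6 = 33, gat1 = 68, imd3 = 99, pdr3 = 130, rim15 = 168)

# Both end points belong to the field, hence the + 1.
widths <- ends - starts + 1
widths

# The widths should add up to the last character position in the question.
sum(widths)
```

These are the values supplied to the widths argument of read.fwf.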
    (strs <- read.fwf(textConnection(x$data), widths = c(33, 35, 31, 31, 38),
                      colClasses = "character",
                      col.names = c("cct6", "gat1", "imd3", "pdr3", "rim15")))

                                   cct6                                gat1                            imd3                            pdr3                                  rim15
    1 000000000000000000000000000n01000 000000000n0n000000000n00n0000nn00n0 n000000100000n00n0n0000000nnnn0 1111111111111111111111111111111 0000000000000000000n000000n0000000000n
    2 000000000000000000000000000n01000 000000000n0n000000000n00n0000nn00n0 n000000100000n00n0n0000000nnnn0 1111111111111111111111111111111 0000000000000000000n000000n0000000000n

Now loop over the rows and print each one out as per your example:
    for (i in seq_len(nrow(strs))) {
        # Header line for this sample, then one "name - value" line per field
        writeLines(paste("sample", i))
        writeLines(paste(names(strs), strs[i, ], sep = " - "))
    }

giving, for example:
    sample 2
    cct6 - 000000000000000000000000000n01000
    gat1 - 000000000n0n000000000n00n0000nn00n0
    imd3 - n000000100000n00n0n0000000nnnn0
    pdr3 - 1111111111111111111111111111111
    rim15 - 0000000000000000000n000000n0000000000n
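Since the question mentions substr and asks for a data frame as output, here is a hedged alternative sketch that uses base R's vectorised substring() rather than read.fwf and keeps the sample column next to the pieces. The names s, starts, ends, fields, pieces and out are assumptions made for illustration, and the boundaries are the ones listed in the question.

```r
# Example data in the same shape as the question's data frame.
s <- "000000000000000000000000000n01000000000000n0n000000000n00n0000nn00n0n000000100000n00n0n0000000nnnn011111111111111111111111111111110000000000000000000n000000n0000000000n"
x <- data.frame(sample = c(1, 2), data = c(s, s), stringsAsFactors = FALSE)

# Field boundaries from the question (1-based, both ends inclusive).
starts <- c(1, 34, 69, 100, 131)
ends   <- c(33, 68, 99, 130, 168)
fields <- c("cct6", "gat1", "imd3", "pdr3", "rim15")

# substring() is vectorised over first/last, so one call per row
# returns all five pieces of that row's string.
pieces <- t(sapply(x$data, substring, first = starts, last = ends,
                   USE.NAMES = FALSE))
colnames(pieces) <- fields

# Keep the sample ID next to its pieces in a single data frame.
out <- data.frame(sample = x$sample, pieces, stringsAsFactors = FALSE)
str(out)
```

Compared with read.fwf, this avoids the text connection and carries the sample identifier through, at the cost of spelling out the start and end positions by hand.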