Appending multiple vcf


If you have multiple vcf files split by chromosome from the same samples, this is the case when performing joint variant calls of multiple samples in GATK. At the end of the day, if you want to have a single vcf file from this project, CatVariants tool (a command line tool in GATK) is pretty fast. Although I think this might be just the case of simple cat of multiple files except the vcf header, this tools still come in handy especially when you already have GATK installed. Continue reading Appending multiple vcf

Advertisements

Transpose Table Sideway


I’ve come across a problem needing to transpose to wide table into a long format. I’m not talking about the longitudinal data quite yet, the one where you have one individual getting multiple measurements over time.

The question then is get a lot simpler than having to manipulate longitudinal data, which you can do with

library(reshape)

in R. See:

 

?melt
?cast

 

 

Recently, the

library(data.table)

has come into my rescue. With fread function reading in large data frame (or data table) has become much faster. Therefore, base on the simple fread and write.table. here comes the transpose function. You can get the script from my short script transposeR.r github Genetics Library (which has just recently been updated).

Rscript transposeR.r data_1.txt data_2.txt

You can also use wildcard.

Rscript transposeR.r data_?.txt

I mostly tested it on mac, if your windows machine doesn’t play with ls command then, the script might not work with multiple file wildcard.