You may end up with multiple VCF files from the same samples, split by chromosome; this is the case, for example, when performing joint variant calling of multiple samples in GATK. If you ultimately want a single VCF file for the project, the CatVariants tool (a command-line tool in GATK) is pretty fast. Although I think this may be little more than a simple cat of the files minus the repeated VCF headers, the tool still comes in handy, especially when you already have GATK installed.
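The "simple cat minus the header" idea can be sketched in plain shell with awk. This is not CatVariants and does no sorting or validation. It assumes uncompressed VCFs with identical headers and sample columns, passed in chromosome order, and a non-empty first file. The function name concat_vcf is hypothetical.

```shell
# concat_vcf FILE1 [FILE2 ...]: hypothetical helper, writes to stdout.
# Assumes uncompressed VCFs with identical headers, given in chromosome order,
# and a non-empty first file.
concat_vcf() {
    # NR == FNR holds only while awk reads the first file, so that file is
    # printed whole (header + records). For every later file, only lines
    # that do not start with '#' (the records) are printed.
    awk 'NR == FNR || !/^#/' "$@"
}
```

For example, `concat_vcf chr1.vcf chr2.vcf chr3.vcf > all.vcf` keeps one header followed by the records of all three files.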
If RStudio complains that it failed to set the default locale, run this in Terminal:
$ defaults write org.R-project.R force.LANG en_US.UTF-8
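To add an existing user to a group on Linux (the -a flag appends, so the user keeps their other supplementary groups; without it, -G replaces them):
sudo usermod -a -G groupName userName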
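The new membership only applies to the user's sessions after they log in again. A small sketch for checking that the change was recorded; the helper name in_group is hypothetical, and it relies only on `id -nG`, which prints the names of all groups a user belongs to.

```shell
# in_group USER GROUP: hypothetical helper; succeeds if USER is listed in GROUP.
in_group() {
    # One group name per line, then an exact whole-line match on GROUP.
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

# Example:
# in_group userName groupName && echo "userName is in groupName"
```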
I’ve come across a problem that required transposing a wide table into a long format. I’m not talking about longitudinal data quite yet, the kind where one individual has multiple measurements taken over time.
The problem is then a lot simpler than manipulating longitudinal data, which you can do with
in R. See:
has come to my rescue. With the fread function, reading in a large data frame (or data table) has become much faster. So, building on fread and write.table, here comes a transpose function. You can get it from my short script
transposeR.r in my Genetics Library on GitHub (which has just been updated). To transpose one or more files:
Rscript transposeR.r data_1.txt data_2.txt
You can also use a wildcard:
Rscript transposeR.r data_?.txt
I have mostly tested it on a Mac. If your Windows machine doesn’t have the
ls command, the script might not work with multi-file wildcards.
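If R is not at hand, the same row/column swap can be sketched with awk. This is not transposeR.r and does not use fread; it is a plain-shell alternative that assumes whitespace-delimited input small enough to hold in memory, and writes tab-separated output. The function name transpose_table is hypothetical.

```shell
# transpose_table FILE: hypothetical helper; prints FILE with rows and
# columns swapped, tab-separated. Holds the whole table in memory.
transpose_table() {
    awk '
    {
        # Store every field, keyed by (row, column).
        for (i = 1; i <= NF; i++) cell[NR, i] = $i
        if (NF > ncol) ncol = NF
    }
    END {
        # Each original column becomes one output row.
        for (i = 1; i <= ncol; i++) {
            line = cell[1, i]
            for (r = 2; r <= NR; r++) line = line "\t" cell[r, i]
            print line
        }
    }' "$1"
}
```

To mimic the wildcard usage, loop over the files, for example `for f in data_?.txt; do transpose_table "$f" > "t_$f"; done` (the t_ output prefix is just an illustration). Because the shell expands the glob before the loop runs, this does not depend on ls.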