As the doctor gone rogue

November 27, 2015

Appending multiple VCF files

Filed under: Uncategorized — hypotheses @ 12:10 am

If you have multiple VCF files split by chromosome for the same samples, as is the case after joint variant calling of multiple samples in GATK, you will eventually want a single VCF file for the project. The CatVariants tool (a command-line tool in GATK) is pretty fast for this. I think this may amount to a simple cat of the files with the repeated VCF headers dropped, but the tool still comes in handy, especially when you already have GATK installed.
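To make the "simple cat except the header" idea concrete, here is a minimal Python sketch. It is not what CatVariants does internally, only an illustration of the guess above. The function name `concat_vcf_texts` is made up for this example, and it assumes the inputs are uncompressed VCF text, share the same samples and header, and are passed in the chromosome order you want in the output.

```python
from typing import Iterable, List


def concat_vcf_texts(vcf_texts: Iterable[str]) -> str:
    """Join VCF contents, keeping header lines ('#...') from the first input only.

    Assumption: every input has the same header and sample columns, and the
    inputs are already in the desired order. No sorting or validation is done.
    """
    out: List[str] = []
    for index, text in enumerate(vcf_texts):
        for line in text.splitlines():
            if line.startswith("#"):
                # Header and meta lines: keep them only from the first file.
                if index == 0:
                    out.append(line)
            elif line:
                # Variant records: always keep, skipping blank lines.
                out.append(line)
    return "\n".join(out) + "\n"


# Tiny made-up inputs standing in for two per-chromosome files.
chr1 = "##fileformat=VCFv4.2\n#CHROM\tPOS\n1\t100\n"
chr2 = "##fileformat=VCFv4.2\n#CHROM\tPOS\n2\t200\n"
merged = concat_vcf_texts([chr1, chr2])
```

For real data I would still reach for the GATK tool, since a sketch like this does not check that the headers actually match.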

February 16, 2011

Working with dates in Stata

Filed under: data management, STATA — hypotheses @ 3:25 pm

Recently, I received a data set that coded dates as, for example, 8/8/2011 for Aug 8, 2011. When I imported this file into STATA, the variable was automatically treated as a string. To be able to manipulate “date” easily, I converted it to a numeric variable:

gen int ndate = date(date, "MDY")

Then, for STATA to display this variable in a human-readable format:

format ndate %td
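What ends up stored in ndate is just a count of days since January 1, 1960, which is how STATA represents daily dates. As a rough cross-check outside STATA, the same conversion can be sketched in Python. The function name `mdy_to_stata_days` is made up for this example, and the sketch assumes the strings are always month/day/four-digit year.

```python
from datetime import date, datetime

# STATA daily dates count days elapsed since 01jan1960.
STATA_EPOCH = date(1960, 1, 1)


def mdy_to_stata_days(text: str) -> int:
    """Convert a string such as '8/8/2011' (month/day/year) to a day count."""
    parsed = datetime.strptime(text, "%m/%d/%Y").date()
    return (parsed - STATA_EPOCH).days


ndate = mdy_to_stata_days("8/8/2011")  # 18847
```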

I then had a question about the mid-visit date, which can now be easily calculated as

mean ndate if visit==2

This gave me a number like 18302.

So far, the only way I have found to work out what day this is:

display day(18302) month(18302) year(18302)

I think there might be an easier way to spit out the date in DMY format, but I don’t know how yet.
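Because the number is a count of days since January 1, 1960, it can also be turned back into a calendar date outside STATA. Below is a minimal Python sketch of that reverse step. The function name `stata_days_to_date` is made up for this example, and it assumes a whole number of days, so a fractional mean would need rounding first. Within STATA itself, passing a format to display, as in display %td 18302, should print the date directly.

```python
from datetime import date, timedelta

# STATA daily dates count days elapsed since 01jan1960.
STATA_EPOCH = date(1960, 1, 1)


def stata_days_to_date(days: int) -> date:
    """Convert a whole number of days since 01jan1960 to a calendar date."""
    return STATA_EPOCH + timedelta(days=days)


mid_visit = stata_days_to_date(18302)
print(mid_visit.strftime("%d %b %Y"))  # 09 Feb 2010
```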

For further information about working with dates in STATA, I refer you to the resources on the UCLA website and the help files in STATA.
