Tag Archives: data manipulation

Appending multiple vcf


If you have multiple VCF files from the same samples, split by chromosome, as happens when you perform joint variant calling on multiple samples in GATK, and you want a single VCF file for the project, the CatVariants tool (a command-line tool in GATK) is pretty fast. Although I think this may amount to a simple cat of the files minus the repeated VCF headers, the tool still comes in handy, especially when you already have GATK installed.

Working with dates in Stata


Recently, I received a dataset in which dates were coded as, for example, 8/8/2011 for Aug 8, 2011. When I imported this file into Stata, Stata automatically treated the variable as a string. To be able to manipulate the date easily, I converted it to a numeric variable:

gen int ndate = date(date, "MDY")

Then, for Stata to display this variable in a human-readable format:

format ndate %td

I then had a question about the mid-visit date, which can now be easily calculated as the mean date at visit 2:

mean ndate if visit==2

This gave me a number like 18302.

So far, the only way I have found to work out what day this is:

display day(18302) "/" month(18302) "/" year(18302)

I think there might be an easier way to print the date in DMY format, but I don’t know how yet.
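One candidate for that easier way, offered as a minimal sketch rather than the definitive answer: Stata stores daily dates as the number of days since 01jan1960, and display accepts a format before an expression, so a %td format can be applied directly to the number. The value 18302 is the one reported above; ndate and visit are the variables from this post, and rounding the mean to a whole day is my own assumption about how a fractional mean should be handled.

```stata
* Print a daily date value (days since 01jan1960) as a calendar date
display %td 18302
* -> 09feb2010

* Same value with an explicit day/month/year layout
display %tdDD/NN/CCYY 18302

* Avoid retyping the number: take the mean from r(mean) after summarize.
* round() is an assumption here, since a mean date can be fractional.
summarize ndate if visit == 2, meanonly
display %td round(r(mean))
```

Using r(mean) also removes the risk of mistyping the number when copying it from the output by hand.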

For further information about working with dates in Stata, I refer you to the resource on the UCLA website and the help files in Stata.