If you have multiple VCF files from the same samples split by chromosome, as is the case when performing joint variant calling of multiple samples in GATK, and at the end of the day you want a single VCF file for the project, the CatVariants tool (a command-line tool in GATK) is pretty fast. Although I think this might just amount to a simple cat of multiple files minus the repeated VCF headers, the tool still comes in handy, especially when you already have GATK installed.
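For comparison, the “simple cat except the header” idea can be sketched in Python. This is not CatVariants and skips whatever validation that tool performs. It is a minimal sketch assuming uncompressed per-chromosome VCFs from the same samples, passed in the desired chromosome order. The function name concat_vcfs is hypothetical. It keeps the “##” meta-information lines from the first file only, writes the #CHROM line once, and refuses to merge files whose #CHROM lines (and therefore sample columns) differ.

```python
from typing import Iterable, Optional, TextIO


def concat_vcfs(paths: Iterable[str], out: TextIO) -> None:
    """Concatenate per-chromosome VCFs into one stream with a single header.

    Hypothetical helper: assumes uncompressed VCFs with the same samples,
    supplied in the desired chromosome order. Records are not re-sorted.
    """
    chrom_line: Optional[str] = None  # the "#CHROM ..." column header seen first
    for index, path in enumerate(paths):
        with open(path) as vcf:
            for line in vcf:
                if line.startswith("##"):
                    # Meta-information lines: keep only those of the first file.
                    if index == 0:
                        out.write(line)
                elif line.startswith("#CHROM"):
                    if chrom_line is None:
                        chrom_line = line
                        out.write(line)
                    elif line != chrom_line:
                        # Different sample columns: not "the same samples".
                        raise ValueError(f"{path}: #CHROM line differs from the first file")
                else:
                    # Data records are copied unchanged, in input order.
                    out.write(line)
```

Usage might look like concat_vcfs(["chr1.vcf", "chr2.vcf"], out_file), with hypothetical file names. Because records are written in input order, the chromosome order of the output is simply the order in which the files are given.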
Recently, I received data in which a date was coded, for example, as 8/8/2011 for Aug 8, 2011. When I imported this file into STATA, STATA automatically treated this variable as a string. To be able to manipulate “date” easily, I converted it to a numeric variable:
gen int ndate = date(date, "MDY")
Then, for STATA to display this variable in a human-readable format:
format ndate %td
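Behind the scenes, a STATA daily date is just the number of days since 01jan1960; the %td format only changes how that number is shown, which is why ndate can be averaged and subtracted like any other number. A minimal Python sketch of the two steps, assuming M/D/Y strings with a four-digit year. The names mdy_to_td and format_td are hypothetical and only mimic date(date, "MDY") and format %td; unlike STATA, which returns missing for a string it cannot parse, this sketch simply raises an error on bad input.

```python
from datetime import date, timedelta

TD_EPOCH = date(1960, 1, 1)  # day 0 of STATA's daily-date (%td) scale
MONTHS = ("jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec")


def mdy_to_td(text: str) -> int:
    """Turn an 'M/D/YYYY' string such as '8/8/2011' into days since 01jan1960."""
    month, day, year = (int(part) for part in text.split("/"))
    return (date(year, month, day) - TD_EPOCH).days


def format_td(days: int) -> str:
    """Render a day count in the default %td style, e.g. '08aug2011'."""
    d = TD_EPOCH + timedelta(days=days)
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year}"


print(mdy_to_td("8/8/2011"))  # 18847
print(format_td(18847))       # 08aug2011
```

A fixed tuple of month abbreviations is used instead of strftime("%b") so the output does not depend on the system locale.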
I then had a question about the mid-visit date, which can now be easily calculated as
mean ndate if visit==2
This gave me a number like 18302.
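Because daily dates are plain day counts, the mean of ndate is itself a date, though usually a fractional one. A minimal Python sketch with made-up visit records (hypothetical data, not from the original file) averages the day counts for visit 2 and rounds to the nearest whole day before reading the result as a calendar date:

```python
from datetime import date, timedelta

TD_EPOCH = date(1960, 1, 1)  # day 0 of STATA's %td scale

# Hypothetical (visit, visit date) records, for illustration only.
records = [
    (1, date(2010, 1, 5)),
    (2, date(2010, 2, 1)),
    (2, date(2010, 2, 10)),
    (2, date(2010, 2, 20)),
]

# Same idea as "mean ndate if visit==2": average the day counts.
visit2_days = [(d - TD_EPOCH).days for visit, d in records if visit == 2]
mid_days = sum(visit2_days) / len(visit2_days)  # 18303.33...

# Round to a whole day before reading it as a calendar date.
mid_date = TD_EPOCH + timedelta(days=round(mid_days))
print(mid_date)  # 2010-02-10
```

Rounding is a design choice here; note that Python's round sends an exact .5 to the even day, so truncating or rounding up may suit some analyses better.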
So far, the only way I know to find out which day this is:
display day(18302) "/" month(18302) "/" year(18302)
I think there might be an easier way to split out the DMY format, but I don’t know how yet.
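In STATA itself, a likely shortcut is to apply the date format at display time, as in display %td 18302, which should print 09feb2010 in one step. To make the arithmetic behind day(), month() and year() concrete, here is a minimal Python sketch (td_to_dmy is a hypothetical name) that adds the day count to 01jan1960 and splits the result into its parts:

```python
from datetime import date, timedelta
from typing import Tuple

TD_EPOCH = date(1960, 1, 1)  # day 0 of STATA's %td scale


def td_to_dmy(days: int) -> Tuple[int, int, int]:
    """Split a %td day count into (day, month, year), like day(), month() and year()."""
    d = TD_EPOCH + timedelta(days=days)
    return d.day, d.month, d.year


dd, mm, yyyy = td_to_dmy(18302)
print(f"{dd}/{mm}/{yyyy}")  # 9/2/2010
```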
For further information about working with dates in STATA, I refer you to the resources on the UCLA website and the STATA help files.