As the doctor gone rogue

April 20, 2012

Understanding Random Number Generator in R

Filed under: Uncategorized — Tags: , — hypotheses @ 5:04 pm

“Mersenne-Twister”is default method for generating random number. The brief description is

 

“From Matsumoto and Nishimura (1998). A twisted GFSR with period 2^19937 – 1 and equidistribution in 623 consecutive dimensions (over the whole period). The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.”

 

At any given moment, “the current seed” is stored in “.Random.seed”. For a short example below, here is what the .Random.seed looks like.

> runif(5)
[1] 0.5214241 0.6072482 0.8581209 0.1057113 0.4943451
> length(.Random.seed)
[1] 626

# It is 626 long.

> head(.Random.seed)
  [1]         403           5   524275616  -891164866   839495241  2076756731  -695076150

# The first value is the type of random number generator, in this case “Mersenne-Twister”.

# The second value is the “current position” in the set

# The other 624 numbers stored in .Random.seed are what they call “seed”.

# So, if you simulated another 5 random numbers,

> runif(5)
[1] 0.9766740 0.9481603 0.1786681 0.4026092 0.7110552

# The current position changed from 5 (above) to 10 (see below). Notice that other 624 numbers in the set remained the same.

> head(.Random.seed)
[1] 403 10 619611805 -1461824745 -1054662018 -1340796360

# Once you simulate 624th random number, the index of current position will be 624.

 

 

> runif(1)
[1] 0.8074707
> head(.Random.seed)
[1] 403 624 619611805 -1461824745 -1054662018 -1340796360

# When you simulate one more random number, a new seed will then be used.

> runif(1)
[1] 0.6062263
> head(.Random.seed)
[1] 403 1 -1016206940 813281659 -1786346428 -363208600

# So, any given moment, you can save the current seed that will be used for the next simulation by

 

> my.seed <- .Random.seed
> runif(10)
 [1] 0.8797226 0.2143323 0.6826555 0.4366940 0.4330555 0.5745667 0.7406138 0.4646236 0.8014502 0.9818474

# This saved the current seed to be used for the next simulation in “my.seed”

# You can then restore the current seed and reproduce the above simulated numbers by

> .Random.seed <- t(my.seed)
> runif(10)
[1] 0.8797226 0.2143323 0.6826555 0.4366940 0.4330555 0.5745667 0.7406138 0.4646236 0.8014502 0.9818474

 

# See that the runif(10) here gave the same numbers as the one above.

April 18, 2012

New hope for patients with type I diabetes

Filed under: Uncategorized — hypotheses @ 6:17 pm

If I read this and understand correctly, patients with type I diabetes will have chance to be cured by stem cell/gene therapy soon.

Developmental biology: Re-evaluating gut insulin instinct : Article : Nature Reviews Genetics.

According to this article

  • Talchai, C. et al. Generation of functional insulin-producing cells in the gut by Foxo1 ablation. Nature Genet. 7 Feb 2012 (doi: 10.1038/ng.2215) Article

Talchai, et al were seeking to understand the role of Foxo1 ablation in mice. They found that by knocking out Foxo1, progenitor cells in the gut (which are the same cell lineage as the insulin producing beta cells in pancreas) will develop into insulin producing cells.

A lot more work to be done, but this same idea is very interesting and can probably be applied to other diseases as well. By knowing how to modify a common progenitor cells into the right target cells, any diseases could probably be treated.

 

April 11, 2012

How to “tee” stderr

Filed under: bash — hypotheses @ 5:38 pm

So, I want to capture the standard error to the log file.

The post below was by pfiynn, from the http://www.unix.com forum:

If your shell is Bash or similar, this set of redirections will do the job


command 3>&1 1>&2 2>&3 | tee file

What does it mean?

The redirection operator n>&m, makes file descriptor n to be a copy of file descriptor m.

So, we are:

– Opening a new file descriptor, 3, that is a copy of file descriptor 1, the standard output;

– Making file descriptor 1 a copy of file descriptor 2, the standard error output;

– Making file descriptor 2 to be a copy of file descriptor 3 the “backup” of the standard outputin a short: we swapped the standard output and the standard error output.

via How to “tee” stderr – The UNIX and Linux Forums.

April 6, 2012

Convert plink format to vcf

Filed under: data management, plink, plinkseq, vcf, vcftools — hypotheses @ 12:20 pm

So, I recently got to work with vcf file format for the first time. It is been quite a mess to start with, but after all the helps I can gather around here (Thank you very much, YK & HS !!!) I finally came down to a solution.

You probably have heard about “plink”, “plink/seq”, and “vcftools”. Among the three programs, “plink” has been around the longest, and one of the reason may be because it is versatile, has good documentation, and quite easy to use. Although plink/seq was inspired by plink, its documentation is still during development (I tried to be optimistic here).

For this short tutorial, I am going to use PLINK/BED file as a medium for file conversion, since it is the fastest file to manipulate and the most efficient to use in terms of disc space and memory requirement (from my experience).

“So, my question is how do I convert my GWAS data to vcf format, with a specific reference allele that I want to use i.e. not the default ‘major allele’ as a reference as we typically do”

So, I now have PLINK/Binary format file call csOmni25 and a file containing reference allele csOmni25.refAllele

As a short note: csOmni25.refAllele looks like this


-------- csOmni25.refAllele ----------------

rs12345 A

rs12958 G

rs10596 C

rs18607 T

...

...

---------end csOmni25.refAllele--------------

  1. First, we will convert PLINK/Binary format file so that A1 [reference allele] correspond to the reference allele that we want
    NOTE: when you create the reference allele file, make sure that all reference alleles are in UPPER CASE.
  2. Second, we will import plink/bed to plink/seq and write out vcf format file

Here’s the script that I use. It’s that simple. And for your reference, it took me a week to figure this out, and tested it.


#!/bin/sh

##-- SCRIPT PARAMETER TO MODIFY--##

PLINKFILE=csOmni25

REF_ALLELE_FILE=csOmni25.refAllele

NEWPLINKFILE=csOmni25Ref

PLINKSEQ_PROJECT=csGWAS

## ------END SCRIPT PARAMETER------ ##

#1. convert plink/binary to have the specify reference allele

plink --noweb --bfile $PLINKFILE --reference-allele $REF_ALLELE_FILE --make-bed --out $NEWPLINKFILE

#2. create plink/seq project

pseq $PLINKSEQ_PROJECT new-project

#3. load plink file into plink/seq

pseq $PLINKSEQ_PROJECT load-plink --file $NEWPLINKFILE --id $NEWPLINKFILE

#4. write out vcf file, as of today 4/6/2012  using vcftools version 0.1.8, although the documentation says that you can write out a compressed vcf format using --format BGZF option, vcftools doesn't recognize what this option is. So, I invented my own solution

pseq $PLINKSEQ_PROJECT write-vcf | gzip > $NEWPLINKFILE.vcf.gz

At the end, this will create a compressed vcf file “csOmni25Ref.vcf.gz” with the specified reference alleles.

April 5, 2012

R FAQ: How can I format a string containing a date into R “Date” object??

Filed under: data management, R — hypotheses @ 3:39 pm

This sounds like a problem that statisticians occasionally have to deal with. It is quite simple, just convert the string to date or time if you remember what command to use. If your date variable has a value like this 2007-05-31 for May 31, 2007. You can simply use the example below.


as.Date(DATE_VARIABLE,format='%Y-%m-%d')

For more example on date, I refer you to UCLA R-help page:

R FAQ: How can I format a string containing a date into R “Date” object??.

For a complete list of date-time formatting conversion see this


?strptime

Blog at WordPress.com.