Improving the quality of cancer tissues for research

Through careful characterization of specimens, a new study draws some conclusions on how we can improve the quality of cancer specimens for research.

Read the summary in the NCI blog post.

The full article is published in the Journal of Oncology Practice:



Using a public exome database as your control in WES association studies

Check out the newly released software TRAPD, which stands for Test Rare vAriants with Public Data.

Read the details in the article published in AJHG this month.

A note on getting started with samtools on Mac OS X 10.8.5 Mountain Lion

samtools is a handy toolkit for working with sequence alignments in the SAM format.

For more information, please refer to the original article here:

  • Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N., Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data Processing Subgroup (2009) The Sequence alignment/map (SAM) format and SAMtools. Bioinformatics, 25, 2078-9. [PMID: 19505943]

To get it working on the Mac, I decided to compile it from source.

First, get the source code.

Then decompress the files into ~/bin.

Compile it as usual

make install

You might also need administrative rights to write to system folders.

Next, we'll need to add samtools to the PATH.

Include these lines in ~/.bash_profile to define SAMTOOLS_HOME and extend the PATH:

export SAMTOOLS_HOME=~/bin/samtools-0.1.19
export PATH=$SAMTOOLS_HOME:$SAMTOOLS_HOME/bcftools:$PATH

Now, samtools should be ready for you to use.
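
A quick way to confirm the setup is to open a new terminal and ask the shell where it finds the binaries. This is only a sketch: it assumes the build left samtools in $SAMTOOLS_HOME and bcftools in $SAMTOOLS_HOME/bcftools, that both directories are on the PATH, and check_tool is a helper name I made up for illustration.

```shell
# check_tool is a hypothetical helper: report where the shell finds a command.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found on PATH"
        return 1
    fi
}

# If either tool is missing, revisit the lines added to ~/.bash_profile.
check_tool samtools || echo "check SAMTOOLS_HOME and PATH in ~/.bash_profile"
check_tool bcftools || echo "check SAMTOOLS_HOME and PATH in ~/.bash_profile"
```

If a tool is reported as missing, running source ~/.bash_profile in the current terminal and trying again is a reasonable first step.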

How to wget with proxy authentication?

Once again, I ran into a problem with proxy server authentication on my university network, this time while trying to install the new KGGSeq software for next-generation sequencing data analysis.

As a quick fix with Cygwin, here is what I did.

1. Tell bash that we are using a proxy server.

## Add these to ~/.bashrc for my bash start up shell

export http_proxy=$proxy

2. Tell wget which username and password to use with the proxy server.

For example, to download KGGSeq through Cygwin, here’s what I did.

wget --proxy-user "bhoom" --proxy-password "bhoom_password"
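
Putting the two steps together, a minimal sketch might look like the following. The proxy address proxy.example.edu:8080 and the download URL are placeholders I made up, not the real values, and fetch_via_proxy is a hypothetical wrapper name; substitute your own details.

```shell
# Placeholder proxy address (an assumption); use your institution's host and port.
proxy="http://proxy.example.edu:8080"
export http_proxy="$proxy"
export https_proxy="$proxy"

# fetch_via_proxy is a hypothetical wrapper around the wget call shown above.
# Usage: fetch_via_proxy USER PASSWORD URL
fetch_via_proxy() {
    wget --proxy-user "$1" --proxy-password "$2" "$3"
}

# Example call with a placeholder URL (not the real KGGSeq download link):
# fetch_via_proxy "bhoom" "bhoom_password" "http://example.org/kggseq.zip"
```

Keep in mind that a password typed on the command line ends up in the shell history. wget can also read proxy_user and proxy_password settings from ~/.wgetrc, which may be a better home for them.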

Wget – ArchWiki.

I’m still not quite sure why the university still uses proxy authentication. There seem to be several other enterprise authentication systems, but perhaps they are all pricey? Even so, does the price justify all the trouble we have with slow connections to every website, bioinformatics software that cannot connect through the proxy server, and so on?

Ion Torrent PGM vs PacBio vs MiSeq

“It’s a lot cheaper to buy a PGM than other sequencing platforms. So, should we buy it?”

A common concern behind this question is whether the sequencing quality is good enough. This is one of the first concerns Ion Torrent seems to have faced since launching its first sequencer.

Quail et al. compared the three platforms in their paper. Although they only looked at microbial genomes with varying GC/AT content, they showed that the Ion Torrent PGM platform still seems to have a problem when sequencing the AT-rich Plasmodium genome. Moreover, the false-positive rate of base calling on the Ion Torrent platform is still higher.

These data may not apply to human genome sequencing, but they deserve a closer look in my opinion.

BMC Genomics | Full text | A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.

Modifier Genes to Protect From Alzheimer’s Disease, Serious Infection in Cystic Fibrosis: Topol on Genomics

Dr. Topol talked about a variant of the APP gene that helps preserve cognitive function, increases longevity, and protects against Alzheimer’s disease. You can find the full article here:


The bad news is that this APP-A673T variant exists in only 0.5% of the Scandinavian population studied.

UCSC genome browser human gene location?

To download UCSC RefSeq gene information with the HUGO gene name, the coding region start and end, and the transcription start and end positions, locate the refFlat.txt.gz file in the annotation database:
– Hg18:
– Hg19:
The refFlat.txt.gz file contains the following columns, as described in the refFlat.sql file:
– 1) Hugo gene name
– 2) RefSeq transcript name (accession)
– 3) chromosome
– 4) strand (+/-)
– 5) Transcription start position
– 6) Transcription end position
– 7) Coding region start
– 8) Coding region end
– 9) Number of exons
– 10) Exon start positions
– 11) Exon end positions

This information should come in handy when you have to map the locations of genetic markers to genes. I will post the script I wrote to do this mapping later if anyone is interested.
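
Until then, here is a minimal sketch of one way to do such a lookup with awk. It is not the script mentioned above. genes_at is a made-up helper, and it assumes the tab-separated refFlat layout in which the gene name and transcript accession are fields 1 and 2, and the chromosome, transcription start and transcription end are fields 3, 5 and 6, with 0-based start coordinates as used in UCSC tables. The position in the usage comment is arbitrary.

```shell
# genes_at (hypothetical helper): print gene name and transcript accession for
# every refFlat row whose transcribed region contains a 1-based position.
# Usage: genes_at CHROM POS   (refFlat rows are read from standard input)
genes_at() {
    awk -F '\t' -v chrom="$1" -v pos="$2" '
        # txStart (field 5) is 0-based and txEnd (field 6) is end-exclusive,
        # so a 1-based position lies inside when txStart < pos <= txEnd.
        $3 == chrom && pos > $5 && pos <= $6 { print $1 "\t" $2 }
    '
}

# Example with an arbitrary position, reading the downloaded file:
# gunzip -c refFlat.txt.gz | genes_at chr1 1000000
```

A gene with several transcripts covering the position is printed once per transcript; piping the output through cut -f1 | sort -u would collapse it to unique gene names.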