As the doctor gone rogue

May 19, 2010

Personal R library on the cluster

Filed under: R — Tags: , — hypotheses @ 4:50 pm

I’m trying to set up my own personal R library on our HPCCC. The current configuration does not allow me to use .Renviron to specify the location of my library as the home directory on the compute node is set up to be different from the home directory on the frontend node [side note: I wish our HPCCC admin is smarter than what he is now, and set this up correctly, so that we won’t have all these troubles again and again.

So, the only solution I have is to install my personal library on the frontend node. Then, add the path to this folder in all my R script that I want to run.
This can be done by first : Find out what is the full path of my personal R library on the frontend node

[1] "/users/bhoom/Rlibs"
[2] "/usr/bin/lib/Rlib/" # <- this is the default of my HPCCC

I then have to combine the output from the above command with what exist on the system already.


This way, on the compute node, my personal R library can still be located.


May 17, 2010

Alternate color manhattan plot

Filed under: genetics, R — hypotheses @ 8:31 pm

I am trying to make my own manhattan plot the way I want for a long time, and this one seems the be the shortest version that I come up with. Here’s the feature of my function.
1. Should be transportable. Only requires “CHR”, “BP”, “P” to run at a minimum.
2. You can choose to draw your own reference lines, using — cutoffs
3. You can define how tall your graph will be for comparison, –using maxy
4. It is based on basic plot function. So, you can add title to your plot later on. Add more color points, using points [need to convert your coordinate to the same way first].
The down sides.
1. I use the order on chromosome, instead of chromosomal position. So, it is not good to use this code if you only have one gene, and want to the signal location on the chromosome.
2. The signals are overlapping between chromosome in the low p-value region. However, this does not obscure your signal in the top. Improvement can be added.
3. Not all chromosomes are labelled on the X-axis. This is probably the most annoying things that need to be improved.
4. It doesn’t have much advantage in terms of speed. It still took 83 seconds to plot 860,000 signals. That’s roughly 10,000 points per second, though.
Besides all these it still seems pretty good.
manhattan <- function(data,color=c("black","grey"),maxy,cutoffs= c(5e-8)) {
if (! sum(c("CHR","BP","P") %in% names(data)) == 3) stop("Data must have CHR BP P column")
data <- data[order(data$CHR,data$BP),]
data <- data[!$P),]
nsnps <- nrow(data)
data$COOR <- 1:nsnps
ticks <- tapply(data$COOR,data$CHR,median)
data$nlogp <- -log10(data$P)
chr <- sort(unique(data$CHR))
if (missing(maxy)) {
	} else {
	xaxt="n",type="n",xlab="chromosome",ylab="-log10(p)")	}
axis(1,at=ticks,labels=chr) <- ifelse(data$CHR%%2,1,0)
for (coi in 1:length(cutoffs)) {

May 5, 2010

Boxplot with data points

Filed under: R — hypotheses @ 7:38 pm


Create a free website or blog at