As the doctor gone rogue

January 31, 2017

Import data from Excel to R using gdata library

Filed under: data management, R — Tags: , , , , — hypotheses @ 9:00 pm

There are several ways to import data into R.

The standard way, at least in the past, is to read a text file with the read.table() function.
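As a quick reminder of what the text-file route looks like, here is a minimal sketch. The file contents and column names are made up for illustration, and I am assuming a tab-delimited file.

```r
# Write a small tab-delimited file to a temporary location (illustrative data only)
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id\tname\tage", "1\tAnn\t34", "2\tBob\t51"), txt_file)

# header = TRUE takes the column names from the first line;
# as.is = TRUE keeps character columns as character instead of factors
dat <- read.table(txt_file, header = TRUE, sep = "\t", as.is = TRUE)
str(dat)
```

The header and as.is arguments here are the same ones passed to read.xls() further down.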

For Excel, the best-known spreadsheet software in the world, several libraries can import data from an .xls file, for example RODBC. In the past, the problem was the .xlsx format, which was not yet supported.

Recently, I discovered that gdata can now import .xlsx files. This skips the step I normally needed: saving the Excel file as a text file and then doing a regular file import. (Note that read.xls() relies on Perl being installed.)

Here’s how:

library(gdata)
data <- read.xls(xls="myData.xlsx",sheet=1,header=TRUE, as.is=TRUE)

November 1, 2015

Transpose Table Sideway

Filed under: data management, R — Tags: , , — hypotheses @ 2:51 am

I’ve come across a problem that requires transposing a wide table into a long format. I’m not talking about longitudinal data yet, the kind where one individual gets multiple measurements over time.

The problem is then a lot simpler than manipulating longitudinal data, which you can do with library(reshape) in R. See:

 

?melt
?cast

 

 

Recently, library(data.table) has come to my rescue. With the fread function, reading in a large data frame (or data table) has become much faster. So, based on plain fread and write.table, here comes the transpose function. You can get it as the short script transposeR.r from my GitHub Genetics Library (which has just recently been updated).

Rscript transposeR.r data_1.txt data_2.txt

You can also use a wildcard.

Rscript transposeR.r data_?.txt

I mostly tested it on a Mac. If your Windows machine doesn’t play well with the ls command, the script might not work with a multiple-file wildcard.
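For readers who just want the idea, here is a rough sketch of an fread plus write.table transpose. This is not the actual transposeR.r script. The helper name transpose_file, the tab delimiter, and the ".transposed" output suffix are all my assumptions for illustration.

```r
library(data.table)

# Hypothetical helper (not the real transposeR.r): read a delimited text file,
# flip rows and columns, and write the result next to the input file.
transpose_file <- function(infile, outfile = paste0(infile, ".transposed"),
                           sep = "\t") {
  # header = FALSE keeps the header line as an ordinary row, so it gets transposed too
  dt <- fread(infile, header = FALSE, sep = sep)
  flipped <- t(as.matrix(dt))   # rows become columns and vice versa
  write.table(flipped, file = outfile, sep = sep,
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  invisible(outfile)
}
```

If wildcards are the problem on Windows, one option is to let R expand them itself with Sys.glob("data_?.txt") and loop over the result, rather than relying on the shell.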

June 17, 2014

A new way to install Bioconductor

Filed under: R — Tags: , , — hypotheses @ 7:39 am

In the past, whenever we wanted to install a package from Bioconductor, the first thing we had to do was to “source” the installer script:

source("http://bioconductor.org/biocLite.R")

The good news is that there is a new package, “BiocInstaller”, which can help you install Bioconductor packages.

Here’s an example that installs a library for working with NHGRI’s GWAS catalog:

install.packages("BiocInstaller")
library(BiocInstaller)
biocLite("gwascat")
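A note for anyone reading this on a recent R version: as far as I know, the biocLite() route has since been replaced by the BiocManager package from CRAN. A minimal sketch of the newer route follows. The wrapper name install_bioc is hypothetical, and the actual install call is left commented out because it needs network access.

```r
# Hypothetical wrapper around the BiocManager route
install_bioc <- function(pkgs) {
  # Install BiocManager from CRAN only if it is not already available
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(pkgs)
}

# install_bioc("gwascat")  # uncomment to install; requires an internet connection
```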

May 20, 2012

Obtaining R Object name

Filed under: Uncategorized — Tags: , — hypotheses @ 9:07 am

While programming in R, I occasionally wish I could just get the name of the object I am working with and print it out, or use it as the title of the plot I am creating.

This is most convenient when writing a function whose user might not yet have an idea what the title should be. Many plot functions in R build their default titles and axis labels with “deparse(substitute(x))”:

mydata <- cbind(c(1:10),c(1:10))
plot(mydata,main=deparse(substitute(mydata)))
deparse(substitute(mydata))

This names the plot after the object (here a matrix) that was used to create it.
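The trick is more useful inside a function, where it picks up whatever the caller typed as the argument. Here is a small sketch. The function names object_name and plot_named are made up for illustration.

```r
# Return the expression the caller supplied for x, as a character string
object_name <- function(x) deparse(substitute(x))

# Hypothetical plotting wrapper: the default title is the name of the object passed in
plot_named <- function(x, main = deparse(substitute(x)), ...) {
  plot(x, main = main, ...)
  invisible(main)
}

mydata <- cbind(1:10, 1:10)
object_name(mydata)
```

Calling plot_named(mydata) would then draw the plot with “mydata” as its title, while plot_named(mydata, main = "My title") overrides it.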

May 3, 2012

Getting help finding sourcecode in R

Filed under: Uncategorized — Tags: , — hypotheses @ 3:41 pm

(From: http://cran.r-project.org/doc/Rnews/Rnews_2006-4.pdf, page 43)

It’s always a good idea to look under the hood and see how things work! Sometimes that’s the only way to make sure that other people’s code or programs work the way you really expect them to, especially the free ones.

 

  1. The simplest case is typing the function’s name without the “()” after it.
     

> matrix
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
{
    if (is.object(data) || !is.atomic(data))
        data <- as.vector(data)
    .Internal(matrix(data, nrow, ncol, byrow, dimnames, missing(nrow),
        missing(ncol)))
}
<environment: namespace:base>
 

However, the comments are not included, and it might still be better to look at the actual source code.
You can find the original sources after unpacking the source package, in the directory “PackageName/R/”.
For R’s base packages, the R code is in $R_HOME/src/library/PackageName/R/.
 

  2. For code hidden in a namespace, type getAnywhere("FunctionName") to find it.
     

> plot.factor
Error: object ‘plot.factor’ not found
> getAnywhere(plot.factor)
A single object matching ‘plot.factor’ was found
It was found in the following places
  registered S3 method for plot from namespace graphics
  namespace:graphics
with value

function (x, y, legend.text = NULL, …)
{


}

<environment: namespace:graphics>
 

It is possible to ask for the available methods with methods(print). Here the function of interest is the S3 method print.lm().
 

> methods(print)
  [1] print.acf*                              
  [2] print.anova                             
  [3] print.lm
 

A method hidden in a namespace can be accessed (and therefore printed) directly using the ::: operator, as in stats:::print.lm.
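To put the S3 lookups side by side, here is a short sketch using print.lm from the stats package. The variable names m1, m2 and m3 are only for illustration.

```r
# Three ways to reach an S3 method that is not exported from its namespace
m1 <- getAnywhere("print.lm")      # searches attached packages and loaded namespaces
m2 <- getS3method("print", "lm")   # looks up the registered S3 method for a class
m3 <- stats:::print.lm             # reaches directly into the namespace

m1$where            # where getAnywhere() found the object
identical(m2, m3)   # do the two lookups return the same function?
```

getS3method() is handy in scripts because it takes the generic and the class as separate strings, so you do not have to know which namespace the method lives in.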

For S4 related sources, it is advisable to look at the package’s source files directly!

  3. “COMPILED” code sources are always more problematic. When you type the name of one of these functions, what you will see is a call such as .C(), .Call(), .Fortran(), .External(), .Internal(), or .Primitive().
     

– The first step is to look up the entry point in the file ‘$R_HOME/src/main/names.c’, if the calling R function is either .Primitive() or .Internal().
You will find something like this:
{"rnorm",        do_random2,        8,        11,        3,        {PP_FUNCALL, PREC_FN,        0}},
This tells you to look for “do_random2” in the source files.
– Then, grep "do_random2" *.* in the source directory should point you to the correct file to look at. In this case, the source is in “random.c”.
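Before digging through the C sources, it can help to check from within R which of these interfaces a function actually uses. The helper below is my own sketch, not something that ships with R, and it only inspects the function’s own body (not anything it calls in turn).

```r
# Hypothetical helper: report how a function hands off to compiled code
compiled_interface <- function(f) {
  if (is.primitive(f)) return(".Primitive")
  calls <- all.names(body(f))   # every function and variable name used in the body
  hits <- intersect(c(".Internal", ".Call", ".C", ".Fortran", ".External"), calls)
  if (length(hits) == 0) "none found (pure R code at this level)" else hits
}

compiled_interface(sum)      # a primitive function
compiled_interface(matrix)   # the listing above shows .Internal() in its body
compiled_interface(rnorm)    # result may differ between R versions
```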

 

Now it’s time to learn C in order to understand more about these sources. At the least, you will have to know the program structure well enough to read other people’s code.
  

April 20, 2012

Understanding Random Number Generator in R

Filed under: Uncategorized — Tags: , — hypotheses @ 5:04 pm

“Mersenne-Twister” is the default method for generating random numbers in R. The brief description is:

 

“From Matsumoto and Nishimura (1998). A twisted GFSR with period 2^19937 – 1 and equidistribution in 623 consecutive dimensions (over the whole period). The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.”

 

At any given moment, the current seed is stored in “.Random.seed”. The short example below shows what .Random.seed looks like.

> runif(5)
[1] 0.5214241 0.6072482 0.8581209 0.1057113 0.4943451
> length(.Random.seed)
[1] 626

# It is 626 long.

> head(.Random.seed)
  [1]         403           5   524275616  -891164866   839495241  2076756731  -695076150

# The first value is the type of random number generator, in this case “Mersenne-Twister”.

# The second value is the “current position” in the set

# The other 624 numbers stored in .Random.seed are what they call “seed”.

# So, if you simulated another 5 random numbers,

> runif(5)
[1] 0.9766740 0.9481603 0.1786681 0.4026092 0.7110552

# The current position changed from 5 (above) to 10 (see below). Notice that the other 624 numbers in the set remained the same.

> head(.Random.seed)
[1] 403 10 619611805 -1461824745 -1054662018 -1340796360

# Once you have simulated the 624th random number, the current position will be 624.

 

 

> runif(1)
[1] 0.8074707
> head(.Random.seed)
[1] 403 624 619611805 -1461824745 -1054662018 -1340796360

# When you simulate one more random number, a new set of 624 numbers is generated and the position goes back to 1.

> runif(1)
[1] 0.6062263
> head(.Random.seed)
[1] 403 1 -1016206940 813281659 -1786346428 -363208600

# So, at any given moment, you can save the current seed that will be used for the next simulation by

 

> my.seed <- .Random.seed
> runif(10)
 [1] 0.8797226 0.2143323 0.6826555 0.4366940 0.4330555 0.5745667 0.7406138 0.4646236 0.8014502 0.9818474

# This saved the current seed to be used for the next simulation in “my.seed”

# You can then restore the current seed and reproduce the above simulated numbers by

> .Random.seed <- t(my.seed)
> runif(10)
[1] 0.8797226 0.2143323 0.6826555 0.4366940 0.4330555 0.5745667 0.7406138 0.4646236 0.8014502 0.9818474

 

# See that the runif(10) here gave the same numbers as the one above.
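One caveat: assigning to .Random.seed directly, as above, works at the top level of an R session. Inside a function, the same assignment would only create a local variable, so the state has to be written back to the global environment explicitly. Here is a minimal sketch. The helper name with_preserved_seed is made up for illustration.

```r
# Hypothetical helper: evaluate `expr`, then put the RNG state back where it was
with_preserved_seed <- function(expr) {
  # .Random.seed does not exist until the RNG has been used or seeded once
  if (!exists(".Random.seed", envir = globalenv())) runif(1)
  old_seed <- get(".Random.seed", envir = globalenv())
  # Restore the saved state when the function exits, even if expr fails
  on.exit(assign(".Random.seed", old_seed, envir = globalenv()))
  expr
}

set.seed(42)
a <- with_preserved_seed(runif(3))   # draws 3 numbers, then rewinds the state
b <- runif(3)                        # starts again from the same state
identical(a, b)
```

For everyday reproducibility, set.seed() with a fixed integer is usually simpler. Saving the full .Random.seed is mainly useful when you want to resume from the middle of a stream.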

May 19, 2010

Personal R library on the cluster

Filed under: R — Tags: , — hypotheses @ 4:50 pm

I’m trying to set up my own personal R library on our HPCCC. The current configuration does not allow me to use .Renviron to specify the location of my library, because the home directory on the compute node is set up to be different from the home directory on the frontend node [side note: I wish our HPCCC admin were smarter and had set this up correctly, so that we wouldn’t have all these troubles again and again].

So, the only solution I have is to install my personal library on the frontend node and then add the path to this folder in every R script that I want to run.
This can be done by first finding the full path of my personal R library on the frontend node:

.libPaths()
[1] "/users/bhoom/Rlibs"
[2] "/usr/bin/lib/Rlib/" # <- this is the default of my HPCCC

I then have to combine the output from the above command with what already exists on the system.

.libPaths(c(.libPaths(),"/users/bhoom/Rlibs"))

This way, on the compute node, my personal R library can still be located.
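Since this line has to go at the top of every script, it may be worth wrapping it in a small guard so a missing directory is reported clearly. This is only a sketch. The helper name add_personal_lib is hypothetical, and unlike the line above it puts the personal library first rather than last.

```r
# Hypothetical helper: add a personal library to the search path if it exists
add_personal_lib <- function(path) {
  if (dir.exists(path)) {
    # Entries earlier in .libPaths() are searched first
    .libPaths(c(path, .libPaths()))
  } else {
    warning("Personal library not found: ", path)
  }
  invisible(.libPaths())
}

# add_personal_lib("/users/bhoom/Rlibs")  # path from the example above
```

Putting the personal path first means a package installed there takes precedence over the system copy. Appending it, as in the original command, makes the system copy win instead.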

Ref http://www.biostat.jhsph.edu/bit/R-personal-library.html

April 30, 2010

Image processing in R — Convert EPS to PDF and others

Filed under: bash, genetics, R — Tags: , , , — hypotheses @ 1:53 am

I do a lot of plots and graphs in R, and I found that most fonts look disproportionate if I save the graphic output from R with a function such as png(). So the solution I have used until now is to save all of my plots using


postscript("file",paper="letter")
#as I normally want the plot to be in the size of most paper used to print here in the US
#by default the paper size is "A4" though.
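For completeness, here is a sketch of the full open, plot, close cycle. The file name and the plot are placeholders. The onefile, horizontal and paper = "special" settings are what the ?postscript help page suggests for producing a single-figure EPS file, which is a different choice from the paper = "letter" call above.

```r
eps_file <- file.path(tempdir(), "myplot.eps")   # illustrative file name

# Settings suggested in ?postscript for a single-figure EPS file;
# width and height are in inches
postscript(eps_file, onefile = FALSE, horizontal = FALSE,
           paper = "special", width = 6, height = 4)
plot(1:10, (1:10)^2, main = "Example plot")
dev.off()   # closes the device; the file is not complete until this runs
```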

Then, the next problem is how to convert these EPS files into PDF or another format. On Linux, my solution so far has been ImageMagick’s convert. However, on OS X, if you don’t want to install ImageMagick, there is already a built-in command.

With the series of commands below, you will convert your files and place them in a subfolder with the appropriate extension.


echo *.eps | xargs -n1 pstopdf && mkdir png; sips -s format png *.pdf --out png
# Note: do not put a trailing / after the output directory (png, not png/), or you will get an error about the directory not being found.

Although you will get a warning that the extension has been changed to png, look inside the “png” directory and you will find all your files neatly converted to the format you want. You can use sips to convert to jpeg or other formats as well.
