[Re-blog] Set proxy and unset proxy for RStudio

If you want to set a proxy server in RStudio, you might google it and land on the RStudio support site. However, the solution might not be obvious, so I blogged about it almost 10 years ago.

Today, the solution still comes in handy.

So, to unset the proxy this time, simply do the following!
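The snippet itself didn’t survive here. As a minimal sketch, assuming the proxy is configured through the standard http_proxy and https_proxy environment variables (which R’s download functions read), setting and unsetting it from the R console could look like this. The proxy address is a made-up placeholder.

```r
# Hypothetical proxy address: replace with your own host and port
proxy_url <- "http://proxy.example.com:8080"

# Set the proxy for the current R session
Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)
Sys.getenv(c("http_proxy", "https_proxy"))

# Unset it again, e.g. when you leave the proxied network
Sys.unsetenv(c("http_proxy", "https_proxy"))
Sys.getenv(c("http_proxy", "https_proxy"))   # both now return ""
```

To make the setting persist across sessions, the same variables can go into your ~/.Renviron file instead.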


Import data from Excel to R using gdata library

There are several ways to import data into R.

The standard way, or what it used to be, is to read from a text file using the read.table() function.

For Excel files (Excel being the most popular spreadsheet software in the world), several libraries can be used to import data from .xls files, for example


. In the past, the problem was that .xlsx files were not supported yet.

Recently, I discovered that the gdata package can now import .xlsx files. This bypasses the step where I normally have to save the Excel file as a text file and then do the regular file import.

Here’s how:

library(gdata)  # read.xls() comes from gdata and needs a Perl installation
data <- read.xls(xls = "myData.xlsx", sheet = 1, header = TRUE, as.is = TRUE)
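For comparison, here is a rough sketch of the “regular” route mentioned above: saving the sheet as a tab-delimited text file and reading it with read.table(). The file name and its contents are made up so the example runs on its own.

```r
# Made-up file standing in for an Excel sheet saved as tab-delimited text
txt_file <- file.path(tempdir(), "myData.txt")
writeLines(c("id\tgroup\tvalue",
             "1\tA\t2.5",
             "2\tB\t3.1"), txt_file)

# header = TRUE and as.is = TRUE mirror the read.xls() call above;
# as.is = TRUE keeps text columns as character instead of factors
data <- read.table(txt_file, header = TRUE, sep = "\t", as.is = TRUE)
str(data)
```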

Transpose a Table Sideways

I’ve come across a problem that required transposing a wide table into a long format. I’m not talking about longitudinal data quite yet, the kind where one individual has multiple measurements over time.

The problem is then a lot simpler than manipulating longitudinal data, which you can do with


in R. See:





Recently, the data.table package has come to my rescue. With its fread() function, reading in a large data frame (or data table) has become much faster. So, based on fread() and write.table(), here comes the transpose function. You can get it from my short script transposeR.r in my GitHub Genetics Library (which has just been updated).
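I won’t reproduce transposeR.r here, but the core idea can be sketched as follows. This is a minimal sketch, not the actual script: the function name transpose_file() and the tab separator are assumptions, and it falls back to read.table() if data.table isn’t installed. Everything is read as character so mixed columns survive the transpose.

```r
# Minimal sketch of a file-to-file transpose (hypothetical helper, not transposeR.r)
transpose_file <- function(infile, outfile, sep = "\t") {
  tab <- if (requireNamespace("data.table", quietly = TRUE)) {
    data.table::fread(infile, header = FALSE, sep = sep, colClasses = "character")
  } else {
    utils::read.table(infile, header = FALSE, sep = sep, colClasses = "character")
  }
  # Rows become columns; the character matrix keeps every cell as-is
  flipped <- t(as.matrix(tab))
  utils::write.table(flipped, outfile, sep = sep, quote = FALSE,
                     row.names = FALSE, col.names = FALSE)
  invisible(dim(flipped))
}

# Example: a 2 x 3 table becomes a 3 x 2 table
infile  <- file.path(tempdir(), "data_1.txt")
outfile <- file.path(tempdir(), "data_1_transposed.txt")
writeLines(c("id\tsnp1\tsnp2", "s1\tAA\tAG"), infile)
transpose_file(infile, outfile)
readLines(outfile)
```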

Rscript transposeR.r data_1.txt data_2.txt

You can also use a wildcard:

Rscript transposeR.r data_?.txt

I mostly tested it on a Mac. If your Windows machine doesn’t support the ls command, the script might not work with multiple-file wildcards.
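If the shell doesn’t expand data_?.txt itself (as can happen on Windows), the script receives the literal pattern. One hedged, portable option is to expand patterns inside the script with base R’s Sys.glob() instead of calling ls. The helper name expand_inputs() is hypothetical.

```r
# Hypothetical helper: turn file names or wildcard patterns into existing files
expand_inputs <- function(args) {
  files <- unlist(lapply(args, function(a) {
    hits <- Sys.glob(a)
    # Keep a literal name that matched nothing, so a later read fails loudly
    if (length(hits) == 0) a else hits
  }))
  unique(files)
}

# In a script, the input would be:
#   expand_inputs(commandArgs(trailingOnly = TRUE))
# Self-contained demo using a scratch directory:
demo_dir <- file.path(tempdir(), "glob_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("data_1.txt", "data_2.txt", "other.txt"))))
found <- expand_inputs(file.path(demo_dir, "data_?.txt"))
basename(found)
```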

A new way to install Bioconductor

In the past, whenever we wanted to install a package from Bioconductor, the first thing we had to do was to “source” the


The good news is that there is a new package, “BiocInstaller”, which can help you install Bioconductor packages.

Here’s an example of installing a package that helps you work with NHGRI’s GWAS catalog:


Obtaining R Object name

While programming in R, I occasionally wish I could just get the name of the object I am working with and print it, or use it as the title of the plot I am creating.

This is especially convenient when writing a function whose users might not yet know what the title should be. By default, many plot functions in R build their default titles and axis labels with deparse(substitute(x)).

mydata <- cbind(1:10, 1:10)

Name the plot after the object used to create it (here, mydata).
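Here is a minimal sketch of that idea. The helper names object_name() and plot_named() are hypothetical; the paste(..., collapse = " ") guards against deparse() splitting a long expression over several strings.

```r
# Hypothetical helper: return the expression text of the argument
object_name <- function(x) paste(deparse(substitute(x)), collapse = " ")

# Hypothetical plotting wrapper: title the plot after its argument
plot_named <- function(x, ...) {
  title_text <- paste(deparse(substitute(x)), collapse = " ")
  plot(x, main = title_text, ...)
  invisible(title_text)
}

mydata <- cbind(1:10, 1:10)
object_name(mydata)
```

Calling plot_named(mydata) draws the two columns against each other with “mydata” as the title. Passing an expression such as mydata[, 1] would give its whole text rather than just the object name.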

Getting help finding source code in R

(From: http://cran.r-project.org/doc/Rnews/Rnews_2006-4.pdf, page 43)

It’s always a good idea to look under the hood and see how things work! Sometimes that’s the only way to make sure other people’s code or programs work the way you expect, especially the free ones.


  1. The simplest case is to type the function’s name without “()” after it:

> matrix
function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
{
    if (is.object(data) || !is.atomic(data))
        data <- as.vector(data)
    .Internal(matrix(data, nrow, ncol, byrow, dimnames, missing(nrow),
        missing(ncol)))
}
<environment: namespace:base>

However, comments are not included, so it may still be better to look at the actual source code.
You can find the original sources in the directory “PackageName/R/” after unpacking the source package.
For R’s base packages, the R code is in $R_HOME/src/library/PackageName/R/.

  2. For code hidden in a namespace, type getAnywhere("FunctionName") to find it:

> plot.factor
Error: object ‘plot.factor’ not found
> getAnywhere(plot.factor)
A single object matching ‘plot.factor’ was found
It was found in the following places
  registered S3 method for plot from namespace graphics
with value

function (x, y, legend.text = NULL, ...)


<environment: namespace:graphics>

It is possible to ask for the available methods with methods(print). Suppose the function of interest is the S3 method print.lm():

> methods(print)
  [1] print.acf*                              
  [2] print.anova                             
  [3] print.lm

A method hidden in a namespace can be accessed (and therefore printed) directly using the ::: operator, as in stats:::print.lm.
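Here is a small, runnable version of these steps in base R. Besides methods(), ::: and getAnywhere(), it also uses getS3method() from the utils package, which looks a method up by generic and class name; all three lookups should return the same function object.

```r
m <- methods(print)                          # S3 methods registered for print()
length(m)

f_ns  <- stats:::print.lm                    # ::: reads a non-exported object from a namespace
f_s3  <- getS3method("print", "lm")          # look up by generic name and class
f_any <- getAnywhere("print.lm")$objs[[1]]   # first match found by getAnywhere()

identical(f_ns, f_s3)
identical(f_ns, f_any)

# The first few lines of the (comment-free) source
head(deparse(f_ns))
```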

For S4-related sources, it is advisable to look at the package’s source files directly!
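For S4 code that is already loaded, the methods package can also show you method definitions. A minimal sketch with a made-up generic area() and class Square (both hypothetical, defined only so the example is self-contained):

```r
library(methods)

# Hypothetical generic, class and method, defined only for illustration
setGeneric("area", function(shape) standardGeneric("area"))
setClass("Square", slots = c(side = "numeric"))
setMethod("area", "Square", function(shape) shape@side^2)

showMethods("area")              # which signatures have methods
existsMethod("area", "Square")   # is there a method for exactly this signature?
getMethod("area", "Square")      # prints the method definition
```

With a real package you would call showMethods() and getMethod() on its generics and classes in the same way.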

  3. Compiled code is always more problematic. When you type the name of such a function, you will see calls like .C(), .Call(), .Fortran(), .External(), .Internal(), or .Primitive().

– If the calling R function uses .Primitive() or .Internal(), the first step is to look up the entry point in the file ‘$R_HOME/src/main/names.c’.
You will find something like this:
{"rnorm", do_random2, 8, 11, 3, {PP_FUNCALL, PREC_FN, 0}},
This tells you to find “do_random2” in the source files.
– Then, running grep 'do_random2' *.* in the source directory should point you to the right file. In this case, the source is in “random.c”.
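As a rough first triage before opening names.c, you can check from R itself which interface a function uses. The helper impl_kind() below is hypothetical; it only searches the deparsed body for the call names, so it is a heuristic rather than a parser. Note that in recent R versions many functions (rnorm() included) call package C code through .Call() rather than .Internal(), so the names.c route applies mainly to .Internal() and .Primitive() functions such as matrix() and sum().

```r
# Hypothetical helper: report which interface an R function uses to reach compiled code
impl_kind <- function(f) {
  if (is.primitive(f)) return(".Primitive")
  src <- paste(deparse(body(f)), collapse = "\n")
  # Match the opening parenthesis too, so ".C(" is not confused with ".Call("
  calls <- c(".Internal(", ".Call(", ".External(", ".C(", ".Fortran(")
  found <- calls[vapply(calls, grepl, logical(1), x = src, fixed = TRUE)]
  if (length(found) == 0) return("plain R code")
  sub("(", "()", found, fixed = TRUE)
}

impl_kind(sum)
impl_kind(matrix)
impl_kind(function(x) x + 1)
```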


Now it’s time to learn C in order to understand these sources. At the very least, you will need to know the program structure well enough to read other people’s code.

Understanding Random Number Generator in R

“Mersenne-Twister” is R’s default method for generating random numbers. The brief description in R’s documentation is:


“From Matsumoto and Nishimura (1998). A twisted GFSR with period 2^19937 – 1 and equidistribution in 623 consecutive dimensions (over the whole period). The ‘seed’ is a 624-dimensional set of 32-bit integers plus a current position in that set.”


At any given moment, the current seed is stored in .Random.seed. Here is a short example of what .Random.seed looks like:

> runif(5)
[1] 0.5214241 0.6072482 0.8581209 0.1057113 0.4943451
> length(.Random.seed)
[1] 626

# It is 626 long.

> head(.Random.seed)
  [1]         403           5   524275616  -891164866   839495241  2076756731  -695076150

# The first value encodes the kind of random number generator, in this case “Mersenne-Twister”.

# The second value is the “current position” in the set.

# The other 624 numbers stored in .Random.seed are what the documentation calls the “seed”.

# So, if you simulate another 5 random numbers:

> runif(5)
[1] 0.9766740 0.9481603 0.1786681 0.4026092 0.7110552

# The current position changed from 5 (above) to 10 (see below). Notice that the other 624 numbers in the set remained the same.

> head(.Random.seed)
[1] 403 10 619611805 -1461824745 -1054662018 -1340796360

# Once you have simulated the 624th random number, the current position will be 624.



> runif(1)
[1] 0.8074707
> head(.Random.seed)
[1] 403 624 619611805 -1461824745 -1054662018 -1340796360

# When you simulate one more random number, a fresh set of 624 numbers (a new seed) is generated and the current position starts again at 1.

> runif(1)
[1] 0.6062263
> head(.Random.seed)
[1] 403 1 -1016206940 813281659 -1786346428 -363208600
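The walkthrough above can be replayed in a fresh session with a fixed seed. This is a minimal sketch: seed_pos() is a hypothetical helper that reads .Random.seed[2] from the global environment, and set.seed(1, kind = "Mersenne-Twister") just pins the generator so the steps are reproducible. It prints the position after each step and checks whether the 624 seed values changed when the position wrapped around.

```r
# Pin the generator so the steps below are reproducible
set.seed(1, kind = "Mersenne-Twister")
# Hypothetical helper: the "current position" is the second element of .Random.seed
seed_pos <- function() get(".Random.seed", envir = globalenv())[2]

p_start <- seed_pos()            # just after set.seed()
invisible(runif(5))
p_5 <- seed_pos()                # after 5 uniforms
invisible(runif(619))
p_full <- seed_pos()             # all 624 numbers of the current set used
block_before <- get(".Random.seed", envir = globalenv())[3:626]
invisible(runif(1))
p_wrap <- seed_pos()             # one more draw forces a new set
block_after <- get(".Random.seed", envir = globalenv())[3:626]

c(p_start, p_5, p_full, p_wrap)
identical(block_before, block_after)
```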

# So, at any given moment, you can save the seed that will be used for the next simulation:


> my.seed <- .Random.seed
> runif(10)
 [1] 0.8797226 0.2143323 0.6826555 0.4366940 0.4330555 0.5745667 0.7406138 0.4646236 0.8014502 0.9818474

# This saved the seed for the next simulation in my.seed.

# You can then restore that seed and reproduce the simulated numbers above:

> .Random.seed <- my.seed
> runif(10)
[1] 0.8797226 0.2143323 0.6826555 0.4366940 0.4330555 0.5745667 0.7406138 0.4646236 0.8014502 0.9818474


# Notice that runif(10) here gives the same numbers as above.
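One caveat: a plain assignment to .Random.seed only works at the top level. Inside a function it would just create a local variable, so the saved state has to be read from and written back to the global environment, where R’s generator looks for it. A minimal sketch (replay_draws() is a hypothetical name):

```r
replay_draws <- function(n) {
  # .Random.seed does not exist until the RNG has been used once in a session
  if (!exists(".Random.seed", envir = globalenv(), inherits = FALSE)) runif(1)
  saved <- get(".Random.seed", envir = globalenv())
  first <- runif(n)
  # Write the saved state back where R's RNG reads it
  assign(".Random.seed", saved, envir = globalenv())
  second <- runif(n)
  list(first = first, second = second)
}

res <- replay_draws(10)
identical(res$first, res$second)
```

For everyday reproducibility, calling set.seed() with the same value before each simulation is the simpler choice; saving .Random.seed is handy when you want to replay from the middle of a stream.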