As the doctor gone rogue

January 31, 2017

Import data from Excel to R using gdata library

Filed under: data management, R — hypotheses @ 9:00 pm

There are several ways to import data into R.

The standard way, at least as it used to be, is to read from a text file using the read.table() function.
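As a minimal sketch of that text-file route, the example below writes a small tab-delimited file and reads it back with read.table(). The file contents and the column names (id, age, group) are made up for illustration.

```r
# Hypothetical example data, written to a temporary tab-delimited text file
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id\tage\tgroup", "1\t34\tA", "2\t51\tB"), txt_file)

# header = TRUE takes the column names from the first line; sep = "\t" splits on tabs;
# stringsAsFactors = FALSE keeps text columns as character vectors
dat <- read.table(txt_file, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
str(dat)
```

The same call works on any delimited text file exported from a spreadsheet, as long as sep matches the delimiter that was used.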

For Excel files, Excel being the most famous spreadsheet software in the world, several libraries can import data from an .xls file. In the past, the problem was the .xlsx format, which was not yet supported.

Recently, I discovered that the gdata library can now import .xlsx files. This bypasses the step where I normally had to save the Excel file as a text file and then do the regular file import.

Here’s how:

library(gdata)
data <- read.xls(xls = "myData.xlsx", sheet = 1, header = TRUE)

November 27, 2015

Failed to set default locale

Filed under: R — hypotheses @ 12:00 am

If RStudio complains about a failure to set the default locale,

try this in the macOS Terminal:

$ defaults write org.R-project.R force.LANG en_US.UTF-8
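To see what R itself reports, before or after changing the setting, a quick check from the R console could look like the sketch below. It only reads the current values and changes nothing.

```r
# Show the locale R is currently using (one string covering all categories)
current_locale <- Sys.getlocale()
print(current_locale)

# The LANG environment variable; this is "" if the variable is not set
print(Sys.getenv("LANG"))
```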

November 1, 2015

Transpose Table Sideway

Filed under: data management, R — hypotheses @ 2:51 am

I’ve come across a problem that required transposing a wide table into a long format. I’m not talking about longitudinal data quite yet, the kind where one individual gets multiple measurements over time.

The question is then a lot simpler than having to manipulate longitudinal data, which you can also do in R.

Recently, the data.table package has come to my rescue. With the fread function, reading in a large data frame (or data table) has become much faster. So, based on the simple fread and write.table, here comes the transpose function. You can get the short script, transposeR.r, from my GitHub Genetics Library (which has just recently been updated).

Rscript transposeR.r data_1.txt data_2.txt

You can also use a wildcard.

Rscript transposeR.r data_?.txt

I mostly tested it on a Mac. If your Windows machine doesn’t play well with the ls command, the script might not work with a multiple-file wildcard.
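For readers who only want the core idea, a transpose built on fread and write.table might look like the sketch below. This is not the actual transposeR.r script. The function name transpose_file, the ".transposed" output suffix and the tab delimiter are all assumptions made for illustration, and the code falls back to read.table when data.table is not installed.

```r
# Transpose a tab-delimited text file: rows become columns and vice versa
transpose_file <- function(infile, outfile = paste0(infile, ".transposed")) {
  # Read every field as character so that t() does not reformat numbers
  if (requireNamespace("data.table", quietly = TRUE)) {
    dat <- data.table::fread(file = infile, header = FALSE, sep = "\t",
                             colClasses = "character", data.table = FALSE)
  } else {
    dat <- read.table(infile, header = FALSE, sep = "\t",
                      colClasses = "character", quote = "", comment.char = "")
  }
  # t() returns a character matrix; write it back without row or column names
  write.table(t(dat), file = outfile, quote = FALSE, sep = "\t",
              row.names = FALSE, col.names = FALSE)
  invisible(outfile)
}

# Hypothetical input: one header row followed by one sample row
infile <- tempfile(fileext = ".txt")
writeLines(c("id\ta\tb", "s1\t1\t2"), infile)
outfile <- transpose_file(infile)
transposed <- readLines(outfile)
print(transposed)
```

Reading with header = FALSE treats the header line as ordinary data, so it ends up as the first column of the transposed file.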

June 17, 2014

A new way to install Bioconductor

Filed under: R — hypotheses @ 7:39 am

In the past, whenever we wanted to install a package from Bioconductor, the first thing we had to do was to “source” the Bioconductor installation script.

The good news is that there is a new package, “BiocInstaller”, which can help you install Bioconductor packages.

Here’s an example of installing a library that helps you work with NHGRI’s GWAS catalog


September 19, 2013

Sourcing R script over HTTPS – Stack Overflow

Filed under: R — hypotheses @ 12:44 pm

You can use source() to source your script from an online resource.
However, source() doesn’t work if you have an https link.

Here’s a solution I found on Stack Overflow.

library(RCurl)
eval(expr = parse(text = getURL("", ssl.verifypeer = FALSE)))  # the HTTPS URL of the script goes inside the quotes

via Sourcing R script over HTTPS – Stack Overflow.
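To see what the eval(parse(text = ...)) part does once the script text has been downloaded, here is a small offline sketch. The string script_text stands in for whatever getURL() would return, and the function and variable names inside it are made up.

```r
# Stand-in for the text that getURL() would return from an HTTPS link
script_text <- "square <- function(x) x^2\nanswer <- square(7)"

# parse() turns the text into R expressions; eval() runs them in the current environment
eval(expr = parse(text = script_text))

print(answer)
```

Everything the script defines ends up in the current environment, just as it would with source().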

May 27, 2013

Configuring R to Use an HTTP Proxy / FAQ / Knowledge Base – RStudio Support

Filed under: R — hypotheses @ 1:11 am

Feel like I’ve done this a million times since I came back to work in Thailand. This should work for R users not using RStudio as well. Configuring R to Use an HTTP Proxy / FAQ / Knowledge Base – RStudio Support. In brief, you can use the command template below in each R session.


I find this more convenient because I carry my laptop around, and the only place that won’t let me work at ease is the office. So the workaround is only needed occasionally.

If you would like to check your current proxy setting, you can do


*Note*: The command begins with a capital “S”, followed by all lowercase letters.
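The two steps described above, setting the proxy for a session and checking it, can be sketched with Sys.setenv() and Sys.getenv(). The proxy address below is a placeholder, not a real server.

```r
# Hypothetical proxy address; replace it with your own host and port
Sys.setenv(http_proxy = "http://proxy.example.com:8080")

# Check the current setting (returns "" if the variable is not set)
proxy <- Sys.getenv("http_proxy")
print(proxy)

# Clear the setting again once you leave the proxied network
Sys.unsetenv("http_proxy")
```

Because the variable is set from inside R, it only lasts for the current session, which suits a laptop that moves between networks.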

April 5, 2012

R FAQ: How can I format a string containing a date into R “Date” object??

Filed under: data management, R — hypotheses @ 3:39 pm

This sounds like a problem that statisticians occasionally have to deal with. It is quite simple: just convert the string to a date or time, if you remember which command to use. If your date variable has a value like 2007-05-31 for May 31, 2007, you can simply use the example below.
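A minimal sketch of the conversion with as.Date() follows. The variable names are made up, and the format codes (%Y, %m, %d) are the standard ones documented under ?strptime.

```r
# A date stored as a string in year-month-day order
date_string <- "2007-05-31"
d <- as.Date(date_string, format = "%Y-%m-%d")
print(class(d))

# A different layout needs a matching format string, here day/month/year
d2 <- as.Date("31/05/2007", format = "%d/%m/%Y")
print(d == d2)

# Date objects support arithmetic, e.g. 30 days later
later <- d + 30
print(later)
```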


For more examples on dates, I refer you to the UCLA R help page:

R FAQ: How can I format a string containing a date into R “Date” object??.

For a complete list of date-time formatting conversion see this


December 7, 2011

Using R with Excel

Filed under: excel, R — hypotheses @ 3:53 pm

Quite often I get a dataset sent to me in Excel format that I want to analyze further in R. My solution has always been to export the Excel file to CSV format (to ease some of the problems with missing cell values). I recently discovered a new solution that bypasses this step and reads the data from Excel into R directly.

With the help of the “RODBC” package, reading data from an Excel spreadsheet seems quite simple. I believe, though, that this is still limited to the “xls” file format, and not the newer “xlsx” file format.

1. You need to establish a connection channel between R and your database file “DBF”

> ch <- odbcConnectExcel("datafile.xls")

If you are using the windows GUI for R, you can ask for a dialog box to enter the file directly using

> ch <- odbcConnectExcel()

2. You can then view the spreadsheets within the file using

> sqlTables(ch)

3. To fetch the data into R bypassing “read.table”

> excelData <- sqlFetch(ch, "spreadsheet", as.is = TRUE) ## as.is = TRUE is an assumption (the original argument was cut off); it avoids string2factor conversion

> excelData_withspace <- sqlFetch(ch,'spreadsheet name$' ) ## notice the additional $ sign and single quote

If the spreadsheet name contains a space, you will need to enclose the name in single quotes and add a $ sign at the end.

4. Once you finish working on the spreadsheet, you should close the connection.

> close(ch)


September 29, 2011

Print from here to there with “awk”

Filed under: bash, data management, R — hypotheses @ 12:16 pm

This does sound like a common thing to do. You have a lengthy text file, and you only want to get some part of it. For example, I have a file that contains a structure like this



Here, the part I want to grab is between the line with “++++++++++++++” and the blank line.

awk '/\+\+/,/^$/' INFILE

With this small awk trick, you ask awk to print everything from the +++ line through the blank line to your terminal.

Now, you just have to remove the +++++++++ line and the blank line. I do this with the “Stream EDitor”, i.e. sed. So the complete command becomes something like this…

awk '/\+\+/,/^$/' INFILE | sed '/++/d;/^$/d'
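If you would rather stay inside R, roughly the same extraction can be sketched as below. The function name extract_blocks and the sample lines are hypothetical, since the original example file is not shown here. Unlike the sed step, this version only drops the marker line that opens a block and the blank line that closes it.

```r
# Keep the lines that sit between a "++" marker line and the next blank line
extract_blocks <- function(lines) {
  keep <- logical(length(lines))
  inside <- FALSE
  for (i in seq_along(lines)) {
    if (!inside && grepl("++", lines[i], fixed = TRUE)) {
      inside <- TRUE   # marker line opens a block and is itself dropped
      next
    }
    if (inside && lines[i] == "") {
      inside <- FALSE  # blank line closes the block and is itself dropped
      next
    }
    keep[i] <- inside
  }
  lines[keep]
}

# Hypothetical file contents with two marked blocks
sample_lines <- c("header", "++++++++++++++", "a 1", "b 2", "",
                  "footer", "++++", "c 3", "")
wanted <- extract_blocks(sample_lines)
print(wanted)
```

For a real file, readLines("INFILE") would supply the lines argument.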

This can also be applied to extract part of a file with tags, such as an “XML” file. However, it is probably not a very efficient way to parse an XML file, manually one tag at a time. In R, you can do this more efficiently using RSXML. And if you are interacting with a website, you can easily combine it with RCurl.

February 28, 2011

Making R “BEEP”

Filed under: R, STATA — hypotheses @ 5:00 pm

I started noticing the beep after an interactive command finishes in STATA (I think this is a Mac-specific feature). In STATA, you get a “beep” when the command finishes running while you are working in another program. Some commands take some time to finish, and it is nice to have that feature.

In R, if you want it to beep, one option is the alarm() function, which is pretty much cat("\a"). You can make R tweet, too.

I think I’d rather go with something simple like cat("\a").
But there are lots of comments about "\a" not working unless you have a speaker turned on. So make sure your computer has a speaker and that it is switched on.
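One way to use this is a small wrapper that runs a command and beeps when it is done. The sketch below is only an illustration; the name beep_when_done is made up, and whether you actually hear anything depends on your terminal and sound settings.

```r
# Run an expression, then ring the terminal bell whether it succeeds or fails
beep_when_done <- function(expr) {
  on.exit(cat("\a"))  # "\a" is the bell character, much the same as alarm()
  expr                # lazy evaluation: the expression actually runs here
}

total <- beep_when_done(sum(as.numeric(1:1e6)))
print(total)
```

Because the beep is registered with on.exit(), it also sounds when the command stops with an error.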

If you are running a script in Linux, you can use a command to send mail at the end of your R script, something like the line below (you@example.com is a placeholder for your own address)

system('mail -s "Job finished" you@example.com < logfile')

[I haven’t tested it out though]… Do you have any other ideas?
