As the doctor gone rogue

December 7, 2011

Using R with Excel

Filed under: excel, R — hypotheses @ 3:53 pm

Quite often that I get a dataset sent to me in an Excel format, that I want to use R to perform additional analysis afterward.  My solution is always exporting the excel file to CSV format (to ease some of the problem with missing cell value). I recently discovered a new solution to bypass this step and read-in the data from Excel to R directly.

With the help of “RODBC” package, reading the data from excel spreadsheet seems quite simple. I believe, though, that this is still limited to “xls” file format, and not the newer “xlsx” file format.

1. You need to establish a connection channel between R and your data base file “DBF”


> ch <- odbcConnectExcel("datafile.xls")

If you are using the windows GUI for R, you can ask for a dialog box to enter the file directly using


> ch <- odbcConnectExcel()

2. You can then view the spreadsheets within the file using


> sqlTables(ch)

3. To fetch the data into R bypassing “read.table”


> excelData <- sqlFetch(ch,"spreadsheet",as.is=TRUE) ## avoid string2factor conversion

> excelData_withspace <- sqlFetch(ch,'spreadsheet name$' ) ## notice the additional $ sign and single quote


If the spreadsheet name contains a space, you will need to enclose the name within a single quote.

4. Once you finish working on the spreadsheet, you should close the connection.


> close(ch)

 

Advertisements

August 17, 2011

Convert Column to Table in Excel

Filed under: data management, excel — hypotheses @ 1:47 pm

Copy and paste is pretty simple and easy to do, but it is sometimes more error prone than writing a function to do it (that assumes the function written does what it is supposed to do).

I recently extracted a column of data that I want to convert into a table (matrix) for comparison across replicates. In STATA or R, this is the task called reshape. Since I have exported the data into our favorite spreadsheet software, I would like to do this within Excel. A bit of googling landed me to this website http://www.cpearson.com/excel/ColumnToTable.aspx, the article too long and the description too short. But at least it pointed me to “OFFSET”.

What you need is this formula

=OFFSET(REF_CELL, p*COL_AWAY_FROM_REFERENCE+ROW_AWAY_FROM_REFERENCE,0,1,1)

1. Reference Cell

2. How may rows your target cell is from 1): p*COL_AWAY_FROM_REFERENCE+ROW_AWAY_FROM_REFERENCE

3 How many columns your target cell is from 1): 0

4. How wide is the target region from the target cell.: width,height = 1,1

In this case you want to transpose a column with dimension 1xN to nxp (this include the first p rows of the original column data as the first column of the final nxp table).

by pasting this formula to column next to the original data, you will get the 2nd column containing row “p+1” from the original data column.

For example, your original data is from $B$2:$B$100, containing item 1 to 9 (i.e. p =9, of 11 repeated measurements). You want to cut row 11-19 and put it right next to B2:B10. Instead, put this =OFFSET($B$2,9*(COLUMN()-COLUMN($B$2))+(ROW()-ROW($B$2)),0,1,1) in cell C2, then copy it to C2:L2, then copy it cover C10:L10. This will give you a 9 row by 11 column table.

Blog at WordPress.com.