As the doctor gone rogue

January 31, 2017

Import data from Excel to R using gdata library

Filed under: data management, R - hypotheses @ 9:00 pm

There are several ways to import data into R.

The traditional way is to read a text file with the read.table() function.

For Excel, the best-known spreadsheet software in the world, several libraries can import data from an .xls file, for example RODBC. In the past, the problem was the .xlsx format, which was not yet supported.

Recently, I discovered that gdata can now import .xlsx files. This bypasses the step where I would normally save the Excel file as a text file and then do a regular file import.

Here’s how:

library(gdata)
data <- read.xls(xls="myData.xlsx",sheet=1,header=TRUE, as.is=TRUE)
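One prerequisite that is easy to miss: as far as I know, read.xls() in gdata converts the spreadsheet with a Perl script, so Perl has to be installed on the machine. Here is a minimal shell sketch to check for it before starting R. The helper name have_cmd is made up for illustration.

```shell
# have_cmd is a hypothetical helper: it succeeds if the command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd perl; then
    echo "perl found: read.xls() should be able to run its converter"
else
    echo "perl not found: install it before calling read.xls()"
fi
```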

January 22, 2017

docker-machine connection error

Filed under: SysAdmin — hypotheses @ 6:47 pm

Life is not that simple. I had figured out how to create a Docker machine that limits disk usage, CPU, and memory through docker-machine running on an Ubuntu host. However, there seems to be a problem that prevents the host from connecting directly to the guest docker-machine.


bhoom@mg0:~$ docker-machine create -d virtualbox --virtualbox-disk-size "100000" --virtualbox-memory "32000" --virtualbox-cpu-count "16" fireDock0
Running pre-create checks...
Creating machine...
(fireDock0) Copying /home/bhoom/.docker/machine/cache/boot2docker.iso to /home/bhoom/.docker/machine/machines/fireDock0/boot2docker.iso...
(fireDock0) Creating VirtualBox VM...
(fireDock0) Creating SSH key...
(fireDock0) Starting the VM...
(fireDock0) Check network to re-create if needed...
(fireDock0) Waiting for an IP...
Waiting for machine to be running, this may take a few minutes...
Detecting operating system of created instance...
Waiting for SSH to be available...
Detecting the provisioner...
Provisioning with boot2docker...
Copying certs to the local machine directory...
Copying certs to the remote machine...
Setting Docker configuration on the remote daemon...

This machine has been allocated an IP address, but Docker Machine could not
reach it successfully.

SSH for the machine should still work, but connecting to exposed ports, such as
the Docker daemon port (usually <ip>:2376), may not work properly.

You may need to add the route manually, or use another related workaround.

This could be due to a VPN, proxy, or host file configuration issue.

You also might want to clear any VirtualBox host only interfaces you are not using.
Checking connection to Docker...
Error creating machine: Error checking the host: Error checking and/or regenerating the certs: There was an error validating certificates for host "192.168.99.100:2376": dial tcp 192.168.99.100:2376: i/o timeout
You can attempt to regenerate them using 'docker-machine regenerate-certs [name]'.
Be advised that this will trigger a Docker daemon restart which will stop running containers.

If this is your case and you have a proxy server set up for your general internet connection, try


export http_proxy=""

If you are lucky, you should now be able to connect to your docker-machine (locally). Otherwise, life is not that simple.
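Clearing http_proxy turns the proxy off for everything in that shell. A narrower alternative, which I have not tested on this setup, is to leave the proxy on and exempt only the machine's address through no_proxy. This is a sketch under assumptions: the helper name add_no_proxy is made up, 192.168.99.100 is the address from the error message above, and it assumes the docker client honors no_proxy.

```shell
# add_no_proxy CURRENT HOST: print CURRENT with HOST appended (comma-separated),
# without listing HOST twice. Hypothetical helper, not part of docker-machine.
add_no_proxy() {
    case ",$1," in
        *",$2,"*) printf '%s\n' "$1" ;;       # HOST is already in the list
        ",,")     printf '%s\n' "$2" ;;       # list was empty
        *)        printf '%s\n' "$1,$2" ;;    # append to the existing list
    esac
}

# Exempt the docker machine's IP (taken from the error above) from the proxy.
no_proxy=$(add_no_proxy "${no_proxy:-}" "192.168.99.100")
export no_proxy
echo "no_proxy=$no_proxy"
```

In practice the address would come from docker-machine ip followed by the machine name, rather than being typed in by hand.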

Running analysis in docker container

Filed under: bash, docker - hypotheses @ 6:41 pm

There are generally 4 steps to create an analytics environment on your server that is separate from the rest of the system. Running your analysis within a container may reduce the risk of crashing the server, since a runaway job can otherwise use up all the resources and cause the server to freeze.

  1. Install virtualbox (to create a docker machine)

  2. Install docker-machine

  3. Create a docker-machine; this will be the machine that runs your container

  4. Map your commands to run in the container, following the kaggle/python tutorial.

Install virtualbox-qt

sudo apt-get install virtualbox-qt

Install docker-machine

curl -L https://github.com/docker/machine/releases/download/v0.8.0/docker-machine-`uname -s`-`uname -m` > /usr/local/bin/docker-machine && \
chmod +x /usr/local/bin/docker-machine
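The file name at the end of that URL is built from the kernel name and machine architecture reported by uname. To see what it expands to on your host before downloading anything, here is a small sketch that only prints the URL. The function name machine_url is made up for illustration; the URL pattern is the one from the command above.

```shell
# machine_url VERSION [OS] [ARCH]: print the release URL for docker-machine.
# OS and ARCH default to what uname reports on this host.
# Hypothetical helper for illustration only; it downloads nothing.
machine_url() {
    version=$1
    os=${2:-$(uname -s)}
    arch=${3:-$(uname -m)}
    printf 'https://github.com/docker/machine/releases/download/%s/docker-machine-%s-%s\n' \
        "$version" "$os" "$arch"
}

machine_url v0.8.0
```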

Create a docker-machine

This will create a separate docker machine called docker2

docker-machine create -d virtualbox --virtualbox-disk-size "100000" --virtualbox-cpu-count "8" --virtualbox-memory "32092" docker2
docker-machine start docker2

We then need to tell the docker client where containers should run, i.e. on docker2

eval $(docker-machine env docker2)
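docker-machine env prints export statements for variables such as DOCKER_HOST and DOCKER_MACHINE_NAME, and eval applies them to the current shell. Here is a quick sketch to confirm which machine the shell is pointing at afterwards. The function name docker_target is made up for illustration.

```shell
# docker_target: report where the docker client in this shell will connect.
# Hypothetical helper; it only reads environment variables.
docker_target() {
    if [ -n "${DOCKER_MACHINE_NAME:-}" ]; then
        echo "machine ${DOCKER_MACHINE_NAME} at ${DOCKER_HOST:-unknown}"
    else
        echo "local daemon (no docker-machine environment set)"
    fi
}

docker_target
```

Since the variables only live in the current shell, a new terminal needs the eval line again.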

See <https://docs.docker.com/machine/install-machine/> for more info

Run kaggle/python

You’re now at a point where you can run stuff in the container. Here’s an extra step that will make it super easy: put these lines in your .bashrc file (or the Windows equivalent)

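# Run a Python script from the current directory inside the kaggle/python container
kpython() {
    docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it kaggle/python python "$@"
}
# Start an interactive IPython session in the container
ikpython() {
    docker run -v "$PWD":/tmp/working -w=/tmp/working --rm -it kaggle/python ipython
}
# Start Jupyter in the container and open it in the browser after a short delay
kjupyter() {
    (sleep 3 && open "http://$(docker-machine ip docker2):8888")&
    docker run -v "$PWD":/tmp/working -w=/tmp/working -p 8888:8888 --rm -it kaggle/python jupyter notebook --no-browser --ip="*" --notebook-dir=/tmp/working
}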
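One caveat with kjupyter: open is the macOS command, and on an Ubuntu host the usual equivalent is xdg-open. Here is a sketch of picking the command by platform. The helper name browser_cmd is made up, and it assumes xdg-open (from xdg-utils) is installed on Linux.

```shell
# browser_cmd [OS]: print the command that opens a URL in the default browser.
# OS defaults to the kernel name reported by uname. Hypothetical helper.
browser_cmd() {
    case "${1:-$(uname -s)}" in
        Darwin) echo "open" ;;
        *)      echo "xdg-open" ;;   # assumption: Linux desktop with xdg-utils
    esac
}

echo "would run: $(browser_cmd) http://<machine-ip>:8888"
```

Inside kjupyter, the word open could then be replaced with "$(browser_cmd)".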

Reference

<http://blog.kaggle.com/2016/02/05/how-to-get-started-with-data-science-in-containers/>
