As the doctor gone rogue

September 29, 2011

Print from here to there with “awk”

Filed under: bash, data management, R — hypotheses @ 12:16 pm

This sounds like a common task: you have a long text file, and you only want part of it. For example, I have a file with a structure like this:

HEADER
BODY
++++++++++++++++++++++++++++
CONTENT I WANT TO GET

END OF FILE

Here, the part I want to grab is between the line with “++++++++++++++” and the blank line.


awk '/\+\+/,/^$/' INFILE

With this small awk trick (a range pattern), awk prints every line from the +++ line through the next blank line, both included, to your terminal.
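To see exactly what the range pattern returns, here is a small sketch that rebuilds the example layout in a file. The name sample.txt is just a stand-in for INFILE. Note that both the +++ line and the blank line show up in the output.

```shell
#!/bin/bash
# sample.txt is a hypothetical file that mimics the layout shown above
cat > sample.txt <<'EOF'
HEADER
BODY
++++++++++++++++++++++++++++
CONTENT I WANT TO GET

END OF FILE
EOF

# Range pattern: start printing at the first line matching /\+\+/
# and stop after the next line matching /^$/ (both ends are printed)
awk '/\+\+/,/^$/' sample.txt
```

If another +++ line appears later in the file, awk starts a new range there, so every such block is printed.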

Now you just have to remove the +++ line and the blank line. I do this with the “Stream EDitor”, i.e. sed. The complete command becomes:


awk '/\+\+/,/^$/' INFILE | sed '/\+\+/d;/^$/d'
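If you would rather not pipe into sed, the same extraction can be done in a single awk pass with a flag variable. This is a swapped-in alternative, not the method above; it again assumes the sample.txt layout.

```shell
#!/bin/bash
# sample.txt is a hypothetical copy of the example file
cat > sample.txt <<'EOF'
HEADER
BODY
++++++++++++++++++++++++++++
CONTENT I WANT TO GET

END OF FILE
EOF

# f = 1 once the +++ line is seen ("next" skips printing that line);
# the first blank line resets f = 0; the bare pattern "f" prints a line
# only while f is non-zero, so neither delimiter line is printed.
result=$(awk '/\+\+/ { f = 1; next } /^$/ { f = 0 } f' sample.txt)
echo "$result"
```

This prints only CONTENT I WANT TO GET, the same as the awk | sed pipeline, but reads the file once.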

The same trick can be applied to extract part of a tagged file, such as an XML file. However, it is probably not a very efficient way to parse an XML file by hand, one tag at a time. In R, you can do this more efficiently using RSXML [http://www.omegahat.org/RSXML/]. And if you are interacting with a website, you can easily combine it with RCurl [http://www.omegahat.org/RCurl/].
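As a rough sketch of what the range trick looks like on tagged data, the example below uses sed on a hypothetical books.xml that puts one tag per line. It only works for this simple layout; attributes, nested tags of the same name, or tags split across lines will break it, which is exactly why a real XML parser is the better tool.

```shell
#!/bin/bash
# books.xml is a hypothetical, deliberately simple file: one tag per line
cat > books.xml <<'EOF'
<library>
<book>
<title>First</title>
</book>
<book>
<title>Second</title>
</book>
</library>
EOF

# Same range idea as the awk command: print each <book> ... </book> block
sed -n '/<book>/,/<\/book>/p' books.xml

# Keep only the text between <title> and </title> on each matching line
titles=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/p' books.xml)
echo "$titles"
```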

September 16, 2011

waiting….

Filed under: bash — hypotheses @ 1:06 am

Recently I have been working on several projects that involve waiting for a file to be created before I can proceed to the next step. One option is to monitor the process ID of the submitted script and check whether it has finished. However, the previous job might have finished with an error, without creating the correct output file. So my solution is to have the job create a file containing its output, which I can check to confirm that the previous job ran correctly. Let’s call this file “JOB.WELLDONE”.

Then I have a script running in the background that checks whether “JOB.WELLDONE” exists and, once it does, continues with whatever I planned to do next.
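On the producing side, one way to make sure the marker only appears after a successful run is to write the output to a temporary file and rename it only if the job exits with status 0. In this sketch, run_job and failing_job are hypothetical stand-ins for real jobs, and JOB.tmp is an assumed temporary name.

```shell
#!/bin/bash
# run_job: hypothetical stand-in for the real job; prints its result
run_job() {
  echo "analysis finished"
}

# failing_job: hypothetical job that writes something, then fails
failing_job() {
  echo "partial output"
  return 1
}

rm -f JOB.WELLDONE JOB.tmp FAILED.WELLDONE FAILED.tmp

# mv runs only if the job exits with status 0 (the && operator),
# so the marker is never created after a failure
run_job > JOB.tmp && mv JOB.tmp JOB.WELLDONE

# This job fails: FAILED.tmp is left behind, FAILED.WELLDONE never appears
failing_job > FAILED.tmp && mv FAILED.tmp FAILED.WELLDONE
```

Writing to a temporary name first also means the watcher never sees a half-written marker: within one filesystem, mv is a rename, so the file appears all at once.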


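waiting() {
  # $1 = file to wait for, $2 = seconds between checks
  local file=$1
  local interval=$2
  local count=0
  # Poll until the file exists ([[ ]] needs bash, not plain sh)
  while [[ ! -e $file ]]; do
    # \r returns to the start of the line, so the counter updates in place
    printf '\r %d : Waiting for %s' "$count" "$file"
    sleep "$interval"
    count=$((count + 1))
  done
  # End the in-place status line before the final message
  echo
  echo "$file found. Proceed!!!"
}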
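One catch with this design: if the upstream job fails, the marker never appears and the loop waits forever. A possible variant, sketched below, gives up after a maximum number of checks and returns a non-zero status. The name waiting_with_timeout and its third argument are hypothetical additions, not part of the original function.

```shell
#!/bin/bash
# waiting_with_timeout FILE SECONDS MAX_CHECKS (hypothetical variant)
# Returns 0 when FILE appears, 1 after MAX_CHECKS unsuccessful checks.
waiting_with_timeout() {
  local file=$1 interval=$2 max_checks=$3
  local count=0
  while [[ ! -e $file ]]; do
    if (( count >= max_checks )); then
      echo
      echo "Gave up waiting for $file after $count checks" >&2
      return 1
    fi
    printf '\r %d : Waiting for %s' "$count" "$file"
    sleep "$interval"
    count=$((count + 1))
  done
  echo
  echo "$file found. Proceed!!!"
}

# Example: check every 60 s, give up after 120 checks (about 2 hours)
# waiting_with_timeout JOB.WELLDONE 60 120 || exit 1
```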

Then I include the waiting function at the top of each of these shell scripts. A better option is to make it a separate command in your ~/bin directory, so you can call it again and again.
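A minimal sketch of the ~/bin approach: write the function body out as a standalone bash script and make it executable. The BIN_DIR override and the 60-second default interval are assumptions added for convenience, not part of the original.

```shell
#!/bin/bash
# Target directory; BIN_DIR is a hypothetical override, default ~/bin
bin_dir=${BIN_DIR:-$HOME/bin}
mkdir -p "$bin_dir"

# The quoted 'EOF' keeps $file etc. from being expanded while writing
cat > "$bin_dir/waiting" <<'EOF'
#!/bin/bash
# Usage: waiting FILE [SECONDS]  (SECONDS defaults to 60, an assumed default)
file=$1
interval=${2:-60}
if [[ -z $file ]]; then
  echo "usage: waiting FILE [SECONDS]" >&2
  exit 1
fi
count=0
while [[ ! -e $file ]]; do
  printf '\r %d : Waiting for %s' "$count" "$file"
  sleep "$interval"
  count=$((count + 1))
done
echo
echo "$file found. Proceed!!!"
EOF

chmod +x "$bin_dir/waiting"
```

If ~/bin is not already on your PATH, adding export PATH="$HOME/bin:$PATH" to ~/.bashrc lets any script simply run waiting JOB.WELLDONE 60.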

Here’s an example.

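#!/bin/bash
# bash (not sh) is required: the function uses [[ ]] and $(( ))
waiting() {
  # $1 = file to wait for, $2 = seconds between checks
  local file=$1
  local interval=$2
  local count=0
  while [[ ! -e $file ]]; do
    # \r returns to the start of the line, so the counter updates in place
    printf '\r %d : Waiting for %s' "$count" "$file"
    sleep "$interval"
    count=$((count + 1))
  done
  echo
  echo "$file found. Proceed!!!"
}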

## Wait for JOB.WELLDONE and recheck every 60 seconds

waiting JOB.WELLDONE 60

# ... CONTINUE WITH WHAT YOU WANT TO DO ...
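To run a script like this in the background, as described at the start of this post, you can start it with & and let it pick up the marker whenever the upstream job finishes. The sketch below simulates that end to end: the "next step" is a placeholder that only logs a line, watcher.log is an assumed log name, and the upstream job is faked with sleep and touch.

```shell
#!/bin/bash
waiting() {
  local file=$1 interval=$2 count=0
  while [[ ! -e $file ]]; do
    printf '\r %d : Waiting for %s' "$count" "$file"
    sleep "$interval"
    count=$((count + 1))
  done
  echo
  echo "$file found. Proceed!!!"
}

rm -f JOB.WELLDONE watcher.log

# Watcher: runs in the background (&), waits for the marker,
# then performs a placeholder "next step" that just logs a line
(
  waiting JOB.WELLDONE 1
  echo "next step ran"
) > watcher.log &
watcher_pid=$!

# Simulate the upstream job finishing two seconds later
sleep 2
touch JOB.WELLDONE

# Block until the background watcher has finished
wait "$watcher_pid"
tail -n 1 watcher.log
```

For a real pipeline you might launch the dependent script with nohup so it keeps running after you log out.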
