Print from here to there with “awk”

This does sound like a common thing to do.  You have a length text file that you only want to get some part of it. For example, I have a file that contain a structure like this



Here, the part I want to grab is between the line with “++++++++++++++” and the blank line.

awk '/\+\+/,/^$/' INFILE

With this small awk trick, you request that awk  print the +++ line to the blank line to your terminal.

Now, you just have to remove the +++++++++ and the blank line. I do this with “Stream EDitor” i.e. sed. So the complete lines become something like this…

awk '/\+\+/,/^$/' INFILE | sed '/\+\+/d;/^$/d'

This can really be applied to extract some part of file with tags such as “XML” file. However, it is probably the a very efficient way to parse XML file manually one tag at a time. In R, you can do this more efficiently, using RSXML []. And, if you are interacting with a website, you can easily combining it with RCurl []


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.