Biological Sequence Analysis (1)

NHGRI started its 2012 lecture series, Current Topics in Genome Analysis, two weeks ago. More information is available online, and YouTube videos of the lectures are also available for you to watch. This week’s lecture is “Biological Sequence Analysis” by Andy Baxevanis, and my notes on the talk are summarized here. The talk covers biological sequence alignment and the tools and algorithms behind it, including BLAST. It is a pretty good lecture if you have been away from BLAST for a while, and a good introduction for people who are new to genetics.

As a rule of thumb, here is the general idea of what you should remember. When you do local sequence alignment, you will encounter several matrices for scoring sequence similarity.

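–          Several alignment scoring matrices exist, e.g. the PAM and BLOSUM families (BLOSUM62 is a common default). For BLOSUM, the number following the name is roughly the maximum percent identity of the sequences used to build the matrix, so to look for more distantly related sequences, use a BLOSUM matrix with a lower number. PAM numbering runs the other way: a higher PAM number corresponds to greater evolutionary distance.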

–          Gaps: as a rule of thumb, a local alignment should allow roughly one gap in every 20 residues, and not many more than that.

–          The results returned by BLAST are those that passed the scoring threshold. Passing the threshold does not by itself imply statistical significance, although some of the returned results will be statistically significant.

–          To assess statistical significance, BLAST uses the “Karlin-Altschul equation”, which gives a normalized value as a function of the alignment score, the number of letters in the query, and the number of letters in the database (the product of the last two is the size of the search space). This “E-value” represents the number of false positives expected by chance, and you want it to be as low as possible.

  • Look for E < 10^-6 for nucleotide BLAST
  • Look for E < 10^-3 for protein BLAST

–          As a reference for the human genome, RefSeq is a good starting place for BLAST. RefSeq provides a single reference sequence for each molecule of the central dogma (DNA, mRNA, protein). The database is non-redundant, curated, and updated to reflect current knowledge of sequence data and biology.

–          Options to consider changing

  • Expect threshold: set the E-value cutoff as suggested above.
  • Matrix: change this to reflect how similar the sequences you want to find are.
  • Filter: always filter out low-complexity regions, e.g. homopolymeric runs. These regions can inflate the apparent significance of the results (more false positives).

–          Identities: for protein-based searches, look for at least 25% identity. For nucleotide searches, look for sequences with at least 75% identity!

–         BLAT (BLAST-Like Alignment Tool) is the tool for finding the location of an unknown sequence or gene, e.g. an exon, intron, promoter, or unknown region, in the genome. It is much faster than BLAST and can find exact matches down to L=33. When you are looking for sequence fragments or unknown genes, BLAT is a good tool for locating them in the genome. BLAT is available on the UCSC Genome Browser.
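
To make the E-value note concrete, here is a minimal Python sketch of the Karlin-Altschul calculation, E = K·m·n·e^(−λS). This is my own illustration, not something shown in the lecture. The function names are hypothetical, and the λ and K values in the demo are assumed, illustrative numbers; the real values depend on the scoring matrix and gap penalties and are reported by BLAST in its search statistics. The cutoffs are the rule-of-thumb thresholds from the notes, taken here as 10^-6 for nucleotide and 10^-3 for protein searches.

```python
import math


def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance alignments scoring at least raw_score.

    Karlin-Altschul: E = K * m * n * exp(-lambda * S), where m is the query
    length and n is the total database length (m * n is the search space).
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalize a raw score so it no longer depends on lambda and K."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_bits(bits, query_len, db_len):
    """Same E-value, computed from the bit score: E = m * n * 2^(-S')."""
    return query_len * db_len * 2.0 ** (-bits)


def passes_rule_of_thumb(e_value, search_type):
    """Apply the rule-of-thumb cutoffs from the notes above."""
    cutoffs = {"nucleotide": 1e-6, "protein": 1e-3}
    return e_value < cutoffs[search_type]


if __name__ == "__main__":
    # Assumed, illustrative parameters; not taken from a real BLAST report.
    lam, k = 0.267, 0.041
    query_len, db_len = 250, 1_000_000_000
    for score in (60, 100, 140):
        e = expect_value(score, query_len, db_len, lam, k)
        print(score, round(bit_score(score, lam, k), 1), e,
              passes_rule_of_thumb(e, "protein"))
```

Two things fall out of the formula: a higher score lowers E exponentially, and searching a larger database raises E in proportion, which is why the same hit can look less convincing against a bigger database.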


Error running SAS 9.2 in Windows 7 – non-administrator user

I recently installed SAS 9.2 on a Windows 7 machine. I used the administrator account to install SAS, believing that I had chosen the option that allows everyone on the computer to use it (if such an option exists). However, when I tried to run SAS from my regular (everyday) user account, I got this error message:

“User does not have appropriate authorization level for library SASUSER”

I looked for a solution on the internet and landed on Larry’s thing page. Larry wrote about a problem with SAS 9.2 when running in non-administrator mode. You can read the original post here.

“So basically the problem is that the sasv9.cfg file has the “MYSASFILES” and SASUSER variable set to the administrator account folders during the installation, not to some generally accessible location.   The easy fix is to create a folder , i.e. c:\sas , make it fully writable by your users, then modify the sasv9.cfg file (located: “C:\Program Files\SAS9_2\SASFoundation\9.2\nls\en\SASV9.CFG” )

Look for MYSASFILES and SASUSER and change it to c:\sas”

The problem for me is that there is no “c:\sas”. So I guessed that the problem might lie with some other user-specific folder. When I looked inside C:\Users\[ADMIN]\Documents, I saw an extra folder, “My SAS Files(32)”, whereas under my own C:\Users\Bhoom\Documents there was no such folder.

The fix I did was to copy the whole \My SAS Files(32) folder from the Admin account to my local account. This seems to have fixed the problem: SAS started up normally without any additional problems.
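
I did the copy by hand, but the same step could be scripted. Below is a minimal Python sketch of it, under a few assumptions: the function name is hypothetical, the folder is called “My SAS Files(32)” as it was on my machine, and the script runs under an account that can read the administrator's Documents folder. It refuses to overwrite a folder that already exists in the destination.

```python
import shutil
from pathlib import Path


def copy_sas_folder(admin_documents, user_documents,
                    folder_name="My SAS Files(32)"):
    """Copy the SAS user folder from one Documents directory to another.

    Returns the destination path, or None if the destination already
    exists (in which case nothing is copied or overwritten).
    """
    source = Path(admin_documents) / folder_name
    destination = Path(user_documents) / folder_name
    if not source.is_dir():
        raise FileNotFoundError(f"No such folder: {source}")
    if destination.exists():
        return None
    # copytree copies the folder and everything inside it.
    shutil.copytree(source, destination)
    return destination


# Example call (placeholder account names, substitute your own):
# copy_sas_folder(r"C:\Users\ADMIN\Documents", r"C:\Users\Bhoom\Documents")
```

This only reproduces the workaround described above; it does not touch sasv9.cfg, so the MYSASFILES and SASUSER settings stay as the installer left them.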