blastn Parameter Sets


Parameter sets for blastn, with notes. Please refer to Posting 4 ("blastn Parameters for noncoding queries").

What follows are a number of parameter sets that I have explored (primarily for spliceosomal snRNAs). However, most of these can no longer be used at NCBI. As of the end of 2005, NCBI allows only a limited range of parameter combinations (Blast news 2005/12/06). The trade-off is that the E values are more meaningful.

The relevant options under the new constraints are probably these:

-r 5 -q -4 -G 10 -E 6 and -r 5 -q -4 -G 8 -E 6, which are identical or similar to F (the second lowers -G from 10 to 8)

and

-r 5 -q -4 -G 25 -E 10, which serves the purpose of A (for alignments with few gaps)

If those parameters yield too many hits (a risk especially with high-AT queries), try these:

-r 4 -q -5 -G 6 -E 5 and -r 4 -q -5 -G 12 -E 8, which differ only in gap penalties (see E and H below).
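The option strings above use the single-letter flags of the legacy blastall program (-r match reward, -q mismatch penalty, -W word size, -G gap-opening cost, -E gap-extension cost). If you run a local BLAST+ installation instead, the corresponding blastn options are, as far as I know, -reward, -penalty, -word_size, -gapopen and -gapextend. The following is a minimal sketch of that translation; the function name and the error handling are my own illustrative choices, and it does not check whether a given reward/penalty/gap combination is actually accepted by your BLAST version.

```python
import shlex
from typing import List

# Legacy blastall single-letter flag -> BLAST+ blastn long option.
# Only the five flags used in this posting are covered.
LEGACY_TO_BLAST_PLUS = {
    "-r": "-reward",      # match reward
    "-q": "-penalty",     # mismatch penalty (negative)
    "-W": "-word_size",   # word size
    "-G": "-gapopen",     # cost to open a gap
    "-E": "-gapextend",   # cost to extend a gap
}


def translate_options(legacy: str) -> List[str]:
    """Translate e.g. '-r 5 -q -4 -G 10 -E 6' into BLAST+ style arguments."""
    tokens = shlex.split(legacy)
    if len(tokens) % 2 != 0:
        raise ValueError("expected flag/value pairs, got: %r" % legacy)
    translated = []
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if flag not in LEGACY_TO_BLAST_PLUS:
            raise ValueError("unknown flag: %s" % flag)
        int(value)  # raises ValueError if the value is not an integer
        translated += [LEGACY_TO_BLAST_PLUS[flag], value]
    return translated


if __name__ == "__main__":
    print(" ".join(translate_options("-r 5 -q -4 -G 10 -E 6")))
    # prints: -reward 5 -penalty -4 -gapopen 10 -gapextend 6
```

The returned list can be appended to whatever blastn command line you already use for your query and database.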

A -r 10 -q -11 -W 8 -G 100 -E 50

B -r 10 -q -11 -W 7 -G 5 -E 20

C -r 10 -q -11 -W 7 -G 15 -E 4

D -r 7 -q -14 -W 7 -G 7 -E 3

E -r 4 -q -5 -W 8 -G 10 -E 2

F -r 5 -q -4 -W 7 -G 10 -E 6

G -r 8 -q -5 -W 7 -G 20 -E 5

H -r 7 -q -8 -W 7 -G 15 -E 2

I -r 8 -q -7 -W 7 -G 16 -E 2

N -r 1 -q -3 -W 11 -G 5 -E 2


Notes (parameters are listed above, comments are provided below):

A Use these if you are sure that there will be no gaps in your alignment.

B Use these to permit only very short gaps.

C Standard snRNA parameters (similar to G and H).

D Very low gap penalties, but a high mismatch penalty.

E

F My recommendation (start here). When the Arabidopsis genome is searched with human U12 snRNA using these parameters, the correct top hit is found, and the ratio between the E value of the true hit and that of the top false hit is approximately 300.

G Chau's initial set. These are promiscuous (extreme r/q). For example, when the Arabidopsis genome is searched with human U12 snRNA using these parameters, the correct top hit is found, but many random hits score nearly as well (the ratio between the E value of the true hit and that of the top false hit is only 10).

H Chau's "best" set. These are also promiscuous (low G and especially low E). When the Arabidopsis genome is searched with human U12 snRNA using these parameters, the correct top hit is again found, but many random hits score nearly as well (the ratio between the E value of the true hit and that of the top false hit is only 10).

I Very similar to H. I have recommended these defaults to a lot of people over the years. However, as of the end of 2005 NCBI allows only a limited range of parameter combinations (Blast news 2005/12/06), so these parameters cannot be used there (try F instead). The trade-off is that the E values are more meaningful.

N NCBI defaults. You will never find noncoding RNA matches using these parameters.
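To make the "extreme r/q" remark under G concrete, here is a small sketch that computes the reward-to-penalty ratio for each set listed above and ranks the sets by it. The dictionary layout and the function name are illustrative assumptions, and taking r divided by the absolute value of q is only one way to read "r/q". It is a descriptive number, not a predictor of how specific a search will be.

```python
# Parameter sets copied from the list above (legacy blastall flag letters).
PARAMETER_SETS = {
    "A": {"r": 10, "q": -11, "W": 8, "G": 100, "E": 50},
    "B": {"r": 10, "q": -11, "W": 7, "G": 5, "E": 20},
    "C": {"r": 10, "q": -11, "W": 7, "G": 15, "E": 4},
    "D": {"r": 7, "q": -14, "W": 7, "G": 7, "E": 3},
    "E": {"r": 4, "q": -5, "W": 8, "G": 10, "E": 2},
    "F": {"r": 5, "q": -4, "W": 7, "G": 10, "E": 6},
    "G": {"r": 8, "q": -5, "W": 7, "G": 20, "E": 5},
    "H": {"r": 7, "q": -8, "W": 7, "G": 15, "E": 2},
    "I": {"r": 8, "q": -7, "W": 7, "G": 16, "E": 2},
    "N": {"r": 1, "q": -3, "W": 11, "G": 5, "E": 2},
}


def reward_penalty_ratio(params):
    """Match reward divided by the magnitude of the mismatch penalty."""
    return params["r"] / abs(params["q"])


# Rank the sets from the highest ratio to the lowest.
ranked = sorted(
    PARAMETER_SETS,
    key=lambda name: reward_penalty_ratio(PARAMETER_SETS[name]),
    reverse=True,
)

if __name__ == "__main__":
    for name in ranked:
        ratio = reward_penalty_ratio(PARAMETER_SETS[name])
        print("%s  r/|q| = %.2f" % (name, ratio))
```

On these numbers G has the highest ratio (8/5 = 1.6) and the NCBI defaults N the lowest (1/3).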

Good luck! If you have experience that bears on this, or can cite relevant literature, please let me know and I'll update this posting.



Copyright 2005 Steve Mount. This is Posting 5, which refers to Posting 4
Please cite http://www.SteveMount.com/Posting0004.html (posting 4, which links to this one).
This posting may be revised in the future but this version will remain available at http://www.SteveMount.com/Posting0005.04.html