echo: evolution
to: All
from: Thomas Isenbarger
date: 2004-12-09 05:57:00
subject: cluster/distance for unre

I want to use some sort of clustering or embedding method (multidimensional 
scaling or stochastic proximity embedding, for instance) to group nucleotide 
sequences into clouds of similar sequences on a 2-D plot.  These methods 
require a "dissimilarity matrix", which as far as I can tell is the same 
as a distance matrix (high scores mean less similarity).
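
For anyone who wants to see what "dissimilarity matrix in, 2-D plot out" looks like in practice, here is a minimal sketch of classical (Torgerson) multidimensional scaling in plain Python. It is only an illustration, not a recommendation of one method over another: the function name classical_mds and its parameters are made up for this example, the input is assumed to be a symmetric matrix with zeros on the diagonal, and the eigenvectors are found by simple power iteration rather than a proper numerical library.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed a symmetric dissimilarity matrix (list of lists) in `dims` dimensions."""
    n = len(dist)
    # Double-centre the squared dissimilarities: B = -1/2 * J * D^2 * J.
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        # Power iteration for the dominant eigenvector of the (deflated) B.
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        # The Rayleigh quotient estimates the eigenvalue belonging to v.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(x * y for x, y in zip(v, bv)) / sum(x * x for x in v)
        # Only positive eigenvalues yield real coordinates; a non-positive
        # one (possible for non-Euclidean dissimilarities) leaves zeros.
        if lam > 1e-9:
            for i in range(n):
                coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate B so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords
```

Calling classical_mds(d) on an n-by-n matrix d returns n pairs of plot coordinates. The pure-Python loops are fine for a handful of sequences but would be slow for several hundred, where an eigen-solver from a numerical library would be the more sensible choice.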

I have a set of 700+ sequences that I want to group this way, but the 
set:

1.  contains some homologous groups, but
2.  these groups are unrelated to one another, and
3.  the sequences are of different lengths

If the sequences were related and could be trimmed to the same length, I 
would do an alignment and then use PHYLIP to create a distance matrix, 
but since my sequences are unrelated and cannot really be trimmed to the 
same length, I am at a loss for what to do.

For a set with so many unrelated sequences of different lengths, the 
only thing I have been able to think of is an all-against-all BLAST to 
create a score matrix using the normalised bit score, but this gives 
high scores for similar pairs, which is the opposite of a distance.  
From there, the only thought I had was to use the reciprocal of the 
BLAST score as some perverse measure of distance.
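
One possible way to turn such a score matrix into distances is sketched below. Note that this is a swapped-in alternative to the reciprocal, not something taken from BLAST or PHYLIP: each pairwise bit score is divided by the smaller of the two self-hit scores and subtracted from 1, so identical sequences land at 0 and pairs with no hit at all land at 1 (where a reciprocal would be undefined). The function name scores_to_distances and the layout of the hits dictionary are assumptions made for the example; the scores would have to be parsed from the BLAST output first.

```python
def scores_to_distances(ids, hits):
    """Convert all-against-all bit scores into a symmetric distance matrix.

    ids  : list of sequence identifiers
    hits : dict mapping (query, subject) -> bit score; a missing pair
           is treated as "no hit" (score 0). Every sequence must have
           its self-hit (s, s) present, otherwise a KeyError is raised.
    """
    n = len(ids)
    self_score = {s: hits[(s, s)] for s in ids}
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = ids[i], ids[j]
            # BLAST scores are not symmetric, so take the better of the
            # two directions (using the mean would be another choice).
            score = max(hits.get((a, b), 0.0), hits.get((b, a), 0.0))
            # Normalise by the smaller self-score so that sequences of
            # different lengths are put on a comparable 0..1 scale.
            ratio = score / min(self_score[a], self_score[b])
            d = 1.0 - min(ratio, 1.0)
            dist[i][j] = dist[j][i] = d
    return dist
```

The resulting matrix is symmetric with a zero diagonal, which is the shape that multidimensional scaling and similar methods expect. Whether it behaves like a true metric for a given data set is a separate question that this sketch does not settle.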

Any ideas?

please email to isen AT plantpath DOT wisc DOT edu

Cheers,
Tom Isenbarger
* Origin: MoonDog BBS, Brooklyn,NY, 718 692-2498, 1:278/230 (1:278/230)

SOURCE: echomail via fidonet.ozzmosis.com