subject: cluster/distance for unre
I want to use some sort of clustering method (multidimensional scaling
or stochastic proximity embedding, for instance) to group nucleotide
sequences into clouds of similar sequences on a 2-D plot. These methods
require a "dissimilarity matrix", which as far as I can tell is the same
as a distance matrix (high scores mean less similarity).
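
To make the "dissimilarity matrix in, 2-D coordinates out" step concrete, here is a minimal sketch of classical (Torgerson) multidimensional scaling in plain Python. The function name `classical_mds` and the power-iteration eigen-solver are illustrative assumptions, not the interface of any particular package, and the approach assumes a symmetric matrix with a zero diagonal. For 700+ sequences a numerical library would be the practical choice.

```python
import math
import random


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Embed n points in `dims` dimensions from a symmetric n x n
    dissimilarity matrix (list of lists, zero diagonal)."""
    n = len(dist)
    # Squared dissimilarities.
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (D is symmetric, so
    # column means equal row means).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Power iteration for the dominant eigenvector of b.
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the matching eigenvalue.
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))
        # Non-Euclidean dissimilarities can yield negative eigenvalues;
        # such an axis carries no usable coordinate and is left at zero.
        scale = math.sqrt(lam) if lam > 0 else 0.0
        for i in range(n):
            coords[i][d] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Three points whose pairwise distances form a 3-4-5 right triangle.
example = [[0.0, 3.0, 4.0],
           [3.0, 0.0, 5.0],
           [4.0, 5.0, 0.0]]
points = classical_mds(example)
```

When the input distances are exactly Euclidean, as in the triangle above, the embedded points reproduce them; with BLAST-derived dissimilarities the 2-D plot is only an approximation.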
I have a set of 700+ sequences that I want to group this way, but:
1. the set contains some homologous groups,
2. these groups are unrelated to one another, and
3. the sequences are of different lengths
If the sequences were related and could be trimmed to the same length, I
would do an alignment and then use phylip to create a distance matrix,
but since my sequences are unrelated and cannot really be trimmed to the
same length, I am at a loss for what to do.
For a set with so many unrelated sequences of different lengths, the
only thing I have been able to think of is an all-against-all BLAST to
create a score matrix using the normalised bit score, but this gives
high scores to similar pairs, which is the opposite of a distance. From
there, the only thought I had was to use the reciprocal of the BLAST
score as some perverse measure of distance.
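
One possible alternative to the plain reciprocal, sketched here under stated assumptions: divide each pairwise bit score by the smaller of the two self-hit scores and subtract the ratio from 1. This keeps every value between 0 and 1 and gives pairs with no BLAST hit the maximum distance of 1, whereas a reciprocal is undefined for a score of 0. The function name `scores_to_distances` and the input layout (a dict keyed by `(query, subject)` that includes self-hits) are hypothetical choices for illustration, and the result is not guaranteed to be a true metric.

```python
def scores_to_distances(ids, bit_scores):
    """Turn all-against-all bit scores into a symmetric dissimilarity matrix.

    ids        -- list of sequence identifiers (defines row/column order)
    bit_scores -- dict mapping (query, subject) -> bit score; must contain
                  the self-hit (x, x) for every identifier
    """
    n = len(ids)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            qa, qb = ids[a], ids[b]
            # BLAST scores are not symmetric; keep the better direction.
            # A missing pair means no hit was reported, so score 0.
            score = max(bit_scores.get((qa, qb), 0.0),
                        bit_scores.get((qb, qa), 0.0))
            # Normalise by the smaller self-score so that an identical
            # pair scores a ratio of 1.
            self_min = min(bit_scores[(qa, qa)], bit_scores[(qb, qb)])
            ratio = min(score / self_min, 1.0)
            dist[a][b] = dist[b][a] = 1.0 - ratio
    return dist


# Made-up scores for three sequences; "c" has no hit against the others.
ids = ["a", "b", "c"]
scores = {
    ("a", "a"): 100.0, ("b", "b"): 200.0, ("c", "c"): 50.0,
    ("a", "b"): 50.0, ("b", "a"): 40.0,
}
matrix = scores_to_distances(ids, scores)
```

The resulting matrix is symmetric with a zero diagonal, which is the form that MDS-style methods expect.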
Any ideas?
please email to isen AT plantpath DOT wisc DOT edu
Cheers,
Tom Isenbarger
---
* RgateImp.MoonDog.BBS at 12/9/04 5:57:53 AM
* Origin: MoonDog BBS, Brooklyn, NY, 718 692-2498, 1:278/230 (1:278/230)
SOURCE: echomail via fidonet.ozzmosis.com