Documentation
PROGRAM USAGE (important changes since versions 1.2.x)
Two programs with man pages are available:
Filtering+Clustering: The user provides a FASTA file and the result file(s) of an all-against-all BLAST search in tabular format (-outfmt 6 option in BLAST+, or -m 8 in the older blastall; i.e. query id, subject id, percent identity, alignment length, number of mismatches, number of gap openings, query start, query end, subject start, subject end, Expect value, HSP bit score)
silix [OPTIONS] <FASTAFILE> <BLASTFILE>
To get information or help:
man silix
silix --help
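To illustrate the 12-column tabular format listed above, here is a minimal Python sketch that parses one line into named fields. It assumes tab-separated columns in exactly the order given above; the names BlastHit and parse_blast_line are hypothetical and are not part of the SiLiX package.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One line of tabular BLAST output (hypothetical helper type)."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float


def parse_blast_line(line):
    """Parse one tab-separated line with the 12 columns listed above."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError("expected 12 columns, got %d" % len(fields))
    return BlastHit(
        fields[0], fields[1], float(fields[2]),
        int(fields[3]), int(fields[4]), int(fields[5]),
        int(fields[6]), int(fields[7]), int(fields[8]), int(fields[9]),
        float(fields[10]), float(fields[11]),
    )
```

A line with a different number of columns raises ValueError, which helps catch BLAST output written in another format.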
Clustering: The user provides as input a list of pairs of sequence IDs
silixx <NB> <FILE>
To get information or help:
man silixx
silixx --help
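Clustering from a list of pairs can be pictured as transitive grouping: any two IDs connected by a chain of pairs end up in the same cluster. The Python sketch below does this with a union-find structure. It is only an illustration under that assumption, not the silixx implementation, and cluster_pairs is a hypothetical name.

```python
def cluster_pairs(pairs):
    """Group IDs into clusters by transitive closure over (id1, id2) pairs."""
    parent = {}

    def find(x):
        # Locate the representative of x, then compress the path to it.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            nxt = parent[x]
            parent[x] = root
            x = nxt
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    # Sorted output so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())
```

For example, the pairs (SEQID1, SEQID2) and (SEQID2, SEQID5) place SEQID1, SEQID2 and SEQID5 in one cluster even though SEQID1 and SEQID5 are never paired directly.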
Running the parallel version of silix
First, the user must have a collection of N blast result files to be processed in parallel.
After running ./configure with the option --enable-mpi, the user runs the program in the usual way for MPI programs:
mpirun -np NP silix [OPTIONS] <FASTAFILE> <MULTIBLASTFILE>
where NP is the chosen number of processors (in practice, NP <= N)
TYPICAL WORKFLOW (important changes since versions 1.2.x)
In the following, we use auxiliary programs that are in the utils/ directory of the package but are not installed.
Blasting all versus all
formatdb -i seq.fasta -n seq.db
blastp -db seq.db -query seq.fasta -outfmt 6 -out blastall.out
or for older versions of Blast:
blastall -p blastp -d seq.db -i seq.fasta -m 8 -o blastall.out
Running silix. Requires FASTA files. The options are the filtering parameters, with the following default values (see Penel et al., BMC Bioinformatics, 2009):
two sequences in a pair are included in the same family if the remaining HSPs (High-scoring Segment Pairs) cover at least 80% of the protein length and if their similarity is over 35%; a partial sequence is included if its length is ≥100 amino acids or ≥50% of the length of the complete protein.
silix seq.fasta blastall.out -f FAM > seq.fnodes
Here, we specified a prefix "FAM" for the family ids.
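The default filtering thresholds quoted above can be written out as simple checks. The Python sketch below is an illustration only: how silix actually selects and combines HSPs, and on which sequence the coverage is measured, is not described here, so hsp_coverage (union of 1-based inclusive intervals), passes_default_filter and partial_sequence_ok are hypothetical helpers built on those assumptions.

```python
def hsp_coverage(intervals, seq_length):
    """Fraction of a sequence covered by the union of (start, end) intervals.

    Assumes 1-based inclusive coordinates, as in tabular BLAST output.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(intervals):
        start = max(start, last_end + 1)  # skip the part already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / seq_length


def passes_default_filter(coverage, similarity, min_cov=0.80, min_sim=0.35):
    """Coverage of at least 80% and similarity over 35% (both as fractions)."""
    return coverage >= min_cov and similarity > min_sim


def partial_sequence_ok(partial_length, complete_length,
                        min_length=100, min_fraction=0.50):
    """Partial sequence kept if >= 100 residues or >= 50% of the complete length."""
    return (partial_length >= min_length
            or partial_length >= min_fraction * complete_length)
```

For instance, two HSPs spanning positions 1-50 and 40-90 of a 100-residue protein cover 90% of it, which is above the 80% default.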
Nota Bene: To run the parallel version, the user provides a file listing the BLAST result files to be processed in parallel:
mpirun -np 4 silix seq.fasta filenames.txt -f FAM > seq.fnodes
where "filenames.txt" contains
blastall1.out
blastall2.out
blastall3.out
blastall4.out
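The list file shown above can be generated rather than typed by hand. The following Python sketch writes one matching file name per line and builds the mpirun command from the example, capping NP at the number of files N. The helper names write_file_list and mpirun_command and the glob pattern are assumptions made for illustration; they are not part of the package.

```python
import glob


def write_file_list(pattern, list_path):
    """Write the sorted file names matching pattern, one per line; return N."""
    paths = sorted(glob.glob(pattern))
    with open(list_path, "w") as handle:
        for path in paths:
            handle.write(path + "\n")
    return len(paths)


def mpirun_command(n_files, requested_np, fasta="seq.fasta",
                   file_list="filenames.txt", prefix="FAM"):
    """Build the mpirun argument list, keeping NP <= N."""
    if n_files < 1:
        raise ValueError("at least one BLAST result file is needed")
    np_used = max(1, min(requested_np, n_files))
    return ["mpirun", "-np", str(np_used), "silix",
            fasta, file_list, "-f", prefix]
```

With four result files, write_file_list("blastall*.out", "filenames.txt") returns 4, and mpirun_command(4, 4) gives the arguments of the command line shown above (without the output redirection).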
Retrieving family sizes
utils/silix-fsize seq.fnodes > seq.fsize
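For readers who want to see what counting family sizes amounts to, here is a minimal Python sketch that tallies the members of each family from lines in the "FAMID SEQID" layout. The output format of silix-fsize itself is not shown here, so this only illustrates the counting; family_sizes is a hypothetical name.

```python
from collections import Counter


def family_sizes(fnodes_lines):
    """Count sequences per family from 'FAMID SEQID' lines."""
    sizes = Counter()
    for line in fnodes_lines:
        fields = line.split()
        if len(fields) >= 2:  # ignore blank or malformed lines
            sizes[fields[0]] += 1
    return sizes
```

It can be applied to an open file, for example family_sizes(open("seq.fnodes")).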
Splitting sequences in multiple fasta files
silix-split seq.fasta seq.fnodes
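The idea behind this step can be sketched in Python as grouping FASTA records by the family assigned to each sequence ID. The naming and layout of the files that silix-split actually writes are not described here; the hypothetical function below simply returns the FASTA text per family and assumes the sequence ID is the first word after ">" in each header.

```python
def split_fasta_by_family(fasta_text, fnodes_text):
    """Return {family id: FASTA text} for sequences listed in the .fnodes data."""
    seq_to_family = {}
    for line in fnodes_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            seq_to_family[fields[1]] = fields[0]

    families = {}
    current = None  # family of the record being read, if any
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            current = seq_to_family.get(seq_id)
        if current is not None:
            families.setdefault(current, []).append(line)
    return {fam: "\n".join(lines) + "\n" for fam, lines in families.items()}
```

Sequences that do not appear in the .fnodes data are skipped in this sketch.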
With the --net option, a file blastall.net is created that contains all the pairs taken into account after filtering. A message is sent to STDERR:
Inflating blastall.net
Naming Conventions
".net" is the extension for files in the following format:
SEQID1 SEQID2
SEQID3 SEQID4
".fnodes" is the extension for files in the following format:
FAMID1 SEQID1
FAMID1 SEQID2
FAMID2 SEQID3
FAMID2 SEQID4



