Documentation

PROGRAM USAGE (important changes since versions 1.2.x)

Two programs with man pages are available:

- Filtering+Clustering: The user provides a fasta file and the result file(s) of an all-against-all BLAST search in tabular format (-outfmt 6 option in BLAST+, or -m 8 in the older blastall; the columns are query id, subject id, percent identity, alignment length, number of mismatches, number of gap openings, query start, query end, subject start, subject end, Expect value, HSP bit score)

silix [OPTIONS] <FASTAFILE> <BLASTFILE>

To get information or help:

man silix
silix --help
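
The twelve tabular columns listed above can be read into named fields before any further processing. The following Python sketch is only an illustration of that layout, not part of SiLiX; the names BlastHit and parse_blast_line are hypothetical, and it assumes tab-separated lines with no comment lines.

```python
from typing import NamedTuple


class BlastHit(NamedTuple):
    """One line of tabular BLAST output (12 columns, in the order listed above)."""
    query_id: str
    subject_id: str
    percent_identity: float
    alignment_length: int
    mismatches: int
    gap_openings: int
    query_start: int
    query_end: int
    subject_start: int
    subject_end: int
    evalue: float
    bit_score: float


def parse_blast_line(line: str) -> BlastHit:
    """Parse one tab-separated line; raise ValueError if it is malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 columns, got {len(fields)}")
    return BlastHit(
        fields[0], fields[1], float(fields[2]),
        int(fields[3]), int(fields[4]), int(fields[5]),
        int(fields[6]), int(fields[7]), int(fields[8]), int(fields[9]),
        float(fields[10]), float(fields[11]),
    )
```

A file such as blastall.out could then be read line by line with parse_blast_line, for instance to inspect identities or E-values before running silix.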

- Clustering: The user provides as input a list of pairs of sequence IDs

silixx <NB> <FILE>

To get information or help:

man silixx
silixx --help
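
To make the idea of clustering from pairs concrete, here is a minimal Python sketch that groups IDs connected through a chain of pairs, using a union-find structure. It assumes that clustering amounts to finding connected components; it is not claimed to be how silixx is implemented, and the function name cluster_pairs is hypothetical.

```python
def cluster_pairs(pairs):
    """Group IDs into connected components using union-find."""
    parent = {}

    def find(x):
        # Register unseen IDs as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for x in parent:
        clusters.setdefault(find(x), set()).add(x)
    # Sort clusters by their sorted members so the output order is stable.
    return sorted(clusters.values(), key=lambda members: sorted(members))
```

For example, the pairs (a, b), (b, c) and (d, e) yield two groups: one with a, b and c, and one with d and e.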

Running the parallel version of silix

First, the user must have a collection of N blast result files to be processed in parallel.

After running ./configure with the --enable-mpi option, the user runs the program in the usual MPI way:

mpirun -np NP silix [OPTIONS] <FASTAFILE> <MULTIBLASTFILE>

where NP is the chosen number of processors (in practice, NP<=N)

CLASSICAL SKETCH (important changes since versions 1.2.x)

In the following, we use auxiliary programs that are in the utils/ directory of the package, but not installed.

- Blasting all versus all

formatdb -i seq.fasta -n seq.db
blastp -db seq.db -query seq.fasta -outfmt 6 -out blastall.out

or for older versions of Blast:

blastall -p blastp -d seq.db -i seq.fasta -m 8 -o blastall.out

- Running silix. Requires fasta files. The options are the filtering parameters, with the following default values (see Penel et al., BMC Bioinformatics, 2009): two sequences in a pair are included in the same family if the remaining HSPs (High-scoring Segment Pairs) cover at least 80% of the protein length and if their similarity is over 35%; a partial sequence is included if its length is ≥100 amino acids or ≥50% of the length of the complete protein.

silix  seq.fasta blastall.out -f FAM > seq.fnodes

Here, we specified a prefix "FAM" for the family ids.
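
The default thresholds described above can be written as two small predicates. This Python sketch only restates the rule as worded in this section; the function and parameter names are hypothetical, the inputs are assumed to be fractions between 0 and 1 (and lengths in amino acids), and the handling of exact boundary values is an assumption ("at least" read as >=, "over" read as >).

```python
def pair_passes(similarity, coverage, min_similarity=0.35, min_coverage=0.80):
    """True if a pair meets the default thresholds quoted in the text.

    Assumption: 'cover at least 80%' is inclusive, 'over 35%' is strict.
    """
    return coverage >= min_coverage and similarity > min_similarity


def partial_is_accepted(partial_length, complete_length,
                        min_length=100, min_fraction=0.50):
    """True if a partial sequence is long enough to be included.

    Accepted when its length is >= 100 amino acids or >= 50% of the
    length of the complete protein.
    """
    return (partial_length >= min_length
            or partial_length >= min_fraction * complete_length)
```

The actual computation of coverage and similarity from the HSPs is done by silix itself and is not reproduced here.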

Nota Bene: To run the parallel version, the user provides a file listing the blast result files to be processed in parallel:

mpirun -np 4 silix  seq.fasta filenames.txt -f FAM > seq.fnodes

where "filenames.txt" contains

blastall1.out
blastall2.out
blastall3.out
blastall4.out
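
A list file of this kind (one path per line) can be generated rather than typed by hand. The Python sketch below is one possible way to do it; the helper name write_file_list and the glob pattern blastall*.out are assumptions made for the example.

```python
from pathlib import Path


def write_file_list(blast_files, list_path):
    """Write one BLAST result path per line; return the number of files listed."""
    paths = [str(p) for p in blast_files]
    if not paths:
        raise ValueError("no BLAST result files given")
    Path(list_path).write_text("\n".join(paths) + "\n")
    return len(paths)


if __name__ == "__main__":
    # Hypothetical usage: list every blastall*.out in the current directory.
    found = sorted(Path(".").glob("blastall*.out"))
    if found:
        count = write_file_list(found, "filenames.txt")
        print(f"{count} files listed; use at most that many MPI processes")
```

The returned count corresponds to N above, so it gives an upper bound for the value passed to mpirun -np.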

- Retrieving family sizes

utils/silix-fsize seq.fnodes > seq.fsize
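
For readers who want to compute family sizes in their own scripts, counting the lines per family id in a .fnodes file is enough. This Python sketch is not the silix-fsize program, and its output is not claimed to match that program's format; the function name family_sizes is hypothetical, and it assumes two whitespace-separated columns (family id, sequence id) per line.

```python
from collections import Counter


def family_sizes(fnodes_lines):
    """Count sequences per family from lines of the form 'FAMID SEQID'."""
    sizes = Counter()
    for line in fnodes_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got {len(fields)}")
        family_id, _seq_id = fields
        sizes[family_id] += 1
    return sizes
```

Calling family_sizes(open("seq.fnodes")) returns a Counter, so sizes.most_common() lists the families from largest to smallest.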

- Splitting sequences in multiple fasta files

utils/silix-split seq.fasta seq.fnodes
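
The splitting step can be pictured as routing each fasta record to the family of its sequence id. The Python sketch below illustrates this in memory; it is not the silix-split program and makes no claim about that program's output file names. It assumes the sequence id is the first whitespace-delimited token of the header line, and the name split_fasta_by_family and the seq_to_family mapping are hypothetical.

```python
def split_fasta_by_family(fasta_lines, seq_to_family):
    """Return {family_id: [fasta lines]} for sequences found in seq_to_family.

    Records whose id has no family are skipped.
    """
    records = {}
    current_family = None
    for line in fasta_lines:
        if line.startswith(">"):
            # Assumption: the id is the first token after '>'.
            tokens = line[1:].split()
            seq_id = tokens[0] if tokens else ""
            current_family = seq_to_family.get(seq_id)
        if current_family is not None:
            records.setdefault(current_family, []).append(line)
    return records
```

Each list in the result can then be written to its own file, one file per family.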

- With the --net option, a file blastall.net is created, containing all the pairs taken into account after filtering. A message is sent to STDERR:

Inflating blastall.net

Naming Conventions

".net" is the extension for files in the format

SEQID1 SEQID2
SEQID3 SEQID4

".fnodes" is the extension for files in the format

FAMID1 SEQID1
FAMID1 SEQID2
FAMID2 SEQID3
FAMID2 SEQID4
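
Both formats are two-column text files, so one small reader can serve for either. The Python sketch below assumes whitespace-separated columns and uses hypothetical helper names (read_two_columns, families_from_fnodes); it is an illustration, not part of the SiLiX package.

```python
def read_two_columns(lines):
    """Parse '.net' or '.fnodes' style lines into a list of (first, second) tuples."""
    rows = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 columns, got {len(fields)}")
        rows.append((fields[0], fields[1]))
    return rows


def families_from_fnodes(lines):
    """Map each family id to the list of its sequence ids, in file order."""
    families = {}
    for family_id, seq_id in read_two_columns(lines):
        families.setdefault(family_id, []).append(seq_id)
    return families
```

Applied to the four ".fnodes" lines shown above, families_from_fnodes groups SEQID1 and SEQID2 under FAMID1, and SEQID3 and SEQID4 under FAMID2.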