Documentation

PROGRAM USAGE

(see also the Tutorial section)

- THE EASY WAY (most users) :

To get information or help :

hifix --help

The user provides a FASTA file FASTAFILE (*fewer than 256 characters per line* and no "U" character, which must be replaced by "X"), the SiLiX clustering result file FNODESFILE, and the network file NETFILE obtained with SiLiX :

hifix <FASTAFILE> <NETFILE> <FNODESFILE>
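The two constraints on FASTAFILE (line length and no "U") can be checked and fixed before running hifix. The following is a minimal sketch, not part of the HiFiX package: the function name sanitize_fasta and the wrap width of 80 are assumptions chosen for illustration, and header lines are passed through unchanged.

```python
def sanitize_fasta(lines, width=80):
    """Yield FASTA lines with 'U' replaced by 'X' in sequences and with
    sequence lines re-wrapped to `width` characters (assumed value, < 256).

    Header lines (starting with '>') are yielded unchanged.
    """
    chunks = []  # pieces of the sequence currently being read

    def flush():
        # Join the buffered sequence and emit it in fixed-width lines.
        seq = "".join(chunks)
        chunks.clear()
        for start in range(0, len(seq), width):
            yield seq[start:start + width]

    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            yield from flush()
            yield line
        elif line:
            chunks.append(line.replace("U", "X"))
    yield from flush()
```

A possible use is to write the cleaned records to a new file, for example with `out.write("\n".join(sanitize_fasta(open("seq.fasta"))) + "\n")`, and pass that file to hifix.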

On a multiprocessor machine with NBPROCS cores, it is worth specifying the -t option :

hifix -t <NBPROCS> ...

If a /dev/shm directory (shared memory) is available, using it is recommended for better performance :

hifix -d /dev/shm ...
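The options above can be combined in one command line. As a hedged sketch, the hypothetical helper build_hifix_command below assembles the argument list, adding -t only when more than one process is requested and -d only when the shared-memory directory actually exists. It only builds the command and does not run it.

```python
import os


def build_hifix_command(fasta, net, fnodes, nprocs=None, shm_dir="/dev/shm"):
    """Return the hifix argument list for the given input files.

    nprocs  : number of cores to pass with -t (omitted if None or 1)
    shm_dir : directory to pass with -d, used only if it exists
    """
    cmd = ["hifix"]
    if nprocs and nprocs > 1:
        cmd += ["-t", str(nprocs)]
    if shm_dir and os.path.isdir(shm_dir):
        cmd += ["-d", shm_dir]
    # Positional arguments, in the order documented above.
    cmd += [fasta, net, fnodes]
    return cmd
```

For instance, `build_hifix_command("seq.fasta", "blastall.net", "seqSLX.fnodes", nprocs=os.cpu_count())` uses every core reported by the operating system, which may be more than the memory of the machine allows.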

OUTPUT FORMAT

A list of pairs in the format "family_id sequence_id",
where :
- each sequence_id is taken from FASTAFILE
- each family_id is built from the prefix of FASTAFILE
(prefix for prefix.fasta) followed by a unique tag such as

  • "_i_j" if family j was deduced from SiLiX pre-family i
  • "_i" if SiLiX pre-family i was conserved

Example :

seq_1_1 id1
seq_1_2 id2
seq_2 id3
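The output format described above can be read back with a few lines of code. This is a sketch under stated assumptions: the function names read_families and parse_family_id are illustrative, the FASTA prefix is supplied by the caller, and the tags i and j are kept as strings rather than interpreted.

```python
from collections import defaultdict


def read_families(lines):
    """Group sequence ids by family id from 'family_id sequence_id' lines."""
    families = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        family_id, sequence_id = fields
        families[family_id].append(sequence_id)
    return dict(families)


def parse_family_id(family_id, prefix):
    """Split 'prefix_i' or 'prefix_i_j' into (i, j).

    j is None when the SiLiX pre-family i was conserved.
    """
    if not family_id.startswith(prefix + "_"):
        raise ValueError(f"{family_id!r} does not start with {prefix!r}_")
    parts = family_id[len(prefix) + 1:].split("_")
    if len(parts) == 1:
        return parts[0], None
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"unexpected tag in {family_id!r}")
```

Applied to the example, `parse_family_id("seq_1_2", "seq")` gives pre-family "1" and family "2", while `parse_family_id("seq_2", "seq")` gives pre-family "2" with no sub-family.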

TYPICAL WORKFLOW

The user provides a FASTA file and the result file(s) of an all-against-all BLAST search in tabular format. SiLiX must first be run as a preliminary step, with the --net option (see Documentation) :

silix seq.fasta blastall.out --net > seqSLX.fnodes

This generates a file named blastall.net.
NB : if you use the MPI version of SiLiX (with mpirun), you get multiple .net files that you need to concatenate into a single .net file.
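Concatenating the .net files from an MPI run can be done with any tool. The sketch below shows one way, assuming the partial files can be selected with a glob pattern; the pattern and the function name concatenate_net_files are hypothetical, since the names of the partial files are not specified here.

```python
import glob
import os
import shutil


def concatenate_net_files(pattern, output_path):
    """Concatenate all files matching `pattern` into `output_path`.

    The output file itself is excluded if it matches the pattern.
    Returns the sorted list of files that were concatenated.
    """
    target = os.path.abspath(output_path)
    parts = sorted(p for p in glob.glob(pattern)
                   if os.path.abspath(p) != target)
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
                # Make sure records from two files never share a line.
                if src.tell() > 0:
                    src.seek(-1, os.SEEK_END)
                    if src.read(1) != b"\n":
                        out.write(b"\n")
    return parts
```

A call such as `concatenate_net_files("blastall*.net", "blastall.net")` would then produce the single file expected by hifix, provided the pattern matches the files actually written by the MPI run.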

The user can now use hifix :

hifix seq.fasta blastall.net seqSLX.fnodes > seqHFX.fnodes

See also example files in the data/ directory of the package.
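The two commands of this workflow can also be driven from a script. The following is a minimal sketch that reuses the file names of the example above; run_to_file and classical_workflow are illustrative names, and the script assumes that silix and hifix are on the PATH.

```python
import subprocess


def run_to_file(cmd, out_path):
    """Run `cmd` and write its standard output to `out_path`,
    like `cmd > out_path` in a shell. Raises if the command fails."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


def classical_workflow(fasta="seq.fasta", blast="blastall.out",
                       net="blastall.net", slx="seqSLX.fnodes",
                       hfx="seqHFX.fnodes"):
    """Run SiLiX with --net, then HiFiX on its results."""
    # Step 1: SiLiX clustering; the .net file is written as a side effect.
    run_to_file(["silix", fasta, blast, "--net"], slx)
    # Step 2: HiFiX on the FASTA file, the network and the pre-families.
    run_to_file(["hifix", fasta, net, slx], hfx)
```

Because check=True is used, a failure of silix stops the script before hifix is started on incomplete results.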

MEMORY USE

Due to mafft-profile requirements, it may be necessary to run HiFiX on computers with more than 2 GB of RAM. If the SiLiX results contain pre-families larger than 15000 sequences, we recommend using more than 4 GB of RAM.
Please note that, if HiFiX runs on multiple cores (option -t), the memory requirements add up across processes.
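Since memory use adds up across processes, the value given to -t can be bounded by the available RAM. The sketch below is only a rough guide under explicit assumptions: it treats the 2 GB and 4 GB figures above as per-process budgets, and it assumes the SiLiX result file has one "family_id sequence_id" pair per line. Both function names are illustrative.

```python
from collections import Counter


def largest_prefamily(fnodes_lines):
    """Return the size of the largest pre-family in a SiLiX result file,
    assuming one 'family_id sequence_id' pair per line."""
    counts = Counter(line.split()[0] for line in fnodes_lines if line.strip())
    return max(counts.values(), default=0)


def max_processes(total_ram_gb, largest, big_family=15000):
    """Estimate how many processes fit in `total_ram_gb` of RAM.

    Assumption: about 4 GB per process when a pre-family exceeds
    `big_family` sequences, about 2 GB otherwise. Always returns >= 1.
    """
    per_process_gb = 4 if largest > big_family else 2
    return max(1, int(total_ram_gb // per_process_gb))
```

With 16 GB of RAM, this estimate gives 8 processes for small pre-families and 4 when a pre-family exceeds 15000 sequences; real requirements may be higher, so the result is better read as an upper bound.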

- THE HARD WAY (for very large datasets, if you have cluster or grid facilities; please contact us) :

The user must write a pipeline of independent tasks (each running the hifixcore program) that are distributed and submitted to a scheduler, like this :

silix .....
silix-split ....
for all <FASTAFILE> <NETFILE> do
    hifixcore  <FASTAFILE> <NETFILE>  (job to be submitted)
done
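The loop above can be sketched as a job generator. This is an illustration under assumptions, not the procedure shipped with the package: it supposes that each split FASTA file sits in one directory next to a network file with the same base name (the extensions ".fasta" and ".net" and both function names are hypothetical), and it only builds the hifixcore command lines, leaving submission to the scheduler of your choice.

```python
import os


def pair_split_files(directory, fasta_ext=".fasta", net_ext=".net"):
    """Return sorted (fasta_path, net_path) pairs sharing a base name."""
    names = set(os.listdir(directory))
    pairs = []
    for name in sorted(names):
        if name.endswith(fasta_ext):
            net_name = name[:-len(fasta_ext)] + net_ext
            if net_name in names:  # skip FASTA files without a network
                pairs.append((os.path.join(directory, name),
                              os.path.join(directory, net_name)))
    return pairs


def hifixcore_jobs(pairs):
    """Build one 'hifixcore <FASTAFILE> <NETFILE>' command per pair."""
    return [["hifixcore", fasta, net] for fasta, net in pairs]
```

Each returned command is an independent task, so the list can be written to a job file or submitted one entry at a time.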