IRMA customizable parameters

IRMA loads configuration files in three stages:

  1. A global configuration file is applied to initialize all relevant variables for any module that is run. Variables should not be deleted from this file. The global configuration file path is: IRMA_RES/defaults.sh
  2. Module-specific configurations are applied and may override any global settings. These configurations help adjust the assembly to the organism of interest. They are located in: IRMA_RES/modules/<ORGANISM>/init.sh
  3. Finally, a run-specific named configuration file can be applied to specialize the assembly for different situations. The default run-specific file is: IRMA_RES/modules/<ORGANISM>/config/<ORGANISM>.sh
     IRMA searches for the named configuration file when you run it. For example, if you specify FLU-utr as the module/configuration argument, the corresponding file path is: IRMA_RES/modules/FLU/config/FLU-utr.sh

An example of this loading process is illustrated below for calling FLU-lowQC:

  1. Global: <install_path>/IRMA_RES/defaults.sh
  2. Module: <install_path>/IRMA_RES/modules/FLU/init.sh
  3. Run-specific: <install_path>/IRMA_RES/modules/FLU/config/FLU-lowQC.sh
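
The layered loading above can be sketched in shell. This is a minimal illustration, not IRMA's own loader: it builds a throwaway stand-in for IRMA_RES, and the function name load_config and the overriding values in init.sh and FLU-lowQC.sh are made up for the example. Only the file layout and the two defaults (MIN_LEN=125, MAX_ROUNDS=5) come from this page.

```shell
# Build a temporary stand-in for <install_path>/IRMA_RES (contents are hypothetical).
res="$(mktemp -d)/IRMA_RES"
mkdir -p "$res/modules/FLU/config"
echo 'MIN_LEN=125; MAX_ROUNDS=5' > "$res/defaults.sh"                      # 1. global
echo 'MAX_ROUNDS=3'              > "$res/modules/FLU/init.sh"              # 2. module (made-up override)
echo 'MIN_LEN=70'                > "$res/modules/FLU/config/FLU-lowQC.sh"  # 3. run-specific (made-up override)

# load_config NAME: source the three layers in order, so later files win.
# NAME is the module/configuration argument, e.g. "FLU" or "FLU-lowQC".
load_config() {
    name="$1"
    module="${name%%-*}"                        # "FLU-lowQC" -> "FLU"
    . "$res/defaults.sh"
    . "$res/modules/$module/init.sh"
    . "$res/modules/$module/config/$name.sh"
}

load_config FLU-lowQC
echo "MIN_LEN=$MIN_LEN MAX_ROUNDS=$MAX_ROUNDS"
```

Because each file is sourced after the previous one, a variable set in the run-specific file replaces the module value, which in turn replaces the global default.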




Grid parallelization
Parameter Default Kind Description
GRID_ON 0,1 off, on If a Sun Grid Engine (SGE) heritage scheduler is available, use it. We have tested IRMA on UGE 8.2.1.
  LIMIT_BLAT 60000 ≥ 1 If grid execution is on, use scheduler for BLAT when available read patterns per sample are ≥ threshold.
  LIMIT_LABEL 1000 ≥ 1 If grid execution is on, use scheduler for LABEL when available reads per sample-gene are ≥ threshold.
  LIMIT_SAM 500 ≥ 1 If grid execution is on, use scheduler for SAM when available read patterns per sample-gene are ≥ threshold.
  LIMIT_SSW 80000 ≥ 1 If grid execution is on, use scheduler for SSW when available reads per sample-gene are ≥ threshold.
  LIMIT_PHASE 200 ≥ 1 If grid execution is on, use scheduler for phasing when available variants per sample-gene are ≥ threshold.
  MATCH_PROC 20 ≥ 1 If grid execution is on & quantity limit exceeded, number of array tasks for the match step.
  SORT_PROC 80 ≥ 1 If grid execution is on & limit exceeded, number of array tasks for the sort step.
  ALIGN_PROC 20 ≥ 1 If grid execution is on & limit exceeded, number of array tasks for the align step per gene segment.
  ASSEM_PROC 20 ≥ 1 If grid execution is on & limit exceeded, number of array tasks for the final assembly step per gene segment.
  PHASE_PROC 80 ≥ 1 If grid execution is on & limit exceeded, number of array tasks for the variant phasing step per gene segment.
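
As a rough illustration of how a threshold and a task count interact, the sketch below decides whether the match step would go to the scheduler. It assumes that LIMIT_BLAT gates the match step and that MATCH_PROC is the matching task count, which the table suggests but does not state outright; the function name match_tasks is hypothetical and is not part of IRMA.

```shell
# Values taken from the table above.
GRID_ON=1
LIMIT_BLAT=60000
MATCH_PROC=20

# match_tasks READ_PATTERNS: print the number of array tasks to submit,
# or 0 when the step should run locally (assumed pairing of limit and task count).
match_tasks() {
    patterns="$1"
    if [ "$GRID_ON" -eq 1 ] && [ "$patterns" -ge "$LIMIT_BLAT" ]; then
        echo "$MATCH_PROC"
    else
        echo 0
    fi
}

match_tasks 59999    # below the threshold: run locally
match_tasks 60000    # at the threshold: use the scheduler
```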



Single-node setup
Parameter Default Kind Description
SINGLE_LOCAL_PROC 16 ≥ 1 For parallelization of one task on the computer where IRMA is executed; typically the number of logical CPU cores.
DOUBLE_LOCAL_PROC 8 ≥ 1 For parallelization of two multi-threaded tasks. Usually half of SINGLE_LOCAL_PROC.
ALLOW_TMP 0,1 off, on When not using the grid, try to use /tmp for the working directory.
TMP /tmp <PATH> Absolute path to a writable scratch/tmpfs for IRMA's workspace.
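
A run-specific configuration for a single workstation might derive these values from the machine instead of hard-coding them. The fragment below is a sketch under the assumption that configuration files are ordinary shell scripts in which commands can run; the helper variable cores is not an IRMA parameter.

```shell
# Hypothetical single-node fragment; "cores" is a local helper, not an IRMA variable.
cores="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

GRID_ON=0                                   # no scheduler on a workstation
SINGLE_LOCAL_PROC="$cores"                  # one task may use every logical core
DOUBLE_LOCAL_PROC=$(( cores / 2 ))          # two concurrent tasks share the cores
[ "$DOUBLE_LOCAL_PROC" -ge 1 ] || DOUBLE_LOCAL_PROC=1   # keep within the >= 1 range
ALLOW_TMP=1
TMP=/tmp
```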



Variant calling
Parameter Default Kind Description
MIN_F 0.008 [0,1] Minimum frequency heuristic for calling single nucleotide variants.
  AUTO_F 0,1 off, on Automatically adjust MIN_F based on the maximal zero-confidence minority allele.
  MIN_CONF 0.80 [0,1] Minimum confidence not machine error for single nucleotide variants.
MIN_FI 0.005 [0,1] Minimum frequency heuristic for calling insertion variants.
MIN_FD 0.005 [0,1] Minimum frequency heuristic for calling deletion variants.
MIN_AQ 2 [0,64] Minimum average allele quality score heuristic for calling insertion & single nucleotide variants.
MIN_C 2 ≥ 1 Minimum allele count heuristic for calling variants.
MIN_TCC 100 ≥ 1 Minimum coverage depth heuristic (total coverage count) for calling variants.
SIG_LEVEL 0.999 {.90,.95,.99,.999} Significance level for variant calling statistical tests.
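
To make the count, coverage, and frequency heuristics concrete, here is a small sketch that checks a single nucleotide variant candidate against MIN_C, MIN_TCC, and MIN_F. It assumes that frequency is allele count divided by total coverage and that each threshold is inclusive; it leaves out MIN_AQ, MIN_CONF, and the statistical tests governed by SIG_LEVEL. The function name passes_heuristics is hypothetical.

```shell
# Defaults from the table above.
MIN_F=0.008
MIN_C=2
MIN_TCC=100

# passes_heuristics ALLELE_COUNT TOTAL_COVERAGE
# Succeeds (exit status 0) when all three heuristics are met.
passes_heuristics() {
    awk -v c="$1" -v t="$2" -v f="$MIN_F" -v mc="$MIN_C" -v mt="$MIN_TCC" \
        'BEGIN { exit !((t + 0 >= mt + 0) && (c + 0 >= mc + 0) && (c / t >= f + 0)) }'
}

if passes_heuristics 5 500; then echo "5/500: candidate kept"; fi
if ! passes_heuristics 2 1000; then echo "2/1000: frequency below MIN_F"; fi
```

The coverage test comes first so that the division is only evaluated when coverage is at least MIN_TCC.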



Reference editing
Parameter Default Kind Description
INS_T 0.25 [0,1] Minimum insertion frequency to alter reference. Set to 1 for off. Used during final assembly phase.
DEL_T 0.60 [0,1] Minimum deletion frequency to alter reference. Set to 1 for off. Used during final assembly.
SKIP_E 0,1 off, on Skip reference elongation/extension at the 5′ and 3′ ends. Used during read gathering phase.
MIN_FA 1 [0,1] Minimum frequency for alternative reference generation based on the alternative or most frequent minority allele. Set to 1 for off. Used during read gathering and final assembly.
  MIN_CA 20 ≥ 1 Minimum allele count for alternative reference generation.
MIN_AMBIG 0.25 [0,0.5] Minimum called SNV frequency for mixed base calls in the amended consensus folder. Used at the end of the final assembly phase.
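
Several of these parameters use 1 as the "off" value. As an illustration only, a hypothetical run-specific fragment that switches off insertion and deletion editing, alternative reference generation, and end extension could look like this (assuming configuration files are plain shell assignments):

```shell
# Hypothetical fragment: disable the optional reference edits described above.
INS_T=1     # 1 = never alter the reference for insertions
DEL_T=1     # 1 = never alter the reference for deletions
MIN_FA=1    # 1 = no alternative reference generation
SKIP_E=1    # on = skip elongation/extension at the 5' and 3' ends
```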



Quality control
Parameter Default Kind Description
INCL_CHIM 0,1 off, on Include chimeric reads that are found. Chimeric reads contain both strands.
USE_MEDIAN 0,1 off, on Use median read quality to filter reads instead of the average read quality.
  QUAL_THRESHOLD 30 [0,64] Minimum threshold for filtering reads by median or average quality.
MIN_LEN 125 ≥ 1 Minimum read length to include reads in read gathering.
ADAPTER AGATGTGTATAAGAGACAG bases Transposase adapter sequence. To turn off, set to an empty string (ADAPTER=""). Clips 5′ on the forward adapter and 3′ on the reverse-complement adapter. May apply to Nextera paired-end reads.
NO_MERGE 0,1 off, on Do not merge read pairs after final assembly. Merging provides error correction & detection.
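
Adapter clipping refers to both the adapter and its reverse complement. The sketch below derives the reverse complement with standard tools, treating the adapter as one contiguous 19-base string; the function name revcomp and the variable ADAPTER_RC are illustrative and are not IRMA settings.

```shell
ADAPTER="AGATGTGTATAAGAGACAG"

# revcomp SEQ: reverse the sequence, then complement each base.
revcomp() {
    printf '%s\n' "$1" | rev | tr 'ACGTacgt' 'TGCAtgca'
}

ADAPTER_RC="$(revcomp "$ADAPTER")"
echo "forward:            $ADAPTER"
echo "reverse complement: $ADAPTER_RC"
```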



 
Meta-assembly program control
Parameter Default Kind Description
MAX_ROUNDS 5 ≥ 1 Maximum rounds for the read gathering phase.
  MERGE_SECONDARY 0,1 off, on After round 1, return secondary data into the unmatched read pool. Best for non-segmented data and when references are very similar.
  MATCH_PROG BLAT Program for the match step, to quickly match all reads to all references.
  SORT_PROG BLAT {BLAT, LABEL} Program for the sort step, to determine the best match for each matched read.
    MIN_RP 15 ≥ 1 Minimum number of read patterns to continue to attempt assembly per gene segment.
    MIN_RC 15 ≥ 1 Minimum read count to continue to attempt assembly per gene segment.
    LFASTM 0,1 off, on If using LABEL, use the modules specified by irma-MODULE-GENEGROUP for sorting. For example, the "triple" HA-NA-OG modules for influenza. Faster & more accurate.
    GENE_GROUP "" string Allows two-stage sorting via LFASTM. Stage one uses BLAT to sort into rough groups that can be further sorted using LFASTM. Format is comma-delimited patterns separated by a colon for "otherwise." Example: HA,NA:OG
    SORT_GROUPS "" string Determines the sort groups for BLAT sorting.
  ALIGN_PROG SAM {SAM, BLAT} Program for the align step, to roughly align to and edit each reference for the next round.
MAX_ITER_ASSEM 5 ≥ 1 Maximum iterations for the final assembly phase.
  ASSEM_PROG SSW Program for the final assembly step, which optimizes by the Smith-Waterman score.
  SSW_M 2 ≥ 0 Smith-Waterman local alignment match score.
  SSW_X 5 ≥ 0 Smith-Waterman local alignment mismatch penalty.
  SSW_O 10 ≥ 0 Smith-Waterman local alignment gap open penalty.
  SSW_E 1 ≥ 0 Smith-Waterman local alignment gap extension penalty.
REF_SET $DEF_SET <FASTA> Reference set used for the first round of read gathering. Contained in the module's reference folder.
  ASSEM_REF 0,1 off, on Classify REF_SET if needed & use to start final assembly instead of starting from read gathering results.
SEG_NUMBERS "" string Used for amended consensus renaming, typically module-specific. A comma-delimited list of key:value pairs. Example: "PB2:1,PB1:2,PA:3,HA:4,NP:5,NA:6,MP:7,NS:8"
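
The SEG_NUMBERS format (comma-delimited key:value pairs) can be read with plain shell parameter expansion. The lookup below is a sketch of the format only; how IRMA applies the mapping when renaming is not described on this page, and the function name seg_number is hypothetical.

```shell
SEG_NUMBERS="PB2:1,PB1:2,PA:3,HA:4,NP:5,NA:6,MP:7,NS:8"

# seg_number GENE: print the number mapped to GENE; fail if GENE is not listed.
seg_number() {
    local gene="$1" pair
    local IFS=','                      # split the list on commas
    for pair in $SEG_NUMBERS; do
        if [ "${pair%%:*}" = "$gene" ]; then
            echo "${pair#*:}"          # text after the first colon
            return 0
        fi
    done
    return 1
}

seg_number HA
seg_number NS
```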


This page last reviewed: Tuesday, April 18, 2017