miRExpress: Analyzing high-throughput sequencing data for profiling microRNA expression
How to install miRExpress on your machine
- 1. Download the miRExpress package (miRExpress.tgz) to your Linux machine.
- 2. Type tar -zxvf miRExpress.tgz
- 3. Change into the extracted directory.
- 4. Type ./configure
- 5. Type make
- 6. Type make install (with root privileges).
- 7. Change into the "mirExpress/Example" folder and run "sh ExampleTest.sh" to test the installation. This is a simple example of running miRExpress; you can modify ExampleTest.sh to run your own data. This step is optional.
miRExpress accepts next-generation sequencing data as query sequences in FASTQ format. Input sequences must be shorter than 64 nucleotides. miRExpress includes the miRNA information from miRBase. To process raw sequencing data and construct miRNA expression profiles, use the following procedure:
Raw_data_parse -> Trim_adapter -> alignmentSIMD -> analysis
The usage of each command is described below.
"Raw_data_parse" reads raw sequences in FASTQ format and outputs the unique sequences and their counts, separated by a tab (\t).
Raw_data_parse [-i raw_data] [-o output file name, optional]
-i raw data sequence file in FASTQ format.
-o output file name. Default is input file name plus .merge
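To make the idea of "unique sequences and their counts" concrete, here is a minimal Python sketch of the collapsing step. It is not the Raw_data_parse implementation: the function names collapse_fastq and format_merged are hypothetical, the reads are toy data, and writing the count before the sequence is an assumption about the column order.

```python
from collections import Counter
import io

def collapse_fastq(handle):
    """Count identical reads in a 4-line-per-record FASTQ stream."""
    counts = Counter()
    for line_no, line in enumerate(handle):
        # In 4-line FASTQ, the sequence is the second line of each record.
        if line_no % 4 == 1:
            seq = line.strip()
            if seq:
                counts[seq] += 1
    return counts

def format_merged(counts):
    """One line per unique sequence: count, tab, sequence (assumed order)."""
    # Sort by descending count, then by sequence, for a stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return "".join(f"{n}\t{seq}\n" for seq, n in ordered)

# Three toy reads; the first and third are identical.
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",
    "TAGCTTATCAGACTGATGTTGA",
    "TGAGGTAGTAGGTTGTATAGTT",
]
fastq_text = "".join(
    f"@read{i}\n{seq}\n+\n{'I' * len(seq)}\n" for i, seq in enumerate(reads)
)
merged = format_merged(collapse_fastq(io.StringIO(fastq_text)))
print(merged, end="")
```

Collapsing identical reads first means each distinct sequence only has to be trimmed and aligned once, with its count carried along.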
"Trim_adapter" handles the sequence file which contain adpter or not according the input of adaptor sequence.
The input sequences file format as follow:
Counts and Sequences are divided by Tab(\t).
Trim_adapter [-i input file] [-t 3' adaptor sequence file] [-h 5' adaptor sequence file, optional] [-o output file name, optional]
-i input sequence file
-t 3' adaptor sequence file
-h 5' adaptor sequence file
-o output file name. Default is input file name plus .trim
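The following Python sketch illustrates what trimming a 3' adaptor means for a single read. The matching rules used by Trim_adapter (mismatches, minimum overlap) are not described here, so this sketch assumes exact matching only; the function name trim_3prime, the min_overlap parameter, and the adaptor and read sequences are all illustrative assumptions.

```python
def trim_3prime(read, adapter, min_overlap=6):
    """Remove a 3' adaptor that is complete, or truncated at the read end."""
    # Case 1: the full adaptor occurs inside the read; cut at its start.
    pos = read.find(adapter)
    if pos != -1:
        return read[:pos]
    # Case 2: the read ends with a prefix of the adaptor (the read ran out
    # before the adaptor was fully sequenced). Try the longest prefix first.
    longest = min(len(adapter) - 1, len(read))
    for k in range(longest, max(min_overlap, 1) - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:len(read) - k]
    # No adaptor found: return the read unchanged.
    return read

# Toy data (assumed for illustration only).
adapter = "TCGTATGCCGTCTTCTGCTTG"
insert = "TGAGGTAGTAGGTTGTATAGTT"

print(trim_3prime(insert + adapter, adapter))       # full adaptor present
print(trim_3prime(insert + adapter[:10], adapter))  # truncated adaptor
print(trim_3prime(insert, adapter))                 # no adaptor
```

Requiring a minimum overlap in the truncated case reduces the chance of clipping real bases that match the first few adaptor bases by coincidence.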
"statistics_reads" computes sequence number and counts according to the length of sequence
statistics_reads [-i input file] [-o output file name, optional]
-i input sequence file
-o output file name. Default is input file name plus .len
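As a rough illustration of a per-length summary, the Python sketch below groups (count, sequence) records by sequence length. The layout of the .len file is not documented here, so the returned structure, the function name length_statistics, and the toy records are assumptions.

```python
from collections import defaultdict

def length_statistics(records):
    """Map length -> [number of unique sequences, summed read counts]."""
    stats = defaultdict(lambda: [0, 0])
    for count, seq in records:
        entry = stats[len(seq)]
        entry[0] += 1       # one more unique sequence of this length
        entry[1] += count   # add the reads that this sequence represents
    # Return a plain dict ordered by increasing length.
    return dict(sorted(stats.items()))

# Toy (count, sequence) records, as they might appear in a tab-separated file.
records = [(5, "ACGT"), (2, "ACGA"), (1, "ACGTA")]
for length, (n_unique, n_reads) in length_statistics(records).items():
    print(length, n_unique, n_reads, sep="\t")
```

A summary of this kind is a quick check that trimming worked: small RNA libraries are expected to show most reads concentrated around mature miRNA lengths.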
"alignmentSIMD" handles the alignment in query sequences and reference sequences.
alignmentSIMD [-r precursor miRNA file] [-i input sequence file] [-o output directory] [-t alignment identity, optional] [-n Rank nohit file] [-u Number of CPU for calculation]
-r precursor miRNA file (The format must be the same as miRExpress/data/hsa_precursor.txt)
-i input file
-o output directory
-t alignment identity between query and reference sequences. Default value is 1.
-u number of threads to create, depending on the number of CPUs available. Default: 1
-n rank nohit file, ordered by read counts. Optional
"analysis" handles the result of alignment and constructs miRNA expression profiles
analysis [-r precursor miRNA file] [-m mature miRNA information in precursor sequences] [-d alignment result directory] [-o output file name of the alignment between precursor miRNA and reads] [-t output file for miRNA expression]
Notice: Results will be stored in the alignment result directory (the folder given by parameter -d).
-r precursor miRNA file (This file must be the same as the file used for alignmentSIMD)
-m mature miRNA information in precursor sequences (The format must be the same as miRExpress/data/hsa_miRNA.txt)
-d alignment result directory (This directory must be the same as the directory used for alignmentSIMD)
-o output file for alignment result
-t output file for expression result
-g tolerance range for mapping mature miRNA [optional, default:4]
-l similarity between read length and mature miRNA [optional, default:0.8]
-s precursor structure file [optional]
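The four pipeline steps can be chained from a script. The Python sketch below only assembles the command lines from the options documented above and prints them; it does not run anything. The function name build_pipeline and all file and directory names in the example (sample.fastq, adaptor_3p.txt, align_out, read_align_premiRNA.txt, miRNA_expressed.txt) are hypothetical placeholders.

```python
import shlex

def build_pipeline(fastq, adaptor_3p, precursor, mature, out_dir, threads=1):
    """Return the four documented commands as argument lists, in order."""
    merged = fastq + ".merge"    # named explicitly via -o below
    trimmed = merged + ".trim"   # named explicitly via -o below
    return [
        ["Raw_data_parse", "-i", fastq, "-o", merged],
        ["Trim_adapter", "-i", merged, "-t", adaptor_3p, "-o", trimmed],
        ["alignmentSIMD", "-r", precursor, "-i", trimmed,
         "-o", out_dir, "-u", str(threads)],
        # The same precursor file and directory are reused for analysis.
        ["analysis", "-r", precursor, "-m", mature, "-d", out_dir,
         "-o", "read_align_premiRNA.txt", "-t", "miRNA_expressed.txt"],
    ]

commands = build_pipeline(
    fastq="sample.fastq",
    adaptor_3p="adaptor_3p.txt",
    precursor="miRExpress/data/hsa_precursor.txt",
    mature="miRExpress/data/hsa_miRNA.txt",
    out_dir="align_out",
    threads=4,
)
for cmd in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(cmd))
```

To execute the steps instead of printing them, each argument list could be passed to subprocess.run(cmd, check=True), so that a failing step stops the pipeline before the next one starts.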
Department of Biological Science and Technology &
Institute of Bioinformatics and System Biology,
National Chiao Tung University, Taiwan
Contact: Dr. Hsien-Da Huang