Pipeline
Software pipeline explained.
To produce the report, the user provides a design file (CSV) and one or more reads files containing the NGS results (typically FASTQ) for a library built from the variants listed in the design file.
The pipeline then runs the following stages:
Preprocessing: The input reads are filtered so that only valid sequences remain for further analysis. What constitutes a valid read can be configured by the user, using parameters such as sequence prefix and sequence length.
Matching: Each sequence is matched to a corresponding variant. Matching can follow different strategies. We are planning to implement the following approaches:
Barcode matching: If the library has a barcode assigned to each variant, we use that barcode to match each read, with a tunable tolerance for the matching.
Edit distance: The edit distance between an input read and, in principle, every variant is calculated. The variant with the lowest edit distance is selected as the match.
Alignment: Each read is then aligned to its matched variant, and the CIGAR path describing the alignment is built.
Analysis: The collected data (matches and alignments) is then used to calculate the different statistics mentioned in the synthetic library.
Report Generation: Finally, the calculated statistics are reported to the end user in a clean and accessible manner.
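The preprocessing and matching stages above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function names (is_valid_read, match_by_barcode, match_by_edit_distance), the parameters (prefix, min_len, max_len, max_mismatches), and the choice of Hamming distance for the barcode tolerance are all assumptions made for the example.

```python
from typing import Dict, Optional, Tuple


def is_valid_read(read: str, prefix: str, min_len: int, max_len: int) -> bool:
    """Hypothetical validity rule: required prefix and a length window."""
    return read.startswith(prefix) and min_len <= len(read) <= max_len


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def match_by_barcode(
    read_barcode: str, barcodes: Dict[str, str], max_mismatches: int = 1
) -> Optional[str]:
    """Return the variant whose barcode is within the tolerance.

    Assumption: the barcode has already been extracted from the read.
    Ambiguous reads (zero or several candidates) are left unmatched.
    """
    hits = [
        name
        for name, barcode in barcodes.items()
        if len(barcode) == len(read_barcode)
        and hamming(read_barcode, barcode) <= max_mismatches
    ]
    return hits[0] if len(hits) == 1 else None


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion from a
                    curr[j - 1] + 1,     # insertion into a
                    prev[j - 1] + cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1]


def match_by_edit_distance(read: str, variants: Dict[str, str]) -> Tuple[str, int]:
    """Compare the read with every variant and keep the closest one.

    Ties are broken by variant name, which is an arbitrary choice here.
    """
    distance, name = min(
        (edit_distance(read, sequence), name) for name, sequence in variants.items()
    )
    return name, distance


if __name__ == "__main__":
    # Toy data, invented for the example.
    variants = {"v1": "ACGTACGT", "v2": "ACGTTTTT"}
    reads = ["ACGTACGA", "TTTT", "ACGTTTTA", "GGGTACGT"]
    valid = [r for r in reads if is_valid_read(r, "ACGT", 8, 8)]
    for read in valid:
        print(read, match_by_edit_distance(read, variants))
```

Comparing every read with every variant is the simplest form of the edit distance strategy and grows with the product of read count and library size, which is why the description above says "in principle": a real implementation would likely prune candidates first.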