Pipeline

Software pipeline explained.

To produce the report, the user provides a design file (CSV) and one or more reads files containing the NGS results (typically FASTQ) for a library built from the variants listed in the design file.

The pipeline then runs the following steps:

  1. Preprocessing : The input reads will be filtered so that only valid sequences remain for further analysis. What constitutes a valid read can be configured by the user, using parameters such as sequence prefix and sequence length.

  2. Matching : Each read will be matched to its corresponding variant. Matching can follow different strategies. We are planning to implement the following approaches:

    1. Barcode matching : If the library has a barcode assigned to each variant, we will use that barcode to match each read, with a tunable matching tolerance.

    2. Edit distance : We calculate the edit distance between an input read and, in principle, every variant. The variant with the lowest edit distance will be selected as the match.

  3. Alignment : We then align each read to its corresponding variant and build the CIGAR path.

  4. Analysis : We then use the collected data (match, alignment) to calculate different statistics mentioned in the synthetic library.

  5. Report Generation: Finally, the calculated statistics are reported to the end user in a clean and accessible manner.
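
The preprocessing and matching steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the tool's actual implementation: all function and parameter names (`is_valid_read`, `match_by_barcode`, `match_by_edit_distance`, `start`, `tolerance`) are hypothetical, the barcode is assumed to sit at a fixed offset in the read, the barcode tolerance is assumed to be a maximum Hamming distance, and the edit distance is assumed to be the Levenshtein distance.

```python
from typing import Dict, Optional, Tuple


def is_valid_read(read: str, prefix: str, min_len: int, max_len: int) -> bool:
    """Preprocessing: keep reads with the expected prefix and an allowed length."""
    return read.startswith(prefix) and min_len <= len(read) <= max_len


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def match_by_barcode(read: str, barcodes: Dict[str, str],
                     start: int, tolerance: int) -> Optional[str]:
    """Barcode matching (assumed: fixed barcode offset, Hamming tolerance).

    Returns the variant whose barcode is closest to the read's barcode
    window, or None if no barcode is within `tolerance` mismatches.
    """
    best_id, best_dist = None, tolerance + 1
    for variant_id, barcode in barcodes.items():
        window = read[start:start + len(barcode)]
        if len(window) < len(barcode):
            continue  # read too short to contain this barcode
        dist = hamming(window, barcode)
        if dist < best_dist:
            best_id, best_dist = variant_id, dist
    return best_id


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def match_by_edit_distance(read: str,
                           variants: Dict[str, str]) -> Tuple[str, int]:
    """Compare the read with every variant and return the closest one.

    Ties go to the variant that appears first in `variants`.
    """
    scored = ((vid, edit_distance(read, seq)) for vid, seq in variants.items())
    return min(scored, key=lambda pair: pair[1])
```

Comparing each read with every variant costs time proportional to the number of variants times the product of the sequence lengths, which is likely why the edit-distance strategy is described as applying to all variants only "in principle".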
