# Pipeline

To produce the report, the user provides a design file (CSV) and one or more read files containing the NGS results (typically FASTQ) for a library built from the variants listed in the design file.

The pipeline then runs the following steps:

1. **Preprocessing** : The input reads are filtered so that only valid sequences remain for further analysis. What constitutes a valid read can be configured by the user with parameters such as sequence prefix and sequence length.
2. **Matching** : Each sequence is matched to its corresponding variant. Matching can follow different strategies. We plan to implement the following approaches:
   1. *Barcode matching* : If the library assigns a barcode to each variant, we use that barcode to match each read, with a tunable tolerance for the match.
   2. *Edit distance* : We calculate the edit distance between an input read and, in principle, every variant. The variant with the lowest edit distance is selected as the match.
3. **Alignment** : We then align each read to its matched variant and build the CIGAR string.
4. **Analysis** : We then use the collected data (matches and alignments) to calculate statistics about the synthetic library.
5. **Report Generation**: Finally, the calculated statistics are reported to the end user in a clean and accessible manner.
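
The preprocessing and matching steps above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: the function names, the `start` and `tolerance` parameters, and the choice of Hamming distance for the barcode tolerance are all assumptions made for the example.

```python
from typing import Optional


def is_valid_read(seq: str, prefix: str, min_len: int, max_len: int) -> bool:
    """Keep a read only if it starts with the prefix and its length is in range."""
    return seq.startswith(prefix) and min_len <= len(seq) <= max_len


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("hamming() needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def match_by_barcode(read: str, barcodes: dict[str, str],
                     start: int, tolerance: int) -> Optional[str]:
    """Return the variant whose barcode is closest to the read's barcode region.

    Assumption: the barcode sits at a fixed offset `start` in the read, and
    `tolerance` is the maximum number of mismatches allowed.
    """
    best_id, best_dist = None, tolerance + 1
    for variant_id, barcode in barcodes.items():
        segment = read[start:start + len(barcode)]
        if len(segment) != len(barcode):
            continue  # read too short to contain this barcode
        dist = hamming(segment, barcode)
        if dist < best_dist:
            best_id, best_dist = variant_id, dist
    return best_id


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def match_by_edit_distance(read: str, variants: dict[str, str]) -> Optional[str]:
    """Return the id of the variant with the lowest edit distance to the read."""
    if not variants:
        return None
    return min(variants, key=lambda vid: edit_distance(read, variants[vid]))
```

Comparing a read against every variant costs time proportional to the number of variants times the product of the sequence lengths, which is why the edit-distance strategy is described as comparing against all variants only "in principle"; a real implementation would likely prune candidates first.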

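To illustrate the alignment step, the sketch below builds a CIGAR string from a global alignment of a read against its matched variant. It assumes unit costs for mismatches and gaps and uses the extended CIGAR operations (`=` match, `X` mismatch, `I` base present only in the read, `D` base present only in the variant); the function name `cigar` and these scoring choices are illustrative assumptions, not the pipeline's confirmed method.

```python
from itertools import groupby


def cigar(read: str, variant: str) -> str:
    """Globally align `read` to `variant` and return a run-length CIGAR string."""
    n, m = len(read), len(variant)

    # dp[i][j] = edit distance between read[:i] and variant[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (read[i - 1] != variant[j - 1]),  # = or X
                dp[i - 1][j] + 1,                                    # I
                dp[i][j - 1] + 1,                                    # D
            )

    # Trace back from the bottom-right corner, preferring diagonal moves.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and dp[i][j] == dp[i - 1][j - 1] + (read[i - 1] != variant[j - 1])):
            ops.append("=" if read[i - 1] == variant[j - 1] else "X")
            i -= 1
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append("I")  # extra base in the read
            i -= 1
        else:
            ops.append("D")  # base missing from the read
            j -= 1
    ops.reverse()

    # Collapse consecutive identical operations, e.g. ['=', '=', 'X'] -> "2=1X"
    return "".join(f"{len(list(group))}{op}" for op, group in groupby(ops))


print(cigar("ACGGT", "ACGT"))  # 2=1I2=
```

When several alignments share the same cost, the traceback order above decides which one is reported, so the exact CIGAR for a given pair may differ from what another aligner produces.
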
