Tool Structure and Scoring
TAG-IN utilises a number of existing scoring models and tools:
- "Rule Set 2" scoring model by Doench et al. 2016 to assess sgRNA efficiency.
- Off-target searching with up to 3 mismatches for both NGG and NAG PAMs using Bowtie (Langmead et al. 2009).
- Off-target scoring model by Hsu et al. 2013 to assess sgRNA specificity.
- A novel design function that allows facile single-stranded DNA (ssDNA) design for 3′ tagging.
- Visualisation of genomic data using the Scribl graphics library.
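As a rough illustration of what the off-target search is looking for, the sketch below scans a sequence for sites that differ from a protospacer by at most 3 mismatches and are followed by an NGG or NAG PAM. It is a naive forward-strand scan in plain Python, not Bowtie and not TAG-IN's own code; the function names `hamming` and `find_off_targets` are hypothetical.

```python
def hamming(a, b):
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def find_off_targets(sequence, protospacer, max_mismatches=3):
    """Naive forward-strand scan for candidate off-target sites.

    Returns a list of (start, site, pam, mismatches) tuples for every
    window that is followed by an NGG or NAG PAM and differs from the
    protospacer at no more than max_mismatches positions.
    """
    n = len(protospacer)
    hits = []
    for i in range(len(sequence) - n - 3 + 1):
        site = sequence[i:i + n]
        pam = sequence[i + n:i + n + 3]
        # Accept NGG and NAG: the last two PAM bases must be GG or AG.
        if pam[1:] not in ("GG", "AG"):
            continue
        mismatches = hamming(site, protospacer)
        if mismatches <= max_mismatches:
            hits.append((i, site, pam, mismatches))
    return hits
```

A real search would also cover the reverse strand and use an indexed aligner such as Bowtie to scale to a whole genome.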
ssDNA design is completed with the following steps:
- Retrieve the gene annotation from the Ensembl database.
- Extract a restricted window of sequence around the annotated stop codon.
- For each sgRNA, simulate cleavage by the sgRNA::Cas9 complex 3 nt upstream of the PAM sequence.
- Insert the tag of choice as close to the cleavage site as possible.
- The resulting 200 nt ssDNA sequence contains the tag of choice inserted immediately adjacent to the stop codon, with flanking arms homologous to the target sequence.
- Mutate the PAM sequence to prevent further Cas9 cleavage of the donor sequence.
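The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TAG-IN's implementation: the function name `design_donor` and its parameters are hypothetical, the sgRNA is assumed to lie on the forward strand with an NGG PAM, the 200 nt length is assumed to include the tag, the tag is placed directly before the stop codon, and the PAM is disrupted by a simple NGG to NCG change without checking whether that change is silent.

```python
def design_donor(region, stop_start, pam_start, tag, total_len=200):
    """Sketch of a 3' tagging ssDNA donor on the forward strand.

    region     -- sequence around the stop codon (hypothetical input)
    stop_start -- 0-based index of the first base of the stop codon
    pam_start  -- 0-based index of the first base of an NGG PAM
    tag        -- tag sequence to insert directly before the stop codon
    total_len  -- donor length including the tag (assumed 200 nt)
    Returns (donor_sequence, cut_index).
    """
    if region[pam_start + 1:pam_start + 3] != "GG":
        raise ValueError("expected an NGG PAM at pam_start")

    # Cas9 is modelled as cutting 3 nt upstream of the PAM.
    cut_index = pam_start - 3

    # Assumption: disrupt the PAM by changing NGG to NCG. No check is made
    # that the change is silent or that it leaves the stop codon intact.
    mutated = region[:pam_start + 1] + "C" + region[pam_start + 2:]

    # Split the remaining length evenly between the two homology arms.
    arm_total = total_len - len(tag)
    if arm_total < 2:
        raise ValueError("tag is too long for the requested donor length")
    left_len = arm_total // 2
    right_len = arm_total - left_len
    if stop_start - left_len < 0 or stop_start + right_len > len(mutated):
        raise ValueError("region too short for the requested donor length")

    left_arm = mutated[stop_start - left_len:stop_start]
    right_arm = mutated[stop_start:stop_start + right_len]  # starts with the stop codon
    return left_arm + tag + right_arm, cut_index
```

Comparing the returned `cut_index` with `stop_start` gives the distance between the simulated cut and the stop codon, which is the quantity a user would want to keep small when choosing between sgRNAs.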
To speed up user selection, a heuristic summary score was created:
where S represents the summary score for an sgRNA within the intron or 3′ UTR, d the distance from the stop codon, m the MIT off-target score (Hsu et al. 2013), and r the RS2 efficiency score (Doench et al. 2016). To minimise the potential for CRISPR activity within the coding sequence, sgRNAs that act within the intron/exon are penalised and are hence governed by equation (2). Minimising the sgRNA distance, d, from the stop codon was considered important to maximise tagging efficiency, with a particular preference for distances of 8-15 bp from the stop codon.
Please contact us with any queries or concerns:
Pollard Lab - Email: firstname.lastname@example.org