AlleleSelect

Allele-Selective ASO Design Pipeline for CACNA1A Gain-of-Function Mutations

March 2026 - Current

Introduction

In many genetic diseases, the treatment problem is not making something happen, but making something stop. The gene in question is present on both chromosomes. One copy functions normally, keeping an important biological process running, while the other copy carries a mutation that makes it overactive. That overactivity is the disease. The obvious therapeutic move, suppressing expression of the relevant gene, immediately runs into a hard constraint: the normal copy is doing something you can't afford to lose.

Familial hemiplegic migraine type 1 is one such case. The CACNA1A gene encodes the Cav2.1 calcium channel, the primary mediator of neurotransmitter release at cerebellar and cortical synapses. R192Q, the variant at the center of this project, is a heterozygous gain-of-function missense mutation at position 575 of the CDS. It converts an arginine in the S4 voltage sensor to glutamine, lowering the activation threshold of the channel and increasing presynaptic calcium influx. The result is enhanced cortical spreading depression susceptibility and, in individuals who carry it, familial hemiplegic migraine.

The wildtype allele is vital: knockdown of its expression in mouse models has been shown to produce severe cerebellar ataxia and early death. The therapeutic window for any gene-silencing strategy is thus defined by how selectively it can suppress the mutant allele while leaving the wildtype allele alone.

AlleleSelect is a Python CLI tool I developed to address this issue. It computes, for every candidate antisense oligonucleotide (ASO) window sliding across a gain-of-function mutation site, the difference in binding free energy (basically, duplex stability) between mutant and wildtype mRNA targets. This difference, the allele selectivity ratio (ASR), is the main metric: the more negative it is, the more strongly the ASO prefers the mutant, which is what we want. The pipeline integrates four additional scoring layers: mRNA accessibility from the RNAfold partition function, off-target screening via BLASTn against the GENCODE v44 human transcriptome, splice site proximity flagging, and gapmer modification pattern annotation. The output is a ranked candidate CSV and an interactive HTML report that, in later versions, researchers working on other genes and diseases can also use for ASO design prioritization.

This project is the sixth in my CACNA1A/FHM1 series and is a natural extension of MiSOF. However, AlleleSelect is categorically different from those projects in that it designs rather than merely characterizing or stratifying. And although my favorite book is Flowers for Algernon and I find mice delightful, I built AlleleSelect with the explicit intent that its output might one day be synthesized, injected into a knockin mouse, and tested.

Hopefully, though it may be a long shot, this might lead to a cure.

Cheers,
Angie X.

Note: this project is actively updated. Apologies for any content gaps during this time.

AlleleSelect: A Layman's Guide

Most people who hear 'gene therapy' picture replacing a broken gene with a working copy. That is certainly one approach, and it works well for diseases where the gene is simply absent or non-functional. However, FHM1 is not that kind of disease. The gene is present in both copies. The problem is that one of the two copies has a mutation that makes the calcium channel (CaV2.1) it encodes in neurons hyperactive. The hyperactivity (or increased susceptibility to fire) is what causes disease symptoms. The treatment challenge is thus not addition but subtraction: how can we reduce the activity of the mutant copy without touching the normal one?

Antisense oligonucleotides (ASOs) are short synthetic strands of DNA-like molecules, typically 18-22 nucleotides long. The tech has been around since the 1970s, with major breakthroughs in the past decade. ASOs are designed to bind a specific sequence in a target messenger RNA (mRNA, the intermediate molecule that carries genetic instructions from DNA to protein). When an ASO binds its target mRNA, it recruits a naturally-occurring cellular enzyme called RNase H, which cuts the mRNA at the binding site. This causes the mRNA to be degraded, meaning the protein is not produced. The unharmed ASO is released to repeat the cycle.

For a heterozygous mutation like R192Q, the question is whether you can design an ASO that binds the mutant mRNA significantly more tightly than the wildtype mRNA. The mutant and wildtype sequences differ by a single nucleotide at position 575: G in the wildtype, A in the mutant. An ASO designed to complement the mutant sequence will have a deliberate mismatch when it tries to bind the wildtype sequence at that position. The magnitude of this mismatch penalty, in thermodynamic terms, is what AlleleSelect computes.

The pipeline uses the nearest-neighbor model, a method from 1998 that calculates the stability of a short nucleic acid duplex by summing the stacking interactions between each adjacent pair of bases. For the wildtype binding calculation, it applies a mismatch correction which accounts for the specific destabilizing effect of the particular mismatch type (G>A at position 575 of CACNA1A produces a T:G wobble mismatch when the mutant-targeted ASO tries to bind wildtype, which is less destabilizing than, say, a purine-purine clash). The allele selectivity ratio is calculated as the mutant binding energy minus the wildtype binding energy. A negative difference means the ASO binds the mutant more tightly, and the more negative the difference, the stronger the preference.

Two additional factors matter for whether an ASO will work in a cell. First, the target mRNA has to be physically accessible. mRNA folds into secondary structures (stem-loops, hairpins) that can sometimes bury potential binding sites. AlleleSelect uses a computational tool called RNAfold to predict which regions of the CACNA1A mRNA are single-stranded and accessible. Second, the ASO should not inadvertently bind other mRNAs in the genome: this would be disastrous, and can cause a slew of adverse and unexpected side effects. AlleleSelect screens each top candidate against the complete catalog of human transcripts using sequence alignment and flags any candidate with suspicious matches.

The output is a ranked list of candidate sequences, with the top-ranked sequences representing the best theoretical combination of mutant preference, mRNA accessibility, and transcriptome specificity. These sequences, if synthesized and validated in cells and then tested in the R192Q knockin mouse model, could tell us whether allele-selective ASO therapy is a viable approach for FHM1.

AlleleSelect Concepts Reference - A. Xiu.pdf

PHASE 1: The Wildtype Sparing Constraint

Before the code, let’s start with biology. AlleleSelect is entirely determined by one biological fact.

CACNA1A encodes the alpha-1A subunit of the Cav2.1 P/Q-type calcium channel. Cav2.1 is the dominant calcium channel at cerebellar Purkinje synapses, at cortical glutamatergic synapses, and at the neuromuscular junction. Every time a neuron in the cerebellum fires and needs to release neurotransmitter, calcium entry through Cav2.1 is the proximate trigger. The channel is so incredibly central to normal CNS function that CACNA1A knockout mice have severe cerebellar ataxia and die early.

R192Q is heterozygous, meaning one allele has the mutation and one does not. The normal allele is doing exactly what normal Cav2.1 is supposed to do. The mutant allele is doing it too vigorously, because the arginine-to-glutamine substitution at position 192 destabilizes the S4 voltage sensor in domain I, which shifts the channel's activation voltage by roughly -4 mV (increasing open probability at rest). The result is elevated presynaptic calcium and increased neurotransmitter release probability, which lowers the threshold for a sustained depolarization wave known as "cortical spreading depression."

Any therapeutic approach that reduces total CACNA1A expression non-selectively will simply trade one problem for another. Get the dose wrong and you have both fewer CSD episodes and worse cerebellar ataxia. The treatment window is very narrow, and it narrows further in an adolescent whose cerebellum is still maturing.

This is why allele-selective silencing via ASOs exists as a possible therapeutic strategy. It also means that AlleleSelect had to be built to compute allele selectivity specifically, because nothing else makes the therapeutic problem tractable.

The ASO field validated the general approach of this tech with nusinersen (2016) and tofersen (2023), both delivered intrathecally for CNS indications. The allele-selective variant was demonstrated in SCN2A gain-of-function epilepsy mouse models in 2021. CACNA1A FHM1 is the same logical structure: heterozygous GoF mutation in an essential neural gene and allele-selective ASO delivered to the CSF via intrathecal injection (injection into the spinal canal).

Figure 1. Allele selectivity ratio across 20-mer ASO candidate windows for CACNA1A c.575G>A (R192Q). Each point represents one candidate window sliding in 2-nt steps across the mutation site. Y-axis: allele selectivity ratio (ASR = ΔGmutant − ΔGwildtype, kcal/mol), computed using SantaLucia (1998) nearest-neighbor thermodynamics with Peyret (1999) internal mismatch corrections. Teal markers: top-ranked candidates (ASR < −1.5 kcal/mol, mutation at positions 8–12 of ASO). Gray dashed line: ASR = −1.0 threshold; teal dashed line: ASR = −1.5 top-candidate threshold. Red dotted vertical line: mutation position (CDS position 575).

PHASE 2: Nearest-Neighbor Thermodynamics for Mismatch Discrimination

The mathematical core of AlleleSelect is the SantaLucia 1998 nearest-neighbor model for nucleic acid duplex thermodynamics. Allow me to explain what I learned from its implementation, because the physics is quite interesting and the history of how these numbers came to exist is also, for lack of better terms, quite interesting.

Short nucleic acid duplexes are not stabilized by the sum of individual base-pair hydrogen bond energies, but primarily by base stacking: the aromatic pi systems of adjacent nucleotide bases overlap and interact, contributing enthalpy and constraining the entropic freedom of the single strands. A GC base pair next to an AT base pair thus has different thermodynamic properties than a GC pair next to a GG mismatch, even though the GC pair itself is identical in both cases. This is the physical content of the nearest-neighbor model. Context matters, and the context unit here is the dinucleotide pair. And if that made your eyes glaze over, here's the simple version: the stability of DNA binding depends not just on which letters pair up, but on which letters are next to each other. Neighbors matter, like seating arrangements at a dinner party.

John SantaLucia Jr. (1998) (yes, that is his actual last name) ran melting experiments on 108 precisely designed short DNA duplexes and extracted, by fitting a two-state thermodynamic model, the enthalpy and entropy contributions for each of the 16 possible nearest-neighbor dinucleotide pairs. Table 2 of that paper is 16 rows, and is what AlleleSelect uses. It is 28 years old. When I first pulled it up and copied the values into nearest_neighbor.py, I had an unexpected reaction, which was some feeling between appreciation and blasphemy. The field of computational drug design has produced an enormous amount of increasingly sophisticated methodology over the past three decades, and the question of which strand of DNA binds which strand better is still answered, in 2026, by a table published in a journal 10 years before I popped out of the womb. Sometimes the old stuff just works.

The nearest-neighbor sum gives delta-H and delta-S for the duplex, from which delta-G at 37 degrees C and the melting temperature (Tm) are derived. For the Tm formula specifically, the gas constant R and the total strand concentration CT enter as parameters. For ASO design at cellular concentrations (roughly 1 micromolar), the Tm calculation is a useful check but not the primary metric. Delta-G at 37 degrees C is what matters; Tm is informative for synthesis quality control.
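As a rough illustration of that calculation, here is a minimal sketch of a nearest-neighbor sum for a fully matched DNA/DNA duplex. It is not the code in nearest_neighbor.py: the function name duplex_thermo is hypothetical, the parameter values are transcribed from the unified SantaLucia (1998) table and should be checked against the paper before use, and salt corrections and the symmetry term for self-complementary duplexes are left out.

```python
import math

# Unified nearest-neighbor parameters (dH in kcal/mol, dS in cal/(mol*K)),
# keyed by the 5'->3' dinucleotide on one strand. Transcribed values:
# verify against the published table before relying on them.
NN = {
    "AA": (-7.9, -22.2), "TT": (-7.9, -22.2),
    "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "TG": (-8.5, -22.7),
    "GT": (-8.4, -22.4), "AC": (-8.4, -22.4),
    "CT": (-7.8, -21.0), "AG": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "TC": (-8.2, -22.2),
    "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9), "CC": (-8.0, -19.9),
}
INIT_GC = (0.1, -2.8)   # initiation term for a terminal G:C pair
INIT_AT = (2.3, 4.1)    # initiation term for a terminal A:T pair
R = 1.987               # gas constant, cal/(mol*K)


def duplex_thermo(seq, temp_c=37.0, strand_conc=1e-6):
    """Return (dH, dS, dG, Tm) for seq paired with its perfect complement.

    strand_conc is the total strand concentration CT in mol/L; the Tm
    formula assumes a non-self-complementary duplex (CT / 4).
    """
    seq = seq.upper()
    dh = ds = 0.0
    # One initiation term per duplex end, chosen by the terminal base pair.
    for end in (seq[0], seq[-1]):
        h, s = INIT_GC if end in "GC" else INIT_AT
        dh += h
        ds += s
    # Sum the stacking contribution of every adjacent dinucleotide.
    for i in range(len(seq) - 1):
        h, s = NN[seq[i:i + 2]]
        dh += h
        ds += s
    dg = dh - (temp_c + 273.15) * ds / 1000.0
    tm = dh * 1000.0 / (ds + R * math.log(strand_conc / 4.0)) - 273.15
    return dh, ds, dg, tm
```

Calling duplex_thermo on a candidate's target window gives the matched (mutant) delta-G; the wildtype value additionally needs the internal mismatch correction, which this sketch does not attempt.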

The mismatch corrections required a second paper: Peyret et al. (1999), which systematically measured the delta-delta-H and delta-delta-S corrections for all 16 internal DNA/DNA mismatch types. For R192Q specifically, the relevant mismatch is T:G wobble (the ASO carries T at the mutation position to complement mutant A; wildtype carries G, so binding to wildtype produces a T:G wobble). Wobble mismatches have a negative delta-delta-H, meaning they are less destabilizing than complete mismatches. Dejargonified, a T:G wobble is like a slightly loose screw, whereas an A:A mismatch is like a square peg in a round hole. The wobble still holds, just not as tightly. This is a very relevant biochem fact to the design, as it means that the allele selectivity will be smaller in magnitude than it would be for a variant producing an A:C or A:A mismatch. The pipeline still finds useful candidates, but the biology makes the problem slightly harder than the ideal case.
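The bookkeeping behind that paragraph can be sketched in a few lines. The helper names below are hypothetical, and the sign convention (mutant minus wildtype, so negative favors the mutant) follows the ASR definition used in the figure captions.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def wildtype_mismatch(ref_base, alt_base):
    """Mismatch formed when a mutant-matched ASO meets the wildtype target.

    The ASO carries the complement of the mutant (alt) base, so against
    the wildtype (ref) base it forms the pair 'ASO base:target base'.
    """
    aso_base = COMPLEMENT[alt_base.upper()]
    return f"{aso_base}:{ref_base.upper()}"


def allele_selectivity_ratio(dg_mutant, dg_wildtype):
    """ASR = dG(mutant duplex) - dG(wildtype duplex), in kcal/mol."""
    return dg_mutant - dg_wildtype


# c.575G>A: wildtype G, mutant A, so the ASO carries T at that position.
print(wildtype_mismatch("G", "A"))  # T:G
```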

Implementing both papers took around two and a half days of iteration. The unit tests validate the nearest-neighbor implementation against the worked examples in SantaLucia 1998 Table 4 and validate the mismatch penalties against the expected ordering of mismatch severity from Peyret 1999. Twenty-seven passing tests were incorporated for just the thermodynamics modules.

PHASE 3: Pipeline Architecture

With the thermodynamic core working, the pipeline builds outward in layers. Each layer addresses one additional factor that separates theoretical binding affinity from practical knockdown efficiency in a cell.

The sequence retrieval layer uses the Ensembl REST API to fetch the CACNA1A CDS by transcript ID (ENST00000360228.10, the canonical transcript). The HGVS parser handles the mutation notation (c.575G>A), validates the reference base against the fetched CDS, and applies the substitution to generate the mutant sequence. This is simpler than it sounds in principle, but gets finicky in practice: Ensembl transcript IDs have version suffixes, the REST API returns slightly different fields depending on request parameters, and there are encoding edge cases in older transcript records. The implementation handles all of these and fails with informative error messages rather than silently producing wrong sequences.
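The parse, validate, and substitute steps can be sketched as follows. This covers only simple coding substitutions such as c.575G>A, operates on a CDS string that has already been fetched (the Ensembl request itself is not shown), and uses a hypothetical function name rather than the pipeline's actual parser.

```python
import re

# Matches simple coding-sequence substitutions only, e.g. "c.575G>A".
HGVS_SUB = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def apply_substitution(cds, hgvs):
    """Return the mutant CDS after checking the reference base."""
    match = HGVS_SUB.match(hgvs)
    if match is None:
        raise ValueError(f"unsupported HGVS notation: {hgvs!r}")
    pos = int(match.group(1))          # 1-indexed CDS position
    ref, alt = match.group(2), match.group(3)
    if not 1 <= pos <= len(cds):
        raise ValueError(f"position {pos} is outside the CDS (length {len(cds)})")
    found = cds[pos - 1].upper()
    if found != ref:
        # Fail loudly rather than silently building a wrong mutant sequence.
        raise ValueError(f"reference mismatch at c.{pos}: expected {ref}, found {found}")
    return cds[:pos - 1] + alt + cds[pos:]
```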

The candidate window generator slides 18, 19, 20, 21, and 22-nt windows across +/- 30 nucleotides around the mutation site. For each window position and length, the ASO sequence is defined as the reverse complement of the mutant target window, the allele selectivity ratio and mismatch position scores are computed, and the results are collected. This produces a few hundred candidates per run.
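A minimal sketch of the window enumeration, under two assumptions that are inferred from the reported results rather than taken from the code: every window lies fully inside the +/- 30 nt region (which yields 210 windows for the five lengths), and candidates are named AS_<length>_<1-indexed CDS start>. The function names and the dictionary layout are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def candidate_windows(mutant_cds, mut_pos, lengths=(18, 19, 20, 21, 22), flank=30):
    """Enumerate ASO candidates inside mut_pos +/- flank (1-indexed CDS)."""
    lo, hi = mut_pos - flank, mut_pos + flank
    if lo < 1 or hi > len(mutant_cds):
        raise ValueError("flanking region runs off the end of the CDS")
    candidates = []
    for length in lengths:
        for start in range(lo, hi - length + 2):
            end = start + length - 1
            target = mutant_cds[start - 1:end]
            # SNP position counted 5'->3' along the ASO (1-indexed); the ASO
            # is antiparallel to the target, hence the flip. None when the
            # window does not cover the mutation.
            snp_pos = end - mut_pos + 1 if start <= mut_pos <= end else None
            candidates.append({
                "name": f"AS_{length}_{start}",
                "start": start,
                "length": length,
                "aso": reverse_complement(target),
                "snp_pos": snp_pos,
            })
    return candidates
```

With mut_pos=575, this layout places the SNP at ASO position 10 for the window starting at 564 and position 11 for the one starting at 565, which agrees with the positions reported for AS_21_564 and AS_21_565.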

The accessibility layer calls RNAfold via subprocess on a +/- 200 nt window around the mutation site, with partition function enabled. RNAfold's _dp.ps output is a PostScript-formatted dot plot; the base-pair probabilities are stored as square roots in the ubox entries of that file. The parser extracts these, squares them to recover true probabilities, sums them per position, and computes 1 - sum as the per-position accessibility. Window accessibility is the mean across the ASO binding region. Parsing PostScript by regex is not glamorous, but it works reliably and avoids a heavier dependency on ViennaRNA's Python bindings.
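The parsing step described above can be sketched like this. It assumes the dot-plot text has already been read from the _dp.ps file and that probability entries appear as lines of the form "i j sqrt(p) ubox"; the function names are hypothetical and the subprocess call to RNAfold is not shown.

```python
import re

# Probability entries look like "12 45 0.8731 ubox"; "lbox" lines describe
# the MFE structure and are deliberately not matched.
UBOX = re.compile(r"^(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+ubox\s*$", re.MULTILINE)


def accessibility_from_dotplot(ps_text, seq_len):
    """Per-position unpaired probability (index 0 = position 1)."""
    paired = [0.0] * (seq_len + 1)
    for i, j, sqrt_p in UBOX.findall(ps_text):
        p = float(sqrt_p) ** 2      # stored value is the square root
        paired[int(i)] += p         # each pair counts toward both partners
        paired[int(j)] += p
    # Clamp at zero to absorb rounding in the printed square roots.
    return [max(0.0, 1.0 - paired[k]) for k in range(1, seq_len + 1)]


def window_accessibility(acc, start, end):
    """Mean accessibility over 1-indexed positions start..end inclusive."""
    region = acc[start - 1:end]
    return sum(region) / len(region)
```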

The off-target layer was the most infrastructure-heavy part of the build. GENCODE v44 is approximately 4 GB compressed. Building the local BLASTn database takes 5-10 minutes the first time and is cached for subsequent runs. Running BLASTn in short-sequence mode on 50 candidates takes another 2-3 minutes. The GENCODE transcript IDs in BLASTn subject fields use a pipe-delimited format (transcript ID|gene ID|Havana gene|Havana transcript|transcript name|gene name|...) from which gene names are extracted by field index. CACNA1A self-hits are filtered out.
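A sketch of the post-processing step, assuming the GENCODE FASTA header layout transcript_id|gene_id|havana_gene|havana_transcript|transcript_name|gene_name|..., which puts the gene name at index 5. That index, the function names, and the tuple shape for hits are assumptions for illustration; the identity and length cutoffs are the ones quoted in the Figure 3 caption.

```python
def gene_name_from_subject(subject_id, field_index=5):
    """Pull the gene name out of a pipe-delimited subject ID."""
    fields = subject_id.split("|")
    if len(fields) <= field_index:
        raise ValueError(f"unexpected subject ID format: {subject_id!r}")
    return fields[field_index]


def off_target_genes(hits, self_gene, min_identity=80.0, min_length=14):
    """Distinct non-self genes among hits of (subject_id, pident, length)."""
    genes = set()
    for subject_id, pident, length in hits:
        if pident < min_identity or length < min_length:
            continue                      # below the reporting thresholds
        gene = gene_name_from_subject(subject_id)
        if gene != self_gene:             # drop hits on the intended target
            genes.add(gene)
    return genes
```

Counting distinct genes rather than raw alignments keeps several hits on isoforms of one gene from inflating the off-target count, which is a design choice of this sketch and not necessarily what the pipeline does.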

The splice risk module uses a combination of hardcoded approximate exon boundary positions for the R192Q region and the ability to load full positions from the GENCODE GTF annotation file when available. The hardcoded fallback covers the common case (running the R192Q pre-computation) without requiring the 1.5 GB GTF download.

The modification annotator applies the published Ionis gapmer design rules: 5-10-5 architecture for 20-mers, proportional adjustment for shorter oligos, MOE modification for sequences with GC > 40%, LNA for lower-GC sequences, and a PyPy dinucleotide toxicity flag in the 5' flank. The gapmer pattern string (e.g. 'mmmmmddddddddddmmmmm') encodes the modification at each position.
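The pattern string and the wing chemistry rule can be sketched as below. The wing length is left as a parameter because the exact "proportional adjustment" for shorter oligos is not spelled out here; the function names are hypothetical and the PyPy toxicity flag is not modeled.

```python
def gapmer_pattern(length, wing=5):
    """Pattern string: 'm' for modified wing positions, 'd' for DNA gap."""
    gap = length - 2 * wing
    if gap < 1:
        raise ValueError("oligo is too short for the requested wing length")
    return "m" * wing + "d" * gap + "m" * wing


def wing_chemistry(aso):
    """MOE for GC content above 40%, LNA otherwise (rule as stated above)."""
    seq = aso.upper()
    gc_fraction = (seq.count("G") + seq.count("C")) / len(seq)
    return "MOE" if gc_fraction > 0.40 else "LNA"


print(gapmer_pattern(20))  # mmmmmddddddddddmmmmm
```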

The final ranking combines allele selectivity ratio, accessibility, off-target count, and splice risk into a composite priority score. The HTML report renders with Plotly for the scatter plots and a vanilla SVG arc diagram for the mRNA secondary structure visualization. The CSV is self-contained and formatted for direct import into any spreadsheet or analysis environment.

Figure 2. mRNA secondary structure of the CACNA1A R192Q mutation region (CDS positions 550–610). Arcs represent predicted base pairs; arc height and line weight scale with base-pair probability from the RNAfold partition function (Vienna RNA package, Lorenz et al. 2011). Colored baseline dots: per-position accessibility (warm = low accessibility / high pairing probability; cool = high accessibility / predominantly single-stranded). Colored horizontal bars below the backbone indicate the binding windows of the top 10 ranked ASO candidates; teal bars correspond to top-ranked candidates (ranks 1–5, ASR < −1.5 kcal/mol), gray bars to candidates 6–10. Red star and dashed vertical line mark the R192Q mutation site (CDS position 575). The single-stranded region surrounding position 575 is consistent with high accessibility scores in top-ranked candidates.

PHASE 4: R192Q Candidate Output and Interpretation

The pre-computed run for R192Q (v2 pipeline) produces candidates across all five window lengths. The top candidate by composite score is AS_21_564, a 21-mer with ASR -1.335 kcal/mol, mutation at position 10 (gap center, position score 0.818), accessibility 0.443, and 47 BLAST off-target hits that need manual review. The second candidate, AS_21_565, has perfect position score (1.000) and no BLAST hits; it becomes the primary synthesis candidate if AS_21_564's off-targets are concerning. A candidate that was in the v1 top five, AS_21_558, was removed because its mutation falls in the 5' MOE wing where RNase H cannot cut – it would never be allele-selective regardless of thermodynamics.

The T:G wobble mismatch issue I described in Phase 2 shows up clearly in the output. The ASR magnitude for R192Q candidates is moderate compared to what would be expected for a variant producing a purine-purine mismatch. An A:A mismatch has a delta-delta-H of +4.7 kcal/mol; the T:G wobble has -1.5 kcal/mol. The effective discrimination between alleles at 1 micromolar strand concentration, for a candidate with ASR = -1.5 kcal/mol, translates to roughly 10-12 fold preferential binding to mutant. In a cell with both alleles present at roughly equal abundance, this means the mutant transcript will be preferentially degraded but not exclusively. Wildtype degradation at therapeutic concentrations will not be zero, but a 10‑fold preference might still be enough to shift the disease balance.
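The fold-preference figure follows from the ratio of equilibrium binding constants, K_mut / K_wt = exp(-ASR / RT). A small worked calculation, assuming simple two-state binding at 37 degrees C (the function name is hypothetical):

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def fold_preference(asr, temp_c=37.0):
    """Ratio of mutant to wildtype binding constants implied by an ASR."""
    return math.exp(-asr / (R_KCAL * (temp_c + 273.15)))


for asr in (-1.0, -1.335, -1.5):
    print(f"ASR {asr:+.3f} kcal/mol -> {fold_preference(asr):.1f}-fold")
```

At ASR = -1.5 kcal/mol this lands between 11 and 12, in line with the range quoted above; the current top candidate's ASR of -1.335 comes out somewhat below the 10-fold mark.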

The accessibility scores vary significantly across window positions. Windows overlapping the predicted stem regions of the CACNA1A mRNA secondary structure around position 575 score below 0.4 and are deprioritized. The top candidates fall in the predicted single-stranded loop region immediately surrounding the mutation site, which is consistent with the general principle that mutation sites in missense variants are often slightly more accessible than the flanking sequence (since they are by definition not perfectly base-paired to any complementary intramolecular sequence).

The off-target landscape for CACNA1A candidates is cleaner than I expected. The gene has multiple voltage-gated calcium channel paralogs in the human genome, but the sequence around position 575 in exon 4 is not conserved across the paralogs at the level of 80% identity over 14+ consecutive nucleotides. The top candidates have zero BLASTn hits outside CACNA1A itself. This does not mean off-target effects are impossible in a cell; it means there are no obvious sequence-based red flags.

The demo output in demo/R192Q_output/ contains the full candidates.csv and report.html generated from this run. The HTML report is interactive: clicking column headers sorts the table, and hovering over scatter plot points shows the full candidate profile. The top 5 candidates are highlighted in teal in both the table and the scatter plot.

Figure 3. AlleleSelect priority space for CACNA1A c.575G>A (R192Q) ASO candidates. X-axis: mRNA accessibility score (mean unpaired probability from RNAfold partition function, 0–1). Y-axis: allele selectivity ratio (ASR = ΔGmutant − ΔGwildtype, kcal/mol). Color encodes off-target count from BLASTn against the GENCODE v44 human transcriptome (≥80% identity over ≥14 nt): teal = top candidates (0 off-targets, ASR < −1.5, mutation at optimal position); green open circles = 0 off-targets; orange = 1–2 off-targets; red = 3+ off-targets. Purple × symbols indicate splice-risk candidates. Shaded teal region: target zone (accessibility > 0.65, ASR < −1.5).

PHASE 5: Comparison with Existing Tools + Open-Source Utility

ASOG, published in September 2025 from the Bordeaux structural bioinformatics group, is a real and useful ASO design tool. It generates candidates, checks off-targets via BLASTn, assesses splice site proximity, and produces thermodynamic scores. Here's exactly what AlleleSelect adds:

ASOG does not compute allele selectivity. It takes a single target sequence and generates ASOs for it. For a gain-of-function heterozygous mutation where the therapeutic constraint is differential allele knockdown, ASOG produces the same candidates regardless of whether the input is the mutant or wildtype sequence. It has no mechanism to rank candidates by how much they prefer the mutant allele over the wildtype. AlleleSelect's core contribution is that it computes delta-G for both alleles for every window and ranks by the difference. This is the step that is missing for FHM1 ASO design, and it is what the van den Maagdenberg lab would need to select candidates for in vivo testing. In other words, ASOG tells you "this ASO binds its target." AlleleSelect tells you "this ASO binds the mutant 10x better than the normal one." Those are very different (and equally important) questions.

The output of AlleleSelect for R192Q is a CSV file and an HTML report. Both are formatted for direct sharing with wet-lab collaborators. The CSV has all the thermodynamic parameters, scores, and annotations needed for a synthesis order. The HTML report is self-contained and renders in any browser without any dependencies. The intent was to make the output as frictionless as possible for a researcher receiving it cold: open the HTML file, look at the scatter plot, read the top 5 rows, and know which sequences to order.

As with every Xiu Lab project, the standard disclaimer applies: I am a high school student building research tools without institutional access, wet lab facilities, or a research budget. AlleleSelect is not production-grade drug design software. It is a computational pipeline built on published methods, validated against the primary literature, and designed to produce output that is interpretable and actionable by the researchers who would actually do the experimental work. If someone synthesizes a top-ranked candidate and tests it in an R192Q mouse and it does something interesting, that would be a nice outcome. If the output helps clarify the design space even without being directly tested, that is also something.

PHASE 6: Position Scoring, Toxicity Filtering, and Composite Ranking – v2

Following initial publication of AlleleSelect v1 results and outreach to researchers in the ASO therapeutics and CACNA1A biology fields, four independent experts provided technical feedback that identified concrete gaps in the pipeline's scoring logic. This phase documents the feedback received, the biological reasoning behind each critique, the implementation decisions made in response, and the change in candidate rankings produced by v2.

1. Position of Mutation Matters

Dr. Paymaan Jafar-Nejad (Ionis) and Dr. Willeke van Roon-Mom (LUMC) both pointed to the same paper (Ostergaard 2013). The key finding: where the mutation sits inside the ASO determines whether RNase H can discriminate. If the mutation is in the MOE wings (positions 1-5 or 17-21), RNase H cannot cut at all, so no selectivity is possible. If it is at the exact center of the DNA gap, discrimination is best. This made immediate intuitive sense to me once I heard it: the cutting enzyme needs to be positioned right over the mismatch. If the mismatch is off to the side, the enzyme doesn't care.

What I changed in v2: Added SNP position scoring. Every candidate now gets a score from 0.0 to 1.0 based on how close the mutation is to the gap center. If the mutation is in a wing, score is 0.0 and the candidate is deprioritized. This changed the ranking. One candidate that was in my v1 top five (AS_21_558) had its mutation at position 4 (in the wing). It is now removed.
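One scoring rule that is consistent with the reported values is a linear falloff from the gap center, normalized by half the gap length. This is an inferred sketch with a hypothetical function name, not the pipeline's confirmed formula.

```python
def snp_position_score(snp_pos, length, wing=5):
    """Score 0.0-1.0 for a 1-indexed SNP position within a gapmer ASO.

    Wing positions score 0.0; inside the DNA gap the score falls off
    linearly with distance from the gap center (assumed functional form).
    """
    gap_start = wing + 1
    gap_end = length - wing
    if not gap_start <= snp_pos <= gap_end:
        return 0.0
    center = (gap_start + gap_end) / 2
    half_gap = (gap_end - gap_start + 1) / 2
    return 1.0 - abs(snp_pos - center) / half_gap
```

For a 21-mer with 5-nt wings this gives 1.000 at position 11, 0.818 at position 10, and 0.0 at position 4, matching the scores reported for AS_21_565, AS_21_564, and AS_21_558.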

2. Toxic Sequence Motifs

Dr. van Roon-Mom gave me three papers showing that certain short sequence patterns (like GGGG repeats, multiple CpG dinucleotides, and specific trinucleotides) can cause neurotoxicity, immune activation, or liver damage, regardless of how selective the ASO is. I had actually never heard of this before. It turns out that even a perfectly selective ASO can kill cells just by having the wrong letter pattern. Chemistry is full of hidden traps.

What I changed in v2: Added a toxicity filter that scans every candidate for known dangerous motifs. The filter outputs a flag: PASS (safe), NOTE (one CpG, fine), WARN (moderate risk), or FAIL (do not use). None of my top candidates have WARN or FAIL; all have a single CpG, which is expected.
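A toy version of such a filter is sketched below. Only the "one CpG gives NOTE" rule comes from the description above; treating a GGGG run as FAIL and two or more CpGs as WARN are placeholder assumptions, and the trinucleotide motifs from the cited papers are not included because they are not listed here.

```python
def toxicity_flag(aso):
    """Return PASS, NOTE, WARN, or FAIL for an ASO sequence.

    Severity mapping is illustrative: GGGG -> FAIL and >= 2 CpG -> WARN
    are assumptions; a single CpG -> NOTE follows the text.
    """
    seq = aso.upper()
    if "GGGG" in seq:
        return "FAIL"
    cpg_count = seq.count("CG")
    if cpg_count >= 2:
        return "WARN"
    if cpg_count == 1:
        return "NOTE"
    return "PASS"
```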

3. Composite Scoring

After adding position scoring, I needed a way to combine the three metrics (ASR, position, accessibility) into one rank. I chose weights based on expert emphasis: ASR (thermodynamic selectivity) gets 40%, position score gets 35%, accessibility gets 25%. This composite score determines the final ranking.
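The weighted sum can be sketched as follows. The weights are the ones stated above; how ASR is mapped onto a 0-1 scale is not stated, so the sketch assumes a linear mapping in which -2.0 kcal/mol counts as full score. That scale (asr_full_scale) is an inference chosen because it reproduces the reported composite for the top candidate.

```python
def composite_score(asr, position_score, accessibility, asr_full_scale=2.0):
    """Weighted composite: 40% ASR, 35% SNP position, 25% accessibility.

    asr_full_scale is an assumed normalization: an ASR of -asr_full_scale
    kcal/mol or better maps to 1.0, and a non-negative ASR maps to 0.0.
    """
    asr_norm = min(1.0, max(0.0, -asr / asr_full_scale))
    return 0.40 * asr_norm + 0.35 * position_score + 0.25 * accessibility


# AS_21_564: ASR -1.335, position score 0.818, accessibility 0.443
print(round(composite_score(-1.335, 0.818, 0.443), 3))  # 0.664
```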

4. Dr. Xiaoyu Chen (Timothy Syndrome ASO Paper)

Dr. Xiaoyu Chen at St. Jude Children's Research Hospital reminded me that transcript knockdown does not guarantee functional rescue. The channel has many roles, and reducing mutant RNA by some percentage might not improve CSD threshold. This limitation is now stated in the README and was a humbling reminder for me that mRNA is not the same as protein function. You can cut the message, but the protein already made might stick around for days.

5. Anastasia Khvorova (UMass)

Dr. Khvorova at UMass confirmed the 2013 Ostergaard paper and added a new detail: it is not only the position but the sugar pucker (chemical shape) of the ASO letters near the mutation that matters. By locking those sugars into an RNA-like shape, you can further block RNase H from cutting the normal RNA. This is implemented in v4 as the --recommend-mods flag.

PHASE 7: Making AlleleSelect Disease-Agnostic - v3

Following outreach to the broader ASO therapeutics community, two developments prompted extending AlleleSelect beyond CACNA1A. First, Professor Lyn Griffiths (QUT) confirmed that the allele-selective ASO approach is most tractable for variants with well-characterized gain-of-function electrophysiology and existing mouse models (specifically R192Q, S218L, G293R, A454T, T501M, and R1349Q) and that R192Q and S218L are the strongest proof-of-concept targets given their dual characterization and mouse model availability. AlleleSelect was run on all six variants and results are in demo/.

Second, Gijs-Jan Scholten (PhD candidate, van Roon-Mom lab, LUMC) reached out via email and requested support for ATXN1/SCA1 allele-selective ASO design. SCA1 is caused by polyglutamine expansion in ATXN1 and, analogously to FHM1, requires allele-selective targeting to spare the wildtype allele. It was at the moment I first saw Mr. Scholten's email that I realized AlleleSelect could be useful beyond my little corner of migraine genetics. This is when I initially began considering disease agnosticism for the tool.

v3 changes implemented: --gene flag for output labeling, --no-splice-check flag for non-CACNA1A transcripts, and get_splice_positions_for_transcript() generic Ensembl-based splice site lookup in scoring/splice.py. The thermodynamics, accessibility, SNP position scoring, and toxicity screening were already gene-agnostic. AlleleSelect now functions as a general allele-selective gapmer design pipeline for any dominant neurological disease with a targetable SNP. The correct transcript for ATXN1/SCA1 is ENST00000436367.6 (MANE Select, NM_001128164.2).

Live AlleleSelect Report - R192Q Run

Interactive candidate report for CACNA1A R192Q (v2 pipeline): the HTML report displays all 210 ASO candidates ranked by composite score (40% ASR, 35% SNP position, 25% accessibility). Columns include ASO sequence, length, allele selectivity ratio (ASR, kcal/mol), SNP position within the ASO (1‑indexed), position score (0–1, center=best), RNAfold accessibility (0–1, higher=more open), BLASTn off‑target count against GENCODE v44, toxicity flags (CpG, G‑rich, etc.), and splice site proximity. Click any column header to sort; hover over the interactive scatter plot to see candidate details. The top candidate AS_21_564 (composite 0.664, ASR –1.335, SNP at position 10, accessibility 0.443) and the clean backup AS_21_565 (SNP at exact gap center, position score 1.000, unscreened off‑targets) are highlighted. The report is self‑contained and opens in any browser.

PHASE 8: RNase H Cleavage Site Integration + Sugar Pucker – v4

Following expert feedback and preparation for experimental collaborations with Dr. Fikri Birey (Emory, CACNA1A A713T ASO work) and Dr. Sara Aguti and Dr. Haiyan Zhou (UCL, COL6A3 allele-selective gapmer work), four improvements were implemented in v4. Additionally, wing-position scoring was revised in v5 based on independent confirmations from two researchers.

Improvement 1: Differential mRNA accessibility (--diff-accessibility)

This improvement was largely motivated by Aguti and Zhou 2024 (Mol Ther Nucleic Acids, PMID 38993932), which demonstrated that differences in secondary structure between mutant and wildtype mRNA at the mutation site contribute to allele selectivity independently of thermodynamics. Their mixmer design for COL6A3 exploited structural accessibility differences to achieve a 3-fold specificity improvement over standard gapmer design. The idea here is clever: if the mutation changes how the mRNA folds, that folding difference alone can make the mutant more accessible to an ASO, even without a perfect thermodynamic match.

AlleleSelect v4 runs RNAfold partition function scoring on both the wildtype and mutant CDS sequences separately (±200 nt around the mutation site) and computes a per-candidate diff_accessibility = mut_accessibility - wt_accessibility. A positive differential means the mutant mRNA is more single-stranded at the ASO binding window, which independently favors selective engagement. When differential accessibility is computed, it contributes 15% to the composite score. The R192Q mutation produces minimal structural differences between wildtype and mutant (differential ≈ 0.000 for all windows), consistent with the G>A substitution being too subtle to reshape local secondary structure. This was confirmed by Dr. Aguti on the call: for structurally symmetric targets, the deliberate mismatch strategy is necessary rather than optional.

Improvement 2: Engineered mismatch mode (--extra-mismatch)

This improvement was proposed by Professor van Roon-Mom (LUMC) and validated experimentally in Ostergaard 2013 Figure 7. Adding one deliberate additional mismatch 2-3 positions from the SNP gives the wildtype allele two mismatches against the ASO while the mutant allele retains only one. This strategy achieved >100-fold allele discrimination in the Huntingtin ASO context. Dr. Aguti confirmed on a follow-up call that for targets without structural asymmetry between alleles, scanning mismatch positions from 5' to 3' is the appropriate empirical approach. The idea is a bit counterintuitive. You're deliberately making the ASO less perfect for the mutant, but you're making it much worse for the wildtype. The net effect can be a huge gain in selectivity.

AlleleSelect v4 generates engineered-mismatch variants for the top 5 ranked candidates, testing deliberate mismatches at positions ±2 and ±3 relative to the SNP. For R192Q, the top engineered-mismatch candidates (AS_19_566_em variants) achieve estimated ASR of -1.783 kcal/mol versus -0.983 kcal/mol for the unmodified parent, and composite scores of 0.8316 versus 0.6716. These variants are written to candidates_engineered_mismatch.csv, and their scores are best treated as relative rankings rather than absolute predictions.
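A rough sketch of how such variants could be enumerated is below. The function name, the 0-based indexing, and the choice to try all three alternative bases at each offset are assumptions made for illustration; the sequence is a toy 9-mer, not a real candidate.

```python
def engineered_mismatch_variants(aso, snp_index, offsets=(-3, -2, 2, 3)):
    """Return (position, new_base, variant) tuples for single-base substitutions.

    aso       : ASO sequence, 5'->3', DNA alphabet
    snp_index : 0-based index of the SNP-pairing base within the ASO
    offsets   : distances from the SNP at which to place the deliberate mismatch
    """
    variants = []
    for offset in offsets:
        pos = snp_index + offset
        if not 0 <= pos < len(aso):
            continue  # offset falls off the end of the oligo
        for base in "ACGT":
            if base == aso[pos]:
                continue  # same base would not introduce a mismatch
            variants.append((pos, base, aso[:pos] + base + aso[pos + 1:]))
    return variants


toy_aso = "AGTCTGCTG"  # toy sequence, SNP-pairing base at index 4
for pos, base, seq in engineered_mismatch_variants(toy_aso, snp_index=4):
    print(pos, base, seq)
```

Each variant would then be rescored against both alleles, since the extra mismatch costs binding energy on the mutant as well as the wildtype.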

Improvement 3: Chemical modification recommender (--recommend-mods)

Based on Khvorova (UMass) and Ostergaard 2013 Figures 3, 5, and 7: placing 2S-dT, FRNA, or S-cEt at gap positions flanking the SNP (positions p-1 and p-2 from the SNP in the gap) suppresses minor RNase H cleavage sites on the wildtype duplex. AlleleSelect v4 adds a mod_recommendation output column recommending the appropriate backbone modification for each candidate's specific sequence context. For R192Q top candidates, the position immediately 5' of the SNP carries C, G, or A (not T), so S-cEt is recommended. When T is present, 2S-dT achieves >48-fold selectivity improvement per Ostergaard Figure 3.
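The decision rule described above is simple enough to write down directly. This sketch covers only the T versus non-T branch stated in the text; the function name and 0-based indexing are assumptions, and FRNA placement is not modelled here.

```python
def recommend_mod(aso, snp_index):
    """Suggest a modification from the base immediately 5' of the SNP.

    aso       : ASO sequence, 5'->3'
    snp_index : 0-based index of the SNP-pairing base within the ASO
    Returns None when the SNP is the first base and has no 5' neighbour.
    """
    if snp_index == 0:
        return None
    flank = aso[snp_index - 1].upper()
    # 2S-dT is a modified thymidine, so it only applies where the ASO has a T
    return "2S-dT" if flank == "T" else "S-cEt"


print(recommend_mod("AGTCTGCTG", 4))  # base 5' of the SNP is C
print(recommend_mod("AGTTTGCTG", 4))  # base 5' of the SNP is T
```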

Improvement 4: Minimum mismatch distance reporting

Following direct feedback from Dr. Matias Wagner's group (Helmholtz Munich), the BLAST off-target output now reports min_off_target_mismatches (minimum edit distance to any off-target sequence in GENCODE v44) and ot_risk_level (high: ≤1 mismatch; moderate: ≤2; low: ≤3; clean: >3). This addresses Wagner's primary critique of v2: a raw hit count without mismatch context is uninformative. The top R192Q candidates show min_mm = 0 (a perfect match) against 26 to 31 sequences, reflecting real shared sequence between voltage-gated calcium channel family members at the conserved S4 domain. This is a genuine off-target concern rather than a pipeline artifact.
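The risk tiers map onto a small lookup. This is a sketch using the thresholds quoted above; the function name and the handling of candidates with no BLAST hits at all (treated as clean) are assumptions.

```python
def ot_risk_level(min_off_target_mismatches):
    """Map the minimum mismatch count to a risk tier.

    None means no off-target hit was found (assumed here to count as clean).
    """
    if min_off_target_mismatches is None:
        return "clean"
    if min_off_target_mismatches <= 1:
        return "high"
    if min_off_target_mismatches <= 2:
        return "moderate"
    if min_off_target_mismatches <= 3:
        return "low"
    return "clean"


for mm in (0, 1, 2, 3, 4):
    print(mm, ot_risk_level(mm))
```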

Improvement 5: Wing position scoring revision (v5)

Previously, candidates with the SNP falling in the MOE wing received a score of 0.0 and were excluded. Dr. Sara Aguti (UCL, call May 2026) and Professor Elgersma (Erasmus MC, written feedback citing PMID 32092825) independently confirmed that the most selective ASO for a target can sometimes have its SNP in the wing: Aguti confirmed that the minimum requirement is four consecutive DNA nucleotides in the gap for RNase H activation, not SNP position in the gap. Wing-position SNP candidates now receive score 0.10 with snp_region = "wing_caution" and appear in output with a flag indicating experimental validation is required rather than being discarded.
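A minimal sketch of the revised rule follows. The 0.10 score and the "wing_caution" label come from the description above; the function name, the "gap" label, the needs_validation key, and the 1-indexed position convention are assumptions.

```python
WING_CAUTION_SCORE = 0.10


def classify_snp(snp_pos, wing5_len, gap_len, gap_score):
    """Return region and position score for a candidate.

    snp_pos   : 1-indexed SNP position within the ASO
    wing5_len : length of the 5' wing
    gap_len   : length of the DNA gap
    gap_score : score to use when the SNP lies in the gap (computed elsewhere)
    """
    gap_start = wing5_len + 1
    gap_end = wing5_len + gap_len
    if gap_start <= snp_pos <= gap_end:
        return {"snp_region": "gap", "position_score": gap_score,
                "needs_validation": False}
    # v5: keep wing candidates with a low score and a flag instead of dropping them
    return {"snp_region": "wing_caution", "position_score": WING_CAUTION_SCORE,
            "needs_validation": True}


print(classify_snp(3, wing5_len=5, gap_len=10, gap_score=0.0))
print(classify_snp(10, wing5_len=5, gap_len=10, gap_score=0.889))
```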

Phase 9: Junction Targeting Mode - v5

Following a call with Dr. Sara Aguti (UCL GTAC, May 14 2026), AlleleSelect v5 adds a --junction-mode flag for mutations that create novel exon-exon junctions rather than single-nucleotide substitutions. The mutation Dr. Aguti works on, COL6A3 c.6210+1G>A, destroys the splice donor of exon 16, causing the mutant mRNA to skip that exon and create a novel exon 15/17 junction absent from all wildtype mRNA. Her ASOs target that junction, making wildtype binding structurally impossible rather than thermodynamically discriminated. AlleleSelect's ASR calculation is undefined in this context since there is no mismatch to score; selectivity derives from the junction sequence being structurally absent from wildtype.

Junction mode takes a transcript ID and skipped exon number, fetches the flanking exon sequences from Ensembl, constructs the novel mutant junction sequence, and slides ASO windows across the junction center. Each candidate is checked against the wildtype region sequence: true novel junction candidates have zero wildtype cross-reactivity. Scoring uses a junction-specific composite: 50% junction specificity, 30% junction position score (how well the window spans the actual junction), and 20% RNAfold accessibility. Note that delta_G, ASR, and Tm columns are set to zero throughout junction mode output; these are undefined in this design paradigm. The relevant ranking columns are composite_score, junction_specificity, junction_pos_score, and mRNA_accessibility_score.
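The window construction and the composite weighting can be sketched as follows. The exon sequences are toy strings rather than COL6A3 sequence, the function names and the min_overhang parameter are hypothetical, and the Ensembl fetch step is left out entirely. Only the 50/30/20 weights come from the description above.

```python
COMPLEMENT = str.maketrans("ACGU", "TGCA")  # RNA base -> pairing DNA base


def reverse_complement_dna(rna):
    """DNA ASO (5'->3') that pairs with an RNA target window (5'->3')."""
    return rna.translate(COMPLEMENT)[::-1]


def junction_windows(upstream_exon, downstream_exon, length, min_overhang=1):
    """Yield (start, window) for windows that include at least min_overhang
    nucleotides on each side of the novel junction."""
    junction = upstream_exon + downstream_exon
    j = len(upstream_exon)  # index of the first downstream nucleotide
    first = max(0, j - length + min_overhang)
    last = min(len(junction) - length, j - min_overhang)
    for start in range(first, last + 1):
        yield start, junction[start:start + length]


def junction_composite(specificity, pos_score, accessibility):
    """50% junction specificity, 30% position score, 20% accessibility."""
    return 0.50 * specificity + 0.30 * pos_score + 0.20 * accessibility


# Toy exons: upstream, skipped, downstream
up, skipped, down = "AAACCC", "GGGUUU", "ACGACG"
wildtype = up + skipped + down
windows = list(junction_windows(up, down, length=4))
for start, target in windows:
    # A window found in wildtype would not be junction-specific
    print(start, target, reverse_complement_dna(target), target in wildtype)
```

With junction specificity 1.0 and accessibility 0.494 as reported for JM_22_82, and an assumed position score of 1.0, junction_composite returns 0.8988, which rounds to the reported 0.899.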

Validation run on COL6A3 exon 16 skipping: 405 candidates generated, 80 with zero wildtype cross-reactivity. The top candidate JM_22_82 (CTCGAATCCCAGGATTCCCTTT, 22-mer) achieves composite 0.899, junction specificity 1.0, RNAfold accessibility 0.494. All top-ranked candidates span the junction center and are confirmed absent from wildtype mRNA. A TCC toxicity flag (hepatotoxicity-associated trinucleotide, Burdick 2014) was noted on the top candidates. Lower-ranked candidates JM_18_87 (rank 34) and JM_18_79 (rank 37) carry only a single CpG note flag and represent cleaner synthesis alternatives if the TCC flag is disqualifying. Full results were shared with Dr. Aguti for comparison against her published experimental data. See below for the spreadsheet.

COL6A3_candidates_junction

AlleleSelect v5 junction mode output for COL6A3 c.6210+1G>A (exon 16 skipping), showing the top 80 candidates with junction specificity 1.0 (zero wildtype cross-reactivity). Columns: ASO ID, sequence, composite score, junction specificity, junction position score, RNAfold accessibility, toxicity flag, off-target count, wildtype cross-reactivity count. Delta_G, ASR, and Tm are zero throughout junction mode output; these quantities are undefined when selectivity derives from the junction sequence being absent from wildtype mRNA rather than from thermodynamic mismatch discrimination. Top candidate JM_22_82 (CTCGAATCCCAGGATTCCCTTT) achieves composite 0.899, junction specificity 1.0, accessibility 0.494. TCC toxicity flag noted on top candidates; JM_18_87 (rank 34) and JM_18_79 (rank 37) carry only a CpG note flag and represent cleaner synthesis alternatives.

Phase 10: External Validation (ATXN1) and Corrected Thermodynamics – v6–v8

First external experimental validation (Scholten, LUMC)

Mr. Gijs-Jan Scholten (a PhD candidate in Dr. van Roon-Mom's lab, LUMC) ran AlleleSelect on ATXN1 for his SCA1 allele-selective ASO work and reported that the first 20-nucleotide gapmer produced by AlleleSelect matched the candidate that proved most effective in his initial in vitro screen. This is the first independent external confirmation that AlleleSelect's composite scoring correlates with experimental selectivity on a gene and variant outside the CACNA1A context. Specific SNP positions and ASO sequences cannot be disclosed due to pending patent considerations.

Mr. Scholten subsequently provided a full experimental dataset: 20 fixed-length 5-10-5 gapmers tested in a reporter gene assay, each corresponding to one possible position of the SNP within a 20-mer sliding across his target. For each gapmer, he measured downregulation of both the target (mutant) and non-target (wildtype) allele at 100 nM, as well as IC50 dose-response for four selected candidates. A correlation analysis between AlleleSelect's triangular SNP position score and his experimental allele discrimination percentages across all 20 gapmers gives Pearson r = 0.91 (p < 0.0001, n = 20), indicating that SNP position within the gap is a strong predictor of experimental selectivity for this dataset.

AlleleSelect's top 4 predictions (gapmers 11, 10, 12, 9) all fall in the high-discrimination cluster experimentally, with discrimination percentages of 29%, 31%, 21%, and 22% respectively. The one notable miss in the top 5 was gapmer 15 (ranked #5, only 6% discrimination). More importantly, gapmer 13 (which AlleleSelect did not rank) showed the strongest IC50-based allele selectivity of all 20 gapmers (4.27-fold target vs. non-target). Gapmer 13's SNP sits at position 13 of the 20-mer, toward the 3' end of the DNA gap. The fixed triangular position score assigns this a score of 0.444 vs. 0.889 for gap-center candidates, which is why it was deprioritized. This miss directly motivates the v8 RNase H cleavage site integration described below and was probably the most valuable failure of the whole project (so far). A 0.91 correlation is great, but the one outlier pointed directly at a blind spot in my scoring logic. Science lives in the exceptions.
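To make the gapmer 13 arithmetic concrete, here is one triangular form that reproduces both quoted values (0.889 near gap center, 0.444 at position 13) for a 5-10-5 gapmer. The formula is inferred from those two numbers and is an assumption, not a copy of AlleleSelect's scoring code; wing handling is reduced to returning 0.0.

```python
def triangular_position_score(snp_pos, wing5_len, gap_len):
    """Linear fall-off from gap center to gap edge (assumed form).

    snp_pos is 1-indexed within the ASO. Positions outside the gap return 0.0
    in this sketch.
    """
    gap_start = wing5_len + 1
    gap_end = wing5_len + gap_len
    if not gap_start <= snp_pos <= gap_end:
        return 0.0
    center = (gap_start + gap_end) / 2
    half_width = (gap_end - gap_start) / 2
    if half_width == 0:
        return 1.0  # single-nucleotide gap: the only position is the center
    return 1.0 - abs(snp_pos - center) / half_width


for pos in (9, 10, 11, 12, 13):
    print(pos, round(triangular_position_score(pos, 5, 10), 3))
```

Because the peak is fixed at the geometric center, a SNP two or three positions toward the 3' end is penalized regardless of where RNase H actually prefers to cut in that sequence.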

In a separate bug report, Mr. Scholten identified that the --gapmer-architecture 5-10-5 filter was returning zero candidates despite valid input. This was no biggie, though. The filter was looking for integer fields (wing_len, gap_len) that the modification annotator does not set; the annotator only writes the architecture as a string like "5MOE-10DNA-5MOE". As of v7, the filter parses that string directly.
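The string-parsing approach looks roughly like this. It is a sketch under the assumption that the pattern always has three hyphen-separated segments, each starting with its length; the function names are hypothetical.

```python
import re


def parse_architecture(pattern):
    """Extract (wing5, gap, wing3) lengths from '5MOE-10DNA-5MOE' or '5-10-5'."""
    parts = pattern.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected WING-GAP-WING, got {pattern!r}")
    lengths = []
    for part in parts:
        match = re.match(r"\d+", part)
        if match is None:
            raise ValueError(f"segment {part!r} does not start with a length")
        lengths.append(int(match.group()))
    return tuple(lengths)


def matches_architecture(pattern_string, requested):
    """True when the annotated pattern has the requested wing and gap lengths."""
    return parse_architecture(pattern_string) == parse_architecture(requested)


print(parse_architecture("5MOE-10DNA-5MOE"))
print(matches_architecture("5MOE-10DNA-5MOE", "5-10-5"))
```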

Fixed-length and architecture filter (v6/v7)

AlleleSelect v6 adds --fixed-length INT and --gapmer-architecture WING-GAP-WING flags for direct comparison against wet-lab screens using fixed architectures. The architecture filter bug identified by Mr. Scholten was fixed in v7: the filter now parses the recommended_gapmer_pattern string rather than looking for integer fields. The command for ATXN1 with Scholten's format:

alleleselect --variant c.XXXN>Y --transcript ENST00000436367.6 --gene ATXN1 --no-splice-check --fixed-length 20 --gapmer-architecture 5-10-5 --output atxn1_20mer/

Dr. Pietrobon correspondence

Professor Daniela Pietrobon (University of Padova, developer of the R192Q knockin mouse model) confirmed that a 40-60% reduction in mutant transcripts in homozygous R192Q mice would likely produce a detectable CSD threshold increase, citing Tottene et al. 2009 (Neuron): a 43% reduction in glutamate release via subsaturating ω-agatoxin IVA (Aga) doses produced a 41% CSD threshold increase in homozygous R192Q mice. The heterozygous context is less predictable because the wildtype allele continues to contribute current. Starting with homozygous mice as proof-of-concept is the recommended experimental sequence before moving to the heterozygous therapeutic context.

Dr. Fikri Birey (Emory) - organoid validation offer

Dr. Fikri Birey graciously offered to build FHM1 brain organoid models from patient-derived iPSCs and test AlleleSelect R192Q candidates in them. His lab works on CACNA1A A713T (DEE variant) and can generate cortical organoids relevant to FHM1. This would represent the first human neuronal validation of AlleleSelect candidates, in a context that mouse models cannot replicate. Top-ranked candidates with full modification annotations will be provided when his models reach a testable stage. Very excited about this! This means human neurons (even better than a mouse approximation!) in a dish, grown from a patient's own cells and carrying the actual mutation. I can't overstate how cool this would be.

Thermodynamic parameter correction

Dr. Frank Bennett (Ionis) identified that AlleleSelect's SantaLucia 1998 DNA:DNA nearest-neighbor parameters were suboptimal for a MOE PS gapmer binding an RNA target. The correct reference is Sugimoto 1995 (PMID 7545436), which measured RNA:DNA hybrid nearest-neighbor thermodynamics from 68 experimental sequences. AlleleSelect v7 implements the full Sugimoto 1995 16-parameter table as default. SantaLucia remains available via --rna-params santalucia.
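For readers unfamiliar with nearest-neighbor models, the calculation is a sum over the 16 possible dinucleotide stacks plus an initiation term. The sketch below shows only that summation. The table values are deliberate placeholders (every stack set to -1.0 kcal/mol), NOT the published Sugimoto 1995 parameters, and the function name and signature are assumptions.

```python
def duplex_dg(rna_5to3, nn_dg, init_dg):
    """Sum nearest-neighbor stacking terms along the RNA strand, read 5'->3'.

    nn_dg   : dict mapping each RNA dinucleotide (e.g. 'AC') to its stack energy
    init_dg : duplex initiation term, same units as the table
    """
    rna = rna_5to3.upper().replace("T", "U")
    total = init_dg
    for i in range(len(rna) - 1):
        total += nn_dg[rna[i:i + 2]]
    return total


# PLACEHOLDER table: 16 entries, all -1.0, for illustration only
toy_table = {a + b: -1.0 for a in "ACGU" for b in "ACGU"}
print(duplex_dg("ACGU", toy_table, init_dg=3.0))
```

Swapping parameter sets then amounts to passing a different 16-entry table, which is the shape of change that --rna-params selects between.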

Mismatch detection correction (v7)

During implementation of the Sugimoto parameter switch, a strand orientation error was identified in the mismatch detection logic of nearest_neighbor.py. The error caused the code to compare the ASO sequence against itself rather than against the mRNA target sequence, because both were being read in the same 5'→3' direction when they should be antiparallel. The consequence was that the Peyret 1999 mismatch correction was not being applied to any candidate in v1 through v6. ASR values reported in those versions, including the AS_21_564 value of -1.335 kcal/mol cited in Phase 4, were computed without the mismatch penalty and are incorrect. To be frank, this was quite embarrassing to discover. I had tested the thermodynamics module against SantaLucia's worked examples, but I never tested the mismatch correction on a real variant. The antiparallel orientation is RNA 101; I just missed it.

The corrected v7 implementation detects mismatches by comparing each ASO base against the base at the antiparallel position in the mRNA and applies the Peyret correction at that position. For R192Q, the corrected ASR for top standard gapmer candidates is approximately +0.054 kcal/mol. This small positive value accurately reflects that R192Q's G>A transition produces a G:T wobble mismatch, which is the weakest mismatch type thermodynamically and in some sequence contexts is slightly stabilizing at 37°C. This is consistent with literature on G/A transition SNPs, which are among the hardest allele-selective design targets. The candidate rankings in v1-v6 were partially preserved despite the incorrect ASR values because composite score also weights SNP position (35%) and accessibility (25%), which were computed correctly throughout.
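The antiparallel comparison can be illustrated with a short sketch. The sequences are toy 9-mers rather than CACNA1A sequence, and the function name and return format are assumptions; the point is only the index arithmetic, where ASO base i pairs with mRNA base n-1-i.

```python
# mRNA base -> DNA base that pairs with it perfectly
PAIRS_WITH = {"A": "T", "C": "G", "G": "C", "U": "A"}


def find_mismatches(aso, mrna_window):
    """Return (aso_index, aso_base, mrna_base) for every non-Watson-Crick pair.

    Both sequences are written 5'->3', so ASO index i faces mRNA index n-1-i.
    """
    if len(aso) != len(mrna_window):
        raise ValueError("ASO and target window must be the same length")
    n = len(aso)
    mismatches = []
    for i, aso_base in enumerate(aso):
        mrna_base = mrna_window[n - 1 - i]
        if PAIRS_WITH[mrna_base] != aso_base:
            mismatches.append((i, aso_base, mrna_base))
    return mismatches


wt_window = "CAGCGGACU"   # toy wildtype target, G at index 4
mut_window = "CAGCAGACU"  # toy mutant target, G>A at index 4
aso = "AGTCTGCTG"         # reverse complement of the mutant window

print(find_mismatches(aso, mut_window))  # perfect match to mutant
print(find_mismatches(aso, wt_window))   # single T opposite G (G:T wobble)
```

Reading both strands in the same direction would instead compare position i with position i, which is how the pre-v7 check ended up never seeing the real mismatch.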

The engineered mismatch candidates remain the primary synthesis recommendation for R192Q: their ASR of approximately -0.746 kcal/mol represents real thermodynamic discrimination achieved by adding a deliberate second mismatch against the wildtype.

In the v7 run, A454T and T501M are the most tractable allele-selective targets among the six variants, with ASR = -0.326 kcal/mol reflecting the greater destabilizing effect of the C:A mismatch produced by C>T substitutions relative to the G:T wobble from G>A transitions. T501M achieves the highest composite score of any standard gapmer run (0.530) with a PASS toxicity flag and requires ENST00000638009 rather than ENST00000360228 due to alternative exon usage.

Junction mode results (COL6A3, Phase 9) and the ATXN1 external validation (Scholten, Phase 10) are not affected by this correction. Junction mode does not use the mismatch detector, and the Scholten validation compared composite score rankings rather than absolute ASR values.

Pre-mRNA off-target flag

Dr. Stefan Hauser (DZNE Tübingen) identified that BLAST screening against GENCODE v44 mature mRNA misses intronic off-targets. A --premrna-blast flag was implemented and activates when ALLELESELECT_GENOME_FASTA points to a local hg38 FASTA.

Emilio Harris-Mostert collaboration (Erasmus MC) — v8

Emilio Harris-Mostert (PhD candidate, Elgersma lab, Erasmus MC) won first place for best oral presentation at DATS 2023 for "Rational design and testing of allele-selective gapmer antisense oligonucleotides." I reached out and scheduled a call for June 3, 2026, and his insights produced three concrete findings for AlleleSelect.

First: the G:T wobble mismatch (produced by R192Q's G>A transition) is empirically not as weak as thermodynamics predicts. Emilio's experimental data shows G:T wobble can give surprisingly strong allele selectivity, possibly because of structural effects on the RNase H active site that thermodynamic penalties do not capture. The ASR of +0.054 kcal/mol is thermodynamically correct but may underestimate practical selectivity. This was a relief to hear, since we trust cells more than predictions.

Second: the optimal placement for a deliberate engineered mismatch is toward the 5' half of the DNA gap, near where RNase H is predicted to cut. Two valid strategies exist: clustering the mismatch close to the existing SNP mismatch (AlleleSelect's current default, appropriate for most cases), or placing it specifically in the 5' half of the gap near the predicted RNase H cleavage site (v8 alternative mode, not yet implemented pending Emilio's benchmark data).

Third: the fixed triangular SNP position score should be replaced with a sequence-dependent prediction based on where RNase H1 is likely to cut in each specific sequence context, not a fixed peak at gap center.

Mr. Harris-Mostert provided the Kielpinski 2017 paper (PMID 29126318) and confirmed his own tool uses the R4b dinucleotide position weight matrix (PWM) with cleavage assigned between positions 7 and 8 of a 9-nucleotide window. He will provide a list of allele-selective ASO sequences with measured knockdown data for benchmark validation.

RNase H1 cleavage site scoring module (v8)

AlleleSelect v8 adds alleleselect/scoring/rnase_h_scorer.py, implementing the Kielpinski 2017 R4b dinucleotide PWM for human RNase H1. The module contains four functions: score_sequence_window scores a 9-nucleotide RNA window; score_rnase_h_cleavage returns cleavage efficiency scores for all positions in a DNA gap; get_optimal_cleavage_position identifies the position most likely to be cut; and compute_snp_rnase_h_position_score scores each candidate's SNP based on how efficiently RNase H cuts at that position, normalized across the gap. This might fix the gapmer 13 miss since instead of assuming the center is always best, we can actually predict where RNase H will cut in that specific sequence.
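The overall flow of such a scorer can be sketched as below. The function names echo the ones listed above, but the signatures, bodies, and the min-max normalization are illustrative assumptions, and the weight table is a single made-up entry rather than the Kielpinski 2017 R4b matrix. The sketch also scores every window along the target instead of restricting to the DNA gap.

```python
WINDOW = 9  # nucleotides per scored RNA window


def score_sequence_window(window, weights):
    """Sum position-specific dinucleotide weights over one 9-nt RNA window.

    weights maps (window_index, dinucleotide) -> weight; missing keys count as 0.
    """
    if len(window) != WINDOW:
        raise ValueError("window must be 9 nucleotides")
    return sum(weights.get((i, window[i:i + 2]), 0.0) for i in range(WINDOW - 1))


def score_rnase_h_cleavage(rna_target, weights):
    """Map each cleavage index to its window score.

    Cleavage is assigned between window positions 7 and 8 (1-indexed), so the
    key is the 0-based target index of window position 7.
    """
    scores = {}
    for start in range(len(rna_target) - WINDOW + 1):
        scores[start + 6] = score_sequence_window(rna_target[start:start + WINDOW], weights)
    return scores


def get_optimal_cleavage_position(scores):
    """Index with the highest predicted cleavage score."""
    return max(scores, key=scores.get)


def compute_snp_rnase_h_position_score(scores, snp_index):
    """Min-max normalized cleavage score at the SNP (assumed normalization)."""
    if snp_index not in scores:
        return 0.0
    lo, hi = min(scores.values()), max(scores.values())
    return 1.0 if hi == lo else (scores[snp_index] - lo) / (hi - lo)


# PLACEHOLDER weights: reward 'GA' at window positions 7-8 only
toy_weights = {(6, "GA"): 1.0}
toy_target = "C" * 8 + "GA" + "C" * 6
toy_scores = score_rnase_h_cleavage(toy_target, toy_weights)
print(get_optimal_cleavage_position(toy_scores))
print(compute_snp_rnase_h_position_score(toy_scores, 8))
```

The key difference from the triangular score is that the peak moves with the sequence: a SNP scores highly only when it coincides with a predicted cleavage site.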

Activated via --rnase-h-scoring flag. Default triangular behavior is unchanged.

Testing on R192Q revealed that the v8 scoring does not differentiate candidates for this variant: position 574 in the mRNA is the optimal RNase H cleavage site for the R192Q sequence context, and all competitive candidates have their SNP at or near that position, so all receive a high score. The v8 scorer is working correctly — R192Q is a degenerate case where the SNP coincidentally sits at the enzymatic preference peak. Differentiation is expected for variants where the optimal cleavage site and the SNP location do not overlap across the candidate window. Gijs-Jan Scholten has been asked to rerun his ATXN1 target with --rnase-h-scoring to test whether v8 corrects the gapmer 13 miss; results pending.

N1C presentation (June 9, 2026)

I presented AlleleSelect at the N-of-1 Collaborative preclinical working group at Dr. Willeke van Roon-Mom's invitation. My presentation covered pipeline architecture, pre-computed results across six CACNA1A variants and one ATXN1 target, honest gaps in experimental validation, and three asks to the group: experimental ASO selectivity data for benchmarking, empirical observations on which scoring factors mattered experimentally, and feature requests. Dr. van Roon-Mom's lab is actively using AlleleSelect; a benchmark publication is under consideration contingent on data availability from the working group.

6 CACNA1A Variants - Pre-Computed.pdf

v7 corrected results for all six CACNA1A FHM1 variants and R192Q engineered mismatch candidates. All runs use Sugimoto 1995 RNA:DNA hybrid thermodynamic parameters with corrected mismatch detection (v7). Off-target counts from BLASTn against GENCODE v44 (all 210 candidates screened). T501M uses ENST00000638009 (CACNA1A-256) rather than ENST00000360228 due to isoform-specific amino acid numbering. ASR values for G>A variants (R192Q, S218L, G293R, R1349Q) are positive and small, reflecting the weak thermodynamic discrimination of the G:T wobble mismatch type. ASR values for C>T variants (A454T, T501M) are negative at -0.326 kcal/mol, reflecting the greater destabilizing effect of the C:A mismatch produced by a C>T substitution.

External experimental validation on ATXN1 (Scholten, LUMC). 20 fixed-length 5-10-5 gapmers tested in a reporter gene assay. AlleleSelect rankings shown where assigned. Pearson r = 0.91 between AlleleSelect SNP position score and experimental discrimination (p < 0.0001, n = 20). Gapmer 13, best by IC50 (4.27-fold), not ranked by AlleleSelect v7: motivates v8 RNase H cleavage site integration. ASO sequences and SNP details not disclosed due to pending patent considerations.

Closing Remarks

These sequences, if synthesized and injected into an R192Q knockin mouse, might reduce cortical spreading depression frequency. The chain of events is simple on paper: the ASO binds the mutant mRNA and recruits RNase H, which degrades the transcript; less mutant Cav2.1 channel is made, less gain-of-function current flows, and cortical spreading depression becomes less likely. Whether any of the sequences AlleleSelect produces will survive synthesis, transfection, cellular pharmacology, and in vivo delivery to produce that effect is a question I cannot answer computationally. It requires a wet lab, a dry mouse, and a seasoned researcher.

AlleleSelect is my half of that conversation. The computation is done as carefully as I know how to do it, the methods are documented, the code is open source, and the R192Q output is pre-computed and ready to send. The other half of the conversation, if it happens, will happen in someone else's lab, possibly in a city I have never been to, in a context I cannot predict.

That is fine. That is how science is supposed to work. Research is not a solo endeavor even when it is built by one person. The papers AlleleSelect relies on were written by dozens of researchers over decades. The mouse model was built by a lab in Leiden. The clinical validation of intrathecal ASO delivery happened at Biogen and Ionis. AlleleSelect is one more node in a graph of accumulated knowledge, doing a specific computation that was not being done before, in the hope that it leads to some semblance of a cure.

Cheers,
Angie X.

This project is open source at github.com/axshoe/AlleleSelect.
