AlleleSelect

Allele-Selective ASO Design Pipeline for CACNA1A Gain-of-Function Mutations

March 2026 - Current

Introduction

There is a class of genetic disease where the treatment problem is not making something happen but making something stop. The gene in question is not absent; it is present on both chromosomes. One copy is functioning normally, keeping an essential biological process running. The other copy carries a mutation that makes it overactive, and the overactivity is the disease. The obvious therapeutic move, suppressing expression of the relevant gene, immediately runs into a hard constraint: the normal copy is doing something you cannot afford to lose.

Familial hemiplegic migraine type 1 is one of these diseases. CACNA1A encodes the Cav2.1 calcium channel, the primary mediator of neurotransmitter release at cerebellar and cortical synapses. R192Q, the variant at the center of this project, is a heterozygous gain-of-function missense mutation at position 575 of the CDS. It converts an arginine in the S4 voltage sensor to glutamine, lowering the activation threshold of the channel and increasing presynaptic calcium influx. The result is enhanced cortical spreading depression susceptibility and, in individuals who carry it, familial hemiplegic migraine.

The wildtype allele at the CACNA1A locus is vital. Complete knockdown of CACNA1A expression in mouse models produces severe cerebellar ataxia and early death. The therapeutic window for any gene-silencing strategy is therefore defined by how selectively it can suppress the mutant allele while leaving the wildtype allele intact.

AlleleSelect is a Python CLI tool that directly addresses this constraint. It computes, for every candidate antisense oligonucleotide (ASO) window sliding across a CACNA1A gain-of-function mutation site, the difference in binding free energy between mutant and wildtype mRNA targets. This difference, the allele selectivity ratio, is the core metric. More negative means the ASO prefers the mutant. The pipeline integrates four additional scoring layers: mRNA accessibility from RNAfold partition function, off-target screening via BLASTn against the GENCODE v44 human transcriptome, splice site proximity flagging, and gapmer modification pattern annotation. The output is a ranked candidate CSV and an interactive HTML report, pre-run for R192Q and ready to share with wet-lab collaborators.

This is the sixth project in my CACNA1A/FHM1 series, and a natural extension of MiSOF. However, AlleleSelect is categorically different from the earlier projects in that it designs rather than merely characterizes or stratifies. I built it with the explicit intent that its output might one day be synthesized, injected into an R192Q knockin mouse, and tested. Hopefully, though it may be a long shot, this might lead to a cure.

Cheers,
Angie X.

Note: this project is actively updated. Apologies for any content gaps during this time.

AlleleSelect: A Layman's Guide

Most people who hear 'gene therapy' picture replacing a broken gene with a working copy. That is one approach, and it works well for diseases where the gene is simply absent or non-functional. FHM1 is not that kind of disease. The gene is present. Both copies are present. The problem is that one of the two copies has a mutation that makes the channel it encodes hyperactive, and the hyperactivity is what causes the disease. The treatment challenge is not addition but subtraction: reduce the activity of the mutant copy without touching the normal one.

Antisense oligonucleotides are short synthetic strands of DNA-like molecules, typically 18-22 nucleotides long. They are designed to bind a specific sequence in a target messenger RNA (mRNA, the intermediate molecule that carries genetic instructions from DNA to protein). When an ASO binds its target mRNA, it recruits a cellular enzyme called RNase H, which cuts the mRNA at the binding site. The mRNA is degraded, the protein is not produced, and the ASO is released to repeat the cycle.

For a heterozygous mutation like R192Q, the question is whether you can design an ASO that binds the mutant mRNA significantly more tightly than the wildtype mRNA. The mutant and wildtype sequences differ by a single nucleotide at position 575: G in the wildtype, A in the mutant. An ASO designed to complement the mutant sequence will have a deliberate mismatch when it tries to bind the wildtype sequence at that position. The magnitude of this mismatch penalty, in thermodynamic terms, is what AlleleSelect computes.

The pipeline uses the nearest-neighbor model, parameterized in 1998, which calculates the stability of a short nucleic acid duplex by summing the stacking interactions between each adjacent pair of bases. For the wildtype binding calculation, it applies a mismatch correction that accounts for the specific destabilizing effect of the particular mismatch type (G>A at position 575 of CACNA1A produces a T:G wobble mismatch when the mutant-targeted ASO tries to bind wildtype, which is less destabilizing than, say, a purine-purine clash). The allele selectivity ratio is simply the mutant binding energy minus the wildtype binding energy. Because tighter binding means a more negative free energy, a negative difference means the ASO binds mutant more tightly, and a more negative difference means a stronger preference.

Two additional factors matter for whether an ASO will work in a cell. First, the target mRNA has to be physically accessible: mRNA folds into secondary structures (stem-loops, hairpins) that can bury potential binding sites. AlleleSelect uses a computational tool called RNAfold to predict which regions of the CACNA1A mRNA are single-stranded and accessible. Second, the ASO should not inadvertently bind other mRNAs in the genome. AlleleSelect screens each top candidate against the complete catalog of human transcripts using sequence alignment and flags any candidate with suspicious matches.

The output is a ranked list of candidate sequences, with the top-ranked sequences representing the best theoretical combination of mutant preference, mRNA accessibility, and transcriptome specificity. These sequences, if synthesized and validated in cells and then tested in the R192Q knockin mouse model, could tell us whether allele-selective ASO therapy is a viable approach for FHM1.

AlleleSelect Concepts Reference - A. Xiu.pdf

PHASE 1: The Constraint That Changes Everything

Before the code, let’s start with biology. AlleleSelect is entirely determined by one biological fact.

CACNA1A encodes the alpha-1A subunit of the Cav2.1 P/Q-type calcium channel. Cav2.1 is the dominant calcium channel at cerebellar Purkinje synapses, at cortical glutamatergic synapses, and at the neuromuscular junction. Every time a neuron in the cerebellum fires and needs to release neurotransmitter, calcium entry through Cav2.1 is the proximate trigger. The channel is so incredibly central to normal CNS function that CACNA1A knockout mice have severe cerebellar ataxia and die early.

R192Q is heterozygous. One allele has the mutation and one does not. The normal allele is doing exactly what normal Cav2.1 is supposed to do. The mutant allele is doing it too vigorously, because the arginine-to-glutamine substitution at position 192 destabilizes the S4 voltage sensor in domain I, shifting the channel's activation voltage by roughly -4 mV and increasing open probability at rest. The result is elevated presynaptic calcium and increased neurotransmitter release probability, which lowers the threshold for the sustained depolarization wave known as cortical spreading depression.

Any therapeutic approach that reduces total CACNA1A expression non-selectively will simply trade one problem for another. Get the dose wrong and you have both fewer CSD episodes and worse cerebellar ataxia. The therapeutic window is narrow, and it narrows further in an adolescent whose cerebellum is still maturing.

This is why allele-selective silencing exists as a therapeutic strategy. It is not a preference for elegance over simplicity. It is a hard requirement imposed by the biology. The tool had to be built to compute allele selectivity, specifically, because nothing else makes the therapeutic problem tractable.

The ASO field validated the general approach with nusinersen (2016) and tofersen (2023), both delivered intrathecally for CNS indications. The allele-selective variant was demonstrated in SCN2A gain-of-function epilepsy mouse models in 2021. CACNA1A FHM1 is the same logical structure: heterozygous GoF mutation in an essential neural gene, allele-selective ASO delivered to the CSF via intrathecal injection. AlleleSelect computes the selectivity ratios. The rest is wet lab.

PHASE 2: Learning RNA Thermodynamics from a 28-Year-Old Table

The mathematical core of AlleleSelect is the SantaLucia 1998 nearest-neighbor model for nucleic acid duplex thermodynamics. I want to explain what I learned from its implementation, because the physics is quite interesting and the history of how these numbers came to exist is also…quite interesting.

Short nucleic acid duplexes are not stabilized by the sum of individual base-pair hydrogen bond energies. They are stabilized primarily by base stacking: the aromatic pi systems of adjacent nucleotide bases overlap and interact, contributing enthalpy and constraining the entropic freedom of the single strands. A GC base pair next to an AT base pair has different thermodynamic properties than a GC pair next to a GG mismatch, even though the GC pair itself is identical in both cases. This is the physical content of the nearest-neighbor model: context matters, and the relevant context unit is the dinucleotide pair.

SantaLucia (1998) fitted a two-state thermodynamic model to melting data from 108 precisely designed short DNA duplexes and extracted the enthalpy and entropy contributions for the nearest-neighbor dinucleotide stacks: 16 possible steps, which collapse to 10 unique ones because every duplex can be read from either strand. Table 2 of that paper is short. It is what AlleleSelect uses. It is 28 years old. When I first pulled it up and copied the values into nearest_neighbor.py, I had an unexpected reaction, which was something between appreciation and irritation. The field of computational drug design has produced an enormous amount of increasingly sophisticated methodology over the past three decades, and the question of which strand of DNA binds which strand better is still answered, in 2026, by a table published in a journal years before I was born.

The nearest-neighbor sum gives delta-H and delta-S for the duplex, from which delta-G at 37 degrees C and the melting temperature (Tm) are derived. For the Tm formula specifically, the gas constant R and the total strand concentration CT enter as parameters. For ASO design at cellular concentrations (roughly 1 micromolar), the Tm calculation is a useful check but not the primary metric. Delta-G at 37 degrees C is what matters; Tm is informative for synthesis quality control.
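To make the arithmetic concrete, here is a minimal sketch of the nearest-neighbor sum. It is not the code in nearest_neighbor.py; the function and variable names are my own for illustration. The parameter values are the unified SantaLucia 1998 set (1 M NaCl), and the sketch assumes a non-self-complementary duplex with no salt correction and no mismatches.

```python
import math

# Unified nearest-neighbor parameters (SantaLucia 1998, 1 M NaCl).
# Key: 5'->3' dinucleotide step on one strand.
# Value: (delta-H in kcal/mol, delta-S in cal/(mol*K)).
UNIQUE_STACKS = {
    "AA": (-7.9, -22.2), "AT": (-7.2, -20.4), "TA": (-7.2, -21.3),
    "CA": (-8.5, -22.7), "GT": (-8.4, -22.4), "CT": (-7.8, -21.0),
    "GA": (-8.2, -22.2), "CG": (-10.6, -27.2), "GC": (-9.8, -24.4),
    "GG": (-8.0, -19.9),
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


# Expand the 10 unique stacks to all 16 steps: a step read from the
# opposite strand (its reverse complement) has the same parameters.
NN = dict(UNIQUE_STACKS)
for step, params in UNIQUE_STACKS.items():
    NN.setdefault(reverse_complement(step), params)

INIT_GC = (0.1, -2.8)   # initiation term for a terminal G-C pair
INIT_AT = (2.3, 4.1)    # initiation term for a terminal A-T pair
R_CAL = 1.987           # gas constant, cal/(mol*K)
T37_K = 310.15          # 37 degrees C in kelvin


def duplex_thermo(seq, strand_conc=1e-6):
    """Return (dH, dS, dG37, Tm in C) for seq paired with its complement.

    Assumes a perfectly matched, non-self-complementary duplex, so the
    symmetry correction is omitted and CT/4 is used in the Tm formula.
    """
    seq = seq.upper()
    dh = ds = 0.0
    for end in (seq[0], seq[-1]):
        h, s = INIT_GC if end in "GC" else INIT_AT
        dh += h
        ds += s
    for i in range(len(seq) - 1):
        h, s = NN[seq[i:i + 2]]
        dh += h
        ds += s
    dg37 = dh - T37_K * ds / 1000.0  # dS converted from cal to kcal
    tm_c = (1000.0 * dh) / (ds + R_CAL * math.log(strand_conc / 4.0)) - 273.15
    return dh, ds, dg37, tm_c
```

Calling duplex_thermo on a 20-mer returns the four quantities discussed above. Only delta-G at 37 degrees C feeds into the selectivity calculation.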

The mismatch corrections required a second paper: Peyret et al. (1999), which systematically measured the delta-delta-H and delta-delta-S corrections for all 16 internal DNA/DNA mismatch types. For R192Q specifically, the relevant mismatch is T:G wobble (the ASO carries T at the mutation position to complement mutant A; wildtype carries G, so binding to wildtype produces a T:G wobble). Wobble mismatches have a negative delta-delta-H, meaning they are less destabilizing than complete mismatches. This is a meaningful biochemical fact for the design: it means that the allele selectivity will be smaller in magnitude than it would be for a variant producing, say, an A:C or A:A mismatch. The pipeline still finds useful candidates, but the biology makes the problem slightly harder than the ideal case.

Implementing both papers from scratch took approximately two and a half days of careful work. The unit tests validate the nearest-neighbor implementation against the worked examples in SantaLucia 1998 Table 4 and validate the mismatch penalties against the expected ordering of mismatch severity from Peyret 1999. That makes twenty-seven passing tests for the thermodynamics modules alone.

PHASE 3: Building the Pipeline

With the thermodynamic core working, the pipeline builds outward in layers. Each layer addresses one additional factor that separates theoretical binding affinity from practical knockdown efficiency in a cell.

The sequence retrieval layer uses the Ensembl REST API to fetch the CACNA1A CDS by transcript ID (ENST00000360228.10, the canonical transcript). The HGVS parser handles the mutation notation (c.575G>A), validates the reference base against the fetched CDS, and applies the substitution to generate the mutant sequence. This is simpler than it sounds in principle, but gets finicky in practice: Ensembl transcript IDs have version suffixes, the REST API returns slightly different fields depending on request parameters, and there are encoding edge cases in older transcript records. The implementation handles all of these and fails with informative error messages rather than silently producing wrong sequences.
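The parse-validate-substitute step can be sketched in a few lines. This is an illustrative version rather than the pipeline's actual parser: it handles only simple CDS substitutions of the form c.575G>A, and the function name is hypothetical.

```python
import re

# Matches simple coding-sequence substitutions such as "c.575G>A".
HGVS_SUBSTITUTION = re.compile(r"^c\.(\d+)([ACGT])>([ACGT])$")


def apply_cds_substitution(cds, hgvs):
    """Return the mutant CDS after checking the reference base.

    cds  : wildtype coding sequence, 5'->3'
    hgvs : substitution in HGVS coding notation (1-based position)
    """
    match = HGVS_SUBSTITUTION.match(hgvs.strip())
    if match is None:
        raise ValueError(f"Unsupported HGVS notation: {hgvs!r}")
    pos = int(match.group(1))
    ref, alt = match.group(2), match.group(3)
    cds = cds.upper()
    if not 1 <= pos <= len(cds):
        raise ValueError(f"Position {pos} is outside the CDS (length {len(cds)})")
    found = cds[pos - 1]
    if found != ref:
        # Fail loudly instead of silently producing a wrong mutant sequence.
        raise ValueError(
            f"{hgvs}: expected reference base {ref} at position {pos}, found {found}"
        )
    return cds[:pos - 1] + alt + cds[pos:]
```

The reference-base check is the important part: if the fetched transcript version does not match the one the variant was described against, the mismatch surfaces here rather than in the ranked output.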

The candidate window generator slides 18, 19, 20, 21, and 22-nt windows across +/- 30 nucleotides around the mutation site. For each window position and length, the ASO sequence is defined as the reverse complement of the mutant target window, the allele selectivity ratio and mismatch position scores are computed, and the results are collected. This produces a few hundred candidates per run.
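A minimal sketch of the sliding-window step follows. The names are hypothetical, and one detail is an assumption on my part: by default the sketch keeps only windows that cover the mutation site, since a window that misses it cannot discriminate between alleles. Passing require_overlap=False keeps every window in the flank region instead.

```python
from typing import Iterator, NamedTuple, Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


class Candidate(NamedTuple):
    start: int                    # 0-based start of the target window in the CDS
    length: int                   # window (and ASO) length in nt
    aso: str                      # ASO 5'->3', reverse complement of the mutant target
    mismatch_pos: Optional[int]   # 1-based ASO position facing the variant, if covered


def candidate_windows(
    mutant_cds: str,
    mut_pos: int,                 # 1-based CDS position of the variant, e.g. 575
    lengths=range(18, 23),
    flank: int = 30,
    require_overlap: bool = True,
) -> Iterator[Candidate]:
    mut_idx = mut_pos - 1
    lo = max(0, mut_idx - flank)
    hi = min(len(mutant_cds), mut_idx + flank + 1)  # exclusive upper bound
    for length in lengths:
        for start in range(lo, hi - length + 1):
            end = start + length
            covers = start <= mut_idx < end
            if require_overlap and not covers:
                continue
            # Target offset k maps to ASO index (length - 1 - k) after the
            # reverse complement, which is 1-based position (end - mut_idx).
            pos_in_aso = end - mut_idx if covers else None
            yield Candidate(start, length,
                            reverse_complement(mutant_cds[start:end]), pos_in_aso)
```

With a 30-nt flank and lengths 18 through 22, this yields 100 windows that cover the variant, or 210 if non-overlapping windows are kept.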

The accessibility layer calls RNAfold via subprocess on a +/- 200 nt window around the mutation site, with partition function enabled. RNAfold's _dp.ps output is a PostScript-formatted dot plot; the base-pair probabilities are stored as square roots in the ubox entries of that file. The parser extracts these, squares them to recover true probabilities, sums them per position, and computes 1 - sum as the per-position accessibility. Window accessibility is the mean across the ASO binding region. Parsing PostScript by regex is not glamorous, but it works reliably and avoids a heavier dependency on ViennaRNA's Python bindings.
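The parsing step described above looks roughly like this. It is a sketch with hypothetical function names, and it assumes the dot plot data lines have the form "i j sqrt(p) ubox" with 1-based positions, as described in the paragraph above.

```python
import re

# Data lines in the dot plot look like "12 45 0.9731 ubox".
# Requiring three numeric fields skips the PostScript definition of ubox
# and the lbox lines that encode the minimum free energy structure.
UBOX_LINE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([0-9.eE+-]+)\s+ubox\s*$", re.MULTILINE
)


def unpaired_probabilities(dot_plot_text, seq_len):
    """Per-position probability of being unpaired (list index 0 = position 1)."""
    paired = [0.0] * (seq_len + 1)
    for i, j, sqrt_p in UBOX_LINE.findall(dot_plot_text):
        p = float(sqrt_p) ** 2      # the file stores square roots
        paired[int(i)] += p
        paired[int(j)] += p
    # Clamp at zero to absorb rounding in the printed values.
    return [max(0.0, 1.0 - p) for p in paired[1:]]


def window_accessibility(unpaired, start, length):
    """Mean unpaired probability over a 0-based window of the folded region."""
    region = unpaired[start:start + length]
    return sum(region) / len(region)
```

Note that window coordinates here are relative to the folded +/- 200 nt region, not to the full CDS, so candidate start positions need to be shifted before lookup.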

The off-target layer was the most infrastructure-heavy part of the build. GENCODE v44 is approximately 4 GB compressed. Building the local BLASTn database takes 5-10 minutes the first time and is cached for subsequent runs. Running BLASTn in short-sequence mode on 50 candidates takes another 2-3 minutes. The GENCODE transcript IDs in BLASTn subject fields use a pipe-delimited format (ENST|ENSG|OTTHUMG|OTTHUMT|transcript_name|gene_name|length|biotype|) from which gene names are extracted by field index. CACNA1A self-hits are filtered out.
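Extracting the gene name and dropping self-hits is a small amount of code. The sketch below assumes the field order used in GENCODE transcript FASTA headers (transcript ID, gene ID, Havana gene, Havana transcript, transcript name, gene name, length, biotype), which puts the gene name at index 5. The index is kept as a named constant so it can be adjusted if a different header layout is in use. The function names and the example IDs in the usage note are illustrative only.

```python
# Assumed position of the gene name in a pipe-delimited GENCODE subject ID.
GENE_NAME_FIELD = 5


def gene_name_from_subject(subject_id):
    """Return the gene name from a BLASTn subject ID, or the raw ID as fallback."""
    fields = subject_id.split("|")
    if len(fields) > GENE_NAME_FIELD and fields[GENE_NAME_FIELD]:
        return fields[GENE_NAME_FIELD]
    return fields[0]


def off_target_genes(subject_ids, target_gene="CACNA1A"):
    """Unique gene names hit by a candidate, excluding hits to the target gene."""
    genes = {gene_name_from_subject(s) for s in subject_ids}
    genes.discard(target_gene)
    return sorted(genes)
```

For example, a hit list containing one CACNA1A transcript and one transcript from a paralog (with placeholder IDs such as "ENST0000000000X.1|ENSG0000000000X.1|-|-|CACNA1B-201|CACNA1B|9000|protein_coding|") would return only the paralog's gene name.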

The splice risk module uses a combination of hardcoded approximate exon boundary positions for the R192Q region and the ability to load full positions from the GENCODE GTF annotation file when available. The hardcoded fallback covers the common case (running the R192Q pre-computation) without requiring the 1.5 GB GTF download.

The modification annotator applies the published Ionis gapmer design rules: 5-10-5 architecture for 20-mers, proportional adjustment for shorter oligos, MOE modification for sequences with GC > 40%, LNA for lower-GC sequences, and a PyPy dinucleotide toxicity flag in the 5' flank. The gapmer pattern string (e.g. 'mmmmmddddddddddmmmmm') encodes the modification at each position.
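As an illustration of how the pattern string can be generated, here is a short sketch. The proportional rule it uses (each wing is one quarter of the length, rounded down) is my assumption for the example and may not match the exact adjustment the annotator applies; it does reproduce 5-10-5 for a 20-mer. The function names are hypothetical.

```python
def gapmer_pattern(length, wing_fraction=0.25):
    """Return a pattern string: 'm' = modified wing, 'd' = DNA gap.

    Assumed rule: each wing is floor(length * wing_fraction) nucleotides,
    and the DNA gap takes whatever remains in the middle.
    """
    wing = max(1, int(length * wing_fraction))
    gap = length - 2 * wing
    return "m" * wing + "d" * gap + "m" * wing


def wing_chemistry(aso):
    """Pick the wing modification from GC content, per the rule stated above."""
    aso = aso.upper()
    gc_fraction = sum(base in "GC" for base in aso) / len(aso)
    return "MOE" if gc_fraction > 0.40 else "LNA"
```

Under this assumed rule an 18-mer becomes 4-10-4 and a 22-mer becomes 5-12-5, so the RNase H-competent DNA gap never drops below ten nucleotides across the window lengths used here.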

The final ranking combines allele selectivity ratio, accessibility, off-target count, and splice risk into a composite priority score. The HTML report renders with Plotly for the scatter plots and a vanilla SVG arc diagram for the mRNA secondary structure visualization. The CSV is self-contained and formatted for direct import into any spreadsheet or analysis environment.

PHASE 4: The R192Q Candidates

The pre-computed run for R192Q produces candidates across all five window lengths. I want to walk through what the output actually shows, because the numbers have biological meaning that deserves more than a table entry.

The top-ranked candidates cluster around a window where the R192Q mutation falls at positions 8-12 of the ASO. This is expected: the mismatch position penalty penalizes candidates where the G>A substitution falls near the termini, and the SantaLucia plus Peyret corrections produce the largest ASR magnitude at the center of the window. The top candidates have allele selectivity ratios between -1.5 and -1.6 kcal/mol, accessibility scores above 0.65, and zero BLASTn off-target hits. These are the sequences to synthesize first.

The T:G wobble mismatch issue I described in Phase 2 shows up clearly in the output. The ASR magnitude for R192Q candidates is moderate compared to what would be expected for a variant producing a purine-purine mismatch. An A:A mismatch has a delta-delta-H of +4.7 kcal/mol; the T:G wobble has -1.5 kcal/mol. The effective discrimination between alleles at 37 degrees C, for a candidate with ASR = -1.5 kcal/mol, translates to roughly 10-12 fold preferential binding to mutant. This figure is the ratio of the two equilibrium binding constants, exp(-ASR/RT), so it does not depend on strand concentration. In a cell with both alleles present at roughly equal abundance, this means the mutant transcript will be preferentially degraded but not exclusively. Wildtype degradation at therapeutic concentrations will not be zero.
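The conversion from a free energy difference to a fold preference is a one-line Boltzmann factor. The sketch below treats binding as a simple two-state equilibrium and takes ASR as mutant delta-G minus wildtype delta-G, consistent with the sign convention used in the output; the function name is my own.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fold_preference(asr_kcal, temp_c=37.0):
    """Ratio K_mutant / K_wildtype implied by ASR = dG_mutant - dG_wildtype."""
    temp_k = temp_c + 273.15
    return math.exp(-asr_kcal / (R_KCAL * temp_k))


# At 37 degrees C, RT is about 0.616 kcal/mol, so an ASR of -1.5 kcal/mol
# gives a preference of about 11-fold for the mutant target.
print(round(fold_preference(-1.5), 1))
```

Each additional -0.6 kcal/mol of selectivity multiplies the preference by roughly e, which is why small differences in mismatch penalty matter so much at this scale.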

The accessibility scores vary significantly across window positions. Windows overlapping the predicted stem regions of the CACNA1A mRNA secondary structure around position 575 score below 0.4 and are deprioritized. The top candidates fall in the predicted single-stranded loop region immediately surrounding the mutation site, which is consistent with the general principle that mutation sites in missense variants are often slightly more accessible than the flanking sequence (since they are by definition not perfectly base-paired to any complementary intramolecular sequence).

The off-target landscape for CACNA1A candidates is cleaner than I expected. The gene has multiple voltage-gated calcium channel paralogs in the human genome, but the sequence around position 575 in exon 4 is not conserved across the paralogs at the level of 80% identity over 14+ consecutive nucleotides. The top candidates have zero BLASTn hits outside CACNA1A itself. This does not mean off-target effects are impossible in a cell; it means there are no obvious sequence-based red flags.

The demo output in demo/R192Q_output/ contains the full candidates.csv and report.html generated from this run. The HTML report is interactive: clicking column headers sorts the table, and hovering over scatter plot points shows the full candidate profile. The top 5 candidates are highlighted in teal in both the table and the scatter plot.

PHASE 5: What Open-Source ASO Design Means

ASOG, published in September 2025 from the Bordeaux structural bioinformatics group, is a real and useful ASO design tool. It generates candidates, checks off-targets via BLASTn, assesses splice site proximity, and produces thermodynamic scores. I want to be precise about what AlleleSelect adds, because the claim that it fills a gap requires the gap to actually exist.

ASOG does not compute allele selectivity. It takes a single target sequence and generates ASOs for it. For a gain-of-function heterozygous mutation where the therapeutic constraint is differential allele knockdown, ASOG produces the same candidates regardless of whether the input is the mutant or wildtype sequence. It has no mechanism to rank candidates by how much they prefer the mutant allele over the wildtype. AlleleSelect's core contribution is that it computes delta-G for both alleles for every window and ranks by the difference. This is the step that is missing for FHM1 ASO design, and it is what the van den Maagdenberg lab would need to select candidates for in vivo testing.

The output of AlleleSelect for R192Q is a CSV file and an HTML report. Both are formatted for direct sharing with wet-lab collaborators. The CSV has all the thermodynamic parameters, scores, and annotations needed for a synthesis order. The HTML report is self-contained and renders in any browser without any dependencies. The intent was to make the output as frictionless as possible for a researcher receiving it cold: open the HTML file, look at the scatter plot, read the top 5 rows, and know which sequences to order.

As with every Xiu Lab project, the standard disclaimer applies: I am a high school student building research tools without institutional access, wet lab facilities, or a research budget. AlleleSelect is not production-grade drug design software. It is a computational pipeline built on published methods, validated against the primary literature, and designed to produce output that is interpretable and actionable by the researchers who would actually do the experimental work. If someone synthesizes a top-ranked candidate and tests it in an R192Q mouse and it does something interesting, that would be a nice outcome. If the output helps clarify the design space even without being directly tested, that is also something.

Closing Remarks

These sequences, if synthesized and injected into an R192Q knockin mouse, might reduce cortical spreading depression frequency. The chain of reasoning is short: the ASO binds the mutant mRNA and recruits RNase H, the transcript is degraded, less mutant Cav2.1 channel is made, less gain-of-function current flows, and cortical spreading depression becomes less frequent. Whether any of the sequences AlleleSelect produces will survive synthesis, transfection, cellular pharmacology, and in vivo delivery to produce that effect is a question I cannot answer computationally. It requires a wet lab, a dry mouse, and a seasoned researcher.

AlleleSelect is my half of that conversation. The computation is done as carefully as I know how to do it, the methods are documented, the code is open source, and the R192Q output is pre-computed and ready to send. The other half of the conversation, if it happens, will happen in someone else's lab, possibly in a city I have never been to, in a context I cannot predict.

That is fine. That is how science is supposed to work. Research is not a solo endeavor even when it is built by one person. The papers AlleleSelect relies on were written by dozens of researchers over decades. The mouse model was built by a lab in Leiden. The clinical validation of intrathecal ASO delivery happened at Biogen and Ionis. AlleleSelect is one more node in a graph of accumulated knowledge, doing a specific computation that was not being done before, in the hope that it leads to some semblance of a cure.

Cheers,
Angie X.

This project is open source at github.com/axshoe/AlleleSelect.
