VectoSelect

CACNA1A Variant Prioritization for Gene Therapy Study Design

May 2026 - Current

Introduction

Gene therapy for CACNA1A channelopathies exists as a real possibility for the first time. The 2024 paper from Samuel Young Jr.'s lab at UNC Chapel Hill demonstrated that helper-dependent adenoviral vectors can transduce Purkinje cells in humanized CD46 mice with meaningful efficiency. This had been a 30-year unsolved problem. Purkinje cells are large, highly arborized neurons that are notoriously difficult to reach with conventional viral vectors, and they are also the cells most directly implicated in the cerebellar ataxia that defines the severe end of the CACNA1A disease spectrum.

The delivery problem being partially solved creates the next bottleneck: which variants to model, in which order, and with what therapeutic strategy. CACNA1A is not a single disease. The same gene causes pure hemiplegic migraine, episodic ataxia, spinocerebellar ataxia, and developmental epileptic encephalopathy depending on whether the mutation causes gain or loss of function, where it sits in the channel structure, and how severely it disrupts calcium signaling. A gene therapy approach that makes sense for a loss-of-function EA2 variant may be inappropriate for a dominant gain-of-function FHM1 variant. The therapeutic target tissue also shifts with the clinical phenotype. These distinctions matter enormously for study design, and no computational tool had addressed them for CACNA1A specifically.

VectoSelect is the seventh project in my CACNA1A/FHM1 series. It integrates four scoring modules: clinical severity classification (ClinSev), biophysical severity scoring from published patch-clamp electrophysiology (BioSev), mouse model availability checking (ModelCheck), and gene therapy tractability assessment (GTScore). The output is a ranked table of variants with a composite priority score and an HTML report with embedded figures.

This project was built for Dr. Samuel Young at UNC and for the CACNA1A Foundation research community. It also addresses a specific limitation in ChanVar, my existing variant pathogenicity tool: ChanVar scored S218L and R192Q nearly identically on its composite pathogenicity scale, even though S218L causes cerebellar ataxia with potentially fatal coma and R192Q causes pure hemiplegic migraine. VectoSelect corrects this by scoring the actual biophysical severity difference.

Cheers,
Angie X.

Note: this project is actively updated. Apologies for any content gaps during this time.

VectoSelect: A Layman's Guide

CACNA1A encodes a calcium channel that sits in the membranes of neurons in the brain and cerebellum. Its job is to open when a neuron fires, letting calcium ions flood in, which triggers neurotransmitter release to neighboring neurons. When this channel has the wrong amino acid at a critical position, it either opens too easily (gain-of-function) or fails to open at all (loss-of-function). Five different diseases can result from these failures, ranging from episodic hemiplegic migraines to progressive cerebellar degeneration.

Gene therapy is the attempt to correct the genetic error at the source. For loss-of-function variants, this usually means delivering a working copy of the gene into the affected cells. For gain-of-function variants, the strategy is more complicated: you can try to add back a quieter wild-type copy to dilute the overactive mutant, or you can use antisense oligonucleotides to selectively silence the mutant allele. The therapeutic approach depends entirely on which variant you're dealing with and which cells it is harming.

VectoSelect takes a list of CACNA1A variants and answers four questions about each one. First: how severe is the clinical phenotype, and which disease category does it belong to? Second: how severe is the channel dysfunction at the biophysical level, based on published electrophysiology? Third: does a mouse model for this variant already exist? Fourth: how tractable is this variant for the specific gene therapy approaches currently available? The composite score from these four questions gives a ranked priority list for deciding which variants to model first.

The tool was designed specifically for Dr. Young's lab at UNC Chapel Hill, which has a working cerebellar delivery system but needs to decide which variants to put through the expensive and time-consuming knockin mouse development pipeline. An individual knockin mouse model costs roughly $150,000-$200,000 and takes 18-24 months to produce. Getting that decision right matters.

PHASE 1: The Scoring Problem With ChanVar

The starting point for VectoSelect was a problem in the existing pipeline. ChanVar, which I built as Layer 1 of the Migraine Stratification Outcomes Framework (MiSOF), scores CACNA1A missense variants on 9 structural and evolutionary features and assigns a composite pathogenicity score (CPS). It also runs a GoF/LoF classifier trained on 28 labeled variants with leave-one-out cross-validation.

ChanVar is accurate at what it was designed to do. On the 93-variant ClinVar validation set, it achieves AUC 0.751 for pathogenicity discrimination. But on the specific question of distinguishing severity within the GoF class, it fails in a medically relevant way. R192Q and S218L score 0.816 and 0.817 on the CPS, respectively. From a pathogenicity standpoint, these are both clearly pathogenic GoF variants. From a disease severity standpoint, they represent opposite ends of the FHM1 spectrum. R192Q causes pure hemiplegic migraine with no cerebellar involvement. S218L causes hemiplegic migraine attacks plus progressive cerebellar ataxia, potentially fatal coma, and permanent cerebellar structural damage in some patients.

The reason ChanVar misses this is that its features were not designed to capture the magnitude of electrophysiological severity. FoldX-predicted stability change, gnomAD allele frequency, and CADD-PHRED are all roughly similar for two missense variants in the same domain class that both clearly disrupt channel function. The difference between them shows up in patch-clamp recordings: S218L shifts the V1/2 activation curve by approximately -12 mV, while R192Q shifts it by approximately -4.5 mV. This difference predicts the difference in cortical spreading depression (CSD) threshold reduction in knockin mice. ChanVar never sees this number because patch-clamp data is not in its input feature set.

VectoSelect was built to solve that problem. The biophysical score module pulls from a curated database of published electrophysiology, with the V1/2 shift as its primary feature weighted at 40%. The result: S218L scores 0.83 on BioSev, R192Q scores 0.25. The clinical severity tier correctly separates them into Tier 2 (S218L) and Tier 1 (R192Q). The composite priority score puts S218L at #1 out of 7 variants analyzed.

PHASE 2: Building the Biophysical Database

The scientific foundation of VectoSelect is a curated database of published patch-clamp electrophysiology for CACNA1A variants. Building this required reading the primary literature rather than relying on any existing resource, because no public database aggregates CACNA1A-specific biophysical parameters in a machine-readable format.

The core sources are the Pietrobon lab papers from 2002-2020 and the van den Maagdenberg lab papers. Tottene et al. 2002 is the most comprehensive single source: it reports V1/2 shifts, peak current ratios, and some persistent current data for R192Q, T666M, K1336E, V714A, and D715E expressed in Xenopus oocytes and cortical neurons. The S218L biophysical data comes primarily from the van den Maagdenberg 2010 paper and associated recordings.

The database schema stores five values per variant: V1/2 activation shift in mV with uncertainty range, peak current ratio (mutant/WT), persistent current fraction, recovery from inactivation tau in milliseconds, and the expression system used. The expression system flag matters because Xenopus oocyte values and HEK293 values differ systematically, and both differ from recordings in actual neurons. These are not interchangeable numbers. Every entry also carries source PMIDs so the provenance is traceable.
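The schema described above could be sketched as a small record type. This is only an illustration: the class name, field names, and the comparable() helper are hypothetical and are not taken from the VectoSelect source, and the expression system labels below are placeholders rather than curated values. Only the R192Q and S218L shift values come from this write-up.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class BiophysEntry:
    """One curated electrophysiology record (hypothetical field names)."""
    variant: str
    v_half_shift_mv: Optional[float]                      # V1/2 activation shift, mV
    v_half_shift_range_mv: Optional[Tuple[float, float]]  # uncertainty range, mV
    peak_current_ratio: Optional[float]                   # mutant / WT
    persistent_current_fraction: Optional[float]
    recovery_tau_ms: Optional[float]                      # recovery from inactivation
    expression_system: str                                # e.g. oocyte, HEK293, neuron
    source_pmids: Tuple[str, ...] = ()                    # provenance


def comparable(a: BiophysEntry, b: BiophysEntry) -> bool:
    """Values are only treated as comparable within one expression system."""
    return a.expression_system == b.expression_system


# Shift values are the ones quoted in the text; every other field is left
# unset, and the expression system labels are placeholders.
r192q = BiophysEntry("R192Q", -4.5, None, None, None, None, "placeholder_system_a")
s218l = BiophysEntry("S218L", -12.0, None, None, None, None, "placeholder_system_b")

print(comparable(r192q, s218l))
```

Making the expression system a required field, rather than an optional note, forces every comparison to confront the fact that oocyte, HEK293, and neuronal values are not interchangeable.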

For the nine variants with measured data, BioSev uses the measured values directly. For variants without published electrophysiology, BioSev falls back to the population mean of the measured variants and flags the output as 'predicted' with low confidence. This is not a structural prediction; it is a conservative fallback that signals uncertainty. A proper structural prediction would require a regression on ChanVar's pore-axis distance and domain features, which is a reasonable extension for the next version.
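A minimal sketch of that fallback, assuming a plain dictionary of measured shifts. The function name, the dictionary, and the 'as_reported' label for measured entries are hypothetical. Only three of the nine measured variants are included, using the shift values quoted in this write-up, so the mean here is not the mean VectoSelect itself would compute.

```python
from statistics import mean

# Subset of measured V1/2 shifts (mV) quoted in this write-up.
MEASURED_SHIFTS_MV = {"R192Q": -4.5, "S218L": -12.0, "R1668W": -8.0}


def lookup_shift(variant, measured=MEASURED_SHIFTS_MV):
    """Return the measured shift if available, else a flagged population mean."""
    if variant in measured:
        return {
            "variant": variant,
            "v_half_shift_mv": measured[variant],
            "source": "measured",
            "confidence": "as_reported",  # hypothetical label
        }
    # Conservative fallback: population mean of measured variants, flagged
    # so downstream modules never mistake it for a measurement.
    return {
        "variant": variant,
        "v_half_shift_mv": mean(measured.values()),
        "source": "predicted",
        "confidence": "low",
    }


print(lookup_shift("S218L"))
print(lookup_shift("V714A"))  # not in this subset, so it takes the fallback path
```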

One thing I found consistently interesting in this literature: the expression system experiments are described in very precise procedural terms, but the interpretation of what the numbers mean for in vivo neural circuit function is always handled with careful hedging. The Tottene papers, in particular, go out of their way to acknowledge that oocyte recordings don't recapitulate the exact membrane environment of a Purkinje cell. This epistemic caution is part of what makes these papers credible, and it is also the reason VectoSelect carries confidence flags rather than presenting scores as absolute.

PHASE 3: Clinical Severity Classification and Mouse Model Integration

Module 1 (ClinSev) classifies each variant into one of five severity tiers. The tier assignment for known variants uses a curated ground-truth database derived from the clinical literature. For unknown variants, the tier is inferred from the V1/2 shift threshold and GoF/LoF classification, supplemented by a PubMed keyword scan of the top 5 abstracts for that variant.

The two-tier FHM1 schema (pure HM vs. severe complex) is drawn from Dr. van den Maagdenberg's clinical classification. The key biophysical threshold is the -8 mV V1/2 shift boundary. Variants with shifts more negative than -8 mV and GoF classification are flagged as Tier 2 candidates. This threshold is not arbitrary: -8 mV sits roughly midway between the R192Q shift (-4.5 mV) and the S218L shift (-12 mV), and it correlates with the point beyond which cerebellar involvement is consistently reported in the literature. The threshold still requires validation against a larger variant set as more biophysical data becomes available.
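The threshold rule above can be written as a one-line predicate. This is a sketch under stated assumptions: the function and constant names are hypothetical, and the strict inequality follows the "more negative than -8 mV" wording, so a shift of exactly -8 mV is not flagged here. Whether the real boundary is inclusive is not specified in this write-up.

```python
TIER2_SHIFT_THRESHOLD_MV = -8.0  # boundary described in the text


def is_tier2_candidate(v_half_shift_mv: float, mechanism: str) -> bool:
    """Flag GoF variants whose V1/2 shift is more negative than the threshold."""
    return mechanism == "GoF" and v_half_shift_mv < TIER2_SHIFT_THRESHOLD_MV


print(is_tier2_candidate(-12.0, "GoF"))  # S218L shift
print(is_tier2_candidate(-4.5, "GoF"))   # R192Q shift
```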

Module 3 (ModelCheck) queries the curated mouse model database and optionally the IMSR API at findmice.org. The curated database includes R192Q knockin (van den Maagdenberg/Pietrobon, 2004), S218L knockin (van den Maagdenberg, 2010), tottering mouse (P601L, JAX strain 000561), leaner mouse (splice site, JAX strain 000551), and rolling Nagoya mouse (R1262G, RIKEN BRC). These cover the GoF knockins (R192Q, S218L) alongside LoF models ranging from mild (tottering, EA2-like) to severe (leaner, cerebellar degeneration).

The IMSR query runs as a supplementary check because new strains are deposited continuously. In offline mode (--no-imsr), the curated database provides coverage for all major published models through 2024.
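The offline path could look roughly like the lookup below. The dictionary layout and function name are hypothetical; only the model names, sources, and the model_exists field name come from this write-up. The leaner mouse is left out of this sketch because it is a splice-site model rather than a missense variant key, and the IMSR query is omitted entirely rather than guessed at.

```python
# Curated models keyed by missense variant (entries as listed in the text).
CURATED_MODELS = {
    "R192Q": {"model": "R192Q knockin", "source": "van den Maagdenberg/Pietrobon, 2004"},
    "S218L": {"model": "S218L knockin", "source": "van den Maagdenberg, 2010"},
    "P601L": {"model": "tottering", "source": "JAX strain 000561"},
    "R1262G": {"model": "rolling Nagoya", "source": "RIKEN BRC"},
}


def check_model_offline(variant: str) -> dict:
    """Look a variant up in the curated table only (no network access)."""
    key = variant.strip().upper()
    entry = CURATED_MODELS.get(key)
    return {
        "variant": key,
        "model_exists": entry is not None,
        "model": entry["model"] if entry else None,
        "source": entry["source"] if entry else None,
    }


print(check_model_offline("R192Q"))
print(check_model_offline("R1668W"))
```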

PHASE 4: Gene Therapy Tractability Scoring

GTScore is Module 4 and the most assumption-laden part of VectoSelect. It evaluates whether a given variant and disease category is tractable for the specific gene therapy approaches currently available, with direct attention to Young's HdAd Purkinje cell delivery system.

For LoF variants, the logic is relatively clean. EA2 and related haploinsufficiency conditions are straightforward replacement candidates: add a functional copy, restore channel expression, rescue Purkinje cell pacemaking. SCA6 is explicitly routed to the small_molecule_preferred recommendation because the polyglutamine expansion mechanism involves a toxic gain-of-function from the expanded protein rather than simple channel loss. Adding back a WT copy does not address the toxic species.

For GoF variants, the scoring reflects two competing considerations. First, Tier 2 variants (cerebellar involvement) receive a 20-point tractability bonus because Young's HdAd vectors provide a delivery mechanism that was previously unavailable. This is arguably the most important single development in CACNA1A gene therapy and directly motivated prioritizing Tier 2 variants in the score formula. Second, variants with V1/2 shifts more negative than -10 mV are flagged for combination therapy (silencing + replacement) rather than replacement alone, because at that level of dominant GoF the mutant protein may be too active to be diluted by adding WT copies.
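The routing rules in this phase can be sketched as a small decision function. This is an illustration, not the GTScore implementation: the function name, argument names, and the strategy labels other than small_molecule_preferred are hypothetical, the base tractability score is not modeled, and applying the 20-point bonus only inside the GoF branch is an assumption based on the paragraph above.

```python
from typing import Optional

TIER2_BONUS_POINTS = 20           # bonus described in the text
COMBINATION_SHIFT_CUTOFF_MV = -10.0


def gt_recommendation(category: str, mechanism: str, tier: int,
                      v_half_shift_mv: Optional[float]) -> dict:
    """Return a strategy label and Tier 2 bonus for one variant."""
    if category == "SCA6":
        # Toxic polyglutamine species: adding a WT copy does not address it.
        return {"strategy": "small_molecule_preferred", "bonus": 0}
    if mechanism == "LoF":
        return {"strategy": "replacement", "bonus": 0}
    if mechanism == "GoF":
        strong_gof = (v_half_shift_mv is not None
                      and v_half_shift_mv < COMBINATION_SHIFT_CUTOFF_MV)
        return {
            # Silencing + replacement when the mutant may be too active to dilute.
            "strategy": "combination" if strong_gof else "replacement",
            "bonus": TIER2_BONUS_POINTS if tier == 2 else 0,
        }
    raise ValueError(f"unknown mechanism: {mechanism!r}")


print(gt_recommendation("FHM1", "GoF", 2, -12.0))  # S218L-like inputs
print(gt_recommendation("FHM1", "GoF", 1, -4.5))   # R192Q-like inputs
```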

The combination therapy recommendation points to AlleleSelect, my earlier project that designs allele-selective gapmer ASOs for CACNA1A R192Q. The same design pipeline could be extended to S218L, which has a different SNP position. VectoSelect and AlleleSelect are complementary tools: VectoSelect prioritizes which variants to target, AlleleSelect designs the therapeutic molecule for those with dominant GoF characteristics.

PHASE 5: Composite Scoring and Report Generation

Module 5 integrates all four scoring components into a ranked priority table. The formula weights are: BioSev 30%, ClinSev 25%, ModelCheck 25%, GTScore 20%. The HTML report is generated with embedded SVG figures rather than external image dependencies, which means it opens correctly in any browser without needing the original data files.
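The weighting can be illustrated as a plain weighted sum. The sketch assumes each component score is already normalized to the 0-1 range, which is not stated explicitly above; the function and key names are hypothetical, and the example component values are placeholders rather than VectoSelect output.

```python
# Weights as given in the text; they sum to 1.0.
WEIGHTS = {"biosev": 0.30, "clinsev": 0.25, "modelcheck": 0.25, "gtscore": 0.20}


def composite_priority(scores: dict) -> float:
    """Weighted sum of the four module scores (each assumed to lie in [0, 1])."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Placeholder component values, for illustration only.
example = {"biosev": 0.5, "clinsev": 0.5, "modelcheck": 1.0, "gtscore": 0.0}
print(round(composite_priority(example), 3))
```

Raising on a missing component, rather than treating it as zero, keeps an incomplete module run from silently pushing a variant down the ranking.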

On the 7-variant benchmark set (R192Q, S218L, T666M, R583Q, K1336E, V714A, R1668W), the composite score produces the following ranking: S218L at 0.784, R1668W at 0.611, T666M at 0.553, R192Q at 0.510, K1336E at 0.397, V714A at 0.359, R583Q at 0.358.

S218L at #1 is scientifically correct. It has the most severe biophysical profile, the most severe clinical phenotype, an existing mouse model, and the highest GT tractability because of its cerebellar involvement. R192Q at #4 reflects its pure HM phenotype (no cerebellar bonus), mild biophysical severity, and the fact that it already has a well-characterized model, which makes it less novel as a study design target even though it is biologically important.

R1668W at #2 is worth noting. It has no curated mouse model, which subtracts from its ModelCheck component, but its V1/2 shift of -8 mV places it at the Tier 2 boundary with cerebellar involvement reported in some patients. If the Young lab were to develop a knockin for a currently unmodeled severe FHM1 variant, R1668W is a reasonable candidate.

All 37 unit tests pass. The key correctness tests are: S218L BioSev > R192Q BioSev (by at least 0.30 units), R192Q ClinSev tier = 1, S218L ClinSev tier = 2, R192Q ModelCheck model_exists = True, S218L GTScore tractability = high, and the full pipeline integration test confirming S218L ranks above R192Q in composite score.

Closing Remarks

The CACNA1A field is in an unusual spot. The delivery problem for Purkinje cell gene therapy is, for the first time, potentially solved. The next constraint is experimental prioritization: which of the hundreds of known CACNA1A variants should go first through the enormously expensive process of knockin mouse development and therapeutic characterization. VectoSelect is an attempt to make that prioritization decision more systematic. The tool does not replace clinical judgment or experimental expertise, and it does not know which variants a given lab already has in its colony, which regulatory pathway is most practical, or what the patient advocacy priorities are for a given disease subtype. It merely aggregates the published literature and existing models into a single output that may inform those conversations.

The V1/2 shift difference between R192Q and S218L is about 7-8 millivolts. That is a small number in absolute terms: 7 millivolts is less than the noise in a typical action potential recording. But because the Boltzmann activation curve is steep in the physiological voltage range, those 7 millivolts translate into a substantially different fraction of channels open at rest, a substantially different amount of calcium entering per action potential, and ultimately a substantially different CSD threshold in the living brain. The relationship between the small number and the large clinical consequence is not intuitive, and it is why the electrophysiology literature is so careful to report precise values with proper uncertainty bounds.
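To make the steepness argument concrete, here is a small worked example using the standard Boltzmann activation function, P(V) = 1 / (1 + exp((V_half - V) / k)). The wild-type V1/2, the slope factor k, and the test voltage are placeholder values chosen for illustration; they are not taken from the curated database. Only the two shift values come from the text.

```python
import math


def open_fraction(v_mv: float, v_half_mv: float, slope_mv: float) -> float:
    """Boltzmann activation: fraction of channels open at voltage v_mv."""
    return 1.0 / (1.0 + math.exp((v_half_mv - v_mv) / slope_mv))


WT_V_HALF_MV = -10.0     # placeholder wild-type V1/2
SLOPE_MV = 5.0           # placeholder slope factor k
TEST_VOLTAGE_MV = -60.0  # placeholder voltage in the foot of the curve

SHIFTS_MV = {"R192Q": -4.5, "S218L": -12.0}  # shifts quoted in the text

fractions = {
    name: open_fraction(TEST_VOLTAGE_MV, WT_V_HALF_MV + shift, SLOPE_MV)
    for name, shift in SHIFTS_MV.items()
}
fold_change = fractions["S218L"] / fractions["R192Q"]
print(f"S218L / R192Q open fraction: {fold_change:.2f}x")
```

In the foot of the curve, an extra shift of d mV multiplies the open fraction by roughly exp(d / k), so under these placeholder parameters the 7.5 mV gap gives about a 4.5-fold difference. The exact factor depends strongly on k, which is one reason the measured slope matters as much as the shift itself.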

Building VectoSelect required learning this relationship well in order to encode it correctly. Calibrating the -8 mV severity threshold meant reading many papers on mouse brains, human patients, and patch-clamp oocyte recordings, and trying to understand how the numbers connect. The tool is better for that process, and I am too.

Cheers,
Angie X.

This project is open source at github.com/axshoe/VectoSelect.
