Adverse Effect Trajectory Modeling of Migraine Preventives in Adolescent Females
December 2025 - Current
There is a gap in how migraine prevention is understood, and it becomes most visible in the patients who have already tried multiple treatments without success. Migraine is among the leading causes of disability in people under 50. It affects adolescent females at disproportionately high rates, yet the evidence guiding treatment in this group is limited.
Most preventive medications currently in use were approved based on clinical trials conducted primarily in adults. These trials typically enrolled participants between 18 and 65, rarely examined outcomes separately by sex in a meaningful way, and were conducted over timeframes that are often too short to capture the side effects that ultimately determine whether a medication is tolerable over months or years. For adolescents, especially females, this creates a situation where the available evidence does not fully reflect their experience.
In practice, this means that when a teenager has already tried several medications and needs to choose another, the decision is often guided by general patterns, prior experience, and cautious trial and error rather than individualized evidence. This is not a failure of clinicians. Rather, it reflects the limits of the data.
My intention in creating NeuroTrack was to address a specific part of this problem. It's a Python-based pipeline that uses large-scale pharmacovigilance data from the FDA FAERS database to examine adverse effects in populations that clinical trials have not adequately characterized. It applies established methods from pharmacoepidemiology to identify patterns in reported side effects, models how those effects emerge over time, and uses machine learning to estimate how different medications may be tolerated given a patient profile.
*Note: this project is not fully polished yet. Sorry if there are discrepancies or content gaps; I'm working on it!
NeuroTrack is a Python pipeline that pulls together five decades of pharmacoepidemiological methodology to address this gap. It queries the FDA FAERS database (20+ million adverse event reports) for the specific population that clinical trials missed, computes pharmacovigilance signals stratified by adolescent age and female sex, models time-to-adverse-event onset using survival analysis, and builds an XGBoost prediction model that takes a patient profile as input and returns drug-specific adverse effect probability distributions. Everything is open-source, everything runs locally, and the whole thing exists because someone should have built it ten years ago.
The standard of care for adolescent chronic migraine prevention is: try topiramate, try valproate, try amitriptyline, and keep going until something works, the side effects become intolerable, or the patient stops coming to the clinic. Unfortunate as it is, this is not a cynical characterization. Clinical guidelines describe exactly this process as appropriate treatment escalation. There is no validated tool for predicting which drug will cause which side effect in which patient. No biomarker. No genomic panel. Not even a cricket chirping.
The specific issue that motivated NeuroTrack, beyond a personal wish to help a friend, was the set of adverse effects that actually drive discontinuation in this population: weight loss on topiramate, hair loss on valproate (12-24% of users), sleep architecture disruption on amitriptyline (REM suppression documented by polysomnography), cognitive dulling on topiramate (the origin of its "dopamax" nickname), and menstrual irregularity on valproate. These are largely absent from clinical trial adverse event tables because (a) the trials weren't powered to detect them as rare events, (b) the trials ran too short to see delayed-onset effects, and (c) the adolescent female subgroup wasn't large enough for stratified analysis even when present.
FAERS partially compensates for all three of these problems by capturing reports from patients, clinicians, and manufacturers in the real world, across all ages and sexes, with indefinitely long observation periods. It is biased in its own ways (notoriety bias, underreporting, no denominator), but it still contains real signals that trials structurally can't detect.
NeuroTrack is a systematic, open-source, reproducible disproportionality analysis of FAERS stratified simultaneously by adolescent age, female sex, and migraine indication, combined with survival analysis of time-to-event onset and a predictive model. Each of those three components exists as an established methodology, so the synthesis for this underserved population is the contribution. My hope is that this makes the existing evidence more visible and more usable for patients who have already cycled through several medications, so their next choice carries less uncertainty and more of what the data already offers.
I did not know what a Reporting Odds Ratio was before this project, but I did know what an odds ratio was from my DermEquity work :) However, applying it to spontaneous reporting databases, and navigating the methodological jungle that comes with that, was new territory.
The reading list was actually fairly constrained. The Rothman 2004 paper on the ROR gave me the delta-method standard error, and the Norén 2006 paper on the Information Component introduced the Bayesian shrinkage formula and the IC025 lower credible interval that WHO Uppsala uses for signal detection. The Evans 2001 PRR paper laid out the three-criterion signal definition: PRR of at least 2, chi-squared of at least 4, and at least 3 reports. The Kaplan-Meier paper from 1958 is one of the most elegant statistical derivations I have read; the nonparametric survival function drops directly from first principles in about two pages. Cox 1972 required more work. The partial likelihood derivation was hard to swallow on the first read, and I spent a long time on the Newton-Raphson implementation (with help from ChatGPT) before it made sense.
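For concreteness, the Evans three-criterion check can be written down in a few lines. This is a minimal sketch, not NeuroTrack's implementation: it assumes the usual 2x2 contingency table and uses the plain Pearson chi-squared without continuity correction (some applications of Evans 2001 use the Yates-corrected statistic; that simplification is mine).

```python
def prr_signal(a, b, c, d):
    """Evans 2001 signal test from a 2x2 table:
    a = target drug & target reaction, b = target drug & other reactions,
    c = other drugs & target reaction, d = other drugs & other reactions."""
    # Proportional reporting ratio: the reaction's share of the drug's
    # reports relative to its share among all other drugs' reports
    prr = (a / (a + b)) / (c / (c + d))
    # Pearson chi-squared for a 2x2 table (no continuity correction)
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Signal if PRR >= 2, chi-squared >= 4, and at least 3 case reports
    return prr, chi2, (prr >= 2 and chi2 >= 4 and a >= 3)
```

For example, `prr_signal(20, 80, 100, 9800)` returns a PRR of 19.8 with a large chi-squared, so the pair would flag as a signal under all three criteria.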
What surprised me was how much of clinical pharmacology is applied thermodynamics dressed in Latin. Valproate causes weight gain because it inhibits fatty acid beta oxidation and increases lipogenesis through mechanisms related to PPAR-gamma activation. Topiramate causes weight loss because it inhibits carbonic anhydrase and reduces appetite through mechanisms not fully characterized. Amitriptyline disrupts sleep architecture because tricyclics suppress REM sleep at the brainstem level. The polysomnographic signature, shallow NREM with reduced REM and preserved ability to take short naps, is distinct from insomnia and has been documented since the 1970s. Once I understood the mechanisms, I understood why FAERS would show certain signals and not others. The biology made the statistics interpretable rather than just computable.
I also read enough MIMIC-IV documentation to understand what I was working with. MIMIC-IV is primarily ICU and emergency department data from Beth Israel Deaconess in Boston. That means migraine patients appear when they present acutely with intractable migraine, severe nausea requiring IV antiemetics, or dehydration, rather than in the outpatient setting where preventive treatment actually happens. The Outpatient Medical Records module is better for weight trajectory data, but it is still incomplete. This limitation is stated explicitly in the pipeline output and would need to be addressed in any sort of publication.
The FAERS parsing took longer than I expected for a reason that now seems obvious: real pharmacovigilance data is extraordinarily messy. The drug name field is free text. Topiramate appears as TOPIRAMATE, Topamax, topiramato, TOPIRIMATE (a common misspelling that appears thousands of times), Trokendi XR, and about 30 other variants. Genuinely, I thought I was going to get the first migraine of my life from banging my head against the wall. The same drug submitted by a manufacturer in Brazil, a patient in Australia, and a physician in Ohio all look different in the raw data. RapidFuzz's token set ratio with a threshold of 82 handles most of it; the threshold was tuned against a manually annotated sample of 500 records until the false positive rate was acceptable.
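A rough stdlib approximation of that idea (the real pipeline uses RapidFuzz's `token_set_ratio`; the canonical list, synonym map, and difflib scorer below are illustrative stand-ins):

```python
import difflib

# Illustrative canonical names and brand-name synonyms; the real pipeline
# uses a much larger map and RapidFuzz at a tuned threshold of 82/100.
CANONICAL = ["TOPIRAMATE", "VALPROATE", "AMITRIPTYLINE"]
BRAND_SYNONYMS = {"TOPAMAX": "TOPIRAMATE", "TROKENDI XR": "TOPIRAMATE",
                  "DEPAKOTE": "VALPROATE", "ELAVIL": "AMITRIPTYLINE"}

def canonicalize(raw, threshold=0.82):
    """Map a free-text drug name to a canonical name, or None."""
    name = " ".join(raw.upper().split())      # normalize case and whitespace
    if name in BRAND_SYNONYMS:                # brand names share almost no
        return BRAND_SYNONYMS[name]           # characters with the generic,
                                              # so they need a lookup table
    # Fuzzy matching catches misspellings (TOPIRIMATE) and foreign
    # variants (TOPIRAMATO) of the generic name
    best, score = None, 0.0
    for cand in CANONICAL:
        s = difflib.SequenceMatcher(None, name, cand).ratio()
        if s > score:
            best, score = cand, s
    return best if score >= threshold else None
```

The split between the synonym table and the fuzzy matcher is the important design point: string similarity handles typos, but Topamax-to-topiramate is a vocabulary problem, not a spelling problem.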
FAERS deduplication happens in two stages. Stage one removes follow-up reports, where the same adverse event case is submitted multiple times as new information arrives; these are identifiable by the same case ID with an increasing case version. Stage two removes independently submitted duplicates, where the same real-world event is submitted by multiple reporters, such as a patient and a manufacturer. It uses a fingerprint of age bracket, sex, first drug, first reaction, country, and event date within a 30-day window. Together, the two stages removed roughly 35 to 45 percent of raw rows in my test runs with publicly available quarterly data.
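A minimal sketch of the stage-two fingerprint logic, assuming each parsed report is a dict (the field names here are illustrative, not the pipeline's actual schema):

```python
from datetime import date

def dedupe_stage2(reports, window_days=30):
    """Drop independently submitted duplicates: reports sharing the same
    demographic/drug/reaction fingerprint whose event dates fall within
    `window_days` of an already-kept report."""
    key = lambda r: (r["age_bracket"], r["sex"], r["first_drug"],
                     r["first_reaction"], r["country"])
    kept, last_seen = [], {}   # fingerprint -> event date of last kept report
    for r in sorted(reports, key=lambda r: (key(r), r["event_date"])):
        fp = key(r)
        prev = last_seen.get(fp)
        if prev is not None and (r["event_date"] - prev).days <= window_days:
            continue                    # duplicate inside the window: drop
        last_seen[fp] = r["event_date"]
        kept.append(r)
    return kept
```

Sorting by fingerprint and then by date means each report only needs to be compared against the most recently kept report with the same fingerprint, which keeps the pass linear after the sort.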
The disproportionality analysis is implemented from scratch rather than with a black-box library. All three measures, ROR, IC, and PRR, have unit tests derived from worked examples in the original papers: the ROR test reproduces Table 2 from Rothman 2004, the IC test validates against a manual calculation, and the PRR test reproduces Evans 2001. If any of those unit tests fail, the pipeline stops and logs an error before any results are written. This is the minimum standard for scientific software.
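A minimal version of two of those formulas, not the pipeline's tested implementation: the ROR with its delta-method confidence interval, and the shrunk observed-over-expected IC point estimate (the IC025 lower credible bound requires the full posterior and is omitted here).

```python
import math

def ror_ci(a, b, c, d, z=1.96):
    """Reporting odds ratio with a delta-method 95% CI.
    a = drug & reaction, b = drug & other reactions,
    c = other drugs & reaction, d = other drugs & other reactions."""
    ror = (a * d) / (b * c)
    se = math.sqrt(1/a + 1/b + 1/c + 1/d)        # SE of ln(ROR)
    lo = math.exp(math.log(ror) - z * se)
    hi = math.exp(math.log(ror) + z * se)
    return ror, lo, hi

def ic(a, n_drug, n_reaction, n_total, shrink=0.5):
    """Information component point estimate with additive shrinkage:
    log2 of the observed count over the expected count, each shifted
    by 0.5 so rare cells are pulled toward zero."""
    expected = n_drug * n_reaction / n_total
    return math.log2((a + shrink) / (expected + shrink))
```

For instance, `ror_ci(10, 20, 30, 240)` gives ROR = 4.0 with a CI of roughly (1.71, 9.35), and a drug-reaction pair observed twice as often as expected gets a positive IC.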
The survival analysis is similarly from scratch: Kaplan-Meier, Greenwood variance, log-log confidence interval transformation, log-rank test, and Cox proportional hazards via L-BFGS-B optimization of the partial likelihood. The Cox model implementation taught me something I would not have learned from calling lifelines. The numerical Hessian for the standard errors is genuinely tricky when the partial likelihood is not perfectly smooth, and the diagonal of the inverse Hessian can produce negative values under near collinearity. The absolute value trick in the square root is not elegant, but it avoids silent NaN propagation.
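A stripped-down sketch of the first two ingredients, the Kaplan-Meier estimate with its Greenwood standard error (no log-log interval here, and subjects simply leave the risk set at their observed time):

```python
import math
from collections import Counter

def kaplan_meier(times, events):
    """Kaplan-Meier survival curve with Greenwood standard errors.
    times: observed durations; events: 1 = event occurred, 0 = censored.
    Returns a list of (time, S(t), SE) at each event time."""
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)        # all subjects leave the risk set at t
    s, green_sum, at_risk = 1.0, 0.0, len(times)
    curve = []
    for t in sorted(set(times)):
        d = deaths.get(t, 0)
        if d:
            s *= 1 - d / at_risk                       # KM product step
            green_sum += d / (at_risk * (at_risk - d)) # Greenwood term
            curve.append((t, s, s * math.sqrt(green_sum)))
        at_risk -= leaving[t]
    return curve
```

With times `[1, 2, 3, 4, 5]` and events `[1, 1, 0, 1, 0]`, the survival estimate steps through 0.8, 0.6, and 0.3, with the censored subject at t=3 shrinking the risk set without moving the curve.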
The MIMIC-IV weight trajectory analysis uses statsmodels MixedLM with random intercepts and random slopes per patient. The REML estimator is appropriate here because the fixed effect standard errors are the target of inference. Maximum likelihood would underestimate them. Whether the proportional hazards assumption holds for the FAERS survival data is tested via Schoenfeld residuals, with the Spearman correlation between residuals and event time as the test statistic.
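The proportional hazards check reduces to a rank correlation between Schoenfeld residuals and event time. The residuals themselves come out of the fitted Cox model and are omitted here; this is just a stdlib sketch of the Spearman test statistic applied to them:

```python
def _ranks(xs):
    """Ranks with ties resolved by midranking."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1                     # extend over the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho: the Pearson correlation of the two rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)
```

A rho near zero is consistent with proportional hazards; a strong monotone trend in the residuals over time is evidence against it.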
The XGBoost prediction model trains one classifier per adverse event category. Stratified 5-fold cross validation reports AUROC and AUPRC per fold. AUPRC is the more meaningful metric here because the positive class, meaning the adverse event occurred, is the minority class in most drug and adverse event combinations. SHAP TreeExplainer computes exact Shapley values for each prediction, enabling the beeswarm plots and force plots that make the model interpretable rather than just accurate.
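The ranking interpretation of AUROC makes the class-imbalance argument concrete. This sketch (not the pipeline's code, which uses standard library implementations of both metrics) computes AUROC via the Mann-Whitney formulation: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as half.

```python
def auroc(scores, labels):
    """AUROC as the fraction of (positive, negative) pairs where the
    positive receives the higher score; ties contribute one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

Because every positive is compared against the whole negative majority, AUROC can stay respectable even when the model's precision on the rare positive class is poor, which is exactly why AUPRC is the more honest headline number for these drug-AE models.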
The pipeline processed 44 quarterly FAERS releases (2014–2024), producing a final cohort of 2,685 unique cases after two-stage deduplication: adolescent and young adult females (ages 10–25) with a migraine indication on at least one of 13 target preventive drugs. The cohort spans 13 canonical drug names across five pharmacological classes, with erenumab (n = 1,226), topiramate (n = 496), and galcanezumab (n = 360) contributing the largest case counts. This distribution reflects both prescribing patterns and FAERS reporting propensity, with newer drugs generating proportionally more reports per patient.
Disproportionality analysis produced 23 pharmacovigilance signals across 62 drug-AE category pairs. The signal pattern is coherent with the known clinical pharmacology of each drug class, which is the primary internal validation criterion for a pharmacovigilance study.
Topiramate dominated the signal landscape: elevated ROR for cognitive effects (ROR 8.25, 95% CI 5.19–13.10), reproductive AEs (8.08, 4.67–13.97), weight/metabolic (6.29, 4.33–9.16), renal (5.35, 1.63–17.60), cardiac (4.67, 3.08–7.07), sleep disturbance (4.05, 2.90–5.66), mood (3.56, 2.41–5.24), and neurological (3.40, 2.56–4.52). The breadth of the topiramate signal profile, which spans seven of ten AE categories with IC025 > 0 for five, is consistent with topiramate's known mechanism: carbonic anhydrase inhibition, GABAergic activity, and AMPA/kainate receptor blockade produce effects across multiple organ systems, and these signals are detectable even in a relatively modest cohort of 496 topiramate cases.
Amitriptyline showed signal for neurological AEs (7.46, 4.11–13.52) and gastrointestinal effects (3.87, 2.08–7.22), consistent with its anticholinergic mechanism and the well-characterized neurological side effects of tricyclic antidepressants. Candesartan showed an unexpected sleep signal (ROR 12.88, 4.41–37.60) on small case counts (n = 6), which warrants caution: the renin-angiotensin system has established CNS activity, and this could reflect a genuine pharmacological effect, confounding by indication, or small-sample instability. It is reported here as a hypothesis rather than a conclusion.
The most scientifically interesting negative finding is valproate. Despite valproate-induced alopecia being one of the most clinically discussed adverse effects of this drug (with published rates of 12–24% in long-term users), the hair loss signal did not reach the full threshold in this cohort (valproate hair ROR 0.50, n = 4, non-significant). This is a known FAERS limitation: alopecia is chronically underreported because patients attribute hair loss to stress, illness, or the disease itself rather than the drug. The valproate weight/metabolic signal did emerge (ROR 5.46, n = 3, small-sample), but with limited precision. This combination of a strong weight signal and an absent hair signal illustrates exactly the kind of selective underreporting bias that must be stated explicitly in any pharmacovigilance paper using FAERS. The absence of a signal does not mean the absence of an effect.
The CGRP monoclonal antibodies (erenumab, fremanezumab, galcanezumab) showed consistently low or negative ROR across most AE categories: erenumab returned ROR < 0.5 for weight/metabolic, sleep, cardiac, cognitive, mood, and neurological categories, indicating these effects are reported less than expected in the FAERS background. This is the expected finding given the clinical trial safety profile of this drug class and validates the pipeline's ability to detect protective as well as adverse signals.
Prediction model performance across 10 AE categories ranged from AUROC 0.624 (hair) to 0.859 (reproductive), with weight/metabolic at 0.793. These results should be interpreted carefully. The feature set available from FAERS demographics (age, drug, drug class, report year) is limited; there is no comorbidity data, no BMI, no prior medication history. What these AUROC values reflect is the degree to which drug identity and age, without any additional patient-level information, can distinguish cases that experienced a given AE from those that did not. An AUROC of 0.793 for weight/metabolic means drug-type and age alone explain a meaningful fraction of the variance in weight AE occurrence, which is the correct prior given the known class-specific effects (topiramate causes weight loss, valproate causes weight gain). The reproductive model at 0.859 likely reflects strong drug-class specificity for reproductive AEs, with valproate and certain antiepileptics known to cause menstrual irregularities via hormonal mechanisms.
Hair at 0.624 is the worst-performing category, which is scientifically consistent: valproate is the primary driver of drug-induced alopecia in this drug set, but the sparse FAERS hair loss data for valproate (discussed above) limits the model's ability to learn this signal.
The renal category warrants a specific note: with only 11 positive cases, the CV metrics (AUROC 0.762) should be treated with suspicion. A model trained on 11 positive examples and evaluated on 2–3 positives per fold is not reliably estimating generalization performance; it is reflecting the noise structure of a very small positive class.
What these results defensibly claim: for adolescent females on migraine preventives, topiramate is associated with disproportionate reporting of adverse effects across nearly every category measured. CGRP monoclonal antibodies show substantially cleaner FAERS profiles. The drug-class effects are large enough to be detectable from pharmacovigilance data alone, without patient-level comorbidity information, at AUROC values that exceed chance by a meaningful margin. These are signals; they are not causal estimates; they require clinical validation before any use in prescribing decisions. The MIMIC-IV weight trajectory component, pending institutional data access, will add empirical longitudinal estimates to complement the FAERS cross-sectional signal analysis.
The gap between what clinical trials measure and what patients actually experience is not an accident. It is a structural feature of how drug development works. Clinical trials are powered for efficacy endpoints, not adverse effect detection. They enroll adult populations that are easier to recruit and less legally complicated. They run long enough to satisfy FDA duration requirements, not long enough to characterize delayed onset toxicities. The result is a systematic underproduction of evidence for populations at the edges. Younger patients. Patients with comorbidities. Patients on polypharmacy. Patients who do not look like the typical Phase III trial participant.
Adolescent females with chronic migraine are at the very center of the condition. The female preponderance of migraine is one of the most replicated findings in headache medicine. The onset during adolescence is documented in every epidemiological review going back to the 1990s. The disproportionate disability burden in people under 50 has been in the Global Burden of Disease data for two decades. None of this is obscure.
What is obscure is the adverse effect data specific to this population. That obscurity is a direct consequence of the clinical trial design choices made for each of these drugs. Open pharmacovigilance tools, pipelines that make FAERS analysis reproducible, stratifiable, and accessible to anyone with a laptop and API access, are a partial remedy. They cannot fix the denominator problem or undo notoriety bias, but they can make the existing signal more visible.
The argument I am trying to make with NeuroTrack (and my work in general) is not that software can replace clinical judgment. I just think that the evidence base informing clinical judgment should not be invisible to the people it concerns. A teenager who is about to start another migraine preventive has a right to know that pharmacovigilance data on adolescent females with her diagnosis exists and that, with appropriate caveats about its limitations, it provides useful information. She also has a right to a neurologist who has access to that information in a usable form. Et voilà!
NeuroTrack draws on methods from pharmacoepidemiology, machine learning, and clinical data science, but it does not claim to resolve the full clinical problem it engages with.
I am not a pharmacoepidemiologist. What I have done here is learn the core methods and apply them to a problem I found both technically interesting and personally meaningful. What this project taught me is that the difficulty lies in a lack of accessible, structured ways to interpret data for specific populations. The tools used in pharmacovigilance already exist; the challenge is rendering them useful in contexts where decisions are being made without them.
The outputs of this pipeline are signals and estimates meant to build up better questions and informed conversations, not to replace the judgment of clinicians or the lived experience of patients. There are also obvious limits: FAERS is a biased dataset and MIMIC-IV does not fully represent outpatient care. The predictive model operates under constraints that restrict its accuracy. These are not minor caveats.
Even with those limitations, there is value in making partial information visible. For patients navigating repeated treatment changes, small improvements in clarity can matter. For clinicians, having access to structured signals rather than relying solely on memory or general guidelines can shift how decisions are framed. This project contributes one piece to that process. The work that follows will depend on others who can apply these ideas in clinical settings, and I hope that work happens.
Cheers,
Angie X.
This project is open source at github.com/axshoe/NeuroTrack.