A Critical Review of the 2025 Randomized Controlled Trial



By Julie Gregory, Chief Health Liaison for Apollo Health

Why the first randomized controlled study of precision medicine in early Alzheimer’s deserves serious scientific attention.

I approach Alzheimer’s research through the eyes of a seeker — one shaped not by abstraction, but by personal stakes. As an ApoE4/4 carrier and the founder of ApoE4.info, I’ve spent years immersed in both the science and the lived reality of cognitive risk. I would welcome a simple solution — a single pill that works while life goes on unchanged. But Alzheimer’s has never been a single-pathway disease, and decades of failed monotherapies have made one thing unmistakably clear: there is no magic bullet.

Alzheimer’s neurodegeneration arises from interacting metabolic, inflammatory, infectious, toxic, vascular, and lifestyle factors that differ from one individual to the next. That complexity is precisely why the recently posted preprint of the 2025 randomized controlled trial warrants careful scientific attention rather than reflexive dismissal. Instead of targeting a single downstream pathology, the study tested whether a precision, systems-based therapeutic approach, one that identifies and addresses the specific drivers of decline in each person, could meaningfully alter cognitive outcomes.

At its core, the 2025 randomized controlled trial (RCT), known as Evanthea, evaluated this approach in individuals with mild cognitive impairment (MCI) or early dementia due to Alzheimer’s disease. Seventy-three participants were randomized to either a personalized precision-medicine intervention or standard of care, with objective cognitive outcomes assessed over nine months.

Below, I address the most common criticisms raised in response to the trial — why they arise, where they fall short, and why they merit thoughtful scientific engagement rather than summary rejection.

Criticism #1: “Those participants didn’t even have Alzheimer’s — they were too young, and not everyone had elevated p-tau.”

This criticism rests on a misunderstanding of how early Alzheimer’s disease is identified and why early intervention trials are designed the way they are.

Participants in the 2025 RCT were diagnosed with mild cognitive impairment (MCI) or early dementia, based on objective cognitive testing and informant-reported functional decline — the same clinical criteria widely used to define symptomatic early Alzheimer’s in both research and practice.

The claim that participants were “too young” reflects a misunderstanding of Alzheimer’s disease biology rather than trial design. Alzheimer’s pathology begins decades before late-stage dementia, and individuals in midlife — particularly those with genetic risk factors such as ApoE ε4 — may present with clinically meaningful cognitive decline. Epidemiologic trends increasingly support this observation, suggesting that symptomatic cognitive impairment and Alzheimer’s disease are being recognized earlier in life. Earlier-onset cases often progress more rapidly, underscoring the rationale for precision-medicine approaches that target this earlier, more biologically modifiable stage of disease.

Criticism based on age also ignores a practical reality of drug development: many pharmaceutical trials skew older, not because Alzheimer’s begins late in life, but because later stages are easier to detect using coarse biomarkers and global cognitive scales. Evanthea, by contrast, targeted earlier-stage disease where disease-modifying effects are most plausible.

The biomarker critique similarly benefits from clarification. In Evanthea, p-tau-217 was measured as an observational biomarker, intended to characterize biological heterogeneity and longitudinal change, not to determine eligibility. Variability in p-tau at early symptomatic stages is expected and does not negate clinically meaningful impairment.

Most importantly, by the same standards routinely applied in monoclonal anti-amyloid antibody trials, the Evanthea cohort clearly qualifies as an early Alzheimer’s population: participants had cognitive test scores comparable to those trials, and among those tested, 68 of 70 demonstrated abnormal amyloid-β levels.

Criticism #2: “There was no statistically significant difference in MoCA improvement between the two arms.”

This criticism relies on over-weighting a single global screening tool while overlooking both its limitations and the broader pattern of results observed in the trial.

The Montreal Cognitive Assessment (MoCA) is widely used as a brief cognitive screening instrument, but it is well known to be susceptible to practice effects, particularly over repeated administrations in short to moderate timeframes. Participants often improve simply because they become familiar with the test format — an effect that can inflate scores in both intervention and control groups and obscure true between-group differences.

For this reason, the 2025 RCT appropriately included CNS Vital Signs (CNS-VS), a computerized neurocognitive battery specifically designed to minimize practice effects through alternate forms, randomization, and domain-specific scoring. Notably, the largest and most statistically robust improvements were observed in the CNS-VS composite and domain scores, including memory, executive function, and processing speed — domains that are highly relevant to daily function and disease progression.

Focusing exclusively on MoCA while discounting CNS-VS is, therefore, methodologically unsound. In multidomain interventions, composite cognitive batteries are more sensitive than brief global screens, especially when assessing subtle but meaningful changes across multiple neural systems.

Additionally, the 2025 RCT was conducted in free-living humans, not tightly controlled laboratory conditions. In such real-world randomized controlled trials, it is common — and well documented — for participants assigned to standard care to begin making lifestyle changes before receiving the formal intervention, particularly when they know they will eventually be offered the active protocol. This phenomenon was reflected in the Evanthea study by early improvements in weight and metabolic biomarkers observed during the screening and run-in period, even prior to intervention initiation.

Such early behavior changes can raise the baseline of the control group, thereby reducing the apparent between-group difference on certain outcome measures — especially those prone to practice effects like the MoCA. Importantly, this does not indicate a lack of efficacy; rather, it underscores the challenges of detecting treatment effects in lifestyle-based trials where ethical and practical constraints prevent strict behavioral isolation.
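To make that arithmetic concrete, here is a minimal Python sketch that treats each arm’s average score change as a simple sum of three parts: the assigned intervention’s effect, a practice effect, and any lifestyle changes participants make on their own. Every number is invented for illustration and none comes from the 2025 RCT; the additive model itself is a deliberate simplification.

```python
# Hypothetical illustration only: every number below is invented and is NOT
# taken from the 2025 RCT. Scores are in arbitrary test points.


def observed_change(true_effect: float, practice_gain: float, lifestyle_gain: float) -> float:
    """Mean score change in one arm, modeled as a simple sum of:
    the assigned intervention's effect, familiarity with the test
    (practice effect), and lifestyle changes participants start on their own."""
    return true_effect + practice_gain + lifestyle_gain


PRACTICE_GAIN = 1.0  # assumed to be the same in both arms

# Treatment arm: its lifestyle changes are part of the intervention itself,
# so they are counted in true_effect rather than lifestyle_gain.
treatment_arm = observed_change(true_effect=2.0, practice_gain=PRACTICE_GAIN, lifestyle_gain=0.0)

# Control arm that makes no changes of its own.
control_isolated = observed_change(true_effect=0.0, practice_gain=PRACTICE_GAIN, lifestyle_gain=0.0)

# Control arm that starts changing diet or exercise before its formal intervention.
control_self_started = observed_change(true_effect=0.0, practice_gain=PRACTICE_GAIN, lifestyle_gain=0.5)

gap_isolated = treatment_arm - control_isolated
gap_self_started = treatment_arm - control_self_started

print(f"Between-group difference, isolated control: {gap_isolated}")
print(f"Between-group difference, self-started control: {gap_self_started}")
```

In this simple model, a practice gain that is identical in both arms cancels out of the difference; what shrinks the gap is the control arm’s own behavior change. Real practice effects are uneven and noisy, which adds variability that makes a true difference harder to detect, something this additive sketch does not capture.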

Taken together, the MoCA findings should be interpreted in context: while global screening scores may show modest between-group differences, domain-specific cognitive improvements as measured by tools designed to avoid practice effects demonstrated clear and clinically meaningful gains in the precision medicine arm.

Criticism #3: “The dataset is too small to be meaningful.”

This criticism reflects a misunderstanding of how sample size relates to effect size, particularly in early-phase clinical trials.

Large anti-amyloid drug trials often enroll hundreds to thousands of participants, not because larger datasets are inherently superior, but because the observed treatment effects are small and must be detected against substantial biological and measurement noise. When effect sizes are modest — as has been the case in most Alzheimer’s pharmacologic trials — very large sample sizes are required to achieve statistical significance.

By contrast, the 2025 RCT tested a precision medicine intervention that targets multiple, individualized drivers of cognitive decline. The rationale for such an approach is that addressing root causes simultaneously can produce larger effect sizes than single-target therapies. When effect sizes are large, fewer participants are required to detect meaningful differences between groups.

Indeed, the Evanthea investigators designed the study based on effect sizes observed in prior proof-of-concept work, powering the trial accordingly. The resulting dataset — 73 participants randomized in a 2:1 ratio — is entirely appropriate for a Phase 2 randomized controlled trial intended to test feasibility, signal strength, and biological plausibility.
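The trade-off between effect size and sample size can be seen directly in the textbook two-group power calculation. The Python sketch below uses the normal-approximation formula with 80% power, a two-sided α of 0.05, and a 2:1 allocation. These settings and the example effect sizes (Cohen’s d of 0.2, 0.5, and 0.8, conventionally labeled small, medium, and large) are illustrative assumptions, not the Evanthea investigators’ actual power calculation.

```python
import math
from statistics import NormalDist


def required_sample_size(effect_size_d: float, allocation_ratio: float = 2.0,
                         alpha: float = 0.05, power: float = 0.80) -> tuple[int, int]:
    """Approximate participants per arm needed to detect a standardized mean
    difference (Cohen's d) with a two-sided, two-sample z-test.

    allocation_ratio is treatment participants per control participant
    (2.0 means 2:1). This is the normal approximation; a t-test would
    need slightly more participants. Returns (n_treatment, n_control).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = std_normal.inv_cdf(power)           # quantile matching the desired power
    # n_control = (1 + 1/k) * (z_alpha + z_beta)^2 / d^2, and n_treatment = k * n_control
    n_control = math.ceil((1 + 1 / allocation_ratio) * (z_alpha + z_beta) ** 2 / effect_size_d ** 2)
    n_treatment = math.ceil(allocation_ratio * n_control)
    return n_treatment, n_control


for d in (0.2, 0.5, 0.8):
    n_t, n_c = required_sample_size(d)
    print(f"d = {d}: {n_t} treatment + {n_c} control = {n_t + n_c} total")
```

Under these assumptions, a small effect (d = 0.2) calls for several hundred participants, while a large one (d = 0.8) needs fewer than 60. Whether a 73-person trial is adequate therefore depends on the size of the effect it is designed to detect, not on headcount alone.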

It is also worth noting that heterogeneity — the fact that people have different underlying drivers of disease and therefore respond differently to treatment — is an expected feature, not a flaw, in precision medicine trials. Because participants receive individualized interventions based on their unique biological profiles, response variability can be higher than in tightly constrained drug trials. This variability can widen confidence intervals but does not negate the presence of meaningful average benefits — particularly when improvements are consistent across multiple cognitive domains.

Finally, dismissing the 2025 RCT dataset because it is smaller than anti-amyloid trials ignores a critical context: those trials required enormous cohorts to detect marginal clinical benefits, often measured as fractions of a point on global cognitive scales. In contrast, Evanthea reported larger domain-specific cognitive improvements, allowing statistically meaningful conclusions to be drawn from a smaller, well-characterized cohort.

In short, sample size must be evaluated in relation to effect size and study purpose, not in isolation. By that standard, the Evanthea dataset is not only meaningful — it is appropriate and informative.

Comparison of Cognitive Outcomes

Figure 10: Forest plot comparing effect sizes (3)

Criticism #4: “The wide confidence intervals suggest the protocol is untrustworthy.”

This criticism misinterprets what confidence intervals signify in the context of precision medicine and multifactorial interventions.

In trials of single, non-personalized interventions — such as anti-amyloid monoclonal antibodies — effects tend to be small, relatively uniform, and narrowly distributed across large populations. As a result, confidence intervals are often tight, even when the absolute clinical benefit is modest. Tight confidence intervals in such trials reflect consistency, not magnitude or clinical relevance.

By contrast, a personalized, multifactorial approach is designed to generate larger benefits in responders by addressing the specific drivers of disease in each individual — metabolic dysfunction, inflammation, sleep apnea, nutrient deficiencies, toxin exposure, infections, or hormonal imbalances. Because participants differ in which drivers are most active, response variability is expected by design. Some individuals improve dramatically; others improve modestly; a few may show minimal change. This biological heterogeneity naturally produces wider confidence intervals, particularly in early-phase trials.

Importantly, wide confidence intervals do not imply unreliability. They indicate variability around a mean effect, an expected feature of individualized interventions applied to a heterogeneous disease like Alzheimer’s. When such intervals still exclude the no-effect value (zero difference between arms) across multiple cognitive domains, they signal a real and meaningful treatment effect despite that variability.
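A small numerical sketch shows how response heterogeneity widens an interval without moving its center. It uses a normal-approximation 95% confidence interval for a difference in means between two independent groups. The average benefit, standard deviations, and group sizes (48 and 24, loosely a 2:1 ratio) are hypothetical values chosen for illustration, not figures from the 2025 RCT.

```python
import math
from statistics import NormalDist


def mean_difference_ci(mean_diff: float, sd_treatment: float, sd_control: float,
                       n_treatment: int, n_control: int,
                       level: float = 0.95) -> tuple[float, float]:
    """Normal-approximation confidence interval for a difference in means
    between two independent groups (unequal variances allowed)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    standard_error = math.sqrt(sd_treatment**2 / n_treatment + sd_control**2 / n_control)
    return mean_diff - z * standard_error, mean_diff + z * standard_error


# Hypothetical trial: same average benefit and same group sizes in both scenarios.
MEAN_DIFF, N_TREAT, N_CTRL = 5.0, 48, 24

# Uniform response: treated participants respond about the same amount.
uniform = mean_difference_ci(MEAN_DIFF, sd_treatment=5.0, sd_control=5.0,
                             n_treatment=N_TREAT, n_control=N_CTRL)

# Heterogeneous response: strong, modest, and minimal responders widen the spread
# in the treatment arm only.
heterogeneous = mean_difference_ci(MEAN_DIFF, sd_treatment=10.0, sd_control=5.0,
                                   n_treatment=N_TREAT, n_control=N_CTRL)

for label, (low, high) in (("uniform", uniform), ("heterogeneous", heterogeneous)):
    print(f"{label}: 95% CI {low:.2f} to {high:.2f} "
          f"(width {high - low:.2f}, excludes 0: {low > 0})")
```

Both intervals are centered on the same hypothetical benefit; the heterogeneous one is wider but, in this example, still excludes zero. With a larger spread or smaller groups the lower bound could cross zero, so interval width has to be read together with where the interval sits.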

Moreover, focusing on confidence interval width without considering effect size risks missing the central result: while non-personalized interventions require thousands of participants to detect small average benefits, the 2025 RCT observed substantially larger domain-specific cognitive improvements, detectable in a much smaller cohort. This reflects a higher signal-to-noise ratio — not statistical weakness.

In short, precision medicine trades uniformity for magnitude. The resulting variability is not a flaw but a reflection of biological reality — and a necessary step toward identifying which patients benefit most and why.

Criticism #5: “With all those interventions, we should have seen more improvement.”

This criticism presumes that meaningful, sustained cognitive improvement in Alzheimer’s disease is readily achievable — and that anything short of dramatic reversal represents failure. That assumption is not supported by the current state of Alzheimer’s therapeutics.

Outside of Dr. Bredesen’s body of work, there is no intervention — pharmaceutical or otherwise — that has reliably produced durable cognitive improvement in individuals with MCI or early dementia. At best, existing therapies offer modest slowing of decline, often accompanied by high cost, risk, or limited clinical relevance. In this context, the expectation that a 9-month intervention should produce uniformly large improvements across a heterogeneous cohort is not just unrealistic — it ignores decades of sobering clinical trial results.

Alzheimer’s is a complex, multifactorial systems disease, not a single-pathway disorder. Even when multiple contributors are addressed, outcomes will vary based on disease stage, biological resilience, adherence, and the extent of existing neurodegeneration. The goal of early-stage intervention is not dramatic reversal in every individual, but stabilization, selective improvement, and slowed progression — outcomes that stand in stark contrast to the natural history of untreated disease.

Indeed, in the absence of effective treatment, individuals with MCI or early Alzheimer’s typically experience steady cognitive decline over time, not maintenance or improvement. Against that baseline, the improvements observed in the 2025 RCT, particularly across multiple cognitive domains, represent a meaningful departure from the expected disease trajectory.

To suggest that these gains “should have been larger” implies a standard that no other approach in Alzheimer’s medicine has come close to meeting. Until a therapy exists that can reliably deliver dramatic, sustained improvement at scale, dismissing early, reproducible signs of benefit reflects misplaced expectations, not scientific rigor.

Conclusion: Through the Eyes of a Seeker

I write this as a seeker — but not an untested one.

I have lived long enough with genetic risk, cognitive vulnerability, and real-world intervention to know what no treatment looks like. And I have lived long enough with a precision, systems-based approach to know what meaningful improvement feels like, not in theory but in daily function, clarity, and resilience. That experience shapes how I read the 2025 RCT results.

Alzheimer’s remains a disease defined more by failure than success. Outside of this body of work, there is still no therapy that reliably produces sustained cognitive improvement in people with MCI or early dementia. Most approaches have required massive trials to detect small, often clinically marginal effects, and even then, benefits are inconsistent and limited.

Against that backdrop, the 2025 RCT does something quietly radical. It demonstrates statistically robust cognitive improvements, supported by p-values that are unusually strong for this field, in a modestly sized cohort treated early, when biology is still responsive. These signals did not emerge from targeting a single molecule, but from identifying and correcting the specific contributors driving decline in each individual.

As a seeker, I don’t expect uniform outcomes or dramatic reversals for everyone. Alzheimer’s is too heterogeneous for that. What I do expect is biological coherence — meaning that the intervention targets known disease mechanisms in a way that logically fits human biology — and results that align with it. The 2025 RCT delivers that alignment. Variability is not a flaw here; it is evidence that the intervention is doing what precision medicine is supposed to do.

For critics, these findings will hopefully invite a new conversation. For me, they confirm what lived experience has already made clear: when you address the right problems, in the right person, at the right time, improvement is not hypothetical — it is achievable.
