Meta AI Releases NeuralBench: A Unified Open-Source Framework to Benchmark NeuroAI Models Across 36 EEG Tasks and 94 Datasets
Meta AI has released NeuralBench, a unified, open-source framework for benchmarking AI models of brain activity across 36 EEG tasks and 94 datasets.

Evaluating AI models trained on brain signals has long been a messy, inconsistent affair. Different research groups use different preprocessing pipelines, train models on different datasets, and report results on a narrow set of tasks, making it nearly impossible to know which model actually works best, or for what. A new framework from the Meta AI team is designed to fix that.

Meta researchers have released NeuralBench, a unified, open-source framework for benchmarking AI models of brain activity. Its first release, NeuralBench-EEG v1.0, is the largest open benchmark of its kind: 36 downstream tasks, 94 datasets, 9,478 subjects, 13,603 hours of electroencephalography (EEG) data, and 14 deep learning architectures evaluated under a single standardized interface.

The broader field of NeuroAI, where deep learning meets neuroscience, has exploded in recent years. Self-supervised learning techniques originally developed for language, speech, and images are now being adapted to build brain foundation models: large models pretrained on unlabeled brain recordings and fine-tuned for downstream tasks ranging from clinical seizure detection to decoding what a person is seeing or hearing. The evaluation landscape, however, has been badly fragmented.
Existing benchmarks like MOABB cover up to 148 brain-computer interfacing (BCI) datasets but limit evaluation to just 5 downstream tasks.

NeuralBench is built on three core Python packages that form a modular pipeline. NeuralFetch handles dataset acquisition, pulling curated data from public repositories including OpenNeuro, DANDI, and NEMAR. NeuralSet prepares data as PyTorch-ready dataloaders, wrapping existing neuroscience tools such as MNE-Python and nilearn for preprocessing, and HuggingFace for extracting stimulus embeddings (for tasks involving images, speech, or text). NeuralTrain provides modular training code built on PyTorch-Lightning, Pydantic, and the exca execution and caching library.

The first release focuses on EEG and spans eight task categories: cognitive decoding (image, sentence, speech, typing, video, and word decoding), brain-computer interfacing (BCI), evoked responses, clinical tasks, internal state, sleep, phenotyping, and miscellaneous. All foundation models are fine-tuned end-to-end using a shared training recipe: the AdamW optimizer, a learning rate of 10⁻⁴, weight decay of 0.05, cosine annealing with 10% warmup, and up to 50 epochs with early stopping (patience = 10). The sole exception is BENDR, for which the learning rate is lowered to 10⁻⁵ and gradient clipping is applied at 0.5 to obtain stable learning curves.

The benchmark offers two variants: NeuralBench-EEG-Core v1.0, which uses a single representative dataset per task for broad coverage, and NeuralBench-EEG-Full v1.0, which expands to up to 24 datasets per task to study within-task variability across recording hardware, labs, and subject populations.
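To make the "PyTorch-ready dataloaders" idea concrete, here is a generic sketch of the kind of windowed EEG dataset such a pipeline produces. All names (`EEGWindowDataset`, the toy shapes) are illustrative assumptions, not NeuralSet's actual API:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class EEGWindowDataset(Dataset):
    """Illustrative stand-in for a preprocessed EEG dataset: fixed-length
    windows and labels exposed as tensors of shape (channels, time)."""

    def __init__(self, recordings: np.ndarray, labels: np.ndarray,
                 window_samples: int):
        # recordings: (n_trials, n_channels, n_samples) float array,
        # assumed already filtered/epoched by an upstream MNE-style step.
        self.x = torch.as_tensor(recordings[..., :window_samples],
                                 dtype=torch.float32)
        self.y = torch.as_tensor(labels, dtype=torch.long)

    def __len__(self) -> int:
        return len(self.y)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]


# Toy data: 8 trials, 32 channels, 2 s at 128 Hz (256 samples).
rng = np.random.default_rng(0)
data = rng.standard_normal((8, 32, 256)).astype("float32")
labels = rng.integers(0, 2, size=8)

loader = DataLoader(EEGWindowDataset(data, labels, window_samples=256),
                    batch_size=4)
xb, yb = next(iter(loader))  # xb: (4, 32, 256), yb: (4,)
```

A downstream model then consumes `(batch, channels, time)` tensors directly, which is what lets the same training code run across heterogeneous datasets.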
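The shared fine-tuning recipe above (AdamW, learning rate 10⁻⁴, weight decay 0.05, cosine annealing with 10% warmup, up to 50 epochs, early stopping with patience 10) can be sketched with standard PyTorch components. This is a minimal sketch, not NeuralTrain's actual code; the simulated validation losses stand in for a real training loop:

```python
import math

import torch
from torch import nn

MAX_EPOCHS = 50
BASE_LR = 1e-4
WARMUP_EPOCHS = int(0.10 * MAX_EPOCHS)  # 10% warmup -> 5 epochs
PATIENCE = 10

model = nn.Linear(64, 2)  # stand-in for a fine-tuned EEG model
optimizer = torch.optim.AdamW(model.parameters(), lr=BASE_LR,
                              weight_decay=0.05)


def lr_lambda(epoch: int) -> float:
    """Linear warmup, then cosine decay to zero over the remaining epochs."""
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    progress = (epoch - WARMUP_EPOCHS) / max(1, MAX_EPOCHS - WARMUP_EPOCHS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# Early stopping: halt after PATIENCE epochs without validation improvement.
# Simulated losses improve for 5 epochs, then plateau.
fake_val_losses = [1.0 / (e + 1) if e < 5 else 0.2 for e in range(MAX_EPOCHS)]
best_loss, bad_epochs = float("inf"), 0
for epoch in range(MAX_EPOCHS):
    val_loss = fake_val_losses[epoch]  # replace with a real validation pass
    scheduler.step()
    if val_loss < best_loss:
        best_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= PATIENCE:
            break
```

With the plateau starting at epoch 5, the loop stops at epoch 14, after exactly 10 epochs without improvement.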
Source: MarkTechPost