Automated meta-analysis

NeuroQuery is a tool for meta-analysis of neuroimaging studies. Given a text query, it produces a map of the most relevant brain regions: the locations where neuroimaging studies related to the query are likely to report activations. NeuroQuery also models the associations between terms used in neuroimaging publications, for example that "aphasia" is related to "language". For each query, it shows a list of related terms that it uses to build its prediction. It also displays a list of neuroimaging publications related to the query.

overview figure

What do I see when I hover over a term or document?

How does NeuroQuery differ from neurosynth?

The main difference between NeuroQuery and neurosynth is that NeuroQuery focuses on producing a brain map that predicts where in the brain a study on the topic of interest is likely to report observations, while neurosynth tests the consistency of observations reported in the literature. Prediction, as opposed to statistical testing, is important because it can be applied out of sample, in other words to queries not already present in the literature. The prediction is done by extrapolating between relevant studies from the literature. Standard meta-analysis, as performed by neurosynth, works by defining a set of studies of interest but cannot model variations across these studies.

How are the maps scaled and thresholded?

The maps show how likely a given brain location is to be detected in studies addressing the given query. They are Z scores: effect divided by standard deviation, as is typical of neuroimaging maps. However, this scaling is only for convenience. These maps remain predictions, and cannot be used in a statistical test to reject any easily interpretable null hypothesis. By default, the display thresholds the maps at an arbitrary level, but the downloadable map is not thresholded.
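
The scaling and display thresholding described above can be sketched in a few lines. This is only an illustration: the voxel values, the threshold of 3.0, and the choice to threshold on absolute value are assumptions made for the example, not the settings used by the website.

```python
def z_scores(effects, std_devs):
    """Divide each predicted effect by its standard deviation."""
    return [e / s for e, s in zip(effects, std_devs)]


def threshold_for_display(z_map, threshold=3.0):
    """Zero out values whose magnitude is below the display threshold.

    The default of 3.0 is an arbitrary, assumed value.
    """
    return [z if abs(z) >= threshold else 0.0 for z in z_map]


# Toy values for four voxels (made up for illustration).
effects = [2.0, 0.25, 3.0, -1.75]
std_devs = [0.5, 0.5, 0.5, 0.5]

z_map = z_scores(effects, std_devs)       # what a downloaded map would hold
displayed = threshold_for_display(z_map)  # what a thresholded display would show
```

In this sketch the second voxel is hidden in the display but keeps its value in the unthresholded map.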

Is there a query language to control the matches?

No, there is no specific markup. NeuroQuery recognizes known words in the query on its own. If two consecutive words form a known token, it will match this token.
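
A minimal sketch of this kind of matching, assuming a tiny hypothetical vocabulary and a simple rule that tries a two-word phrase before falling back to a single word. The function name `recognize_terms` and the vocabulary are invented for the example; the actual tokenizer may normalize text and resolve phrases differently.

```python
import re

# Hypothetical vocabulary; the real one contains thousands of terms.
VOCABULARY = {"working memory", "memory", "language", "face perception", "face"}


def recognize_terms(query, vocabulary=VOCABULARY):
    """Return known terms found in the query, preferring two-word phrases."""
    words = re.findall(r"[a-z]+", query.lower())
    terms = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in vocabulary:
            terms.append(pair)  # two consecutive words form a known token
            i += 2
        elif words[i] in vocabulary:
            terms.append(words[i])
            i += 1
        else:
            i += 1  # unknown word: ignored
    return terms
```

With this toy vocabulary, `recognize_terms("Working memory and language!")` yields the phrase "working memory" and the word "language", and silently drops "and".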

Does NeuroQuery do conjunctions between terms?

NeuroQuery's model is not based on inclusion or exclusion of studies. Rather, it describes studies in a continuous way, and combines them to match the query. Hence, when you enter multiple terms, the reply is not strictly speaking a conjunction of these terms, but rather a prediction of what a study containing these terms is likely to report.

How is the list of related terms built?

Given a query, the user interface shows a list of related terms. This list is made of two parts: terms "in query", that is, terms recognized in the text entered, and terms "in expansion", that is, terms related to those recognized.
The table gives two measures for each term: the similarity to the query and the weight in the brain map. The similarity to the query is the similarity of a term to terms in the query, modulated by how many times these terms are in the query. Similarities are based on term co-occurrences in the literature. The weight in the brain map details how much each term contributes to the brain map. These weights are tuned by fitting the multivariate NeuroQuery model to the literature.
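
To give an intuition for a co-occurrence-based similarity, here is a small sketch that compares terms by the documents they appear in, using cosine similarity. This is one simple, assumed way of turning co-occurrences into a similarity, shown on a made-up corpus; it is not presented as the computation NeuroQuery itself performs.

```python
import math

# Toy corpus: each document is the set of vocabulary terms it contains.
documents = [
    {"aphasia", "language", "speech"},
    {"language", "speech", "reading"},
    {"aphasia", "language"},
    {"vision", "face"},
]


def occurrence_vector(term):
    """Binary vector marking the documents in which the term appears."""
    return [1.0 if term in doc else 0.0 for doc in documents]


def cosine_similarity(term_a, term_b):
    """Cosine of the angle between two terms' occurrence vectors."""
    a, b = occurrence_vector(term_a), occurrence_vector(term_b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

In this toy corpus "aphasia" comes out closer to "language" than to "vision", because the first two terms share documents and the last two do not.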

What data are these maps drawn from?

NeuroQuery has been trained on the largest existing corpus of neuroimaging articles and reported stereotactic coordinates of peak activations. The corpus comprises around 14,000 full-text publications and 400,000 peak activations. For each query, the most relevant publications from the corpus are displayed. The data is available here.

Technical details

NeuroQuery is a reduced-rank linear regression model. The activity of each voxel in the brain, across studies from the training corpus, is regressed on term occurrence frequencies in the corresponding publications. NeuroQuery automatically selects the most predictive terms from a vocabulary of over 7000 terms related to neuroscience, and fits a linear regression to link brain activity with the selected keywords. To transform a text query into a brain map, it first maps the query onto the set of selected keywords, using semantic associations estimated from co-occurrence statistics in the corpus. Then, it encodes the resulting representation into brain space through the linear regression coefficients.
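
The two steps described above (mapping the query onto the selected keywords, then encoding it through the regression coefficients) can be sketched with a toy model. Everything here is hypothetical: the three-term vocabulary, the `smoothing` and `coefficients` matrices, and the four "voxels" are made up, and feature selection and the reduced-rank constraint are not shown.

```python
# Hypothetical toy model: 3 vocabulary terms, 2 selected keywords, 4 voxels.
vocabulary = ["aphasia", "language", "vision"]

# smoothing[i][j]: how strongly vocabulary term i maps onto selected keyword j
# (keywords here: "language", "vision"). Values are invented.
smoothing = [
    [0.5, 0.0],  # "aphasia" is not a keyword but is associated with "language"
    [1.0, 0.0],  # "language"
    [0.0, 1.0],  # "vision"
]

# coefficients[j][v]: regression weight of keyword j at voxel v. Invented.
coefficients = [
    [2.0, 1.0, 0.0, 0.0],  # "language"
    [0.0, 0.0, 1.0, 3.0],  # "vision"
]


def term_frequencies(query):
    """Frequency of each vocabulary term among the words of the query."""
    words = query.lower().split()
    return [words.count(t) / len(words) if words else 0.0 for t in vocabulary]


def matvec(vector, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [sum(v * row[k] for v, row in zip(vector, matrix)) for k in range(n_cols)]


def predict_map(query):
    """Query -> keyword representation -> predicted value per voxel."""
    keywords = matvec(term_frequencies(query), smoothing)
    return matvec(keywords, coefficients)
```

Because the model is linear, a query with several terms such as `predict_map("aphasia vision")` returns a blend of the maps for each term rather than a strict conjunction, and a term outside the selected keywords ("aphasia" here) still produces a map through its association with "language".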

An overview of NeuroQuery is available in this poster. We are currently working on a publication, but some of the machinery behind NeuroQuery is described in this paper.

Trying this at home

The NeuroQuery model can be used through the web interface, but also by installing this Python package: https://github.com/neuroquery/neuroquery. The package is easy to install, and allows downloading and using the trained NeuroQuery model that is behind the website. See this notebook for a short example, which can also be run online. The neuroquery package also enables training new models (and possibly extending NeuroQuery).
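
As a rough sketch of what using the package looks like, the helper below wraps the calls shown in the package's README. The wrapper name `query_brain_map` is invented for this example, and the details (function names, the "brain_map" key of the result) should be checked against the version of the package you install.

```python
def query_brain_map(query):
    """Fetch the trained model (downloaded on first use) and encode a query."""
    # Imported lazily so that defining this helper does not require the package.
    from neuroquery import fetch_neuroquery_model, NeuroQueryModel

    encoder = NeuroQueryModel.from_data_dir(fetch_neuroquery_model())
    result = encoder(query)  # a dictionary; "brain_map" holds the predicted map
    return result["brain_map"]
```

Calling `query_brain_map("aphasia")` needs the neuroquery package installed and network access the first time, since the trained model has to be downloaded.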

The data used to build NeuroQuery is also available online: https://github.com/neuroquery/neuroquery_data. This repository contains the vocabulary, the low-dimensional embeddings of the vocabulary used to compute similarities, and term frequency vectors and peak activation coordinates for all studies in the NeuroQuery corpus.

Please do not scrape this website

We run this website as a demo of the capabilities of the NeuroQuery model and as a service to the neuroscience community. Please do not run an automated tool to scrape results from this website. Instead, we provide the Python library (for both training and inference) here. The library also enables downloading the trained model that runs on the website.

Contacting us

We'd be thrilled to hear from you! You can: