NeuroQuery is a tool for meta-analysis of neuroimaging studies. Given a text query, it produces a map of the most relevant brain regions: the locations where neuroimaging studies related to the query are likely to report activations. NeuroQuery also models the associations between terms used in neuroimaging publications — for example, that "aphasia" is related to "language". For each query, it shows the list of related terms that it uses to build its prediction. It also displays a list of neuroimaging publications related to the query. A detailed description and extensive validation are provided in this paper.
The main difference between NeuroQuery and Neurosynth is that NeuroQuery focuses on producing a brain map that predicts where in the brain a study on the topic of interest is likely to report observations, while Neurosynth tests the consistency of observations reported in the literature. Prediction, as opposed to statistical testing, is important because it can be applied out of sample, in other words to queries not already present in the literature. The prediction is made by extrapolating between relevant studies from the literature. Standard meta-analysis, as performed by Neurosynth, works by defining a set of studies of interest, but it cannot model variations across these studies.
The maps show how likely a given brain location is to be detected by studies addressing the given query. They are Z scores: effect divided by standard deviation, as is typical of neuroimaging maps. However, this scaling is only for convenience. These maps remain predictions, and cannot be used in a statistical test to reject any easily interpretable null hypothesis. The display thresholds maps at an arbitrary level by default, but the downloadable maps are not thresholded.
No, there is no specific markup: NeuroQuery recognizes words in the query by itself. If two consecutive words form a known token, it matches that token.
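To illustrate the idea, here is a toy sketch of matching consecutive words against a known vocabulary. This is only an illustration of the behavior described above, not the actual NeuroQuery tokenizer (which lives in the neuroquery package and is more sophisticated):

```python
def match_tokens(words, vocabulary):
    """Greedily merge consecutive words that form a known multi-word token.

    `vocabulary` is a set of known phrases; any pair of consecutive words
    found in it is kept as a single token.
    """
    tokens, i = [], 0
    while i < len(words):
        bigram = " ".join(words[i:i + 2])
        if bigram in vocabulary:   # two consecutive words form a known token
            tokens.append(bigram)
            i += 2
        else:
            tokens.append(words[i])
            i += 1
    return tokens

# "default mode" is recognized as one token, "network" stays on its own.
print(match_tokens("default mode network".split(), {"default mode"}))
# -> ['default mode', 'network']
```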
NeuroQuery's model is not based on inclusion or exclusion of studies. Rather, it describes studies in a continuous way and combines them to match the query. Hence, when you enter multiple terms, the result is not, strictly speaking, a conjunction of those terms, but rather a prediction of what a study containing those terms is likely to report.
Given a query, the user interface shows a list of related terms. This list is made of two parts: terms "in query", in other words recognized in the text entered, and terms "in expansion", in other words related to the recognized terms. The table gives two measures for each term: the similarity to the query and the weight in the brain map. The similarity to the query is the similarity of a term to the terms in the query, modulated by how many times those terms appear in the query. Similarities are based on term co-occurrences in the literature. The weight in the brain map details how much each term contributes to the brain map. These weights are tuned by fitting the multivariate NeuroQuery model to the literature.
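As a rough sketch of how such similarities can be computed: terms are represented by low-dimensional embedding vectors derived from co-occurrence statistics, and similarity is the cosine between those vectors. The vocabulary and embedding values below are toy stand-ins, not the real NeuroQuery data:

```python
import numpy as np

# Hypothetical 3-term vocabulary with toy 2-dimensional embeddings;
# NeuroQuery derives real embeddings from term co-occurrences in the corpus.
vocab = ["aphasia", "language", "motor"]
embeddings = np.array([[0.9, 0.1],
                       [0.8, 0.2],
                       [0.1, 0.9]])

# Normalize rows so that dot products are cosine similarities.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
query = normed[vocab.index("aphasia")]
similarities = normed @ query
for term, sim in zip(vocab, similarities):
    print(f"{term}: {sim:.2f}")
```

In this toy example "aphasia" comes out more similar to "language" than to "motor", mirroring the kind of association described above.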
NeuroQuery has been trained on the largest existing corpus of neuroimaging articles and reported stereotactic coordinates of peak activations. The corpus comprises around 14,000 full-text publications and 400,000 peak activations. For each query, the most relevant publications from the corpus are displayed. The data is available here.
NeuroQuery is a reduced-rank linear regression model. The activity of each voxel in the brain, across studies from the training corpus, is regressed on term occurrence frequencies in the corresponding publications. NeuroQuery automatically selects the most predictive terms from a vocabulary of over 7000 terms related to neuroscience, and fits a linear regression to link brain activity with the selected keywords. To transform a text query into a brain map, it first maps the query onto the set of selected keywords, using semantic associations estimated from co-occurrence statistics in the corpus. Then, it encodes the resulting representation into brain space through the linear regression coefficients.
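The two-step encoding described above can be sketched with toy matrices. Everything here (vocabulary size, smoothing matrix, coefficients) is a hypothetical stand-in for the fitted model, meant only to show the shape of the computation:

```python
import numpy as np

rng = np.random.default_rng(0)
n_terms, n_voxels, rank = 5, 100, 2

# Stand-ins for the fitted model: a term-term smoothing matrix estimated
# from co-occurrence statistics, and reduced-rank regression coefficients
# mapping terms to voxels (the product of two low-rank factors).
smoothing = np.eye(n_terms) + 0.1 * rng.random((n_terms, n_terms))
coef = rng.standard_normal((n_terms, rank)) @ rng.standard_normal((rank, n_voxels))

# Step 1: represent the text query as term frequencies over the vocabulary.
query_tf = np.array([1.0, 0.0, 2.0, 0.0, 0.0])

# Step 2: map the query onto related terms via the semantic smoothing,
# then encode the result into brain space through the regression coefficients.
smoothed = query_tf @ smoothing
brain_map = smoothed @ coef      # one predicted value per voxel
print(brain_map.shape)           # -> (100,)
```

The reduced rank keeps the coefficients low-dimensional, which regularizes the regression across the many correlated terms in the vocabulary.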
The NeuroQuery model can be used through the web interface, but also by installing this Python package: https://github.com/neuroquery/neuroquery. The package is easy to install, and lets you download and use the trained NeuroQuery model that powers the website. See this notebook for a short example, which can also be run online. The neuroquery package also enables training new models (and possibly extending NeuroQuery).
The data used to build NeuroQuery is also available online: https://github.com/neuroquery/neuroquery_data. This repository contains the vocabulary, the low-dimensional embeddings of the vocabulary used to compute similarities, and term frequency vectors and peak activation coordinates for all studies in the NeuroQuery corpus.
In https://github.com/neuroquery/neuroquery_apps you can find a small gallery of web applications, based on NeuroQuery, that can easily be run on your machine. They include an application for decoding images and an interface similar to this website, but based on an ensemble model (the average of 30 NeuroQuery models trained on randomly subsampled data) that we find to be more robust on some examples.
We run this website as a demo of the capabilities of the NeuroQuery model and as a service to the neuroscience community. Please do not run an automated tool to scrape results from this website. Instead, we provide the Python library (for both training and inference) here. The library also enables downloading the trained model that runs on the website.
We'd be thrilled to hear from you! You can: