Data Management

Purpose TRANSFORM-DBS will generate a large volume of high-value neuroscience data, including brain recordings from humans and animals performing a variety of cognitive tasks, brain scans acquired during similar tasks, and recordings of the brain's electrical response to applied stimulation. As a component of President Obama's BRAIN Initiative, TRANSFORM-DBS has accepted the challenge of making these datasets widely available to the neuroscience community in order to speed the discovery of new treatments for psychiatric illness. Given the wide variety of data collected and the size of the datasets (individual recordings can reach hundreds of gigabytes), the project requires a robust sharing infrastructure.

Infrastructure The data sharing infrastructure supporting the TRANSFORM-DBS project is built on a modern design intended to fully support the sharing of raw and processed data together with a rich set of accompanying metadata. At its core is the NoSQL document store CouchDB. Instead of relational database tables with a fixed schema, CouchDB stores JavaScript Object Notation (JSON) documents that can be easily queried. Data files are either added to the JSON documents as attachments or referenced by links that record their file storage location. This design allows maximum flexibility as new data types are added, while keeping query times short.
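As an illustration of the attachment-versus-link design, the following Python sketch builds CouchDB-style JSON documents. The `_id` and `_attachments` fields (an inline attachment carries a `content_type` and base64-encoded `data`) follow CouchDB's inline-attachment format; the function name `make_document`, the `modality`, `metadata`, and `file_url` fields, and all example values are hypothetical and do not reflect the project's actual schema.

```python
import base64
import json


def make_document(doc_id, modality, metadata, payload=None, filename=None,
                  content_type="application/octet-stream", file_url=None):
    """Build a CouchDB-style JSON document describing one data file.

    Exactly one of `payload` (raw bytes, embedded as an inline attachment)
    or `file_url` (a link to the file's storage location) must be given.
    Field names other than `_id` and `_attachments` are hypothetical.
    """
    if (payload is None) == (file_url is None):
        raise ValueError("provide exactly one of payload or file_url")
    doc = {"_id": doc_id, "modality": modality, "metadata": metadata}
    if payload is not None:
        if filename is None:
            raise ValueError("filename is required for an attachment")
        # CouchDB expects inline attachment data to be base64-encoded.
        doc["_attachments"] = {
            filename: {
                "content_type": content_type,
                "data": base64.b64encode(payload).decode("ascii"),
            }
        }
    else:
        # Large files stay in file storage; the document records only a link.
        doc["file_url"] = file_url
    return doc


if __name__ == "__main__":
    small = make_document("beh-0001", "behavior", {"task": "example"},
                          payload=b"trial,rt\n1,0.42\n", filename="trials.csv",
                          content_type="text/csv")
    large = make_document("ephys-0001", "electrophysiology", {"channels": 128},
                          file_url="https://storage.example.org/ephys-0001.dat")
    print(json.dumps(small, indent=2))
    print(json.dumps(large, indent=2))
```

A document built this way could be stored with an HTTP PUT to `/{db}/{doc_id}`. One plausible split, assumed here rather than taken from the project, is to embed small files such as behavioral logs and to link multi-gigabyte recordings, so that the documents themselves stay small and fast to query.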

Data The TRANSFORM-DBS project stores data from magnetic resonance imaging (MRI), magnetoencephalography (MEG), and several electrophysiological modalities. The accompanying behavioral data are also stored, and in many cases the processed data are available as well. For each modality, the data are either natively in, or converted upon upload to, a standard data format to increase their reusability. The data are fully queryable, and anonymized data are available for download.
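To illustrate what querying the store might look like, the sketch below builds, without sending, a request to CouchDB's Mango `_find` endpoint, which accepts a JSON `selector` together with an optional `fields` projection and a `limit`. The server URL, the database name, and the `modality` and `anonymized` fields are assumptions for illustration, not the project's actual schema.

```python
import json
import urllib.request


def build_find_request(base_url, db, modality, anonymized_only=True, limit=25):
    """Build (but do not send) a CouchDB Mango `_find` request.

    `modality` and `anonymized` are hypothetical document fields.
    """
    selector = {"modality": {"$eq": modality}}
    if anonymized_only:
        # Restrict results to documents flagged as anonymized (assumed field).
        selector["anonymized"] = {"$eq": True}
    body = {
        "selector": selector,
        "fields": ["_id", "modality", "metadata"],  # return only these fields
        "limit": limit,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{db}/_find",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_find_request("http://localhost:5984", "transform_dbs", "MEG")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request with `urllib.request.urlopen(req)` against a running server would return a JSON object whose `docs` array holds the matching documents. Without an index on the queried fields, CouchDB falls back to scanning every document, so a production deployment would likely define indexes for the most common query fields.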

Metadata The Data Management group endeavors to share a rich set of metadata to support our goal of making the data acquired in this project both reproducible and reusable. For data to be reproducible, each step of the data workflow, from acquisition to analysis, must be well documented. Reusability, too, requires that the provenance of all data be well established and available. In this project we have created a data parser for each data format that extracts the full set of parameters and stores them in a JSON document. We have also created, and made available, a data dictionary that defines each metadata term, so that consumers of the data know the precise meaning of every term used in each document.
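A minimal sketch of the parser-plus-dictionary idea follows, assuming a hypothetical plain-text header made of `key = value` lines; each real data format would need its own parser. The names `parse_header`, `undefined_terms`, and `DATA_DICTIONARY`, and the dictionary entries themselves, are illustrative and not the project's code. The check flags any extracted parameter that the data dictionary does not yet define, which is one way to keep parsers and dictionary in sync.

```python
import json

# Hypothetical data dictionary: each metadata term maps to a definition and unit.
DATA_DICTIONARY = {
    "sampling_rate": {"definition": "Samples acquired per second", "unit": "Hz"},
    "num_channels": {"definition": "Number of recorded channels", "unit": None},
    "task": {"definition": "Name of the cognitive task performed", "unit": None},
}


def _coerce(value):
    """Convert a header value to int or float when possible, else keep a string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_header(text):
    """Extract 'key = value' parameters from a hypothetical plain-text header."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        params[key.strip()] = _coerce(value.strip())
    return params


def undefined_terms(params, dictionary=DATA_DICTIONARY):
    """Return, sorted, the metadata keys that the data dictionary does not define."""
    return sorted(k for k in params if k not in dictionary)


if __name__ == "__main__":
    header = "# example header\nsampling_rate = 1000\nnum_channels = 64\ntask = example\nfilter = none\n"
    params = parse_header(header)
    print(json.dumps({"metadata": params}, indent=2))
    print("undefined terms:", undefined_terms(params))
```

Storing numeric parameters as JSON numbers rather than strings lets later queries compare them by range.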