The Cuban Human Brain Mapping Project population based normative EEG, MRI, and Cognition dataset

The Cuban Human Brain Mapping Project (CHBMP) repository is an open multimodal neuroimaging and cognitive dataset from 282 healthy participants (31.9 ± 9.3 years, age range 18–68 years). This dataset was acquired from 2004 to 2008 as a subset of a larger stratified random sample of 2,019 participants from La Lisa municipality in La Habana, Cuba. Exclusion criteria included the presence of disease or brain dysfunction. The information made available for all participants comprises: high-density (64–120 channels) resting-state electroencephalograms (EEG), magnetic resonance images (MRI), psychological tests (MMSE, Wechsler Adult Intelligence Scale (WAIS-III), computerized reaction time tests using a go/no-go paradigm), as well as general information (age, gender, education, ethnicity, handedness and weight). The EEG data contain recordings of at least 30 minutes in duration, including the following conditions: eyes closed, eyes open, hyperventilation and subsequent recovery. The MRI consisted of anatomical T1 and T2 as well as diffusion-weighted (DWI) images acquired on a 1.5 Tesla system. The data are available to registered users on the LORIS database, which is part of the MNI neuroinformatics ecosystem.


Background & Summary
In the past decade, several neuroimaging databases of healthy participants and patients (ADNI, HCP, UK Biobank, CAMCAN, ABCD, PPMI) as well as consortia (ENIGMA) have been launched. They aim to accelerate the discovery of insights into neurodevelopment and physiopathology, and to allow the identification of new biomarkers of disease and their integration into disease progression models. An essential ingredient, lacking in many of these projects, is the inclusion of electroencephalogram (EEG) data, one of the most informative and direct measurements of brain activity, recordable at the same time scale as neural processes. At the same time, EEG is a cost-effective and accessible neuroimaging modality that is applicable to underserved populations in all countries. It is now clear that EEG is a technique of choice for extensive population screening in any economic setting. This gap is being remedied with the collection and publication of open datasets such as 1,2.
This paper adds to the set of open multimodal datasets one that originated in a Latin American middle-income country: the Cuban Human Brain Mapping Project (CHBMP) 3. This is a population-based, decades-long brain health data gathering effort in Havana, Cuba. This still ongoing project is organized by the Cuban Ministry of Public Health (MINSAP) and coordinated by the Cuban Center for Neuroscience (CNEURO). The CHBMP focuses on the development of tools and health applications based on multimodal neuroimaging.
Making this dataset open is part of the Cuba-Canada-China (CCC) and Global Brain Consortium (GBC) strategy for open science, directed not only at integrating EEG as an essential component of future multimodal neuroimaging projects, but also at serving as a "translational bridge" for resource-limited scenarios in all countries. This has been made possible by the integration of the CHBMP efforts into the MNI neuroinformatics ecosystem, based on the CBRAIN portal for the processing modules and the LORIS database system 4 for data storage and open access. The added value of the dataset is due to the health-oriented focus of the entire CHBMP, which we now briefly summarize.
The emphasis of the CHBMP has been on the quantitative evaluation of the EEG, known as "qEEG" 5. qEEG has been shown to identify brain disorders in a wide variety of settings and is therefore a candidate for use as a screening tool. In qEEG, the scalp-recorded EEG log spectra of a proband are compared with normative spectra by means of a statistical parametric mapping procedure. These normative spectra are obtained as age-dependent means and standard deviations of the EEG log spectra of a large sample of healthy participants. The age-dependent norms are regressions of log spectra over a wide age range. An EEG normative database is thus a prerequisite for qEEG.
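The comparison described above amounts to a z-score of the proband's log spectrum against an age-dependent mean and standard deviation. The following is a minimal sketch of that idea only. It assumes, purely for illustration, a linear age regression and a constant standard deviation per frequency bin; the function name and every coefficient are hypothetical and are not the CHBMP norms.

```python
import math

def qeeg_zscores(power, age, norms):
    """Z-score a proband's log power against age-dependent norms.

    power: spectral power per frequency bin (linear units)
    norms: one (intercept, slope, sd) triple per bin; hypothetical linear
           age regression of the normative log power, with its SD
    """
    z = []
    for p, (intercept, slope, sd) in zip(power, norms):
        expected_log_power = intercept + slope * age
        z.append((math.log(p) - expected_log_power) / sd)
    return z

# Illustrative numbers only: three frequency bins, proband aged 30
norms = [(2.0, -0.01, 0.5), (1.5, 0.0, 0.5), (1.0, 0.005, 0.5)]
power = [math.exp(2.2), math.exp(1.5), math.exp(0.2)]
z = qeeg_zscores(power, age=30.0, norms=norms)
print([round(v, 2) for v in z])
```

In this toy case the first bin lies one standard deviation above the norm and the third lies almost two below it, which is the kind of deviation a screening procedure would flag.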
The need for a Cuban normative database for qEEG 6 thus prompted the first wave of the CHBMP, initiated in 1988. This first database included 211 healthy persons aged 5 to 97 years. The participants were randomly selected from the Cuban population and screened by the Family Doctor system to include only healthy participants. This database was used to develop a high-resolution qEEG validated by the Cuban Health system 7,8. It also prompted the development of qEEG for sources (qEEGt). Due to the unavailability of MRI in Cuba during the first wave of the CHBMP, an "approximate qEEGt" was based on the average head model developed by the ICBM consortium 9,10. This first-wave EEG dataset has been submitted separately 11 to Frontiers. Procedures to apply qEEG and qEEGt processing based on this dataset have been integrated into CBRAIN 12.
At the end of the first wave of the CHBMP, it was recognized that qEEGt based on individual MRI, as in 13, is important not only to validate approximate qEEGt, but also as a basis for multimodal neuroimaging studies of normal and pathological brain function. Thus, a multimodal neuroimaging database was considered essential and was planned for the time when MRI became available in Cuba. This was the motivation for the second wave of the CHBMP, carried out between 2004 and 2008 as one of the projects of the National Program for Disability run at that time by the Cuban government. As in the first wave of the CHBMP, the participants were recruited from the general population by stratified random sampling. This yielded 2,019 candidates, who were then screened, initially by the Family Nurses and Doctors and later by extensive clinical, neurological, psychological, and neuroimaging evaluations, in order to exclude participants with brain disorders, addictive habits or ill health. This resulted in a final sample of 282 "functionally healthy" participants. The recording protocol included high-resolution EEG, T1 and T2 MRI, DWI, and psychological tests such as the MMSE, the Wechsler Adult Intelligence Scale (WAIS-III) and computerized reaction time, as well as the collection of blood samples for a genome-wide association study (GWAS) to be described in a separate publication.
Note that the third wave of the Cuban Human Brain Mapping Project was launched in 2019 and will be a large study of elderly subjects, including a 10-year follow-up of the second-wave CHBMP sample.

Participants
The Cuban Human Brain Mapping Project second-wave database contains neuroimaging, medical and cognitive data from 282 "functionally healthy" participants from the general Cuban population aged 18–68 years (31.9 ± 9.30), comprising 87 females (36.5 ± 10.43) and 195 males (29.9 ± 7.97). The frequencies of the demographic variables gender, self-reported handedness and educational level are presented in Table I. [...] and benefits of the study. Participants were also informed about the confidentiality of their personal information, as well as about full access to the best diagnostic and therapeutic procedures in case any illness was detected that might prevent them from forming part of the normative dataset. Additionally, they were verbally informed about their right to obtain any clinical, psychological and neuroimaging results. Finally, participants were informed about further publications that would result from the project, with the guarantee of anonymization and control of the privacy of their personal information.
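The overall age statistics given at the start of this section can be cross-checked against the per-gender figures by pooling the two groups. The short sketch below does that arithmetic using only the values reported in the text; the function name is ours, and the sample (n − 1) standard deviation is an assumption about how the figures were computed.

```python
import math

def pool_groups(groups):
    """Combine (n, mean, sd) triples into the overall n, mean and sample SD."""
    n_total = sum(n for n, _, _ in groups)
    mean = sum(n * m for n, m, _ in groups) / n_total
    # Within-group plus between-group sums of squares
    ss = sum((n - 1) * sd ** 2 + n * (m - mean) ** 2 for n, m, sd in groups)
    return n_total, mean, math.sqrt(ss / (n_total - 1))

# (n, mean age, SD) for females and males, as reported in the text
n, mean_age, sd_age = pool_groups([(87, 36.5, 10.43), (195, 29.9, 7.97)])
print(n, round(mean_age, 1), round(sd_age, 1))  # 282 31.9 9.3
```

The pooled values agree with the overall 31.9 ± 9.30 years reported for the 282 participants.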

Recruitment and Exclusion Criteria
The recruitment procedure is summarized in Table II and detailed below. The Family Doctors and Nurses evaluated the willingness to participate and the health condition of the candidates from the subsample (n = 2,019). In detail: 1) The Family Nurses visited each household to provide both printed and verbal information about the project objectives and made a preliminary enquiry about willingness to participate in the study. 2) Those willing to participate went to their local Polyclinic center for further explanation and signed the Informed Consent. 3) The Health Questionnaire was then applied by the nurse in order to carry out a further screening of pathology, cognitive complaints, use of pharmaceutical agents, heavy smoking, etc. 4) According to the exclusion criteria (see Table III), a further selection of the subsample was carried out for the next stage.
Participants with health problems remained in the overall study and continue to be followed up to this day with further clinical evaluation and treatment. However, they were excluded from the normative part of the project. At the local Polyclinic center, blood samples for genetic studies were drawn and additional measures were taken: blood pressure, body temperature, frequency and heart rate, body weight and height (data not included in this version of the database).
Digital EEG recording was the final step at the designated Polyclinics (n = 530).
Stage 6: MRI collection was carried out at the CIMEQ Hospital, followed by MRI evaluation by a neuroradiologist (n = 394).

Stage 1: In agreement with the Ministry of Public Health (MINSAP), and due to logistical constraints, a single municipality of the province of Havana City was selected for the study, together with its whole structure of Family Doctors and Polyclinic Centers. To this end, a study was carried out with a committee from MINSAP and the National Office for Population Studies to assess the distribution of the following variables for all inhabitants in every municipality of the Province of Havana City: ethnicity, sex, province of origin, and socioeconomic status. On the basis of these distributions, the Municipality of La Lisa was selected for the study, since it had the closest match to the general Cuban population. A sample of 30,000 inhabitants of this municipality was randomly selected from the National Identity Card registry.

Stage 2:
From the original roster of 30,000, a random subsample of n = 2,019 was then selected for further processing, stratified by age, gender and socio-economic status.
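A stratified random subsample of this kind can be sketched as proportional allocation within strata. The example below is illustrative only: the roster is synthetic, the strata labels are made up, and the largest-remainder rounding is our assumption, since the allocation rule actually used is not described here.

```python
import random
from collections import defaultdict

def stratified_sample(roster, key, n_total, seed=0):
    """Draw n_total records, allocating to each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for record in roster:
        strata[key(record)].append(record)
    # Proportional quotas, rounded with the largest-remainder method
    exact = {s: n_total * len(members) / len(roster) for s, members in strata.items()}
    quota = {s: int(x) for s, x in exact.items()}
    leftover = n_total - sum(quota.values())
    by_remainder = sorted(exact, key=lambda s: exact[s] - quota[s], reverse=True)
    for s in by_remainder[:leftover]:
        quota[s] += 1
    sample = []
    for s, members in strata.items():
        sample.extend(rng.sample(members, quota[s]))
    return sample

# Synthetic roster of 30,000 people with hypothetical strata
rng = random.Random(1)
roster = [
    {"id": i,
     "age_band": rng.choice(["18-29", "30-44", "45-68"]),
     "gender": rng.choice(["F", "M"]),
     "ses": rng.choice(["low", "mid", "high"])}
    for i in range(30000)
]
subsample = stratified_sample(
    roster, key=lambda r: (r["age_band"], r["gender"], r["ses"]), n_total=2019
)
print(len(subsample))
```

Proportional allocation keeps the subsample's mix of age, gender and socio-economic status close to that of the roster, which is the purpose of stratifying before drawing at random.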
Family Doctors (Stage 3) then examined the participants' records to exclude persons whom they had already ascertained to have health issues. All the remaining participants were visited by the Family Nurses, who left a printed description of the project and gave a detailed verbal explanation of its aims. As usual for population studies in Cuba, it was explained that there would be no payment for the study, but if a participant needed to be absent from the workplace, the local government guaranteed this as a fully paid day. They were also informed about all data acquisition protocols as well as safety measures, with a special focus on MRI acquisition and safety. To a great degree, the success of this project was due to the close contact of the Family Doctor and Nurse with the local population, as well as the abundant information provided by the media to the general public about the Cuban Neuroscience Center and its project. This explains the 93% initial willingness to participate in the project. For those participants who gave their written consent, a health questionnaire was applied for further screening and, consequently, 580 persons were excluded at this stage from the normative study. In this, as well as in subsequent stages, all participants who did not continue in the normative study followed a separate workflow to ensure specialized diagnosis and intervention by units of the health system, with the same protocol as those continuing in the study. The exclusion criteria used for this stage are listed in Table III. The most prevalent health conditions leading to exclusion were diagnosed metabolic syndrome, psychiatric conditions, personal history of severe illnesses, and sensory and motor disabilities. During Stages 4 and 5, specific exclusion criteria were applied (see Table IV). Additionally, a psychiatrist/psychometrist applied the computerized Reaction Time test at the end of the study to a subsample (n = 56) of the final sample.
The participants who presented hypertension during this research study were included in a separate study and underwent more specific evaluations such as carotid flow, white matter hyperintensities, eye fundus, optic and blood vessel impairments and a set of extra measurements. The results of this hypertension study were partially published in 3.

Procedure Workflow
Examinations for the final participants (Stages 4–6) were carried out on a five-day schedule (see Figure 1).

Figure 1. Flow-chart of assessment
Finally, the sample resulting from this study (N = 282) included all the participants who completed all the requirements after all the steps. The participants included in each measurement and the overlap between modalities are shown in Table V. The raw intelligence measures were scored according to the official normative data included in the printed version of the WAIS-III. However, to avoid cultural bias, they were subsequently standardized with information from the Cuban sample to produce age-adjusted scores of performance specific to our population. Results on how white matter (FA tract-based) predicts fluid and crystallized intelligence have been published using this dataset 19.

Go No Go test:
For a subset of 56 participants, reaction times were recorded using a go/no-go paradigm, which consisted of a visual attention task implemented using the psychophysiology software for cognitive stimulation Mindtracer 20 (N_P-SW 1.

EEG recordings
Resting-state EEG was recorded using a digital electroencephalograph system (MEDICID 5-64 and MEDICID 5-128) (www.neuronic.cu) with differential amplifiers and a gain of 10,000. Electrodes were placed according to the 10-10 International System with a customized electrode cap. Linked earlobes were used as the EEG reference. Electrode impedances were considered acceptable if less than 5 kΩ. The band-pass filter was set to 0.5–50 Hz with a 60 Hz notch, and the sampling frequency was 200 Hz. The EEG was recorded in a temperature- and noise-controlled room while the participant was sitting in a reclined chair. All individuals were asked to relax and remain at rest during the test to minimize artifacts produced by movements, and to avoid excessive blinking. The participants were instructed to get enough sleep the previous night, have breakfast and wash their hair before attending the appointment. See Table VI for a summary of the technical parameters of the EEG. The raw EEG recordings were generated in the default format of the MEDICID neurometrics system (*.plg extension) and later converted to the standard BIDS format (see the Data Record section).

Electrode placement
Two different montages were employed, one with 64 channels and the other with 120 channels, as illustrated in Figure 3, with different colors, black (64) and white (120), identifying each montage. The nomenclature of the electrodes employed in the MEDICID system and their standardization is included in Supplementary Material 1 at the end of this document. In black, the configuration for the 64 channels (maximum 62 cephalic electrodes). In white, the configuration for the 120 channels (maximum 120 cephalic electrodes). In both configurations, three channels are employed to record the electro-oculogram and electrocardiogram.
The EEG protocol comprised the following participant conditions:
1. Baseline: resting-state EEG with eyes closed (state A), 10 minutes.
2. Reactivity: this test consisted of consecutively opening and closing the eyes at intervals of 12 seconds. Eyes open (state B), 5 minutes, during which the participant was instructed to look at a point, keeping the pupils fixed.
3. Hyperventilation (HPV), 3 minutes in total: in the first minute, HPV1 (state C), the participant was instructed to start taking air through the nose and to breathe deeply, followed by the second minute, HPV2 (state D), and the third, HPV3 (state E), with breathing in this last minute less deep and more frequent.
4. Recovery (state F): the last step is the recovery of the participant after the HPV, which lasted around one and a half minutes but was recorded for 2 full minutes.
Note that each participant's recording was monitored continuously by the technician in order to avoid contamination of the EEG by electromyogram interference or by changes in the direct current level due to sweating, and also to prevent drowsiness. Any such contamination was annotated online by the technician.
Therefore, recordings of at least half an hour were ensured. A design requirement was to have enough valid EEG to carry out subsequent frequency-domain analysis. For this, EEG epochs are necessary, each consisting of 256 time samples, or 2.56 seconds, marked online on the continuous EEG recordings. Due to the high density of electrodes, the number of epochs for further analysis was guaranteed to be at least 50 windows for 64 channels and 80 windows for 120 channels (for details on the analysis see 10,12).
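The way fixed-length epochs can be placed around annotated contamination is sketched below. This is not the MEDICID marking procedure, where epochs were marked online by the technician; it is a simple offline approximation under our own assumptions. The function name, the recording length and the artifact intervals are hypothetical, and the epoch length is a parameter.

```python
def select_clean_epochs(n_samples, epoch_len, artifacts):
    """Return (start, stop) sample ranges of non-overlapping artifact-free epochs.

    artifacts: list of (start, stop) sample ranges flagged as contaminated,
    with stop exclusive.
    """
    epochs = []
    start = 0
    for a_start, a_stop in sorted(artifacts):
        # Fill the clean stretch before this artifact with whole epochs
        while start + epoch_len <= min(a_start, n_samples):
            epochs.append((start, start + epoch_len))
            start += epoch_len
        # Resume after the artifact (or keep going if already past it)
        start = max(start, a_stop)
    # Fill the clean stretch after the last artifact
    while start + epoch_len <= n_samples:
        epochs.append((start, start + epoch_len))
        start += epoch_len
    return epochs

# Hypothetical recording: 2,000 samples with two annotated artifacts
epochs = select_clean_epochs(2000, 256, [(300, 400), (1200, 1250)])
print(len(epochs), epochs[:2])  # 6 [(0, 256), (400, 656)]
```

Counting the epochs returned for each recording gives a direct check against a minimum such as the 50 or 80 windows required above.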

MRI procedure
Magnetic resonance imaging (MRI) was performed on a 1.5 Tesla scanner (MAGNETOM Symphony, Siemens, Erlangen, Germany). Over the course of MRI data acquisition, the scanner remained stable and did not undergo any major maintenance or updates that would systematically affect the quality of the data provided here. The total measurement time was 45 minutes. The MRI protocol is given in Table VII. The DWI sequence was repeated in a second run with the same parameters. The only difference was the position of the slices, which were translated along the axis normal to the slice (axial) plane so that the signal from the gaps of the first run was acquired in the second run. Consequently, the gaps of the second run occupied the regions from which the signal in the first run was acquired. In this way the entire brain was covered with a total of 50 slices. Several analyses using this MRI dataset have already been published. One study demonstrated how surface area could explain the morphological connectivity of brain networks 21 (Sanabria Neuroimage 2010). Another study explained the substantial interindividual variability in the neuroanatomical determinants of EEG spectral properties using DWI fractional anisotropy 22 (Valdes-Hernandez 2010 Neuroimage). Two papers studied the human brain anatomical network via diffusion-weighted MRI and graph theory, characterizing brain anatomical connections 23,24 (Iturria 2007, 2008). Another paper presented a general framework for tensor analysis of single-modality model inversion and multimodal data fusion using our neuroimaging data as an example 25 (Karahan Tensor IEEE 2015).

Code Availability
The following in-house Windows software developed by EAV will soon be available on GitHub.
1. EEG-Anonymizer: to erase all the personal information stored in the EEG recordings that could facilitate the identification of the participants.
2. PLG2BIDS: to read the original EEG recordings in NEURONIC format and convert them to BIDS structure.
3. Combine EEG-MRI-BIDS: to combine the BIDS-EEG and MRI-BIDS structures into a single structure.
4. Unwrap: as part of the MRI quality control process, several MRI T1 image studies were fixed when a wrap-around artifact (without overlap on the head) was detected.
5. Quality Control MRI: automatic inspection was performed to check the protocol parameters of the DWI images and generate a file with the values of the parameters.

Data Record
BIDS (Brain Imaging Data Structure) is the new standard for the organization and description of datasets containing neuroimaging (MRI, MEG, EEG, iEEG, NIRS, PET) and behavioral information 26.
Based on this BIDS structure, we developed a methodology with the following steps: 1. Anonymization of EEG recordings and MRI scans. 2. Defacing of the MRI scans. 3. Conversion of EEG recordings and MRI scans to BIDS structure. 4. Validation of the BIDS structure.
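To make the target of these steps concrete, the sketch below lists where one participant's files would sit in a combined EEG and MRI BIDS tree, following the general BIDS naming pattern. The subject label, the task label and the file extensions are hypothetical; the names actually used in the released dataset may differ.

```python
from pathlib import PurePosixPath

def bids_paths(subject, task="rest"):
    """Expected relative paths for one subject in a combined EEG + MRI BIDS tree."""
    sub = f"sub-{subject}"
    root = PurePosixPath(sub)
    return {
        "eeg": root / "eeg" / f"{sub}_task-{task}_eeg.edf",
        "T1w": root / "anat" / f"{sub}_T1w.nii.gz",
        "T2w": root / "anat" / f"{sub}_T2w.nii.gz",
        "dwi": root / "dwi" / f"{sub}_dwi.nii.gz",
    }

def missing_modalities(subject, existing_files, task="rest"):
    """Names of the expected files that are absent from a set of relative paths."""
    expected = bids_paths(subject, task)
    return sorted(name for name, path in expected.items()
                  if str(path) not in existing_files)

# Hypothetical subject label and file listing
paths = bids_paths("0001")
present = {str(paths["eeg"]), str(paths["T1w"])}
print(missing_modalities("0001", present))  # ['T2w', 'dwi']
```

A per-subject check like this is only a convenience for spotting incomplete participants; it does not replace the formal validation of the BIDS structure.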

Anonymization.
We developed an application (EEG-Anonymizer) to erase all the personal information stored in the EEG recordings that could facilitate the identification of the participants. This application generated a backup copy of the personal information before erasing it.

Defacing.
The defacing process consisted of removing the section containing the face of the subject from the anatomical MRI. This prevents the identification of the subject if a 3D rendering is later produced from the MRI scan. The software employed was mri_deface v1.2 from FreeSurfer, https://surfer.nmr.mgh.harvard.edu/fswiki/mri_deface

Conversion to BIDS EEG.
We developed an ad hoc application (PLG2BIDS) to read the original EEG recordings in NEURONIC format and convert them to BIDS structure. This application is designed to read either individual EEG recordings or folders with multiple recordings and is able to update an existing BIDS structure with new recordings.

MRI.
The conversion of the MRI neuroimages to BIDS structure was performed using Dcm2Bids (https://github.com/cbedetti/Dcm2Bids), which generates the MRI BIDS structure from the original data in DICOM format. The BIDS-EEG and MRI-BIDS structures were combined into a single structure using the Combine EEG-MRI-BIDS software.
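For orientation, a Dcm2Bids conversion is normally launched once per participant from the command line. The sketch below only assembles such a command and does not run it. The directory names, participant label and configuration file name are hypothetical, and the options actually used for this dataset are not stated here.

```python
def dcm2bids_command(dicom_dir, participant, config, out_dir):
    """Build (without running) a dcm2bids call for one participant."""
    return [
        "dcm2bids",
        "-d", dicom_dir,     # folder with the original DICOM files
        "-p", participant,   # participant label, written out as sub-<label>
        "-c", config,        # JSON file mapping DICOM series to BIDS names
        "-o", out_dir,       # root of the BIDS output tree
    ]

# Hypothetical paths and label
cmd = dcm2bids_command("dicom/0001", "0001", "dcm2bids_config.json", "bids")
print(" ".join(cmd))
```

Once the paths exist, the list can be passed to `subprocess.run(cmd, check=True)`; looping over participants gives a reproducible record of how each MRI-BIDS folder was produced.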

Validation.
The final step was the validation of the BIDS structure using the web bids-validator (https://bids-standard.github.io/bids-validator/). The entire dataset was also imported into the Longitudinal Online Research and Imaging System (LORIS) v20.2 (https://mcin.ca/technology/loris/).