Accessing Full Training Data Set

Overview

The 2024 MRSI Data Processing and Quantitation Challenge will test the ability of data processing and spectral analysis pipelines to create accurate metabolite maps from a group of FID-MRSI data sets simulated using in vivo experimental data. The Challenge contains two sub-challenges:

  • Sub-Challenge 1: Perform nuisance signal removal and spectral quantification on (x,t)-space MRSI data contaminated by noise, baseline signals, spectral distortions, residual water signals, and unsuppressed lipid signals.
  • Sub-Challenge 2: Perform spectral quantification on (x,t)-space MRSI data contaminated by noise, baseline signals, and spectral distortions.

Interested groups are encouraged to participate in either or both sub-challenges.

Note. You do NOT have to be attending the ISMRM MRS Workshop in October to participate in this challenge.

1. Basic Outline of the Training Data Sets

In order to accommodate various forms of MRSI data processing and quantitation methods, including machine learning-based methods that need labeled training data, we provide 24 training data sets, each consisting of the contaminated FID-MRSI data, anatomical structural image, and B0 inhomogeneity map along with the ground truth of the metabolite concentration maps, metabolite signals (after removal of macromolecular signals) and nuisance signals (residual water and unsuppressed lipid signals). More specifically, the training data were produced as follows:

  • The metabolite signals were created using spectral basis functions obtained by quantum mechanics simulation for an FID-MRSI acquisition at 3T. Macromolecular signals generated from in vivo measurement were also included.
  • The water signals were created using Bloch equation simulation to simulate residual water signals after WET water suppression with one frequency-selective saturation pulse (50 Hz bandwidth).
  • The lipid signals were created using high-resolution lipid signals acquired from in vivo experiments without any suppression pulses.
  • The spatial distribution of metabolite, water, and lipid signals was simulated based on anatomical structural images and B0 inhomogeneity maps from in vivo experiments at 3T as well as literature values.

Note that the training data can be used for both Sub-Challenges; Section 3 explains how. Also note that the water and lipid signals in this challenge were simulated to mimic an in vivo FID-MRSI scan with short TR and TE for improved data acquisition efficiency, and thus might appear stronger than in a typical MRSI experiment.

Separate testing data with slightly different spectral artifacts have been released for evaluation and comparison. To request access, send an email to fittingchallenge2024@gmail.com.

2. Training Data Access

Box folder links will be emailed to each Team as data is released. As with the Example Data, the Training data folders will contain subdirectories with the MRI/FID-MRSI data in NIfTI and NIfTI-MRS formats as well as MATLAB .mat format.

Image data files are provided in NIfTI format as (nx, ny, nz) arrays:

  • xxx_mri_t1w_mpr.nii.gz - T1w image stack
  • xxx_mri_b0_map.nii.gz - B0 map (in Hz) image stack
  • xxx_brain_mask.nii.gz - Brain mask image stack
  • Metabolite area ground truth maps (x18, one for each metabolite basis set)
    • xxx_mrs_metab_result_naa.nii.gz
    • xxx_mrs_metab_result_tcho.nii.gz
    • xxx_mrs_metab_result_tcr.nii.gz
    • xxx_mrs_metab_result_lac.nii.gz

Spectral data files are provided in NIfTI-MRS format:

  • xxx_mrs_fids_si_data.nii_v2.gz – (x,t)-space data for FID-MRSI data, which contains residual water signals, unsuppressed lipid signals, metabolite signals (along with macromolecular signals), and noise with the following acquisition parameters:
    • matrix = 64,64,32,384 (nx,ny,nz,nt)
    • TR/TE = 450 ms/1.66 ms
    • dwell time = 0.83 ms
    • center frequency = 127.7 MHz
    • All data formats have been co-registered
  • xxx_mrs_fids_metabolites_v2.nii.gz - (x,t)-space data of ideal fitted metabolite signals after removal of macromolecular signals and nuisance signals
  • xxx_mrs_fids_nuisance_v2.nii.gz - (x,t)-space data of ground truth ‘nuisance’ signals
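
As a quick arithmetic check on the acquisition parameters listed above, the dwell time fixes the spectral bandwidth and the center frequency fixes the Hz-to-ppm conversion. The following is a minimal sketch in Python with NumPy that derives both and builds a frequency axis for an FFT along the time dimension. The variable names are illustrative and not part of the data set, and the ppm axis is only relative to the center frequency because the display offset (ppmoff in the .mat file) is not stated in this document.

```python
import numpy as np

# Acquisition parameters quoted in the list above.
dwell_time_s = 0.83e-3        # dwell time, 0.83 ms
center_freq_mhz = 127.7       # center frequency in MHz
n_t = 384                     # number of time points (nt)

# Spectral bandwidth is the reciprocal of the dwell time.
bandwidth_hz = 1.0 / dwell_time_s

# At a center frequency of f MHz, 1 ppm corresponds to f Hz.
hz_per_ppm = center_freq_mhz
bandwidth_ppm = bandwidth_hz / hz_per_ppm

# Frequency axis matching np.fft.fftshift(np.fft.fft(fid)) along time.
freq_hz = np.fft.fftshift(np.fft.fftfreq(n_t, d=dwell_time_s))
freq_ppm_rel = freq_hz / hz_per_ppm   # relative to the center frequency

print(f"bandwidth: {bandwidth_hz:.1f} Hz ({bandwidth_ppm:.2f} ppm)")
print(f"spectral resolution: {bandwidth_hz / n_t:.2f} Hz per point")
```

With these parameters the bandwidth comes out at about 1205 Hz, or roughly 9.4 ppm, with a spacing of about 3.1 Hz between spectral points.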

The same data are also provided in a MATLAB .mat file (xxx_all.mat):

  • B0map (64x64x32) - B0 map in Hz
  • Iref (64x64x32) - T1w image at the resolution of the metabolite maps
  • brainMask (64x64x32) - Brain mask
  • metaMap (64x64x32x16) - Metabolite area ground truth maps of the following metabolites, respectively: ‘Asp’; ‘GABA’; ‘Gln’; ‘Glu’; ‘Lac’; ‘Lac41’; ‘NAA’; ‘NAAG’; ‘PE’; ‘PE40’; ‘Tau’; ‘mIns’; ‘sIno’; ‘tCho’; ‘tCr’; ‘tCr39’.
  • xtAll (64x64x32x384) - (x,t)-space data for FID-MRSI data, which contains residual water signals, unsuppressed lipid signals, metabolite signals (along with macromolecular signals), and noise.
  • xtMeta (64x64x32x384) - (x,t)-space data of ideal fitted metabolite signals after removal of macromolecular signals and nuisance signals
  • xtNuisance (64x64x32x384) - (x,t)-space data of ground truth ‘nuisance’ signals

The .mat file includes the following additional information:

  • IrefHr (128x128x64) – High-resolution T1w image
  • VtMM (10x386) – The first to the ninth rows of VtMM contain the temporal basis functions of macromolecular signals for t = 0, $\delta t$, …, 385$\delta t$, where $\delta t$ = 0.83 ms is the dwell time. The last row contains the basis of Cr at 3.9 ppm as a reference.
  • Vt (16x386) – The temporal basis functions of the following metabolites for t = 0, $\delta t$, …, 385$\delta t$: ‘Asp’; ‘GABA’; ‘Gln’; ‘Glu’; ‘Lac’; ‘Lac41’; ‘NAA’; ‘NAAG’; ‘PE’; ‘PE40’; ‘Tau’; ‘mIns’; ‘sIno’; ‘tCho’; ‘tCr’; ‘tCr39’.
  • t (1x384) – Time domain sampling grid of the MRSI data, i.e., t = 2$\delta t$, 3$\delta t$, …, 385$\delta t$
  • hzpppm – Hertz per ppm
  • ppmoff – Offset for display purpose

Note that in the context of this Challenge, the area is defined as the following:

\[s(n\delta t) = \sum_{m=1}^M c_m g_m(n\delta t), \quad n = 2, \ldots, 385\]

where $s(n\delta t)$ is the time-domain signal after removal of the water, lipid, and baseline signals, $c_m$ is the area of the m-th metabolite or macromolecular signal, $\delta t$ is the dwell time (0.83 ms), and $g_m$ is the temporal basis function of the m-th metabolite or macromolecular signal stored in Vt and VtMM.
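
Because the model above is linear in the areas $c_m$, the areas for a single voxel can in principle be estimated by linear least squares when the basis functions are used as given. The sketch below illustrates this in Python with NumPy. It is only a toy example under stated assumptions: the basis functions are invented damped complex exponentials standing in for Vt and VtMM, and B0 shifts, lineshape variation, and baseline signals are ignored. It is not the organizers' reference method.

```python
import numpy as np

rng = np.random.default_rng(1)
dt = 0.83e-3                         # dwell time in seconds
n = np.arange(2, 386)                # sample indices of the data grid
t = n * dt

# Hypothetical basis: M damped complex exponentials standing in for g_m.
freqs_hz = np.array([-250.0, -120.0, 40.0, 180.0])
decay_hz = 8.0
G = np.exp((2j * np.pi * freqs_hz[:, None] - np.pi * decay_hz) * t[None, :])  # (M, N)

# Ground-truth areas c_m and the noisy signal s(n*dt) = sum_m c_m g_m(n*dt).
c_true = np.array([1.0, 0.4, 0.7, 0.2])
noise = 0.01 * (rng.standard_normal(t.size) + 1j * rng.standard_normal(t.size))
s = c_true @ G + noise

# Least-squares estimate: solve G^T c = s for c.
c_hat, *_ = np.linalg.lstsq(G.T, s, rcond=None)
c_hat = c_hat.real                   # areas are real in this toy setup

print(np.round(c_hat, 3))
```

In this toy setting the estimated areas land close to c_true. With the real data, the rows of Vt and VtMM restricted to the data grid would take the place of G.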

3. Usage of the Data for Network Training

Taking the data in the MATLAB .mat file as an example: to train a network for nuisance signal removal, one can use xtAll, which contains residual water signals, unsuppressed lipid signals, metabolite signals (along with macromolecular signals), and noise, as the network input. The corresponding ground truth (training label) is xtNuisance, which contains noiseless water and lipid signals.
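
The pairing described above can be sketched as follows in Python with NumPy. This is one possible arrangement, not a prescribed one: the helper name make_nuisance_pairs is hypothetical, the choice of one training sample per voxel inside the brain mask is an assumption, and so is splitting complex FIDs into real and imaginary channels. Small random arrays stand in for xtAll, xtNuisance, and brainMask.

```python
import numpy as np

def make_nuisance_pairs(xt_all, xt_nuisance, brain_mask):
    """Return (inputs, labels), each of shape (n_voxels, 2, nt), for masked voxels.

    Hypothetical helper: complex FIDs are split into real and imaginary channels.
    """
    mask = brain_mask.astype(bool)
    x = xt_all[mask]            # (n_voxels, nt), contaminated FIDs
    y = xt_nuisance[mask]       # (n_voxels, nt), noiseless water + lipid FIDs

    def to_channels(z):
        return np.stack([z.real, z.imag], axis=1).astype(np.float32)

    return to_channels(x), to_channels(y)

# Small random stand-ins for xtAll, xtNuisance and brainMask (real size: 64x64x32x384).
rng = np.random.default_rng(2)
shape = (8, 8, 4, 384)
xtNuisance = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
xtAll = xtNuisance + 0.1 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
brainMask = np.zeros(shape[:3])
brainMask[2:6, 2:6, 1:3] = 1

inputs, labels = make_nuisance_pairs(xtAll, xtNuisance, brainMask)
print(inputs.shape, labels.shape)   # (32, 2, 384) (32, 2, 384)
```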

Similarly, to train a network for spectral quantification (e.g., for those interested only in Sub-Challenge 2), one can use xtAll - xtNuisance, which contains metabolite signals (along with macromolecular signals) and noise, as the network input. The corresponding ground truth (training label) is either xtMeta, which contains ideal fitted metabolite signals after removal of macromolecular signals and nuisance signals, or metaMap, which contains the ground truth metabolite areas.
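
A corresponding sketch for the quantification case, again in Python with NumPy, forms the input by subtracting the nuisance signals and uses the per-voxel metabolite areas as labels. The helper name make_quant_pairs, the per-voxel sampling inside the brain mask, and the real/imaginary channel layout are assumptions made for illustration, and random arrays stand in for the real variables.

```python
import numpy as np

def make_quant_pairs(xt_all, xt_nuisance, meta_map, brain_mask):
    """Hypothetical helper for Sub-Challenge 2 style training.

    Input: nuisance-free FIDs (xt_all - xt_nuisance); label: per-voxel areas.
    """
    mask = brain_mask.astype(bool)
    fid = (xt_all - xt_nuisance)[mask]                 # (n_voxels, nt), complex
    inputs = np.stack([fid.real, fid.imag], axis=1).astype(np.float32)
    labels = meta_map[mask].astype(np.float32)         # (n_voxels, n_metabolites)
    return inputs, labels

# Small random stand-ins (real sizes: 64x64x32x384 and 64x64x32x16).
rng = np.random.default_rng(3)
shape = (8, 8, 4, 384)
xtNuisance = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
xtAll = xtNuisance + 0.1 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
metaMap = rng.random((8, 8, 4, 16))
brainMask = np.ones((8, 8, 4))
brainMask[0] = 0                                       # exclude one slab of voxels

inputs, labels = make_quant_pairs(xtAll, xtNuisance, metaMap, brainMask)
print(inputs.shape, labels.shape)   # (224, 2, 384) (224, 16)
```

To train against signal labels instead of areas, the masked voxels of xtMeta could be used in place of metaMap, with the same channel split applied.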

If you have trouble accessing the data, or have other questions, please email us at fittingchallenge2024@gmail.com.