General Linear Model#

General linear models (GLMs) are a class of models commonly used in neuroimaging data analysis. They are typically used to model how a response variable (e.g., the neural response of a vertex over time) changes with a set of regressors (e.g., stimulus presentations).
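As a minimal illustration of the idea (with made-up toy data, not the dataset used below), a GLM models the response as a weighted sum of regressors plus noise, and the weights can be estimated with ordinary least squares:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                    # design matrix: 100 time points, 3 regressors
true_beta = np.array([2.0, -1.0, 0.5])               # true regression weights
y = X @ true_beta + 0.1 * rng.standard_normal(100)   # simulated response time series
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]      # least-squares estimate, close to true_beta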

In this example, we will use the object category localizer data from the StudyForrest dataset. In each run, the participant watched images of six categories: faces, scenes, bodies, houses, objects, and scrambled objects.

The tutorial will focus on the first run of the first participant. We will examine which part of the brain responds to a given category, and which part of the brain responds selectively to a given category.

First, let’s import all the packages we need:

import urllib3
from io import StringIO

import matplotlib.pyplot as plt
import neuroboros as nb
import numpy as np
import pandas as pd
from scipy.stats import gamma, zscore
from scipy.ndimage import convolve1d

Experimental design#

The dataset is available as an OpenNeuro Dataset.

For each localizer run, there is a NIfTI file (ending in .nii or .nii.gz) containing the fMRI data, and a TSV file (.tsv) describing the design of the run.

For example, here are the NIfTI file and the TSV file of the first localizer run.

We can use the read_csv function of the pandas package to load the TSV file:

# URL of the design TSV file of the first localizer run on OpenNeuro
url = 'https://openneuro.org/crn/datasets/ds000113/objects/'\
      '4a71014ac9ebeeddace74d0ec8164204869bc1d6'
resp = urllib3.request('GET', url)
assert resp.status == 200
# Parse the tab-separated content into a DataFrame
content = StringIO(resp.data.decode('utf-8'))
df = pd.read_csv(content, sep='\t')

In the file, each block/trial is described using three attributes:

  • onset (when the block started, relative to the start of the scan),

  • duration (how long the block lasted), and

  • trial_type (the condition/stimulus type of the block).

df
      onset  duration trial_type
0    11.995        16       face
1    35.988        16      scene
2    59.996        16       body
3    83.989        16      house
4   107.997        16     object
5   131.989        16   scramble
6   155.998        16   scramble
7   179.990        16       body
8   203.999        16     object
9   227.991        16       face
10  252.000        16      house
11  275.992        16      scene
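A quick check on the timing: consecutive block onsets are roughly 24 seconds apart, consistent with 16-second blocks separated by about 8 seconds of rest.

np.diff(df['onset'].to_numpy())  # differences between consecutive onsets, all close to 24 s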

For the data we are going to use, the repetition time (TR) is 2 seconds, and each run contains 156 time points (TRs).

TR = 2.0  # repetition time in seconds
nt = 156  # number of time points (TRs) in the run

We subdivide each TR into 256 smaller time points for higher temporal precision.

n_div = 256  # number of sub-divisions per TR
t = np.arange(nt * n_div) * TR / n_div  # time of each sub-divided point, in seconds

Based on the design from the TSV file, we know when each block started and ended.

By aggregating across all blocks of the same type, we can determine, for any given time point, whether a stimulus category was being presented to the participant.

blocks = df[df['trial_type'] == 'face']  # all blocks of the 'face' condition
regressor = np.zeros_like(t)
for i in range(blocks.shape[0]):
    block = blocks.iloc[i]
    onset = block['onset']
    offset = onset + block['duration']
    print(f"Block onset: {onset:6.2f} s, Block offset: {offset:6.2f} s")
    regressor[np.logical_and(t > onset, t < offset)] = 1.0  # 1 while the block is being presented
Block onset:  11.99 s, Block offset:  27.99 s
Block onset: 227.99 s, Block offset: 243.99 s
plt.plot(t, regressor)
plt.show()
../_images/e87b3814dc76504bc3ba6fec5b613f678b103811bc97d59f739e20a58fddcd2b.png

Here are the boxcar regressors for all stimulus categories:

conditions = ['face', 'scene', 'body', 'house', 'object', 'scramble']
original_regressors = []
for condition in conditions:
    blocks = df[df['trial_type'] == condition]
    regressor = np.zeros_like(t)
    for i in range(blocks.shape[0]):
        block = blocks.iloc[i]
        onset = block['onset']
        offset = onset + block['duration']
        regressor[np.logical_and(t > onset, t < offset)] = 1.0
    original_regressors.append(regressor)
    plt.plot(t, regressor, label=condition)
plt.legend()
plt.show()
../_images/6c6782cd22fd03ea704170bad1362bb653cdf2360be7dff30422d6278f1a9e62.png

Haemodynamic response function and convolution#

To account for the haemodynamic response function (HRF), we need to convolve these regressors with an HRF model. Here we use the HRF from SPM5, which was also used by Mantini et al., 2012 and is shown in their Supplementary Figure 1.

The HRF is modeled as the difference between two Gamma functions, one for the main response and the other for the undershoot.
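In terms of the Gamma probability density function used in the code below (shape parameters 6 and 16, unit scale), the kernel before normalization is

$$h(t) = \frac{t^{5} e^{-t}}{\Gamma(6)} - \frac{1}{6}\,\frac{t^{15} e^{-t}}{\Gamma(16)},$$

where the first term peaks around 5 s (the main response) and the second around 15 s (the undershoot).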

tt = np.arange(15 * n_div) * TR / n_div  # HRF kernel time axis: 0 to 30 s at sub-TR resolution
bold = gamma.pdf(tt, 6) - gamma.pdf(tt, 16) / 6.0  # main response minus a scaled undershoot
bold /= bold.sum()  # normalize the kernel to unit sum
plt.plot(tt, bold)
plt.show()
../_images/f4ca4948efdda49e549cd21b506b56b88d67b368ec55dc03e29d3532cebc8cd6.png

Now we convolve our regressors with the HRF:

kwargs = {"mode": "nearest", "origin": -len(bold) // 2}  # shift the kernel so the convolution is causal
convolved = []
for reg in original_regressors:
    reg = convolve1d(reg, bold, axis=0, **kwargs)
    convolved.append(reg)
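As an optional sanity check, a plain causal convolution with NumPy should give essentially the same result here, because the regressors are zero at the very start of the run and the edge padding therefore does not matter:

alt = np.convolve(original_regressors[0], bold)[:len(t)]  # causal convolution, truncated to the run length
# alt should closely match convolved[0]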

After convolution, the regressors correctly reflect the haemodynamic delay:

plt.plot(t, original_regressors[0], label='original')
plt.plot(t, convolved[0], label='convolved')
plt.legend()
plt.show()
../_images/9744366f5ed653e6935724106885c7a2eab8644d0d9ce626a33668fde49febd8.png
for condition, regressor in zip(conditions, convolved):
    plt.plot(t, regressor, label=condition)
plt.legend()
plt.show()
../_images/ee4d4a60303d795901b8cefa205313f3bc2f6c05cf3b75876eb8f3628ec426f1.png

Pairing regressors with fMRI data#

The regressors we created have a higher temporal resolution than the fMRI data. Here we downsample them to one value per TR so that they match the fMRI time series.

regressors = []
for reg in convolved:
    # Average the n_div sub-divisions within each TR to get one value per TR
    downsampled = reg.reshape(-1, n_div).mean(axis=1)
    regressors.append(downsampled)
regressors = np.stack(regressors, axis=1)  # time points x conditions (156 x 6)

The fMRI data have been preprocessed with fMRIPrep and curated using neuroboros.

dset = nb.Forrest()   # the StudyForrest dataset curated by neuroboros
sids = dset.subjects  # subject IDs
sid = sids[0]         # the first participant

Besides the data matrix, we also have the confounding variables (a.k.a. nuisance regressors) from fMRIPrep. Judging from the shapes and the annotations used in the correlation plot below, confounds[0] is the matrix of selected nuisance regressors (156 time points × 22 columns: aCompCor components, framewise displacement, motion parameters, and polynomial trends), confounds[1] is the full fMRIPrep confounds table, and confounds[2] is a Boolean array over time points (all True here), presumably indicating which time points to keep.

dm = dset.load_data(
    sid, 'objectcategories', 1, 'lr',
    'onavg-ico32', dset.surface_resample)
confounds = dset.load_confounds(sid, 'objectcategories', 1)
reg.shape, regressors.shape, dm.shape, confounds[0].shape
((39936,), (156, 6), (156, 20484), (156, 22))
confounds[0]
array([[-0.06436624, -0.09735048,  0.05634564, ...,  1.        ,
        -1.        ,  1.        ],
       [ 0.00996179, -0.05005741, -0.05963685, ...,  1.        ,
        -0.98709677,  0.96154006],
       [ 0.0916715 , -0.03550841,  0.20058609, ...,  1.        ,
        -0.97419355,  0.9235796 ],
       ...,
       [-0.09220897, -0.03032143,  0.00818588, ...,  1.        ,
         0.97419355,  0.9235796 ],
       [-0.01562807, -0.09659374,  0.05209395, ...,  1.        ,
         0.98709677,  0.96154006],
       [ 0.12948981,  0.01547053,  0.04359279, ...,  1.        ,
         1.        ,  1.        ]])
confounds[1]
global_signal global_signal_derivative1 global_signal_power2 global_signal_derivative1_power2 csf csf_derivative1 csf_power2 csf_derivative1_power2 white_matter white_matter_derivative1 ... rot_x_power2 rot_x_derivative1_power2 rot_y rot_y_derivative1 rot_y_power2 rot_y_derivative1_power2 rot_z rot_z_derivative1 rot_z_derivative1_power2 rot_z_power2
0 292.435067 NaN 85518.268259 NaN 396.300730 NaN 157054.268405 NaN 313.936551 NaN ... 5.041148e-06 NaN 0.000101 NaN 1.014895e-08 NaN 0.000000e+00 NaN NaN 0.000000e+00
1 292.718334 0.283267 85684.023107 0.080240 396.542604 0.241874 157246.036838 0.058503 314.475375 0.538824 ... 2.927624e-06 2.853910e-07 -0.000000 -0.000101 0.000000e+00 1.014895e-08 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
2 292.599236 -0.119098 85614.312671 0.014184 397.557162 1.014558 158051.696889 1.029327 314.805351 0.329976 ... 4.471575e-06 1.628768e-07 -0.000000 0.000000 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
3 292.478735 -0.120501 85543.810359 0.014520 397.155322 -0.401840 157732.349869 0.161475 313.778681 -1.026670 ... 2.371076e-06 3.303720e-07 -0.000095 -0.000095 9.078431e-09 9.078431e-09 -5.293960e-23 -5.293960e-23 2.802601e-45 2.802601e-45
4 291.915179 -0.563556 85214.471729 0.317595 396.402329 -0.752993 157134.806478 0.566999 313.849771 0.071090 ... 2.661629e-06 8.394224e-09 -0.000180 -0.000084 3.228957e-08 7.125420e-09 0.000000e+00 5.293960e-23 2.802601e-45 0.000000e+00
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
151 292.969152 -0.309593 85830.923750 0.095848 398.315568 1.822215 158655.292049 3.320469 314.812829 0.631811 ... 7.688273e-07 1.033700e-07 0.000705 0.000000 4.966542e-07 0.000000e+00 -1.532280e-04 -9.853670e-05 9.709481e-09 2.347882e-08
152 293.315032 0.345881 86033.708032 0.119633 397.179822 -1.135747 157751.810807 1.289921 314.350860 -0.461968 ... 1.502855e-06 1.218582e-07 0.000705 0.000000 4.966542e-07 0.000000e+00 -1.532280e-04 0.000000e+00 0.000000e+00 2.347882e-08
153 293.217327 -0.097705 85976.400942 0.009546 397.896884 0.717062 158321.930014 0.514178 315.146570 0.795710 ... 8.663216e-07 8.711116e-08 0.001006 0.000301 1.011513e-06 9.060281e-08 -1.532280e-04 0.000000e+00 0.000000e+00 2.347882e-08
154 293.250529 0.033202 85995.872768 0.001102 396.355571 -1.541313 157097.738436 2.375646 314.618423 -0.528147 ... 6.301708e-07 1.875010e-08 0.000909 -0.000096 8.267774e-07 9.305882e-09 -1.532280e-04 0.000000e+00 0.000000e+00 2.347882e-08
155 293.056588 -0.193941 85882.163617 0.037613 399.133676 2.778105 159307.691049 7.717867 315.424945 0.806521 ... 5.335937e-07 4.014236e-09 0.001097 0.000188 1.203914e-06 3.532783e-08 0.000000e+00 1.532280e-04 2.347882e-08 0.000000e+00

156 rows × 107 columns

confounds[2]
array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True])

Visualizing regressors#

We combine the task regressors and the nuisance regressors into a single matrix and compute the correlations among its columns. The warnings below come from a constant confound column, whose standard deviation is zero.

R = np.concatenate([regressors, confounds[0]], axis=1)  # (156, 28): 6 task + 22 nuisance regressors
mat = np.corrcoef(R.T)  # correlation between every pair of regressors
/Users/feilong/miniconda3/envs/nb/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py:2999: RuntimeWarning: invalid value encountered in divide
  c /= stddev[:, None]
/Users/feilong/miniconda3/envs/nb/lib/python3.12/site-packages/numpy/lib/_function_base_impl.py:3000: RuntimeWarning: invalid value encountered in divide
  c /= stddev[None, :]
fig = plt.figure(figsize=(6, 6), dpi=200)
plt.imshow(zscore(R, axis=0), vmax=2, vmin=-2, cmap='bwr')
plt.colorbar()
plt.show()
../_images/5da18ea0c59016f867650a99a175602169a42f235d9112b8185cc70d5128618c.png

Correlation among regressors#

The correlation matrix shows how similar the regressors are to one another; in the plot below, the first six rows and columns correspond to the task (design) regressors.

plt.imshow(mat, vmax=1, vmin=-1, cmap='bwr')
plt.plot((0, 5), (-1, -1), 'k-', clip_on=False)
plt.annotate('design', (2.5, -1), annotation_clip=False, ha='center', va='bottom')
plt.plot((6, 11), (-1, -1), 'k-', clip_on=False)
plt.annotate('aCompCor', (8.5, -1), annotation_clip=False, ha='center', va='bottom')
plt.annotate('FD', (12, -1), annotation_clip=False, ha='center', va='bottom')
plt.plot((13, 24), (-1, -1), 'k-', clip_on=False)
plt.annotate('motion parameters', (18.5, -1), annotation_clip=False, ha='center', va='bottom')
plt.plot((25, 27), (-1, -1), 'k-', clip_on=False)
plt.annotate('poly', (26, -1), annotation_clip=False, ha='center', va='bottom')
plt.ylim([mat.shape[0] - 0.5, -0.5])
plt.colorbar(label='Correlation')
plt.show()
../_images/227e5ee18fa4b9be7181eee2c696cbd43b67ac3ef29327476cbf788455d32f7d.png
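As a rough, optional check of how much the task regressors overlap with the nuisance regressors, we can look at the largest absolute correlation between each task regressor and any confound column:

cross = mat[:len(conditions), len(conditions):]  # task regressors (rows) vs. confounds (columns)
print(np.nanmax(np.abs(cross), axis=1))          # nanmax skips the NaNs from the constant column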

General Linear Model#

Before fitting the GLM, we regress the nuisance regressors out of the data matrix and z-score the residual time series of each vertex. We then fit the GLM with nb.glm, which returns the betas, the t values, and the R² of the model fit.

beta_conf = np.linalg.lstsq(confounds[0], dm, rcond=None)[0]  # confound regression weights
denoised = dm - confounds[0] @ beta_conf                      # remove confound contributions
denoised = np.nan_to_num(zscore(denoised, axis=0))            # z-score each vertex's time series
betas, ts, R2s = nb.glm(
    denoised, regressors, confounds[0], return_r2=True)
/Users/feilong/miniconda3/envs/nb/lib/python3.12/site-packages/neuroboros/glm.py:65: RuntimeWarning: divide by zero encountered in divide
  ratio = np.sqrt(mid) / sigma
/Users/feilong/miniconda3/envs/nb/lib/python3.12/site-packages/neuroboros/glm.py:67: RuntimeWarning: invalid value encountered in multiply
  t = R_beta * ratio
/Users/feilong/miniconda3/envs/nb/lib/python3.12/site-packages/neuroboros/glm.py:81: RuntimeWarning: invalid value encountered in divide
  R2s = 1 - ss_res * dof_all / (ss_all * dof_res)
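For intuition, here is a rough sketch of the estimation step using plain ordinary least squares on the full design (task regressors plus confounds); it does not reproduce the t values and R² that nb.glm returns:

X_full = np.concatenate([regressors, confounds[0]], axis=1)   # (156, 28) full design matrix
beta_full = np.linalg.lstsq(X_full, denoised, rcond=None)[0]  # (28, 20484) OLS weights
beta_task = beta_full[:len(conditions)]                       # betas of the six conditions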
The R² map summarizes how well the model fits the time series of each vertex:

nb.plot(R2s)
../_images/9320c626fc1206692d93775f56347b4b6f4178876e45241736592d8040151cba.png

The t maps show which vertices respond to a given category, here faces and houses:

nb.plot(ts[conditions.index('face')], vmax=10, vmin=-10, cmap='bwr')
../_images/8bf147746eeea0be412c3a78cf726928fab2eb0fbccf72c623691fc917dd6d92.png
nb.plot(ts[conditions.index('house')], vmax=10, vmin=-10, cmap='bwr')
../_images/b943266cd4e8f6efa7489a57f40beca06768d020118034ce62df47f8231c691c.png

Category-selective responses#

To find vertices that respond selectively to a category, rather than to visual stimulation in general, we define contrasts: weighted combinations of the conditions whose weights sum to zero. For example, the first contrast below compares the face condition (weight 5) against the five other conditions (weight −1 each).

# ['face', 'scene', 'body', 'house', 'object', 'scramble']
contrasts = [
    [ 5, -1, -1, -1, -1, -1],  # face vs. others
    [-1,  2, -1,  2, -1, -1],  # scene/house vs. others
]
betas, ts, R2s = nb.glm(
    denoised, regressors, confounds[0], contrasts=contrasts, return_r2=True)
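Conceptually, each contrast estimate is just a weighted combination of the condition betas (here using the beta_task sketch above); the t maps plotted below additionally account for estimation uncertainty.

contrast_est = np.asarray(contrasts) @ beta_task  # (2, 20484): one contrast map per row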
nb.plot(ts[0], vmax=10, vmin=-10, cmap='bwr', title='Face vs. Others')
../_images/a9033cf6e792ae90db22fa4f6ca3986c3477f2b3610c6935e3687c090a45f7d3.png
nb.plot(ts[1], vmax=10, vmin=-10, cmap='bwr', title='Scene/House vs. Others')
../_images/82298b070c92dbc5ecdb6a3ce76d55fdff6dd4752c60591a5221a7c9cbefbd1c.png