Preproc

The functions and classes in this file cover basic preprocessing of functional data stored in dataframes, including computing frequency spectra, ICA, regressing out confounds, and filtering.

class lazyfmri.preproc.DataFilter

Bases: object

A class for filtering fMRI data grouped by subject, task, and run identifiers. It supports multiple filtering strategies, including high-pass and low-pass filtering.

Parameters:
  • func (pd.DataFrame) – The input functional data as a Pandas DataFrame.

  • **kwargs (dict) – Additional filtering parameters.

Example

from lazyfmri.preproc import DataFilter
obj = DataFilter(
    func=df_func,
    filter_strategy="hp",
    hp_kw={"cutoff": 0.01},
)

filtered_df = obj.get_result()

filter_input(**kwargs)

Filter input data

Filters the input data by applying subject-level, task-level, and run-level filtering.

Parameters:

**kwargs (dict) – Additional parameters for filtering.

filter_runs(df_func, **kwargs)

Filter runs

Extracts and processes functional data for each unique run in the dataset.

Parameters:
  • df_func (pd.DataFrame) – Functional data to be filtered.

  • **kwargs (dict) – Additional parameters for filtering.

Returns:

Filtered functional data, concatenated across runs.

Return type:

pd.DataFrame
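
The per-run processing described above can be sketched with a pandas groupby. This is a minimal illustration, not the library's implementation: the index layout (levels named subject, task, run, t), the helper names demean and filter_per_run, and the use of mean removal as the stand-in filter are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def demean(run_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in filter: remove each voxel's mean within one run."""
    return run_df - run_df.mean(axis=0)

def filter_per_run(df_func: pd.DataFrame, filter_fn=demean) -> pd.DataFrame:
    """Apply filter_fn to every (subject, task, run) block and concatenate."""
    filtered = [
        filter_fn(block)
        for _, block in df_func.groupby(level=["subject", "task", "run"], sort=False)
    ]
    return pd.concat(filtered)

# Toy data: 1 subject, 1 task, 2 runs, 4 timepoints, 2 voxels (assumed layout)
index = pd.MultiIndex.from_product(
    [["01"], ["rest"], [1, 2], [0.0, 0.1, 0.2, 0.3]],
    names=["subject", "task", "run", "t"],
)
rng = np.random.default_rng(0)
df_func = pd.DataFrame(
    rng.normal(size=(8, 2)) + 100.0, index=index, columns=["vox 0", "vox 1"]
)
df_filt = filter_per_run(df_func)
```

Filtering each run separately avoids treating the discontinuity between runs as signal; any real high-pass or low-pass function could be passed in place of demean.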

filter_subjects(df_func, **kwargs)

Filter subjects

Extracts and processes functional data for each unique subject in the dataset.

Parameters:
  • df_func (pd.DataFrame) – Functional data to be filtered.

  • **kwargs (dict) – Additional parameters for filtering.

Returns:

Filtered functional data, concatenated across subjects.

Return type:

pd.DataFrame

filter_tasks(df_func, **kwargs)

Filter tasks

Extracts and processes functional data for each unique task in the dataset.

Parameters:
  • df_func (pd.DataFrame) – Functional data to be filtered.

  • **kwargs (dict) – Additional parameters for filtering.

Returns:

Filtered functional data, concatenated across tasks.

Return type:

pd.DataFrame

get_result()

Get filtered result

Returns the final filtered DataFrame.

Returns:

Filtered functional data.

Return type:

pd.DataFrame

plot_task_avg(orig=None, filt=None, t_col='t', avg=True, plot_title=None, incl_task=None, sf=None, use_cols=['#cccccc', 'r'], power_kws={}, make_figure=True, **kwargs)

Plot task-averaged time series

Plots the original and filtered time series averaged across tasks.

Parameters:
  • orig (pd.DataFrame, optional) – Original unfiltered data. Defaults to self.func.

  • filt (pd.DataFrame, optional) – Filtered data. Defaults to self.df_filt.

  • t_col (str, optional) – Column name representing time. Default is “t”.

  • avg (bool, optional) – Whether to compute the average time series across subjects. Default is True.

  • plot_title (str or dict, optional) – Title for the plot. If dict, it should contain additional title arguments.

  • incl_task (str or list, optional) – Specific tasks to include. If None, all tasks are included.

  • sf (matplotlib.figure.SubFigure, optional) – SubFigure object for multiple plots.

  • use_cols (list, optional) – Colors to use for the original and filtered data. Default is [“#cccccc”, “r”].

  • power_kws (dict, optional) – Additional parameters for power spectrum computation.

  • make_figure (bool, optional) – Whether to create a new figure. Default is True.

  • **kwargs (dict) – Additional plotting parameters.

Returns:

If make_figure=True, returns a figure. Otherwise, returns a DataFrame of task-averaged time series.

Return type:

matplotlib.figure.Figure or pd.DataFrame
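
As a rough illustration of what a task-averaged time series is, the sketch below averages across subjects while keeping one time series per task. The index levels (subject, task, t) are an assumed layout, and this is not the method's actual code.

```python
import pandas as pd

# Two subjects, one task, two timepoints (assumed index layout)
index = pd.MultiIndex.from_product(
    [["01", "02"], ["taskA"], [0.0, 0.5]],
    names=["subject", "task", "t"],
)
df = pd.DataFrame({"vox 0": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Average across subjects, keeping one time series per task
task_avg = df.groupby(level=["task", "t"]).mean()
```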

classmethod power_spectrum(tc1, tc2, axs=None, TR=0.105, figsize=(5, 5), **kwargs)

Compute power spectrum

Computes and plots the power spectrum of two time series.

Parameters:
  • tc1 (pd.DataFrame) – First time series.

  • tc2 (pd.DataFrame) – Second time series.

  • axs (matplotlib.axes._axes.Axes, optional) – Matplotlib axis object for plotting. If None, a new figure is created.

  • TR (float, optional) – Repetition time (TR) of the fMRI scan. Default is 0.105 seconds.

  • figsize (tuple, optional) – Figure size for plotting. Default is (5, 5).

  • **kwargs (dict) – Additional parameters.

Returns:

Power spectrum plot.

Return type:

matplotlib.figure.Figure

classmethod single_filter(func, filter_strategy='hp', hp_kw={}, lp_kw={}, **kwargs)

Apply a single filtering step

Performs high-pass or low-pass filtering on the input data.

Parameters:
  • func (pd.DataFrame) – Functional data to be filtered.

  • filter_strategy (str or list, optional) – Filtering strategy to apply. Options: [“hp”, “lp”]. Default is “hp”.

  • hp_kw (dict, optional) – Parameters for high-pass filtering.

  • lp_kw (dict, optional) – Parameters for low-pass filtering.

  • **kwargs (dict) – Additional parameters.

Returns:

Filtered data.

Return type:

pd.DataFrame

class lazyfmri.preproc.EventRegression

Bases: InitFitter

Performs event regression on fMRI data. This class takes functional time series and event onsets, and regresses out the activity related to specific events.

Parameters:
  • func (pd.DataFrame) – Functional time series data.

  • onsets (pd.DataFrame) – Event onsets with associated event types.

  • TR (float, optional) – Repetition time (TR) of the fMRI scan. Default is 0.105 seconds.

  • merge (bool, optional) – Whether to merge event-related regressors. Default is False.

  • evs (list, str, optional) – List of event types to regress out. If None, all event types will be used.

  • ses (int, optional) – Session identifier, if applicable.

  • prediction_plot (bool, optional) – Whether to generate plots for predicted timecourses. Default is False.

  • result_plot (bool, optional) – Whether to generate plots for the final regression results. Default is False.

  • save_ext (str, optional) – File extension for saved plots (e.g., “svg” or “png”). Default is “svg”.

  • reg_kw (dict, optional) – Keyword arguments for regression.

  • **kwargs (dict) – Additional keyword arguments for processing.

Example

from lazyfmri.preproc import EventRegression

obj = EventRegression(
    func=df_func,
    onsets=df_onsets,
    TR=0.105,
    evs=["stimulus", "response"],
    result_plot=True
)
regressed_df = obj.df_regress

classmethod plot_model_fits(model, save=False, fig_dir=None, basename=None, TR=0.105, cm='inferno', ext='svg', time_col='time', w_ratio=[0.8, 0.2], evs=None, loc=[0, 1], **kwargs)

Plot model fits

Visualizes model-predicted and observed time series for different voxels.

Parameters:
  • model (object) – Fitted model object.

  • save (bool, optional) – Whether to save the plot. Default is False.

  • fig_dir (str, optional) – Directory to save figures.

  • basename (str, optional) – Basename for saved figures.

  • TR (float, optional) – Repetition time (TR) of the fMRI scan. Default is 0.105 seconds.

  • cm (str, optional) – Colormap for plotting.

  • ext (str, optional) – File extension for saving plots.

  • **kwargs (dict) – Additional plotting parameters.

Return type:

None

plot_power_spectrum(tc2, axs=None, TR=0.105, figsize=(5, 5), **kwargs)

Plot power spectrum

Computes and plots the power spectrum before and after regression.

Parameters:
  • tc1 (pd.DataFrame) – Original time series.

  • tc2 (pd.DataFrame) – Regressed time series.

  • axs (matplotlib.axes._axes.Axes, optional) – Matplotlib axis object for plotting.

  • TR (float, optional) – Repetition time (TR) of the fMRI scan. Default is 0.105 seconds.

  • figsize (tuple, optional) – Figure size. Default is (5, 5).

  • **kwargs (dict) – Additional plotting parameters.

Returns:

Power spectrum plot.

Return type:

matplotlib.figure.Figure

classmethod plot_result(raw, regr, avg=True, save=False, fig_dir=None, basename=None, TR=0.105, ext='svg', w_ratio=[0.8, 0.2], cols=['#cccccc', 'r'], evs=None, **kwargs)

plot_timecourse_prediction(tc2, axs=None, figsize=(16, 4), time_col='t', t_axis=None, TR=0.105, **kwargs)

Plot timecourse prediction

Plots original and predicted timecourses to visualize regression results.

Parameters:
  • tc1 (pd.DataFrame) – Original time series.

  • tc2 (pd.DataFrame) – Predicted time series from the regression model.

  • axs (matplotlib.axes._axes.Axes, optional) – Matplotlib axis object for plotting.

  • figsize (tuple, optional) – Figure size. Default is (16, 4).

  • time_col (str, optional) – Column name for time axis. Default is “t”.

  • t_axis (list or np.ndarray, optional) – Time axis values.

  • TR (float, optional) – Repetition time (TR) of the fMRI scan. Default is 0.105 seconds.

  • **kwargs (dict) – Additional plotting parameters.

Returns:

Timecourse prediction plot.

Return type:

matplotlib.figure.Figure

regress_input(**kwargs)

Perform event regression on input data

Runs event regression for all subjects in the dataset.

Parameters:

**kwargs (dict) – Additional keyword arguments for processing.

regress_runs(df_func, df_onsets, basename=None, final_ev=True, make_figure=False, plot_kw={}, reg_kw={}, **kwargs)

Regress out events per run

Performs event regression separately for each run.

Parameters:
  • df_func (pd.DataFrame) – Functional time series data.

  • df_onsets (pd.DataFrame) – Event onsets for each run.

  • basename (str, optional) – Basename for saving figures. Default is None.

  • final_ev (bool, optional) – Whether this is the final event to be regressed. Default is True.

  • make_figure (bool, optional) – Whether to generate plots. Default is False.

  • plot_kw (dict, optional) – Additional plotting parameters.

  • reg_kw (dict, optional) – Additional regression parameters.

  • **kwargs (dict) – Additional keyword arguments.

Returns:

Functional data with event regressors removed.

Return type:

pd.DataFrame

regress_subjects(df_func, df_onsets, evs=None, ses=None, reg_kw={}, **kwargs)

Regress out events per subject

Performs event regression separately for each subject.

Parameters:
  • df_func (pd.DataFrame) – Functional time series data.

  • df_onsets (pd.DataFrame) – Event onsets for each subject.

  • evs (list, str, optional) – List of event types to regress out. Default is None (all events).

  • ses (int, optional) – Session identifier, if applicable.

  • reg_kw (dict, optional) – Additional regression parameters.

  • **kwargs (dict) – Additional keyword arguments.

Returns:

Functional data with event regressors removed.

Return type:

pd.DataFrame

regress_tasks(df_func, df_onsets, basename=None, reg_kw={}, **kwargs)

Regress out events per task

Performs event regression separately for each task.

Parameters:
  • df_func (pd.DataFrame) – Functional time series data.

  • df_onsets (pd.DataFrame) – Event onsets for each task.

  • basename (str, optional) – Basename for saving figures. Default is None.

  • reg_kw (dict, optional) – Additional regression parameters.

  • **kwargs (dict) – Additional keyword arguments.

Returns:

Functional data with event regressors removed.

Return type:

pd.DataFrame

classmethod single_regression(func, onsets, reg_kw={}, **kwargs)

Regress out events from a single dataset

Performs event regression on a single set of functional data and its event onsets.

Parameters:
  • func (pd.DataFrame) – Functional time series data.

  • onsets (pd.DataFrame) – Event onsets.

  • evs (list, str, optional) – List of event types to regress out. Default is None (all events).

  • ses (int, optional) – Session identifier, if applicable.

  • reg_kw (dict, optional) – Additional regression parameters.

  • **kwargs (dict) – Additional keyword arguments.

Returns:

Functional data with event regressors removed.

Return type:

pd.DataFrame
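
To make the idea of regressing out event-related activity concrete, here is a minimal NumPy sketch. It is not the EventRegression implementation: the helper names, the gamma-like response kernel, and the assumption that onsets are given in seconds are all illustrative choices. Events are turned into a regressor, the regressor is fitted with ordinary least squares, and only the event-related part of the fit is subtracted.

```python
import numpy as np

def make_regressor(onsets, n_timepoints, TR, kernel):
    """Place a stick at each onset (in seconds) and convolve with a response kernel."""
    sticks = np.zeros(n_timepoints)
    sticks[np.round(np.asarray(onsets) / TR).astype(int)] = 1.0
    return np.convolve(sticks, kernel)[:n_timepoints]

def regress_out_events(signal, regressor):
    """Fit regressor + intercept by least squares; subtract only the event part."""
    design = np.column_stack([regressor, np.ones(signal.shape[0])])
    betas, *_ = np.linalg.lstsq(design, signal, rcond=None)
    return signal - regressor * betas[0], betas

TR = 0.105                      # default TR used throughout this reference
n = 400
t = np.arange(0, 8, TR)
kernel = t**2 * np.exp(-t)      # hypothetical gamma-like response shape
kernel /= kernel.max()

regressor = make_regressor([5.25, 21.0], n, TR, kernel)
rng = np.random.default_rng(0)
signal = 3.0 * regressor + 10.0 + rng.normal(scale=0.05, size=n)

cleaned, betas = regress_out_events(signal, regressor)
```

Keeping the intercept in the design but not subtracting it preserves the baseline of the time series, so the cleaned data stay on the original scale.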

class lazyfmri.preproc.Freq(func, *args, **kwargs)

Bases: object

plot_freq(**kwargs)

plot_timecourse(**kwargs)

class lazyfmri.preproc.ICA

Bases: object

Wrapper around scikit-learn’s FastICA, with a few visualization options. The basic input needs to be a pandas.DataFrame or numpy.ndarray describing a 2D dataset (e.g., the output of linescanning.dataset.Dataset or linescanning.dataset.ParseFuncFile).

Parameters:
  • subject (str, optional) – Subject ID to use when saving figures (e.g., sub-001)

  • data (pd.DataFrame, np.ndarray) – Dataset to run ICA on, in the format <time, voxels>

  • n_components (int, optional) – Number of components to use, by default 10

  • filter_confs (float, optional) – Specify a high-pass frequency cutoff to retain task-related frequencies, by default 0.02. If you do not want to high-pass filter the components, set filter_confs=None and set keep_comps to the components you want to retain (e.g., keep_comps=[0,1] to retain the first two components)

  • keep_comps (list, optional) – Specify a list of components to keep from the data, rather than all high-pass components. If filter_confs=None but keep_comps is given, no high-pass filtering is applied to the components. If both filter_confs=None and keep_comps=None, an error is raised: you must specify filter_confs, keep_comps, or both

  • verbose (bool, optional) – Turn on verbosity; prints progress messages to the terminal, by default False

  • TR (float, optional) – Repetition time or sampling rate, by default 0.105

  • save_as (str, optional) – Path pointing to the location where the figures are saved (sub-<subject>_run-{self.run}_desc-ica.{self.save_ext}), by default None

  • session (int, optional) – Session ID to use when saving figures (e.g., 1), by default 1

  • run (int, optional) – Run ID to use when saving figures (e.g., 1), by default 1

  • summary_plot (bool, optional) – Make a figure regarding the efficacy of the ICA denoising, by default False

  • melodic_plot (bool, optional) – Make a figure regarding the information about the components themselves, by default False

  • ribbon (tuple, optional) – Range of gray matter voxels. If None, we’ll check the efficacy of ICA denoising over the average across the data, by default None

  • save_ext (str, optional) – Extension to use when saving figures, by default “svg”

Example

from lazyfmri.preproc import ICA

# initialize
ica_obj = ICA(
    data_obj.hp_zscore_df,
    subject=f"sub-{sub}",
    session=ses,
    run=3,
    n_components=10,
    TR=data_obj.TR,
    filter_confs=0.18,
    keep_comps=1,
    verbose=True,
    ribbon=None
)

# actually run the regression
ica_obj.regress()

melodic()

Plot information about the components from the ICA. For each component up to plot_comps, plot the 2D spatial profile of the component, its timecourse, and its power spectrum. If zoom_freq=True, an extra subplot is added next to the power spectrum, containing a zoomed-in version of the power spectrum with zoom_lim as limits.

Parameters:
  • color (str, tuple, optional) – Color for all subplots, by default “#6495ED”

  • zoom_freq (bool, optional) – Add a zoomed in version of the power spectrum, by default False

  • task_freq (float, optional) – If zoom_freq=True, add a vertical line where the task-frequency (task_freq) should be, by default 0.05

  • zoom_lim (list, optional) – Limits for the zoomed in power spectrum, by default [0,0.5]

  • plot_comps (int, optional) – Limit the number of plots being produced in case you have a lot of components, by default 10

Example

ica_obj.melodic(
    # color="r",
    zoom_freq=True,
    zoom_lim=[0,0.25]
)

regress()

summary(**kwargs)

Create a plot containing the power spectra of all components, the power spectra of the average GM voxels (or all voxels, depending on the presence of gm_voxels) before and after ICA, as well as the averaged timecourses before and after ICA.

class lazyfmri.preproc.RegressOut(data, regressors, **kwargs)

Bases: object

lazyfmri.preproc.get_freq()

Create power spectra of input timeseries with the ability to select implementations from nitime. Fourier transform is implemented as per J. Siero’s implementation.

Parameters:
  • func (np.ndarray) – Array of shape(timepoints,)

  • TR (float, optional) – Repetition time, by default 0.105

  • spectrum_type (str, optional) – Method for extracting power spectra, by default ‘psd’. Must be one of ‘mtaper’, ‘fft’, ‘psd’, or ‘periodogram’, as per nitime’s implementations.

  • clip_power (_type_, optional) – _description_, by default None

Returns:

  • freq – numpy.ndarray representing the frequencies

  • power – numpy.ndarray representing the power spectra

Raises:

ValueError – If invalid spectrum_type is given. Must be one of psd, mtaper, fft, or periodogram.
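
For orientation, a plain FFT-based power spectrum can be sketched with NumPy alone. This is neither nitime's implementation nor the one referenced above; the function name fft_power and the normalization by the number of timepoints are assumptions made for the example.

```python
import numpy as np

def fft_power(func, TR=0.105):
    """Return (freq, power) for a 1D time series sampled every TR seconds."""
    func = np.asarray(func, dtype=float)
    n = func.shape[0]
    freq = np.fft.rfftfreq(n, d=TR)                     # frequencies in Hz
    power = np.abs(np.fft.rfft(func - func.mean())) ** 2 / n
    return freq, power

# A 0.2 Hz sine sampled at the default TR
TR = 0.105
t = np.arange(1000) * TR
tc = np.sin(2 * np.pi * 0.2 * t)

freq, power = fft_power(tc, TR=TR)
peak = freq[np.argmax(power)]   # frequency carrying the most power
```

With a TR of 0.105 s the highest resolvable (Nyquist) frequency is 1 / (2 * 0.105), roughly 4.76 Hz, which is the upper end of the returned freq array.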

lazyfmri.preproc.highpass_dct()

The discrete cosine transform (DCT) provides a basis set of cosine regressors of varying frequencies up to a filter cutoff specified in seconds. Many software packages use 100 s or 128 s as the default cutoff, but make sure the cutoff is not too short for your experimental design: longer trials require longer filter cutoffs. For a more technical treatment of using the DCT as a high-pass filter in fMRI data analysis, see https://canlab.github.io/_pages/tutorials/html/high_pass_filtering.html.

Parameters:
  • func (np.ndarray) – <n_voxels, n_timepoints> representing the functional data to be filtered

  • lb (float, optional) – Lower-bound cutoff frequency of the high-pass filter (default = 0.01 Hz)

  • TR (float, optional) – Repetition time of functional run, by default 0.105

  • modes_to_remove (int, optional) – Remove first X cosines

Returns:

  • dct_data (np.ndarray) – array of shape(n_voxels, n_timepoints)

  • cosine_drift (np.ndarray) – Cosine drifts of shape(n_scans, n_drifts) plus a constant regressor at cosine_drift[:, -1]

Notes

  • High-pass filters remove low-frequency (slow) noise and pass high-frequency signals.

  • Low-pass filters remove high-frequency noise and thus smooth the data.

  • Band-pass filters allow only certain frequencies and filter everything else out.

  • Notch filters remove certain frequencies.
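
The DCT approach described above can be sketched in a few lines of NumPy. This is a sketch under stated assumptions, not the library's code: the helper names are hypothetical, the number of cosines is taken as every DCT-II component with frequency k / (2 * N * TR) at or below the cutoff, and the mean is kept in the output.

```python
import numpy as np

def dct_basis(n_timepoints, TR, cutoff_hz):
    """Cosine regressors with frequency <= cutoff_hz, plus a constant last column."""
    n_basis = int(np.floor(2 * n_timepoints * TR * cutoff_hz))
    t = np.arange(n_timepoints)
    k = np.arange(1, n_basis + 1)
    cosines = np.cos(np.pi * (t[:, None] + 0.5) * k[None, :] / n_timepoints)
    return np.column_stack([cosines, np.ones(n_timepoints)])

def highpass_dct_sketch(func, TR=0.105, lb=0.01):
    """Fit the cosine set to <n_voxels, n_timepoints> data and subtract the slow part."""
    func = np.asarray(func, dtype=float)
    basis = dct_basis(func.shape[-1], TR, lb)
    betas, *_ = np.linalg.lstsq(basis, func.T, rcond=None)
    drift = (basis[:, :-1] @ betas[:-1]).T    # cosine part only, mean is kept
    return func - drift, basis

# One voxel: a 0.2 Hz signal around a baseline of 100, plus a slow linear drift
TR, n = 0.105, 2000
t = np.arange(n) * TR
clean = 100.0 + np.sin(2 * np.pi * 0.2 * t)
raw = clean + np.linspace(-2.5, 2.5, n)

filtered, basis = highpass_dct_sketch(raw[None, :], TR=TR, lb=0.01)
```

For this 210 s run, a 0.01 Hz cutoff yields four cosines plus the constant, so the basis has shape (2000, 5); a shorter run or a lower cutoff yields fewer regressors.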

lazyfmri.preproc.lowpass_savgol()

The Savitzky-Golay filter is a low-pass filter used to smooth data. It takes the original noisy signal (as a one-dimensional array), the window size (the number of points used to calculate each fit), and the order of the polynomial used to fit the signal. Smoothing approximates the underlying function, keeping the important features and discarding meaningless fluctuations. To do this, successive subsets of points are fitted with a polynomial that minimizes the fitting error.

The procedure is repeated across all data points, yielding a new series of points that follows the original signal. A comprehensive description of the Savitzky-Golay filter is available at https://en.wikipedia.org/wiki/Savitzky%E2%80%93Golay_filter.

Parameters:
  • func (np.ndarray) – <n_voxels, n_timepoints> representing the functional data to be filtered

  • window_length (int) – Length of the window to use for filtering. Must be an odd number, per the SciPy documentation (default = 7)

  • poly_order (int) – Order of the polynomial fitted within window_length (default = 3)

Returns:

<n_voxels, n_timepoints> from which high frequencies have been removed

Return type:

np.ndarray

Notes

  • High-pass filters remove low-frequency (slow) noise and pass high-frequency signals.

  • Low-pass filters remove high-frequency noise and thus smooth the data.

  • Band-pass filters allow only certain frequencies and filter everything else out.

  • Notch filters remove certain frequencies.
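
The smoothing described above can be tried directly with scipy.signal.savgol_filter, using the documented defaults of a 7-point window and a third-order polynomial. The sketch below calls SciPy rather than lowpass_savgol itself, and the toy signal is an assumption made for the example.

```python
import numpy as np
from scipy.signal import savgol_filter

# Toy data: one voxel, a slow sine buried in white noise
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 200)
clean = np.sin(2 * np.pi * 2 * t)
noisy = clean + rng.normal(scale=0.3, size=t.size)

func = noisy[None, :]   # <n_voxels, n_timepoints>

# Fit a cubic polynomial in a sliding 7-point window along the time axis
smoothed = savgol_filter(func, window_length=7, polyorder=3, axis=-1)
```

A longer window or a lower polynomial order smooths more aggressively, at the cost of flattening fast but genuine signal changes.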