Dataset

This file contains functions and classes related to reading and formatting input data from functional files, experiment files (Exptools2), eye-tracker files (.edf-files from EyeLink), and physiology-files from Philips.

class lazyfmri.dataset.Dataset[source]

Bases: ParseFuncFile, SetAttributes

Main class for retrieving, formatting, and preprocessing of all datatypes including fMRI (2D), eyetracker (.edf), physiology (.log [WIP]), and experiment files derived from Exptools2 (.tsv). If you leave subject and run empty, these elements will be derived from the file names. So if you have BIDS-like files, leave them empty and the dataframe will be created for you with the correct subject/run IDs.
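How subject and run IDs can be read from BIDS-like file names is sketched below. This is an illustration only, not the parser used by lazyfmri; the helper name parse_bids_entities and the example path are hypothetical.

```python
import re
from pathlib import Path


def parse_bids_entities(fname):
    """Return BIDS-style key-value pairs found in a file name.

    Hypothetical helper for illustration; not part of lazyfmri.
    """
    # only look at the file name, not at the directories leading up to it
    name = Path(fname).name
    # entities have the form "<key>-<value>" and are separated by underscores
    return dict(re.findall(r"([a-zA-Z]+)-([a-zA-Z0-9]+)", name))


entities = parse_bids_entities("/some/dir/sub-001_ses-1_task-SR_run-2_bold.mat")
print(entities["sub"], entities["run"])
```

With file names like these, the subject and run arguments can stay empty because both values are recoverable from the name itself.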

Inherits from lazyfmri.dataset.ParseFuncFile, so all arguments from that class are available and are passed on via kwargs. Only func_file and verbose are required. The first one is necessary because if the input is an h5-file, we’ll set the attributes accordingly. Otherwise lazyfmri.dataset.ParseFuncFile is invoked. verbose is required for aesthetic reasons. Given that lazyfmri.dataset.ParseFuncFile inherits in turn from lazyfmri.dataset.ParseExpToolsFile, you can pass the arguments for that class here as well.

Parameters:
  • func_file (str, list) – path or list of paths pointing to the output file of the experiment

  • verbose (bool, optional) – Print details to the terminal, default is False

Example

from lazyfmri import dataset
from lazyfmri import utils
func_dir = "/some/dir"
exp = utils.get_file_from_substring("tsv", func_dir)
funcs = utils.get_file_from_substring("bold.mat", func_dir)

# only cut from SR-runs
delete_first = 100
delete_last = 0
window = 19
order = 3

data = dataset.Dataset(
    funcs,
    deleted_first_timepoints=delete_first,
    deleted_last_timepoints=delete_last,
    tsv_file=exp,
    verbose=True)

# retrieve data
fmri = data.fetch_fmri()
onsets = data.fetch_onsets()
fetch_accuracy(strip_index=False)[source]
fetch_fmri(strip_index=False, dtype=None)[source]
fetch_onsets(strip_index=False, button=True)[source]
fetch_physio(strip_index=False)[source]
fetch_rts(strip_index=False)[source]
fetch_trace(strip_index=False)[source]
from_hdf(input_file=None)[source]
to4D(fname=None, desc=None, dtype=None, mask=None)[source]
to_hdf(h5_file=None, overwrite=False)[source]
class lazyfmri.dataset.DatasetCollector(dataset_objects)[source]

Bases: object

class lazyfmri.dataset.ParseExpToolsFile[source]

Bases: ParseEyetrackerFile, SetAttributes

Class for parsing tsv-files created during experiments with Exptools2. The class reads the file, determines when the experiment actually started, and corrects the onset times for this start time and for the time removed along with the first few volumes (to do this correctly, set TR and deleted_first_timepoints). You can also provide a numpy array/file containing eye blinks that should be added to the onset times in real-world time (seconds). In principle, it returns a pandas DataFrame indexed by subject and run that can be easily concatenated over runs. The class relies on the naming used when programming the experiment: in the session.py file, you should have created phase_names=['iti', 'stim']; the class uses these names to parse the file.
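The onset correction described above can be sketched as follows. This is a minimal illustration that assumes the correction is a plain subtraction of the experiment start time and of the duration of the deleted volumes; the function name correct_onsets and the numbers are hypothetical and do not come from lazyfmri.

```python
def correct_onsets(onsets, start_time, TR, deleted_first_timepoints=0):
    """Shift onsets so that t=0 is the first volume that was kept.

    Hypothetical helper; assumes a simple subtraction, which may differ
    from what the class does internally.
    """
    # time removed = experiment start + duration of the deleted volumes
    shift = start_time + deleted_first_timepoints * TR
    # rounding only hides floating-point noise in this example
    return [round(t - shift, 6) for t in onsets]


raw_onsets = [12.0, 18.5, 31.25]
corrected = correct_onsets(
    raw_onsets, start_time=2.0, TR=0.105, deleted_first_timepoints=100
)
print(corrected)
```

A negative corrected onset means the event fell inside the deleted period, which is why TR and deleted_first_timepoints have to match the functional data.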

Parameters:
  • tsv_file (str, list) – path pointing to the output file of the experiment

  • subject (int) – subject number in the returned pandas DataFrame (should start with 1, …, n)

  • run (int) – run number you’d like to have the onset times for

  • button (bool) – boolean whether to include onset times of button responses (default is false). [‘space’] will be ignored as response

  • TR (float) – repetition time to correct onset times for deleted volumes

  • deleted_first_timepoints (int) – number of volumes to delete to correct onset times for deleted volumes. Can be specified for each individual run if tsv_file is a list

  • use_bids (bool, optional) – If true, we’ll read BIDS-components such as ‘sub’, ‘run’, ‘task’, etc from the input file and use those as indexers, rather than sequential 1,2,3.

  • funcs (str, list, optional) – List of functional files that is passed down to lazyfmri.dataset.ParseEyetrackerFile. Required for correct resampling to functional space

  • edfs (str, list, optional) – List of eyetracking output files that is passed down to lazyfmri.dataset.ParseEyetrackerFile.

  • verbose (bool, optional) – Print details to the terminal, default is False

  • phase_onset (int, optional) – Which phase of exptools-trial should be considered the actual stimulus trial. Usually, phase_onset=0 means the interstimulus interval. Therefore, default = 1

  • stim_duration (str, int, optional) – If desired, add stimulus duration to onset dataframe. Can be one of ‘None’, ‘stim’ (to use duration from exptools’ log file) or any given integer

  • add_events (str, list, optional) – Add additional events to onset dataframe. Must be an existing column in the exptools log file. For instance, responses and event_type = stim are read in by default, but if we have a separate column containing the onset of some target (e.g., 'target_onset'), we can add these times to the dataframe with add_events='target_onset'.

  • event_names (str, list, optional) – Custom names for manually added events through add_events if the column names are not the names you want to use in the dataframe. E.g., if I find target_onset too long of a name, I can specify event_names=’target’. If add_events is a list, then event_names must be a list of equal length if custom names are desired. By default we’ll take the names from add_events

  • RTs (bool, optional) – If we have a design that required some response to a stimulus, we can request the reaction times. Default = False

  • RT_relative_to (str, optional) – If RTs=True, we need to know relative to what time the button response should be offset. Only correct responses are considered, as there's a conditional statement that requires the presence of both the reference time (e.g., target_onset) and a button response. If there's a response but no reference time, the reaction time cannot be calculated. If you do not have a separate reference time column, you can specify RT_relative_to='start' to calculate the reaction time relative to onset time. If RT_relative_to != 'start', I'll assume you had a target in your experiment in X/n_trials. From this, we can calculate the accuracy and save that to self.df_accuracy, while reaction times are saved in self.df_rts

  • button_duration (float, int, optional) – Set duration for button event

  • response_window (float, int, optional) – Set window in which a response is counted

  • merge (bool, optional) – Merge the dataframes containing responses and stimulus onsets. Default = True. Onset times will be sorted in an ascending order. Select parts of dataframes with lazyfmri.utils.select_from_df()

  • resp_as_cov (bool, optional) – If you have a design where you have button presses in half of the trials, you can add a covariate consisting of -1/1 for the stimulus onsets where there was a response or not. This way, the response event will not steal variance from the stimulus event when fitting using lazyfmri.fitting.NideconvFitter(). Default = False

  • ev_onset (str, optional) – Sometimes experiments are coded such that the event name of interest is not called stim. Use this variable + phase_onset to extract the correct onset times

  • key_press (list, optional) – Specify a custom list of button presses to extract. Default = [‘b’]

  • expr (str, tuple, optional) – Add additional conditions for your extracted onset times. This follows the formulation of lazyfmri.utils.select_from_df(). E.g., expr="movie_type != blank". Acts as an extra filter to extract the relevant onsets

  • filter_na (bool, optional) – Try to filter out NaN-events from the onset dataframe
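The interplay between RTs, RT_relative_to, and response_window can be sketched as below. This is an illustration under stated assumptions, not the library's implementation: the function reaction_times is hypothetical, a response counts only if it falls within response_window seconds after the reference time, and accuracy is taken as the fraction of targets that received such a response.

```python
def reaction_times(target_onsets, response_onsets, response_window=1.0):
    """Return (reaction times, accuracy) for a list of targets.

    Hypothetical helper; the matching rule is an assumption.
    """
    rts = []
    for target in target_onsets:
        # responses that fall inside the window after this target
        hits = [
            r - target
            for r in response_onsets
            if 0 <= r - target <= response_window
        ]
        if hits:
            # keep the first response; rounding hides floating-point noise
            rts.append(round(min(hits), 6))
    # targets without a valid response lower the accuracy
    accuracy = len(rts) / len(target_onsets) if target_onsets else float("nan")
    return rts, accuracy


targets = [10.0, 20.0, 30.0, 40.0]
responses = [10.4, 20.7, 41.9]
rts, accuracy = reaction_times(targets, responses, response_window=1.0)
print(rts, accuracy)
```

Here the third target has no response and the fourth response arrives too late, so only two reaction times are returned.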

Examples

from lazyfmri.dataset import ParseExpToolsFile
file = 'some/path/to/exptoolsfile.tsv'
parsed_file = ParseExpToolsFile(file, subject=1, run=1, button=True)
onsets = parsed_file.get_onset_df()
static array_to_df(array, columns=None, subject=1, key='RTs', run=1, set_index=False)[source]
events_per_run()[source]
events_single_run(run=1)[source]
get_accuracy(index=False)[source]

Return the indexed DataFrame containing accuracy

static get_events(df)[source]
get_onset_df(index=False)[source]

Return the indexed DataFrame containing onset times

get_responses(index=False)[source]

Return the indexed DataFrame containing button responses

get_rts_df(index=False)[source]

Return the indexed DataFrame containing reaction times

static get_runs(df)[source]
static get_subjects(df)[source]
static index_accuracy(array, columns=None, subject=1, run=1, set_index=False)[source]
index_onset(array, columns=None, subject=1, run=1, task=None, set_index=False)[source]
onsets_to_fsl()[source]

This function creates a text file with a single column containing the onset times of a given condition. Such a file can be used for SPM or FSL modeling, but note that the onset times have been corrected for the deleted volumes at the beginning. So make sure you're inputting the correct functional data in these cases.

Parameters:
  • fmt (str) – format for the onset file (default = 3-column format)

  • amplitude (int, float) – amplitude for stimulus vector

  • duration (int, float) – duration of the event; overwrite possible ‘duration’ column in onsets

  • output_dir (str) – path to output name for text file(s)

  • output_base (str) – basename for output file(s); should include full path. ‘<_task-{task}>_run-{run}_ev-{ev}.txt’ will be appended

  • from_event (bool) – take the event name as specified in the onset dataframe. By default, this is true. In some cases where your events consist of float numbers, it's sometimes easier to number them consecutively. In that case, specify from_event=False

Returns:

for each subject, task, and run, a text file for all events present in the onset dataframe (if only 1 task was present, this will be omitted)

Return type:

str
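For reference, FSL's 3-column format lists onset, duration, and amplitude on each line. The sketch below writes such a file with the standard library only; the function write_three_column and the output name are hypothetical, and the code is not taken from onsets_to_fsl().

```python
import os
import tempfile


def write_three_column(onsets, fname, duration=1.0, amplitude=1.0):
    """Write one 'onset duration amplitude' line per event.

    Hypothetical helper for illustration; not part of lazyfmri.
    """
    with open(fname, "w") as f:
        for onset in onsets:
            f.write(f"{onset:.3f}\t{duration:.3f}\t{amplitude:.3f}\n")
    return fname


# write to a temporary directory so the example leaves no files behind
out_dir = tempfile.mkdtemp()
out_file = write_three_column(
    [1.5, 12.0],
    os.path.join(out_dir, "sub-001_run-1_ev-stim.txt"),
    duration=2,
)

with open(out_file) as f:
    lines = f.read().splitlines()
print(lines)
```

Each event type gets its own file, which matches the per-event output described above.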

preprocess_exptools_file(tsv_file, task=None, run=1, delete_vols=0, phase_onset=1, duration=None)[source]
process_edf_file(**kwargs)[source]
process_exptools_files()[source]
class lazyfmri.dataset.ParseEyetrackerFile[source]

Bases: SetAttributes

Class for parsing edf-files created during experiments with Exptools2. The class will read in the file, read when the experiment actually started, correct onset times for this start time and time deleted because of removing the first few volumes (to do this correctly, set the TR and deleted_first_timepoints). You can also provide a numpy array/file containing eye blinks that should be added to the onset times in real-world time (seconds). In principle, it will return a pandas DataFrame indexed by subject and run that can be easily concatenated over runs. This function relies on the naming used when programming the experiment. In the session.py file, you should have created phase_names=['iti', 'stim']; the class will use these things to parse the file.

Parameters:
  • edf_file (str, list) – path pointing to the output file of the experiment; can be a list of multiple. Ideally, all these files belong to 1 subject, otherwise it tries to write everything to 1 file, which is too much

  • subject (int) – subject number in the returned pandas DataFrame (should start with 1, …, n)

  • run (int) – run number you’d like to have the onset times for

  • low_pass_pupil_f (float, optional) – Low-pass cutoff frequency

  • high_pass_pupil_f (float, optional) – High-pass cutoff frequency

  • TR (float, optional (fMRI)) – Repetition time of experiment. Together with nr_vols, used to determine the period that needs to be extracted after onset of the first trial. Default = None

  • nr_vols (int, optional (fMRI)) – Together with TR, used to determine the period that needs to be extracted after onset of the first trial. Default = None

  • deleted_first_timepoints (int) – number of volumes to delete to correct onset times for deleted volumes

  • h5_file (str, optional) – Custom path to h5-file in which to store the complete output from edf_file. If nothing’s specified, it’ll output an eye.h5-file in the directory of the first edf-file in the list.

Examples

from lazyfmri.dataset import ParseExpToolsFile
file = 'some/path/to/exptoolsfile.tsv'
parsed_file = ParseExpToolsFile(
    file,
    subject=1,
    run=1,
    button=True
)

onsets = parsed_file.get_onset_df()

Notes

If you have self-paced experiments or want to extract the full data from the eyetracker, keep TR and nr_vols None.
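The role of TR and nr_vols can be sketched as follows. This assumes the extracted period simply spans TR * nr_vols seconds from the onset of the first trial; the function extraction_period is hypothetical and only illustrates the rule stated in the note.

```python
def extraction_period(first_trial_onset, TR=None, nr_vols=None):
    """Return (start, end) in seconds, or None to keep the full recording.

    Hypothetical helper; the exact cropping logic is an assumption.
    """
    # without TR and nr_vols there is no scan duration to crop to
    if TR is None or nr_vols is None:
        return None
    return (first_trial_onset, first_trial_onset + TR * nr_vols)


print(extraction_period(5.0, TR=1.5, nr_vols=200))
print(extraction_period(5.0))
```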

check_input(in_files)[source]
concat_dataframes()[source]
define_hdf_file()[source]
fetch_extracted_data(run, task, TR, nr_vols)[source]
fetch_eye_func_time()[source]
fetch_eye_tracker_time()[source]
fetch_relevant_info(task=None, nr_vols=None, alias=None, TR=None, save_as=None)[source]
fetch_saccades()[source]
get_base_name(subID=None, sesID=None, taskID=None)[source]
get_bids_info(file, i)[source]
get_tr(run=None)[source]
get_vols(i)[source]
plot_trace_and_heatmap(df, fname=None, screen_size=(1920, 1080), scale='screen')[source]
preprocess_edf_files()[source]
set_indices()[source]
set_task_index(df, task=None)[source]
vols(func_file)[source]
write_edf_to_hdf()[source]
class lazyfmri.dataset.ParseFuncFile[source]

Bases: ParseExpToolsFile, ParsePhysioFile

Class for parsing func-files created with Luisa’s reconstruction. It can do filtering, conversion to percent signal change, and create power spectra. It is supposed to look similar to lazyfmri.dataset.ParseExpToolsFile to make it easy to translate between the functional data and experimental data.

Parameters:
  • func_file (str, list) – path or list of paths pointing to the output file of the experiment

  • subject (int, optional) – subject number in the returned pandas DataFrame (should start with 1, …, n)

  • run (int, optional) – run number you’d like to have the onset times for

  • baseline (float, int, optional) – Duration of the baseline used to calculate the percent-signal change. This method is the default over psc_nilearn

  • baseline_units (str, optional) – Units of the baseline. Use seconds, sec, or s to imply that baseline is in seconds. We’ll convert it to volumes internally. If deleted_first_timepoints is specified, baseline will be corrected for that as well.

  • psc_nilearn (bool, optional) – Use nilearn method of calculating percent signal change. This method uses the mean of the entire timecourse, rather than the baseline period. Overwrites baseline and baseline_units. Default is False.

  • standardize (str, optional) – method of standardization (e.g., “zscore” or “psc”)

  • low_pass (bool, optional) – Temporally smooth the data. It’s a bit of a shame if this is needed

  • lb (float, optional) – lower bound for signal filtering

  • TR (float, optional) – repetition time to correct onset times for deleted volumes

  • deleted_first_timepoints (int, list, optional) – number of volumes deleted at the beginning of the timeseries. Can be specified for each individual run if func_file is a list

  • deleted_last_timepoints (int, list, optional) – number of volumes deleted at the end of the timeseries. Can be specified for each individual run if func_file is a list

  • window_size (int, optional) – size of window for rolling median and Savitzky-Golay filter

  • poly_order (int, optional) – The order of the polynomial used to fit the samples. poly_order must be less than window_size.

  • use_bids (bool, optional) – If true, we’ll read BIDS-components such as ‘sub’, ‘run’, ‘task’, etc from the input file and use those as indexers, rather than sequential 1,2,3.

  • verbose (bool, optional) – Print details to the terminal, default is False

  • retroicor (bool, optional) – WIP: implementation of retroicor, requires the specification of phys_file and phys_mat containing the output from the PhysIO-toolbox

  • n_components (int, optional) – Number of components to use for WM/CSF PCA during ICA

  • select_component (int, optional) – If verbose=True and ICA=True, we’ll create a scree-plot of the PCA components. With this flag, you can re-run this call but regress out only this particular component. [Deprecated: filter_confs is much more effective]

  • filter_confs (float, optional) – High-pass filter the components during ICA. This seems to be pretty effective. Default is 0.2 Hz.

  • save_as (str, optional) – Directory + basename for several figures that can be created during the process

  • transpose (bool, optional) – The data needs to be in the format of <time,voxels>. We’ll be trying to force the input data into this format, but sometimes this breaks. This flag serves as an opportunity to flip whatever the default is for a particular input file (e.g., gii, npy, or np.ndarray), so that your final dataframe has the format it needs to have. For gifti-input, we transpose by default. transpose=True turns this transposing off. For npy-inputs, we do NOT transpose (we assume the numpy arrays are already in <time,voxels> format). transpose=True will transpose this input.

  • report (bool) –

    Save a bunch of figures along the process, including:
    • Eyetracking fidelity

    • ICA components

    • tSNR before/after ICA

    Directory used is <save_as>/figures + basename that is derived from BIDS components.
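The difference between baseline-based percent signal change and the mean-based variant selected with psc_nilearn can be sketched as below. This is a simplified illustration with hypothetical helper names, not the code used by ParseFuncFile; it ignores the additional baseline correction for deleted_first_timepoints.

```python
from statistics import mean


def baseline_to_volumes(baseline, TR, baseline_units="seconds"):
    """Convert a baseline duration to a number of volumes (hypothetical)."""
    if baseline_units in ("seconds", "sec", "s"):
        return int(round(baseline / TR))
    # otherwise assume the baseline is already given in volumes
    return int(baseline)


def percent_signal_change(timecourse, n_baseline=None):
    """Percent change relative to the baseline mean, or to the mean of
    the whole timecourse if no baseline is given (hypothetical)."""
    ref = mean(timecourse[:n_baseline]) if n_baseline else mean(timecourse)
    return [100 * (v - ref) / ref for v in timecourse]


signal = [100.0, 100.0, 110.0, 120.0, 110.0, 120.0]
n_base = baseline_to_volumes(3.0, TR=1.5)

psc_baseline = percent_signal_change(signal, n_baseline=n_base)
psc_mean = percent_signal_change(signal)
print(psc_baseline)
print(psc_mean)
```

With a baseline, the first volumes sit at 0% and activations are expressed relative to rest; with the whole-timecourse mean, the same volumes become negative because the mean includes the activated period.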

Example

import os
from lazyfmri import utils, dataset
opj = os.path.join

func_file = utils.get_file_from_substring(
    "run-1_bold.mat",
    opj('sub-001', 'ses-1', 'func')
)
func = dataset.ParseFuncFile(
    func_file,
    subject=1,
    run=1,
    deleted_first_timepoints=100,
    deleted_last_timepoints=300
)

raw = func.get_raw(index=True)
psc = func.get_psc(index=True)
basic_qa(data, run=1, make_figure=False, save_as=None)[source]
get_data(filter_strategy=None, index=False, dtype='psc', ica=False)[source]
index_func(array, columns=None, subject=1, run=1, task=None, TR=0.105, set_index=False)[source]
preprocess_func_file(func_file, run=1, task=None, deleted_first_timepoints=0, deleted_last_timepoints=0, baseline=None, **kwargs)[source]
run_ica(task=None, save_as=None)[source]
to_nifti(func, fname=None)[source]
class lazyfmri.dataset.ParseGiftiFile[source]

Bases: object

Read a gifti-file into a dataframe similar to lazyfmri.dataset.ParseFuncFile. Also allows you to set the RepetitionTime in the metadata and rewrite the gifti-file. The final data is set as self.data, representing the numpy.ndarray-form of the gifti-file.

Parameters:
  • gifti_file (str) – Path pointing to gifti-file

  • set_tr (int, float, optional) – Set the TR in milliseconds in the metadata, by default None

  • *gii_args (dict) – Arguments passed to nb.gifti.GiftiDataArray

  • *gii_kwargs (dict) – Arguments passed to nb.gifti.GiftiDataArray

Raises:

ValueError – If gifti-file does not end with “.gii”.

Example

from lazyfmri import dataset
gii_file = "sub-01_ses-1_task-rest_space-fsnative_bold.gii"
obj = dataset.ParseGiftiFile(gii_file)
data = obj.data
get_tr(units='sec')[source]
set_metadata(tr=None)[source]
write_file(filename, tr=None, **kwargs)[source]
class lazyfmri.dataset.ParsePhysioFile[source]

Bases: object

In similar style to lazyfmri.dataset.ParseExpToolsFile and lazyfmri.dataset.ParseFuncFile, we use this class to read in physiology-files created with the PhysIO-toolbox (https://www.tnu.ethz.ch/en/software/tapas/documentations/physio-toolbox) (via call_spmphysio for instance). Using the .mat-file created with PhysIO, we can also attempt to extract heart rate variability measures. If this file cannot be found, this operation will be skipped.

Parameters:
  • physio_file (str) – path pointing to the regressor file created with PhysIO (e.g., call_spmphysio)

  • physio_mat (str) – path pointing to the .mat-file created with PhysIO (e.g., call_spmphysio)

  • subject (int) – subject number in the returned pandas DataFrame (should start with 1, …, n)

  • run (int) – run number you’d like to have the onset times for

  • TR (float) – repetition time to correct onset times for deleted volumes

  • orders (list) – list of orders used to create the regressor files (see call_spmphysio, but default = [2,2,2,]). This one is necessary to create the correct column names for the dataframe

  • deleted_first_timepoints (int, optional) – number of volumes deleted at the beginning of the timeseries

  • deleted_last_timepoints (int, optional) – number of volumes deleted at the end of the timeseries

Example

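import os
from lazyfmri import dataset
opj = os.path.join

# func_file and func come from the ParseFuncFile example
physio_file = opj(os.path.dirname(func_file), "sub-001_ses-1_task-SR_run-1_physio.txt")
physio_mat  = opj(os.path.dirname(func_file), "sub-001_ses-1_task-SR_run-1_physio.mat")
physio = dataset.ParsePhysioFile(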
    physio_file,
    physio_mat=physio_mat,
    subject=func.subject,
    run=func.run,
    TR=func.TR,
    deleted_first_timepoints=func.deleted_first_timepoints,
    deleted_last_timepoints=func.deleted_last_timepoints)

physio_df = physio.get_physio(index=False)
get_physio(index=True)[source]
preprocess_physio_file(physio_tsv, physio_mat=None, deleted_first_timepoints=0, deleted_last_timepoints=0)[source]
class lazyfmri.dataset.SetAttributes[source]

Bases: object

lazyfmri.dataset.check_input_is_list(obj, var=None, list_element=0, matcher='func_file')[source]
lazyfmri.dataset.filter_kwargs(ignore_kwargs, kwargs)[source]