Word Rate Feature Tutorial

This tutorial shows how to train encoding models using word rate features with the LeBel assembly. Word rate features are simple but effective baselines that measure the rate of word presentation.

Overview

Word rate features capture the temporal dynamics of language presentation by measuring how many words are presented per time unit. This is one of the simplest but an effective feature for brain encoding models.

Key Components

Step-by-Step Tutorial

1. Load the Assembly

from encoding.assembly.assembly_loader import load_assembly

# Load the pre-packaged LeBel assembly
assembly = load_assembly("assembly_lebel_uts03.pkl")

2. Create Word Rate Feature Extractor

from encoding.features.factory import FeatureExtractorFactory

extractor = FeatureExtractorFactory.create_extractor(
    modality="wordrate",
    model_name="wordrate",
    config={},
    cache_dir="cache",
)

3. Set Up Downsampler and Model

from encoding.downsample.downsampling import Downsampler
from encoding.models.nested_cv import NestedCVModel

downsampler = Downsampler()
model = NestedCVModel(model_name="ridge_regression")

4. Configure Training Parameters

# FIR delays for hemodynamic response modeling
fir_delays = [1, 2, 3, 4]

# Trimming configuration for LeBel dataset
trimming_config = {
    "train_features_start": 10,
    "train_features_end": -5,
    "train_targets_start": 0,
    "train_targets_end": None,
    "test_features_start": 50,
    "test_features_end": -5,
    "test_targets_start": 40,
    "test_targets_end": None,
}

downsample_config = {}

5. Create and Run Trainer

from encoding.trainer import AbstractTrainer

trainer = AbstractTrainer(
    assembly=assembly,
    feature_extractors=[extractor],
    downsampler=downsampler,
    model=model,
    fir_delays=fir_delays,
    trimming_config=trimming_config,
    use_train_test_split=True,
    logger_backend="wandb",
    wandb_project_name="lebel-wordrate",
    dataset_type="lebel",
    results_dir="results",
    downsample_config=downsample_config,
)

metrics = trainer.train()
print(f"Median correlation: {metrics.get('median_score', float('nan')):.4f}")

Understanding Word Rate Features

  1. Counting words per TR: The assembly pre-computes word rates for each TR
  2. No additional processing needed: Word rates are already aligned with brain data
  3. Simple but effective: Captures temporal dynamics of language presentation

The word rate extractor simply returns the pre-computed word rates from the assembly, making it the fastest feature type to compute.

Key Parameters

Training Configuration