This tutorial shows how to train encoding models using word rate features with the LeBel assembly. Word rate features are simple but effective baselines that measure the rate of word presentation.
Word rate features capture the temporal dynamics of language presentation by measuring how many words are presented per time unit. This is one of the simplest but an effective feature for brain encoding models.
from encoding.assembly.assembly_loader import load_assembly
# Load the pre-packaged LeBel assembly
assembly = load_assembly("assembly_lebel_uts03.pkl")
from encoding.features.factory import FeatureExtractorFactory
extractor = FeatureExtractorFactory.create_extractor(
modality="wordrate",
model_name="wordrate",
config={},
cache_dir="cache",
)
from encoding.downsample.downsampling import Downsampler
from encoding.models.nested_cv import NestedCVModel
downsampler = Downsampler()
model = NestedCVModel(model_name="ridge_regression")
# FIR delays for hemodynamic response modeling
fir_delays = [1, 2, 3, 4]
# Trimming configuration for LeBel dataset
trimming_config = {
"train_features_start": 10,
"train_features_end": -5,
"train_targets_start": 0,
"train_targets_end": None,
"test_features_start": 50,
"test_features_end": -5,
"test_targets_start": 40,
"test_targets_end": None,
}
downsample_config = {}
from encoding.trainer import AbstractTrainer
trainer = AbstractTrainer(
assembly=assembly,
feature_extractors=[extractor],
downsampler=downsampler,
model=model,
fir_delays=fir_delays,
trimming_config=trimming_config,
use_train_test_split=True,
logger_backend="wandb",
wandb_project_name="lebel-wordrate",
dataset_type="lebel",
results_dir="results",
downsample_config=downsample_config,
)
metrics = trainer.train()
print(f"Median correlation: {metrics.get('median_score', float('nan')):.4f}")
The word rate extractor simply returns the pre-computed word rates from the assembly, making it the fastest feature type to compute.
modality
: "wordrate" - specifies the feature typemodel_name
: "wordrate" - identifier for the extractorconfig
: {} - no additional configuration neededcache_dir
: "cache" - directory for caching (though word rates don't need caching)fir_delays
: [1, 2, 3, 4] - temporal delays to account for hemodynamic responsetrimming_config
: LeBel-specific trimming to avoid boundary effects