I’ve encountered an issue with the ROIExtraction step in MetaboAnalystR.
I currently have 25 mzML files (1800 s / 30 min acquisition time each).
Issue: ROI extraction takes significantly longer than expected.
Attempted solution: I tested parallel processing in RStudio, but it becomes unstable during large batches.
Are there any recommended strategies to accelerate processing for large time-series datasets, or any best practices for configuring stable parallelization in MetaboAnalystR?
Thank you for the query. ROI extraction is only part of the parameter optimization step, which is typically performed on the spectra from the pooled QC samples (usually 3 to 5). The QCs are a pooled mixture of all patients’ samples and usually give better signals for training and tuning. The optimized parameters are then applied to all the spectra for peak picking. If your data do not have pooled QCs, you can choose a few reference spectra from each group for parameter tuning. Using ALL spectra is unnecessary and will be very resource- and time-consuming.
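As a rough illustration of the file selection described above, here is a minimal Python sketch that picks a small tuning subset from a list of mzML paths. It is not part of MetaboAnalystR. The function name `pick_tuning_files`, the "QC" file-name token, and the convention that each group lives in its own folder are all assumptions made for the example, so adjust them to your own naming scheme.

```python
from collections import defaultdict
from pathlib import PurePath


def pick_tuning_files(paths, qc_token="QC", max_qc=5, n_per_group=2):
    """Return a small subset of mzML paths to use for parameter tuning.

    Hypothetical helper: assumes pooled QC runs carry `qc_token` in the
    file name, and that the parent folder name identifies the group.
    """
    ordered = sorted(paths)

    # Prefer pooled QC runs when any are present (capped at max_qc files).
    qc = [p for p in ordered if qc_token.lower() in PurePath(p).name.lower()]
    if qc:
        return qc[:max_qc]

    # No QCs found: take the first few files from each group instead.
    groups = defaultdict(list)
    for p in ordered:
        groups[PurePath(p).parent.name].append(p)
    return [p for g in sorted(groups) for p in groups[g][:n_per_group]]


if __name__ == "__main__":
    files = [
        "data/case/a1.mzML", "data/case/a2.mzML", "data/case/a3.mzML",
        "data/ctrl/b1.mzML", "data/ctrl/b2.mzML", "data/ctrl/b3.mzML",
    ]
    for path in pick_tuning_files(files):
        print(path)
```

The selected files can then be copied into a separate folder and used for ROI extraction and parameter optimization, while the full set of 25 files is only processed in the later peak picking step.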