CoreFlow: a computational platform for integration, analysis and modeling of complex biological data.
Pasculescu A, Schoof EM, Creixell P, Zheng Y, Olhovsky M, Tian R, So J, Vanderlaan RD, Pawson T, Linding R, Colwill K.
A major challenge in mass spectrometry and other large-scale applications is how to handle, integrate, and model the data that is produced. Given the speed at which technology advances and the need to keep pace with biological experiments, we designed a computational platform, CoreFlow, which provides programmers with a framework to manage data in real-time. It allows users to upload data into a relational database (MySQL), and to create custom scripts in high-level languages such as R, Python, or Perl for processing, correcting and modeling this data. CoreFlow organizes these scripts into project-specific pipelines, tracks interdependencies between related tasks, and enables the generation of summary reports as well as publication-quality images. As a result, the gap between experimental and computational components of a typical large-scale biology project is reduced, decreasing the time between data generation, analysis and manuscript writing. CoreFlow is being released to the scientific community as an open-sourced software package complete with proteomics-specific examples, which include corrections for incomplete isotopic labeling of peptides (SILAC) or arginine-to-proline conversion, and modeling of multiple/selected reaction monitoring (MRM/SRM) results.