| Field | Value |
|---|---|
| Title | Linking big biomedical datasets to modular analysis with Portable Encapsulated Projects. |
| Publication Type | Journal Article |
| Year of Publication | 2021 |
| Authors | Sheffield NC, Stolarczyk M, Reuter VP, Rendeiro AF |
| Journal | Gigascience |
| Volume | 10 |
| Issue | 12 |
| Date Published | 2021 Dec 06 |
| ISSN | 2047-217X |
| Keywords | Computational Biology, Documentation, Metadata, Software |
| Abstract | BACKGROUND: Organizing and annotating biological sample data is critical in data-intensive bioinformatics. Unfortunately, metadata formats from a data provider are often incompatible with requirements of a processing tool. There is no broadly accepted standard to organize metadata across biological projects and bioinformatics tools, restricting the portability and reusability of both annotated datasets and analysis software. RESULTS: To address this, we present the Portable Encapsulated Project (PEP) specification, a formal specification for biological sample metadata structure. The PEP specification accommodates typical features of data-intensive bioinformatics projects with many biological samples. In addition to standardization, the PEP specification provides descriptors and modifiers for project-level and sample-level metadata, which improve portability across both computing environments and data processing tools. PEPs include a schema validator framework, allowing formal definition of required metadata attributes for data analysis broadly. We have implemented packages for reading PEPs in both Python and R to provide a language-agnostic interface for organizing project metadata. CONCLUSIONS: The PEP specification is an important step toward unifying data annotation and processing tools in data-intensive biological research projects. Links to tools and documentation are available at http://pep.databio.org/. |
| DOI | 10.1093/gigascience/giab077 |
| Alternate Journal | Gigascience |
| PubMed ID | 34890448 |
| PubMed Central ID | PMC8673555 |
| Grant List | R35 GM128636 / GM / NIGMS NIH HHS / United States |