CAP 2016 – Genomic Variant Repository

A Novel Big Data Platform to Manage Genomic Variants in the Clinical Laboratory


Context: The role of clinical next-generation sequencing continues to grow, and a key goal for many institutions is to bring genomic results to the bedside to improve patient outcomes. Although standards for the exchange of clinical genomic data have begun to emerge, molecular results often reside in disparate data silos, which makes it difficult to assemble a complete history of genomic testing for use in clinical care. Design: Our institution performs clinical next-generation sequencing at three separate local laboratories. This project included the development of a JavaScript Object Notation (JSON)-formatted data model, named JSON Variant Format (JVF), and a Hadoop-based platform (Hortonworks, Santa Clara, CA, USA) for centralized data management. Results: After interpretation, Variant Call Format (VCF) data are merged with clinical annotations in the JVF model. This format, based on the existing VCF, is readily serializable for Web Service integration. The annotated JVF data are submitted to a Web Service interface for validation and then placed in a distributed file store in Hadoop for long-term storage. An Elasticsearch index (Elastic, Mountain View, CA, USA) is also used for real-time queries and analysis. Conclusions: With increased interest in precision medicine initiatives, genomic data standards that are compatible with modern technologies are needed for efficient data exchange. This novel platform and data format allow for the storage of large volumes of standardized genomic interpretations from separate laboratories while providing the sub-second queries needed for clinical use.
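
The abstract does not publish the JVF schema, so the following is only a minimal sketch of the merge, validate, and serialize steps described in the Results. Every field name (chrom, pos, ref, alt, gene, interpretation, laboratory), the required-field set, and the function names are hypothetical illustrations and not the actual JVF definition. The only part taken from a published standard is the order of the eight fixed VCF columns.

```python
import json

# Fixed, tab-delimited columns of a VCF data line (per the VCF specification).
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

# Hypothetical set of fields a JVF-like record must carry; not the real schema.
REQUIRED_FIELDS = {"chrom", "pos", "ref", "alt", "interpretation", "laboratory"}


def parse_vcf_line(line):
    """Parse the eight fixed columns of one VCF data line into a dict."""
    values = line.rstrip("\n").split("\t")[:8]
    if len(values) < 8:
        raise ValueError("expected at least 8 tab-delimited VCF columns")
    row = dict(zip(VCF_COLUMNS, values))
    return {
        "chrom": row["CHROM"],
        "pos": int(row["POS"]),
        "id": None if row["ID"] == "." else row["ID"],  # "." means missing
        "ref": row["REF"],
        "alt": row["ALT"].split(","),  # ALT may list several alleles
    }


def to_jvf(vcf_line, annotation):
    """Merge a parsed VCF line with clinical annotations (assumed layout)."""
    record = dict(annotation)
    record.update(parse_vcf_line(vcf_line))  # variant fields take precedence
    return record


def missing_fields(record):
    """Return the required fields absent from a record, sorted by name."""
    return sorted(REQUIRED_FIELDS - record.keys())


if __name__ == "__main__":
    # Placeholder values for illustration only; not real patient or variant data.
    line = "1\t12345\t.\tA\tG\t50\tPASS\tDP=100"
    annotation = {
        "gene": "GENE1",
        "interpretation": "uncertain significance",
        "laboratory": "lab-A",
    }
    record = to_jvf(line, annotation)
    if not missing_fields(record):
        print(json.dumps(record, indent=2))  # payload a web service could accept
```

Because the record is a plain dictionary of strings, numbers, and lists, it serializes with the standard json module and could be posted to a validation service or indexed as a document without a custom encoder.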

