GREGoR Consortium Research Centers and Partners collect clinical, phenotype, and molecular data that is combined to form the GREGoR Consortium Dataset. The GREGoR Dataset is registered with the database of Genotypes and Phenotypes (dbGaP) and released to the scientific community via controlled access on NHGRI’s AnVIL platform.
The GREGoR Dataset
The GREGoR v1 Dataset includes family structure, phenotype, short read WGS, short read WES, and RNA-seq data. Future data releases will include additional molecular data types and information about case status and genetics findings.
GREGoR Dataset characteristics by release version
GREGoR release version | phs003047.v1.p1 |
---|---|
Release Date | Sep 1, 2023 |
Number of participants¹ | 2512 |
Consent groups² | GRU, HMB |
Genome build | GRCh38 |
GREGoR Data Model Version | 1.1 |
Size (TB) | 72.9 |

¹ WGS: Whole Genome Sequencing; WES: Whole Exome Sequencing
² GRU: General Research Use; HMB: Health/Medical/Biomedical
Phenotypes in the GREGoR Dataset
Accessing GREGoR Data
There are two main paths for researchers to apply for access to GREGoR Data released to the scientific community. The same data and the same mechanisms for interfacing with the data (e.g. the same AnVIL workspaces) are available regardless of which path is used to submit a data access request. Only the processes for application, renewal, and approval differ between these paths.
- dbGaP: GREGoR genomic data and phenotypic data are made available to the scientific community via dbGaP and in the AnVIL platform. For GREGoR Data, users should submit Data Access Requests for controlled-access data in dbGaP accession number phs003047 by following the NIH Scientific Data Sharing instructions for How to Request and Access Datasets from dbGaP.
- DUOS: GREGoR genomic data and phenotypic data are also made available to the scientific community via the Data Use Oversight System (DUOS) and in the AnVIL platform. For GREGoR Data, users should submit Data Access Requests (DARs) in DUOS by selecting datasets associated with phs003047 and by following the steps in How do I make a data access request in DUOS?. For additional information, see also What is DUOS?.
Alternatively, the GREGoR Consortium offers a Partner Membership tier to enable investigators to actively participate in and contribute to the scientific activities and mission of the GREGoR Consortium. For more information, please see the GREGoR Partner Members webpage.
Working with GREGoR Data
GREGoR Data is released to the scientific community under controlled access on NHGRI’s AnVIL platform. AnVIL uses Terra, which operates on the Google Cloud Platform (GCP), to access data and run analyses. RStudio/Posit and Jupyter notebooks are available for interactive data analysis on AnVIL.
AnVIL analysis tutorials and resources:
- Getting started on AnVIL book
- Terra tutorial on Jupyter notebooks
- Terra documentation on RStudio/Posit
- Using the R/Bioconductor “AnVIL” package
Within AnVIL, GREGoR Data is separated into workspaces by participant consent group. The data in these workspaces is structured as tables that conform to the Consortium Data Model. Conceptually, researchers can think of this structure as a relational database and interrogate the data by joining and filtering across the tables. The tables include pointers to the molecular data files (such as BAM and VCF files) that may be used as inputs for running bioinformatic workflows.
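The relational-database view described above can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module and toy data only. The table names (`participant`, `phenotype`, `aligned_reads`), the column names, the identifiers, and the `gs://example-bucket/...` paths are all illustrative assumptions, not the actual Consortium Data Model; consult the data model documentation for the real tables and fields.

```python
import sqlite3

# In-memory toy database standing in for workspace tables.
# All table names, columns, and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participant (participant_id TEXT PRIMARY KEY, family_id TEXT);
CREATE TABLE phenotype (participant_id TEXT, term_id TEXT);
CREATE TABLE aligned_reads (participant_id TEXT, file_path TEXT);

INSERT INTO participant VALUES ('P1', 'F1'), ('P2', 'F1'), ('P3', 'F2');
INSERT INTO phenotype VALUES
    ('P1', 'HP:0001250'), ('P1', 'HP:0001263'), ('P3', 'HP:0001250');
INSERT INTO aligned_reads VALUES
    ('P1', 'gs://example-bucket/P1.bam'),
    ('P2', 'gs://example-bucket/P2.bam'),
    ('P3', 'gs://example-bucket/P3.bam');
""")


def files_for_term(term_id):
    """Return sorted file pointers for participants annotated with term_id."""
    rows = conn.execute(
        """
        SELECT DISTINCT a.file_path
        FROM phenotype AS ph
        JOIN participant AS p ON p.participant_id = ph.participant_id
        JOIN aligned_reads AS a ON a.participant_id = p.participant_id
        WHERE ph.term_id = ?
        ORDER BY a.file_path
        """,
        (term_id,),
    ).fetchall()
    return [row[0] for row in rows]


# Filter on a phenotype term, join across tables, and collect file pointers
# that could then be passed to a workflow as inputs.
print(files_for_term("HP:0001250"))
```

In this sketch the filter on the phenotype table selects participants, and the joins carry that selection through to the file pointers. The same pattern applies whichever tool is used to query the workspace tables.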
To begin working with the GREGoR Data, we recommend that researchers:
- Set up an AnVIL account (which is the same as a Terra account)
- Link their AnVIL account to their eRA Commons ID
- Clone the release workspace to conduct analysis
Additional resources
- GREGoR poster from the 2023 ASHG annual meeting
- GREGoR Publications
- GREGoR AnVIL study webpage
- GREGoR dbGaP study webpage - phs003047
Please provide feedback on GREGoR Data!
We are very interested in making the GREGoR Dataset broadly useful. Please let us know how we’re doing or how we might improve the dataset by contacting the GREGoR Data Coordinating Center.