GREGoR Data Access for the Scientific Community

GREGoR Consortium Research Centers and Partners collect clinical, phenotype, and molecular data that are combined to form the GREGoR Consortium Dataset. The GREGoR Dataset is registered with the database of Genotypes and Phenotypes (dbGaP) and released to the scientific community via controlled access on NHGRI’s AnVIL platform.

The GREGoR Dataset

The GREGoR v1 Dataset includes family structure, phenotype, short-read WGS, short-read WES, and RNA-seq data. Future data releases will include additional molecular data types and information about case status and genetic findings.

GREGoR Dataset characteristics by release version

GREGoR release version         phs003047.v1.p1
Release Date                   Sep 1, 2023
Number of participants [1]     2512
     Short read WGS            2438
     Short read WES            997
     RNA-seq                   183
Consent groups [2]             GRU, HMB
Genome build                   GRCh38
GREGoR Data Model Version      1.1
Size (TB)                      72.9

[1] WGS: Whole Genome Sequencing; WES: Whole Exome Sequencing
[2] GRU: General Research Use; HMB: Health/Medical/Biomedical

Phenotypes in the GREGoR Dataset

Plot of HPO Term Counts for the first GREGoR dataset

Accessing GREGoR Data

There are two main paths for researchers to apply for access to GREGoR Data released to the scientific community. The same data and the same mechanisms for interfacing with the data (e.g. the same AnVIL workspaces) are available regardless of which path is used to submit a data access request. The paths differ only in their processes for application, renewal, and approval.

  1. dbGaP: GREGoR genomic and phenotypic data are made available to the scientific community via dbGaP and on the AnVIL platform. Users should submit Data Access Requests (DARs) for controlled-access data under dbGaP accession number phs003047 by following the NIH Scientific Data Sharing instructions, "How to Request and Access Datasets from dbGaP".
  2. DUOS: GREGoR genomic and phenotypic data are also made available to the scientific community via the Data Use Oversight System (DUOS) and on the AnVIL platform. Users should submit DARs in DUOS by selecting the datasets associated with phs003047 and following the steps in "How do I make a data access request in DUOS?". For additional information, see also "What is DUOS?".

Alternatively, the GREGoR Consortium offers a Partner Membership tier to enable investigators to actively participate in and contribute to the scientific activities and mission of the GREGoR Consortium. For more information, please see the GREGoR Partner Members webpage.

Working with GREGoR Data

GREGoR Data is released to the scientific community under controlled access on NHGRI’s AnVIL platform. AnVIL uses Terra, which operates on the Google Cloud Platform (GCP), to access data and run analyses. RStudio/Posit and Jupyter notebooks are available for interactive data analysis on AnVIL.

AnVIL analysis tutorials and resources:

Within AnVIL, GREGoR Data is separated into workspaces by participant consent group. The data in these workspaces is structured as tables that conform to the Consortium Data Model. Conceptually, researchers can think of this structure as a relational database and query it by joining and filtering across the tables. The tables include pointers to the molecular data files (such as BAM and VCF files) that may be used as inputs for running bioinformatic workflows.
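
As a rough illustration of the relational view described above, the following Python sketch loads three toy tables into an in-memory SQLite database, then joins and filters them to find file pointers for participants with a given HPO term. The table names, column names, IDs, and bucket paths are all hypothetical stand-ins chosen for this example; they are not taken from the GREGoR Data Model, and real workspace tables would need to be exported or read through AnVIL/Terra tooling first.

```python
import sqlite3

# Toy stand-ins for data-model tables. Every table name, column name, ID,
# and file path here is a made-up assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participant (participant_id TEXT PRIMARY KEY, family_id TEXT);
CREATE TABLE phenotype (participant_id TEXT, term_id TEXT);
CREATE TABLE aligned_read (participant_id TEXT, file_path TEXT);

INSERT INTO participant VALUES ('P1', 'F1'), ('P2', 'F1'), ('P3', 'F2');
INSERT INTO phenotype VALUES
    ('P1', 'HP:0001250'), ('P3', 'HP:0001250'), ('P2', 'HP:0000252');
INSERT INTO aligned_read VALUES
    ('P1', 'gs://example-bucket/P1.bam'),
    ('P2', 'gs://example-bucket/P2.bam');
""")


def files_for_term(conn, term_id):
    """Return (participant_id, family_id, file_path) rows for one HPO term.

    The inner joins drop participants that have the term but no file
    pointer, so only participants usable as workflow inputs are returned.
    """
    query = """
        SELECT p.participant_id, p.family_id, a.file_path
        FROM participant AS p
        JOIN phenotype AS ph ON ph.participant_id = p.participant_id
        JOIN aligned_read AS a ON a.participant_id = p.participant_id
        WHERE ph.term_id = ?
        ORDER BY p.participant_id
    """
    return conn.execute(query, (term_id,)).fetchall()


result = files_for_term(conn, "HP:0001250")
print(result)
```

In this toy data, participant P3 carries the queried term but has no file pointer, so the inner join leaves P3 out; a LEFT JOIN could be used instead if participants without files should still be listed.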

To begin working with GREGoR Data, we recommend that researchers:

  1. Set up an AnVIL account (which is the same as a Terra account)
  2. Link their AnVIL account to their eRA Commons ID
  3. Clone the release workspace to conduct analysis

Additional resources

Please provide feedback on GREGoR Data!

We are very interested in making the GREGoR Dataset broadly useful. Please let us know how we’re doing or how we might improve the dataset by contacting the GREGoR Data Coordinating Center.