GREGoR Data Access for the Scientific Community

Look for variants in the GREGoR Dataset at the GREGoR Variant Browser.

GREGoR Consortium Research Centers and Partners collect clinical, phenotype, and molecular data, which are combined to form the GREGoR Dataset. The GREGoR Dataset is registered with the database of Genotypes and Phenotypes (dbGaP) and released to the scientific community via controlled access on NHGRI’s AnVIL platform. The dbGaP study webpage for GREGoR (phs003047) includes additional study design and enrollment information.

The GREGoR Dataset

The GREGoR v1 Dataset includes family structure, phenotype, short-read WGS, short-read WES, and short-read RNA-seq data. Future data releases will include additional molecular data types and information about case status and genetic findings.

GREGoR Dataset characteristics by release version

GREGoR release version: phs003047.v1.p1
Release date: Sep 1, 2023
Number of participants: 2,512
Number of families: 990
Consent groups [1]: GRU, HMB
Available experimental data [2]:
     Short-read WGS: 1,441
     Short-read WES: 997
     Short-read RNA-seq: 183
Size (TB): 65.7
GREGoR Data Model version: 1.1
Genome build: GRCh38
Methods documentation: View pdf
Genomic variant site files: Download location

[1] GRU: General Research Use; HMB: Health/Medical/Biomedical
[2] WGS: Whole Genome Sequencing; WES: Whole Exome Sequencing
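
The release version listed above, phs003047.v1.p1, follows the dbGaP accession convention: a study identifier (phs plus six digits), a study version (v), and a participant-set version (p). For readers who track releases in scripts, the following is a minimal Python sketch of splitting such a string into its parts. The names parse_accession and StudyAccession are illustrative assumptions, not part of any GREGoR, dbGaP, or AnVIL tooling.

```python
import re
from typing import NamedTuple

# Pattern for a versioned dbGaP study accession such as "phs003047.v1.p1":
# "phs" + six digits, ".v" + study version, ".p" + participant-set version.
_ACCESSION_RE = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)$")


class StudyAccession(NamedTuple):
    """Parts of a versioned accession (hypothetical container for this sketch)."""
    study_id: str         # e.g. "phs003047"
    version: int          # study version
    participant_set: int  # participant-set version


def parse_accession(text: str) -> StudyAccession:
    """Split a versioned dbGaP accession into its parts (hypothetical helper)."""
    match = _ACCESSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a versioned dbGaP study accession: {text!r}")
    study_id, version, participant_set = match.groups()
    return StudyAccession(study_id, int(version), int(participant_set))


release = parse_accession("phs003047.v1.p1")
print(release.study_id, release.version, release.participant_set)
# prints: phs003047 1 1
```

An unversioned identifier such as phs003047 is deliberately rejected with a ValueError, so that a script cannot silently confuse a study with one specific release of it.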

Phenotypes in the GREGoR Dataset

Plot of HPO Term Counts for the first GREGoR dataset

Accessing GREGoR data

Researchers from the scientific community can apply for controlled-access GREGoR data stored on the AnVIL platform in one of two ways, described briefly below. In both cases, applicants should submit Data Access Requests by selecting datasets associated with dbGaP accession number phs003047.

Note: regardless of which way you apply, access will be granted to the same data in the same AnVIL workspaces. Only the application, renewal, and approval processes differ.

  1. dbGaP: Follow the NIH Scientific Data Sharing instructions for How to Request and Access Datasets from dbGaP.
  2. Data Use Oversight System (DUOS): Follow the steps in "How do I make a data access request in DUOS?" See also "What is DUOS?"

Alternatively, the GREGoR Consortium offers a Partner Membership opportunity to enable investigators to actively participate in and contribute to the scientific activities and mission of the GREGoR Consortium. For more information, please see the GREGoR Partner Members webpage.

Working with GREGoR data

AnVIL is the primary repository for GREGoR data. AnVIL provides controlled-access data storage and a cloud-based analysis environment for researchers. Links to AnVIL platform tutorials and related resources are available at AnVIL Resources.

The GREGoR Consortium also regularly contributes data to several resources that support rare disease research, which are described in Tools and Resources to interact with GREGoR data.

Future Data Releases

The GREGoR Consortium continues to aggregate participant, family, and phenotype data, as well as short-read DNA sequencing data, to share with the scientific community. GREGoR Research Centers and Partner Members are expanding the GREGoR Data Model to support additional data types, including short-read RNA-Seq, ATAC-Seq, and long-read data generated on Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) platforms.

The GREGoR Data Coordinating Center is harmonizing short-read whole genome sequencing data using open-source workflows on the AnVIL platform: DRAGEN-GATK for data pre-processing and initial variant calling, and the Genomic Variant Store workflow for a joint callset of single-nucleotide variants and short insertions and deletions. These harmonized data will be included in a future release.

The GREGoR Dataset will continue to expand with additional participant, family, phenotype, and other data types available for analysis. We anticipate that periodic releases will continue throughout the life of the Consortium and that the released data will remain in NIH data repositories as a valuable scientific resource.

Additional resources

Please provide feedback on the GREGoR Dataset!

We are very interested in making the GREGoR Dataset broadly useful. Please let us know how we’re doing or how we might improve the dataset by contacting the GREGoR Data Coordinating Center (select “Data” from the Topic dropdown menu).
