GREGoR Data for the Scientific Community

Look for variants in the GREGoR Dataset at the GREGoR Variant Browser

GREGoR Consortium Research Centers and Partners collect clinical, phenotype, and molecular data that is combined to form the GREGoR Dataset. The GREGoR Dataset is registered with the database of Genotypes and Phenotypes (dbGaP) and released to the scientific community via controlled access on NHGRI’s AnVIL platform. The dbGaP study webpage for GREGoR (phs003047) includes additional study design and enrollment information.

The GREGoR Dataset

The GREGoR Dataset includes family structure, phenotype, case status, and genetic findings, along with short read whole exome sequencing (WES), short read whole genome sequencing (WGS), long read WGS, and short read RNA-seq data. Future data releases will include additional molecular data types and information.

To see what's new in a release, refer to the Release Notes and the characteristics-by-release table below.

GREGoR Dataset characteristics by release version

| GREGoR release version | R01 | R02 |
|---|---|---|
| dbGaP accession | phs003047 | phs003047 |
| Release date | September 2023 | November 2024 |
| Number of participants | 2,512 | 7,394 |
| Number of families | 990 | 3,059 |
| Consent groups [1] | GRU, HMB | GRU, HMB |
| Available experimental data [2] | | |
| Short read WGS | 1,441 | 5,180 |
| Short read WES | 997 | 2,242 |
| Long read WGS | 0 | 214 |
| Short read RNA-seq | 183 | 539 |
| Size (TB) | 65.7 | |
| GREGoR Data Model version | 1.1 | 1.6 |
| Genome build | GRCh38 | GRCh38 |
| Methods documentation | View PDF | View PDF |
| Genomic variant sites file | Download location | Download location |

[1] GRU: General Research Use; HMB: Health/Medical/Biomedical
[2] WGS: Whole Genome Sequencing; WES: Whole Exome Sequencing
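
To make the change between releases easier to see, the counts can be compared directly. The following is a minimal sketch only: the numbers are transcribed from the characteristics-by-release table above, while the dictionary layout, the key names, and the `release_delta` helper are hypothetical and are not part of any GREGoR or AnVIL tooling.

```python
# Counts transcribed from the characteristics-by-release table.
# The key names are illustrative; they are not GREGoR Data Model fields.
RELEASES = {
    "R01": {
        "participants": 2512,
        "families": 990,
        "short_read_wgs": 1441,
        "short_read_wes": 997,
        "long_read_wgs": 0,
        "short_read_rnaseq": 183,
    },
    "R02": {
        "participants": 7394,
        "families": 3059,
        "short_read_wgs": 5180,
        "short_read_wes": 2242,
        "long_read_wgs": 214,
        "short_read_rnaseq": 539,
    },
}


def release_delta(old, new):
    """Return {field: new - old} for every field present in both releases."""
    return {field: new[field] - old[field] for field in old if field in new}


delta = release_delta(RELEASES["R01"], RELEASES["R02"])
for field, added in delta.items():
    print(f"{field}: +{added:,}")
```

Under these assumptions, the comparison shows 4,882 more participants and 2,069 more families in R02 than in R01, and long read WGS appearing for the first time with 214 entries.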

Phenotypes in the GREGoR Dataset

[Figure: Bar chart of HPO term counts for the second GREGoR dataset]

Accessing GREGoR data

Researchers from the scientific community can apply for controlled-access GREGoR data stored on the AnVIL platform in one of two ways, described briefly below. In both cases, applicants should submit Data Access Requests by selecting the datasets associated with dbGaP accession number phs003047.

Note: Whichever route you choose, access is granted to the same data in the same AnVIL workspaces. Only the application, renewal, and approval processes differ.

  1. dbGaP: Follow the NIH Scientific Data Sharing instructions for How to Request and Access Datasets from dbGaP.
  2. Data Use Oversight System (DUOS): Follow the steps in How do I make a data access request in DUOS?. See also What is DUOS?.

Alternatively, the GREGoR Consortium offers a Partner Membership opportunity to enable investigators to actively participate in and contribute to the scientific activities and mission of the GREGoR Consortium. For more information, please see the GREGoR Partner Members webpage.

Working with GREGoR data

AnVIL is the primary repository for GREGoR data. AnVIL provides controlled-access data storage and a cloud-based analysis environment for researchers. Links to AnVIL platform tutorials and related resources are available at AnVIL Resources.

The GREGoR Consortium also regularly contributes data to several resources that support rare disease research. These are described in Tools and Resources to interact with GREGoR data.

Future Data Releases

The GREGoR Consortium continues to collect and aggregate participant, family, and phenotype data, as well as a range of molecular data, to support rare disease research and to share with the scientific community. GREGoR Research Centers and Partner Members are expanding the GREGoR Data Model to support additional data types, including short-read RNA-Seq, ATAC-Seq, long-read data generated on Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) platforms, and more.

Additionally, the GREGoR Data Coordinating Center is harmonizing short-read whole genome sequencing data using open-source workflows on the AnVIL platform: DRAGEN-GATK for data pre-processing and initial variant calling, and the Genomic Variant Store workflow for a joint callset of single-nucleotide variants and short insertions and deletions. These harmonized data are also included in the released dataset.

The GREGoR Dataset will continue to expand as additional participant, family, phenotype, and other data types become available for analysis. We anticipate that periodic releases will continue throughout the life of the Consortium and that the released data will remain in NIH data repositories as a valuable scientific resource.

Additional resources

Please provide feedback on the GREGoR Dataset!

We are very interested in making the GREGoR Dataset broadly useful. Please let us know how we’re doing or how we might improve the dataset by contacting the GREGoR Data Coordinating Center (select “Data” from the Topic dropdown menu).
