Please note the final planned GREGoR website downtime in April:

  • Friday, April 19 at 5 pm PT to Monday, April 22 at 8 am PT (63 hours)

During this time the GREGoR website will be down and users will be unable to log into the DCC's AnVIL management web app to link their AnVIL accounts. Please contact the DCC at gregorconsortium@uw.edu if you have questions or need help during the downtime.

GREGoR Data Access for the Scientific Community

GREGoR Consortium Research Centers and Partners collect clinical, phenotype, and molecular data that is combined to form the GREGoR Dataset. The GREGoR Dataset is registered with the database of Genotypes and Phenotypes (dbGaP) and released to the scientific community via controlled access on NHGRI’s AnVIL platform. The dbGaP study webpage for GREGoR (phs003047) includes additional study design and enrollment information.

The GREGoR Dataset

The GREGoR v1 Dataset includes family structure, phenotype, short read WGS, short read WES, and short read RNA-seq data. Future data releases will include additional molecular data types and information about case status and genetic findings.

GREGoR Dataset characteristics by release version

  GREGoR release version           phs003047.v1.p1
  Release date                     Sep 1, 2023
  Number of participants           2,512
  Number of families               990
  Consent groups [1]               GRU, HMB
  Available experimental data [2]
      Short read WGS               1,441
      Short read WES               997
      Short read RNA-seq           183
  Size (TB)                        65.7
  GREGoR Data Model version        1.1
  Genome build                     GRCh38
  Methods documentation            View pdf
  Genomic variant site files       Download location

[1] GRU: General Research Use; HMB: Health/Medical/Biomedical
[2] WGS: Whole Genome Sequencing; WES: Whole Exome Sequencing
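
For readers who want to track these figures across releases, the v1 characteristics listed above can be held in a small record. The following is only an illustrative Python sketch: the class name, field names, and helper functions are hypothetical and are not part of the GREGoR Data Model or any AnVIL API. The values are copied from the v1 release characteristics.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ReleaseSummary:
    """Hypothetical container for one release's summary figures."""
    accession: str
    participants: int
    families: int
    size_tb: float
    # Maps a data type label to the count reported for that data type.
    experiments: Dict[str, int] = field(default_factory=dict)


# Values copied from the v1 release characteristics.
V1 = ReleaseSummary(
    accession="phs003047.v1.p1",
    participants=2512,
    families=990,
    size_tb=65.7,
    experiments={"WGS": 1441, "WES": 997, "RNA-seq": 183},
)


def participants_per_family(release: ReleaseSummary) -> float:
    """Mean number of enrolled participants per family."""
    return release.participants / release.families


def total_experiment_count(release: ReleaseSummary) -> int:
    """Sum of the reported counts across all data types."""
    return sum(release.experiments.values())


if __name__ == "__main__":
    print(f"{V1.accession}: {participants_per_family(V1):.2f} participants per family")
    print(f"{V1.accession}: {total_experiment_count(V1)} experimental data entries in total")
```

The counts per data type are summed as reported; because one participant may have more than one data type, the total should not be read as a number of distinct participants.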

Phenotypes in the GREGoR Dataset

Plot of HPO Term Counts for the first GREGoR dataset

Accessing GREGoR Data

Researchers from the scientific community can apply for access to controlled-access GREGoR data stored on the AnVIL platform in one of two ways, described briefly below. In both cases, applicants should submit Data Access Requests by selecting datasets associated with dbGaP accession number phs003047.

Note: regardless of which way you apply, access will be granted to the same data in the same AnVIL workspaces. Only the processes for application, renewal, and approval differ.

  1. dbGaP: Follow the NIH Scientific Data Sharing instructions for How to Request and Access Datasets from dbGaP.
  2. Data Use Oversight System (DUOS): Follow the steps in "How do I make a data access request in DUOS?" See also "What is DUOS?"

Alternatively, the GREGoR Consortium offers a Partner Membership opportunity to enable investigators to actively participate in and contribute to the scientific activities and mission of the GREGoR Consortium. For more information, please see the GREGoR Partner Members webpage.

Working with GREGoR Data

For basic information on GREGoR data on AnVIL, including links to tutorials and related resources provided by AnVIL as well as the GREGoR Consortium, please refer to AnVIL Resources.

Future Data Releases

The GREGoR Consortium continues to aggregate participant, family, and phenotype data, as well as short-read DNA sequencing data, to share with the scientific community. GREGoR Research Centers and Partner Members are expanding the GREGoR Data Model to support additional data types, including short-read RNA-Seq, ATAC-Seq, and long-read data generated on Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) platforms.

The GREGoR Data Coordinating Center is harmonizing short-read whole genome sequencing data using open-source workflows on the AnVIL platform: DRAGEN-GATK for data pre-processing and initial variant calling, and the Genomic Variant Store workflow for a joint callset of single-nucleotide variants and short insertions and deletions. These harmonized data will be included in a future release.

The GREGoR Dataset will continue to expand as additional participant, family, phenotype, and other data types become available for analysis. We anticipate that periodic releases will continue throughout the life of the Consortium and that released data will remain in NIH data repositories as a valuable scientific resource.

Please provide feedback on the GREGoR Dataset!

We are very interested in making the GREGoR Dataset broadly useful. Please let us know how we’re doing or how we might improve the dataset by contacting the GREGoR Data Coordinating Center (select “Data” from the Topic dropdown menu).