GREGoR Consortium Research Centers and Partners collect clinical, phenotype, and molecular data that is combined to form the GREGoR Dataset. The GREGoR Dataset is registered with the database of Genotypes and Phenotypes (dbGaP) and released to the scientific community via controlled access on NHGRI’s AnVIL platform. The dbGaP study webpage for GREGoR (phs003047) includes additional study design and enrollment information.
The GREGoR Dataset
The GREGoR Dataset includes family structure, phenotype, case status, and genetic findings data, along with short-read whole exome sequencing (WES), short-read whole genome sequencing (WGS), long-read WGS, and short-read RNA-seq data. Future data releases will include additional molecular data types and information.
To see what's new in a release, refer to the Release Notes and the characteristics-by-release table below.
GREGoR Dataset characteristics by release version
GREGoR release version | R01 | R02 |
---|---|---|
dbGaP Accession | phs003047 | phs003047 |
Release Date | September 2023 | November 2024 |
Number of participants | 2,512 | 7,394 |
Number of families | 990 | 3,059 |
Consent groups¹ | GRU, HMB | GRU, HMB |
Available experimental data² | | |
Short-read WGS | 1,441 | 5,182 |
Short-read WES | 997 | 2,242 |
Long-read WGS | 0 | 214 |
Short-read RNA-seq | 183 | 539 |
Size (TB) | 61.8 | 149.48 |
GREGoR Data Model Version | 1.1 | 1.6 |
Genome build | GRCh38 | GRCh38 |
Methods documentation | View pdf | View pdf |
Genomic variant sites file | Download location |
¹ GRU: General Research Use; HMB: Health/Medical/Biomedical
² WGS: Whole Genome Sequencing; WES: Whole Exome Sequencing
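To make the growth between releases easier to see, the counts in the table above can be compared directly. The following is a minimal sketch, not an official GREGoR tool: the dictionary keys and the `release_delta` helper are names invented here for illustration, and the numbers are copied by hand from the table.

```python
# Counts copied from the characteristics-by-release table above.
# Key names are illustrative only; they are not GREGoR Data Model fields.
RELEASES = {
    "R01": {
        "participants": 2512,
        "families": 990,
        "short_read_wgs": 1441,
        "short_read_wes": 997,
        "long_read_wgs": 0,
        "short_read_rnaseq": 183,
        "size_tb": 61.8,
    },
    "R02": {
        "participants": 7394,
        "families": 3059,
        "short_read_wgs": 5182,
        "short_read_wes": 2242,
        "long_read_wgs": 214,
        "short_read_rnaseq": 539,
        "size_tb": 149.48,
    },
}


def release_delta(old, new):
    """Return {field: (absolute change, fold change)} between two releases.

    The fold change is None when the older release has a count of zero,
    because the ratio is undefined in that case.
    """
    deltas = {}
    for field, new_value in new.items():
        old_value = old[field]
        fold = new_value / old_value if old_value else None
        deltas[field] = (new_value - old_value, fold)
    return deltas


if __name__ == "__main__":
    for field, (diff, fold) in release_delta(RELEASES["R01"], RELEASES["R02"]).items():
        fold_text = f"{fold:.2f}x" if fold is not None else "n/a (no R01 data)"
        print(f"{field:18s} change: {diff:g}  fold: {fold_text}")
```

Long-read WGS has no R01 data, so the sketch reports its fold change as undefined rather than dividing by zero.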
Phenotypes in the GREGoR Dataset
Accessing GREGoR data
Researchers from the scientific community can apply for controlled-access GREGoR data stored on the AnVIL platform in one of two ways, described briefly below. In both cases, applicants should submit Data Access Requests by selecting datasets associated with dbGaP accession number phs003047.
Note: regardless of which way you apply, access will be granted to the same data in the same AnVIL workspaces. Only the application, renewal, and approval processes differ.
- dbGaP: Follow the NIH Scientific Data Sharing instructions for How to Request and Access Datasets from dbGaP.
- Data Use Oversight System (DUOS): Follow the steps in "How do I make a data access request in DUOS?" See also "What is DUOS?"
Alternatively, the GREGoR Consortium offers a Partner Membership opportunity to enable investigators to actively participate in and contribute to the scientific activities and mission of the GREGoR Consortium. For more information, please see the GREGoR Partner Members webpage.
Working with GREGoR data
AnVIL is the primary repository for GREGoR data. AnVIL provides controlled-access data storage and a cloud-based analysis environment for researchers. Links to AnVIL platform tutorials and related resources are available at AnVIL Resources.
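As a rough illustration of a first analysis step, the sketch below counts participants per family from a tab-separated table. It assumes a participant table has already been exported from a workspace to TSV. The column names `participant_id` and `family_id` and the inline rows are hypothetical placeholders, not real GREGoR data; check the GREGoR Data Model version for your release to find the actual column names.

```python
import csv
import io
from collections import Counter

# Hypothetical rows standing in for an exported participant table.
# These identifiers are made up and do not come from the GREGoR Dataset.
EXAMPLE_TSV = (
    "participant_id\tfamily_id\n"
    "P1\tF1\n"
    "P2\tF1\n"
    "P3\tF2\n"
)


def family_sizes(tsv_handle, family_column="family_id"):
    """Count rows (participants) per family in a tab-separated table.

    `family_column` is an assumed column name; pass the real one if it differs.
    """
    reader = csv.DictReader(tsv_handle, delimiter="\t")
    return Counter(row[family_column] for row in reader)


# With a real export, replace io.StringIO(...) with open("your_file.tsv", newline="").
sizes = family_sizes(io.StringIO(EXAMPLE_TSV))
for family_id, n_participants in sorted(sizes.items()):
    print(family_id, n_participants)
```

The example uses only the Python standard library so that it runs in any notebook or terminal without extra packages.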
The GREGoR Consortium also regularly contributes data to several resources that support rare disease research, which are described in Tools and Resources to interact with GREGoR data.
Future Data Releases
The GREGoR Consortium continues to collect and aggregate participant, family, and phenotype data, as well as a range of molecular data, to support rare disease research and to share with the scientific community. GREGoR Research Centers and Partner Members are expanding the GREGoR Data Model to support additional data types, including short-read RNA-seq, ATAC-seq, long-read data generated by Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) platforms, and more.
Additionally, the GREGoR Data Coordinating Center is harmonizing short-read whole genome sequencing data using open-source workflows on the AnVIL platform (DRAGEN-GATK for data pre-processing and initial variant calling; the Genomic Variant Store workflow for a joint callset of single-nucleotide variants and short insertions and deletions). These harmonized data are also included in the released dataset.
The GREGoR Dataset will continue to expand with additional participant, family, phenotype, and other data types available for analysis. We anticipate that periodic releases will continue throughout the life of the Consortium and that the data will remain in NIH data repositories as a valuable scientific resource.
Additional resources
- GREGoR poster from the 2023 ASHG annual meeting
- GREGoR Publications
- GREGoR AnVIL study webpage
- GREGoR dbGaP study webpage - phs003047
Please provide feedback on the GREGoR Dataset!
We are very interested in making the GREGoR Dataset broadly useful. Please let us know how we’re doing or how we might improve the dataset by contacting the GREGoR Data Coordinating Center (select “Data” from the Topic dropdown menu).