GREGoR Consortium Research Centers and Partners collect clinical, phenotype, and molecular data that is combined to form the GREGoR Dataset. The GREGoR Dataset is registered with the database of Genotypes and Phenotypes (dbGaP) and released to the scientific community via controlled access on NHGRI’s AnVIL platform. The dbGaP study webpage for GREGoR (phs003047) includes additional study design and enrollment information.
The GREGoR Dataset
The GREGoR v1 Dataset includes family structure, phenotype, short read WGS, short read WES, and short read RNA-seq data. Future data releases will include additional molecular data types and information about case status and genetic findings.
GREGoR Dataset characteristics by release version
GREGoR release version | phs003047.v1.p1 |
---|---|
Release Date | Sep 1, 2023 |
Number of participants | 2,512 |
Number of families | 990 |
Consent groups¹ | GRU, HMB |
Available experimental data² | |
Short read WGS | 1,441 |
Short read WES | 997 |
Short read RNA-seq | 183 |
Size (TB) | 65.7 |
GREGoR Data Model Version | 1.1 |
Genome build | GRCh38 |
Methods documentation | View pdf |
Genomic variant site files | Download location |

¹ GRU: General Research Use; HMB: Health/Medical/Biomedical
² WGS: Whole Genome Sequencing; WES: Whole Exome Sequencing
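The release version in the table, phs003047.v1.p1, follows the dbGaP versioned accession pattern: a study identifier (`phs` plus six digits), a data version (`.v`), and a participant set number (`.p`). When tracking which release an analysis used, it can help to split the string into those parts. The following is a minimal sketch in Python; the names `DbgapAccession` and `parse_accession` are hypothetical and are not part of any dbGaP, AnVIL, or GREGoR tooling.

```python
import re
from typing import NamedTuple


class DbgapAccession(NamedTuple):
    """Parts of a versioned dbGaP study accession (illustrative type)."""
    study_id: str          # e.g. "phs003047"
    version: int           # number after ".v"
    participant_set: int   # number after ".p"


# "phs" + six digits, then ".v<version>" and ".p<participant set>"
_ACCESSION_RE = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)$")


def parse_accession(accession: str) -> DbgapAccession:
    """Split a string such as 'phs003047.v1.p1' into its three parts."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a versioned dbGaP study accession: {accession!r}")
    study_id, version, participant_set = match.groups()
    return DbgapAccession(study_id, int(version), int(participant_set))


release = parse_accession("phs003047.v1.p1")
print(release.study_id, release.version, release.participant_set)
```

Recording the parsed version alongside analysis outputs makes it easier to tell results from different releases apart once later versions are published.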
Phenotypes in the GREGoR Dataset
Accessing GREGoR data
Researchers from the scientific community can apply for controlled-access GREGoR data stored on the AnVIL platform in one of two ways, described briefly below. In both cases, applicants should submit Data Access Requests by selecting datasets associated with dbGaP accession number phs003047.
Note: regardless of which way you apply, access will be granted to the same data in the same AnVIL workspaces. Only the processes for application, renewal, and approval differ.
- dbGaP: Follow the NIH Scientific Data Sharing instructions for "How to Request and Access Datasets from dbGaP".
- Data Use Oversight System (DUOS): Follow the steps in "How do I make a data access request in DUOS?" See also "What is DUOS?"
Alternatively, the GREGoR Consortium offers a Partner Membership opportunity to enable investigators to actively participate in and contribute to the scientific activities and mission of the GREGoR Consortium. For more information, please see the GREGoR Partner Members webpage.
Working with GREGoR data
AnVIL is the primary repository for GREGoR data. AnVIL provides controlled-access data storage and a cloud-based analysis environment for researchers. Links to AnVIL platform tutorials and related resources are available at AnVIL Resources.
The GREGoR Consortium also regularly contributes data to several resources that support rare disease research, which are described in Tools and Resources to interact with GREGoR data.
Future Data Releases
The GREGoR Consortium continues to aggregate participant, family and phenotype data, as well as short-read DNA sequencing data, to share with the scientific community. GREGoR Research Centers and Partner Members are expanding the GREGoR Data Model to support additional data types, including short-read RNA-Seq, ATAC-Seq and long-read data generated on Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) platforms.
The GREGoR Data Coordinating Center is harmonizing short-read whole genome sequencing data using open-source workflows on the AnVIL platform (DRAGEN-GATK for data pre-processing and initial variant calling; the Genomic Variant Store workflow for a joint callset of single-nucleotide variants and short insertions and deletions). These harmonized data will be included in a future release.
The GREGoR Dataset will continue to expand with additional participant, family, phenotype and other data types available for analysis. We anticipate that periodic releases will continue throughout the life of the Consortium, and that the released data will remain in NIH data repositories as a valuable scientific resource.
Additional resources
- GREGoR poster from the 2023 ASHG annual meeting
- GREGoR Publications
- GREGoR AnVIL study webpage
- GREGoR dbGaP study webpage - phs003047
Please provide feedback on the GREGoR Dataset!
We are very interested in making the GREGoR Dataset broadly useful. Please let us know how we’re doing or how we might improve the dataset by contacting the GREGoR Data Coordinating Center (select “Data” from the Topic dropdown menu).