External Data Sharing Policy

Date Approved: January 18, 2023

Date of Last Update: January 24, 2023

Version: 1.3


Background

The central mission of the GREGoR (Genomics Research to Elucidate the Genetics of Rare Diseases) Consortium is to significantly increase the proportion of Mendelian conditions with an identified genetic cause. To advance this mission, GREGoR aims to create datasets with broad utility and to disseminate them beyond the Consortium through data sharing. The Consortium is committed to comprehensive and rapid sharing of “GREGoR Data” (or “Data”), which includes the following data types: phenotypic, genotyping, nucleic acid sequence, multi-omic, biomarker, and any other "Scientific Data" as defined in the Final NIH Policy for Data Management and Sharing (see section 'Definition of “Scientific Data”'). It also includes preliminary analyses and re-processed, derived, and associated data generated from samples collected and/or sequenced under the primary GREGoR grant mechanisms across the Consortium. For the purposes of this policy, “external” refers to sites and resources used by the broader scientific community and not just the GREGoR Consortium.


Scope

The External Data Sharing Policy describes the required and recommended external data sharing strategies to be followed by Research Centers (RCs) within the Consortium and, by extension, by Research Center Collaborators who use GREGoR resources to generate data or results. This policy establishes and documents minimum data sharing requirements for the Consortium and supports the scope of the GREGoR RCs’ Institutional Review Board (IRB) protocols. GREGoR members, GREGoR partner members, and GREGoR collaborators who contribute to the generation or analysis of these data are expected to follow the GREGoR External Data Sharing Policy.


Policy

Minimum Standards and Timeline

All required external data sharing sites are listed in Table 1, along with GREGoR’s minimum standards, timelines, and the groups responsible for deposition. The GREGoR Consortium recognizes that the non-GREGoR resources listed in Table 1 vary in their minimum requirements for sharing data; GREGoR standards for sharing will frequently be higher. The Consortium will meet the resource-specified minimum requirements, but will also strive to include as much additional data as our data model allows.

Table 1. Required External Data Sharing Sites

Resource: GREGoR Website
  • GREGoR’s Minimum Expectation for Data Shared: Cohort level descriptions, overview of data sharing activities with links on how to find that data in sources listed below
  • Timeline for Sharing: Cohorts: as cases/cohorts enter the study, no less than quarterly
  • Group responsible for deposition: Data Coordinating Center with information provided by Research Centers

Resource: AnVIL (non-internal workspaces)
  • GREGoR’s Minimum Expectation for Data Shared: All required fields in data model, including data files
  • Timeline for Sharing: 1 month after RC’s data submission the DCC will inform AnVIL that the workspace is approved for release
  • Group responsible for deposition: Data Coordinating Center with data provided by Research Centers

Resource: Matchmaker Exchange (MME) nodes
  • GREGoR’s Minimum Expectation for Data Shared: Gene, Variants, Phenotype
  • Timeline for Sharing: Ideally as candidates are identified, and no less than quarterly
  • Group responsible for deposition: Research Centers

Resource: ClinVar
  • GREGoR’s Minimum Expectation for Data Shared: Variant, gene, disease (ideally Mondo term), classification and evidence (including case-level phenotype, ACMG/AMP evidence codes, PubMed IDs)*
  • Timeline for Sharing: Variants being included with publication should be submitted before/with manuscript
  • Group responsible for deposition: Research Centers

Resource: GenCC
  • GREGoR’s Minimum Expectation for Data Shared: Novel or Corrected Gene-Disease relationships
  • Timeline for Sharing: Gene-disease relationships being included with publication should be submitted or updated before/with manuscript. For gene-disease relationships not scheduled for publication, submit to GenCC at least quarterly
  • Group responsible for deposition: Research Centers

*Containing enough information to follow how you arrived at your classification (for example, this could include a descriptive summary, and/or submitting ACMG/AMP criteria codes and PMIDs)
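The timelines in Table 1 can be turned into concrete dates. The following is a minimal sketch, not a Consortium tool: it assumes that "1 month" means the same day of the following calendar month (clamped to the month's last day) and that "quarterly" means three calendar months. The policy does not define either term precisely, and all function names are hypothetical.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the same day-of-month `months` later, clamped to the month's end."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def anvil_release_notice_date(rc_submission: date) -> date:
    """Date on which the DCC would inform AnVIL (assumed: one calendar month)."""
    return add_months(rc_submission, 1)


def quarterly_update_overdue(last_shared: date, today: date) -> bool:
    """True if more than three calendar months have passed since the last share."""
    return today > add_months(last_shared, 3)
```

For example, under these assumptions an RC submission on January 31, 2023 would give a notice date of February 28, 2023.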

Prioritization of ClinVar Variant Submissions

Table 2. Prioritization of ClinVar Variant Submissions

Columns: Variant Type, Priority, Prioritization Notes

Variant Type: Known disease genes
Definitive/Strong/Moderate in GenCC, or likely to reach that classification if not already curated:

Priority: High
  1. Variants classified as Pathogenic (P)/Likely Pathogenic (LP) – Especially those with few or no entries in ClinVar. Variants with clinically significant conflicts between GREGoR classification and existing ClinVar entries should also be submitted, as providing evidence would help to identify discrepancies and could facilitate conflict resolution.
  2. All variants where functional work has been performed within the GREGoR consortium
  3. Other classified variants (VUS, LB, B) that are absent from ClinVar and/or where novel evidence (case observations with phenotype, segregation, etc.) can be shared.

Priority: Low
  1. Well-established P/LP variants with many concordant entries, where one additional submission is unlikely to have an impact, unless a unique phenotype association or mechanistic understanding could be contributed.
  2. Well-established B/LB variants with many concordant entries, especially if classified primarily based on high MAF (easy for any clinical lab to assess these days)

Variant Type: Novel disease gene relationships
Limited or below in GenCC, or likely to reach that classification if not already curated:

Priority: High
  1. Well-established P/LP variants with many concordant entries, where one additional submission is unlikely to have an impact, unless a unique phenotype association or mechanistic understanding could be contributed.
  2. Well-established B/LB variants with many concordant entries, especially if classified primarily based on high MAF (easy for any clinical lab to assess these days)
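One possible reading of Table 2 as a decision rule is sketched below. It is illustrative only and rests on several assumptions not stated in the policy: the threshold for "many concordant entries" (`MANY_CONCORDANT_ENTRIES`) is invented, all field names are hypothetical, variants in novel disease gene relationships are simply assigned the "High" priority shown in the Priority column, and cases the table does not cover return "Unspecified".

```python
from dataclasses import dataclass

# Assumption: the policy does not define "many"; this threshold is illustrative.
MANY_CONCORDANT_ENTRIES = 5


@dataclass
class Variant:
    classification: str               # "P", "LP", "VUS", "LB", or "B"
    known_disease_gene: bool          # Definitive/Strong/Moderate in GenCC (or likely to be)
    in_clinvar: bool = False          # any existing ClinVar entry
    concordant_entries: int = 0       # existing entries agreeing with the GREGoR call
    conflicts_with_clinvar: bool = False
    functional_work_in_gregor: bool = False
    novel_evidence: bool = False      # e.g. unique phenotype, segregation, mechanism


def clinvar_priority(v: Variant) -> str:
    """Return "High", "Low", or "Unspecified" under one reading of Table 2."""
    if not v.known_disease_gene:
        return "High"  # novel disease gene relationships are listed as High
    if v.functional_work_in_gregor or v.conflicts_with_clinvar:
        return "High"
    if v.concordant_entries >= MANY_CONCORDANT_ENTRIES:
        if v.classification in ("B", "LB"):
            return "Low"
        if v.classification in ("P", "LP") and not v.novel_evidence:
            return "Low"
    if v.classification in ("P", "LP"):
        return "High"
    if not v.in_clinvar or v.novel_evidence:
        return "High"
    return "Unspecified"  # the table gives no guidance for this case
```

Checking functional work and ClinVar conflicts first is a design choice of this sketch, so that new evidence is never deprioritized because a variant already has many entries.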

Tracking/Reporting

To minimize manual reporting of data sharing activities, submissions to each of the required sites listed in Table 1 will, where possible, be tracked as follows:

  • ClinVar: RCs must add “GREGoR Consortium” to the study name in the ClinVar submission portal (see image below) and add a link to AnVIL in the “Citations or URLs for clinical significance without database identifiers” field of the ClinVar submission Excel spreadsheet. Additionally, RCs must provide ClinVar SCV IDs in the GREGoR data model, which will link to the latest classification.

Screenshot of information to add to ClinVar submissions from GREGoR

  • AnVIL (non-internal workspaces): The DCC will maintain a record of which workspaces have been made public and share it with NHGRI on a regular basis.
  • MME: Submissions to Matchmaker Exchange will be tracked through the GREGoR AnVIL data model. Each RC is responsible for annotating which genes have been submitted to the node of their choice.
  • GenCC: The number of submissions to GenCC will be tracked through the GenCC browser, which reports statistics for each research group (https://search.thegencc.org/submitters), as well as through the GREGoR RC GenCC submission form, which will be maintained by each group.
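An RC could check the ClinVar tracking requirements above before submitting. The sketch below is hypothetical: the record keys (`study_name`, `citation_url`, `scv_id`) are invented for illustration and are not ClinVar or GREGoR field names, the AnVIL check is a simple substring match, and the SCV pattern ("SCV" followed by digits and an optional version suffix) is an assumed shape for the accession.

```python
import re

# Assumed accession shape: "SCV" + digits, optionally ".<version>".
SCV_PATTERN = re.compile(r"^SCV\d+(\.\d+)?$")


def tracking_problems(record: dict) -> list[str]:
    """List the tracking requirements that a submission record fails to meet."""
    problems = []
    if "GREGoR Consortium" not in record.get("study_name", ""):
        problems.append('study name lacks "GREGoR Consortium"')
    if "anvil" not in record.get("citation_url", "").lower():
        problems.append("no AnVIL link in the citations/URLs field")
    if not SCV_PATTERN.match(record.get("scv_id", "")):
        problems.append("missing or malformed ClinVar SCV ID")
    return problems
```

An empty list means the record passes all three checks; otherwise each string names one unmet requirement.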

Future Directions

The Consortium recognizes that evolving needs for interoperability with ongoing and planned federated networks (e.g. Matchmaker Exchange) may influence the recommended data structure of GREGoR shared data. Additionally, any new data types required or requested by external sites, but not currently included in our data model, will be assessed by the Data Standards & Analysis Working Group to determine whether the Consortium can reasonably include these data in future data models.


Definitions

  • “GREGoR Data” (or “Data”) - (As per our Consortium Data Sharing Agreement) Includes phenotypic data, genotyping data, nucleic acid sequence data, multi-omic, biomarker, "Scientific Data" as defined in the Final NIH Policy for Data Management and Sharing (see section 'Definition of “Scientific Data”'), preliminary analyses, re-processed, derived, and associated metadata generated from samples collected and/or sequenced under the primary GREGoR grant mechanisms outlined above and that is shared under this Agreement.

Change Log

  • V1.3 - Revision based on Steering Committee discussion on 1/18/2023 regarding minimum data standards for external repositories
  • V1.2 - Revision based on Policy WG discussion 8/24/2022, with additional edits/comments/notes from 9/14/2022 Policy WG discussion
  • V1.1 - Includes edits/notes from Policy WG discussion 8/24/2022
  • V1.0 - Initial Policy