Introduction
The GREGoR Consortium uses the NHGRI Analysis Visualization and Informatics Lab-space (AnVIL) to release GREGoR Data to the scientific community through controlled access. This page provides basic information about GREGoR Data on AnVIL, including links to tutorials and related resources provided by AnVIL and by the GREGoR Consortium.
Getting Started
To begin working with GREGoR Data, we recommend that researchers:
- Set up an AnVIL account (which is the same as a Terra account)
- Link their AnVIL account to their eRA Commons ID
- Obtain access to GREGoR Data
- Set up an AnVIL billing project if their group does not already have one
- View the GREGoR Consortium's workspaces
- Clone the release workspace to conduct analysis
GREGoR Data on AnVIL
Within AnVIL, GREGoR Data are separated into workspaces by participant consent group. Once you have access to GREGoR Data, you can find the workspaces associated with the GREGoR Consortium by searching for "GREGoR" on AnVIL’s Workspaces page. The data in these workspaces are structured as tables that conform to the GREGoR Data Model. Conceptually, researchers can think of this structure as a relational database and interrogate the data by joining and filtering across the tables. The tables include pointers to the molecular data files (such as BAM and VCF files) that can be used as inputs for bioinformatic workflows.
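To make the relational-database analogy concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes a simplified, hypothetical two-table schema (`participant` and `aligned_read`, with made-up columns `affected_status` and `aligned_file`) and invented placeholder rows; it is not the actual GREGoR Data Model, which has more tables and different linking columns. It shows a join from participants to their file pointers followed by a filter.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two toy tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE participant (
            participant_id TEXT PRIMARY KEY,
            family_id TEXT,
            affected_status TEXT
        );
        CREATE TABLE aligned_read (
            participant_id TEXT REFERENCES participant (participant_id),
            aligned_file TEXT
        );
        """
    )
    # Invented placeholder rows, not real GREGoR data.
    conn.executemany(
        "INSERT INTO participant VALUES (?, ?, ?)",
        [("P1", "F1", "Affected"), ("P2", "F1", "Unaffected"), ("P3", "F2", "Affected")],
    )
    # Each participant points at one (hypothetical) BAM file in a bucket.
    conn.executemany(
        "INSERT INTO aligned_read VALUES (?, ?)",
        [(pid, f"gs://example-bucket/{pid}.bam") for pid in ("P1", "P2", "P3")],
    )
    return conn


def affected_files(conn: sqlite3.Connection) -> List[Tuple[str, str]]:
    """Join participants to their file pointers, keeping affected participants only."""
    query = """
        SELECT p.participant_id, a.aligned_file
        FROM participant AS p
        JOIN aligned_read AS a ON a.participant_id = p.participant_id
        WHERE p.affected_status = 'Affected'
        ORDER BY p.participant_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    for participant_id, path in affected_files(conn):
        print(participant_id, path)
    conn.close()
```

The same join-then-filter pattern applies whatever tool you use in a notebook; the output is a list of file pointers that could feed a workflow.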
For Consortium sharing and release, data are stored in workspaces set to requester pays, which means that users cannot interact with the data without setting up their own billing project and workspace to cover the access charges. This includes listing files, downloading files, and working with the files in a cloud analysis. See this Terra support article about using requester pays in workspaces. To work with data in the requester-pays workspaces, users need at least one workspace of their own in which they have writer or owner access.
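As a rough illustration of what requester pays means in practice, the sketch below builds `gsutil` commands that pass a billing project with gsutil's top-level `-u` option, which names the Google Cloud project charged for access to a requester-pays bucket. The bucket path and project ID are hypothetical placeholders; substitute the Google project behind a workspace you own. Only the command-building function is exercised here; actually running the command assumes `gsutil` is installed and authenticated, for example in a Terra cloud environment.

```python
import shlex
import subprocess
from typing import List, Optional


def requester_pays_cmd(
    action: str, gcs_path: str, billing_project: str, dest: Optional[str] = None
) -> List[str]:
    """Build a gsutil ls/cp command that bills access charges to `billing_project`."""
    if not gcs_path.startswith("gs://"):
        raise ValueError(f"expected a gs:// path, got {gcs_path!r}")
    if action == "ls":
        args = ["ls", gcs_path]
    elif action == "cp":
        if dest is None:
            raise ValueError("cp needs a destination")
        args = ["cp", gcs_path, dest]
    else:
        raise ValueError(f"unsupported action: {action!r}")
    # `-u PROJECT` is a top-level gsutil option, so it goes before the subcommand.
    return ["gsutil", "-u", billing_project, *args]


def run(cmd: List[str]) -> str:
    """Execute the command; requires an installed, authenticated gsutil."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical bucket and project names.
    cmd = requester_pays_cmd("ls", "gs://example-gregor-bucket/", "my-gcp-project")
    print(shlex.join(cmd))  # gsutil -u my-gcp-project ls gs://example-gregor-bucket/
```

Without a billing project, listing or copying from a requester-pays bucket is refused, which is why a workspace of your own is needed before touching the data.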
Interactive Analysis
Researchers can write code for interactive data analysis on AnVIL using RStudio, Jupyter notebooks, or Galaxy. AnVIL also supports interactive analysis with the Integrative Genomics Viewer (IGV) and seqr. AnVIL analysis tutorials and resources include:
- Terra tutorial on Jupyter notebooks
- Terra documentation on RStudio
- Using the R/Bioconductor “AnVIL” package
- How to view IGV tracks of BAM and VCF Files
- Using Seqr in AnVIL
- Example R notebook for working with GREGoR data in AnVIL
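For researchers who prefer Python in a Jupyter notebook, a minimal sketch of working with a workspace table after downloading it as TSV is shown below. It assumes the export labels its ID column `entity:<table>_id`, as Terra's TSV format typically does, and that a `family_id` column exists; check both against your own download. The sample rows are invented placeholders.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def load_table_tsv(handle: TextIO, table: str) -> List[Dict[str, str]]:
    """Read a workspace data table exported as TSV into a list of dicts."""
    key = f"entity:{table}_id"  # assumed header of the ID column in the export
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if key in row:
            # Rename the ID column to a plain name such as "participant_id".
            row[f"{table}_id"] = row.pop(key)
        rows.append(row)
    return rows


def participants_by_family(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group participant IDs by family_id (column name is an assumption)."""
    families: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        families[row["family_id"]].append(row["participant_id"])
    return dict(families)


if __name__ == "__main__":
    # Toy export; in a notebook you would open the downloaded file instead.
    demo = "entity:participant_id\tfamily_id\nP1\tF1\nP2\tF1\nP3\tF2\n"
    rows = load_table_tsv(io.StringIO(demo), "participant")
    print(participants_by_family(rows))  # {'F1': ['P1', 'P2'], 'F2': ['P3']}
```

R users can do the equivalent with the Bioconductor AnVIL package listed above.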
Workflows on AnVIL
AnVIL supports bulk data processing and analysis tasks written in the Workflow Description Language (WDL). AnVIL users can write their own workflows, publish workflows for others to use, and run workflows that others have shared on Dockstore. To add a workflow to the GREGoR organization on Dockstore, contact the GREGoR Data Coordinating Center (DCC) to be added to the organization.
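WDL workflows on AnVIL are executed by the Cromwell engine, which takes inputs keyed by fully qualified names of the form `<workflow>.<input>`. The sketch below builds such inputs for a hypothetical workflow named `CountReads` with hypothetical inputs `sample_id` and `input_bam`, one JSON document per sample. In the Terra interface you would usually map workflow inputs to data table columns instead of writing JSON by hand, so treat this only as an illustration of the input format.

```python
import json
from typing import Dict, List, Tuple

WORKFLOW = "CountReads"  # hypothetical WDL workflow name


def inputs_for_sample(sample_id: str, bam: str, workflow: str = WORKFLOW) -> Dict[str, str]:
    """Map one sample to Cromwell-style inputs keyed as "<workflow>.<input>"."""
    if not bam.endswith(".bam"):
        raise ValueError(f"expected a .bam path, got {bam!r}")
    # Input names `sample_id` and `input_bam` are hypothetical.
    return {f"{workflow}.sample_id": sample_id, f"{workflow}.input_bam": bam}


def inputs_json_batch(samples: List[Tuple[str, str]], workflow: str = WORKFLOW) -> List[str]:
    """Serialize one inputs document per sample, like one workflow run per table row."""
    return [
        json.dumps(inputs_for_sample(sample_id, bam, workflow), sort_keys=True)
        for sample_id, bam in samples
    ]


if __name__ == "__main__":
    # Placeholder path, not a real GREGoR file.
    for doc in inputs_json_batch([("P1", "gs://example-bucket/P1.bam")]):
        print(doc)
```

Keeping one inputs document per sample mirrors how a workflow is commonly launched once per row of a data table.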
Links to Resources:
- WDL training course with videos
- More on WDL from Terra support
- GREGoR Consortium Workflows on Dockstore
- Find existing workflows in Dockstore or the Broad Methods Repository
- Tutorial on writing a simple workflow (slides | video)