Banyan Pilot
With the support of the Sabeti Lab, we built a pilot version of Banyan to visualize the full global dataset of COVID-19 sequences. Working with more than two million available sequences and a phylogenetic tree of over 600,000 sequences, we created an exceptionally fluid interface for exploring vast amounts of viral genomic data.
Challenges | Achievements |
---|---|
How can we enable virology researchers to explore the entire global COVID-19 sequencing dataset? | Created a tool that visualizes over a million COVID-19 samples without the need for a server backend. |
How can we bring visualization upstream in the research pipeline? | Surpassed the cap of ~4,000 sequences that is typical of most other genomic research tools. |
Through rapid rounds of iteration and feedback from researchers at the Sabeti Lab, as well as outside input from leading researchers in the field, we created a tool that addresses many of the questions scientists are asking of the COVID-19 viral data: Which lineages are taking hold? Where are they becoming a problem? What does COVID-19 look like in my jurisdiction compared with surrounding areas?
One of our key goals when building tools is to bring visualization further upstream in the research process. For example, how can it become part of the way researchers explore and understand their data, and help them make better use of data that is continually changing and being updated, rather than serving only as "final" output for a report or presentation?
The lineage and mutation timelines provide an overall picture of how COVID-19 has changed at a global or national scale. For deeper analysis, users can look specifically at one or more lineages and compare the prevalence of individual mutations and the resulting amino acid changes across sequences within a lineage, or between lineages.
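As a rough illustration of the kind of comparison described above, the sketch below computes, for each lineage, the fraction of its sequences that carry each mutation. It is not Banyan's implementation: the function name `mutation_prevalence`, the input shape (a list of lineage and mutation-list pairs), and the placeholder labels such as `lineage_A` and `mut_1` are all assumptions made for the example, and the toy records are invented.

```python
from collections import Counter, defaultdict


def mutation_prevalence(samples):
    """Return {lineage: {mutation: fraction of that lineage's sequences carrying it}}.

    `samples` is an iterable of (lineage, mutations) pairs, one per sequence.
    This input shape is a hypothetical choice for illustration only.
    """
    totals = Counter()             # number of sequences seen per lineage
    counts = defaultdict(Counter)  # per-lineage count of sequences carrying each mutation
    for lineage, mutations in samples:
        totals[lineage] += 1
        # Use a set so a mutation listed twice for one sequence is counted once.
        for mutation in set(mutations):
            counts[lineage][mutation] += 1
    return {
        lineage: {m: c / n for m, c in counts[lineage].items()}
        for lineage, n in totals.items()
    }


# Toy, made-up records: placeholder names, not real lineages or mutations.
samples = [
    ("lineage_A", ["mut_1", "mut_2"]),
    ("lineage_A", ["mut_1"]),
    ("lineage_B", ["mut_2"]),
    ("lineage_B", []),
]

prevalence = mutation_prevalence(samples)
```

With a table like this, comparing a mutation within a lineage or between lineages is a dictionary lookup, for example `prevalence["lineage_A"]["mut_2"]` against `prevalence["lineage_B"]["mut_2"]`.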
The first release of this tool is the beginning of what we hope will be a continued effort to make the full dataset accessible and actionable.