Fathom Information Design

Banyan is a set of tools for working with large amounts of sequence, mutation, and phylogenetic tree data, currently focused on SARS-CoV-2.

The tools grew out of an initial pilot project that made it possible to query and display millions of sequences in the entire GISAID data set from within the browser.

The current tools range from simple auto-generated reports (the New England Variant Report) to small single-use tools (Mutations, Crosscut) to backend systems (a custom server and Python API) that make the tools possible.

banyan mutations

A tool for quickly exploring mutations appearing in SARS-CoV-2 sequences and comparing them against mutations in existing variants of the virus, with options for easily linking to and exporting views of the data.

banyan crosscut

A tool for comparing the prevalence of SARS-CoV-2 mutations and lineages, with flexible filtering by country, state, and date range.

banyan contour

Track current and projected trends as SARS-CoV-2 lineages rise and fall.

lineage portraits

See at a glance what’s unique about emerging SARS-CoV-2 lineages by visualizing them alongside established ones, and apply scoring methods to highlight the key mutations driving these developments.

weather report

Automatically generated SARS-CoV-2 reports capturing the most recent data and projections in Massachusetts and other states, built for the MassCPR variants group in collaboration with the Broad Institute.

server + library

Banyan features a custom server for rapidly querying very large data sets such as GISAID. In the coming months, we plan to release a version for use by researchers who have access to GISAID data (or who would like to work with other sources such as NCBI or UCSC).

We’ve also developed a Python wrapper for the REST APIs available from the server, suitable for use from a Jupyter notebook or other Python code. This tool will be released along with the server component.

banyan pilot

The initial pilot project made it possible to query and display millions of sequences in the entire GISAID data set from within a web browser.