A search engine inspired platform enabling open sharing and access of both data and results from all genomic studies

Year of award: 2019

Grantholders

  • Dr John Lees

    Imperial College London

Project summary

Large amounts of public and charity money have been spent on genomic studies. In non-model organisms, which make up over half of all available sequence data, sharing falls far short of the standards of democratisation and dissemination that funders and researchers envisage. Common issues include paywalled publications, the need to repeat expensive basic data-processing steps, and difficulty even finding the data. Overcoming these issues requires significant amounts of time, expertise and computational resources. A combination of inflexible database services, a lack of advanced methods, and high hurdles to sharing published analyses has led to this situation. Innovative solutions are desperately needed.

We will link datasets with inherently variable structures and make them searchable with new indexing tools and a novel ranking algorithm, so that these invaluable scientific resources become truly open. We will adapt technology used by search engines, which index, rank and deliver results from billions of webpages that have no unified structure. Our tool will allow users to search for sequences or other biological features and to group results by sample, species or project. Results will be ordered by relevance, linked to downloads of published data and visualisations, and further enhanced with analysis performed by the tool itself. Both web and programmatic interfaces will be available.
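As a rough illustration of the index, rank and group idea described above, the following is a minimal sketch only. It assumes a simple k-mer inverted index and a score equal to the fraction of query k-mers a record contains; the project's actual indexing tools and ranking algorithm are not specified here, and every name in the code (`kmers`, `build_index`, `search`, the record fields and the toy sequences) is hypothetical.

```python
from collections import defaultdict

K = 4  # k-mer length; illustrative only, real tools would use a much larger k


def kmers(seq, k=K):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(records):
    """Inverted index: map each k-mer to the ids of records containing it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for km in kmers(rec["sequence"]):
            index[km].add(rec_id)
    return index


def search(query, index, records, group_by="species"):
    """Rank records by the fraction of query k-mers they share, then group."""
    q = kmers(query)
    hits = defaultdict(int)
    for km in q:
        for rec_id in index.get(km, ()):
            hits[rec_id] += 1
    # Highest shared-k-mer count first; ties broken by record id.
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    grouped = defaultdict(list)
    for rec_id, count in ranked:
        grouped[records[rec_id][group_by]].append((rec_id, count / len(q)))
    return dict(grouped)


# Toy records with hypothetical metadata fields.
records = {
    "s1": {"sequence": "ACGTACGTGG", "species": "species_a", "project": "p1"},
    "s2": {"sequence": "ACGTTTTTGG", "species": "species_a", "project": "p2"},
    "s3": {"sequence": "GGGGCCCCAA", "species": "species_b", "project": "p2"},
}
index = build_index(records)
by_species = search("ACGTACG", index, records)
by_project = search("ACGTACG", index, records, group_by="project")
```

Because records carry free-form metadata rather than a fixed schema, the same ranked hits can be regrouped by any available field, which mirrors the intent of grouping results by sample, species or project.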

Our solution’s flexibility, relaxing almost all format requirements imposed by databases on researchers, will lead to broad uptake. Many research communities will benefit, including those in microbiology, immunology, bioinformatics and public health, as well as researchers in low- and middle-income countries, and we anticipate that this will begin to shift open research practices in genomics.