Researchers have launched a search engine that can quickly sift through the staggering volumes of biological data housed in public repositories. The team integrated data from seven publicly funded data archives, creating 18.8 million unique DNA and RNA sequence sets and 210 billion amino-acid sequence sets that users can search through using text prompts. The search engine, called MetaGraph, can also uncover genetic patterns hidden deep within expansive sequencing data sets without needing those patterns to be explicitly annotated in advance.
"It's a huge achievement," says Rayan Chikhi, a biocomputing researcher at the Pasteur Institute in Paris. "They set a new standard" for analysing raw biological data - including DNA, RNA and protein sequences - from databases that can contain millions of billions of DNA letters, amounting to 'petabases' of information, more entries than all the webpages in Google's vast index.