CSIRO’s Denis Bauer on how genomics is pushing the boundaries with AI, cloud computing

Denis C Bauer, head of computing bioinformatics, CSIRO

What do you do when the risk of death or illness from disorders such as diabetes, chronic lung conditions or cancer keeps rising, both in your country and across the world?

According to the World Health Organization (WHO), the global adult mortality rate was 142 per 1,000 persons in 2016. Of the 56.9 million deaths worldwide that year, 40.5 million, or 71%, were due to non-communicable diseases (NCDs) such as cardiovascular diseases, cancers, diabetes and chronic lung diseases.

Researchers, working individually or collaboratively, have been trying to get to the root of these disorders so that they can either find a cure or prevent the disease from occurring in the population. The study of genes has become a promising area of research in the search for disease-prevention solutions.


Denis C Bauer, a Germany-born mother of a five-year-old son who works at Australia's federal independent research agency, the Commonwealth Scientific and Industrial Research Organisation (CSIRO), has been pursuing this path and, in doing so, is pushing the boundaries of technology in search of a breakthrough.

"The evolution of technology, especially in terms of AI and cloud computing, has provided a major boost for our research in genomics," said Bauer, who is head of cloud computing bioinformatics at CSIRO.

During her India visit, Bauer told TechCircle that this evolution of technology has helped her organisation conduct research faster and more affordably, bringing down years of work to a matter of weeks.


"Genomes can be studied to stop diseases but every genome is two metres long and there are 100 trillion cells in our body. Now imagine mapping every single person in the world. The data size will be more than astronomical and this is where we need technology," Bauer explained, citing a Frost & Sullivan report that says half of the world's population will have their DNA sequenced by 2025, generating more than 20 exabytes of data annually.

But there are many challenges before that kind of data can be gathered, read for insights and then applied for clinical research.

The first challenge, Bauer says, is that of reading the genome itself. "Earlier, when researchers had to do it manually, it would take a very long time -- a matter of days or weeks. Now there is technology in place that can help in reading a genome within a day at probably a cost of $100," she said, adding that CSIRO uses a programme called GT-Scan that works on a computationally guided genome-engineering model.


Bauer also explained that, in order to combat diseases, reading each and every genome for possible mutations is very important. "It is like finding the right grain of sand unique to a beach," she said. 

Other challenges include sequencing genome data and creating a database for queries. For these, Bauer said her team was using, to an extent, NoSQL databases such as DynamoDB and MongoDB on Amazon Web Services (AWS).

However, she says indexing is not possible owing to the huge size of the data; instead, the team calls up data through queries as and when needed.


To this end, Bauer says CSIRO has deployed its own machine learning-based gene-discovery engine called VariantSpark that can analyse 3,000 individuals with over 80 million features in under 30 minutes, requiring 80% fewer samples to detect a statistically significant signal.

"We developed VariantSpark, a machine-learning analysis framework for genomic data, using the BigData Spark engine to enable real-time analysis. The reason for coming out with the engine was to democratise the success of genomics. Earlier, only well-funded labs with hyper-scale computing infrastructure could conduct research," she said, adding that CSIRO also offers a web-based tool for clinicians to conduct studies. 

VariantSpark is also available on code-hosting site GitHub and is used by organisations such as Macquarie University in Sydney and Samsung SDS, and by CSIRO itself, to find a cure for Lou Gehrig's disease, a nervous system disease that weakens muscles and impacts physical function.


Bauer also said that she and her team have been using serverless computing to great advantage: as a cloud customer, they pay only for the service they actually use, with no expenses associated with idle time.

"Since our data sets are so huge, maintaining them on servers of our own will add up to a huge cost and then there is running compute on the data sets. However, with companies like AWS providing compute service, the cost of infrastructure goes down," Bauer explained, adding that serverless computing helps them save 95% of cloud costs.

Giving an analogy, she said that if on-premises cloud needs Rs 35,000 a month, then serverless can get it done for Rs 1,500 a month. According to Grand View Research, serverless will grow to a $20 billion market by 2025.
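The two rupee figures in Bauer's analogy can be checked against the 95% saving she cites. The short Python sketch below does only that arithmetic; the function and variable names are illustrative and do not come from CSIRO or AWS.

```python
def savings_fraction(baseline_cost: float, new_cost: float) -> float:
    """Return the share of the baseline bill that is saved by switching."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    return (baseline_cost - new_cost) / baseline_cost


# Monthly figures quoted in the article, in rupees.
always_on_cost = 35_000   # the non-serverless setup in Bauer's analogy
serverless_cost = 1_500   # the same workload run serverless

saved = savings_fraction(always_on_cost, serverless_cost)
print(f"Saving: {saved:.1%}")  # prints "Saving: 95.7%"
```

On these numbers the saving works out to just under 96%, which is consistent with the roughly 95% Bauer mentions.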


Bauer claims her genomics research has socio-economic benefits which, if implemented in the right way, can cut healthcare costs by 27%.

She added that she is in talks with Tata Memorial Centre and CSIR-Institute of Genomics and Integrative Biology, New Delhi, to conduct joint research on disease prevention. 

Other than AWS, Bauer said that CSIRO is also working with Microsoft Azure, Google and Alibaba.
