Cough against COVID can increase testing capacity by 43%

So far, almost 1 million people have died as a result of contracting COVID-19, and over 30 million have contracted the disease. Across the world, countries are recording a renewed spike in cases. The solution to controlling the pandemic, as suggested by healthcare experts around the world, is to reduce the effective reproduction number (often loosely called R0). This can be done by ramping up testing so that those who catch the infection can be isolated and treated effectively. But doing this comes with challenges.

For one, test kits are expensive, and along with the tests comes the added cost of infrastructure and specialised equipment. This is especially true in rural and remote areas around the world. Even if this is overcome, the test that is considered the gold standard, reverse transcription polymerase chain reaction (RT-PCR), has a long lead time. Added to the mix is the fact that COVID-19 symptoms such as cough, fever, and fatigue are not unique and can be brought on by various illnesses.

And until there is a vaccine or a cure for this disease, there will be a continuous need for wider testing. This makes the effective utilisation of resources extremely important. 

This creates a need for a screening layer that can identify those who have a high probability of testing negative for COVID-19. Our effort, Cough against COVID, could help. We demonstrate that solicited cough sounds carry a COVID-19 signature, including in asymptomatic patients. At our current performance, and assuming a disease prevalence of 5%, using our model as a screening layer before RT-PCR can increase testing capacity by 1.43x without increasing the number of lab-based tests.

How does it work?

Physicians have used sound as an identifier in respiratory illnesses. But these sounds, especially coughs, are not easily quantifiable for a layperson. When we began our research, our contention was that AI may be able to identify this sound. We collected over 3,500 solicited cough sounds from COVID centres across India. These included both positives and negatives, confirmed by an RT-PCR test. While the data collection is ongoing, we used a curated set of about 1,000 individuals to present the scientific analysis.

The first step is to convert cough sounds into a model-ingestible representation. It is well known that spectrogram-based representations of sound work well in practice, especially with convolutional neural networks (CNNs). CNNs are extremely versatile, multi-step function approximators known to work exceedingly well on images.
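
As a rough illustration of what a spectrogram-based representation looks like, here is a minimal sketch that turns a waveform into a log-power spectrogram using NumPy only. The function name `log_spectrogram`, the frame length, the hop size, and the synthetic 440 Hz tone are all assumptions made for this example; the article does not say which spectrogram variant or parameters the model uses.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a (n_frames, frame_len // 2 + 1) log-power spectrogram.

    frame_len and hop are illustrative defaults (25 ms and 10 ms at 16 kHz),
    not values taken from the article.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad very short clips so that at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice overlapping frames and apply the window to each one.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Power of the one-sided FFT of every frame, on a decibel scale.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + eps)

# Synthetic stand-in for a recording: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
spec = log_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (98, 201): time frames by frequency bins
```

The resulting two-dimensional array can be treated like a single-channel image, which is what makes it a natural input for a CNN.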

We developed an end-to-end CNN-based framework that ingests audio samples as spectrograms and directly predicts a binary label, expressed as the probability that COVID-19 is present.
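
To make the spectrogram-in, probability-out idea concrete, the toy sketch below runs one forward pass through a single convolutional layer, a ReLU, global average pooling, and a sigmoid. It is not the architecture described in the article: the helper names, the kernel count, and the random, untrained weights are assumptions chosen only to show the shape of the computation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Plain 'valid' 2-D cross-correlation of one image with one kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i : i + kh, j : j + kw] * kernel)
    return out

def predict_probability(spectrogram, kernels, weights, bias):
    """Conv -> ReLU -> global average pooling -> linear -> sigmoid."""
    # One pooled feature per kernel.
    features = np.array(
        [np.maximum(conv2d_valid(spectrogram, k), 0.0).mean() for k in kernels]
    )
    logit = features @ weights + bias
    return float(1.0 / (1.0 + np.exp(-logit)))

# Hypothetical inputs: a random "spectrogram" and random, untrained weights.
rng = np.random.default_rng(0)
spectrogram = rng.standard_normal((32, 20))
kernels = rng.standard_normal((4, 3, 3)) * 0.1
weights = rng.standard_normal(4)
p = predict_probability(spectrogram, kernels, weights, bias=0.0)
print(0.0 < p < 1.0)  # True: the output is a probability
```

A real model would stack many such layers and learn the kernels and weights from labelled recordings, typically with a deep learning library rather than hand-written loops.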

Our model was first pre-trained on open-source cough datasets to simply predict the presence of a cough (cough detection). Next, we trained it on the primary task of COVID-19 detection. During training, we used standard augmentation techniques and label smoothing to make the model robust to input noise and to label noise arising from imperfect RT-PCR test results.
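
Label smoothing can be sketched in a few lines. The version below is one common formulation for binary labels, in which hard 0/1 targets are pulled slightly towards 0.5; the smoothing strength `epsilon=0.1` and the function names are assumptions for illustration, since the article does not give the exact scheme or value used.

```python
import numpy as np

def smooth_labels(labels, epsilon=0.1):
    """Pull hard 0/1 labels towards 0.5 by epsilon / 2 on each side."""
    labels = np.asarray(labels, dtype=np.float64)
    return labels * (1.0 - epsilon) + 0.5 * epsilon

def binary_cross_entropy(probs, targets, eps=1e-12):
    """Mean binary cross-entropy between predicted probabilities and targets."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=np.float64)
    losses = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    return float(np.mean(losses))

hard = np.array([1, 0, 1, 0])
soft = smooth_labels(hard, epsilon=0.1)  # [0.95, 0.05, 0.95, 0.05]

# With smoothed targets, an overconfident prediction is penalised more
# than a prediction that matches the softened target.
calibrated = binary_cross_entropy([0.95, 0.05, 0.95, 0.05], soft)
overconfident = binary_cross_entropy([0.999, 0.001, 0.999, 0.001], soft)
print(calibrated < overconfident)  # True
```

Because the target is never exactly 0 or 1, the model is not rewarded for being fully certain about a label that may itself be wrong, which is the motivation when the reference test is imperfect.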

Our results show, with statistical significance, that solicited cough sounds have a detectable COVID-19 signature, and that this holds true even for asymptomatic individuals.

This model could therefore be used to reliably identify COVID-19 negative individuals, while those flagged as positive are referred for a confirmatory RT-PCR test. In this way, testing capacity increases by 43% (a 1.43x lift), assuming a disease prevalence of 5%.
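
One common way to reason about such a lift is that only the people the screen flags as positive go on to RT-PCR, so the same number of lab tests covers 1 / (fraction referred) times as many people. The sketch below works through that arithmetic. The 5% prevalence comes from the text above; the sensitivity and specificity values are illustrative assumptions that are not stated in this article, chosen because they happen to reproduce a lift of about 1.43x.

```python
def referral_fraction(prevalence, sensitivity, specificity):
    """Fraction of screened people who are flagged and sent for RT-PCR."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives + false_positives

def capacity_lift(prevalence, sensitivity, specificity):
    """How many more people a fixed number of lab tests can cover."""
    return 1.0 / referral_fraction(prevalence, sensitivity, specificity)

# Prevalence is from the article; sensitivity and specificity are assumed.
lift = capacity_lift(prevalence=0.05, sensitivity=0.90, specificity=0.31)
print(round(lift, 2))  # 1.43
```

Under these assumed numbers about 70% of screened individuals are still referred for RT-PCR, so the gain comes entirely from the roughly 30% who are screened out as likely negative.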

Possible ways to deploy

The screening tool can work on a basic feature phone and doesn’t require skilled personnel. It is designed to scale without additional work.

The model could also be deployed through apps on healthcare workers’ smartphones, through helplines, or through voice notes on instant messaging apps. Our data collection, however, is ongoing, and subsequent models will be trained on individuals beyond the subset in this study. We will also explore fast and computationally efficient inference to enable COVID-19 screening on smartphones. This will enable large sections of the population to self-screen, support proactive testing, and allow continuous monitoring.

  • Wadhwani AI

    We are an independent and nonprofit institute developing multiple AI-based solutions in healthcare and agriculture, to bring about sustainable social impact at scale through the use of artificial intelligence.

ML Engineer


An ML Engineer at Wadhwani AI will be responsible for building robust machine learning solutions to problems of societal importance, usually under the guidance of senior ML scientists and in collaboration with dedicated software engineers. To our partners, a Wadhwani AI solution is generally a decision-making tool that needs some piece of data to work with. It will be your responsibility to ensure that the information derived from that data is sound. This requires not only robust learned models, but also pipelines over which those models can be built, tweaked, tested, and monitored. The following subsections provide details from the perspective of solution design:

Early stage of proof of concept (PoC)

  • Set up and structure code bases that support an interactive ML experimentation process, as well as quick initial deployments
  • Develop and maintain toolsets and processes for ensuring the reproducibility of results
  • Code reviews with other technical team members at various stages of the PoC
  • Develop, extend, or adopt a reliable, Colab-like environment for ML

Late PoC

This is the early to mid stage of AI product development

  • Develop ETL pipelines. These can also be shared and/or owned by data engineers
  • Set up and maintain feature stores, databases, and data catalogs, ensuring data veracity and the lineage of on-demand pulls
  • Develop and support model health metrics

Post PoC

Responsibilities during production deployment

  • Develop and support A/B testing. Set up continuous integration and continuous delivery (CI/CD) processes and pipelines for models
  • Develop and support continuous model monitoring
  • Define and publish service-level agreements (SLAs) for model serving. Such agreements include model latency, throughput, and reliability
  • L1/L2/L3 support for model debugging
  • Develop and support model serving environments
  • Model compression and distillation

We realize this list is broad and extensive. While the ideal candidate has some exposure to each of these topics, we also envision great candidates being experts at some subset. If either of those cases happens to be you, please apply.


Master’s degree or above in a STEM field. Several years of hands-on experience applying your craft.


  • Expert level Python programmer
  • Hands-on experience with Python libraries
    • Popular neural network libraries
    • Popular data science libraries (pandas, NumPy)
  • Knowledge of systems-level programming. Under-the-hood knowledge of C or C++
  • Experience with and knowledge of the various tools that fit into the model-building pipeline. There are several; you should be able to speak to the pluses and minuses of a variety of tools given some challenge within the ML development pipeline
  • Database concepts; SQL
  • Experience with cloud platforms is a plus

ML Scientist


As an ML Scientist at Wadhwani AI, you will be responsible for building robust machine learning solutions to problems of societal importance, usually under the guidance of senior ML scientists. You will participate in translating a problem in the social sector to a well-defined AI problem, in the development and execution of algorithms and solutions to the problem, in the successful and scaled deployment of the AI solution, and in defining appropriate metrics to evaluate the effectiveness of the deployed solution.

In order to apply machine learning for social good, you will need to understand user challenges and their context, curate and transform data, train and validate models, run simulations, and broadly derive insights from data. In doing so, you will work in cross-functional teams spanning ML modeling, engineering, product, and domain experts. You will also interface with social sector organizations as appropriate.  


Associate ML scientists will have a strong academic background in a quantitative field (see below) at the Bachelor’s or Master’s level, with project experience in applied machine learning. They will possess demonstrable skills in coding, data mining and analysis, and building and implementing ML or statistical models. Where needed, they will have to learn and adapt to the requirements imposed by real-life, scaled deployments. 

Candidates should have excellent communication skills and a willingness to adapt to the challenges of doing applied work for social good. 


  • B.Tech./B.E./B.S./M.Tech./M.E./M.S./M.Sc. or equivalent in Computer Science, Electrical Engineering, Statistics, Applied Mathematics, Physics, Economics, or a relevant quantitative field. Work experience beyond the terminal degree will determine the appropriate seniority level.
  • Solid software engineering skills in one or more languages, such as Python, C++, or Java.
  • Interest in applying software engineering practices to ML projects.
  • Track record of project work in applied machine learning. Experience in applying AI models to concrete real-world problems is a plus.
  • Strong verbal and written communication skills in English.