Object detection, how it works and how we use it

At Wadhwani Institute for Artificial Intelligence, we use object detection differently.

Versions of object detection are all around us. It causes a small amount of outrage on social media when people discover it. You’ve come across it when Facebook or Google managed to locate you in pictures where you weren’t “tagged”.

Facebook draws a box around your face and asks, “Is this you?” Google does the same in shared albums. That’s object detection. The big tech companies have managed to locate an object (you) and identify that object. Facebook has gone into some depth on how they do it, which is worth a read if you’re so inclined.

Let’s first define object detection. It is a series of techniques that locate and classify objects in images.

An object detection model can identify places, people, objects and many other types of elements within an image, and conclusions can then be drawn by analyzing what it found.

Let’s describe this in a slightly simplified way. 

What does the internet love? Cats. There is a website which identifies cats. It is simply called: Is this a cat? You upload an image and it identifies if what you uploaded is a cat. 

Part of the reason behind this is, well, that the internet loves cats. But somewhere in the background there is also a model, which has been shown many different images of cats and is learning to tell cats from, well, not cats. It sorts each uploaded image into a class: cat or not cat.

At Wadhwani Institute for Artificial Intelligence, we use object detection differently: to recognize two types of pests, the pink bollworm and the American bollworm. Both these pests can ruin cotton crops across the world. Recently, a news report stated that the pink bollworm can affect 15% of a farm’s yield.

How does it work?

What can really help the farmer is to know if she needs to take action. That is dictated by the number of pests on the trap. If that number crosses the safety threshold, the farmer needs to take action to stop infestation and prevent her cotton crop from being ruined.
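To make that decision rule concrete, here is a minimal sketch in Python. The function name, the label strings and the threshold numbers are all made up for illustration; the real safety thresholds are not given in this post.

```python
from collections import Counter

# Hypothetical thresholds (pests per trap); illustrative values only.
EXAMPLE_THRESHOLDS = {"pink bollworm": 8, "American bollworm": 5}


def needs_action(detected_labels, thresholds=EXAMPLE_THRESHOLDS):
    """Return, per pest, whether the count on the trap has reached its threshold."""
    counts = Counter(detected_labels)
    # Assumption: reaching the threshold already counts as crossing it.
    return {pest: counts[pest] >= limit for pest, limit in thresholds.items()}


# One label per pest the detector found on the trap image.
labels = ["pink bollworm"] * 9 + ["American bollworm"] * 2
print(needs_action(labels))
```

The only input this step needs from the model is one label per detected pest, which is why the rest of the pipeline focuses on locating and identifying each pest correctly.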

So what happens behind the scenes? The task of the model is to locate the pests, draw boxes and identify them. The boxes drawn around the objects do not vary much in aspect ratio, so a set of k default boxes with the most common aspect ratios (let’s call the set A) will encompass almost all of them. A Convolutional Neural Network (CNN) takes in images (NxN) and runs convolutions on top of them. After each set of convolutions, the size of the output is reduced.
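One way to picture the set A of default boxes is as a list of (width, height) pairs built from a handful of aspect ratios. The sketch below is an assumption, not our production code: the function name, the scale value and the three ratios are invented, and keeping the area fixed while varying the ratio is a convention borrowed from common detectors.

```python
import math


def default_boxes(scale, aspect_ratios):
    """Build one (width, height) default box per aspect ratio.

    Every box has the same area (scale squared); only the shape changes.
    """
    boxes = []
    for ratio in aspect_ratios:  # ratio = width / height
        width = scale * math.sqrt(ratio)
        height = scale / math.sqrt(ratio)
        boxes.append((width, height))
    return boxes


# k = 3 hypothetical aspect ratios, sizes relative to the image side.
A = default_boxes(scale=0.2, aspect_ratios=[1.0, 2.0, 0.5])
for width, height in A:
    print(f"w={width:.3f} h={height:.3f} ratio={width / height:.2f}")
```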

As the size keeps shrinking, the model outputs higher-level, more abstract features.
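How much the output shrinks depends on the kernel size, stride and padding of each convolution, none of which are given here. As a rough sketch with assumed values (a 300x300 input and three 3x3 convolutions with stride 2 and padding 1), the standard output-size formula plays out like this:

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Side length of the output of a convolution applied to an n x n input."""
    return (n + 2 * padding - kernel) // stride + 1


def feature_map_sizes(n, layers):
    """Track the side length through a list of (kernel, stride, padding) layers."""
    sizes = [n]
    for kernel, stride, padding in layers:
        n = conv_output_size(n, kernel, stride, padding)
        sizes.append(n)
    return sizes


# Hypothetical network: three 3x3 convolutions, each with stride 2 and padding 1.
print(feature_map_sizes(300, [(3, 2, 1)] * 3))  # [300, 150, 75, 38]
```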

So, we select several levels from the outputs of the multiple convolution sets, which gives us information about different-sized objects from different levels. The output of one of these levels is, let’s say, an SxS grid. For every cell, the model tries all default boxes from A and indicates how confident it is for every class [pink bollworm, American bollworm, no object]. So, for every cell you get k default boxes, and for every box you get 3 confidence scores (the number of object classes plus an extra one for no object) and 4 offset values to exactly locate the box on the grid.
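The bookkeeping above can be sketched in a few lines. The count of output values follows directly from the description; the way the 4 offsets are applied to a default box (shift the centre, scale the size through an exponential) is an assumption based on a convention used by common detectors, and the function names are invented for this example.

```python
import math

CLASSES = ["pink bollworm", "American bollworm", "no object"]


def outputs_per_level(grid_size, num_default_boxes, num_classes=len(CLASSES)):
    """Number of values predicted for one S x S level."""
    per_box = num_classes + 4  # confidence scores + 4 offsets
    return grid_size * grid_size * num_default_boxes * per_box


def decode_box(default_box, offsets):
    """Apply 4 predicted offsets to a default box given as (cx, cy, w, h).

    Assumed convention: centre offsets are in units of the box size,
    size offsets are log-scale factors.
    """
    cx, cy, w, h = default_box
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


print(outputs_per_level(grid_size=10, num_default_boxes=4))  # 10 * 10 * 4 * 7 = 2800
print(decode_box((0.5, 0.5, 0.2, 0.1), (0.5, 0.0, 0.0, 0.0)))
```

With all four offsets at zero, the decoded box is exactly the default box, so the offsets only have to express a small correction.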

After this, the model refines the box predictions (removing redundant boxes, low confidence boxes, etc.), and returns tight boxes drawn around the objects, in our case, pests.
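This post does not name the refinement method. A common choice is a confidence cut-off followed by non-maximum suppression, which is what the sketch below assumes. The function names and both threshold values are hypothetical, and boxes are given as (x1, y1, x2, y2) corners.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def refine(detections, min_confidence=0.5, iou_threshold=0.5):
    """Drop low-confidence boxes, then drop boxes that overlap a better one.

    detections: list of (box, label, score) tuples.
    """
    confident = [d for d in detections if d[2] >= min_confidence]
    confident.sort(key=lambda d: d[2], reverse=True)  # best score first
    kept = []
    for box, label, score in confident:
        redundant = any(
            label == kept_label and iou(box, kept_box) > iou_threshold
            for kept_box, kept_label, _ in kept
        )
        if not redundant:
            kept.append((box, label, score))
    return kept


detections = [
    ((0, 0, 10, 10), "pink bollworm", 0.9),
    ((1, 1, 11, 11), "pink bollworm", 0.8),   # overlaps the first box heavily
    ((50, 50, 60, 60), "American bollworm", 0.7),
    ((20, 20, 30, 30), "pink bollworm", 0.2),  # low confidence
]
for box, label, score in refine(detections):
    print(label, box, score)
```

Suppression is done per class here, so a pink bollworm box never removes an overlapping American bollworm box; that is a design choice of this sketch rather than something stated above.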

The farmers now know if there has been a pest attack. The advisory suggests which pesticide to spray, and how much, to prevent further damage.

  • Wadhwani AI

    We are an independent and nonprofit institute developing multiple AI-based solutions in healthcare and agriculture, to bring about sustainable social impact at scale through the use of artificial intelligence.

ML Engineer


An ML Engineer at Wadhwani AI will be responsible for building robust machine learning solutions to problems of societal importance, usually under the guidance of senior ML scientists, and in collaboration with dedicated software engineers. To our partners, a Wadhwani AI solution is generally a decision-making tool that requires some piece of data to engage. It will be your responsibility to ensure that the information provided using that piece of data is sound. This requires not only robust learned models, but also pipelines over which those models can be built, tweaked, tested, and monitored. The following subsections provide details from the perspective of solution design:

Early stage of proof of concept (PoC)

  • Set up and structure code bases that support an interactive ML experimentation process, as well as quick initial deployments
  • Develop and maintain toolsets and processes for ensuring the reproducibility of results
  • Conduct code reviews with other technical team members at various stages of the PoC
  • Develop, extend, or adopt a reliable, Colab-like environment for ML

Late PoC

This is the early to mid-stage of AI product development

  • Develop ETL pipelines. These can also be shared and/or owned by data engineers
  • Set up and maintain feature stores, databases, and data catalogs. Ensure data veracity and lineage of on-demand pulls
  • Develop and support model health metrics

Post PoC

Responsibilities during production deployment

  • Develop and support A/B testing. Set up continuous integration and continuous delivery (CI/CD) processes and pipelines for models
  • Develop and support continuous model monitoring
  • Define and publish service-level agreements (SLAs) for model serving. Such agreements include model latency, throughput, and reliability
  • L1/L2/L3 support for model debugging
  • Develop and support model serving environments
  • Model compression and distillation

We realize this list is broad and extensive. While the ideal candidate has some exposure to each of these topics, we also envision great candidates being experts at some subset. If either of those cases happens to be you, please apply.


Master’s degree or above in a STEM field. Several years of experience getting their hands dirty applying their craft.


  • Expert level Python programmer
  • Hands-on experience with Python libraries
    • Popular neural network libraries
    • Popular data science libraries (pandas, NumPy)
  • Knowledge of systems-level programming. Under-the-hood knowledge of C or C++
  • Experience and knowledge of various tools that fit into the model building pipeline. There are several – you should be able to speak to the pluses and minuses of a variety of tools given some challenge within the ML development pipeline
  • Database concepts; SQL
  • Experience with cloud platforms is a plus

ML Scientist


As an ML Scientist at Wadhwani AI, you will be responsible for building robust machine learning solutions to problems of societal importance, usually under the guidance of senior ML scientists. You will participate in translating a problem in the social sector to a well-defined AI problem, in the development and execution of algorithms and solutions to the problem, in the successful and scaled deployment of the AI solution, and in defining appropriate metrics to evaluate the effectiveness of the deployed solution.

In order to apply machine learning for social good, you will need to understand user challenges and their context, curate and transform data, train and validate models, run simulations, and broadly derive insights from data. In doing so, you will work in cross-functional teams spanning ML modeling, engineering, product, and domain experts. You will also interface with social sector organizations as appropriate.  


Associate ML scientists will have a strong academic background in a quantitative field (see below) at the Bachelor’s or Master’s level, with project experience in applied machine learning. They will possess demonstrable skills in coding, data mining and analysis, and building and implementing ML or statistical models. Where needed, they will have to learn and adapt to the requirements imposed by real-life, scaled deployments. 

Candidates should have excellent communication skills and a willingness to adapt to the challenges of doing applied work for social good. 


  • B.Tech./B.E./B.S./M.Tech./M.E./M.S./M.Sc. or equivalent in Computer Science, Electrical Engineering, Statistics, Applied Mathematics, Physics, Economics, or a relevant quantitative field. Work experience beyond the terminal degree will determine the appropriate seniority level.
  • Solid software engineering skills in one or more languages, such as Python, C++, or Java.
  • Interest in applying software engineering practices to ML projects.
  • Track record of project work in applied machine learning. Experience in applying AI models to concrete real-world problems is a plus.
  • Strong verbal and written communication skills in English.