Our Summer of AI Challenge: Using Data to Predict Food Insecurity

For the last 7 weeks, the Kickstart AI office has become home to a group of AI heroes-in-the-making. 🦸 Led by our resident PhD candidate Cascha van Wanrooij and supported by our Head of Challenges Kevin Damstra, six talented students — Chaoyi Wang, Shray Juneja, Evangelia Toutou, Lonneke Pulles, Yongqing Liang and Adapa Maniraj Sai — have been working on a project: a big, important project that could save lives.

Their mission is to improve the data processing surrounding food insecurity and predict food insecurity ahead of time. It’s one of eight projects initiated as part of Summer of AI, a summer hackathon for students in the Netherlands and Portugal. (Our friends at NS have also been working on a Summer of AI project; read all about it here.) At the closing ceremony on 16 August, the students involved presented their projects and the best teams received awards. The KAI team were awarded runners-up in the 'Most responsible' and 'Best pitch' categories!

We’re going to take a closer look at what our Kickstart AI team have been working on. But first, some context. The standard system to classify food-insecure regions is the IPC (Integrated Food Security Phase Classification), which categorises food insecurity into five phases, ranging from the least to the most severe. It’s a labour-intensive, manual process that may not always yield consistent results. Using data science to improve that process is the focus of Cascha’s PhD, which he’s working on with Tilburg University’s Zero Hunger Lab. With hundreds of millions of people worldwide facing malnutrition and economic loss due to food insecurity 🌍, this Summer of AI project has the potential to make a huge impact.

To improve the existing IPC system, two key subtasks needed to be tackled. Firstly, it was necessary to analyse and prepare the IPC dataset for use in the model being developed. This dataset contains information about regional IPC scores, rainfall, vegetation index, the number of violent events, staple food prices and many more features. The other main task was to find a way to integrate more data into the classification process using sources like news articles, which could provide clues about the status of food security in a particular region. To tackle these challenges, the students split into two sub-teams.

The sub-team working with the IPC classification system got really hands-on with the data from the beginning, exploring how the data was divided into different regions, looking at how features correlate with one another, and handling missing values through imputation. This deep exploration of the data led to some important decisions in the model’s development, for example choosing not to exclude outliers, as these often contained valuable information.
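
To give a feel for that kind of exploration, here is a minimal sketch in plain Python. It is not the team's actual code: the toy records, the feature names (rainfall, food_price) and the choice of per-region median imputation are all assumptions made for illustration, and a real pipeline would more likely use a data-frame library.

```python
from math import sqrt
from statistics import median

# Hypothetical toy records; the real IPC dataset has many more features.
records = [
    {"region": "A", "rainfall": 10.0, "food_price": 5.0},
    {"region": "A", "rainfall": None, "food_price": 6.0},
    {"region": "A", "rainfall": 14.0, "food_price": 4.0},
    {"region": "B", "rainfall": 30.0, "food_price": 2.0},
    {"region": "B", "rainfall": 34.0, "food_price": None},
    {"region": "B", "rainfall": 32.0, "food_price": 3.0},
]


def impute_by_region(rows, feature):
    """Fill missing values of `feature` with the median of the same region.

    Assumes every region has at least one observed value for the feature.
    """
    observed = {}
    for row in rows:
        if row[feature] is not None:
            observed.setdefault(row["region"], []).append(row[feature])
    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows stay untouched
        if new_row[feature] is None:
            new_row[feature] = median(observed[new_row["region"]])
        filled.append(new_row)
    return filled


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


cleaned = impute_by_region(impute_by_region(records, "rainfall"), "food_price")
r = pearson(
    [row["rainfall"] for row in cleaned],
    [row["food_price"] for row in cleaned],
)
print(f"rainfall vs food_price correlation: {r:.2f}")
```

Outliers are deliberately left in place here, in line with the team's decision not to exclude them.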

Meanwhile, the sub-team working with the news articles explored methods of obtaining data about a region’s food security from local text sources. They compared a number of different methods to achieve this, including typical statistical methods such as Naive Bayes, as well as topic modelling. They expected that topic modelling would provide the best results, but the statistical methods were useful as a baseline for comparing the effectiveness of the different solutions. The topic modelling approaches they tested were Latent Dirichlet Allocation (LDA), zero-shot classification, and BERTopic.
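
As an illustration of what a statistical baseline like Naive Bayes looks like, here is a small self-contained sketch of a multinomial Naive Bayes text classifier with add-one smoothing. The headlines and the two labels ("food" and "other") are invented for this example, and the team's own baseline may well have been built differently, for instance with an off-the-shelf library.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very rough tokeniser: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training headlines, invented for this sketch.
texts = [
    "drought destroys maize harvest",
    "food prices soar after conflict",
    "families skip meals as prices rise",
    "new stadium opens in capital",
    "football team wins national cup",
    "festival draws record crowds",
]
labels = ["food", "food", "food", "other", "other", "other"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("maize prices rise after drought"))
print(model.predict("football cup final draws crowds"))
```

A baseline like this is cheap to train, which makes it a handy yardstick for judging whether heavier approaches are worth their extra cost.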

As the project continued, the real challenge began: combining the two datasets. 🤓 The IPC dataset defines regions by coordinates, but the news articles do not adhere to this standard. Furthermore, some articles mention the region in the title, some mention multiple regions, and some mention none at all. The team used ChatGPT (GPT-4) to extract the region each article is about, then used geocoding to retrieve that region's coordinates. In the case of multiple regions, they took the centre point of all regions. This can obviously be improved upon, but it was accurate enough for this case.
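
The "centre point" step can be sketched as follows. This is one possible reading, not the team's confirmed method: it assumes each region has already been geocoded to a (latitude, longitude) pair in degrees, and it averages the points as 3D unit vectors so that regions on either side of the 180° meridian are handled sensibly. The function name centre_point and the sample coordinates are hypothetical.

```python
import math


def centre_point(coords):
    """Return the (lat, lon) centre of a list of (lat, lon) pairs in degrees.

    Each point is converted to a 3D unit vector, the vectors are averaged,
    and the mean vector is converted back to latitude and longitude.
    """
    if not coords:
        raise ValueError("need at least one coordinate")
    x = y = z = 0.0
    for lat_deg, lon_deg in coords:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        x += math.cos(lat) * math.cos(lon)
        y += math.cos(lat) * math.sin(lon)
        z += math.sin(lat)
    n = len(coords)
    x, y, z = x / n, y / n, z / n
    lon = math.atan2(y, x)
    lat = math.atan2(z, math.hypot(x, y))
    return math.degrees(lat), math.degrees(lon)


# Hypothetical geocoding results for an article that mentions two regions.
mentioned_regions = [(0.0, 0.0), (0.0, 90.0)]
lat, lon = centre_point(mentioned_regions)
print(f"centre: lat={lat:.2f}, lon={lon:.2f}")
```

A centre point can land somewhere none of the mentioned regions actually covers, which is one reason this step leaves room for improvement.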

By the second half of the hackathon period, the team had a good grasp of the problem and a plan for how to build a solution using the data provided. Again they split into sub-teams, with one group to set up the pipeline and train the model, and another group to create a prototype application. With a last push to finish the product, plus a round of testing to make sure it worked as intended, the project was complete.

We are truly blown away by what our team of students have achieved in such a short span of time! 👏 The app they’ve created is an excellent proof of concept, showing that data science can really help to predict food insecurity. We’re very excited to see how Cascha’s research can build on this. Hopefully we’ll see a positive impact from the team’s work in the near future, and for years to come.
