At Kickstart AI we are using AI to support the Dutch Food Bank (VoedselBank Nederland, VBN) in reaching all households in need of food assistance across the country. Using available census data and machine learning, we can predict the fraction of households below the social minimum and the factors that contribute to it. With this project, we aim to provide valuable insights for VBN’s strategic decisions.

The contrast of two worlds

With a GDP of 990 billion USD, the Netherlands has the fifth-largest economy in the European Union. The country of bikes, cheese, and charming canals also ranks among the top 5 happiest countries, according to the most recent happiness report, released in March of this year.

Despite the high living standards, a fraction of households in the Netherlands still lives below the social minimum. Education level and social background are decisive factors [1,2]; the source of income, the household type, and the age and migration background of the main breadwinner also appear to influence whether a household falls below the social minimum [3]. People living below the social minimum likely do not have enough income to cover their food needs. This is where food banks come to the rescue.

VoedselBank Nederland (VBN) is the primary Dutch food bank. Currently, VBN has 10 distribution centers that collect donated food and distribute it to 174 local food banks throughout the country. In 2020, around 150,000 individuals benefited from the support of the Dutch food banks. However, because of language barriers, social stigma, or the inaccessibility of food banks, the number of households in the Netherlands that need support might be higher. That is why, in 2020, VBN and the Zero Hunger Lab at Tilburg University joined forces to develop techniques for predicting the number of people who are eligible for food bank support. The pilot of this project, in the city of Den Bosch, used open census data and focused on predicting the households below the poverty line.

At Kickstart AI we want to use our technical expertise to help VBN reach not only vulnerable households in one city but all households in need of food across the country. By using the latest open-source libraries and machine learning algorithms, we want to predict the number of households below the social minimum and the contributing factors. These results will provide VBN with valuable insights that they can use to make strategic decisions to reach those households in need of their support.

Everything starts with data

To understand the factors that contribute to households falling below the social minimum, and the regions of the Netherlands where these households are located, we use historical Dutch census data from Statistics Netherlands (CBS).

The CBS data contains information on a broad range of topics that can be organized into different categories:

Proximity to facilities: The number of points of interest within a certain buffer area. Points of interest include hospitals, hotels, supermarkets, and cafes.
Housing: Information related to the real estate market, such as average house values, percentage of owner-occupied homes, and year of construction, among other features.
Companies: The number and type of businesses that operate in different areas, for instance Agriculture, Trade and Catering, Energy, Real Estate, Financial Services, and Culture.
Population: Features such as age, gender, marital status, population density, and education level. Due to GDPR, the population features are measured as totals over a region.
Economy: Macroeconomic indicators such as GDP per capita.
Region: The names of the municipalities, COROP regions, or city districts. Because our model works at the municipality level, we only use the municipality name as a model feature.

Other categories are related to energy consumption, type of household, income, migration background, crime, and transportation.

We collected approximately 277 features, all of which are yearly aggregates per municipality. Although some of this data has been recorded since 1995, municipal borders have changed over time, which creates inconsistencies when joining the historical data. After an extensive data cleaning process, the final dataset contains 216 features measured from 2015 to January 2023. These are, again, yearly aggregates that describe the status of all 345 Dutch municipalities.

Measurements of vulnerable households

Among the 216 data features, the one that best describes the variable we want to predict is the fraction of households below the social minimum. When determining this feature, CBS considers private households where the main breadwinner (or in some circumstances their partner) has an income all year round and is not dependent on student finance. If the standard social minimum income is used as the limit, many households that rely exclusively on social assistance benefits will have an income just above the threshold and will not be counted. Therefore, we use as our target the CBS estimate of the fraction of households with an income up to 101% of the social minimum.

The CBS updates the fraction of households below the social minimum with a one-year delay. For example, in fall 2023 the values for 2022 will become available. Thus, we assume that the fraction of households below the social minimum in 2022 and 2023 is the same as that in 2021.
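
As a minimal sketch of this assumption, the snippet below carries the last published target value forward within each municipality. The table layout, the column names (`municipality`, `year`, `frac_below_min`), and the numbers are illustrative assumptions, not the actual CBS tables or the project's code.

```python
import pandas as pd

# Toy long-format table: one row per municipality and year.
# Column names and values are hypothetical, chosen only for illustration.
df = pd.DataFrame({
    "municipality": ["A"] * 4 + ["B"] * 4,
    "year": [2020, 2021, 2022, 2023] * 2,
    "frac_below_min": [0.071, 0.069, None, None, 0.052, 0.050, None, None],
})

# Carry the last published value forward within each municipality,
# so that 2022 and 2023 inherit the 2021 value.
df = df.sort_values(["municipality", "year"])
df["frac_below_min"] = df.groupby("municipality")["frac_below_min"].ffill()
```

Grouping by municipality before filling keeps one municipality's value from leaking into the next one in the table.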

Figure 1: Top: Distribution of the target of our model. Bottom: Target distribution over the years (left) and over the Dutch provinces (right).

As you can see in Figure 1, the distribution of the fraction of households below the social minimum remains relatively consistent over time. However, it’s important to note that the distribution highly depends on the location. Therefore, the training and test sets must include all the municipalities to prevent bias in the modelling and evaluation. To build the training set we selected the data from 2015 to 2022. To build the test set, we used the data corresponding to 2023.
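
The split described above can be sketched as follows, assuming a long-format table with hypothetical `municipality` and `year` columns. The check at the end verifies that every municipality appears in both the training and the test set.

```python
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "municipality": ["A", "B"] * 3,
    "year": [2021, 2021, 2022, 2022, 2023, 2023],
    "frac_below_min": [0.07, 0.05, 0.07, 0.05, 0.07, 0.05],
})

TEST_YEAR = 2023
train = df[df["year"] < TEST_YEAR]   # earlier years for training
test = df[df["year"] == TEST_YEAR]   # most recent year for testing

# Municipalities that are absent from either side of the split.
in_both = set(train["municipality"]) & set(test["municipality"])
missing = set(df["municipality"]) - in_both
```

An empty `missing` set means the location-dependent distribution is represented on both sides of the split.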

Modelling poverty in the Netherlands

Now we have all the pieces to create a predictive model for the fraction of households below the social minimum. For this endeavour, we used the Python package PyCaret, which automates data preprocessing and modelling.

We performed several steps to preprocess the numerical features in the training set. The steps include removing multicollinearity, removing features with low variance, and transforming the variables to a logarithmic scale. To be able to use the auto machine learning functionality of PyCaret, we also normalised the numerical features. These preprocessing steps give us 118 numerical features (out of the 215 initial numerical variables).
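
To make these steps concrete, here is a rough sketch of the same kinds of operations written in plain pandas and NumPy. It is not the PyCaret pipeline used in the project: the function name, the thresholds, and the toy columns are assumptions, and PyCaret's own transformations may differ in detail from a plain logarithm.

```python
import numpy as np
import pandas as pd

def preprocess(X, var_threshold=1e-8, corr_threshold=0.9):
    """Return a reduced, log-scaled, standardised copy of a numeric DataFrame."""
    # 1. Drop near-constant columns.
    X = X.loc[:, X.var() > var_threshold]
    # 2. Compress heavy right tails (assumes non-negative values).
    X = np.log1p(X)
    # 3. For each highly correlated pair, drop the later column.
    corr = X.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    X = X.drop(columns=to_drop)
    # 4. Standardise to zero mean and unit variance.
    return (X - X.mean()) / X.std()

# Toy features (hypothetical names and values).
rng = np.random.default_rng(0)
n = 50
households = rng.uniform(1, 100, n)
X = pd.DataFrame({
    "households": households,
    "households_copy": households * 2.0,  # nearly redundant feature
    "constant": np.ones(n),               # zero-variance feature
    "house_value": rng.uniform(100, 500, n),
})
Xp = preprocess(X)
```

In this toy example the constant column and the redundant copy are removed, which mirrors how the feature count shrinks during preprocessing.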

We also dummy-encoded the only categorical feature present in the data: the municipality name. This step gives us 345 additional variables.
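
Dummy encoding turns each municipality name into its own 0/1 column. A small illustration with pandas, using three municipalities instead of 345 (the `mun` prefix and the `year` column are assumptions made for the example):

```python
import pandas as pd

df = pd.DataFrame({
    "municipality": ["Amsterdam", "Rotterdam", "Utrecht", "Amsterdam"],
    "year": [2022, 2022, 2022, 2023],
})

# One indicator column per distinct municipality name.
encoded = pd.get_dummies(df, columns=["municipality"], prefix="mun", dtype=int)
```

Each row has exactly one indicator set to 1, so the number of added columns equals the number of distinct municipalities.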

We used PyCaret’s compare_models function, which trains and evaluates 25 regression algorithms available in the library, with 3-fold cross-validation. We did not tune model hyperparameters during this step; we only used compare_models to find the algorithm with the lowest Mean Absolute Error (MAE) on the validation folds.
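
The idea behind this comparison can be sketched with scikit-learn instead of PyCaret: fit a few candidate regressors with 3-fold cross-validation and keep the one with the lowest average MAE. The three candidates, their settings, and the synthetic data below are assumptions for illustration; the project compared 25 algorithms on the real dataset.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge, Lasso, Ridge
from sklearn.model_selection import cross_val_score

# Synthetic regression problem with a small, fraction-like target.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
true_coef = np.array([0.02, -0.01, 0.0, 0.005, 0.0])
y = 0.07 + X @ true_coef + rng.normal(scale=0.002, size=120)

candidates = {
    "bayesian_ridge": BayesianRidge(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.001),
}

scores = {}
for name, model in candidates.items():
    # scikit-learn maximises scores, so MAE is exposed as a negative value.
    neg_mae = cross_val_score(model, X, y, cv=3, scoring="neg_mean_absolute_error")
    scores[name] = -neg_mae.mean()

# Candidate with the lowest cross-validated MAE.
best = min(scores, key=scores.get)
```

All candidates run with fixed settings here, which matches the point above: this step selects an algorithm, and hyperparameter tuning comes afterwards.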

Table 1: Results of running the compare_models function in PyCaret. The error metrics are averages over the cross-validation folds.

According to the table above, the best algorithm is the Bayesian Ridge Regression. It gives an average MAE of 0.2% (the target’s unit is a fraction, not a percentage).
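
To illustrate how an MAE on a fraction-valued target relates to percentage points, here is a small worked example. The numbers are made up, and the reading that an MAE of 0.002 in fraction units corresponds to 0.2 percentage points is our assumption about how the reported figure should be interpreted.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical fractions of households below the social minimum.
y_true = [0.070, 0.052, 0.110]
y_pred = [0.072, 0.050, 0.108]

mae_fraction = mean_absolute_error(y_true, y_pred)  # in fraction units
mae_pct_points = 100 * mae_fraction                 # in percentage points
```

Under this reading, a prediction of 7.2% of households against an observed 7.0% contributes an absolute error of 0.2 percentage points.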

We used Bayesian optimization to tune the hyperparameters of the Bayesian Ridge Regression model. Bayesian optimization typically needs fewer model evaluations than an exhaustive grid search, because it uses the results of earlier trials to decide which hyperparameter values to try next. It can also find good values that fall between the points of a coarse grid. If you are interested in the details of Bayesian optimization, you can read this blog post.

We evaluated the trained model on the test set. The MAE was 0.13%.

The beauty of Bayesian Ridge Regression is that it is a linear model: the sign and size of its coefficients give us insight into the factors that drive poverty in the Netherlands.

Figure 2: Categories that contribute to having a fraction of households below the social minimum. Within each category, we show the top 5 most important features.

We can observe in Figure 2 that the region is the most important category in determining the fraction of vulnerable households. Rotterdam stands out as the municipality with the highest fraction of households below the social minimum.

Other poverty-related factors include the proximity to facilities and housing. In this regard, note that the average house value is negatively associated with the fraction of households below the social minimum: places where houses are expensive are expected to have fewer households in need. Other interesting factors appearing in the model are the distance to hotels, the number of gasoline-fuelled cars, and the number of people with AO benefits.
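
The kind of reading described above can be sketched with scikit-learn's BayesianRidge: after fitting on standardised features, the coefficients are ranked by absolute size and their signs give the direction of the association. The feature names and the synthetic data are assumptions for illustration, not the project's model.

```python
import numpy as np
from sklearn.linear_model import BayesianRidge

# Synthetic, already standardised features (hypothetical names).
rng = np.random.default_rng(1)
names = ["avg_house_value", "distance_to_hotel", "noise"]
X = rng.normal(size=(200, 3))
y = 0.07 - 0.02 * X[:, 0] + 0.005 * X[:, 1] + rng.normal(scale=0.001, size=200)

model = BayesianRidge().fit(X, y)

# Rank features by the absolute size of their coefficient.
ranking = sorted(zip(names, model.coef_), key=lambda pair: abs(pair[1]), reverse=True)
for name, coef in ranking:
    direction = "lowers" if coef < 0 else "raises"
    print(f"{name}: {coef:+.4f} ({direction} the predicted fraction)")
```

Comparing coefficient sizes in this way is only meaningful when the features share a common scale, which is one reason the normalisation step matters.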

Since our goal is estimating the total number of households below the social minimum, we combined the measurements of the total number of households per municipality with our predictions. The results are shown in the figure below.
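
The combination is a simple multiplication of the predicted fraction by the household total of each municipality. A toy example with made-up municipalities and numbers:

```python
# Hypothetical predicted fractions and household totals per municipality.
predicted_fraction = {"A": 0.10, "B": 0.05}
total_households = {"A": 20_000, "B": 50_000}

# Estimated number of households below the social minimum.
estimated_households = {
    name: round(predicted_fraction[name] * total_households[name])
    for name in predicted_fraction
}
```

A large municipality with a modest fraction can therefore still contain more vulnerable households than a small municipality with a high fraction.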

Figure 3: Left: Number of households below the social minimum estimated from the model predictions. Right: Comparison of the model estimates with the actual values corresponding to 2023.

The municipalities of Amsterdam, Rotterdam, and The Hague are home to the largest numbers of households that need VBN’s support. This is expected, as these municipalities host a significant share of the country’s population.

Short-term model improvements

As we mentioned before, our model predicts vulnerable households per municipality, so the estimates remain relatively coarse. Where in Rotterdam or Amsterdam are the households eligible for food bank support? To answer this, we are currently building a spatially more granular model that predicts the fraction of vulnerable households at the district level. Once that model is ready, we will explore different variants of feature reduction and data preprocessing. For instance, we’ll explore different categorical encodings (perhaps binary encoding or feature hashing), and we might incorporate Lasso regression into our pipeline to further reduce the model’s complexity.

We want to improve our model by incorporating predictions on the amount and the type of food needed by vulnerable households. We plan to combine open data on food consumption with an estimate of the number of potential customers per region to get the desired predictions.

We will publish our results in a follow-up blog. Stay tuned!

Benefits of Kickstart AI’s model

We have developed a model to predict the fraction of households below the social minimum in each municipality of the Netherlands. This model also provides insights into which factors contribute (positively or negatively) to having vulnerable households in this country. How can VBN use this model? One way is to start by evaluating the situation in the municipalities with the highest numbers of households below the social minimum. According to our model, these municipalities are Amsterdam, Rotterdam, The Hague, Groningen, and Utrecht. Are there enough food banks in these regions? How far are the local food banks from the districts where the vulnerable households are located? Further analysis in collaboration with VBN is needed to answer these questions.

Our model can also be used by NGOs dedicated to helping vulnerable people or by the affected municipalities. Based on the contributing factors, these organizations can focus their efforts on helping people with a Western background find jobs or obtain subsidies. Many internationals are unaware of the allowances offered by local municipalities. Spreading this information in public spaces, in different languages, or on the municipalities’ websites might help reach those vulnerable households.

Thank you for reading! Stay tuned for more content and developments made by Kickstart AI.

References

[1] M. A. Carter, L. Dubois, M. S. Tremblay, and M. Taljaard. Local social environmental factors are associated with household food insecurity in a longitudinal study of children. BMC Public Health, 12(1):1–11, 2012.

[2] M. A. Biggerstaff, P. M. Morris, and A. Nichols-Casebolt. Living on the edge: Examination of people attending food pantries and soup kitchens. Social Work, (3):267–277, 2002.

[3] B. Toering. Modeling the Demand for Foodbank Aid in The Netherlands. Master thesis, 2020.
