
3.1 Model Fitting
The rest of this document looks at the “how do we achieve the optimal assignment of volunteers to the current disaster?” part of the problem statement. Model fitting, in our case, comes down to understanding how the Association, Volunteer and Disaster vectors exist in their own dimensional spaces and how these vectors interact.
Machine Learning has innumerable models, and the model that fits a given problem statement best depends heavily on the dataset at hand. Roughly speaking, model choice will be driven by the existence of a gold standard. If there is a gold standard, we will know which associations are optimal and which are sub-optimal.
Our task will then be to model the process which “generated” the labelled input data; such models are called “Generative” models. When we do not have a gold standard, we will instead need to learn which features of the input are most useful to “discriminate” between the different groups present in the data; we will call such models “Discriminative” models.
Any ML project will have multiple sub-problems, fitting generative models for some and discriminative models for others. These sub-problems depend heavily on the dataset we gather. Since we still know very little about the volunteer assignment use case, we will not yet attempt to break the problem statement into sub-problems. The next two sub-sections look at how generative and discriminative models apply to our use case.
3.1.1 Generative Models
Since we are assuming the presence of a gold standard for such models, we can treat the volunteer-disaster associations forecast by our system as a classification task. For a given disaster, we want to classify each volunteer-disaster association a as optimal or sub-optimal. Such classification tasks usually come down to calculating conditional probabilities like:

P(optimal | a)

Simply put, what is the probability of an association being optimal? The probability of the same association being sub-optimal will be given by the complement:

P(sub-optimal | a) = 1 − P(optimal | a)
For any given disaster, we will have to calculate such a conditional probability for every volunteer-disaster association, and together these form a probability distribution. The accuracy of this distribution will depend on how well the probability distributions of all previous volunteer-disaster associations have been modeled. Again, there are a lot of candidate models, and at Pirates our approach is to start with the simplest model first. Simpler models tend to have higher error rates but also higher interpretability.
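For illustration, once such conditional probabilities are available for one disaster, ranking the candidate associations is straightforward, and the sub-optimal probability is just the complement. The volunteer names and numbers below are invented:

```python
# Hypothetical P(optimal | association) for four candidate volunteers
# and a single disaster; all names and values are made up.
p_optimal = {"vol_a": 0.91, "vol_b": 0.42, "vol_c": 0.77, "vol_d": 0.13}

# The probability of the same association being sub-optimal is the complement.
p_sub_optimal = {vol: 1.0 - p for vol, p in p_optimal.items()}

# Rank candidate associations from most to least likely to be optimal.
ranked = sorted(p_optimal, key=p_optimal.get, reverse=True)
```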
Fitting such models will help us in three ways:
Understand why the simpler model wasn’t a good fit and explore how we can massage the input to fit the same model better
See if there’s any subset of the input which will be a better fit and if that can solve a sub-problem of the larger problem statement
Guide our next choice of model
Our first choice for a generative model will be to fit a Naive Bayes (NB) model. This model assumes that the features of a vector are independent of each other and hence fails to capture many real-world scenarios, but its interpretability allows us to learn enough about our data to guide the next model we want to fit. Accuracy numbers given by an NB model fit will form our first baseline; any future model fit will have to beat this baseline to be considered.
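As a sketch of what this baseline computes, the snippet below fits a categorical NB classifier by hand, with Laplace smoothing, on a tiny hypothetical gold standard; every feature name, value and label here is invented for illustration:

```python
from collections import defaultdict
import math

# Hypothetical gold standard: each volunteer-disaster association is a dict
# of categorical features, labelled "optimal" or "sub-optimal".
train = [
    ({"skill": "medical", "distance": "near"}, "optimal"),
    ({"skill": "medical", "distance": "far"}, "optimal"),
    ({"skill": "logistics", "distance": "near"}, "optimal"),
    ({"skill": "logistics", "distance": "far"}, "sub-optimal"),
    ({"skill": "none", "distance": "far"}, "sub-optimal"),
]

def fit_nb(data):
    """Count class priors and per-feature value frequencies."""
    class_counts = defaultdict(int)
    feat_counts = defaultdict(lambda: defaultdict(int))  # (label, feature) -> value -> count
    feat_values = defaultdict(set)                       # feature -> observed values
    for feats, label in data:
        class_counts[label] += 1
        for f, v in feats.items():
            feat_counts[(label, f)][v] += 1
            feat_values[f].add(v)
    return class_counts, feat_counts, feat_values

def posterior(model, feats):
    """P(label | feats) under the NB independence assumption, Laplace-smoothed."""
    class_counts, feat_counts, feat_values = model
    total = sum(class_counts.values())
    log_p = {}
    for label, n in class_counts.items():
        lp = math.log(n / total)  # class prior
        for f, v in feats.items():
            lp += math.log((feat_counts[(label, f)][v] + 1) / (n + len(feat_values[f])))
        log_p[label] = lp
    # Convert log-probabilities back into a normalized distribution.
    m = max(log_p.values())
    unnorm = {label: math.exp(lp - m) for label, lp in log_p.items()}
    z = sum(unnorm.values())
    return {label: p / z for label, p in unnorm.items()}

model = fit_nb(train)
probs = posterior(model, {"skill": "medical", "distance": "near"})
```

With real data a library implementation (for example, scikit-learn's Naive Bayes classes) would replace this hand-rolled version; the point here is only that the NB fit reduces to simple counting, which is what makes it so interpretable.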
3.1.2 Discriminative Models
Since we are assuming the absence of a gold standard, our task comes down to finding patterns in the input data and segregating them. We will need to find volunteers who share similar features and see how many segregations emerge. Similar segregations will be learned for disasters and associations. Once these segregations have been checked manually, we will formulate application-level rules to assign volunteers to disasters. For example, if a certain disaster type has only ever been assigned volunteers from one segregation, we could write a rule to assign more volunteers from a group that has never been assigned to such disasters; the rule could equally be to continue assigning from the same pool of volunteers. Formulating such rules will be a collaborative process between engineering and business stakeholders.
Segregating data this way has the following broad steps:
Find a combination of features that explains the most amount of variance for a given vector (Association, Disaster or Volunteer)
Project the vectors down to a lower dimension space
Programmatically determine boundaries between vector segregations in the lower dimension space
As explained in the previous section, for discriminative models as well, our approach will be to first fit the simplest model. While it might not give the best accuracy, it will help us understand the data much better and guide our choice of the next model. A simple nearest neighbor (NN) analysis is an excellent choice for the first baseline. NN analysis works well on lower-dimensional data, so we will end up applying it to subsets of our data, which itself yields a lot of learning. Principal Component Analysis (PCA) usually improves on this by projecting the vectors down to the lower-dimensional space in which the NN analysis runs; PCA works well only if the vector features display high linear correlation. Once we have exhausted PCA, by considering various subsets of the input data and massaged versions of the same, we might move on to non-linear models to form further baselines.
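The three steps above can be sketched end to end on a toy dataset. Everything here is hypothetical: the latent traits, the mixing matrix and the distance threshold are invented, PCA is done directly via an SVD, and the boundary step is a simple nearest-neighbour chaining rather than a production clustering algorithm:

```python
import numpy as np

# Hypothetical volunteer vectors: two clearly separated groups of 2-D
# "latent" traits are mixed linearly into 4 observed features, so the
# features are highly linearly correlated (the case PCA handles well).
latent = np.array([
    [0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2],   # group one
    [3.0, 3.0], [3.2, 3.0], [3.0, 3.2], [3.2, 3.2],   # group two
])
mixing = np.array([
    [1.0, 0.5, 0.2, 0.0],
    [0.0, 0.3, 1.0, 0.7],
])
X = latent @ mixing                                    # 8 volunteers x 4 features

# Steps 1 and 2: find the feature combinations that explain the most
# variance (top principal components, via SVD of the centred data) and
# project the vectors down to that lower-dimensional space.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
projected = Xc @ vt[:2].T

# Step 3: determine segregation boundaries programmatically. Nearest-
# neighbour chaining puts two points in the same segregation whenever
# they are closer than a hand-picked, illustrative distance threshold.
def segregate(points, threshold):
    labels = [-1] * len(points)
    next_label = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        stack = [i]
        while stack:
            j = stack.pop()
            dists = np.linalg.norm(points - points[j], axis=1)
            for k in np.flatnonzero(dists < threshold):
                if labels[k] == -1:
                    labels[k] = next_label
                    stack.append(k)
        next_label += 1
    return labels

labels = segregate(projected, threshold=1.0)
```

On real data the threshold (and the number of components to keep) would have to be chosen by inspecting the explained variance and the manual checks described above, not hard-coded as here.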
3.2 Model evaluation
Once we have a model fit, we will have to evaluate how good or bad the fit is. It will rarely be the case that a single model fits the entire dataset; most ML projects end up with multiple models fitting different subsets of the input, and hence with multiple model evaluations. Model evaluation numbers serve two purposes:
Help us decide if we want to persist with a model fit or discard it
Guide us while using forecasts from the model fit. Forecasts from a model fit with low evaluation numbers should be trusted less than forecasts from a model fit with high evaluation numbers
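As a minimal sketch of what such evaluation numbers look like for the classification case, the snippet below compares hypothetical model forecasts against held-out gold labels and computes accuracy plus precision and recall for the "optimal" class; the labels are invented for illustration:

```python
# Hypothetical held-out gold labels vs. model forecasts for ten associations.
gold = ["optimal", "optimal", "sub-optimal", "optimal", "sub-optimal",
        "optimal", "sub-optimal", "sub-optimal", "optimal", "optimal"]
forecast = ["optimal", "sub-optimal", "sub-optimal", "optimal", "sub-optimal",
            "optimal", "optimal", "sub-optimal", "optimal", "optimal"]

def evaluate(gold, forecast, positive="optimal"):
    """Accuracy overall, plus precision/recall for the positive class."""
    assert len(gold) == len(forecast)
    tp = sum(g == positive and f == positive for g, f in zip(gold, forecast))
    fp = sum(g != positive and f == positive for g, f in zip(gold, forecast))
    fn = sum(g == positive and f != positive for g, f in zip(gold, forecast))
    correct = sum(g == f for g, f in zip(gold, forecast))
    return {
        "accuracy": correct / len(gold),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

scores = evaluate(gold, forecast)
```

Numbers like these are what would let us decide whether to persist with a fit and how much to trust its forecasts downstream.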