Embeddable AI


Estimating duration of a classifier training with Watson Discovery

By TAKESHI INAGAKI posted Sat June 19, 2021 08:49 AM

  

Training a machine learning model is one of the heavier tasks on Watson Discovery. Some models need to be updated periodically, and knowing how long training takes is important for planning those updates. One such machine learning capability of Watson Discovery is the document and text classifier, which predicts labels on documents according to classification categories. As the number of category labels grows, training a classifier takes more time. Internally, the model contains connection parameters to be fitted between input words and output labels, and the number of these parameters grows linearly with the number of output labels. As a result, training is expected to take longer as the number of output labels increases. But does training time grow linearly with the number of labels? For example, does a model that outputs 100 labels take 10 times longer to train than a model that outputs 10 labels? At first sight, it seems likely. However, that is not the case. The purpose of this post is to share such an example and to consider the reason.
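To make the linear growth of parameters concrete, here is a minimal sketch. This is not Watson Discovery's internal model; it assumes a simple linear classifier over a hypothetical bag-of-words vocabulary:

```python
# Hypothetical illustration: parameter count of a linear text classifier.
# A bag-of-words model with V vocabulary features and N output labels
# needs a V x N weight matrix plus N biases, so the number of
# parameters to fit grows linearly with N.
def parameter_count(vocab_size: int, num_labels: int) -> int:
    return vocab_size * num_labels + num_labels

print(parameter_count(50_000, 10))   # 500,010 parameters
print(parameter_count(50_000, 100))  # 5,000,100 parameters
```

Going from 10 to 100 labels multiplies the parameter count by 10, which is why a linear increase in training time seems plausible at first.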

 

Training a classifier consists of two phases. The first phase is preparing the data for training: every word in the text of the training data is converted into a feature vector for machine learning. The time required for this phase depends on the size of the training data. The second phase is training the model. How long will this second phase take? With N times as many output labels, computing one gradient descent iteration needs N times as many computing steps, so we can predict that training will take at least N times longer. However, this is not the end of the story. Each gradient descent iteration trains only one label as a positive example, so converging the learning for all labels may take up to N times as many iterations when there are N times as many labels. With this observation, we can guess that training a model with N times as many labels may take up to N squared times as long.
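The worst-case reasoning above can be sketched as follows. The base cost and iteration count are purely illustrative constants, not measurements from Watson Discovery:

```python
# Sketch of the worst-case bound: per-iteration cost grows linearly
# with the number of labels N, and in the worst case the number of
# iterations needed to converge also grows linearly with N, giving
# O(N^2) total training time.
def worst_case_training_time(num_labels: int,
                             base_cost_per_iter: float = 1.0,
                             base_iters: int = 1000) -> float:
    cost_per_iter = base_cost_per_iter * num_labels  # N x compute per step
    iterations = base_iters * num_labels             # up to N x more steps
    return cost_per_iter * iterations

ratio = worst_case_training_time(100) / worst_case_training_time(10)
print(ratio)  # 100.0 -> 10x the labels can cost up to 100x the time
```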

 

Which case actually happens: N times longer, or N squared times longer? To find the actual ratio, we trained two models with different numbers of labels, using training data of the same order of size. The result fell between the two cases: training time scaled as N to the 1.3 power. This observed exponent may differ case by case, but the example helps us be aware of the order of time we should allocate for training a classifier model with a large number of labels.
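Given two training runs, the exponent can be fitted from the ratio of durations and the ratio of label counts. The numbers below are hypothetical, chosen only to illustrate how an exponent near 1.3 arises; they are not the actual measurements from the experiment:

```python
import math

# Fit the exponent a in t = c * N^a from two measurements
# (n1, t1) and (n2, t2):  a = log(t2/t1) / log(n2/n1).
def power_law_exponent(n1: float, t1: float,
                       n2: float, t2: float) -> float:
    return math.log(t2 / t1) / math.log(n2 / n1)

# Hypothetical example: 10x more labels, roughly 20x longer training.
a = power_law_exponent(10, 1.0, 100, 20.0)
print(round(a, 2))  # 1.3
```

With the fitted exponent, a planned training run with a larger label set can be extrapolated as t1 * (N / n1) ** a.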

 


#BuildwithWatsonApps
#EmbeddableAI