Embeddable AI


Classifiers assembled with identical training sets using Natural Language Understanding and Natural Language Classifier services yield (very) different results

  • 1.  Classifiers assembled with identical training sets using Natural Language Understanding and Natural Language Classifier services yield (very) different results

    Posted Thu November 25, 2021 02:22 PM
    Hi all, 

    Everyone actively using the Natural Language Classifier service from IBM Watson will have seen the following message while using the API:

    "On 9 August 2021, IBM announced the deprecation of the Natural Language Classifier service.
    The service will no longer be available from 8 August 2022. As of 9 September 2021, you will not be able to create new instances.
    Existing instances will be supported until 8 August 2022. Any instance that still exists on that date will be deleted.
    For more information, see IBM Cloud Docs"

    I have migrated a classification model from Natural Language Classifier (NLC) to Natural Language Understanding (NLU). Since I have not looked into the technology behind either service, I wanted to compare their output. To do so, I followed the migration guidelines provided by IBM (NLC --> NLU migration guidelines). To recreate the NLC classifier in NLU, I downloaded the complete set of training data used to create the original NLC classifier, so the data sets used to train the NLC and NLU classifiers are identical. Recreating the classifier in NLU was straightforward, and training took about the same time as in NLC.

    To compare performance, I then assembled a test set of 100 phrases that were not used to train either classifier and passed each phrase through both the NLC and NLU classifiers. Both are binary classifiers: they assign either a "true" or a "false" label to each analyzed phrase. To my surprise, the differences are substantial. For 18 of the 100 phrases, the confidence values differ by more than 0.30; with a threshold of 0.2, that rises to 37 of 100. In my opinion, this difference is too large to migrate all NLC models to NLU without hesitation. The results so far warrant further investigation, including manual curation of the analysis results by a subject-matter expert (SME). I am not too happy about this. Have other users seen this issue or made the same observation? I hope someone (perhaps at IBM) can shed light on the differences in analysis results between the NLC and NLU services.
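
    The comparison described above can be sketched in a few lines of Python. This is a minimal sketch, not the script used for the original test: the function name `count_differences` is hypothetical, and the ten (NLC, NLU) confidence pairs are the ones shown in the excerpt in this post, not the full 100-phrase test set.

```python
# (NLC confidence, NLU confidence) pairs taken from the excerpt in this post.
# Only 10 of the 100 test phrases are shown there, so the counts produced here
# are not expected to match the 18/100 and 37/100 reported for the full set.
EXCERPT_PAIRS = [
    (0.01, 0.05), (0.01, 0.05), (0.70, 0.39), (0.01, 0.33), (0.08, 0.50),
    (0.67, 0.90), (0.79, 0.98), (0.00, 0.00), (0.26, 0.70), (0.77, 0.87),
]


def count_differences(pairs, threshold):
    """Count pairs whose confidence values differ by more than `threshold`.

    Hypothetical helper, introduced only for illustration.
    """
    # Round to two decimals so binary floating-point noise cannot push a
    # difference across the threshold.
    return sum(1 for nlc, nlu in pairs if round(abs(nlc - nlu), 2) > threshold)


if __name__ == "__main__":
    for threshold in (0.3, 0.2):
        n = count_differences(EXCERPT_PAIRS, threshold)
        print(f"difference > {threshold}: {n} of {len(EXCERPT_PAIRS)}")
```

    Running the same function over all 100 pairs with both thresholds would reproduce the two counts quoted above, provided the values are exported with two decimals.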

    Please find below an excerpt of the comparison results:
    NLC    NLU    Result      Title
    0.01   0.05   comparable  "Microbial Volatile Organic Compound (VOC)-Driven Dissolution and Surface Modification of Phosphorus-Containing Soil Minerals for Plant Nutrition: An Indirect Route for VOC-Based Plant-Microbe Communications"
    0.01   0.05   comparable  "Valorization of kiwi agricultural waste and industry by-products by recovering bioactive compounds and applications as food additives: A circular economy model"
    0.70   0.39   different   "Quantitatively unravelling the effect of altitude of cultivation on the volatiles fingerprint of wheat by a chemometric approach"
    0.01   0.33   different   "Identification of volatile biomarkers for high-throughput sensing of soft rot and Pythium leak diseases in stored potatoes"
    0.08   0.50   different   "Impact of Electrolyzed Water on the Microbial Spoilage Profile of Piedmontese Steak Tartare"
    0.67   0.90   different   "Review on factors affecting Coffee Volatiles: From Seed to Cup"
    0.79   0.98   comparable  "Chemometric analysis of the volatile profile in peduncles of cashew clones and its correlation with sensory attributes"
    0.00   0.00   comparable  "Surface-enhanced Raman scattering sensors for biomedical and molecular detection applications in space"
    0.26   0.70   different   "Understanding the flavor signature of the rice grown in different regions of China via metabolite profiling"
    0.77   0.87   comparable  "Nutritional composition, antioxidant activity, volatile compounds, and stability properties of sweet potato residues fermented with selected lactic acid bacteria and bifidobacteria"

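    Because both classifiers are binary, a confidence gap only changes the outcome when the two values fall on opposite sides of the decision cutoff. The following is a minimal sketch of that check on the excerpt rows. It assumes that the listed values are confidences for the "true" label and that 0.5 is the cutoff; neither is stated in the post, and `predicted_label` and `flipped_rows` are hypothetical helpers.

```python
# (NLC confidence, NLU confidence) pairs from the excerpt in this post.
EXCERPT_PAIRS = [
    (0.01, 0.05), (0.01, 0.05), (0.70, 0.39), (0.01, 0.33), (0.08, 0.50),
    (0.67, 0.90), (0.79, 0.98), (0.00, 0.00), (0.26, 0.70), (0.77, 0.87),
]

# Assumption: 0.5 is the decision cutoff; the post does not state one.
CUTOFF = 0.5


def predicted_label(confidence, cutoff=CUTOFF):
    """Map a confidence for the "true" class to a binary label (assumed semantics)."""
    return confidence >= cutoff


def flipped_rows(pairs, cutoff=CUTOFF):
    """Return 0-based indices of rows where NLC and NLU disagree on the label."""
    return [
        i
        for i, (nlc, nlu) in enumerate(pairs)
        if predicted_label(nlc, cutoff) != predicted_label(nlu, cutoff)
    ]


if __name__ == "__main__":
    for i in flipped_rows(EXCERPT_PAIRS):
        nlc, nlu = EXCERPT_PAIRS[i]
        print(f"row {i + 1}: NLC={nlc:.2f} NLU={nlu:.2f}")
```

    A value exactly at the cutoff (such as the 0.50 in the fifth row) counts as "true" here. A strict comparison would label it "false", so borderline rows are good candidates for the manual review mentioned above.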

    ------------------------------
    Joost Vos
    ------------------------------

    #BuildwithWatsonApps
    #EmbeddableAI


  • 2.  RE: Classifiers assembled with identical training sets using Natural Language Understanding and Natural Language Classifier services yield (very) different results

    Posted Wed December 15, 2021 01:47 PM
    Hi, 

    Thanks for raising this issue. I will validate this with my team and get back to you.

    I've also reached out to you via Slack DM.

    ------------------------------
    Bikalpa Neupane
    ------------------------------