IBM Security QRadar SOAR

  • 1.  Machine Learning Integration Error

    Posted Wed March 24, 2021 10:08 PM
    Hi, 
    I'm attempting to build an ML model per the instructions. However, when I run the build command:

    /usr/local/bin/res-ml build -c resilient_incidents.csv -o first_model.ml

    I get the following error:

    Usecols do not match columns, columns expected but not found: ['list_of_fields_for_features_separated_by_comma']
    Traceback (most recent call last):
    File "/usr/local/lib/python3.6/site-packages/fn_machine_learning/lib/ml_logistic_regression.py", line 81, in build
    self.extract_csv(csv_file, features, prediction)
    File "/usr/local/lib/python3.6/site-packages/fn_machine_learning/lib/ml_model_common.py", line 101, in extract_csv
    quotechar='"')
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 2056, in __init__
    _validate_usecols_names(usecols, self.orig_names)
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 1305, in _validate_usecols_names
    f"Usecols do not match columns, columns expected but not found: {missing}"
    ValueError: Usecols do not match columns, columns expected but not found: ['list_of_fields_for_features_separated_by_comma']
    Traceback (most recent call last):
    File "/usr/local/bin/res-ml", line 8, in <module>
    sys.exit(main())
    File "/usr/local/lib/python3.6/site-packages/fn_machine_learning/bin/res_ml.py", line 263, in main
    build_new_model(args, opt_parser)
    File "/usr/local/lib/python3.6/site-packages/fn_machine_learning/bin/res_ml.py", line 451, in build_new_model
    build_model(file_name, opt_parser, csv_file)
    File "/usr/local/lib/python3.6/site-packages/fn_machine_learning/bin/res_ml.py", line 430, in build_model
    unwanted_values=mlconfig.unwanted_values)
    File "/usr/local/lib/python3.6/site-packages/fn_machine_learning/lib/ml_logistic_regression.py", line 149, in build
    raise e
    File "/usr/local/lib/python3.6/site-packages/fn_machine_learning/lib/ml_logistic_regression.py", line 81, in build
    self.extract_csv(csv_file, features, prediction)
    File "/usr/local/lib/python3.6/site-packages/fn_machine_learning/lib/ml_model_common.py", line 101, in extract_csv
    quotechar='"')
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 2056, in __init__
    _validate_usecols_names(usecols, self.orig_names)
    File "/usr/local/lib64/python3.6/site-packages/pandas/io/parsers.py", line 1305, in _validate_usecols_names
    f"Usecols do not match columns, columns expected but not found: {missing}"
    ValueError: Usecols do not match columns, columns expected but not found: ['list_of_fields_for_features_separated_by_comma']

    Does anyone know what I'm doing incorrectly here? I've been following the PDF file in the package. I'm currently running v40.1 of Resilient and the updated Resilient Circuits.

    Thanks, 
    Clinton

    ------------------------------
    Clinton Dsouza
    ------------------------------
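
    The ValueError above shows pandas being asked for a column literally named 'list_of_fields_for_features_separated_by_comma', which looks like a template placeholder passed through unchanged. A minimal sketch of a pre-flight check that compares a comma-separated field list against the CSV header before building is given here. The helper names and the sample CSV are hypothetical and are not part of fn_machine_learning; only the standard library is used.

```python
import csv
import io


def parse_field_list(raw):
    """Split a comma-separated config value into clean field names."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def missing_columns(csv_text, wanted):
    """Return the names in `wanted` that are absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text), quotechar='"'))
    present = {name.strip() for name in header}
    return [name for name in wanted if name not in present]


# Hypothetical one-row export; real column names depend on your incidents.
sample_csv = (
    "severity_code,incident_type_ids,incident_category\n"
    '50,"[1, 2]",malware\n'
)

# An unedited placeholder is treated as a single (non-existent) column name.
placeholder = parse_field_list("list_of_fields_for_features_separated_by_comma")
print(missing_columns(sample_csv, placeholder))

# A list of fields that do exist in the header reports nothing missing.
configured = parse_field_list("incident_type_ids, incident_category")
print(missing_columns(sample_csv, configured))
```

    Running a check like this against the exported CSV would point at the offending field names before pandas raises on them.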


  • 2.  RE: Machine Learning Integration Error

    Posted Tue March 30, 2021 04:17 PM
    Hi, 
    An update: I was able to fix the above error by providing a list of fields in the ml.config file. However, I now get a new error. Has anyone experienced this before?

    The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

    Traceback (most recent call last):
    File "/usr/local/lib/python3.6/site-packages/fn_machine_learning/lib/ml_logistic_regression.py", line 92, in build
    self.split_samples(test_prediction)
    File "/usr/local/lib/python3.6/site-packages/fn_machine_learning/lib/ml_model_common.py", line 268, in split_samples
    stratify=self.y)
    File "/usr/local/lib64/python3.6/site-packages/sklearn/model_selection/_split.py", line 2197, in train_test_split
    train, test = next(cv.split(X=arrays[0], y=stratify))
    File "/usr/local/lib64/python3.6/site-packages/sklearn/model_selection/_split.py", line 1387, in split
    for train, test in self._iter_indices(X, y, groups):
    File "/usr/local/lib64/python3.6/site-packages/sklearn/model_selection/_split.py", line 1715, in _iter_indices
    raise ValueError("The least populated class in y has only 1"
    ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.



    ------------------------------
    Clinton Dsouza
    ------------------------------



  • 3.  RE: Machine Learning Integration Error

    Posted Wed March 31, 2021 04:37 AM
    Hi 
    Thank you for raising this in the community forum.

    I took a look at your most recent issue and I think I can offer a suggestion. Is the dataset you are using a small subset of data you want to test with, or a large one? This could potentially be coming from the use of train_test_split() in the ml_model_common.py file. We pass the stratify option to train_test_split() by default, and we have seen this error once before while training a model. Could you try commenting out the stratify option in fn_machine_learning/lib/ml_model_common.py and let me know what happens?
    If this works we can make a work item to make stratify conditional in some way.
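
    One way a conditional stratify could look is sketched here, under the assumption that the labels are available as a plain list before the split. The helper names stratify_or_none and rare_classes are hypothetical and do not exist in fn_machine_learning; the sample labels are made up, reusing severity_code values purely for illustration.

```python
from collections import Counter


def rare_classes(labels, min_per_class=2):
    """Return the label values that occur fewer than `min_per_class` times."""
    counts = Counter(labels)
    return sorted(value for value, n in counts.items() if n < min_per_class)


def stratify_or_none(labels, min_per_class=2):
    """Return the labels if every class has enough members, else None.

    None is the default for the `stratify` argument of scikit-learn's
    train_test_split, meaning "do not stratify".
    """
    if labels and not rare_classes(labels, min_per_class):
        return list(labels)
    return None


# Made-up labels: the value 1169 appears only once.
y = [50, 50, 51, 51, 52, 52, 1169]
print(rare_classes(y))
print(stratify_or_none(y))
```

    The return value could then be passed as stratify=stratify_or_none(y) in the train_test_split() call. This only covers the single-member case from the error above; scikit-learn applies further checks (for example on the test set size relative to the number of classes) that are not handled here.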

    Best, 
    Ryan


    ------------------------------
    Ryan Gordon
    Security Software Engineer
    IBM
    ------------------------------



  • 4.  RE: Machine Learning Integration Error

    Posted Thu April 15, 2021 08:52 AM
    Hi Ryan, 
    I thought I had posted an update here, but it looks like it didn't go through. This worked: after commenting out the line of code, it ran without error.
    I do see these warnings and am wondering whether they are of concern?

    Using 873 samples to train.
    /usr/local/lib64/python3.6/site-packages/sklearn/linear_model/_logistic.py:765: ConvergenceWarning: lbfgs failed to converge (status=1):
    STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

    Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
    Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
    extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
    /usr/local/lib64/python3.6/site-packages/sklearn/metrics/_classification.py:1245: UndefinedMetricWarning: Precision is ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
    _warn_prf(average, modifier, msg_start, len(result))
    --------
    Summary:
    --------
    File: /home/integration/.resilient/ml_models/first_model.ml
    Build time: 2021-04-15 12:50:47
    Num_samples: 1746
    Algorithm: Logistic Regression
    Method: None
    Prediction: severity_code
    Features: incident_type_ids, incident_category
    Class weight: balanced
    Upsampling: False
    Unwanted Values: None
    Accuracy: 0.15005727376861397
    F1: 0.09786259427526127
    Accuracy for severity_code value:
    50: 0.13012048192771083
    51: 0.6052631578947368
    52: 0.0
    1169: 0.0

    ------------------------------
    Clinton Dsouza
    ------------------------------
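
    On the UndefinedMetricWarning above: precision for a label is (correct predictions of that label) / (all predictions of that label), so it has no value when the model never predicts the label. scikit-learn reports 0.0 in that case. The sketch below reproduces the arithmetic with the standard library only; the function name and the sample predictions are hypothetical and are not taken from the model in this thread.

```python
from collections import Counter


def per_class_precision(y_true, y_pred):
    """Precision per label; None where the label was never predicted."""
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        if predicted[label] == 0:
            # Zero denominator: this is the "ill-defined" case in the warning.
            result[label] = None
        else:
            result[label] = correct[label] / predicted[label]
    return result


# Made-up example: labels 52 and 1169 occur in the data but are never predicted.
y_true = [50, 50, 51, 52, 1169]
y_pred = [50, 51, 51, 50, 51]
precision = per_class_precision(y_true, y_pred)
print(precision)
```

    If the same thing is happening in the build above, it would be consistent with the 0.0 figures reported for severity_code values 52 and 1169, although the summary alone does not confirm that.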