AIOps


Model training debug

1. Model training debug

Danilo Luna
Posted Fri August 27, 2021 07:05 AM

Hi all.

I just started with Watson AIOps 3.1.1 and would like to train some models based on log files coming from ELK. I was able to create the integration, and it tests successfully. However, after creating a new model and starting the training, I get an error: "Start training failed: Could not find any data to train on". The error message is clear enough, but I can see data in Kibana for the defined training period.

I would like to know the general steps to debug an issue like this. Mainly:

- Where are the relevant logs located?
- How can I check whether data is being transferred from my target to the internal Elastic, and if not, why?
- Any other hints?

Thanks

Danilo

------------------------------
Danilo Luna
------------------------------
2. RE: Model training debug

Veeramani Nambi
Posted Fri August 27, 2021 07:11 AM

Hi. Thanks for the question. @Angus Jamieson @Fred Harald Klein, for your attention, please.

------------------------------
VEERAMANI NAMBI
Offering Manager, GoToMarket - Communities
------------------------------

3. RE: Model training debug

Angus Jamieson
Posted Fri August 27, 2021 10:14 AM
Edited by Angus Jamieson Fri August 27, 2021 10:15 AM

Hi Danilo,

Some ideas from my colleagues. As you are a Business Partner, if you send me your email I can share some internal material with you in case this doesn't get things going.

You need a training definition in the model management section that covers the relevant dates.

Then set the integration to historical data for the initial log training.

Then wait for the files to be transferred into Elastic.

To check, look into Elastic using curl. **

As far as I know, there are a few ways this could have happened:

Either the data has not been transformed for training,

there is not enough data, or the data is not healthy, or

the cluster is broken, which we would need to check.

** To verify that the data is present, use the steps below to track the data flow that is currently enabled:

oc project <namespace>

oc get pods | grep api-server

oc exec -it <api-server-pod-name> -- bash

curl -X GET -u $ES_USERNAME:$ES_PASSWORD $ES_URL/_cat/indices -k | sort

(Use this curl command to see the log indices.)
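The same check can also be run without opening an interactive shell in the pod. This is only a minimal sketch: it assumes the pod name contains `api-server` and that `ES_USERNAME`, `ES_PASSWORD`, and `ES_URL` are set in that pod's environment, as the commands above imply. The function name `list_es_indices` is made up for illustration.

```shell
# List Elasticsearch indices via the api-server pod, non-interactively.
# Usage: list_es_indices <namespace>
# Assumption: ES_USERNAME, ES_PASSWORD and ES_URL exist inside the pod.
list_es_indices() {
  ns="$1"
  # Take the first pod whose name contains "api-server" (printed as pod/<name>).
  pod="$(oc get pods -n "$ns" -o name | grep api-server | head -n 1)"
  if [ -z "$pod" ]; then
    echo "no api-server pod found in namespace $ns" >&2
    return 1
  fi
  # Single quotes: the ES_* variables are expanded inside the pod, not locally.
  oc exec -n "$ns" "$pod" -- sh -c \
    'curl -s -k -u "$ES_USERNAME:$ES_PASSWORD" "$ES_URL/_cat/indices" | sort'
}
```

Calling `list_es_indices <namespace>` should print the same sorted index listing as the interactive steps, provided the assumptions above hold in your installation.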

For an app without much logging, you may need to wait a number of days before starting training, so that enough data has been generated to train on.

These are the common causes we have seen for this error. If they all look fine, it may be something else, and we would need to debug further.

Also a couple of other things:
i) Make sure there is data available for the dates on which you are trying to train your model.
ii) Please check that the Kafka integration is set correctly, as shown in the image below.

Best regards

Angus

------------------------------
Angus Jamieson
IT Service Management Solutions Architect
IBM
Edinburgh
------------------------------

4. RE: Model training debug

Danilo Luna
Posted Fri August 27, 2021 11:09 AM

Thanks for the fast answer. I tried the command mentioned and got the following:

sh-4.4$ curl -X GET -u $ES_USERNAME:$ES_PASSWORD $ES_URL/_cat/indices -k |sort
yellow open 1000-1000-20210823-logtrain tUfS8nWcQ769QOco9AVvAw 3 1 1302000 0 143mb 143mb
yellow open 1000-1000-20210826-logtrain glKokYdgSF-XOTeV8N6z8g 3 1 5651888 0 440.6mb 440.6mb
yellow open algorithmregistry TgB0lRAuRuOtXgTZDn8uow 1 1 4 0 19.9kb 19.9kb
yellow open buildended PLPeek70RvO10za3IU75aw 1 1 0 0 208b 208b
yellow open buildinfo l7feMWMgQ3iJ7RVKduI3kQ 1 1 0 0 208b 208b
yellow open buildresult Ye9Y7rqyQfmr7RvfzBVQWQ 1 1 0 0 208b 208b
yellow open buildstarted 27GYtmRmRaeXDxQfrEjF4g 1 1 0 0 208b 208b
yellow open comment zNfEQWASS2m25EyIbrBdUg 1 1 0 0 208b 208b
yellow open commit kONcVRJjQdWaIYGuIYca7w 1 1 0 0 208b 208b
yellow open connection dsGVfusKRcqTNBUxzSvMXQ 1 1 0 0 208b 208b
yellow open dataset nuKfCvtJQ0aj6tqUpGMK7g 1 1 1 1 5.2kb 5.2kb
yellow open filechange nazyejwVQca9HIA9tlelfQ 1 1 0 0 208b 208b
yellow open issue eg68Y4DqQAGcGYiwT7yyvg 1 1 0 0 208b 208b
yellow open language fjd12P1hT2mgeQWyvPB5JA 1 1 0 0 208b 208b
yellow open postchecktrainingdetails V7bKW23sQ_WSz4nCpSJiEQ 1 1 0 0 208b 208b
yellow open prechecktrainingdetails WPQdHUU3QJi5wu8fA0Pt2w 1 1 0 0 208b 208b
yellow open pullrequest oc6lpTp7SCeOF9OZnxxuIw 1 1 0 0 208b 208b
yellow open repository I3BohAqrQKinXAyiIm3QqA 1 1 0 0 208b 208b
yellow open repositoryscan NN64k02FTrSkZ5Y5z3KPqw 1 1 0 0 208b 208b
yellow open repositoryscanreport kZUe-zpIQcqBFnbPMG-QKQ 1 1 0 0 208b 208b
yellow open repositoryscanreportdata 9LYUsJQbTEKTyLHyIn6vKQ 1 1 0 0 208b 208b
yellow open snowchangerequest Z8qaHZMjTQ2ZS53N1n59ew 1 1 0 0 208b 208b
yellow open snowincident ATSzyZLdQb2N1GHmsEUlTQ 1 1 0 0 208b 208b
yellow open snowproblem 4wUzvPRNSsq-6_c-h1az3g 1 1 0 0 208b 208b
yellow open trainingdefinition RSFvStURSkKh98qIefolvA 1 1 1 0 8.7kb 8.7kb
yellow open trainingsrunning QS03CgRJQNG__cEDQdA6kw 1 1 0 0 208b 208b

Not sure how to interpret this though. The first two lines look promising :)
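As a rough aid for reading a listing like the one above: with the default `_cat/indices` column order (health, status, index, uuid, pri, rep, docs.count, docs.deleted, store.size, pri.store.size), the seventh column is the document count. The sketch below keeps only the indices whose names end in `-logtrain` and prints the date taken from the name next to the document count. The function name `logtrain_summary` is made up, and the `<id>-<id>-<YYYYMMDD>-logtrain` naming is an assumption inferred only from the two index names in this output.

```shell
# Summarise log-training indices from `_cat/indices` output read on stdin.
# Assumes default columns: health status index uuid pri rep docs.count ...
# Assumes index names look like <id>-<id>-<YYYYMMDD>-logtrain (inferred, not documented here).
logtrain_summary() {
  awk '$3 ~ /-logtrain$/ {
    n = split($3, parts, "-")   # the date is the second-to-last dash-separated part
    print parts[n-1], $3, $7    # date, index name, docs.count
  }' | sort
}
```

Piping the curl output into `logtrain_summary` gives one line per training index. If the naming assumption holds, a date in the training definition that has no matching line, or a line with a document count of 0, would point to missing data for that day.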

I will find a way to share my email with you via a private channel.

------------------------------
Danilo Luna
------------------------------

5. RE: Model training debug

Angus Jamieson
Posted Fri August 27, 2021 04:42 PM

Yes that looks good …

You should set the parallelism in your log connector to 4, both for base and for source (the field is below the date range there).

This is assuming you have turned on the connection for historical data for initial training and set consistent dates there and in the training definition under model management.

------------------------------
Angus Jamieson
IT Service Management Solutions Architect
IBM
Edinburgh
------------------------------
