Yes that looks good …
You should set the parallelism in your log connector to 4, both for base and for source (the fields are below the date range there).
This assumes you have turned on the connection for historical data for initial training and have set consistent dates there and in the training definition under Model Management.
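If you want to double-check from the command line before kicking off training (a rough sketch, reusing the $ES_USERNAME / $ES_PASSWORD / $ES_URL variables from the api-server pod as in the steps further down this thread), you can confirm that a -logtrain index exists for each day of the training window and that its docs.count is non-zero:
curl -X GET -u $ES_USERNAME:$ES_PASSWORD "$ES_URL/_cat/indices/*-logtrain?v" -k | sort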
------------------------------
Angus Jamieson
IT Service Management Solutions Architect
IBM
Edinburgh
------------------------------
Original Message:
Sent: Fri August 27, 2021 11:09 AM
From: Danilo Luna
Subject: Model training debug
Thanks for the fast answer. I tried the command mentioned and got the following:
sh-4.4$ curl -X GET -u $ES_USERNAME:$ES_PASSWORD $ES_URL/_cat/indices -k | sort
yellow open 1000-1000-20210823-logtrain tUfS8nWcQ769QOco9AVvAw 3 1 1302000 0 143mb 143mb
yellow open 1000-1000-20210826-logtrain glKokYdgSF-XOTeV8N6z8g 3 1 5651888 0 440.6mb 440.6mb
yellow open algorithmregistry TgB0lRAuRuOtXgTZDn8uow 1 1 4 0 19.9kb 19.9kb
yellow open buildended PLPeek70RvO10za3IU75aw 1 1 0 0 208b 208b
yellow open buildinfo l7feMWMgQ3iJ7RVKduI3kQ 1 1 0 0 208b 208b
yellow open buildresult Ye9Y7rqyQfmr7RvfzBVQWQ 1 1 0 0 208b 208b
yellow open buildstarted 27GYtmRmRaeXDxQfrEjF4g 1 1 0 0 208b 208b
yellow open comment zNfEQWASS2m25EyIbrBdUg 1 1 0 0 208b 208b
yellow open commit kONcVRJjQdWaIYGuIYca7w 1 1 0 0 208b 208b
yellow open connection dsGVfusKRcqTNBUxzSvMXQ 1 1 0 0 208b 208b
yellow open dataset nuKfCvtJQ0aj6tqUpGMK7g 1 1 1 1 5.2kb 5.2kb
yellow open filechange nazyejwVQca9HIA9tlelfQ 1 1 0 0 208b 208b
yellow open issue eg68Y4DqQAGcGYiwT7yyvg 1 1 0 0 208b 208b
yellow open language fjd12P1hT2mgeQWyvPB5JA 1 1 0 0 208b 208b
yellow open postchecktrainingdetails V7bKW23sQ_WSz4nCpSJiEQ 1 1 0 0 208b 208b
yellow open prechecktrainingdetails WPQdHUU3QJi5wu8fA0Pt2w 1 1 0 0 208b 208b
yellow open pullrequest oc6lpTp7SCeOF9OZnxxuIw 1 1 0 0 208b 208b
yellow open repository I3BohAqrQKinXAyiIm3QqA 1 1 0 0 208b 208b
yellow open repositoryscan NN64k02FTrSkZ5Y5z3KPqw 1 1 0 0 208b 208b
yellow open repositoryscanreport kZUe-zpIQcqBFnbPMG-QKQ 1 1 0 0 208b 208b
yellow open repositoryscanreportdata 9LYUsJQbTEKTyLHyIn6vKQ 1 1 0 0 208b 208b
yellow open snowchangerequest Z8qaHZMjTQ2ZS53N1n59ew 1 1 0 0 208b 208b
yellow open snowincident ATSzyZLdQb2N1GHmsEUlTQ 1 1 0 0 208b 208b
yellow open snowproblem 4wUzvPRNSsq-6_c-h1az3g 1 1 0 0 208b 208b
yellow open trainingdefinition RSFvStURSkKh98qIefolvA 1 1 1 0 8.7kb 8.7kb
yellow open trainingsrunning QS03CgRJQNG__cEDQdA6kw 1 1 0 0 208b 208b
I'm not sure how to interpret this, though. The first two lines look promising :)
I will find a way to share my email with you via private channel.
------------------------------
Danilo Luna
Original Message:
Sent: Fri August 27, 2021 10:13 AM
From: Angus Jamieson
Subject: Model training debug
Hi Danilo,
Some ideas from my colleagues. As you are a Business Partner, if you send me your email I can share some internal material with you if this doesn't get things going.
You'd need a training definition in the model management section covering the relevant dates
Then you should set the integration to historical data for initial log training
And wait for the files to be transferred into elastic
To check, look into Elastic using curl **
As far as I know, there are a few ways this could have happened:
- either the data has not been transformed for training,
- there is not enough data, or the data is not healthy, or
- the cluster itself may be broken.
** To verify the data is present, use the steps below to track the data flow that is currently enabled:
oc project <namespace>
oc get pods | grep api-server
oc exec -it api-server-pod-name bash
curl -X GET -u $ES_USERNAME:$ES_PASSWORD $ES_URL/_cat/indices -k | sort (use this curl command to list the log indices).
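If the logtrain indices are there and you want to confirm how many documents a particular day actually holds, a quick follow-up (the index name here is only a placeholder; the real name depends on your tenant and date) is:
curl -X GET -u $ES_USERNAME:$ES_PASSWORD "$ES_URL/<your-logtrain-index>/_count" -k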
For an app without much logging you may need to wait a number of days before starting training (you need that many days of data to have enough to train on).
These are the common causes we have seen for this error, but if they all look fine, then it may be something else that we need to debug further.
Also, a couple of other things:
i) Make sure there is data available for the dates on which you are trying to train your model (see the sketch after this list).
ii) Please check that the Kafka integration is set correctly, as shown in the image below.
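For point i), a minimal way to check from the source side (everything here is a placeholder: your ELK URL, credentials, index pattern, and the name of your timestamp field) is a range-filtered count against the source Elasticsearch for the training window, for example:
curl -X GET -u <user>:<password> "<source-elk-url>/<index-pattern>/_count" -k -H 'Content-Type: application/json' -d '{"query":{"range":{"@timestamp":{"gte":"<start-date>","lte":"<end-date>"}}}}'
If that returns a count of 0, training will have nothing to pull for those dates.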
------------------------------
Angus Jamieson
IT Service Management Solutions Architect
IBM
Edinburgh
Original Message:
Sent: Fri August 27, 2021 06:13 AM
From: Danilo Luna
Subject: Model training debug
Hi all.
I just started with Watson AIOps 3.1.1 and would like to train some models based on log files coming from ELK. I was able to create the integration and it tests successfully; however, after creating a new model and starting the training, I get an error: "Start training failed: Could not find any data to train on". The error message is clear enough, but I can see data in Kibana for the defined training period.
I would like to know what are the general steps to debug the issue in this case. Mainly:
- Where are the relevant logs located?
- How can I check why/if data is not being transferred over from my target towards the internal Elastic?
- Any other hint?
Thanks
Danilo
------------------------------
Danilo Luna
------------------------------