Content Management and Capture

  • 1.  Datacap configuration probably needs to be simplified.

    Posted Wed October 02, 2024 10:26 PM
    Edited by dsakai Wed October 02, 2024 10:26 PM

    Do you think Datacap configuration is easy or at least reasonable?

    I think Datacap configuration is too complicated for anyone to configure. A client here is now considering switching to a product called AI-Inside because it is easier for the client's engineer to configure and lets them "experience OCR better".

    I am not sure how Datacap configuration can get better, but it has too many configuration items and registry items, and when there are multiple OCR Applications, Tasks, and server components, the number of configuration items grows proportionally.

    I also think Navigator Desktop is heavy, and my colleagues and I almost always use TMWeb for quick configuration.

    WAS adds more complexity to Datacap configuration, and I think I will try to avoid Navigator Desktop in the future.

    After 9 years of working with Datacap, I think I am still the only one in Japan who knows how to configure Datacap in a workable way.

    I hope Datacap configuration gets refactored in tune with the AI age.



    ------------------------------
    dsakai
    ------------------------------



  • 2.  RE: Datacap configuration probably needs to be simplified.

    Posted Thu October 03, 2024 12:39 PM

    Hello Dsakai,

    I too have been working with Datacap for a long time.  Configuring Datacap and setting it up is not hard.  The hard part is developing an application that fits the customer's needs.  Once you get it going, it works well, unless you add an odd new type of document to process.  I don't know much about AI-Inside, but it sounds like it uses AI to classify document types.  Datacap has a new feature that leverages Watson X AI for document classification.  It's called WatsonX AI.  It is new in Datacap 9.1.9 with the latest ifix 5.

    Here are some links.  Check with IBM sales for more info.  IMO it basically uses an LLM to classify the document type.  I saw the sales demo and it's pretty slick.  Basically this bypasses fingerprinting and OCR: the AI just identifies the page for you, with no OCR and no fingerprinting needed.


    https://community.ibm.com/community/user/automation/blogs/krish-lakshminarayanan/2024/06/21/revolutionizing-document-processing-with-ibm-datac
    https://www.ibm.com/support/pages/node/7157978

    By the way, tmweb is great, but it's kind of dated because no new development is being done on it.  Yes, Datacap Navigator is challenging, but once you get used to it, it's okay.  I'd say it's like tmweb.  Yes, there might be some pros and cons.  However, I don't think you can avoid Datacap Navigator.



    ------------------------------
    Duke Lam
    ------------------------------



  • 3.  RE: Datacap configuration probably needs to be simplified.

    Posted 25 days ago
    Edited by dsakai 25 days ago

    Thank you Shaun, Frank, and Duke for the answers.

    We still see the Floppy Disk icon. It looks like the configuration has not changed since the '90s.

    Here are some configuration items whose purpose I do not understand.

    (1) Datacap Application Rulerunner panel.

    If the engineer creates a new Task in Datacap Studio, they need to manually type the Task name in here to add it.

    In addition, they need to log on to TMWeb and configure permissions for that Task for Users and Stations.

    If TMWeb can display Tasks and has permission settings in its UI, why is there also this Task panel in Datacap Application Manager?

    (2) DB_INDEX update by manual SQL

    Every time I create a new DB for an application, I need to manually update DB_INDEX on the Admin and Engine DBs with SQL.

    If I forget, the OCR Application may crash. Can this ID update be done automatically by a tool?

    InfoCenter:

    "Multiple engine Databases that have the same db_index, cause locking issues when more than one application is launched"

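    As a rough illustration of the duplicate check such a tool could run, here is a minimal Python sketch. It assumes the db_index values have already been read from each engine database by whatever query the site uses; the function names and the sample database names are hypothetical and not part of Datacap.

```python
from collections import defaultdict


def find_db_index_conflicts(db_index_by_database):
    """Return {db_index: [database names]} for every index used more than once."""
    grouped = defaultdict(list)
    for name, db_index in db_index_by_database.items():
        grouped[db_index].append(name)
    # Keep only the indexes that are shared by two or more databases.
    return {idx: sorted(names) for idx, names in grouped.items() if len(names) > 1}


def next_free_db_index(db_index_by_database, start=1):
    """Return the smallest index >= start that no database uses yet."""
    used = set(db_index_by_database.values())
    candidate = start
    while candidate in used:
        candidate += 1
    return candidate


# Hypothetical inventory: two engine databases were left on the same index.
inventory = {"InvoiceEngine": 1, "ClaimsEngine": 1, "FormsEngine": 2}
print(find_db_index_conflicts(inventory))
print(next_free_db_index(inventory))
```

    Running a check like this after creating each new database would at least flag the forgotten update before the applications are launched together.
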
    (3) Encryption Key and dcskey command

    The engineer needs to copy the Encryption Key from the Datacap Server to each Rulerunner Server during the installation phase.

    Then, for each OS Account on the Rulerunner Server, they need to import the key by executing the dcskey command.

    If the OS Account's password needs to change, the engineer first needs to remove the encryption key from the Account with the "dcskey -d" command, change the OS Account password, and then run the "dcskey -i" command to recreate the encryption key.

    This manual operation needs to be done for each OS Account, every time a password changes.

    It can be automated with a PowerShell script, but that would take many hours of development, testing, documentation, and skill transfer.

    But the hardest part is that the Rulerunner Server stores the encrypted password in the Registry.

    After the engineer changes the password for the Rulerunner Service account, they must recreate the Rulerunner Thread configuration, because this is the only way to refresh the encrypted password in the Registry.
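
    The order of operations described above can be written down as a small checklist generator. This is only a sketch in Python: it prints the manual steps in order and does not execute dcskey or touch any account. The function name, the account names, and the service_account parameter are hypothetical; only the "dcskey -d" and "dcskey -i" commands come from the steps above.

```python
def password_rotation_steps(accounts, service_account=None):
    """Build the ordered manual steps for rotating OS account passwords.

    Returns a list of (account, step) tuples. Nothing is executed here.
    """
    steps = []
    for account in accounts:
        steps.append((account, "run 'dcskey -d' to remove the encryption key"))
        steps.append((account, "change the OS account password"))
        steps.append((account, "run 'dcskey -i' to recreate the encryption key"))
        if account == service_account:
            # The Rulerunner Service account also has its encrypted password
            # stored in the Registry, which is refreshed by recreating the threads.
            steps.append((account, "recreate the Rulerunner Thread configuration"))
    return steps


# Hypothetical accounts on one Rulerunner Server.
for account, step in password_rotation_steps(
    ["rr_service", "rr_operator"], service_account="rr_service"
):
    print(f"{account}: {step}")
```

    Even a printed checklist like this makes it harder to skip the thread recreation step for the service account.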

    (4) Why do we manually define the Station ID?

    Can this ID be generated automatically for every new Rulerunner Server?
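
    One possible approach, sketched in Python, is to derive the ID from the server's host name and add a suffix only when that name is already taken. The ID format, the function name, and the host names here are assumptions for illustration; Datacap does not provide this function.

```python
def generate_station_id(hostname, existing_ids):
    """Derive a Station ID from a host name, avoiding IDs already in use."""
    # Use the short host name (before the first dot), upper-cased.
    base = hostname.split(".")[0].upper()
    candidate = base
    suffix = 2
    while candidate in existing_ids:
        candidate = f"{base}-{suffix}"
        suffix += 1
    return candidate


# Hypothetical existing stations and a new Rulerunner Server.
existing = {"RR01", "RR02"}
print(generate_station_id("rr03.example.com", existing))
print(generate_station_id("rr01.example.com", existing))
```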

    There are many other configuration items, and no one around me, at least, can understand why many of them are necessary or how they were designed.

    The learning curve is very steep. I hope Datacap can automatically add Tasks, IDs, and components so the engineer can readily start developing applications.

    The above configuration needs to be done for each OCR Application and for each added Rulerunner Server as well.

    A single mistake used to take me hours to find and fix.

    I do hope Watson makes configuration easier, because it makes OCR Applications easier to develop and should therefore reduce the complexity of infrastructure configuration.



    ------------------------------
    dsakai
    ------------------------------



  • 4.  RE: Datacap configuration probably needs to be simplified.

    Posted Fri October 04, 2024 04:21 AM

    Hi Dsakai,

    Like others, I've worked with Datacap for many years. I would say that it has a steep learning curve but like anything, practice makes perfect.

    Having said that, fingerprinting and locate rules don't work well in every situation, and the addition of the watsonx.ai integration allows classification and extraction to be configured relatively painlessly. It's certainly worth investigating.

    I agree that there are a lot of configuration UIs and perhaps steps could have been taken to better integrate their functions into fewer applications.

    Datacap Navigator by itself does require a significant footprint but if a customer is also using Navigator desktops for other purposes, e.g. workflow or document management, the overhead is easier to justify. TMWeb is great - it's no longer being actively developed but for most applications, it's sufficient for administrative purposes in my opinion.



    ------------------------------
    Shaun McDowall
    ------------------------------



  • 5.  RE: Datacap configuration probably needs to be simplified.

    IBM Champion
    Posted 28 days ago
    Edited by Frank Trila 28 days ago

    Hello Dsakai, 
    for the initial setup you're probably right, but once it's done it normally runs pretty reliably within WAS. Most of our customers already have Navigator in place, so the configuration comes down to adding the Datacap Plugin to the existing Navigator instance. But even in this case there is one thing to mention: the plugin and the Navigator version have a pretty tight dependency.
    To get faster and even better extraction and classification results, you really should check out the new LLM integration with WatsonX.Ai! The configuration is very straightforward, and the results come fast and in a quality never seen before.
    We did some testing and tried some invoices against the Granite model. The classification results were nearly 100% correct, and most of the key values, addresses, and dates were extracted very well. Unfortunately, we still have some issues with table detection, but maybe another model delivers better results for this use case.
    best regards
    Frank



    ------------------------------
    Frank Trila
    Teamlead Enterprise Content Management
    TIMETOACT Software & Consulting
    Cologne
    +4915117166667
    ------------------------------