Data and AI Learning Group

How do you scale Google Cloud Document AI processing?

  • 1.  How do you scale Google Cloud Document AI processing?

    Posted Fri September 25, 2020 12:47 PM

    From https://cloud.google.com/document-ai/docs/process-forms, I can see some examples of processing single files. But in most cases, companies have buckets of documents. In that case, how do you scale Document AI processing? Do you use Document AI in conjunction with Spark, or is there another way?



    ------------------------------
    Cloudy Tech
    ------------------------------


  • 2.  RE: How do you scale Google Cloud Document AI processing?

    IBM Select
    Posted Mon September 28, 2020 08:09 AM
    Well, if you are not locked into one cloud provider, I would recommend AWS or Azure. Both have tools that do the majority of this work for you. The article you linked to has multiple code examples that you need to configure yourself. This is all done for you in AWS Textract, and it auto-scales. It is also a pay-as-you-go service, so there are no servers or other things to mess with. Check out an overview here.

    ------------------------------
    James Hagist
    ------------------------------



  • 3.  RE: How do you scale Google Cloud Document AI processing?

    Posted Wed September 01, 2021 07:57 AM

    I could only find the following: batch_process_documents processes many documents in one asynchronous request and saves the results in Cloud Storage.

    From there, I think we can parametrise the job with an input path (a bucket prefix) and distribute the work over several machines, with each machine handling its own prefixes.
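
    A minimal sketch of that idea in Python, assuming the google-cloud-documentai v1 client library. The helper names (shard_prefixes, submit_batch) and the bucket paths are made up for illustration, and the client calls are written from memory of the v1 API, so check them against the current reference before relying on them.

```python
from typing import Dict, List


def shard_prefixes(prefixes: List[str], num_workers: int) -> Dict[int, List[str]]:
    """Assign bucket prefixes to workers round-robin (hypothetical helper)."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    shards: Dict[int, List[str]] = {worker: [] for worker in range(num_workers)}
    # Sort first so every machine computes the same assignment.
    for index, prefix in enumerate(sorted(prefixes)):
        shards[index % num_workers].append(prefix)
    return shards


def submit_batch(
    project_id: str,
    location: str,
    processor_id: str,
    gcs_input_prefix: str,
    gcs_output_uri: str,
    timeout_s: int = 3600,
) -> None:
    """Send one batch request for everything under gcs_input_prefix."""
    # Imported lazily so shard_prefixes works without the client library.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    client = documentai.DocumentProcessorServiceClient(
        client_options=ClientOptions(
            api_endpoint=f"{location}-documentai.googleapis.com"
        )
    )
    request = documentai.BatchProcessRequest(
        name=client.processor_path(project_id, location, processor_id),
        input_documents=documentai.BatchDocumentsInputConfig(
            gcs_prefix=documentai.GcsPrefix(gcs_uri_prefix=gcs_input_prefix)
        ),
        document_output_config=documentai.DocumentOutputConfig(
            gcs_output_config=documentai.DocumentOutputConfig.GcsOutputConfig(
                gcs_uri=gcs_output_uri
            )
        ),
    )
    # Returns a long-running operation; the parsed documents are written
    # under gcs_output_uri rather than returned in the response.
    operation = client.batch_process_documents(request=request)
    operation.result(timeout=timeout_s)  # block until the batch finishes


if __name__ == "__main__":
    # Hypothetical bucket layout: one prefix per month of invoices.
    all_prefixes = [f"gs://my-bucket/invoices/2021-{m:02d}/" for m in range(1, 7)]
    for worker, assigned in shard_prefixes(all_prefixes, num_workers=3).items():
        print(worker, assigned)
```

    Each machine (or each task in a scheduler) would take its own list from shard_prefixes and call submit_batch once per prefix. Batch requests are subject to service quotas, so the number of concurrent workers probably needs tuning.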

    All of that could be orchestrated with Airflow, for example. For more, you should probably take a Google Cloud training course.



    ------------------------------
    Sarfaraz Khan
    SEO Associate
    Edureka
    Bengaluru
    09606058405
    ------------------------------