From https://cloud.google.com/document-ai/docs/process-forms, I can see some examples of processing single files. But in most cases, companies have whole buckets of documents. How do you scale Document AI processing in that case? Do you use Document AI in conjunction with Spark, or is there another way?
I could only find the following: batch_process_documents processes many documents asynchronously and returns a long-running operation whose results get saved to Cloud Storage.
From there, I think we can parametrise the job by passing a bucket prefix as the input path and distribute the work over several machines.
All of that could be orchestrated via Airflow, for example.
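A minimal sketch of the sharding idea, assuming each worker (an Airflow task, or a separate machine) submits one batch job per prefix it owns. The bucket name, `shard_prefixes`, and `submit_batch_job` below are hypothetical; the comments indicate where the real `batch_process_documents` call from the `google-cloud-documentai` client would go.

```python
from concurrent.futures import ThreadPoolExecutor


def shard_prefixes(prefixes, num_workers):
    """Split a list of GCS prefixes into roughly equal shards,
    one per worker, so each worker submits its own batch jobs."""
    shards = [[] for _ in range(num_workers)]
    for i, prefix in enumerate(prefixes):
        shards[i % num_workers].append(prefix)
    return [s for s in shards if s]  # drop empty shards


def submit_batch_job(prefix):
    # Placeholder for the real call. With the google-cloud-documentai
    # client you would build a BatchProcessRequest whose input_documents
    # points at GcsPrefix(gcs_uri_prefix=prefix), then call
    # client.batch_process_documents(request=...). That returns a
    # long-running operation, and the parsed Document results land in
    # the Cloud Storage output path you configure on the request.
    return f"submitted:{prefix}"


if __name__ == "__main__":
    # Hypothetical layout: documents pre-partitioned under one prefix each.
    prefixes = [f"gs://my-bucket/forms/part-{i}/" for i in range(10)]
    # Within one worker, submissions can also be parallelised, since
    # each call just kicks off an async server-side job.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for shard in shard_prefixes(prefixes, num_workers=2):
            results = list(pool.map(submit_batch_job, shard))
            print(results)
```

Each shard could equally be one task in an Airflow DAG, with the prefix passed in as a parameter, which matches the parametrised-job idea above.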