Watson Discovery

How to get a list of files ingested into a Watson Discovery collection?

  • 1.  How to get a list of files ingested into a Watson Discovery collection?

    Posted Mon October 12, 2020 10:22 AM
    Hello everyone, 
    I am working on a Watson Discovery collection containing more than 10,000 documents, and I would like to retrieve a list of all documents ingested into that collection. I am using the Python SDK to interact with the WDC API. I can get the first 10,000 documents (the maximum retrieval count) using a query parameter (query="*.*"). But if I adjust the offset and count parameters (offset=10000 and count=20000), the following error is raised:

    "error" : "Result window is too large, count + offset must be less than or equal to 10000"

    Does anyone know how to retrieve a list of all documents in a Discovery collection?

    Looking forward to your reply!
    Best Regards, 
    Joost


    ------------------------------
    Joost Vos
    ------------------------------


  • 2.  RE: How to get a list of files ingested into a Watson Discovery collection?

    Posted Fri November 20, 2020 04:49 PM

    Hi Joost, 

    The best way to return all documents is to retrieve them in chunks of at most 10,000 documents using filters, so that each filtered query stays within the result window. Depending on what metadata you have available, you could filter on ranges of document_ids or on something like the document hash (under extracted_metadata). There's an example described in the last comment here: https://github.com/watson-developer-cloud/python-sdk/issues/314
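
As a rough illustration of the chunking idea, here is a minimal Python sketch. It is not taken from the linked issue. Everything in it is an assumption on my side: the helper name list_all_document_ids, the run_query callable (which you would write yourself, for example as a thin wrapper around the SDK's query call that returns the response as a dict), the field name extracted_metadata.sha1, and the prefix filter syntax field:prefix*. Please check the field name and filter syntax against your own collection before relying on it. The sketch only assumes that each response contains "matching_results" and a "results" list whose items carry an "id".

```python
from typing import Callable, Dict, Iterable, List

# The error message in this thread says count + offset must be <= 10000.
MAX_WINDOW = 10000

# Assumption: the hash field is a lowercase hex string, so its first
# character splits the collection into 16 roughly equal slices.
HEX_PREFIXES = "0123456789abcdef"


def list_all_document_ids(
    run_query: Callable[[str, int, int], Dict],
    prefixes: Iterable[str] = HEX_PREFIXES,
    field: str = "extracted_metadata.sha1",  # assumed field name
    page_size: int = 1000,
) -> List[str]:
    """Collect document ids slice by slice, paging inside each slice.

    run_query(filter_expr, count, offset) is a hypothetical callable that
    runs one filtered query and returns the response as a dict.
    """
    ids: List[str] = []
    for prefix in prefixes:
        # Assumed filter syntax: "<field>:<prefix>*" as a prefix match.
        filter_expr = f"{field}:{prefix}*"
        offset = 0
        while True:
            # Never ask for a window that would exceed the limit.
            count = min(page_size, MAX_WINDOW - offset)
            response = run_query(filter_expr, count, offset)
            total = response["matching_results"]
            if total > MAX_WINDOW:
                # This slice is still too big; use longer prefixes.
                raise ValueError(
                    f"filter {filter_expr!r} matches {total} documents; "
                    "use longer prefixes to get smaller slices"
                )
            results = response["results"]
            ids.extend(doc["id"] for doc in results)
            offset += len(results)
            if not results or offset >= total:
                break
    return ids
```

If a single prefix still matches more than 10,000 documents, the sketch raises an error instead of silently dropping documents. In that case you could pass two-character prefixes ("00", "01", ..., "ff") to get smaller slices.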



    ------------------------------
    ANISH MATHUR
    ------------------------------



  • 3.  RE: How to get a list of files ingested into a Watson Discovery collection?

    Posted Mon November 23, 2020 03:57 AM
    Hi Anish,

    Thanks for your reply! This is very helpful!

    Have a good day!
    Joost

    ------------------------------
    Joost Vos
    ------------------------------