Watson Discovery


How to get a list of files ingested into a Watson Discovery collection?

  • 1.  How to get a list of files ingested into a Watson Discovery collection?

    Posted Mon October 12, 2020 10:22 AM
    Hello everyone, 
    I am working on a Watson Discovery collection containing more than 10,000 documents, and I would like to retrieve a list of all documents ingested into that collection. I am using the Python SDK to interact with the WDC API. I can get the first 10,000 documents (the maximum retrieval count) using a query parameter (query="*.*"). But if I adjust the offset and count parameters (offset=10000 and count=20000), the following error is raised:

    "error" : "Result window is too large, count + offset must be less than or equal to 10000"

    Does anyone know how to retrieve a list of all documents in a Discovery collection?

    Looking forward to your reply!
    Best Regards, 
    Joost


    ------------------------------
    Joost Vos
    ------------------------------


  • 2.  RE: How to get a list of files ingested into a Watson Discovery collection?

    Posted 7 days ago

    Hi Joost, 

    The best way to return all documents is to retrieve them in chunks of at most 10,000 documents using filters, so that each filtered query stays within the result window. Depending on what metadata you have available, you could filter on ranges of document_ids or on something like the document hash (under extracted_metadata). There's an example described in the last comment here: https://github.com/watson-developer-cloud/python-sdk/issues/314
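
    As a rough illustration of the chunking idea, here is a minimal Python sketch. It is not the code from the linked issue, and it rests on several assumptions: that each document exposes a hexadecimal hash in a field such as extracted_metadata.sha1, that a wildcard filter of the form field:prefix* is accepted for that field, and that the query response contains "matching_results" and "results". The run_query callable is a hypothetical wrapper you would write around your own SDK query call; it is not part of the SDK.

```python
from typing import Callable, Dict, Iterator

HEX_DIGITS = "0123456789abcdef"
MAX_WINDOW = 10_000  # count + offset limit taken from the error message
PAGE_SIZE = 1_000    # arbitrary page size for this sketch

# Hypothetical wrapper signature: (filter_string, count, offset) -> response dict
# with "matching_results" (int) and "results" (list of documents).
QueryFn = Callable[[str, int, int], Dict]


def iter_documents(
    run_query: QueryFn,
    field: str = "extracted_metadata.sha1",  # assumed field name
    prefix: str = "",
) -> Iterator[Dict]:
    """Yield every document by splitting the collection into hash-prefix buckets."""
    for digit in HEX_DIGITS:
        bucket = prefix + digit
        flt = f"{field}:{bucket}*"  # assumed wildcard filter syntax
        first = run_query(flt, PAGE_SIZE, 0)
        total = first["matching_results"]

        if total > MAX_WINDOW:
            # Bucket is too large for one result window: split it further
            # by extending the prefix with one more hex digit.
            yield from iter_documents(run_query, field, bucket)
            continue

        yield from first["results"]
        offset = len(first["results"])
        while offset < total:
            # Never request past the result window.
            count = min(PAGE_SIZE, MAX_WINDOW - offset)
            page = run_query(flt, count, offset)
            if not page["results"]:
                break
            yield from page["results"]
            offset += len(page["results"])
```

    To use it, pass in a function that runs one filtered query against your collection and returns the parsed response, then collect the "id" of each yielded document. Documents that lack the hash field would not match any bucket, so it is worth comparing the final count against the collection's document count.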



    ------------------------------
    ANISH MATHUR
    ------------------------------



  • 3.  RE: How to get a list of files ingested into a Watson Discovery collection?

    Posted 4 days ago
    Hi Anish,

    Thanks for your reply! This is very helpful!

    Have a good day!
    Joost

    ------------------------------
    Joost Vos
    ------------------------------