Content Management and Capture

  • 1.  IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.

    Posted 17 days ago

    Hello Everyone,

    I have been thinking about the best practice for the number of document classes (more than 10 vs. fewer than 10) in IBM FileNet Content Manager, and wanted to put the question to the forum to hear your thoughts and experiences.

    As I understand it, having fewer document classes can become complex to maintain, because new requirements may not fit the existing classes. If different business units share one document class but need to see different metadata in Navigator, we may also have to customize Navigator with request and response filters. When sharing a class, we can use a document type property as the differentiator and a marking set for security. On the other hand, creating a new document class whenever the metadata differs from the existing classes is cleaner and gives a clear separation of security, but we end up with many document classes. As far as I know, the product imposes no limit on their number.

    I am sure there is no definitive answer to this question, but I would like to hear your thoughts, the approach you follow, and your experiences with this topic.

     



    ------------------------------
    Krishna
    ------------------------------


  • 2.  RE: IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.

    Posted 16 days ago

    Hi Krishna,

    In my experience, the final solution depends on the business requirements, the amount of maintenance, implementation, and customization you plan to put into it.
     
    In my opinion, the decision to create different document classes should be based on a different set of metadata.
    One of my customers manages 400 document classes without any problem.
    Olga


    ------------------------------
    Olga Voronovitch
    ------------------------------



  • 3.  RE: IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.

    Posted 16 days ago

    Hi,

    This is a constant topic in discussions with customers when designing the data model. In the wild I have seen anywhere from a handful to approx. 400 document classes. The handling of large numbers of document classes has certainly improved over the years. Either way, there is no way around a carefully designed document hierarchy.

    Reasons for using different classes (NOT complete)

    • different properties
    • different security
    • different business logic (manifesting themselves as different event actions, etc)
    • different auditing

    I would say that most of the time (with OUR customers) there is a maximum of a few dozen document classes, plus an additional property that details what kind of document (document type) this really is. For example, the document class is 'invoice' and there is a property called 'invoice_type' that can be 'debit' or 'credit' (I'm making this up, not suggesting you use it that way).

    The more document classes you have, the more difficult classification will get, not only technically, but you have to train the users...

    As a last point, note that obj_class_id in docversion is NOT indexed by default, so selecting documents by class will always result in a sequential table scan (unless you additionally specify something more selective). All of our customers (except one, who is going to add it) have defined that index themselves.
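
    To illustrate the indexing point, here is a minimal sketch that uses an in-memory SQLite database as a stand-in. The table is a heavily simplified, hypothetical version of docversion (the real table, the index DDL, and the optimizer behaviour on Oracle, Db2, or SQL Server all differ), and the index name is made up. It only shows how a filter on the class column moves from a full scan to an index search once the index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's query plan for a statement as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


conn = sqlite3.connect(":memory:")
# Simplified stand-in for the docversion table (assumed columns, not the real schema).
conn.execute(
    "CREATE TABLE docversion ("
    " object_id TEXT PRIMARY KEY,"
    " object_class_id TEXT NOT NULL,"
    " title TEXT)"
)
conn.executemany(
    "INSERT INTO docversion VALUES (?, ?, ?)",
    [(f"doc-{i}", f"class-{i % 40}", f"Document {i}") for i in range(4000)],
)

sql = "SELECT object_id, title FROM docversion WHERE object_class_id = ?"

# Without an index on the class column, the filter needs a full table scan.
plan_before = query_plan(conn, sql, ("class-7",))

# Hypothetical index name; on a real system, follow the vendor's guidance.
conn.execute("CREATE INDEX idx_docversion_class ON docversion (object_class_id)")
plan_after = query_plan(conn, sql, ("class-7",))

print("before:", plan_before)
print("after: ", plan_after)
```

    On a real installation the equivalent check would be the database's own explain facility, run against a realistic data volume.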

    Hope this helps,

    /Gerold



    ------------------------------
    Gerold Krommer
    Managing Director
    The Document Content Profesionals, G.m.bH
    Wien
    +436602408515
    ------------------------------



  • 4.  RE: IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.

    Posted 15 days ago
    At one of our customers, too much specialist knowledge has been mapped in the data model in some cases. There are so many classes here that users are overwhelmed when it comes to filing.
    You should always consider not only how documents are stored, but also how they are searched for. If a document is not found, it may have been filed incorrectly or searched for incorrectly.
    The point about business logic in Event Actions is also important.
    As an additional point to Gerold's points: it can be useful to make access to the information more flexible in the user interface by using predefined entry and search templates. These can capture a lot of business logic that does not need to be implemented at the class level.


    ------------------------------
    Christoph Sünderkamp
    ------------------------------



  • 5.  RE: IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.

    Posted 15 days ago

    Well, you hit the two big reasons for different document classes: security and properties. And you are correct that there is no real product limit; there might be a technical limit, but it is beyond the practical limit. The concept of subclasses might help your implementation if there is a large number of them. Gerold's comments cover this.

    Try your best to remember the purpose of the document being stored. What do you need to find it? How can it be identified? Many places I have worked with think the content needs to be separate because each business unit is unique. A taxonomy might be helpful. Be careful not to overload properties to mean different things for different business units.

    Sometimes, you lose the battle, but remind the business and IT that while the product supports the complexity of multiple document classes and security models, you can create something that you cannot support after day 1.  I have lost the battle and created a "Misc" document type.

    While not a specific number, the business unit typically has its own document class, which is typically based on security.  HR typically has a slightly different security model, so HR would have 2-3 document classes to handle employees, corporate, and their own documents.  As you stated there is not going to be a great answer, but hopefully some different perspective will help.



    ------------------------------
    Mike Prentice
    Director of Solutions
    Pyramid Solutions
    ------------------------------



  • 6.  RE: IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.

    Posted 15 days ago

    Hi Krishna,

    I would also add another reason to consider when deciding why to use/how many classes to use: In FileNet, you can enable or disable full-text search indexing (if I understood correctly, also for AI) at the class level. This means that either all documents within a class are indexed, or none of them are.

    BR, Jana



    ------------------------------
    Jana Kolodziejová
    ------------------------------



  • 7.  RE: IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.

    Posted 15 days ago
    Edited by Ronald Heerema 15 days ago

    Hi Krishna,

    It depends on the way you use Document Management and Archiving.

    We use it mainly from an archiving perspective. That means for us that you only have metadata that is relevant for archiving and retrieving. Usually this is a generic set. We do allow small variations when it is relevant for searching and retrieving from a business perspective.

    Technically you are bound to the table-row limit of your database, as properties of all document classes are stored in the same DocVersion table, which has a maximum width. Of course you can solve this using separate object stores, but each solution comes with its own "additional" complexity.

    Security we do with Security Proxy Objects and Role Based Access so we can enable multi dimensional security (role-based, subject based, activity/process based, etc.) so that doesn't require additional/extra document classes.

    But having said that, there is also an implementation that has started here that will allow 400-2000 document classes. This will allow our primary processes to store any relevant process data on the documents. Again, there is no hard reason not to do this, but our environment is based on generic government metadata models and we are building up to 100 billion archived documents. Just imagine what happens when the generic model is still subject to change and you have to implement a change 400-2000 times over such a large collection.

    So in short it depends as IBM would say. Based on both your functional and non-functional requirements there is an architecture that will fit your situation the best.

    Regards,

    Ronald



    ------------------------------
    Ronald Heerema
    Architect
    Dutch Tax Office
    Apeldoorn
    +31613009685
    ------------------------------



  • 8.  RE: IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.

    Posted 9 days ago

    I don't think the important question is many vs few document classes.

    I think the important question is why are you using document classes at all.

    (caveat:  My experience is with P8/Content Engine not Content Manager, so I will try to keep my rationales product-agnostic and ask questions rather than state answers that may be wrong for CM).

    Do you have a taxonomy, or formalized system for classifying your content?

    If the answer is, "no", then I suggest you stop here and go do that, because the taxonomy is your logical data model.

    Now, I'm not saying you need to have a massive taxonomy of 110 hierarchical classes and 500 attributes spread across them.  One class and one attribute might be good enough.  I've seen plenty of informal taxonomies that consist of "folder #" and "document type".  And that simple taxonomy fits just fine into one document class and a couple of attributes.

    I won't go overboard talking about taxonomies here.  But it's important to say that taxonomies drive three core capabilities:

    1. Security -- you need to know what something is to be able to secure it properly.
    2. Retention -- you need to know what something is to be able to accurately determine its disposition and manage its retention
    3. Search -- you need to be able to find stuff (and quality classification also helps with training data)

    What you need in your taxonomy to be able to satisfy these three core capabilities is unique to you.  The first two might be able to be satisfied with a handful of attributes.  In P8/CE, the hierarchical nature of document classes and inherited properties lends itself to using the document class object model to partition class-specific search attributes, but there's no rule that says you can't do that with one giant document class (DOCTABA or DOCVERSION, anyone?).

    So, once you figure out what your logical data model looks like, creating your physical data model is just an exercise in mapping.  I can't help you there with Content Manager.

    Whether you use one document class, a few, several, or many is up to you.  More = complexity.  Fewer can be simpler to build, but more obscure in the long run.  Nothing prevents you from filling attributes that aren't valid for a classification anyway if they're all available.  I think it's a balance between ongoing complexity and potential obscurity.  And, frankly, the complexity isn't that bad.

    What I advocate for in P8 is using three attributes for classification:

    1. Document Class
    2. Document Type
    3. Document Subtype

    This isn't arbitrary.  Document Class is more coupled to the line of business.  The two-tier type/subtype pair is specifically because of Choice Lists and Choice Groups.  You can leverage those and tie them at the document class level to associated model-wide properties to prevent content from being committed that isn't valid in the taxonomy.  Let the platform protect you from dross.
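
    As a rough sketch of what "valid in the taxonomy" means for a class/type/subtype triple, the following plain Python models the three tiers as nested lookups. It does not use the Content Engine API or real Choice Lists; the class, type, and subtype names and the function `is_valid_classification` are all hypothetical, chosen only to show the shape of the check the platform would enforce at commit time.

```python
# Hypothetical taxonomy: document class -> document type -> allowed subtypes.
TAXONOMY = {
    "Invoice": {
        "Debit": {"Domestic", "Foreign"},
        "Credit": {"Domestic", "Foreign"},
    },
    "HRDocument": {
        "Employee": {"Contract", "Review"},
        "Corporate": {"Policy"},
    },
}


def is_valid_classification(doc_class, doc_type, doc_subtype):
    """Return True only if the class/type/subtype triple exists in the taxonomy."""
    subtypes = TAXONOMY.get(doc_class, {}).get(doc_type)
    return subtypes is not None and doc_subtype in subtypes


# A triple that exists in the taxonomy is accepted.
print(is_valid_classification("Invoice", "Debit", "Foreign"))
# A type that belongs to a different class is rejected.
print(is_valid_classification("Invoice", "Employee", "Contract"))
```

    The two-level lookup mirrors the idea that the set of valid types depends on the class, and the set of valid subtypes depends on the type.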



    ------------------------------
    A. Caleb Gattegno
    ECM Architect
    New York Life Insurance Company
    New York NY
    ------------------------------



  • 9.  RE: IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.

    Posted 6 days ago

    " There are so many classes here that users are overwhelmed when it comes to filing."
    Yes, that was a common problem, and it led to people not wanting to use the system or wanting to replace it with something else.
    In the early days it was funny to watch people replacing Vendor A systems with Vendor B and vice-versa for the same reasons. 
    The grass is always greener as they say.

    Here are some observations.  I will not call them best practices because, as said, it depends.

    I have seen customers run out of table width or have search issues because of excessive conditions. Early on, one of the first customers hit a table width limit. Later, another hit the 1000-column hard limit in Oracle; they were also searching for a document by docId plus 10 other conditions and complained about performance. DocId was all they needed, since it is the key identifier.

    I've been called upon to look at a customer's system that had been rebuilt and migrated to solve a performance issue.  The problem was that they had too many properties (500); the underlying problem was never resolved, so the new system was just as slow.  When the system was originally implemented, the customer committee was allowed to define the property requirements. When the vendor later migrated the system, the HR classes could have been split into their own object store but were not. You can define as many property templates as you like, but a database row is only so wide and only supports so many indexes. Some customers run dedicated Oracle instances; others use a shared instance. (A shared instance may work for a while, but in the long run it is a pain when you need to migrate something and get throttled because you're using too many resources or too much redo log. Suddenly a 3-day project turns into a months-long project.) Even if you use separate tables (object stores) and partitions, it's probably all going through the same database.

    Partitioning full-text searches by subclass is a good idea. But, remember, not all docs are suitable. One customer spent $10M and their new system would get overwhelmed one minute after start of business when their customers started looking up shipping records. They had a warehouse of scanners and were using full text searches but did not OCR the scanned images.  Simple but critical mistake. 

    As Gerold notes, the object_class_id is not indexed, so you may want to create an index for it, e.g., createdate + object_class_id.  If using Oracle or DB2, remember they are case sensitive. If you use composite indexes, do so based on empirical testing. P8 will add some columns by default for security, and empirical testing will show whether your composite index works or not. If you can, use a functional index to avoid dealing with case in your front end. (Note: SQL combines indexes, and Oracle states it can merge single-column indexes after 12c, but I've never seen them get picked by the query plan when a composite index was available, even if it didn't make sense.)
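
    The functional-index idea can be sketched with SQLite's indexes on expressions, again only as a stand-in: Oracle and Db2 have their own syntax and their own optimizers, and the table, the `customer_name` column, and the index name below are hypothetical. The point is that the index is built on the lower-cased value, so a case-insensitive lookup can use it without the front end normalizing case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical stand-in for a docversion table with one custom property.
conn.execute(
    "CREATE TABLE docversion ("
    " object_id TEXT PRIMARY KEY,"
    " object_class_id TEXT NOT NULL,"
    " customer_name TEXT)"
)
conn.executemany(
    "INSERT INTO docversion VALUES (?, ?, ?)",
    [
        ("doc-1", "class-a", "ACME Corp"),
        ("doc-2", "class-a", "acme corp"),
        ("doc-3", "class-b", "Globex"),
    ],
)

# Index on an expression: the lower-cased property value (hypothetical index name).
conn.execute(
    "CREATE INDEX idx_docversion_name_lower ON docversion (lower(customer_name))"
)

# The query must use the same expression for the index to be usable.
sql = "SELECT object_id FROM docversion WHERE lower(customer_name) = ?"
matches = sorted(row[0] for row in conn.execute(sql, ("acme corp",)))
plan = " | ".join(
    row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, ("acme corp",))
)

print(matches)  # both spellings of the name match
print(plan)
```

    As the post says, whether a given index is actually picked is something to confirm empirically on the target database, not something this sketch can show.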

    Choice List vs DocClass - Many people create a "docType" choice list.  DocClasses can have different inherited security and be assigned different physical storage areas.  Maybe one needs retention and the other does not. If you need retention, does your storage support object-level or container-level retention?  The decision process can be layered. When it comes time to delete (dispose of) documents, how easy is it to group them?  A multi-valued field value might be stored in the list-of-strings table. A sweep can look at all records in the database, but it might take days to complete. If I want to perform an operation on 20k documents per day but it takes 3 days for a sweep to complete, it won't finish in time.

    Searches - When you retrieve a document, some properties will come back as unevaluated and require a network round trip to resolve. If you are going to be processing docs in bulk based on some property, you want to assign an index to that property or at least store it as a column in the docversion table to minimize network round trips. 

    If a field will be mostly null or have limited values, try to avoid indexing it to avoid skew.

    You can create multiple object stores to split the docversion table for data or access reasons. If needed, you can do cross-repository searches. 

    In my experience, most customers have a custom front end.  Those apps often store their business data in their own database anyway and many times all they really need is the object_id.  FileNet data types are often generalized whereas the app database is domain specific.

    Space - If you have a 2-letter state code, don't define it as a 64-character-wide varchar.  Avoid padding field widths just in case.  If needed, you can split a database into partitions. If you need a "text" field, or long string, you can assign it its own tablespace, as is often done for blob columns.
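
    A back-of-the-envelope sketch of the padding point: the property names and lengths below are hypothetical, and how declared widths actually count against a row or page limit depends on the database, which is not modelled here. It only adds up declared maximum lengths to show how quickly "just in case" padding accumulates.

```python
# Hypothetical string properties: (column name, declared maximum length in characters).
padded = [("state_code", 64), ("country_code", 64), ("currency_code", 64)]
right_sized = [("state_code", 2), ("country_code", 2), ("currency_code", 3)]


def declared_width(properties):
    """Sum the declared maximum lengths of a list of string properties."""
    return sum(length for _name, length in properties)


padded_total = declared_width(padded)            # 3 * 64
right_sized_total = declared_width(right_sized)  # 2 + 2 + 3

print(f"padded: {padded_total} chars, right-sized: {right_sized_total} chars")
print(f"declared width saved: {padded_total - right_sized_total} chars")
```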

    Adding indexes will slow down insert / update database operations.  You can define 500 properties but how many can have indexes before the database slows down? 20?

    While on the subject - do not file documents into folders. After about 3M, searches will time out and prevent the documents therein from being deleted.

    Understand that business owners and "committees" are going to want to add every field they can think of, in the belief that they will someday find a use for it.  They never do. Business users will be burdened with data entry and will find ways to avoid using the system if it is too cumbersome.  As the admin/architect/engineer/chief-bottle-washer, you have to push back against the impractical. The better the planning, the fewer problems you'll have to deal with later.

    Steve

    Chief-Bottle-Washer



    ------------------------------
    Stephen Weckesser
    ------------------------------