"There are so many classes here that users are overwhelmed when it comes to filing."
Yes, that was a common problem, and it led to people not wanting to use the system or wanting to replace it with something else.
In the early days it was funny to watch people replacing Vendor A systems with Vendor B and vice-versa for the same reasons.
The grass is always greener as they say.
Here are some observations. I will not call them best practices because, as said, it depends.
I have seen customers run out of table width or have search issues because of excessive conditions. Early on, one of the first customers hit a table width limit. Later, another hit the 1000-column hard limit in Oracle; they were also searching for a document by docId plus 10 other conditions and complained about performance. The docId was all they needed, since it is the key identifier.
I've been called upon to look at a customer's system that had been rebuilt and migrated to solve a performance issue. The problem was that they had too many properties (500), and since the underlying problem was never resolved, the new system was just as slow. When the system was originally implemented, the customer's committee was allowed to define the property requirements. When the vendor later migrated the system, the HR classes could have been split into their own object store, but they were not. You can define as many property templates as you like, but a database row is only so wide and can support only so many indexes. Some customers run dedicated Oracle instances; others use a shared instance. (A shared instance may work for a while, but in the long run it is a pain when you need to migrate something and get throttled for using too many resources or too much redo log. Suddenly a 3-day project turns into a months-long project.) Even if you use separate tables (object stores) and partitions, it's probably all going through the same database.
Partitioning full-text searches by subclass is a good idea, but remember that not all docs are suitable. One customer spent $10M, and their new system would get overwhelmed one minute after the start of business, when their customers started looking up shipping records. They had a warehouse of scanners and were using full-text search, but they did not OCR the scanned images. A simple but critical mistake.
As Gerold notes, object_class_id is not indexed, so you may want to create an index for it, e.g., createdate + object_class_id. If you are using Oracle or DB2, remember that they are case sensitive. If you use composite indexes, base them on empirical testing: P8 will add some columns by default for security, and only empirical testing will show whether your composite index works. If you can, use a functional index so you don't have to deal with case in your front end. (Note: SQL combines indexes, and Oracle states it can merge single-column indexes after 12c, but I've never seen them get picked by the query plan when a composite index was available, even when it didn't make sense.)
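A minimal sketch of that empirical check, using Python's built-in sqlite3 module as a stand-in for Oracle or DB2. SQLite is an assumption made only so the example runs anywhere; the table and column names are hypothetical rather than the real P8 schema, and every database's planner behaves differently. It creates a composite index on create date plus class id and an upper() expression index, then asks the planner with EXPLAIN QUERY PLAN whether they are actually used:

```python
import sqlite3

# SQLite stands in for Oracle/DB2 only because it ships with Python.
# Table and column names are hypothetical, not the actual P8 DocVersion schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE docversion ("
    " object_id TEXT PRIMARY KEY,"
    " object_class_id TEXT NOT NULL,"
    " create_date TEXT NOT NULL,"
    " doc_title TEXT)"
)
# Composite index in the order suggested above: create date, then class id.
conn.execute(
    "CREATE INDEX idx_date_class ON docversion (create_date, object_class_id)"
)
# Functional (expression) index so title lookups can ignore case.
conn.execute("CREATE INDEX idx_title_upper ON docversion (upper(doc_title))")
conn.executemany(
    "INSERT INTO docversion VALUES (?, ?, ?, ?)",
    [
        ("id1", "Invoice", "2025-01-10", "ACME Invoice"),
        ("id2", "Contract", "2025-02-01", "acme contract"),
    ],
)

DATE_CLASS_SQL = (
    "SELECT object_id FROM docversion "
    "WHERE create_date >= ? AND object_class_id = ?"
)
TITLE_SQL = "SELECT object_id FROM docversion WHERE upper(doc_title) = upper(?)"


def plan(sql, params=()):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


# Ask the planner whether it actually picks each index.
print(plan(DATE_CLASS_SQL, ("2025-01-01", "Invoice")))
print(plan(TITLE_SQL, ("acme invoice",)))

# Plain equality is case sensitive, so this finds nothing...
print(conn.execute(
    "SELECT object_id FROM docversion WHERE doc_title = ?", ("acme invoice",)
).fetchall())  # []
# ...while the upper() form matches regardless of the caller's casing.
print(conn.execute(TITLE_SQL, ("acme invoice",)).fetchall())  # [('id1',)]
```

On Oracle or DB2 you would read that database's own execution plan for the real query instead; the point is to confirm index use against your own data rather than assume it.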
Choice List vs DocClass - Many people create a "docType" choice list. DocClasses, however, can have different inherited security and be assigned different physical storage areas. Maybe one needs retention and the other does not. If you need retention, does your storage support object-level or container-level retention? The decision process can be layered. When it comes time to delete (dispose of) documents, how easy is it to group them? A multi-valued field value might be stored in the list-of-strings table. A sweep can look at all records in the database, but it might take days to complete. If I want to perform an operation on 20k documents per day but a sweep takes 3 days to complete, it won't finish in time.
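The sweep timing problem is simple arithmetic. The sketch below uses hypothetical figures chosen so that a full pass takes 3 days, and it assumes (a simplification) that examining a row costs the same whether a sweep scans the whole table or a query selects only the day's targets, for example because they share a document class or an indexed property:

```python
def hours_to_process(rows_examined: int, rows_per_hour: float) -> float:
    """Wall-clock hours to examine a number of rows at a steady rate."""
    return rows_examined / rows_per_hour


# Hypothetical figures, chosen so a full pass takes 3 days as in the example above.
TOTAL_ROWS = 72_000_000    # every document version in the object store
ROWS_PER_HOUR = 1_000_000  # assumed scan rate; measure your own
DAILY_TARGETS = 20_000     # documents that actually need the operation each day

# A sweep must look at every row, no matter how few are targets.
full_sweep = hours_to_process(TOTAL_ROWS, ROWS_PER_HOUR)
# A grouped/indexed query only touches the day's targets.
targeted = hours_to_process(DAILY_TARGETS, ROWS_PER_HOUR)

print(f"full sweep: {full_sweep:.0f} h, fits in a day: {full_sweep <= 24}")
print(f"targeted query: {targeted:.2f} h, fits in a day: {targeted <= 24}")
```

The exact rates will differ on a real system, but the shape of the result does not: work proportional to the whole table cannot keep up with a daily quota, while work proportional to the targets can.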
Searches - When you retrieve a document, some properties will come back as unevaluated and require a network round trip to resolve. If you are going to be processing docs in bulk based on some property, you want to assign an index to that property or at least store it as a column in the docversion table to minimize network round trips.
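To make the round-trip cost concrete, here is a toy model in Python. FakeRepository is a hypothetical class that only counts trips; it is not the FileNet API. Resolving one property per document after the search costs 1 + N round trips, while asking for the property in the search itself costs one:

```python
from dataclasses import dataclass


@dataclass
class FakeRepository:
    """Toy model that only counts round trips; it is not the FileNet API."""
    docs: dict  # doc id -> {property name: value}
    round_trips: int = 0

    def search(self, prop_names):
        """One round trip returns every hit with the requested properties filled in."""
        self.round_trips += 1
        return [
            {"id": doc_id, **{p: props[p] for p in prop_names}}
            for doc_id, props in self.docs.items()
        ]

    def fetch_property(self, doc_id, prop_name):
        """Resolving a property that came back unevaluated costs another trip."""
        self.round_trips += 1
        return self.docs[doc_id][prop_name]


docs = {f"doc{i}": {"Status": "Open" if i % 2 else "Closed"} for i in range(1000)}

# Lazy: search returns bare hits, then each Status is resolved separately.
lazy = FakeRepository(docs)
lazy_status = [lazy.fetch_property(hit["id"], "Status") for hit in lazy.search([])]

# Eager: ask for Status as part of the search.
eager = FakeRepository(docs)
eager_status = [hit["Status"] for hit in eager.search(["Status"])]

print(lazy.round_trips, eager.round_trips)  # 1001 1
```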
If a field will be mostly null or have only a few distinct values, try not to index it, to avoid skew.
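One hedged way to spot such fields before indexing them is to profile a sample of the column's values. The thresholds below are hypothetical defaults, not product guidance; the real test is still the query plan:

```python
from collections import Counter


def column_profile(values):
    """Summarize how useful a column would be as an index key."""
    total = len(values)
    counts = Counter(v for v in values if v is not None)
    nulls = total - sum(counts.values())
    return {"null_fraction": nulls / total, "distinct": len(counts)}


def poor_index_candidate(profile, max_null_fraction=0.8, min_distinct=10):
    """Hypothetical thresholds: tune them against your own query plans."""
    return profile["null_fraction"] > max_null_fraction or profile["distinct"] < min_distinct


mostly_null = [None] * 95 + ["NE"] * 5       # e.g. an optional region code
unique_ids = [f"id{i}" for i in range(100)]  # e.g. a docId-like key

for name, sample in [("mostly_null", mostly_null), ("unique_ids", unique_ids)]:
    prof = column_profile(sample)
    print(name, prof, poor_index_candidate(prof))
```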
You can create multiple object stores to split the docversion table for data or access reasons. If needed, you can do cross-repository searches.
In my experience, most customers have a custom front end. Those apps often store their business data in their own database anyway and many times all they really need is the object_id. FileNet data types are often generalized whereas the app database is domain specific.
Space - If you have a 2-letter state code, don't define it as a 64-character-wide varchar. Avoid padding field widths "just in case." If needed, you can split a database into partitions. If you need a "text" field or long string, you can assign it its own tablespace, as is often done for BLOB columns.
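A small audit along these lines can catch over-padded fields and column-count creep before they reach the database. Everything here is hypothetical: PropertyDef, the slack factor, and system_columns (the number of columns the platform already places on the table) are placeholders to replace with figures from your own object store; only the 1000-column Oracle limit comes from this thread:

```python
from dataclasses import dataclass

ORACLE_MAX_COLUMNS = 1000  # hard limit cited earlier in this thread


@dataclass
class PropertyDef:
    name: str
    declared_chars: int  # width declared on the property/column
    observed_max: int    # longest value seen in real (or sample) data


def audit(props, system_columns, slack=2.0):
    """Flag over-padded string properties and check the table's column budget.

    system_columns is a hypothetical count of columns the platform already
    adds to the table; look up the real number for your own object store.
    """
    oversized = [
        p.name for p in props if p.declared_chars > slack * max(p.observed_max, 1)
    ]
    total_columns = system_columns + len(props)
    return {
        "oversized": oversized,
        "declared_chars": sum(p.declared_chars for p in props),
        "total_columns": total_columns,
        "within_limit": total_columns <= ORACLE_MAX_COLUMNS,
    }


props = [
    PropertyDef("StateCode", declared_chars=64, observed_max=2),
    PropertyDef("PolicyNumber", declared_chars=12, observed_max=10),
]
print(audit(props, system_columns=150))
```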
Adding indexes will slow down database insert/update operations. You can define 500 properties, but how many can have indexes before the database slows down? 20?
While on the subject - do not file documents into folders. Once a folder holds about 3M documents, searches will time out and prevent the documents therein from being deleted.
Understand that Business Owners and "Committees" are going to want to add every field they can think of, in the belief that they will someday find a use for it. They never do. Business users will be burdened with data entry and will find ways to avoid using the system if it is too cumbersome. As the admin/architect/engineer/chief-bottle-washer, you have to push back against the impractical. The better the planning, the fewer problems you'll have to deal with later.
Steve
Chief-Bottle-Washer
------------------------------
Stephen Weckesser
------------------------------
Original Message:
Sent: Sun February 16, 2025 04:00 PM
From: A. Caleb Gattegno
Subject: IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.
I don't think the important question is many vs few document classes.
I think the important question is why are you using document classes at all.
(Caveat: my experience is with P8/Content Engine, not Content Manager, so I will try to keep my rationales product-agnostic and ask questions rather than state answers that may be wrong for CM.)
Do you have a taxonomy, or formalized system for classifying your content?
If the answer is, "no", then I suggest you stop here and go do that, because the taxonomy is your logical data model.
Now, I'm not saying you need to have a massive taxonomy of 110 hierarchical classes and 500 attributes spread across them. One class and one attribute might be good enough. I've seen plenty of informal taxonomies that consist of "folder #" and "document type". And that simple taxonomy fits just fine into one document class and a couple of attributes.
I won't go overboard talking about taxonomies here. But it's important to say that taxonomies drive three core capabilities:
- Security -- you need to know what something is to be able to secure it properly.
- Retention -- you need to know what something is to be able to accurately determine its disposition and manage its retention
- Search -- you need to be able to find stuff (and quality classification also helps with training data)
What you need in your taxonomy to satisfy these three core capabilities is unique to you. The first two might be satisfied with a handful of attributes. In P8/CE, the hierarchical nature of document classes and inherited properties lends itself to using the document class object model to partition class-specific search attributes, but there's no rule that says you can't do that with one giant document class (DOCTABA or DOCVERSION, anyone?).
So, once you figure out what your logical data model looks like, creating your physical data model is just an exercise in mapping. I can't help you there with Content Manager.
Whether you use one document class, a few, several, or many is up to you. More = complexity. Fewer can be simpler to build, but more obscure in the long run. Nothing prevents you from filling attributes that aren't valid for a classification anyway if they're all available. I think it's a balance between ongoing complexity and potential obscurity. And, frankly, the complexity isn't that bad.
What I advocate for in P8 is using three attributes for classification:
- Document Class
- Document Type
- Document Subtype
This isn't arbitrary. Document Class is more coupled to the line of business. The two-tier type/subtype pair is specifically because of Choice Lists and Choice Groups. You can leverage those and tie them at the document class level to associated model-wide properties to prevent content from being committed that isn't valid in the taxonomy. Let the platform protect you from dross.
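The class/type/subtype idea can be sketched in plain Python to show the kind of check that Choice Lists and Choice Groups enforce in P8. This illustrates the constraint only; it is not the P8 API, and the class, type, and subtype names are invented:

```python
# Hypothetical taxonomy: document class -> document type -> allowed subtypes.
TAXONOMY = {
    "Claims": {
        "Correspondence": {"Inbound", "Outbound"},
        "Medical": {"Report", "Bill"},
    },
    "Underwriting": {
        "Application": {"New", "Amendment"},
    },
}


def validate(doc_class, doc_type, doc_subtype):
    """Return a list of problems; an empty list means the triple fits the taxonomy."""
    types = TAXONOMY.get(doc_class)
    if types is None:
        return [f"unknown document class {doc_class!r}"]
    subtypes = types.get(doc_type)
    if subtypes is None:
        return [f"type {doc_type!r} is not valid for class {doc_class!r}"]
    if doc_subtype not in subtypes:
        return [f"subtype {doc_subtype!r} is not valid for type {doc_type!r}"]
    return []


for triple in [("Claims", "Medical", "Bill"), ("Claims", "Application", "New")]:
    print(triple, validate(*triple) or "ok")
```

In P8 you would configure this in the object model rather than write it as code; the sketch only shows why tying the lists to the class rejects combinations that aren't in the taxonomy.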
------------------------------
A. Caleb Gattegno
ECM Architect
New York Life Insurance Company
New York NY
Original Message:
Sent: Sun February 09, 2025 10:41 AM
From: Krishna Vaddireddy
Subject: IBM FileNet Content Manager - Best Practice about the use of Document Classes many vs few.
Hello Everyone,
I have been thinking about the best practice to follow when using document classes (more than 10 vs. fewer than 10) in IBM FileNet Content Manager and thought I would put it on the forum to hear your thoughts and experiences around this topic.
As I understand it, having fewer document classes may become complex to maintain when requirements do not fit the existing document classes. We may also need to customize Navigator request and response filters if we use the same document class for different business units that need to see different metadata in Navigator. We can use document type as a differentiator when using the same document class, and we can use marking sets for security needs. On the other hand, creating new document classes when the metadata differs from existing ones may be cleaner and gives a clear separation of security, but then we end up creating too many document classes, even though, as far as I know, the product imposes no limit.
I am sure there is no definitive answer to this question, but I would like to hear your thoughts, the approach you follow, and your experiences around this topic.
------------------------------
Krishna
------------------------------