Apptio Quick Tips - Object Identifiers (TBM Studio) 

Wed April 26, 2023 07:03 PM

@Mark Nealey explains the importance of Object Identifiers and how they can be used to improve the performance and efficacy of your reports! 

"The foundation of every Model are the Objects it contains, and the granularity (or "level of detail") they run at. By default, objects run at an automatic unique granularity. As the underlying row counts increase, your performance may decrease, and objects over 10,000 rows will begin to experience noticeable performance degradation.

Defining your Object Identifiers can help with project calculation times and real-time model performance. An Object Identifier only needs to contain what is actually going to be used from that object. That is anything you are:

  • Trending By
  • Allocating By
  • Slicing By
  • Pivoting On
  • Grouping By
  • Reporting On"

For more information, see the related articles on Apptio HelpCenter.


#TBMStudio


Comments

Tue September 24, 2024 06:11 PM

@Mark Johnson

My high-level feedback around performance is essentially a basic statement of “less is typically better.” This isn’t intended to be a cop-out; it’s what you should always be thinking about.

Recommendations from a THA (Technical Health Assessment) are very general and conservative.

Specifically for object identifiers, there is a lot of nuance in what really matters. Under the covers, the columns you choose for the OID are fed into the key function to generate the .id column for each object. To generate these, each column value has to be read from the string dictionary, and how well this performs depends on how large your string dictionary is. There is even more nuance than that: the way the dictionary ends up being accessed changes depending on the actual number of strings being generated.
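To make that concrete, here is a rough mental model in Python (purely illustrative, not Apptio's actual implementation; StringDictionary, intern, and make_key are made-up names). The point it sketches: every OID column value is one lookup against the string dictionary, so reads scale with rows times OID columns.

```python
# Toy model (illustrative only, NOT Apptio's implementation) of how
# OID columns feed a key function via the string dictionary.

class StringDictionary:
    """Interning table: maps each distinct string to an integer id."""
    def __init__(self):
        self._ids = {}
        self.reads = 0  # count lookups so we can reason about cost

    def intern(self, value):
        self.reads += 1  # every column value read is one dictionary access
        return self._ids.setdefault(value, len(self._ids))

def make_key(row, oid_columns, sd):
    """Stand-in for =key(): one dictionary read per OID column per row."""
    return tuple(sd.intern(row[col]) for col in oid_columns)

sd = StringDictionary()
rows = [{"Account": "A%d" % (i % 100), "Is Vendor": "Yes" if i % 2 else "No"}
        for i in range(1000)]
ids = [make_key(r, ["Account", "Is Vendor"], sd) for r in rows]
print(sd.reads)      # 2000 reads: 1000 rows x 2 OID columns
print(len(sd._ids))  # but only 102 distinct strings ever stored
```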

So, that all being said, as you choose more columns to be in your identifier, each one of those means more reads from the dictionary. A million-row table with 5 OID columns is 5 million reads (simplified). If you could use fewer columns it would be faster, but maybe not materially so for your use case. Additionally, it might not truly be 5 million reads, because (as you note) some of these are pretty basic Yes/No columns, where just 2 distinct values are being read roughly 1,000,000 times.

There is a debate about building one large concatenated column and using that single column as the OID, versus just using the OID configuration directly. My personal recommendation is to use the OID selection, as that is the way the system was designed, and any optimizations made to the underlying code will assume you are doing that.

With that being said, at some scale and with some customers' configurations, it might currently be faster to concatenate those columns into one: we still do the 5 million reads to generate the string, but we persist that string, so when it becomes time to build the identifier for use, it's only 1 million reads. If your strings are staying static and have lots of re-use, it might end up being faster to generate the string with 5 million reads and store it; same values every month, net fewer reads. If your strings are changing all the time, there is a good chance this configuration will end up being slower, because you're paying the 5 million reads to make the string, then another 1 million to read them back into the OID, versus just doing the 5 million reads every month anyway.

Additionally, the problem this causes is that it fills the string dictionary with a bunch of garbage strings that never end up being used, which we don't purge until an upgrade or a cache clear. To put this another way: if you calculate your project and are generating 1 million strings per month for an object, then add another column to your identifier, all those old strings are kept in the dictionary, and now we generate a net-new 1 million strings, and so on. Whereas if you just add the column to the OID, we aren't generating new strings at all, because the =key() function just reads the strings in.
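As a back-of-envelope illustration of that trade-off, here is a toy read-count model (my own hypothetical numbers and function names, counting only string-dictionary reads under the simplifications above):

```python
# Read-count comparison of the two strategies (toy model; real
# behavior has more nuance than dictionary reads alone).

def multi_column_oid(rows, cols, periods):
    # =key() reads every OID column for every row, every calculation.
    return rows * cols * periods

def concatenated_oid(rows, cols, periods, strings_static):
    if strings_static:
        # Pay rows * cols once to build and persist the concat string,
        # then only rows reads per period to feed it into the OID.
        return rows * cols + rows * periods
    # Churning strings: rebuild the concat (rows * cols) AND read it
    # back (rows) every period, while garbage piles up in the dictionary.
    return (rows * cols + rows) * periods

N, C, P = 1_000_000, 5, 12  # 1M rows, 5 OID columns, 12 monthly calcs
print(multi_column_oid(N, C, P))                       # 60,000,000
print(concatenated_oid(N, C, P, strings_static=True))  # 17,000,000
print(concatenated_oid(N, C, P, strings_static=False)) # 72,000,000
```

Under these assumptions, the concatenation only wins when the strings are static and heavily re-used, which is consistent with the recommendation to default to the OID checkboxes.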

For the columns that are “is x / is y” = “Yes/No”, the values are pretty static and pretty binary, so they don’t impact things the same way a highly variable column does (say, a column of 30 concatenated values).

In my personal testing of pre-calculation, comparing 10 or 20 columns in the OID against concatenating those 10 or 20 columns and having 1 column in the OID, the 10 or 20 separate columns were faster for very large tables (10 to 20 million rows). That testing made some assumptions, used generated data, and was run in isolation.

There is no real way for end users to profile this. We can: by making a change at some level of your project, watching the threads execute, and then seeing whether it's spending more time doing something like =key() or the A && B && C concatenation. Even that profiling is only part of the story, as it mostly relates to real-time performance.

Hopefully, platform improvements in the near future (within about a year) will make this whole issue moot, which is why today I suggest customers just use the OID checkboxes.

To sum this all up, I go back to my initial point: if something isn't being used, remove it. If you can consolidate strings, prevent needless strings from being created, and re-use strings across many periods, then your system will perform better. It's very hard to give a general recommendation because there actually isn't an easy answer.

Something I will note is that having NO OID selected is basically the worst case, as we run the =Key() function on all columns, so I encourage everyone to always set the OID for every object.
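Continuing the toy model from above (column counts hypothetical), the no-OID worst case is easy to see:

```python
# No OID selected: =Key() runs over ALL columns (worst case).
rows, oid_cols, all_cols = 1_000_000, 5, 40  # hypothetical table shape
print(rows * oid_cols)  # 5,000,000 reads per calc with a 5-column OID
print(rows * all_cols)  # 40,000,000 reads per calc with no OID set
```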

If you have more questions let me know.


#TBMStudio

Thu September 19, 2024 06:32 PM

@Mark Nealey - the advice from a technical health assessment performed by Apptio was that:

It is recommended to have 3 or fewer columns selected for an object identifier, up to a maximum of 5 columns selected. 

In order to support the outcomes of our multiple overlaid models (Cost, Budget, Recovery, Project Ledger, etc.), all of which allocate costs with a high level of precision, can you please provide further guidance on this THA recommendation?

Cost Source has OOTB flags - Is Vendor, Is Project, Is Depr, etc. It has Accounts and Cost Centres. I've already hit 5 dimensions here that need granularity and we have our own custom ones that are used in both allocation and reporting for the Cost metric alone.

While I understand the importance of balancing model object granularity to ensure performance and calculation efficiency, I am unclear how to restrict OIDs to just 5 columns when the OOTB product has far more than this as a starting point and we have our own custom requirements that also need to be included.

Thanks Mark


#TBMStudio

Thu September 07, 2023 01:44 PM

@Steve Bennett the recommended row threshold refers to "allocation" rows, not the count of rows in the object or table.

If you have an object with 10k rows, there's a good chance you're already at the recommended limit. However, it depends on which rows are subject to the allocation and how many rows on the receiving object's end are in play.

If you see a high allocation row count far exceeding the recommended threshold, consider breaking apart the allocation into smaller chunks.


Quick example...

Object A's total row count = 1,000,000 rows

Object B's total row count = 750,000,000 rows

500 rows of Object A are subjected to an allocation targeting 25 rows of Object B.

Object A (500 rows) × Object B (25 rows) = 12,500 allocation rows
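As a sketch of the same arithmetic (allocation_rows is just an illustrative helper, not a product function):

```python
# Allocation rows from the example above: only the participating
# rows multiply out, not the objects' total row counts.

def allocation_rows(source_rows_in_play, target_rows_in_play):
    return source_rows_in_play * target_rows_in_play

print(allocation_rows(500, 25))  # 12,500, despite Object A holding
                                 # 1,000,000 rows and Object B 750,000,000
```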


#TBMStudio

Tue June 13, 2023 05:45 AM

Hi, 

I'd be very grateful for more insight into row counts and why a threshold of 10,000 rows is stated.

To be clear, this doesn't mean allocation rows; the threshold refers to the number of rows in an object, right?

Many Thanks

Steve Bennett, 


#TBMStudio