A Survey of ClearCase Performance Tuning Tweaks

By Brad Poulliot

HCL Software DevOps Services, a strategic partner of IBM, recently engaged with a multinational aerospace and defense corporation to conduct a ClearCase on Red Hat Enterprise Linux (RHEL) performance assessment.  During the assessment we reviewed previous recommendations from product engineering and customer support, as well as customer insights.  We then took a deep dive into the current state (i.e., configurations) and the Grafana ClearCase performance metrics developed by the customer’s DevOps team, and built a set of solutions targeting incremental performance improvements.

Challenges / Solutions

1)    Make View Wrapper Script – A ‘makeView’ wrapper script created a temporary view just to access config_spec templates stored in a ClearCase VOB, created the end-user’s project view from the selected template, and then deleted the temporary view.  Creating the temporary view added unproductive time, and the subsequent clean-up increased cycle times, resulting in unnecessary end-user wait times, extra steps, and complexity.

View creation data collected over a month indicates:

                      min     max     avg
  makeView per hour     0     167      39
  makeView per day    268    1816     906
  rmView per hour       0     150      38
  rmView per day      313    1862     889

SOLUTION:  Do not use temporary views, especially at the current volume.  A significant improvement in efficiency and effectiveness will be realized by either publishing the view config_spec templates on a managed network share outside of ClearCase, or using view-extended pathnames so the ‘makeView’ script can read the appropriate template directly.
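As an illustration of the recommended flow, the sketch below creates only the end-user view and reads the template directly.  Every path, tag, and template name is hypothetical, and a shell stub stands in for ‘cleartool’ so the flow can be traced on a host without ClearCase installed:

```shell
#!/bin/sh
# Hypothetical makeView flow without a temporary view.
# Paths, tags, and template names are illustrative only.
if ! command -v cleartool >/dev/null 2>&1; then
    cleartool() { echo "+ cleartool $*"; }   # dry-run stub for illustration
fi

TAG=${1:-proj_a_jdoe}
STG=/viewstg/${TAG}.vws
TEMPLATE=/net/tools/cc/cspec_templates/proj_a.cs   # managed share outside ClearCase

cleartool mkview -tag "$TAG" "$STG"       # create only the end-user's view
cleartool setcs -tag "$TAG" "$TEMPLATE"   # read the template directly

# Alternative: read the template through one long-lived view instead of a share:
#   cleartool setcs -tag "$TAG" /view/template_view/vobs/admin/cspecs/proj_a.cs
```

Either variant removes one mkview/rmview pair per request, which at roughly 900 view creations per day is a substantial saving.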

2)    Build Servers – The customer’s physical build servers are configured with 22 cores.  HCL product engineering has found that when 8-12 user-space processes are simultaneously using the MVFS, performance begins to level off; the actual number varies with the operating system and the specific type of load.  Beyond 8-12 cores, performance may eventually degrade, because the single MVFS must manage its cache against each build, and with multiple concurrent builds it spends significant effort correctly managing shared cache resources.

SOLUTION:  Consider transitioning from physical to virtualized build servers.  Virtualization offers better flexibility for supporting multiple different target environments without incurring additional hardware expense.  It also improves server reliability and availability, lowers total operational cost, and enables better utilization of both physical servers and power.  Start with 8 CPUs per virtual build host.  Experience has shown that the Linux KVM (Kernel-based Virtual Machine) hypervisor is more than adequate for this purpose.

3)    ALBD Server Log Warnings – We discovered thousands of warnings in the ‘albd_server’ log with this pattern:

‘Albd(<PID>): Warning: Request from host <IP> for server <UUID>@<VOB(or VIEW)_path> with wrong path <VOB(or View)_path>’

These warnings occur when a client host requests a server process running in the context of one path, but the server process is already running on another path. For example, the client may request a ‘vob/vobrpc’ server process located at ‘/vobstg/...’, but the albd's child process was started with the ‘/net/...’ path. 

SOLUTION:  These warnings are typically caused by VOBs or views that a user removed, or re-registered and re-tagged, leaving old VOB or view server processes on the registry server.  They can also occur when VOB storage is moved without ClearCase being shut down on the source server, or when the view server process is not stopped.

Verify the path differences by doing one of the following:

Run ‘ps -elf | grep "vob.*server"’. The paths that the albd started the servers on will be displayed.

Run ‘{cc install}/etc/utils/albd_list’. This command will display the albd child processes and the path that the albd thinks they are using.

If the paths reported are the ‘/net’ automount paths, restart ClearCase or kill those server processes. Note that killing the ‘vobrpc_server’ processes will cause views to stall until the view server realizes that the connected ‘vobrpc_server’ is offline and retries.

REFERENCE:  TechNote 348725 – Albd(<PID>): Warning: Request from host <IP> for server <UUID>@<VOB(or VIEW)_path> with wrong path <VOB(or View)_path>
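For a quick scripted version of the verification above, the one-liner below (assuming Linux-style ‘ps’ output columns) flags any ClearCase server process whose arguments reference an automount path under ‘/net’:

```shell
#!/bin/sh
# List ClearCase server processes started with /net automount paths.
# Assumes a Linux-style ps; adjust the options for other platforms.
ps -eo pid,args | awk '/vob(rpc)?_server|view_server/ && /\/net\// { print }'
```

An empty result means the albd started its child servers on the intended storage paths.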

4)    VOB RPC Server Errors – The following errors appear in the ClearCase logs after running ‘dbcheck’ (which itself completes without errors as part of the scheduled VOB snapshot backup job):

- error_log:

Thursday 10/04/23 17:09:18. host "vobhost", pid 11795, user "root"

vobrpc_server(11795)/tbs: Ok: Internal Error detected in "../map_db.c" line 1404

vobrpc_server(11795)/vob/map: Error: Something not found in VOB database: "/vob/myvob".

- vobrpc_server log:

10/04/23 17:09:18 vobrpc_server(11795): Error: INTERNAL ERROR detected and logged in "/var/adm/atria/log/error_log".

10/04/23 18:09:49 vobrpc_server(11795): Error: Database identifier (dbid) not found in database: "/vob/myvob".

10/04/23 18:09:49 vobrpc_server(11795): Error: INTERNAL ERROR detected and logged in "/var/adm/atria/log/error_log".

SOLUTION: 

This is a defect that has been closed as a permanent restriction and will not be fixed.  It does not impact the functionality of ‘dbcheck’ or the database in any way.  You may safely ignore these messages.

REFERENCE:  TechNote 333951 – After running dbcheck or a failed client side command log errors indicate that something is not found in database.

5)    Asynchronous Licensing – This optimization feature was previously recommended, but not implemented.   

SOLUTION:  You can select the Asynchronous License Acquisition feature during the client installation process for ClearCase or add it later by modifying the previous installation via the IBM Installation Manager. It only needs to be installed on the client as it alters the way the client handles obtaining a license.

With asynchronous license acquisition, ClearCase does not block waiting for a response from the license server; when the response arrives, it is handled by a background process.  In the normal case where a license is available, this is faster for the end user.

6)    Derived Object (DO) Shopping – The customer has noted that build times seem to be growing.  They have requested the ability to quantify the impact of DO shopping on overall build times. 

SOLUTION:  Two approaches are highlighted below.

- ‘clearmake’ build timings are captured and reported via the ‘-d’ switch and trace output:

>>> 13:12:47.919 (clearmake_167031_bullwinkle): Time building with abes(s)        : 144.931979
>>> 13:12:48.079 (clearmake_167031_bullwinkle): Time building (no abes) (s)       : 0.000000
>>> 13:12:48.080 (clearmake_167031_bullwinkle): Time spent winking-in (s)         : 381.904773
>>> 13:12:48.081 (clearmake_167031_bullwinkle): Time evaluating targets (s)       : 169352.986633
>>> 13:12:48.081 (clearmake_167031_bullwinkle): Time stalled waiting for hosts (s): 0.000000
>>> 13:12:48.081 (clearmake_167031_bullwinkle): Time stalled waiting for abes (s) : 0.000000
>>> 13:12:48.082 (clearmake_167031_bullwinkle): Time with parallel interference(s): 0.000000
>>> 13:12:48.082 (clearmake_167031_bullwinkle): Time elapsed for build (s)        : 573.327087
>>> 13:12:48.083 (clearmake_167031_bullwinkle):
>>> 13:12:48.084 (clearmake_167031_bullwinkle): # --- Active make-related environment variables ---
>>> 13:12:48.084 (clearmake_167031_bullwinkle): #       CCASE_MAKEFLAGS=d -C gnu
>>> 13:12:48.085 (clearmake_167031_bullwinkle): #       MAKECMDGOALS=


*** Config Lookup Statistics ***

Targets looked up:                        833
Previous DOs evaluated:                   355
Same-as-prev wins:                        0
Previous DOs that matched:                355
Shared candidates evaluated:              0
Unshared candidates evaluated:            478
Candidates with no data:                  0
Previous DO candidates:                   0
Shared candidates that matched:           0
Unshared candidates that matched:         477

 

- You can get the same output by setting the following two TRACE environment variables (EVs) before running the build:

TRACE_SUBSYS=bldr_show_times:bldr_shopping_stats

TRACE_VERBOSITY=2

EV-only output from clearmake:


>>> 13:25:02.468 (clearmake_179780_bullwinkle): Time building with abes(s)        : 297.522551
>>> 13:25:02.469 (clearmake_179780_bullwinkle): Time building (no abes) (s)       : 0.000000
>>> 13:25:02.469 (clearmake_179780_bullwinkle): Time spent shopping (s)           : 0.000000
>>> 13:25:02.469 (clearmake_179780_bullwinkle): Time spent winking-in (s)         : 0.000000
>>> 13:25:02.469 (clearmake_179780_bullwinkle): Time evaluating targets (s)       : 18.206027
>>> 13:25:02.469 (clearmake_179780_bullwinkle): Time stalled waiting for hosts (s): 0.000000
>>> 13:25:02.470 (clearmake_179780_bullwinkle): Time stalled waiting for abes (s) : 0.000000
>>> 13:25:02.470 (clearmake_179780_bullwinkle): Time with parallel interference(s): 0.000000
>>> 13:25:02.470 (clearmake_179780_bullwinkle): Time elapsed for build (s)        : 300.466095

REFERENCE:  TechNote 333955 – How to trace Rational ClearCase build tools
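To turn the trace output into a per-build trend, a wrapper along these lines can capture the shopping-related timings from each run.  The ‘-C gnu’ flavor and ‘all’ target are hypothetical, and off a ClearCase host a stub that echoes a sample trace line stands in for ‘clearmake’:

```shell
#!/bin/sh
# Sketch: run an existing clearmake invocation with the trace EVs set, then
# pull the shopping-related timing lines out of the log for trending.
if ! command -v clearmake >/dev/null 2>&1; then
    # Stub for illustration off a ClearCase host; echoes a sample trace line.
    clearmake() { echo ">>> Time spent shopping (s)           : 0.000000"; }
fi

TRACE_SUBSYS=bldr_show_times:bldr_shopping_stats \
TRACE_VERBOSITY=2 \
clearmake -C gnu all 2>&1 | tee /tmp/build.log |
  grep -E 'Time (spent shopping|spent winking-in|elapsed for build)'
```

Archiving the grepped lines per build (e.g., into the customer’s Grafana pipeline) quantifies the DO-shopping share of overall build time over time.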

7)    Clearmake Usage – The customer has also requested ideas on small changes to their use of ‘clearmake’ that can augment other configuration changes already implemented, offering the potential for incremental performance improvements.

SOLUTION:  Two simple ‘clearmake’ optimization approaches follow.

- Consider setting the CCASE_BLD_NOWAIT environment variable as part of production builds.  This variable disables clearmake’s sleep-and-retry cycle when it encounters a locked VOB: instead of waiting for the VOB to be unlocked, the build fails immediately and can be restarted, rather than sitting idle.  Ensure CCASE_BLD_NOWAIT is enabled ONLY outside the ClearCase backup window, as the VOBs are locked during that period and builds would fail needlessly.

REFERENCE:  IBM DevOps Code ClearCase v11.0.0 - Build time environment variables

- Additionally, consider setting the CCASE_WINKIN_VIEWS environment variable to a whitespace-separated list of view tags as part of production builds.  If it is set in the environment or in the makefile, clearmake winks in only derived objects that were built in the specified views.  If no suitable derived objects are available, clearmake rebuilds in the current view.

REFERENCE:  IBM DevOps Code ClearCase v11.0.0 Documentation – env-ccase
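A combined invocation might look like the following sketch.  The view tags, parallelism level, and target are hypothetical, and a stub stands in for ‘clearmake’ so the flow can be exercised off a ClearCase host:

```shell
#!/bin/sh
# Sketch of a production build invocation combining both variables.
if ! command -v clearmake >/dev/null 2>&1; then
    clearmake() { echo "+ clearmake $*"; }   # dry-run stub for illustration
fi

# Fail fast instead of sleeping when a VOB is locked (keep this OFF during
# the backup window, when VOBs are intentionally locked):
export CCASE_BLD_NOWAIT=1

# Wink in derived objects only from these trusted build views (hypothetical tags):
export CCASE_WINKIN_VIEWS="int_build_view rel_build_view"

clearmake -C gnu -J 8 all
```

Scoping wink-in candidates to known-good build views also reduces shopping time, since fewer candidate DOs have to be evaluated.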

8)    Incorrect Customer Automation and Manual Removal of Views – ‘rgy_check -views’ identified 4,529 orphaned view storage registry entries with no view tags.  

LOGS:

- ‘mvfslog’ errors:  many log entries containing “locked, exceeded retry limit, ignored.”

- ‘admin_log’:  many entries reference view tags whose names contain very old timestamps, yet the errors are dated today.  There are also many “.vws isn’t a view, No such file or directory” entries.

SOLUTION:  Clean-up orphaned registry entries via the following command.

ct rmview -force -all -uuid <view_uuid>

REFERENCE:  TechNote 329893 – How to remove a VOB or View from the ClearCase registry whose storage was deleted
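A cleanup sweep can be scripted along the lines below.  The stranded-entry text that ‘rgy_check’ emits varies by release, so the sample output and the ‘sed’ pattern here are assumptions to verify against your own output first; the loop deliberately prints the ‘rmview’ commands instead of executing them:

```shell
#!/bin/sh
# Dry-run sketch of an orphaned-view cleanup sweep.
# The sample rgy_check output below is illustrative only; confirm the real
# stranded-entry format on your release before relying on the sed pattern.
sample_output() {
    cat <<'EOF'
rgy_check: Error: This view storage directory is stranded (no tags):
        -uuid 9c37d8a2.0e1f11ee.9d2e.00:50:56:9a:1b:2c
EOF
}
# In production, replace sample_output with: {cc install}/etc/utils/rgy_check -views
sample_output |
  sed -n 's/.*-uuid[ =]*\([0-9a-f.:]*\).*/\1/p' | sort -u |
  while read -r uuid; do
      echo "cleartool rmview -force -all -uuid $uuid"
  done
```

Review the printed commands before running them; ‘rmview -all -uuid’ removes the view’s records from all VOBs and cannot be undone.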

9)    Large Number of config_spec Rules – View config_spec sizes were reaching 200-300 lines, which can prolong rule evaluation when selecting the specified branches and versions of files and directories.

SOLUTION:  Excluding comments and blank lines, the effective size of the customer’s largest config_spec template was 46 lines.  Note that the ClearCase product engineering team builds ClearCase itself with config_specs of over 200 rules, with great success.

BE AWARE OF:

- At the extreme, avoid edge cases like 40,000-line config_specs.  IBM Tech Support had a customer who specified an exact version of EVERY element in a build view.

- When a file/directory is accessed through the MVFS, ClearCase looks in the MVFS cache first.  If the entry is not there, it looks in the view cache.  If it is not there either, the view must evaluate the config spec.

- Config_spec evaluation is sequential; whether that matters is a classic "that depends" case.  Complex config specs can be improved if the most frequently used rules are near the top of the config spec.  You can also improve performance by using name-scoped rules (e.g., “element /vobs/component1/… …/component1_mybranch/LATEST” instead of “element * …/component1_mybranch/LATEST”).  Use what is needed to create the appropriate configuration, then look for ways to tune it.

- Some events cause the view cache to be flushed, slowing config_spec evaluation until the cache is repopulated:

  - cleartool setcs

  - cleartool edcs

  - cleartool startview (rarely)
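A minimal sketch combining these two tips (the VOB and branch names are hypothetical): the narrow, frequently hit rules come first, and each is name-scoped to its component rather than using ‘element *’:

```
element * CHECKEDOUT
# hot, narrow, name-scoped rules first
element /vobs/component1/... .../component1_mybranch/LATEST
element /vobs/component1/... /main/LATEST -mkbranch component1_mybranch
# broad fallback last
element * /main/LATEST
```

Name-scoping means the expensive branch rules are only evaluated for paths under the component, while everything else falls through to the final rule quickly.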

10) Administrative VOBs – Across the 153 VOBs in the customer’s ClearCase environment there were a number of VOBs that were configured as administrative VOBs but were unused, had broken links in the VOB hierarchy, and were not leveraged for global definitions.  Managing product lines with multiple customers for each product prevented using ADMIN VOBs for their intended purpose.

SOLUTION:  Clean up each unused administrative VOB hierarchy by removing its ‘AdminVOB’ hyperlinks and then removing all of the ‘GlobalDefinition’ hyperlinks that support the global types in it.  Refer to the following for details on removing these hyperlinks:

IBM DevOps Code ClearCase v11.0.0 – Removing the AdminVOB hyperlink 

IBM DevOps Code ClearCase v11.0.0 – Removing all GlobalDefinition hyperlinks

11) MVFS and Build View Caches – The customer requested that MVFS and build view caches be examined to determine if they can be tuned to improve build times.

SOLUTION: 

¨ View Caches – Site-wide Default Size

GUIDELINES:

1)    The value cannot be smaller than 512 KB.

2)    Do not specify a value larger than the amount of physical memory on the server host that you want to dedicate to this view.

3)    Larger cache sizes generally improve view performance.

4)    Verify your changes by checking the hit rates and utilization percentages periodically to see whether they have improved.

CUSTOMER SETTING:

‘ct getcache -view -site’ reports 12.5 MB (the default setting is 4 MB).

VERIFICATION: 

View cache statistics gathered on a busy build server showed a ‘Lookup’ hit rate of 99%, which is outstanding!

¨ MVFS Cache

GUIDELINES:

For UNIX/Linux, the scaling factor can be up to a value of 24, which is the recommended maximum. You can manually adjust the value to be higher. However, when considering setting the scaling factor higher than 24, be aware that a larger cache might not improve performance on all hosts. The cost of managing that cache (particularly the locking cost on multiprocessor hosts) can exceed the benefits.

CURRENT SETTING:

‘ct getcache -mvfs’ reports a scaling factor of 64.

Note that this value is higher than the recommended maximum; however, previous tuning efforts on this hardware (22 cores / 88 CPUs and 768 GB of memory) determined that 64 was the optimal scaling factor.

VERIFICATION: 

We did not find the pattern of full caches and low hit rates that would identify a cache likely to benefit from an increased size.

MVFS cache information gathered with ‘mvfsstat -cl’ on a busy build server showed a hit rate of 99%, which is outstanding!  Hit rates at or above 90% provide good performance.

The ‘ct getcache -mvfs’ run we used to get the current MVFS cache settings provided further confirmation: the miss rate for parallel builds was only 0.74%.

12) Unnecessarily Imposing View Server Load – One of the customer’s Jenkins jobs uses the ‘catcs’ command to obtain product details, needlessly starting thousands of ‘view_server’ processes that degrade view server host performance.  This load is especially heavy when the job iterates over many views as part of periodic scripted view maintenance, since each view it touches is then subject to ‘reloading view cache’, adding even more stress on the view server.

SOLUTION:  Discontinue the use of ‘catcs’ in the Jenkins job.  A better approach is to have the associated script DIRECTLY read the plain-text config_spec file stored in the view storage directory (.vws).  Be aware that an in-use config spec exists in both text form (a file called ‘config_spec’) and compiled form (‘.compiled_spec’).

This change poses no risk to the integrity of a view, as it is a read-only action on an existing text file, and it eliminates the Jenkins job’s side effect of waking up thousands of views and the performance hit that goes with it.
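A sketch of the direct-read approach follows.  The storage path is hypothetical, and a demo ‘.vws’ directory is created locally so the helper can be exercised on any host:

```shell
#!/bin/sh
# Sketch: read a view's config spec straight from its storage directory,
# avoiding 'catcs' and the view_server start it triggers.
read_cspec() {
    # $1 = view storage directory (.vws). The text copy lives alongside the
    # compiled form (.compiled_spec), which should be left alone.
    cat "$1/config_spec"
}

# Demo storage directory so the helper can be tried anywhere; in the real
# job this would be the registered view storage path.
demo=/tmp/demo_view.vws
mkdir -p "$demo"
printf 'element * CHECKEDOUT\nelement * /main/LATEST\n' > "$demo/config_spec"
read_cspec "$demo"
```

The Jenkins job would call ‘read_cspec’ with each view’s registered storage path instead of shelling out to ‘cleartool catcs’.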

13) Embedded Release Details – The customer embeds release information into its software in order to audit software deployed to the field.  As implemented, source files with embedded keywords cause merge conflicts on any line containing the keyword.  To work around this, they have developed custom wrapper scripts for diffs/merges that make temporary copies of these files and strip the keyword expansion before comparing or merging.  This approach to embedding release information, and the custom scripts that automate it, actually increase cycle times and impose unnecessary end-user wait times, extra steps, and complexity.

SOLUTION:  ClearCase product engineering uses '.h' header files that include the major release info (e.g., 11.0.0.03):

#define TBS_IDCC_RELEASE_GROUP "11"
#define TBS_IDCC_RELEASE_MAJOR "0"
#define TBS_IDCC_RELEASE_MINOR "0"
#define TBS_IDCC_RELEASE_BUILD "03"

#define TBS_IDCC_FCC_PRODUCT_ID  TBS_IDCC_RELEASE_GROUP "." TBS_IDCC_RELEASE_MAJOR "." \
                                 TBS_IDCC_RELEASE_MINOR "." TBS_IDCC_RELEASE_BUILD

Part of this release process is to update the numbers when targeting a particular release.  The release information is built into the public release of the product. 

Additionally, a set of runtime routines can take that version and return it in different forms.  The product team uses a build rule that passes the build date/time to the compiler in a ‘-D’ argument so that callers that want it (e.g., ‘cleartool -ver’) can display it.

This is a common approach that many of our customers have implemented for C and C++ source code. 

14) Dynamic Symbolic Links – Dynamically generating their ‘symlink farm’ and makefiles for every build takes significant time and adds complexity and overhead.  On average, this stage of a build takes about 20 minutes to complete.

SOLUTION:  Modify the build process to avoid dynamically creating the symlink farm and makefiles for each and every build. 

IBM Technical Support recommends a static solution: 

- Create a permanent ‘symlink farm’ for all the header (.h) files once

- Check them into ClearCase

- Re-use them for each build

- Consider this approach for makefiles, as well
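The one-time setup can be sketched as follows.  Real source paths would be VOB directories, but a throwaway tree is used here so the loop is runnable anywhere; checking the resulting links into ClearCase (e.g., with ‘cleartool mkelem’) is left as the manual step it would be in practice:

```shell
#!/bin/sh
# Sketch of the one-time symlink-farm setup. Real source paths would be VOB
# directories; a throwaway tree is used here so the loop is runnable anywhere.
SRC=/tmp/demo_src
FARM=/tmp/demo_farm
mkdir -p "$SRC" "$FARM"
printf '#define DEMO 1\n' > "$SRC/demo.h"

# Link every header once; after the links are checked into ClearCase,
# each subsequent build reuses them instead of regenerating the farm.
for h in "$SRC"/*.h; do
    ln -sf "$h" "$FARM/$(basename "$h")"
done
ls -l "$FARM"
```

Since the farm only changes when headers are added or removed, regenerating it per build (about 20 minutes here) is replaced by a one-time cost plus occasional maintenance.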

Results

The solutions above are primarily simple adjustments and clean-up items that were missed during previous maintenance cycles.  The exception is converting the “hard iron” build server into multiple virtualized build servers to eliminate the MVFS performance degradation inherent in the current configuration and to increase build performance.

One of the primary Grafana performance indicators we reviewed no longer exhibits the spiky behavior seen before the last round of process and configuration changes.  The one remaining spike has a known source: a Jenkins job that cleans up views overnight, showing up around 2-3 am.  That spike disappears about 20 minutes later as the views that were “woken up” go back to sleep and their view_server processes self-terminate.  All other Grafana performance indicators show a stable environment with few, if any, performance issues.

While these solutions provided this customer with incremental performance improvements, please note that every client’s environment differs in configuration, operational behavior, and workload, so no specific performance improvement can be guaranteed.

“Thank you for coming out and spending the time onsite and giving us key solutions to work on so we can improve our environment and make our end user experience better.  Most importantly, you have been proactive on collaborating on our IBM Tech Support Cases positioning yourself as a trusted advisor, prepared and ready to assist us as future needs arise ... we greatly appreciate you!”

-- Product Owner

Brad Poulliot

Senior Software Architect
HCL Software DevOps Services 

Solution Component:  IBM DevOps Automation IBM DevOps Code ClearCase 
