Background:
More than 500 users (developers/testers) are registered in RTC, with about 200 concurrent users (judging by license consumption). Almost every evening, the RTC server responds slowly. Sometimes it looks as if it has stopped (it has not, but users think so). We have increased server resources (Linux, DB2, CPU=16/Memory=48GB/Network Bandwidth 75Mbps), but it is still not enough. The RTC work item / SCM / build features are all in use.
Examples:
1. Last year, we got a hint from RCS that a growing number of "Pending Change sets" causes high server CPU load because of the "compare workspace" work done by the RTC server: with 100 pending change sets and 200 developers' repository workspaces, RTC performs 100x200 calculations every 15 min(?). -> I made a rule to create baselines (BL) frequently and announced that users should accept them.
2. We wrote an operation guide telling users not to load all component resources from the server. One person in each group loads all resources, burns them to a DVD, and shares it with the others, who then re-sync their repository workspaces with the RTC server. This matters especially in China, where the network environment is poor.
3. We use the build feature to automate EAR creation and JUNIT/JTEST runs. These run on a separate server so that not every user has to do them on their own PC.
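To put the numbers in example 1 in perspective, here is a rough sketch of the arithmetic in Python. It assumes the load really scales as pending change sets times repository workspaces, and that the 15-minute interval (which I am not sure about) is correct. The function name is only for illustration and is not part of RTC.

```python
def comparisons_per_hour(pending_change_sets, workspaces, interval_minutes=15):
    """Estimate compare operations per hour.

    Assumes one comparison per (pending change set, workspace) pair on
    every refresh cycle. The 15-minute default is the unconfirmed figure
    from the hint, not a documented RTC setting.
    """
    per_cycle = pending_change_sets * workspaces
    cycles_per_hour = 60 / interval_minutes
    return per_cycle * cycles_per_hour


# 100 pending change sets, 200 workspaces: 100 * 200 * 4 cycles
print(comparisons_per_hour(100, 200))
# Same workspaces after baselines are accepted and only 10 change sets remain
print(comparisons_per_hour(10, 200))
```

If that model holds, cutting pending change sets from 100 to 10 cuts this work by a factor of ten, which is why I made the baseline rule.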
Question:
Rather than sizing information, server performance tuning, or configuration (such as load balancing), I'm more interested in "wise use". Which RTC operations, multiplied by the number of users, actually load server resources? What should be avoided? How can I find the cause of slow server response? Which user operations should I look at? Is it caused by a bad operating rule, or is it the limit of server/network resources? What can I report to my boss as a reasonable cause? Does anyone have ideas for rules I can make to avoid slow server performance in big projects? All input is very welcome.