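TWS Performance Tuning
In a recent TWS test, with a large number of jobs (several hundred thousand) and job streams (over a hundred thousand) submitted at the same time, TWS efficiency dropped noticeably. The documentation only says that tuning is needed in several areas once there are more than 40000 job objects, and it gives no definite capacity limit. Below are the performance tuning parameters involved, extracted from the Admin Guide: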
@Network traffic
Optimizing network performance: see Admin Guide page 141.
1. Critical business activities must be as close as possible to the master domain manager
2. The domain manager must be installed on as powerful a workstation as possible
3. A similarly powerful backup domain manager must be included in the network
4. The network link between the domain manager and its backup must be as fast as possible to pass all the updates received from the subtree
5. If intervention is needed directly on the domain, either give shell access to the operators to use the Tivoli Workload Scheduler command line, or install a connector so that the Tivoli Dynamic Workload Console and the Job Scheduling Console can be used.
Downstream node capacity:
Maximum downstream connections: 20 for Solaris, 50 for Windows, 100 for other UNIX workstations.
Typical downstream connections: 10 for Solaris, 15 for Windows, 20 for other UNIX workstations.
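The downstream connection figures above can be turned into a quick planning check. The following is only a sketch: the function name, the platform keys, and the three result labels are made up for illustration, and only the numbers come from the notes above.

```python
# Downstream-connection figures copied from the notes above.
# The platform keys and labels are illustrative, not TWS terminology.
LIMITS = {
    "solaris": {"typical": 10, "maximum": 20},
    "windows": {"typical": 15, "maximum": 50},
    "unix": {"typical": 20, "maximum": 100},  # other UNIX workstations
}


def check_downstream(platform, planned):
    """Classify a planned number of downstream connections for one platform."""
    limits = LIMITS[platform.lower()]
    if planned > limits["maximum"]:
        return "over maximum"
    if planned > limits["typical"]:
        return "above typical"
    return "ok"


if __name__ == "__main__":
    # Hypothetical planning figures, one per platform.
    for platform, planned in [("solaris", 8), ("windows", 30), ("unix", 120)]:
        print(platform, planned, check_downstream(platform, planned))
```

A result of "above typical" is not necessarily a problem; it only means the domain is beyond the typical figure and still within the stated maximum.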
@Tracing
Log file metrics: see Admin Guide page 193
@Symphony file size
The most important thing is to make sure the file system has enough space for the Symphony file. There is a way to estimate the Symphony file size.
See Admin Guide page 192
@UNIX MDM handling many FTAs
Some kernel parameters may need to be changed for better performance. Admin Guide page 266 has a reference for HP-UX tuning.
@Planning space for queues
Admin Guide page 143 has a table of all the queues and their purposes.
Use "evtsize -show" to show the length of a queue, and evtsize to change the maximum queue length
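Once the current and maximum size of a queue are known (for example, read by hand from the evtsize output), a simple ratio shows how close the queue is to its limit. This is only a sketch: the function name and the 80% warning threshold are assumptions, and the byte counts in the example are invented. It does not call evtsize or parse its output.

```python
def queue_usage(current_bytes, max_bytes, warn_ratio=0.8):
    """Return (fill ratio, warning flag) for one message queue.

    warn_ratio is an arbitrary illustrative threshold, not a TWS setting.
    """
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    ratio = current_bytes / max_bytes
    return ratio, ratio >= warn_ratio


if __name__ == "__main__":
    # Hypothetical figures, not taken from a real system.
    ratio, warn = queue_usage(9_000_000, 10_000_000)
    print(f"queue is {ratio:.0%} full, warn={warn}")
```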
@Mailman server
See Admin Guide page 148
@Improving job processing performance
:Use optman to change the configuration, as described in User Guide page 106
@Mailbox caching – default is yes, parameter: mm cache mailbox
:In localopts
With caching: reduces I/O, but may introduce consistency issues
Without caching: messages are handled closer to real time, but at the cost of more I/O
More details in Admin Guide page 269
@Sync level parameter – only affects UNIX environments
:In localopts
synch level = high
Each write operation on the event files is immediately physically written to
disk. This has a heavy impact on performance caused by the high I/O
dependency.
synch level = medium
Each write event is considered as a single operation. For example, while
TWS_write_event_lock contains only one action, TWS_write_event_update
comprises five actions. With synch level at medium, the five actions in this
write event would be completed in one physical disk access, thus
drastically reducing the I/O overhead.
synch level = low (default)
The operating system decides how and when to synchronize the data to
disk. The impact of this option is more difficult to assess, because the rules
are different for each operating system and file system.
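The two localopts settings discussed above can be checked with a small script. This is only a sketch with several assumptions: the file is treated as "name = value" lines with "#" comments, the option names are spelled as in these notes (check the exact spelling in your own localopts), the defaults are the ones stated above, and the sample text is a made-up excerpt.

```python
def parse_localopts(text):
    """Parse 'name = value' lines, ignoring blank lines and '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Normalise case and inner whitespace in the option name.
        options[" ".join(name.lower().split())] = value.strip()
    return options


def io_notes(options):
    """List settings that the notes above associate with heavier disk I/O."""
    notes = []
    # Defaults as stated in the notes: caching on, synch level low.
    if options.get("mm cache mailbox", "yes").lower() == "no":
        notes.append("mailbox caching is off: expect more disk I/O")
    if options.get("synch level", "low").lower() == "high":
        notes.append("synch level is high: each event write goes straight to disk")
    return notes


SAMPLE = """
# hypothetical excerpt, not a complete localopts file
mm cache mailbox = yes
synch level = high
"""

if __name__ == "__main__":
    for note in io_notes(parse_localopts(SAMPLE)):
        print(note)
```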
@FTA Switch manager
The fault-tolerant switch manager is enabled by setting the enSwFaultTol global option to yes. When it is set, the master domain manager
distributes messages to all fault-tolerant agents with FullStatus set to yes. Enabling this feature will impact:
1. Network traffic
2. Disk space
@Scalability with a large number of job scheduling objects
1. Impact on JnextPlan: JnextPlan needs to handle a large number of scheduling objects. When jobs exceed 40000:
JVM: it is better to extend the JVM heap size to 1024M (default is 512M)
DB2: the default DB2 configuration can only handle 180000 instances; beyond this, the DB2 log capacity needs to be extended
2. Impact on reporting:
JVM: if a report contains more than 70000 objects, the JVM heap size needs to be extended to 1024M
3. Impact on event rule deployment:
JVM: if there are more than 8000 event rules, the heap size needs to be extended to 1024M
JVM change: change server.xml of WAS and update the generic JVM parameters
DB2 change: Admin Guide page 274 describes how to determine the size needed and how to change the log capacity
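The thresholds listed above can be collected into one checklist function. This is only a sketch: the function and parameter names are invented for illustration, and only the threshold values and the recommended actions come from the notes above.

```python
def tuning_actions(jobs=0, plan_instances=0, report_objects=0, event_rules=0):
    """Return the tuning steps suggested by the thresholds in these notes."""
    actions = []
    if jobs > 40000:
        actions.append("JnextPlan: extend JVM heap from 512M to 1024M")
    if plan_instances > 180000:
        actions.append("JnextPlan: extend DB2 log capacity")
    if report_objects > 70000:
        actions.append("Reporting: extend JVM heap to 1024M")
    if event_rules > 8000:
        actions.append("Event rules: extend JVM heap to 1024M")
    return actions


if __name__ == "__main__":
    # Hypothetical workload, roughly the scale of the test described at the top.
    for action in tuning_actions(jobs=300000, plan_instances=300000):
        print(action)
```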