I was wondering if you could provide some guidance on setting up NC to run multi-threaded and multi-CPU apps.
For multi-threaded jobs, one way I can think of is to control the 'CPUS' resource. For example, a single-threaded job would request -r+ CPUS/1 and a 4-threaded job -r+ CPUS/4. Would this work, and are there any pitfalls? How can I default all jobs to request 1 CPU (i.e. decrement the CPU count)?
thanks in advance
max
Question: what does it mean when I receive a message like:
vovsh May 31 09:19:00 startup: Too many clients in the system
when trying to run NetworkComputer commands?
Answer:
This message means that the vovserver running NC has run out of file descriptors.
It is a serious error. If you cannot determine the cause of the excess clients and reduce their number, you may need to stop and restart the vovserver to recover. In any case, you should investigate the cause and increase the number of descriptors.
You can get an idea of what the limit is by getting a shell on the NC server host as the owner of NC and using the 'limit' command.
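On Linux, the limit that matters is the one the running vovserver inherited when it was started, which can differ from what a fresh shell reports. The following sh/bash sketch reads both the number of open descriptors and the soft limit of a running process from /proc. It assumes a Linux host with /proc mounted and a user allowed to read the process entry (for example, the NC owner); proc_fd_usage is a made-up helper name, not an NC command.

```shell
#!/bin/sh
# Sketch (Linux only): report open descriptors and the soft limit of a process.
# proc_fd_usage is a hypothetical helper name, not an NC command.
proc_fd_usage() {
    pid=$1
    if [ ! -d "/proc/$pid/fd" ]; then
        echo "cannot read /proc/$pid/fd" >&2
        return 1
    fi
    # Each entry in /proc/<pid>/fd is one open descriptor.
    open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
    # /proc/<pid>/limits line: "Max open files  <soft>  <hard>  files"
    soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
    echo "open=$open soft=$soft"
}

# Example: inspect the current shell; substitute the vovserver PID in practice.
proc_fd_usage $$ || echo "not available on this host"
```

If the open count is close to the soft limit, the server is near descriptor exhaustion regardless of what a new login shell reports.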
If you can get the server to respond, check the clients page:
http://nc-host:nc-port/server?page=clients
to try to see what the clients are.
This page may take a while to render if there are lots of clients, so be patient.
This condition usually happens because jobs are being submitted incorrectly. For example, giving the -wl option to pipe the log back to the submitter's terminal and putting the job in the background permits many jobs to run concurrently, each using extra descriptors. See the background information at the end of this article.
How are your users submitting jobs? Has there been a recent change?
The vovserver has limits on the number of notify clients per user to defend against accidental or deliberate occurrences of file-descriptor exhaustion. There should have been an alert before you hit this limit, at 85% of the server capacity, which is also shown on the Admin page. The server reserves a couple of descriptors for access from the local machine, so you may be able to get it to respond from there even when remote access via vovsh is not possible.
You may be able to get the server to be responsive by asking users to shut down all unneeded NC GUIs and other clients. If not, you will need to do an emergency stop and restart of the vovserver with more descriptors available.
The way to do this is to get a shell on the vovserver host as the owner of NC, change to the parent directory of vnc.swd, and then change the environment variable VOV_HOST_NAME from the network hostname to 'localhost', so that vovsh will use the loopback interface. The vovsh connection will then use one of the reserved descriptors.
The detailed steps (for csh/tcsh) are:
% ssh nc-owner@vovserver-host
% /bin/su -
(supply root passwd)
# cd /etc/security
# vi limits.conf
(change the limits to something like:
* hard nofile 8192
* soft nofile 8192
write and quit the file)
# exit
Get a new shell as the NC owner, so that the new limits are in effect, and raise the file-descriptors limit as high as possible:
% unlimit descriptors
% limit
(verify that the descriptors limit has been raised)
Stop and re-start the vovserver:
% cd $VOVDIR/../../vnc
% ves vnc.swd/setup.tcl
(enter the NC environment)
% setenv VOV_HOST_NAME localhost
% vovsh -Y 'vtk_generic_get project P; puts $P(pid)'
(shows the process ID of the vovserver; you can also use the ps command, but the above also shows whether the vovserver responds. If it does not respond, the following commands will be ineffective.)
If the server responds with the process-id, then shut it down by:
% vovsh -Y 'vtk_server_config suddenshutdown server-process-id'
(this will tell the vovslaves to wait for a new server to be started
and is at the heart of what 'ncmgr stop -freeze' does)
Then, restart the server with the new limit of descriptors in effect:
% vovserver -jsb $VOV_PROJECT_NAME
If the server does not respond to the vovsh commands, your only option is to kill it using OS commands before restarting it. Remember to increase the limits and run 'unlimit descriptors' before killing the server with 'kill -9' and restarting it as above.
Also, for your setup, check the values of:
set config(maxNormalClients)
set config(maxNotifyClients)
in the policy.tcl file in your NC server's vnc.swd directory.
For docs on the server configuration parameters, please see:
http://nc-host:nc-port/doc/FTadmin/srvconfig.html
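To look at the two settings without opening an editor, something like the following sh/bash sketch can be used. It assumes the settings appear in policy.tcl as plain 'set config(...) value' lines. The helper name show_client_limits and the sample values 400 and 40 are illustrative only, not defaults or recommendations.

```shell
#!/bin/sh
# Sketch: list the client-limit settings found in a policy.tcl file.
# show_client_limits is a hypothetical helper name.
show_client_limits() {
    grep -E '^[[:space:]]*set[[:space:]]+config\((maxNormalClients|maxNotifyClients)\)' "$1"
}

# Demonstration on a throwaway file; the values below are made up.
demo=$(mktemp)
cat > "$demo" <<'EOF'
# set config(maxNormalClients) 100   (commented out, so it is ignored)
set config(maxNormalClients) 400
set config(maxNotifyClients) 40
EOF
show_client_limits "$demo"
rm -f "$demo"
```

Against a real installation, the call would be along the lines of show_client_limits vnc.swd/policy.tcl, run from the parent directory of vnc.swd.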
The maxNormalClients and maxNotifyClients values need to be consistent with the number of file descriptors available to the server process. You can check the descriptor limits with:
% limit -h descriptors   # csh/tcsh
$ ulimit -a              # sh/bash
You may want to try to raise the number of file descriptors available to the vovserver. On Linux, the limit is enforced by PAM, and you can edit /etc/security/limits.conf to raise it. It is NOT necessary to reboot the machine. Once the limit has been raised, newly-created processes will have the higher limit, as described above.
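For sh/bash users, the counterpart of the csh 'unlimit descriptors' step is to raise the soft limit to the hard limit in the shell that will start the vovserver. A minimal sketch follows; raise_fd_limit is a made-up helper name, and it can only raise the soft limit as far as the hard limit that is already configured.

```shell
#!/bin/sh
# Sketch: raise this shell's soft descriptor limit to its hard limit,
# the sh/bash counterpart of csh's "unlimit descriptors".
# raise_fd_limit is a hypothetical helper name.
raise_fd_limit() {
    hard=$(ulimit -Hn)
    ulimit -Sn "$hard" || return 1
    echo "soft=$(ulimit -Sn) hard=$hard"
}

# Must be called in the current shell (not in a subshell) to take effect.
raise_fd_limit
```

Processes started from this shell afterwards inherit the raised soft limit; processes that are already running keep the limit they started with.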
Background Information
The vovserver that runs NetworkComputer communicates with its clients via TCP/IP sockets. The clients are the vovslaves, the NC GUI and vovconsoles, and other jobs that use the server's event stream, including jobs using 'nc wait', or those submitted with the -wl option.
While active, each client takes one or more file descriptors on the server host for the socket connections. No matter how high the descriptor limit is set, it could eventually be reached. Hence, the vovserver implements some protection against accidental or deliberate denial-of-service attacks caused by file-descriptor exhaustion. There are limits on the number of normal and notify clients per user, controlled by the policy.tcl settings described above. The server will post alerts when reaching 85% of its capacity.
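As a rough worked example of the 85% alert level: if the server's capacity were equal to a descriptor limit of 8192, the alert would be expected at about 6963. The sh/bash sketch below does that arithmetic. Treating the capacity as equal to the descriptor limit, and rounding down, are assumptions made for illustration; alert_threshold is a made-up helper name.

```shell
#!/bin/sh
# Sketch: 85% of a given capacity, using integer arithmetic (rounds down).
# alert_threshold is a hypothetical helper name.
alert_threshold() {
    echo $(( $1 * 85 / 100 ))
}

alert_threshold 8192    # prints 6963
alert_threshold 1024    # prints 870
```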
Advice:
Hi,
I would like to see some sort of capability to post an announcement on the CGI page. Preferably, it would be something that pops up and needs to be acknowledged upon launch or reload. Our most likely application of this would be to announce upcoming maintenance, license outages, and other conditions that would be useful to communicate to the user community.
Announcements in the comments section at the bottom of "Project Home" are insufficient: they are easy to ignore because users are used to seeing the same thing. I am also looking for something more targeted than an email broadcast, which is often ignored as well.
Thanks,
--Jon