Troubleshooting Hipchat Data Center
Before you let users into your Hipchat Data Center deployment, you should make sure everything is configured and working correctly. This page covers the monitoring, alerting, and logging tools that we hope will help you verify the deployment and catch any errors before you open it to users.
System monitoring and alerts
You can configure a recipient email address for system alerts from the Hipchat Data Center admin UI.
An alert email is sent when any of the following conditions are met:
System Utilization
- Memory utilization over 98% for three cycles
- Swap file utilization over 10% for three cycles
- CPU (user) utilization over 95% for three cycles
- CPU (system) utilization over 95% for three cycles
- CPU (wait) utilization over 99% for three cycles
Services
- 'gearman' over 30% CPU utilization
- 'nginx' over 20% CPU utilization for five cycles
- 'ntpd' becomes unavailable
- 'php5' restarts three times within five cycles
- 'punjab' unavailable for three cycles, or over 45% CPU for three cycles
- 'rsyslog' over 75% CPU for three cycles
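If you want to spot-check these figures manually on a node, standard Linux tools report the same memory, swap, and CPU utilization the alerts are based on. This is only a quick sanity-check sketch; the monitoring agent and its cycle length are part of the deployment itself:

```
# Memory and swap utilization (used vs. total, in MB)
free -m

# CPU utilization: user (us), system (sy), and I/O wait (wa) columns,
# sampled three times at five-second intervals
vmstat 5 3

# Per-process CPU usage for some of the monitored services
# (process names may differ slightly on your nodes)
ps -C nginx,ntpd -o pid,pcpu,comm
```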
Set up SNMP monitoring
Hipchat Data Center implements SNMP v2c using standard Ubuntu MIBs that can be enabled at the command line.
- To turn SNMP on or off:
hipchat service -n "on" or hipchat service -n "off"
- To set up the community string:
hipchat service -c <communitystring>
Example: hipchat service -c public
- To add a TRAP recipient server list:
hipchat service -t trap.server.com
Example: hipchat service -t snmp1.example.com,snmp2.example.com
Note: Add \ prior to a special character, as in dollar\$ign.
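Once SNMP is enabled, you can verify that a node responds from your monitoring host using the standard net-snmp tools. This is only a sketch: the hostname below is a placeholder, and the community string assumes the public example above:

```
# Walk the standard system subtree (sysDescr, sysUpTime, etc.) over SNMP v2c;
# replace the hostname and community string with your own values
snmpwalk -v2c -c public hipchat-node1.example.com 1.3.6.1.2.1.1

# Or fetch just the system description (sysDescr.0)
snmpget -v2c -c public hipchat-node1.example.com 1.3.6.1.2.1.1.1.0
```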
Troubleshooting and logs
Log files are available in the /var/log/ directory of each node. The Hipchat service logs can be found inside /var/log/hipchat/.

Once per day, the log files from each node are copied to the /file_store/shared/logs subdirectory of your network-attached storage volume. They follow a /YYYYMMDD/machineid/log-files naming convention.
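For example, to see which daily archives are available on the shared volume (the date and machine ID below are placeholders):

```
# List the daily archive directories on the shared storage volume
ls /file_store/shared/logs/

# List the logs copied by a specific node on a specific day
ls /file_store/shared/logs/20180501/<machine-id>/
```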
Configuration management is handled by chef-solo, which runs at boot, during upgrades, and during service restarts. You can find the chef-solo log file at /var/log/chef.log.
To retrieve all your logs, run hipchat log -r on each node. This copies the logs to the /file_store/shared/logs folder, which you can then compress and include with your support request.
If you need to open a Support request, make sure you download and attach your logs if possible. This helps us speed up the troubleshooting process.
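A minimal sequence for gathering logs for a support request might look like the following; the archive name is only an example:

```
# On each node: copy that node's logs to the shared volume
hipchat log -r

# Then, wherever the shared volume is mounted: compress the collected
# logs into a single archive to attach to your support request
tar -czf hipchat-logs-$(date +%Y%m%d).tar.gz -C /file_store/shared logs
```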
Log commands
Command | Purpose | Notes |
---|---|---|
hipchat log --rotate | Force a log rotation | This will force all logs to conform to the log rotation configuration specified in /etc/logrotate.conf and /etc/logrotate.d |
hipchat log --purge | Truncates the contents of all logs in /var/log | Be sure to back up any logs required for troubleshooting before executing this command. |
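For example, a simple way to keep a copy of the current logs before purging them (the backup path is only an example) is:

```
# Keep a copy of the current logs before truncating them
sudo cp -a /var/log /tmp/var-log-backup-$(date +%Y%m%d)

# Then truncate all logs in /var/log
hipchat log --purge
```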
Log file reference
Resource | Purpose | Notes |
---|---|---|
/var/log/chef.log | chef runs for installing/updating/configuring | Logging starts from first boot. Most system configuration changes will trigger a chef run. |
/var/log/hipchat/nginx.log | nginx logs AND coral logs | Includes nginx-access entries alongside coral entries. nginx.err.log only logs ERROR and above. Any entries in nginx.err.log are indicative of a problem. |
/var/log/hipchat/kern.log | Ubuntu kernel logging | |
/var/log/schema_upgrade.log | Logs any schema upgrade changes that occur during upgrades | Useful for seeing upgrade history. |
/var/log/hipchat/atlassian-crowd.log | External directory (Crowd/AD/LDAP) integration and authentication | Related to user authentication and external directory synchronization. |
/var/log/hipchat/coral.log | APIv2 logs | Many services rely on coral for authentication, so this log is often referenced while tracing a problem. coral.err.log only logs ERROR and above. Any entries in coral.err.log are indicative of a problem. |
/var/log/hipchat/cron.log | Entries related to cron job schedules on the server | |
/var/log/hipchat/web.log | WebUI logging (i.e. the php-based administration) | Good starting point for any error messages or stack traces occurring in the web interface. web.err.log only logs ERROR and above. Any entries in web.err.log are indicative of a problem. |
/var/log/hipchat/update.log | Detailed output of upgrades (and errors) | Critical for troubleshooting upgrade issues, along with chef.log. |
/var/log/hipchat/tetra.log | Core chat service log | Errors here are often critical. tetra.err.log only logs ERROR and above. Any entries in tetra.err.log are indicative of a problem. |
/var/log/hipchat/hup.log | Logs when services are restarted | Helpful for troubleshooting a broken service or upgrade. The "services starting" state prevents access to the system before it is fully initialized; hup.log records the orderly startup, and the last statement should be "maintenance_mode now OFF". |
/var/log/hipchat/hcapp.log | Hipchat-specific subprocesses | Entries include the associated service name for easy parsing, such as: grep scissortail hcapp.log |
/var/log/hipchat/database.log | Redis master log (a separate Redis log exists for stats) | If this file is very large, then most likely sudo /bin/dont-blame-hipchat; chown redis /mnt is required. |
/var/log/hipchat/daemon.log | Logs for the various daemons, including monit and ntpd | Useful for observing emergency service restarts via monit. Entries include daemon names for parsing, similar to hcapp.log |
/var/log/hipchat/runtime.log | Lists server processes, disk space, server status (including CPU, memory, active user counts, etc.) | This is a great place to start for root cause analysis. |
/var/log/hipchat/mypsql.log | Output related to the connection with the external PostgreSQL database | |
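Because the *.err.log files listed above only contain entries at ERROR level and above, a quick way to check whether any service has logged a problem is to scan them together:

```
# Show the most recent error-level entries across the Hipchat service logs
tail -n 20 /var/log/hipchat/*.err.log

# Or search a specific log for a service name, as in the hcapp.log example above
grep scissortail /var/log/hipchat/hcapp.log
```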