Cloudsoft AMP - Operations Manual

This guide describes sources of information for understanding when things go wrong.

Whether you’re customizing out-of-the-box blueprints, or developing your own custom blueprints, you will inevitably have to deal with entity failure. Thankfully AMP provides plenty of information to help you locate and resolve any issues you may encounter.

Web-console Runtime Error Information

Entity Hierarchy

The AMP web-console includes a tree view of the entities within an application. Errors within the application are represented visually, showing a “fire” image on the entity.

When an error causes an entire application to be unexpectedly down, the error is generally propagated to the top-level entity - i.e. marking it as “on fire”. To find the underlying error, one should expand the entity hierarchy tree to find the specific entities that have actually failed.

Entity’s Error Status

Many entities have some common sensors (i.e. attributes) that give details of the error status:

  • service.isUp (often referred to as “service up”) is a boolean, saying whether the service is up. For many software processes, this is inferred from whether the “service.notUp.indicators” is empty. It is also possible for some entities to set this attribute directly.
  • service.notUp.indicators is a map of errors. This often gives much more information than the single service.isUp attribute. For example, there may be many health-check indicators for a component: is the root URL reachable, is the management API reporting healthy, is the process running, etc.
  • service.problems is a map of namespaced indicators of problems with a service.
  • service.state is the actual state of the service - e.g. CREATED, STARTING, RUNNING, STOPPING, STOPPED, DESTROYED and ON_FIRE.
  • service.state.expected indicates the state the service is expected to be in (and when it transitioned to that). For example, is the service expected to be starting, running, stopping, etc.

These sensor values are shown in the “sensors” tab - see below.
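
The same sensor values can also be read from a script. The sketch below assumes the REST API is served under /v1 on the console address and that the per-entity sensor endpoint is available in your AMP version. The base URL, ids and credentials are placeholders, and sensor_url is a hypothetical helper, not part of AMP.

```shell
# Hypothetical helper: build the REST URL for one sensor of one entity.
# Arguments: base URL, application id, entity id, sensor name.
sensor_url() {
  printf '%s/v1/applications/%s/entities/%s/sensors/%s\n' "$1" "$2" "$3" "$4"
}

# Example (placeholder ids and credentials; requires a running AMP server):
#   curl -s -u admin:password \
#     "$(sensor_url http://127.0.0.1:8081 NiBy0v8Q mPujYmPd service.notUp.indicators)"
```

Reading service.notUp.indicators this way can be handy when comparing several entities without clicking through the tree.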

Sensors View

The “Sensors” tab in the AMP web-console shows the attribute values of a particular entity. This gives lots of runtime information, including about the health of the entity - the set of attributes will vary between different entity types.

Sensors view in the AMP debug console.

Note that null (or not set) sensors are hidden by default. You can click on the Show/hide empty records icon (highlighted in yellow above) to see these sensors as well.

The sensors view is also tabulated. You can configure the number of sensors shown per page (at the bottom). There is also a search bar (at the top) to filter the sensors shown.

Activity View

The activity view shows the tasks executed by a given entity. The top-level tasks are the effectors (i.e. operations) invoked on that entity. This view allows one to drill into the task, to see details of errors.

Select the entity, and then click on the Activities tab.

In the table showing the tasks, each row is a link - clicking on the row will drill into the details of that task, including sub-tasks:

Task failure error in the AMP debug console.

For ssh tasks, this allows one to drill down to see the env, stdin, stdout and stderr. That is, you can see the commands executed (stdin) and environment variables (env), and the output from executing that (stdout and stderr).

For tasks that did not fail, one can still drill into the tasks to see what was done.

It’s always worth looking at the Detailed Status section as sometimes that will give you the information you need. For example, it can show the exception stack trace in the thread that was executing the task that failed.

Log Files

AMP’s logging is configurable, for the files created, the logging levels, etc.

With out-of-the-box logging, brooklyn.info.log and brooklyn.debug.log files are created. These are by default rolling log files: when the log reaches a given size, it is compressed and a new log file is started. Therefore check the timestamps of the log files to ensure you are looking in the correct file for the time of your error.

With out-of-the-box logging, info, warnings and errors are written to the brooklyn.info.log file. This gives a summary of the important actions and errors. However, it does not contain full stacktraces for errors.

To find the exception, we’ll need to look in AMP’s debug log file. By default, the debug log file is named brooklyn.debug.log. You can use your favourite tools for viewing large text files.

One possible tool is less, e.g. less brooklyn.debug.log. We can quickly find the last exception by navigating to the end of the log file (using Shift-G), then performing a reverse search by typing ?Exception and pressing Enter. Sometimes an error results in multiple exceptions being logged (e.g. first for the entity, then for the cluster, then for the app). If you know the text of the error message (e.g. copy-pasted from the Activities view of the web-console) then you can search explicitly for that text.
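
For a non-interactive equivalent of the reverse search, a minimal sketch follows. The function names are hypothetical, and the amount of context printed (5 lines before, 30 after) is an arbitrary choice. Note that the last line mentioning "Exception" may be a "Caused by" line or a stack frame rather than the first line of the trace.

```shell
# Print the line number and text of the last line mentioning "Exception".
# Argument: path to a log file (e.g. brooklyn.debug.log).
last_exception() {
  grep -n 'Exception' "$1" | tail -n 1
}

# Print the lines around that point: 5 lines before and 30 after.
show_last_exception() {
  line=$(last_exception "$1" | cut -d: -f1)
  [ -n "$line" ] || { echo "no exception found"; return 1; }
  start=$((line - 5))
  [ "$start" -ge 1 ] || start=1
  sed -n "${start},$((line + 30))p" "$1"
}
```

For example, show_last_exception brooklyn.debug.log prints the surrounding log lines in one go.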

The grep command is also extremely helpful. Useful things to grep for include:

  • The entity id (see the “summary” tab of the entity in the web-console for the id).
  • The entity type name (if there are only a small number of entities of that type).
  • The VM IP address.
  • A particular error message (e.g. copy-pasted from the Activities view of the web-console).
  • The word WARN etc, such as grep -E "WARN|ERROR" brooklyn.info.log.

Grep’ing for particular log messages is also useful. Some examples are shown below:

  • INFO: “Started application”, “Stopping application” and “Stopped application”
  • INFO: “Creating VM “
  • DEBUG: “Finished VM “
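
Several of the searches above combine naturally. As a sketch (entity_problems is a hypothetical helper name), the function below prints only the WARN and ERROR lines that mention a given entity id:

```shell
# Print WARN and ERROR lines that mention a given entity id.
# Arguments: entity id, then one or more log files.
entity_problems() {
  id=$1
  shift
  # -F treats the id as a fixed string; -h omits file names from the output.
  grep -h -F -e "$id" "$@" | grep -E 'WARN|ERROR'
}
```

For example, entity_problems e1HP2s8x brooklyn.info.log brooklyn.debug.log searches both default log files.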

This guide describes common problems encountered when deploying applications.

YAML deployment errors

The error Invalid YAML: Plan not in acceptable format: Cannot convert ... means that the text is not valid YAML. Common reasons include that the indentation is incorrect, or that there are non-matching brackets.

The error Unrecognized application blueprint format: no services defined means that the services: section is missing.

An error like Deployment plan item Service[name=<null>,description=<null>,serviceType=com.acme.Foo,characteristics=[],customAttributes={}] cannot be matched means that the given entity type (in this case com.acme.Foo) is not in the catalog or on the classpath.

An error like Illegal parameter for 'location' (aws-ec3); not resolvable: java.util.NoSuchElementException: Unknown location 'aws-ec3': either this location is not recognised or there is a problem with location resolver configuration means that the given location (in this case aws-ec3) was unknown. This means it does not match any of the named locations in brooklyn.properties, nor any of the clouds enabled in the jclouds support, nor any of the locations added dynamically through the catalog API.

VM Provisioning Failures

There are many stages at which VM provisioning can fail! An error Failure running task provisioning means there was some problem obtaining or connecting to the machine.

An error like ... Not authorized to access cloud ... usually means the wrong identity/credential was used.

An error like Unable to match required VM template constraints means that a matching image (e.g. AMI in AWS terminology) could not be found. This could be because an incorrect explicit image id was supplied, or because the match-criteria could not be satisfied using the given images available in the given cloud. The first time this error is encountered, a listing of all images in that cloud/region will be written to the debug log.

Failure to form an ssh connection to the newly provisioned VM can be reported in several different ways, depending on the nature of the error. This breaks down into failures at different points:

  • Failure to reach the ssh port (e.g. ... could not connect to any ip address port 22 on node ...).
  • Failure to do the very initial ssh login (e.g. ... Exhausted available authentication methods ...).
  • Failure to ssh using the newly created user.

There are many possible reasons for this ssh failure, which include:

  • The VM was “dead on arrival” (DOA) - sometimes a cloud will return an unusable VM. One can work around this using the machineCreateAttempts configuration option, to automatically retry with a new VM.
  • Local network restrictions. On some guest wifis, external access to port 22 is forbidden. Check by manually trying to reach port 22 on a different machine that you have access to.
  • NAT rules not set up correctly. On some clouds that have only private IPs, AMP can automatically create NAT rules to provide access to port 22. If this NAT rule creation fails for some reason, then AMP will not be able to reach the VM. If NAT rules are being created for your cloud, then check the logs for warnings or errors about the NAT rule creation.
  • ssh credentials incorrectly configured. The AMP configuration is very flexible in how ssh credentials can be configured. However, if a more advanced configuration is used incorrectly (e.g. the wrong login user, or invalid ssh keys) then this will fail.
  • Wrong login user. The initial login user to use when first logging into the new VM is inferred from the metadata provided by the cloud provider about that image. This can sometimes be incomplete, so the wrong user may be used. This can be explicitly set using the loginUser configuration option. An example of this is with some Ubuntu VMs, where the “ubuntu” user should be used. However, on some clouds it defaults to trying to ssh as “root”.
  • Bad choice of user. By default, AMP will create a user with the same name as the user running the AMP process; the choice of user name is configurable. If this user already exists on the machine, then the user setup will not behave as expected. Subsequent attempts to ssh using this user could then fail.
  • Custom credentials on the VM. Most clouds will automatically set the ssh login details (e.g. in AWS using the key-pair, or in CloudStack by auto-generating a password). However, with some custom images the VM will have hard-coded credentials that must be used. If AMP’s configuration does not match that, then it will fail.
  • Guest customisation by the cloud. On some clouds (e.g. vCloud Air), the VM can be configured to do guest customisation immediately after the VM starts. This can include changing the root password. If AMP is not configured with the expected changed password, then the VM provisioning may fail (depending on whether AMP connects before or after the password is changed!).

A very useful debug configuration is to set destroyOnFailure to false. This will allow ssh failures to be more easily investigated.

Timeout Waiting For Service-Up

A common generic error message is that there was a timeout waiting for service-up.

This just means that the entity did not get to service-up in the pre-defined time period (the default is two minutes, and can be configured using the start.timeout config key; the timer begins after the start tasks are completed).

See the overview for where to find additional information, especially the section on “Entity’s Error Status”.

A common problem when setting up an application in the cloud is getting the basic connectivity right - how do I get my service (e.g. a TCP host:port) publicly accessible over the internet?

This varies a lot - e.g. Is the VM public or in a private network? Is the service only accessible through a load balancer? Should the service be globally reachable or only to a particular CIDR?

This guide gives some general tips for debugging connectivity issues, which are applicable to a range of different service types. Choose those that are appropriate for your use-case.

VM reachable

If the VM is supposed to be accessible directly (e.g. from the public internet, or if in a private network then from a jump host)…

ping

Can you ping the VM from the machine you are trying to reach it from?

However, ping is over ICMP. If the VM is unreachable, it could be that the firewall forbids ICMP but still lets TCP traffic through.

telnet to TCP port

You can check if a given TCP port is reachable and listening using telnet <host> <port>, such as telnet www.google.com 80, which gives output like:

Trying 31.55.163.219...
Connected to www.google.com.
Escape character is '^]'.

If this is very slow to respond, it can be caused by a firewall blocking access. If it is fast, it could be that the server is just not listening on that port.
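
The slow-versus-fast distinction can be made explicit with a timeout. The sketch below assumes bash (for its /dev/tcp feature) and the coreutils timeout command are installed. check_port is a hypothetical helper and the 5 second limit is an arbitrary choice.

```shell
# Classify a TCP port as "open", "closed" (fast failure, e.g. connection
# refused) or "timeout" (no answer, often a firewall silently dropping packets).
# Arguments: host, port.
check_port() {
  host=$1
  port=$2
  timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
  case $? in
    0)   echo "open" ;;
    124) echo "timeout" ;;   # exit status used by timeout(1) when the limit is hit
    *)   echo "closed" ;;
  esac
}
```

For example, check_port www.google.com 80 would normally be expected to report "open" from a machine with internet access.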

DNS and routing

If using a hostname rather than IP, then is it resolving to a sensible IP?

Is the route to the server sensible? (e.g. one can hit problems with proxy servers in a corporate network, or ISPs returning a default result for unknown hosts).

The following commands can be useful:

  • host is a DNS lookup utility. e.g. host www.google.com.
  • dig stands for “domain information groper”. e.g. dig www.google.com.
  • traceroute prints the route that packets take to a network host. e.g. traceroute www.google.com.

Proxy settings

Depending on the type of location, AMP might use HTTP to provision machines (clocker, jclouds). If the host environment defines proxy settings, these might interfere with the reachability of the respective HTTP service.

One such case is using VirtualBox with host-only or private internal network settings, while using an external proxy for accessing the internet. It is clear that the external proxy won’t be able to route HTTP calls properly, but that might not be clear when reading the logs (although AMP will present the failing URL).

Try accessing the web-service URLs from a browser via the proxy, or perhaps try running AMP with proxy disabled:

export http_proxy=
bin/brooklyn launch

Service is listening

Service responds

Try connecting to the service from the VM itself. For example, curl http://localhost:8080 for a web-service.

On dev/test VMs, don’t be afraid to install the utilities you need such as curl, telnet, nc, etc. Cloud VMs often have a very cut-down set of packages installed. For example, execute sudo apt-get update; sudo apt-get install -y curl or sudo yum install -y curl.

Listening on port

Check that the service is listening on the port, and on the correct NIC(s).

Execute netstat -antp (or on OS X netstat -antp TCP) to list the TCP ports in use (or use -anup for UDP). You should expect to see something like the output below for a service.

Proto Recv-Q Send-Q Local Address   Foreign Address   State    PID/Program name
tcp        0      0 :::8080         :::*              LISTEN   8276/java

In this case a Java process with pid 8276 is listening on port 8080. The local address :::8080 format means all NICs (in IPv6 address format). You may also see 0.0.0.0:8080 for IPv4 format. If it says 127.0.0.1:8080 then your service will most likely not be reachable externally.
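
The interpretation of the local address column can be captured in a small helper. This is only a sketch: classify_listen_address is a hypothetical name, and the patterns cover the common formats mentioned above plus a few variants printed by other tools.

```shell
# Classify the "Local Address" column of a listening socket.
# Argument: the address as printed by netstat, e.g. ":::8080".
classify_listen_address() {
  case "$1" in
    :::*|0.0.0.0:*|\*:*|\[::\]:*) echo "all interfaces" ;;
    127.*:*|::1:*|\[::1\]:*)      echo "loopback only" ;;
    *)                            echo "specific interface" ;;
  esac
}
```

A result of "loopback only" corresponds to the 127.0.0.1:8080 case, where the service will most likely not be reachable externally.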

Use ip addr show (or the obsolete ifconfig -a) to see the network interfaces on your server.

For netstat, run with sudo to see the pid for all listed ports.

Firewalls

On Linux, check if iptables is preventing the remote connection. On Windows, check the Windows Firewall.

If it is acceptable (e.g. it is not a server in production), try turning off the firewall temporarily, and testing connectivity again. Remember to re-enable it afterwards! On CentOS, this is sudo service iptables stop. On Ubuntu, use sudo ufw disable. On Windows, press the Windows key and type ‘Windows Firewall with Advanced Security’ to open the firewall tools, then click ‘Windows Firewall Properties’ and set the firewall state to ‘Off’ in the Domain, Public and Private profiles.

If you cannot temporarily turn off the firewall, then look carefully at the firewall settings. For example, execute sudo iptables -n --list and sudo iptables -t nat -n --list.

Cloud firewalls

Some clouds offer a firewall service, where ports need to be explicitly listed to be reachable.

For example, security groups for EC2-classic (http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#ec2-classic-security-groups) have rules for the protocols and ports to be reachable from specific CIDRs.

Check these settings via the cloud provider’s web-console (or API).

Quick test of a listener port

It can be useful to start listening on a given port, and to then check if that port is reachable. This is useful for testing basic connectivity when your service is not yet running, for trying a different port to compare behaviour, or for comparing with another VM in the network.

The nc netcat tool is useful for this. For example, nc -l 0.0.0.0 8080 will listen on TCP port 8080 on all network interfaces. On another server, you can then run echo hello from client | nc <hostname> 8080. If all works well, this will send “hello from client” over the TCP port 8080, which will be written out by the nc -l process before exiting.

Similarly for UDP, use -lu (lower-case u; in common nc implementations an upper-case -U selects Unix domain sockets instead).

You may first have to install nc, e.g. with sudo yum install -y nc or sudo apt-get install netcat.

Cloud load balancers

For some use-cases, it is good practice to use the load balancer service offered by the cloud provider (e.g. ELB in AWS or the Cloudstack Load Balancer (http://docs.cloudstack.apache.org/projects/cloudstack-installation/en/latest/network_setup.html#management-server-load-balancing)).

The VMs can all be isolated within a private network, with access only through the load balancer service.

Debugging techniques here include ensuring connectivity from another jump server within the private network, and careful checking of the load-balancer configuration from the Cloud Provider’s web-console.

DNAT

Use of DNAT is appropriate for some use-cases, where a particular port on a particular VM is to be made available.

Debugging connectivity issues here is similar to the steps for a cloud load balancer. Ensure connectivity from another jump server within the private network. Carefully check the NAT rules from the Cloud Provider’s web-console.

Guest wifi

It is common for guest wifi to restrict access to only specific ports (e.g. 80 and 443, restricting ssh over port 22 etc).

Normally your best bet is then to abandon the guest wifi (e.g. to tether to a mobile phone instead).

There are some unconventional workarounds such as configuring sshd to listen on port 80 so you can use an ssh tunnel. However, the firewall may well inspect traffic so sending non-http traffic over port 80 may still fail.

There are many possible causes for an AMP server becoming slow or unresponsive. This guide describes some possible reasons, and some commands and tools that can help diagnose the problem.

Possible reasons include:

  • CPU is max’ed out
  • Memory usage is extremely high
  • SSH’ing is very slow (e.g. due to lack of entropy)
  • Out of disk space

See AMP Requirements for details of server requirements.

Machine Diagnostics

The following commands will collect OS-level diagnostics about the machine, and about the AMP process. The commands below assume use of CentOS 6.x. Minor adjustments may be required for other platforms.

OS and Machine Details

To display system information, run:

uname -a

To show details of the CPU and memory available to the machine, run:

cat /proc/cpuinfo
cat /proc/meminfo

User Limits

To display information about user limits, run the command below (while logged in as the same user who runs AMP):

ulimit -a

If AMP is run as a different user (e.g. with user name “adalovelace”), then instead run:

sudo -u adalovelace bash -c 'ulimit -a'

Of particular interest is the limit for “open files”.

Disk Space

The command below will list the disk size for each partition, including the amount used and available. If the AMP base directory, persistence directory or logging directory are close to 0% available, this can cause serious problems:

df -h
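
To spot nearly-full partitions quickly, the output can be filtered. A minimal sketch, assuming the POSIX column layout produced by df -P. full_partitions is a hypothetical helper and the 90% default threshold is an arbitrary choice.

```shell
# Print "<mount point> <capacity>" for partitions at or above a usage threshold.
# Reads `df -P` output on stdin. Optional argument: threshold in percent.
full_partitions() {
  threshold=${1:-90}
  awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)              # strip the trailing percent sign
    if (use + 0 >= t) print $6, $5
  }'
}

# usage: df -P | full_partitions 90
```

Mount points containing spaces are not handled by this sketch.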

CPU and Memory Usage

To view the CPU and memory usage of all processes, and of the machine as a whole, one can use the top command. This runs interactively, updating every few seconds. To collect the output once (e.g. to share diagnostic information in a bug report), run:

top -n 1 -b > top.txt

File and Network Usage

To count the number of open files for the AMP process (which includes open socket connections):

BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin
BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java)
lsof -p $BROOKLYN_PID | wc -l
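
To compare usage against the “open files” limit mentioned earlier, a Linux-only sketch follows. It reads /proc directly, so the count covers file descriptors only and will usually be lower than the lsof line count (which also lists memory-mapped files and similar entries). open_file_usage is a hypothetical helper name.

```shell
# Print "<open descriptors> / <soft limit>" for a process (Linux only).
# Argument: process id, e.g. the contents of $BROOKLYN_HOME/pid_java.
open_file_usage() {
  pid=$1
  open=$(ls "/proc/${pid}/fd" | wc -l | tr -d ' ')
  # Column 4 of the "Max open files" row is the soft limit.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/${pid}/limits")
  echo "${open} / ${limit}"
}

# usage: open_file_usage "$BROOKLYN_PID"
```

Run it as the user who owns the process (or with sudo), otherwise /proc/<pid>/fd may not be readable.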

To count the number of “established” internet connections, run:

netstat -an | grep ESTABLISHED | wc -l
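
A breakdown by connection state can be more informative than a single count (for example, many connections stuck in CLOSE_WAIT). A minimal sketch, assuming the state is the sixth column of the netstat -an output. connection_states is a hypothetical helper name.

```shell
# Count TCP connections per state; reads `netstat -an` output on stdin.
connection_states() {
  awk '/^tcp/ { count[$6]++ } END { for (s in count) print s, count[s] }' | sort
}

# usage: netstat -an | connection_states
```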

Linux Kernel Entropy

A lack of entropy can cause random number generation to be extremely slow. This can cause tasks like ssh to also be extremely slow. See linux kernel entropy for details of how to work around this.
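
On Linux, the kernel exposes its entropy estimate in /proc/sys/kernel/random/entropy_avail. The sketch below reads it and applies a threshold. entropy_status is a hypothetical helper, and the default threshold of 1000 is an assumed rule of thumb for older kernels such as those on CentOS 6.x; newer kernels may report a small fixed value, in which case the threshold is not meaningful.

```shell
# Report the kernel's available entropy (Linux only).
# Optional arguments: file to read (defaults to the kernel counter) and the
# threshold below which the pool is reported as low (assumed default: 1000).
entropy_status() {
  file=${1:-/proc/sys/kernel/random/entropy_avail}
  threshold=${2:-1000}
  avail=$(cat "$file")
  if [ "$avail" -lt "$threshold" ]; then
    echo "low ($avail)"
  else
    echo "ok ($avail)"
  fi
}
```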

Process Diagnostics

Thread and Memory Usage

To get memory and thread usage for the AMP (Java) process, two useful tools are jstack and jmap. These require the “development kit” to also be installed (e.g. yum install java-1.7.0-openjdk-devel). Some useful commands are shown below:

BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin
BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java)

jstack $BROOKLYN_PID
jmap -histo:live $BROOKLYN_PID
jmap -heap $BROOKLYN_PID
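
A jstack dump can be long. As a first pass, the sketch below summarises how many threads are in each state by counting the java.lang.Thread.State lines in the dump. thread_state_summary is a hypothetical helper name.

```shell
# Summarise thread states from a jstack dump read on stdin.
# Prints one "<STATE> <count>" line per state, sorted by state name.
thread_state_summary() {
  grep 'java.lang.Thread.State:' \
    | awk '{ count[$2]++ } END { for (s in count) print s, count[s] }' \
    | sort
}

# usage: jstack "$BROOKLYN_PID" | thread_state_summary
```

A steadily growing total across successive dumps may point to a thread leak.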

Runnable Threads

The jstack-active script is a convenient light-weight way to quickly see which threads of a running AMP server are attempting to consume the CPU. It filters the output of jstack, to show only the “really-runnable” threads (as opposed to those that are blocked).

BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin
BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java)

curl -O https://raw.githubusercontent.com/apache/brooklyn-dist/master/scripts/jstack-active.sh

bash jstack-active.sh $BROOKLYN_PID

Profiling

If an in-depth investigation of the CPU usage (and/or object creation) of an AMP Server is required, there are many profiling tools designed specifically for this purpose. These generally require that the process be launched in such a way that a profiler can attach, which may not be appropriate for a production server.

Remote Debugging

If the AMP Server was originally run to allow a remote debugger to connect (strongly discouraged in production!), then this provides a convenient way to investigate why AMP is being slow or unresponsive.

Log Files

Cloudsoft AMP will by default create brooklyn.info.log and brooklyn.debug.log files. See the Logging docs for more information.

The following are useful log messages to search for (e.g. using grep). Note the wording of these messages (or their very presence) may change in future versions of AMP.

Normal Logging

The lines below are commonly logged, and can be useful to search for when finding the start of a section of logging.

2016-05-30 17:05:51,458 INFO  o.a.b.l.AMPWebServer [main]: Started AMP console at http://127.0.0.1:8081/, running classpath://brooklyn.war
2016-05-30 17:06:04,098 INFO  o.a.b.c.m.h.HighAvailabilityManagerImpl [main]: Management node tF3GPvQ5 running as HA MASTER autodetected
2016-05-30 17:06:08,982 INFO  o.a.b.c.m.r.InitialFullRebindIteration [brooklyn-execmanager-rvpnFTeL-0]: Rebinding from /home/compose/compose-amp-state/brooklyn-persisted-state/data for master rvpnFTeL...
2016-05-30 17:06:11,105 INFO  o.a.b.c.m.r.RebindIteration [brooklyn-execmanager-rvpnFTeL-0]: Rebind complete (MASTER) in 2s: 19 apps, 54 entities, 50 locations, 46 policies, 704 enrichers, 0 feeds, 160 catalog items

Memory Usage

The debug log includes (every minute) a log statement about the memory usage and task activity. For example:

2016-05-27 12:20:19,395 DEBUG o.a.b.c.m.i.AMPGarbageCollector [brooklyn-gc]: AMP gc (before) - using 328 MB / 496 MB memory (5.58 kB soft); 69 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 10 active, 33 unfinished; 78 remembered, 1696906 total submitted)
2016-05-27 12:20:19,395 DEBUG o.a.b.c.m.i.AMPGarbageCollector [brooklyn-gc]: AMP gc (after) - using 328 MB / 496 MB memory (5.58 kB soft); 69 threads; storage: {datagrid={size=7, createCount=7}, refsMapSize=0, listsMapSize=0}; tasks: 10 active, 33 unfinished; 78 remembered, 1696906 total submitted)

These can be extremely useful if investigating a memory or thread leak, or to determine whether a surprisingly high number of tasks are being executed.
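
To follow memory usage over time, the figures can be pulled out of these lines. The sketch below is tied to the message wording shown above (including the MB units), which may differ between versions. gc_memory_usage is a hypothetical helper name.

```shell
# Extract "<date> <time> <used MB> <max MB>" from the "gc (after)" log lines.
# Reads log lines on stdin; lines in other formats are ignored.
gc_memory_usage() {
  sed -n 's/^\([^ ]* [^ ]*\) .*gc (after) - using \([0-9][0-9]*\) MB \/ \([0-9][0-9]*\) MB memory.*/\1 \2 \3/p'
}

# usage: gc_memory_usage < brooklyn.debug.log
```

A "used" value that keeps climbing towards the maximum across many lines is a hint of a memory leak.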

Subscriptions

One source of high CPU in AMP is when a subscription (e.g. for a policy or enricher) is being triggered many times (i.e. handling many events). A log message like that below will be logged on every 1000 events handled by a given single subscription.

2016-05-30 17:29:09,125 DEBUG o.a.b.c.m.i.LocalSubscriptionManager [brooklyn-execmanager-rvpnFTeL-8]: 1000 events for subscriber Subscription[SCFnav9g;CanopyComposeApp{id=gIeTwhU2}@gIeTwhU2:webapp.url]

If a subscription is handling a huge number of events, there are a couple of common reasons:

  • First, it could be subscribing to too much activity - e.g. a wildcard subscription for all events from all entities.
  • Second, it could be an infinite loop (e.g. where an enricher responds to a sensor-changed event by setting that same sensor, thus triggering another sensor-changed event).
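
To see which subscriptions are busiest, the log messages can be counted per subscriber. This sketch relies on the "events for subscriber" wording shown above and counts log lines rather than events. noisy_subscribers is a hypothetical helper name.

```shell
# Count "events for subscriber" log lines per subscription, busiest first.
# Reads log lines on stdin and prints "<count> <subscription>".
noisy_subscribers() {
  sed -n 's/.*events for subscriber //p' \
    | awk '{ count[$0]++ } END { for (s in count) print count[s], s }' \
    | sort -rn
}

# usage: noisy_subscribers < brooklyn.debug.log
```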

User Activity

All activity triggered by the REST API or web-console will be logged. Some examples are shown below:

2016-05-19 17:52:30,150 INFO  o.a.b.r.r.ApplicationResource [brooklyn-jetty-server-8081-qtp1058726153-17473]: Launched from YAML: name: My Example App
location: aws-ec2:us-east-1
services:
- type: org.apache.brooklyn.entity.webapp.tomcat.TomcatServer

2016-05-30 14:46:19,516 DEBUG brooklyn.REST [brooklyn-jetty-server-8081-qtp1104967201-20881]: Request Tisj14 starting: POST /v1/applications/NiBy0v8Q/entities/NiBy0v8Q/expunge from 77.70.102.66

Entity Activity

If investigating the behaviour of a particular entity (e.g. on failure), it can be very useful to grep the info and debug log for the entity’s id. For a software process, the debug log will include the stdout and stderr of all the commands executed by that entity.

It can also be very useful to search for all effector invocations, to see where the behaviour has been triggered:

2016-05-27 12:45:43,529 DEBUG o.a.b.c.m.i.EffectorUtils [brooklyn-execmanager-gvP7MuZF-14364]: Invoking effector stop on TomcatServerImpl{id=mPujYmPd}
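
A compact timeline of effector calls can be produced from such lines. The sketch below assumes the "Invoking effector <name> on <entity>" wording shown above. effector_invocations is a hypothetical helper name.

```shell
# Print "<date> <time> <effector> <entity>" for each effector invocation.
# Reads log lines on stdin; other lines are ignored.
effector_invocations() {
  sed -n 's/^\([^ ]* [^ ]*\) .*Invoking effector \([^ ]*\) on \(.*\)$/\1 \2 \3/p'
}

# usage: effector_invocations < brooklyn.debug.log
```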

If you wish to send a detailed report, then depending on the nature of the problem, consider collecting the following information.

See the AMP Slow or Unresponsive docs for details of these commands.

BROOKLYN_HOME=/home/users/brooklyn/apache-brooklyn-0.9.0-bin
BROOKLYN_PID=$(cat $BROOKLYN_HOME/pid_java)
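REPORT_DIR=/tmp/brooklyn-report
DEBUG_LOG=${BROOKLYN_HOME}/brooklyn.debug.log

# Create the report directory first; the redirections below fail if it is missing.
mkdir -p ${REPORT_DIR}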

uname -a > ${REPORT_DIR}/uname.txt
df -h > ${REPORT_DIR}/df.txt
cat /proc/cpuinfo > ${REPORT_DIR}/cpuinfo.txt
cat /proc/meminfo > ${REPORT_DIR}/meminfo.txt
ulimit -a > ${REPORT_DIR}/ulimit.txt
cat /proc/${BROOKLYN_PID}/limits >> ${REPORT_DIR}/ulimit.txt
top -n 1 -b > ${REPORT_DIR}/top.txt
lsof -p ${BROOKLYN_PID} > ${REPORT_DIR}/lsof.txt
netstat -an > ${REPORT_DIR}/netstat.txt

jmap -histo:live ${BROOKLYN_PID} > ${REPORT_DIR}/jmap-histo.txt
jmap -heap ${BROOKLYN_PID} > ${REPORT_DIR}/jmap-heap.txt
for i in {1..10}; do
  jstack ${BROOKLYN_PID} > ${REPORT_DIR}/jstack.${i}.txt
  sleep 1
done
grep -E "(brooklyn|AMP) gc" ${DEBUG_LOG} > ${REPORT_DIR}/brooklyn-gc.txt
grep "events for subscriber" ${DEBUG_LOG} > ${REPORT_DIR}/events-for-subscriber.txt
tar czf brooklyn-report.tgz ${REPORT_DIR}

Also consider providing your log files and persisted state, though extreme care should be taken if these might contain cloud or machine credentials (especially if Externalised Configuration is not being used for credential storage).

The troubleshooting overview in AMP gives information for how to find more information about errors.

If that doesn’t give enough information to diagnose, fix or work around the problem, then it may be necessary to log in to the machine to investigate further. This guide applies to entities that are types of “SoftwareProcess” in AMP, or that follow those conventions.

VM connection details

The ssh connection details for an entity are published to a sensor host.sshAddress. The login credentials will depend on the AMP configuration. The default is to use the ~/.ssh/id_rsa or ~/.ssh/id_dsa on the AMP host (uploading the associated ~/.ssh/id_rsa.pub to the machine’s authorised_keys). However, this can be overridden (e.g. with specific passwords etc) in the location’s configuration.

For Windows, there is a similar sensor with the name host.winrmAddress.

Install and Run Directories

For ssh-based software processes, the install directory and the run directory are published as sensors install.dir and run.dir respectively.

For some entities, files are unpacked into the install dir; configuration files are written to the run dir along with log files. For some other entities, these directories may be mostly empty - e.g. if installing RPMs, and that software writes its logs to a different standard location.

Most entities have a sensor log.location. It is generally worth checking this, along with other files in the run directory (such as console output).

Process and OS Health

It is worth checking that the process is running, e.g. using ps aux to look for the desired process. Some entities also write the pid of the process to pid.txt in the run directory.
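
Where an entity writes pid.txt, the check can be scripted. A minimal sketch follows; process_status is a hypothetical helper. kill -0 only tests whether a signal could be sent, so it also reports failure when the process belongs to another user, in which case run it as that user or with sudo.

```shell
# Report whether the process recorded in a pid file is running.
# Argument: path to the pid file, e.g. <run.dir>/pid.txt.
process_status() {
  pid_file=$1
  [ -r "$pid_file" ] || { echo "no pid file"; return 1; }
  pid=$(cat "$pid_file")
  if kill -0 "$pid" 2>/dev/null; then
    echo "running ($pid)"
  else
    echo "not running ($pid)"
    return 1
  fi
}
```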

It is also worth checking if the required port is accessible. This is discussed in the guide “Troubleshooting Server Connectivity Issues in the Cloud”, including listing the ports in use: execute netstat -antp (or on OS X netstat -antp TCP) to list the TCP ports in use (or use -anup for UDP).

It is also worth checking the disk space on the server, e.g. using df -m, to check that there is sufficient space on each of the required partitions.

This guide takes a deep look at the Java and log messages for some failure scenarios, giving common steps used to identify the issues.

Script Failure

Many blueprints run bash scripts as part of the installation. This section highlights how to identify a problem with a bash script.

First let’s take a look at the customize() method of the Tomcat server blueprint:

@Override
public void customize() {
    newScript(CUSTOMIZING)
        .body.append("mkdir -p conf logs webapps temp")
        .failOnNonZeroResultCode()
        .execute();

    copyTemplate(entity.getConfig(TomcatServer.SERVER_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "server.xml"));
    copyTemplate(entity.getConfig(TomcatServer.WEB_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "web.xml"));

    if (isProtocolEnabled("HTTPS")) {
        String keystoreUrl = Preconditions.checkNotNull(getSslKeystoreUrl(), "keystore URL must be specified if using HTTPS for " + entity);
        String destinationSslKeystoreFile = getHttpsSslKeystoreFile();
        InputStream keystoreStream = resource.getResourceFromUrl(keystoreUrl);
        getMachine().copyTo(keystoreStream, destinationSslKeystoreFile);
    }

    getEntity().deployInitialWars();
}

Here we can see that it’s running a script to create four directories before continuing with the customization. Let’s introduce an error by changing mkdir to mkrid:

newScript(CUSTOMIZING)
    .body.append("mkrid -p conf logs webapps temp") // `mkdir` changed to `mkrid`
    .failOnNonZeroResultCode()
    .execute();

Now let’s try deploying this using the following YAML:

name: Tomcat failure test
location: localhost
services:
- type: org.apache.brooklyn.entity.webapp.tomcat.TomcatServer

Shortly after deployment, the entity fails with the following error:

Failure running task ssh: customizing TomcatServerImpl{id=e1HP2s8x} (HmyPAozV): Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}

Script failure error in the AMP debug console.

By selecting the Activities tab, we can drill into the task that failed. The tasks listed (with the effectors shown as top-level tasks) are clickable links. Selecting a row shows the details of that particular task, including its sub-tasks. We can eventually get to the specific sub-task that failed:

Task failure error in the AMP debug console.

By clicking on the stderr link, we can see the script failed with the following error:

/tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh: line 10: mkrid: command not found

This tells us what went wrong, but doesn’t tell us where. In order to find that, we’ll need to look at the stack trace that was logged when the exception was thrown.
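The result code 127 in the failure message is the conventional shell exit status for a command that could not be found, which is why failOnNonZeroResultCode() turned it into a task failure. The following is a minimal standalone sketch in plain Java 9+ (not AMP code; the class and method names are illustrative) that runs a command through sh and returns its exit status, assuming a POSIX sh is available on the PATH:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

class ExitCodeDemo {
    /** Runs the given command line through sh and returns its exit status. */
    static int exitStatus(String shellCommand) {
        try {
            Process process = new ProcessBuilder("sh", "-c", shellCommand)
                    .redirectErrorStream(true)                       // merge stderr into stdout
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD) // the output is not needed here
                    .start();
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for command", e);
        }
    }

    public static void main(String[] args) {
        // The misspelled command from the example above; nothing is created because it never runs.
        System.out.println("mkrid exit status: " + exitStatus("mkrid -p conf logs webapps temp"));
        // A command that succeeds, for comparison.
        System.out.println("true exit status:  " + exitStatus("true"));
    }
}
```

Any non-zero status is treated as a failure by failOnNonZeroResultCode(); 127 specifically points at a missing or misspelled command rather than a command that ran and failed.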

It’s always worth looking at the Detailed Status section as sometimes this will give you the information you need. In this case, the stack trace is limited to the thread that was used to execute the task that ran the script:

Failed after 40ms

STDERR
/tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh: line 10: mkrid: command not found


STDOUT
Executed /tmp/brooklyn-20150721-132251052-l4b9-customizing_TomcatServerImpl_i.sh, result 127: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}

java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
    at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.logWithDetailsAndThrow(ScriptHelper.java:390)
    at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:379)
    at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:289)
    at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287)
    at org.apache.brooklyn.core.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343)
    at org.apache.brooklyn.core.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

In order to find the exception, we’ll need to look in AMP’s debug log file. By default, the debug log file is named brooklyn.debug.log. Usually the easiest way to navigate the log file is to use less, e.g. less brooklyn.debug.log. We can quickly find the stack trace by first navigating to the end of the log file with Shift-G, then performing a reverse search by typing ?Tomcat and pressing Enter. If searching for the blueprint type (in this case Tomcat) simply matches tasks unrelated to the exception, you can also search for the text of the error message, in this case ?invalid result 127. You can make the search case-insensitive by typing -i before performing the search. To skip the current match and move to the next one (i.e. ‘up’ as we’re performing a reverse search), simply press n.
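If less is not available, or you want to script the lookup, the same reverse search can be approximated in a few lines of Java. This is only a sketch under simple assumptions: the class and method names are hypothetical, the match is a case-insensitive substring test, and the whole file is read into memory, so it suits modest log sizes only.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;

class LogSearch {
    /** Returns the 1-based number of the last line containing needle (ignoring case), or -1 if none. */
    static int lastMatch(List<String> lines, String needle) {
        String wanted = needle.toLowerCase(Locale.ROOT);
        // Walk backwards from the end of the log, like Shift-G followed by a reverse search in less.
        for (int i = lines.size() - 1; i >= 0; i--) {
            if (lines.get(i).toLowerCase(Locale.ROOT).contains(wanted)) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Example: java LogSearch.java brooklyn.debug.log "invalid result 127"
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        int line = lastMatch(lines, args[1]);
        System.out.println(line < 0
                ? "No match"
                : "Last match at line " + line + ": " + lines.get(line - 1));
    }
}
```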

In this case, the ?Tomcat search takes us directly to the full stack trace (only the last part of the trace is shown here):

... at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63) ~[guava-17.0.jar:na]
    at org.apache.brooklyn.core.util.task.BasicTask.get(BasicTask.java:343) ~[classes/:na]
    at org.apache.brooklyn.core.util.task.BasicTask.getUnchecked(BasicTask.java:352) ~[classes/:na]
    ... 9 common frames omitted
Caused by: brooklyn.util.exceptions.PropagatedRuntimeException: 
    at org.apache.brooklyn.util.exceptions.Exceptions.propagate(Exceptions.java:97) ~[classes/:na]
    at org.apache.brooklyn.core.util.task.BasicTask.getUnchecked(BasicTask.java:354) ~[classes/:na]
    at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.execute(ScriptHelper.java:339) ~[classes/:na]
    at org.apache.brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:72) ~[classes/:na]
    at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver$8.run(AbstractSoftwareProcessDriver.java:150) ~[classes/:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_71]
    at org.apache.brooklyn.core.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343) ~[classes/:na]
    ... 5 common frames omitted
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.7.0_71]
    at java.util.concurrent.FutureTask.get(FutureTask.java:188) [na:1.7.0_71]
    at com.google.common.util.concurrent.ForwardingFuture.get(ForwardingFuture.java:63) ~[guava-17.0.jar:na]
    at org.apache.brooklyn.core.util.task.BasicTask.get(BasicTask.java:343) ~[classes/:na]
    at org.apache.brooklyn.core.util.task.BasicTask.getUnchecked(BasicTask.java:352) ~[classes/:na]
    ... 10 common frames omitted
Caused by: java.lang.IllegalStateException: Execution failed, invalid result 127 for customizing TomcatServerImpl{id=e1HP2s8x}
    at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.logWithDetailsAndThrow(ScriptHelper.java:390) ~[classes/:na]
    at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.executeInternal(ScriptHelper.java:379) ~[classes/:na]
    at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:289) ~[classes/:na]
    at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper$8.call(ScriptHelper.java:287) ~[classes/:na]
    ... 6 common frames omitted

AMP’s use of tasks and helper classes can make the stack trace a little harder than usual to follow, but a good place to start is to look through the stack trace for the node’s implementation or ssh driver classes (usually named FooNodeImpl or FooSshDriver). In this case we can see the following:

at org.apache.brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:72) ~[classes/:na]

Combining this with the error message of mkrid: command not found we can see that indeed mkdir has been misspelled mkrid on line 72 of TomcatSshDriver.java.
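When the trace is long, this kind of scan can be automated. The sketch below is a heuristic, not part of AMP: it filters stack-trace lines for frames whose class name ends in SshDriver or Impl, following the naming convention mentioned above. The class name BlueprintFrames and its method are hypothetical, and JDK or framework classes whose names also end in Impl will match too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlueprintFrames {
    // Matches frames such as "at a.b.TomcatSshDriver.customize(TomcatSshDriver.java:72)".
    // Groups: 1 = method name, 2 = source file, 3 = line number.
    private static final Pattern FRAME = Pattern.compile(
            "^\\s*at\\s+[\\w.$]+(?:SshDriver|Impl)\\.(\\w+)\\((\\w+\\.java):(\\d+)\\)");

    /** Returns "File.java:line (method)" for each frame that looks like a driver or impl class. */
    static List<String> find(List<String> traceLines) {
        List<String> hits = new ArrayList<>();
        for (String line : traceLines) {
            Matcher m = FRAME.matcher(line);
            if (m.find()) {
                hits.add(m.group(2) + ":" + m.group(3) + " (" + m.group(1) + ")");
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> trace = Arrays.asList(
                "    at org.apache.brooklyn.entity.software.base.lifecycle.ScriptHelper.execute(ScriptHelper.java:339) ~[classes/:na]",
                "    at org.apache.brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:72) ~[classes/:na]",
                "    at java.lang.Thread.run(Thread.java:745)");
        find(trace).forEach(System.out::println);
    }
}
```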

Non-Script Failure

The section above gives an example of a failure that occurs when a script is run. In this section we will look at a failure in a non-script related part of the code. We’ll use the customize() method of the Tomcat server again, but this time, we’ll correct the spelling of ‘mkdir’ and add a line that attempts to copy a nonexistent resource to the remote server:

newScript(CUSTOMIZING)
    .body.append("mkdir -p conf logs webapps temp")
    .failOnNonZeroResultCode()
    .execute();

copyTemplate(entity.getConfig(TomcatServer.SERVER_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "server.xml"));
copyTemplate(entity.getConfig(TomcatServer.WEB_XML_RESOURCE), Os.mergePaths(getRunDir(), "conf", "web.xml"));
copyTemplate("classpath://nonexistent.xml", Os.mergePaths(getRunDir(), "conf", "nonexistent.xml")); // Resource does not exist!

Let’s deploy this using the same YAML from above. Here’s the resulting error in the AMP debug console:

Resource exception in the AMP debug console.

Again, this tells us what the error is, but we need to find where the code is that attempts to copy this file. In this case it’s shown in the Detailed Status section, and we don’t need to go to the log file:

Failed after 221ms: Error getting resource 'classpath://nonexistent.xml' for TomcatServerImpl{id=PVZxDKU1}: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath

java.lang.RuntimeException: Error getting resource 'classpath://nonexistent.xml' for TomcatServerImpl{id=PVZxDKU1}: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath
    at org.apache.brooklyn.core.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:297)
    at org.apache.brooklyn.core.util.ResourceUtils.getResourceAsString(ResourceUtils.java:475)
    at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver.getResourceAsString(AbstractSoftwareProcessDriver.java:447)
    at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver.processTemplate(AbstractSoftwareProcessDriver.java:469)
    at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver.copyTemplate(AbstractSoftwareProcessDriver.java:390)
    at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver.copyTemplate(AbstractSoftwareProcessDriver.java:379)
    at org.apache.brooklyn.entity.webapp.tomcat.TomcatSshDriver.customize(TomcatSshDriver.java:79)
    at org.apache.brooklyn.entity.software.base.AbstractSoftwareProcessDriver$8.run(AbstractSoftwareProcessDriver.java:150)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at org.apache.brooklyn.core.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343)
    at org.apache.brooklyn.core.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Error accessing classpath://nonexistent.xml: java.io.IOException: nonexistent.xml not found on classpath
    at org.apache.brooklyn.core.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:233)
    ... 14 more
Caused by: java.io.IOException: nonexistent.xml not found on classpath
    at org.apache.brooklyn.core.util.ResourceUtils.getResourceViaClasspath(ResourceUtils.java:372)
    at org.apache.brooklyn.core.util.ResourceUtils.getResourceFromUrl(ResourceUtils.java:230)
    ... 14 more

Looking for Tomcat in the stack trace, we can see that in this case the problem lies at line 79 of TomcatSshDriver.java.

External Failure

Sometimes an entity will fail outside the direct commands issued by AMP. When installing and launching an entity, AMP will check the return code of scripts that were run to ensure that they completed successfully (i.e. the return code of the script is zero). It is possible, for example, that a launch script completes successfully, but the entity fails to start.

We can simulate this type of failure by launching Tomcat with an invalid configuration file. As seen in the previous examples, AMP copies two XML configuration files to the server: server.xml and web.xml.

The first few non-comment lines of server.xml are as follows (you can see the full file here):

<Server port="${driver.shutdownPort?c}" shutdown="SHUTDOWN">
     <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
     <Listener className="org.apache.catalina.core.JasperListener" />

Let’s add an unmatched XML element, which will make this XML file invalid:

<Server port="${driver.shutdownPort?c}" shutdown="SHUTDOWN">
     <unmatched-element> <!-- This is invalid XML as we won't add </unmatched-element> -->
     <Listener className="org.apache.catalina.core.AprLifecycleListener" SSLEngine="on" />
     <Listener className="org.apache.catalina.core.JasperListener" />

As AMP doesn’t know how these types of resources are used, they’re not validated as they’re copied to the remote machine. As far as AMP is concerned, the file will have copied successfully.

Let’s deploy Tomcat again, using the same YAML as before. This time, the deployment runs for a few minutes before failing with Timeout waiting for SERVICE_UP:

External error in the AMP debug console.

If we drill down into the tasks in the Activities tab, we can see that all of the installation and launch tasks completed successfully, and stdout of the launch script is as follows:

Executed /tmp/brooklyn-20150721-153049139-fK2U-launching_TomcatServerImpl_id_.sh, result 0

The task that failed was the post-start task, and the stack trace from the Detailed Status section is as follows:

Failed after 5m 1s: Timeout waiting for SERVICE_UP from TomcatServerImpl{id=BUHgQeOs}

java.lang.IllegalStateException: Timeout waiting for SERVICE_UP from TomcatServerImpl{id=BUHgQeOs}
    at org.apache.brooklyn.core.entity.Entities.waitForServiceUp(Entities.java:1073)
    at org.apache.brooklyn.entity.software.base.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:388)
    at org.apache.brooklyn.entity.software.base.SoftwareProcessImpl.waitForServiceUp(SoftwareProcessImpl.java:385)
    at org.apache.brooklyn.entity.software.base.SoftwareProcessDriverLifecycleEffectorTasks.postStartCustom(SoftwareProcessDriverLifecycleEffectorTasks.java:164)
    at org.apache.brooklyn.entity.software.base.lifecycle.MachineLifecycleEffectorTasks$7.run(MachineLifecycleEffectorTasks.java:433)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at org.apache.brooklyn.core.util.task.DynamicSequentialTask$DstJob.call(DynamicSequentialTask.java:343)
    at org.apache.brooklyn.core.util.task.BasicExecutionManager$SubmissionCallable.call(BasicExecutionManager.java:469)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

This doesn’t really tell us what we need to know, and looking in the brooklyn.debug.log file yields no further clues. The key here is the error message Timeout waiting for SERVICE_UP. After running the installation and launch scripts, assuming all scripts completed successfully, AMP will periodically check the health of the node and will set the node on fire if the health check does not pass within a prescribed period (the default is two minutes, and can be configured using the start.timeout config key). The periodic health check also continues after a successful launch in order to check continued operation of the node, but in this case it never passes.

The first thing we need to do is to find out how AMP determines the health of the node. The health-check is often implemented in the isRunning() method in the entity’s ssh driver. Tomcat’s implementation of isRunning() is as follows:

@Override
public boolean isRunning() {
    return newScript(MutableMap.of(USE_PID_FILE, "pid.txt"), CHECK_RUNNING).execute() == 0;
}

The newScript method has conveniences for default scripts to check if a process is running based on its PID. In this case, it will look for Tomcat’s PID in the pid.txt file and check whether that PID belongs to a running process.
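Conceptually, a PID-based check amounts to reading the PID from the file and asking the operating system whether that process is alive. The sketch below illustrates the idea in plain Java 9+ using ProcessHandle. It is not how AMP’s generated check script is implemented, the class name PidCheck is hypothetical, and it only makes sense when run on the machine where the process lives.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class PidCheck {
    /** Parses the contents of a PID file and reports whether that process is currently alive. */
    static boolean isAlive(String pidFileContents) {
        long pid;
        try {
            pid = Long.parseLong(pidFileContents.trim());
        } catch (NumberFormatException e) {
            return false; // an empty or corrupt PID file is treated as "not running"
        }
        // ProcessHandle.of returns an empty Optional when no such process exists.
        return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
    }

    public static void main(String[] args) throws IOException {
        // Example: java PidCheck.java /path/to/run/dir/pid.txt
        String contents = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(isAlive(contents) ? "process is running" : "process is not running");
    }
}
```

Note that this only confirms a process with that PID exists; as the rest of this section shows, a running process does not guarantee a healthy service.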

It’s worth a quick sanity check at this point to check if the PID file exists, and if the process is running. By default, the PID file is located in the run directory of the entity. You can find the location of the entity’s run directory by looking at the run.dir sensor. In this case it is /tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs. To find the PID, you simply cat the pid.txt file in this directory:

$ cat /tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs/pid.txt
73714

In this case, the PID in the file is 73714. You can then check if the process is running using ps. You can also pipe the output to fold so the full launch command is visible:

$ ps -p 73714 | fold -w 120
PID TTY           TIME CMD
73714 ??         0:08.03 /Library/Java/JavaVirtualMachines/jdk1.8.0_51.jdk/Contents/Home/bin/java -Dnop -Djava.util.logg
ing.manager=org.apache.juli.ClassLoaderLogManager -javaagent:/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BU
HgQeOs/brooklyn-jmxmp-agent-shaded-0.8.0-SNAPSHOT.jar -Xms200m -Xmx800m -XX:MaxPermSize=400m -Dcom.sun.management.jmxrem
ote -Dbrooklyn.jmxmp.rmi-port=1099 -Dbrooklyn.jmxmp.port=31001 -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.manage
ment.jmxremote.authenticate=false -Djava.endorsed.dirs=/tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7
.0.56/endorsed -classpath /tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7.0.56/bin/bootstrap.jar:/tmp/
brooklyn-martin/installs/TomcatServer_7.0.56/apache-tomcat-7.0.56/bin/tomcat-juli.jar -Dcatalina.base=/tmp/brooklyn-mart
in/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs -Dcatalina.home=/tmp/brooklyn-martin/installs/TomcatServer_7.0.56/apache
-tomcat-7.0.56 -Djava.io.tmpdir=/tmp/brooklyn-martin/apps/jIzIHXtP/entities/TomcatServer_BUHgQeOs/temp org.apache.catali
na.startup.Bootstrap start

This confirms that the process is running. The next thing we can look at is the service.notUp.indicators sensor. This reads as follows:

{"service.process.isRunning":"The software process for this entity does not appear to be running"}

This confirms that the problem is indeed due to the service.process.isRunning sensor. We assumed earlier that this was set by the isRunning() method in TomcatSshDriver.java, but this isn’t always the case. The service.process.isRunning sensor is wired up by the connectSensors() method in the node’s implementation class, in this case TomcatServerImpl.java. Tomcat’s implementation of connectSensors() is as follows:

@Override
public void connectSensors() {
    super.connectSensors();

    if (getDriver().isJmxEnabled()) {
        String requestProcessorMbeanName = "Catalina:type=GlobalRequestProcessor,name=\"http-*\"";

        Integer port = isHttpsEnabled() ? getAttribute(HTTPS_PORT) : getAttribute(HTTP_PORT);
        String connectorMbeanName = format("Catalina:type=Connector,port=%s", port);

        jmxWebFeed = JmxFeed.builder()
            .entity(this)
            .period(3000, TimeUnit.MILLISECONDS)
            .pollAttribute(new JmxAttributePollConfig<Integer>(ERROR_COUNT)
                    .objectName(requestProcessorMbeanName)
                    .attributeName("errorCount"))
            .pollAttribute(new JmxAttributePollConfig<Integer>(REQUEST_COUNT)
                    .objectName(requestProcessorMbeanName)
                    .attributeName("requestCount"))
            .pollAttribute(new JmxAttributePollConfig<Integer>(TOTAL_PROCESSING_TIME)
                    .objectName(requestProcessorMbeanName)
                    .attributeName("processingTime"))
            .pollAttribute(new JmxAttributePollConfig<String>(CONNECTOR_STATUS)
                    .objectName(connectorMbeanName)
                    .attributeName("stateName"))
            .pollAttribute(new JmxAttributePollConfig<Boolean>(SERVICE_PROCESS_IS_RUNNING)
                    .objectName(connectorMbeanName)
                    .attributeName("stateName")
                    .onSuccess(Functions.forPredicate(Predicates.<Object>equalTo("STARTED")))
                    .setOnFailureOrException(false))
            .build();

        jmxAppFeed = JavaAppUtils.connectMXBeanSensors(this);
    } else {
        // if not using JMX
        LOG.warn("Tomcat running without JMX monitoring; limited visibility of service available");
        connectServiceUpIsRunning();
    }
}

We can see here that if JMX is not enabled, the method will call connectServiceUpIsRunning(), which will use the default PID-based method of determining if a process is running. However, as JMX is enabled, the service.process.isRunning sensor (denoted here by the SERVICE_PROCESS_IS_RUNNING variable) is set to true if and only if the stateName JMX attribute equals STARTED. We can see from the previous call to .pollAttribute that this attribute is also published to the CONNECTOR_STATUS sensor. The CONNECTOR_STATUS sensor is defined as follows:

AttributeSensor<String> CONNECTOR_STATUS =
    new BasicAttributeSensor<String>(String.class, "webapp.tomcat.connectorStatus", "Catalina connector state name");
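The effect of the onSuccess and setOnFailureOrException settings on SERVICE_PROCESS_IS_RUNNING can be restated in plain Java. This is an illustrative stand-in that uses no Brooklyn or Guava classes (the class and method names are made up for the example): the sensor is true only when the poll succeeds and returns exactly STARTED, so a missing or unreadable stateName leaves it false.

```java
class ConnectorStatusCheck {
    /**
     * @param stateName     value read from the connector's stateName attribute, or null if unset
     * @param pollSucceeded false if the JMX poll failed or threw an exception
     */
    static boolean isRunning(String stateName, boolean pollSucceeded) {
        if (!pollSucceeded) {
            return false; // corresponds to setOnFailureOrException(false)
        }
        return "STARTED".equals(stateName); // corresponds to equalTo("STARTED")
    }

    public static void main(String[] args) {
        System.out.println("STARTED, poll ok:     " + isRunning("STARTED", true));
        System.out.println("null, poll ok:        " + isRunning(null, true));
        System.out.println("STARTED, poll failed: " + isRunning("STARTED", false));
    }
}
```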

Let’s go back to the AMP debug console and look for the webapp.tomcat.connectorStatus:

Sensors view in the AMP debug console.

As the sensor is not shown, it’s likely that it’s simply null or not set. We can check this by clicking the “Show/hide empty records” icon (highlighted in yellow above):

All sensors view in the AMP debug console.

We know from previous steps that the installation and launch scripts completed, and we know the process is running, but we can see here that the server is not responding to JMX requests. A good thing to check here would be that the JMX port is not being blocked by iptables, firewalls or security groups (see the troubleshooting connectivity guide). Let’s assume that we’ve checked that and they’re all open. There is still one more thing that AMP can tell us.
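A basic TCP connection test from the AMP machine is one way to rule out blocked ports. The sketch below is hypothetical helper code, not an AMP API; the default port 31001 is taken from the -Dbrooklyn.jmxmp.port value in the ps output above and may differ in your deployment. A successful connection only shows that the port is reachable, not that the JMX agent behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    /** Returns true if a TCP connection to host:port can be opened within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host could not be resolved
        }
    }

    public static void main(String[] args) {
        // Example: java PortCheck.java my-tomcat-host 31001
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 31001; // assumed JMXMP port
        System.out.println(host + ":" + port
                + (canConnect(host, port, 3000) ? " is reachable" : " is NOT reachable"));
    }
}
```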

Still on the Sensors tab, let’s take a look at the log.location sensor:

/tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/logs/catalina.out

This is the location of Tomcat’s own log file. The location of the log file will differ from process to process and when writing a custom entity you will need to check the software’s own documentation. If your blueprint’s ssh driver extends JavaSoftwareProcessSshDriver, the value returned by the getLogFileLocation() method will automatically be published to the log.location sensor. Otherwise, you can publish the value yourself by calling entity.setAttribute(Attributes.LOG_FILE_LOCATION, getLogFileLocation()); in your ssh driver.

Note: The log file will be on the server to which you have deployed Tomcat, and not on the AMP server. Let’s take a look in the log file:

$ less /tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/logs/catalina.out

Jul 21, 2015 4:12:12 PM org.apache.tomcat.util.digester.Digester fatalError
SEVERE: Parse Fatal Error at line 143 column 3: The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>".
    org.xml.sax.SAXParseException; systemId: file:/tmp/brooklyn-martin/apps/c3bmrlC3/entities/TomcatServer_C1TAjYia/conf/server.xml; lineNumber: 143; columnNumber: 3; The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>".
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:441)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1437)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1749)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2973)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:649)
    at org.apache.tomcat.util.digester.Digester.parse(Digester.java:1561)
    at org.apache.catalina.startup.Catalina.load(Catalina.java:615)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:677)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:321)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:455)
Jul 21, 2015 4:12:12 PM org.apache.catalina.startup.Catalina load
WARNING: Catalina.start using conf/server.xml: The element type "unmatched-element" must be terminated by the matching end-tag "</unmatched-element>".
Jul 21, 2015 4:12:12 PM org.apache.catalina.startup.Catalina start
SEVERE: Cannot start server. Server instance is not configured.

As expected, we can see here that the unmatched-element element has not been terminated in the server.xml file.
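Because AMP does not validate these files as it copies them, a well-formedness check on the rendered file can surface this kind of error without waiting for the start timeout. The following sketch uses only the JDK’s built-in XML parser; the class name XmlCheck is hypothetical, and it checks well-formedness only, not whether Tomcat accepts the configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class XmlCheck {
    /** Returns null if the XML is well-formed, otherwise a description of the first fatal error. */
    static String firstError(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPE declarations so the check never fetches external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and stays quiet otherwise (nothing printed to stderr).
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXParseException e) {
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Example: java XmlCheck.java /path/to/run/dir/conf/server.xml
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String error = firstError(xml);
        System.out.println(error == null ? "well-formed" : "NOT well-formed: " + error);
    }
}
```

Running this against the copied conf/server.xml is more reliable than running it against the template itself, since template syntax in the source file may not be valid XML before it is rendered.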