JMX timeseries data

When it comes to monitoring Java applications, JMX still seems to be one of the best standard ways to get information out of your application and application servers.

All containers and most frameworks publish MBeans into the JMX context, but the tools for retrieving and processing them still seem limited.

The quickest and easiest solution is to fire up VisualVM, install the MBeans plugin and connect to your container. You’ll end up with access to a heap of values, most of which won’t help you with your problem, but there may be some pieces of gold in there that are key to helping you identify issues or performance problems.

No matter what container you run, you’ll get access to the key Memory and Threading beans. These are probably a great place to start when looking at a system in an abnormal state. But how do you know what is normal and what is not? The best way is to check some key metrics regularly; over time you will learn what is normal and what is abnormal.

So how do you do this? You could just fire up VisualVM every now and then and read the values, and hopefully you’ll remember what looks normal. Or you can set up something that extracts the JMX values periodically and stores them.
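
To make the "extract periodically and store" idea concrete, here is a minimal Python sketch. It is only an illustration: `read_value` is a hypothetical stand-in for whatever actually reads the JMX attribute (it is not a real JMX client), and the min/mean/max summary is just one simple way to describe what "normal" looks like.

```python
import time
from statistics import mean


def sample(read_value, count, interval_seconds, sleep=time.sleep, clock=time.time):
    """Call read_value() `count` times, `interval_seconds` apart, keeping timestamped results."""
    samples = []
    for i in range(count):
        samples.append((int(clock()), read_value()))
        if i < count - 1:
            sleep(interval_seconds)
    return samples


def normal_range(samples):
    """Summarise stored samples as (min, mean, max) to get a feel for the normal range."""
    values = [value for _, value in samples]
    return min(values), mean(values), max(values)


if __name__ == "__main__":
    # Hypothetical readings standing in for java.lang:type=Threading ThreadCount.
    fake_thread_counts = iter([41, 43, 42, 40])
    history = sample(lambda: next(fake_thread_counts), count=4, interval_seconds=0)
    print(normal_range(history))
```

In practice you would hand the timestamped samples to a store that can graph them, which is where the rest of this post is heading.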

If you’ve been following me on Twitter then you’ll know I have recently been using Librato. It is a really simple, focused tool that gets a lot of its power from how flexible it is.

Librato is a simple website that lets you push in timestamped JSON objects and then plot them on a timeseries graph. This may sound simple, but one of the most complex parts of metrics analysis is managing the volume of information and drawing relationships between metrics. The Librato platform allows you to easily build ‘Instruments’ that combine multiple metrics against the same time frame. This simple tool enables a wide range of analysis.

So we now have the raw data in the JMX beans, and a place to store it in Librato, but the missing piece is how to get the data from one into the other.

Of course the Librato guys have built a connector to extract specific JMX metrics and import them into the Librato platform. The code is available on GitHub and is pretty self-explanatory.

Just configure your Librato email and token, and supply the connection credentials for the JMX datasource and you are almost there.

librato-metrics-tap-jmxbeans \
 --publish \
 --email "$EMAIL" \
 --token "$TOKEN" \
 --source "$SOURCE" \
 --jmx-host "$JMX_HOST" \
 --jmx-port $JMX_PORT \
 --data-file-full tomcat-jmx.yaml \
 -i 30

The only decision left is what to extract. As a simple example, here is the configuration to pull out the key Thread and Memory statistics, as well as the Tomcat-specific bean for extracting the active sessions from a ‘ROOT’ context.

---
java.lang:type=Threading:
   ThreadCount:
   PeakThreadCount:
java.lang:type=Memory:
   HeapMemoryUsage:
Catalina:type=Manager,context=/,host=localhost:
   activeSessions:
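
Each top-level key in that file is a JMX object name: a domain, a colon, then comma-separated key=value properties. The Python sketch below only illustrates that structure. It is not part of the Librato connector, the function name is my own, and it deliberately ignores quoted values and wildcard patterns.

```python
def parse_object_name(name):
    """Split a simple JMX object name into (domain, properties)."""
    domain, _, properties = name.partition(":")
    pairs = (item.split("=", 1) for item in properties.split(","))
    return domain, {key: value for key, value in pairs}


if __name__ == "__main__":
    for name in (
        "java.lang:type=Threading",
        "java.lang:type=Memory",
        "Catalina:type=Manager,context=/,host=localhost",
    ):
        print(parse_object_name(name))
```

Run against the Tomcat bean, this makes it easy to see that the ‘context’ property carries a bare ‘/’ as its value.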

Note: At the time of writing, the Librato library that processes this YAML file needs to be patched, because the ‘context=/’ causes issues when posting the metrics to the Librato API. But worry not, there is a patch available that fixes the problem. Hopefully the Pull Request will be merged soon and it will be even easier to use.

I also ran into an issue with the library in that it did not support JMX authentication, but that was an easy fix, and there is now another Pull Request to add --jmx-username and --jmx-password options to enable connection to authenticated JMX endpoints.

Monitoring is now made easy: you can quickly extract key JMX metrics, pump them into Librato and perform analysis.

Sample Librato graphic

Starting a new project can be daunting

I’ve been working on some new utility projects over the last few weeks, and starting from scratch can be a daunting task.

You start out with heaps of ideas and the initial enthusiasm clouds the reality of the task at hand. But as time progresses and features start to appear you begin to realise how big even the simplest things can be.

It’s often at this point that projects get thrown onto the ‘later’ pile, never to be resumed.

It’s the daunting scope of your initial ideas that may be your enemy over and over again. But don’t be stopped. Most probably you have something working, or nearly there, and with a little focus and some effort you can probably ship it.

But you need to prioritise. Focus on the minimum viable product. Get your ideas listed somewhere: a wiki, whiteboard, bug tracker, Google doc. Just get the important things down. Now cut the list to only the essential features. If there is a way to get it into the hands of its target users without a feature, put that feature off.

Hopefully you’ll now find 2-3 things that must be done and a heap you would like to do. Focus on the first couple and get it into the hands of some users.

Just ship it..

Remember that, just like Rome, none of the great products you use today were built in a day. They most probably started off as a small fraction of what they are now, and have been shaped by feedback from users and other products.

While we now have access to great libraries and platforms to work with, they only do so much. You need to use the right ones in the right ways.

Shipping a new product can be daunting when you think about all of the work to come, but remember what you have achieved so far and focus on completing the minimal remaining features to get your product into the hands of your users.

Arduino based build light system

I’ve been looking at an Arduino for a few years now, but never really had the drive to pick one up, or the project to use it on. But after the recent move from Jenkins to Bamboo, our office build lights needed some rework and I got to thinking this might be the perfect system for an Arduino.

I ordered the Arduino starter kit, Ethernet Shield and a strip of 25 WS2801 LEDs from Adafruit, and surprisingly only about 6 days later my order arrived in Australia.

Phase 1

Most of the complex logic is handled by the WS2801 lights themselves, as each one is individually addressable, so the only really complex part was the build server integration.

After looking into the features of the Ethernet shield, I took the approach of coding a simple web server into the Arduino that responds to a basic set of URL requests, changing the state of the lights based on the pattern of the URL.

The initial build consisted of 6 LEDs, all configured as one set that could either be solidly on or showing a rotating pattern, with the LEDs green for success or red for failure.

These were easily addressable through the simple HTTP server. Hitting the URL

http://192.168.1.177/success/complete

would turn on the lights to the success state.

While

http://192.168.1.177/failed/complete

would show the full red option.

http://192.168.1.177/success/building

and

http://192.168.1.177/failed/building

would give the other 2 build status options.

A small modification to our existing build watch script was all it took to integrate the lights with our build system.
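
Our build watch script isn’t shown here, but the integration boils down to something like the Python sketch below. The function names are hypothetical and the host is simply the address used in the examples above; the four URLs it produces are the ones listed earlier.

```python
from urllib.request import urlopen

LIGHT_HOST = "192.168.1.177"  # address from the examples above; change to suit


def light_url(succeeded, building, host=LIGHT_HOST):
    """Build the light URL for a build result and phase."""
    result = "success" if succeeded else "failed"
    phase = "building" if building else "complete"
    return f"http://{host}/{result}/{phase}"


def update_light(succeeded, building, timeout=2):
    """Hit the light's web server and return the HTTP status code."""
    with urlopen(light_url(succeeded, building), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Print the four possible URLs without touching the network.
    for succeeded in (True, False):
        for building in (False, True):
            print(light_url(succeeded, building))
```

A build watcher would call `update_light` whenever the build state changes, passing the last known result while a new build is running.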

Phase 2

Our build process is a little more complex than a single light (or set of lights) can indicate. We have 3 distinct build stages, and it would be good if the lights could watch each stage and indicate the progress through the workflow, and the current success of each stage.

The upgraded Arduino webserver code now breaks the LEDs down into 3 sets: TOP (S1), MIDDLE (S2) and BOTTOM (S3).

These sets each have the same status options of ‘/success/complete/’ and ‘/failed/building’ etc, but are prefixed by the section the command applies to.

The webserver does very crude pattern matching, so it’s even possible to control multiple stages at once, like:

http://192.168.1.177/S1S2S3/success/complete

or

http://192.168.1.177/S1S3/failed/building
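
To show what "very crude pattern matching" means, here is a Python sketch of the idea. The real logic lives in the Arduino Sketch and may differ in detail; this version simply checks which section names and status words appear in the request path.

```python
SECTIONS = ("S1", "S2", "S3")


def parse_light_request(path):
    """Return (sections, result, phase) found in a request path by substring matching."""
    sections = [name for name in SECTIONS if name in path]

    if "/success/" in path:
        result = "success"
    elif "/failed/" in path:
        result = "failed"
    else:
        result = None

    trimmed = path.rstrip("/")
    if trimmed.endswith("/building"):
        phase = "building"
    elif trimmed.endswith("/complete"):
        phase = "complete"
    else:
        phase = None

    return sections, result, phase


if __name__ == "__main__":
    print(parse_light_request("/S1S2S3/success/complete"))
    print(parse_light_request("/S1S3/failed/building"))
```

Because the match is on substrings, any combination of S1, S2 and S3 in the first path segment selects those sections.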

How to build your own

I purchased all of my components from Adafruit.com.
You’ll need:

  1. Arduino base kit
  2. Ethernet shield
  3. WS2801 LED Strip
  4. Some wire or cabling, also available but I had some spare cat5

Once you have all the parts, connect the GND and +5V pins of the WS2801 to the corresponding pins on the Arduino, and the Data and Clock pins to pins 2 and 6 (or whatever pins you want, as long as you change the Sketch to match).

The Arduino Sketch files are available on my GitHub account. The phase 1 configuration is available in the master branch, while the phase 2 (3 section) configuration is available in the 3seg branch.

You’ll also need the Adafruit_WS2801 library and the base Ethernet library.

Then it should be as simple as firing up the Arduino IDE, loading in the sketch and uploading to your device.

Note: You might need to change the IP address, MAC address or data pins; these are all defined at the top of the Arduino Sketch file.

What’s your stack?

At work, one of our standard interview questions asks the interviewee to describe their current stack. From operating system through to front-end libraries, it gives us a full view of what technology they would choose if they had a choice.

I’ve been playing with lots of different tech recently, and thought it would be a good time to answer the question myself.

So let’s start from the bottom, and work up from the hardware through the stack.

Amazon Web Services

This one is a simple decision, the freedom that the AWS stack provides when it comes to trying new technologies and a forced attitude to configuration management and fault tolerant infrastructure design just makes

With a solid configuration management solution, being able to fully automate a copy of your production system in a matter of minutes is an amazing game changer. Especially when it will only cost a matter of dollars.

There are any number of other reasons to run an AWS stack, but the second biggest benefit has to be the Auto-scaling ability. As long as you design for scale and use the right types of indirection, you’ll have a platform that can handle tiny and massive loads while only paying for the platform you use.

Git

What more is there to say? Git, and tools like GitHub and Stash, have improved on an already key technology (Subversion) by adding excellent branch management.

The big success of Git adoption has been the innovation that has come out of GitHub. I could fill this blog post with the things I love about GitHub and the features they have added to source control.

For corporate developers that need to run a Git server inside their enterprise, look at Stash by Atlassian. While it doesn’t have all the features of GitHub, it has enough to get the benefits of Git.

Chef

This is the most recent tool in my kit, but it is beginning to feel the most exciting.

I’ve needed a way to manage application server configuration for a long time, and with the introduction of AWS into the toolkit, being able to provision from bare (or near bare) metal with Chef is key.

I’m only a few weeks into working out how it all hangs together and how best to make it fit my needs, but it’s getting massive amounts of work done quickly.

The ability to repeat a configuration deployment from scratch is amazing. This is where application development is currently innovating: by turning the entire stack into code we get much more control and flexibility in our environments.

Capistrano

My first serious introduction to Configuration Management was through the use of some simple Capistrano scripts.

While Capistrano is best suited for the deployment of Ruby based applications, we have been able to extend its core features to support application deployment and configuration of complex Java applications.

Its core support for executing remote commands (via SSH) on multiple sets of servers is a simple feature, but enables a whole depth of possible actions.

For a long time our entire Configuration Management was run by Capistrano scripts, and it was only recently upgraded to Chef for more ‘off the shelf’ component deployments.

Java

While a great number of my stack elements are not Java products or libraries, I still feel that as an application programming language for large and complex systems you can’t beat the JVM and the depth of Java libraries.

As I am introduced to more of the tools around Ruby I can see a place for it moving forward, but in a supporting role, not as the main codebase.

Tomcat

While Tomcat has been around for ages and a day, it’s rock solid, well understood and just works.

I’m also biased towards Tomcat as it’s pretty much the only application container I have run, but it has always suited the needs of the projects I’ve been on.

Apache

So if Tomcat has been around for ages, then Apache was already ancient when Tomcat was pre 1.0. But the same rules apply: it works, does its job well, and is rock solid and well known.

By connecting Apache and Tomcat together, you end up with a flexible configuration platform that can use Apache modules to provide powerful rewrite rules, load balancing, even security settings.

Sensu

Application monitoring is becoming the most critical part of any platform. With multi server environments and more and more moving parts, having a central view of what is running and how it is performing is critical to the success of any application.

Sensu falls into the category of something I’m interested in, but have not yet had the chance to take for a serious run. I like all I am reading about it and what it offers, but am yet to have the chance to run a serious Sensu implementation.

Coming in a close second in monitoring has to be the tried and true Nagios. This is what we (and I guess a lot of others) actually run to monitor our systems, but it’s beginning to feel long in the tooth and adds more configuration overhead than we get out of it.

The fact that it works and does its job well enough makes spending more time with Sensu a hard sell, but as we move towards more dynamic and adaptive environments (especially around AWS) a new monitoring solution is required.

New Relic

One of the greatest finds of the past 12 months has to be New Relic. Its monitoring and performance tracking in a box is just amazing, and shows what modern SaaS solutions should be.

New Relic is a great application tier monitoring solution, and while I would love it to do many more things, it’s still amazing how much it can do.

It is still lacking a bit in non-application monitoring, so it can’t hold all the cards at the moment, but it works well alongside other platforms.

Don’t write this off as a Java tool, I know I initially did, but we have had it run against PHP stacks with just as much useful information generated.

Metrics (by Yammer)

I got put onto this simple library, which has finally linked my need for application tier monitoring with the existing platform monitoring.

While New Relic can get ‘some’ key metrics from the server, it lacks the scalpel-like focus you sometimes need on metrics. With the Metrics library you can extract any JMX metric value, or even include your own deep integrations to export your metrics.

If there is only 1 thing you look at from this post, I would recommend you go and look at Metrics.

Think about it as the Bootstrap for application metrics.

Librato

Not sure where I first found this tool, but it’s a very simple tool for graphing metrics against a time series. While this seems like a simple task, the swathe of plugins available and the ease of integration show off its power.

There are lots of things that can produce data, and as I’ve stated many times in this post, knowledge is power, but wading through pages and pages of data doesn’t often help. Being able to interpret the data as information is key.

Timeseries data needs to be visualised on a graph, and with the massive amount of data you need to analyse application performance you really need to be able to overlay multiple dimensions of data on the fly.

Librato at its core is a simple tool, but a powerful cornerstone in your application’s stack.

Summary

If you’ve read this far you can see my stack is very bottom heavy, and that is by design. The application tier will vary from product to product, but they all require the same support from their platform.

Having a solid platform, an ability to control it and identify how it is performing is key to maintaining a successful product.

The best thing about nearly all of these projects is that they are open source and available to all, so you can stand on the shoulders of giants, scratch your own itch and help your fellow developers, all while building a flexible, scalable and hopefully profitable product.

AWS CloudWatch graphs with Librato and a little node

I’ve been playing around with building scalable applications using Amazon’s AWS platform, and seeing as they are now in Sydney it seemed like a great time to take things for a run.

Getting servers up and running is not that hard, but the thing I am most interested in at the moment is metrics. It’s critical to be able to measure and monitor the performance of any part of a running system, and even more so on the AWS platform, as their setup encourages you to design for failure. But designing for failure requires an adaptive monitoring solution that can look at key metrics and give you a simple view of health.

Looking into the AWS way of doing this, you quickly find CloudWatch, the AWS metrics gathering system. There are lots of resources around the web to understand what it is, and I may do a follow-up post on how to set up your own custom metrics, but straight out of the box there are lots of pieces of information to play with.

On my test account lots equates to 2,257 CloudWatch metrics.
LotOfMetrics

Once you start to dig into it, some metrics are more useful than others. Things like the ELB Latency or RequestCount give me a lot more useful figures than some of the others.

The AWS console has a simple interface to select Metrics and graph their history, but I found myself clicking around all the time just to have to re-open another metric, and while you can select multiple items on the one graph, you can’t mix scales, or even the calculation mode.

My gut feel is the AWS folks offer a simple interface, and expect you (or me) to build a better solution with the metrics that can be found.

As with all of the AWS features, there is a rich API behind CloudWatch that can be used to not only post metrics, but retrieve them as well.

I had been previously investigating some cloud based graphing tools, and stumbled upon Librato, a quick and easy tool to push time based data at, with a free trial period (just like you should) and a nice cheap price point.

A tiny bit more Google-fu and I found a cloudwatch-librato library that seemed to do exactly what I wanted.

But after a bit of playing I still couldn’t get it to work, partly because I hadn’t done any serious Node.js development before, and partly because the script expected a few things that I just wasn’t doing. I have my own fork of the library that you might want to start with. It has:

  1. Better support for “Off-AWS” execution,
  2. Some extra Error handling when retrieving CloudWatch metrics,
  3. Debug support to help you get up and running.

Grab the latest from GitHub using:

git clone https://github.com/stevemac007/cloudwatch-librato

The first issue I ran into was getting the dependencies installed. Start by installing npm for your platform.

I’m running Cygwin on Windows, so was able to run:

curl http://npmjs.org/install.sh | sh

The other key missing part was Python (needed for the libxmljs module). I started with 3.3, but it seems that it really wants Python >= v2.5.0 & < 3.0.0.

After installing Python, run npm to install all the required modules.

npm install underscore optimist aws-lib step librato-metrics

Once installed, all you need to do is:

  • Copy the example settings file,
  • Enter your AWS credentials in the ‘awskey’ and ‘awssecret’ sections of the settings file
  • Enter the email and token from Librato
  • Configure some services to track

The sample file supplied in the checkout is from the original author, but it’s quick and easy to add your own using the AWS console. Just open up the CloudWatch console and browse through to a graph you like. For example, let’s graph the HealthyHostCount for our load balancers per availability zone.

ELB-CloudWatch

From this screen you have all the information you need to complete the JSON element.

        {
            "MetricName": "HealthyHostCount",
            "Namespace": "AWS/ELB",
            "Unit": "Count",
            "Period": 60,
            "Statistic": "Average",
            "Dimensions": {
                "ap-southeast-2": [
                        ["AvailabilityZone", "ap-southeast-2a", "se-2a"],
                        ["AvailabilityZone", "ap-southeast-2b", "se-2b"]
                ]
            }
        }

From the screen:

  • the Namespace is showing in the “Search” box
  • the MetricName is listed in the top section
  • the Unit is listed above the graph
  • along with the Statistic
  • the period needs to be a multiple of 60, and unless you have detailed monitoring enabled, you’ll need to use 300.

Then we need to apply some Dimension filters, here I am creating 2 different graph elements, 1 per AvailabilityZone, and when they are sent to Librato they will appear as ‘se-2a’ and ‘se-2b’.
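
The shape of the "Dimensions" element is easier to see when flattened. The Python sketch below is only an illustration of how I read the structure (region, then a list of [dimension name, dimension value, Librato source label] triples). It is not code from the cloudwatch-librato library, and the function and key names are my own.

```python
def expand_dimensions(metric):
    """Flatten one settings entry into one row per (region, dimension) pair."""
    rows = []
    for region, dimensions in metric["Dimensions"].items():
        for name, value, label in dimensions:
            rows.append({
                "metric": metric["MetricName"],
                "region": region,
                "dimension": {name: value},
                "source": label,  # the name the series appears under in Librato
            })
    return rows


if __name__ == "__main__":
    healthy_hosts = {
        "MetricName": "HealthyHostCount",
        "Namespace": "AWS/ELB",
        "Unit": "Count",
        "Period": 60,
        "Statistic": "Average",
        "Dimensions": {
            "ap-southeast-2": [
                ["AvailabilityZone", "ap-southeast-2a", "se-2a"],
                ["AvailabilityZone", "ap-southeast-2b", "se-2b"],
            ]
        },
    }
    for row in expand_dimensions(healthy_hosts):
        print(row)
```

Each row corresponds to one line on the eventual Librato graph.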

Now that you have the settings.json file complete, it’s time to run the script. There is a --debug option that prints some detail while the script executes, to show what is going on.

node index.js -c settings.example.json --debug

and you will see:

Default Region ap-southeast-2
Resolving Metric dimensions.
 
 Metric: HealthyHostCount
  Region: ap-southeast-2
batches: {
    "60": [
        {
            "MetricName": "HealthyHostCount",
            "Namespace": "AWS/ELB",
            "Unit": "Count",
            "Period": 60,
            "Statistic": "Average",
            "Dimensions": {
                "ap-southeast-2": [
                    [
                        "AvailabilityZone",
                        "ap-southeast-2a",
                        "se-2a"
                    ],
                    [
                        "AvailabilityZone",
                        "ap-southeast-2b",
                        "se-2b"
                    ]
                ]
            }
        }
    ]
}
Scheduled for 60 seconds interval processing....

This lists details of the Metrics being processed, in this example it shows just the contents of your metrics files, but when configured with the “_self” and “_instances” options will detail the resolved AWS instance ids.

The script batches up all of the metrics with the same period and processes them together. As we only have 1 metric, it is grouped into a 60 second polling interval. Once the 60 seconds have expired, the script will request the metric data from CloudWatch and publish it to Librato.
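
The grouping in the "batches" debug output above can be pictured with a few lines of Python. This is a sketch of the idea only, not the library’s actual Node.js code, and `batch_by_period` is a name I made up for it.

```python
from collections import defaultdict


def batch_by_period(metrics):
    """Group metric definitions by their Period, keyed as strings like the debug output."""
    batches = defaultdict(list)
    for metric in metrics:
        batches[str(metric["Period"])].append(metric)
    return dict(batches)


if __name__ == "__main__":
    metrics = [
        {"MetricName": "HealthyHostCount", "Period": 60},
        {"MetricName": "Latency", "Period": 60},
        {"MetricName": "CPUUtilization", "Period": 300},
    ]
    for period, batch in batch_by_period(metrics).items():
        print(period, [m["MetricName"] for m in batch])
```

Each batch can then be polled on its own timer, so 60 second metrics and 300 second metrics don’t get in each other’s way.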

You’ll see it working with some detail like:

****************************************
*  Retreiving metrics from CloudWatch  *
****************************************
HealthyHostCount for ["AvailabilityZone","ap-southeast-2a","se-2a"] ["2.0" Count]
HealthyHostCount for ["AvailabilityZone","ap-southeast-2b","se-2b"] ["2.0" Count]
 
 
****************************************
*    Processing and sending results    *
****************************************
 POST to librato [
    {
        "name": "HealthyHostCount",
        "value": "2.0",
        "source": "se-2a",
        "measure_time": 1360469280
    },
    {
        "name": "HealthyHostCount",
        "value": "2.0",
        "source": "se-2b",
        "measure_time": 1360469280
    }
]
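
The list posted to Librato is simple enough to build by hand. As a rough Python sketch (my own helper, not the library’s code), each CloudWatch reading becomes one entry with the four fields shown in the output above:

```python
import json


def to_librato_gauges(metric_name, readings, measure_time):
    """Turn (source, value) readings into entries shaped like the POST body above."""
    return [
        {
            "name": metric_name,
            "value": value,
            "source": source,
            "measure_time": measure_time,
        }
        for source, value in readings
    ]


if __name__ == "__main__":
    gauges = to_librato_gauges(
        "HealthyHostCount",
        [("se-2a", "2.0"), ("se-2b", "2.0")],
        measure_time=1360469280,
    )
    print(json.dumps(gauges, indent=4))
```

The ‘source’ is what lets Librato draw one line per availability zone under a single metric name.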

Once you have a couple of these posted to Librato, it’s time to go check the graphs.

Librato-graph

And that’s the lot.

And for a longer running example with some real data, this is what is possible:

Librato-AllStats

Sample SpringMVC starter application

Getting started on any project or technology is a surprisingly hard task. Once you have a codebase set up you can easily extend it by finding an existing example of how you have done something and copying from it. But when there is nothing, I find I spend a lot of time just getting some basic plumbing working.

We have been playing with a lot of different and new technology at work in the last month, and I want to ensure all of that research and learning is not quickly lost.

Tarcio has a quickstart application on GitHub, which inspired me to do the same.

I have published the first project, a very simple SpringMVC application and have added my first extension to it, a simple SiteMesh configuration, which is available as a branch off the base SpringMVC starter project.

I’m going to add some more branches relating to other complementary technologies soon:

  • Angular
  • Twitter Bootstrap
  • Hibernate
  • Backbone

Dipping a toe into git

I’ve run into some limitations of the Envers add-on to Hibernate that required me to dig into the Hibernate source code and look at pushing up some changes. Luckily Hibernate Core is now hosted on GitHub, which makes it easy to fork and post my updates.

My changes for Envers are outlined in this discussion.

So first step was to create my own fork of hibernate-core and then create a branch from the 4.0.0.Final tag on the hibernate-core repo.

Then cloning the repo to my local machine was as simple as:

git clone https://stevemac007@github.com/stevemac007/hibernate-core.git

A few simple commits later and I had a working solution.

Then my first git gotcha. I made a change and did a local commit, and then pushed the change upstream. But I had missed a file. That’s OK: from what I know you CAN amend the previous commit, and so I did. It committed locally fine, but when I tried to push the change back upstream it failed.

I’m not sure how I need to get out of this sticky situation so I have just deleted my local clone and re-cloned the repository.

After the fact I found that I might have been able to do this to fix my problem.

I’m going to create a new test branch and see how I can fix this, and will update back here with how I go.

Ubuntu server and ufw firewall

I think I have now standardised my server choice on Ubuntu Server. It seems to have everything I need available as packages and, more importantly, supports in-place version upgrades (and they actually work).

The other nice thing is the introduction of a simple firewall app to manage the iptables.

Only hardcore hackers and serious sys admins remember the syntax for adding rules to iptables, and even then I’m sure they still need to read the man page once in a while.

So let’s take a look at the new firewall config tool ‘ufw’. It’s probably not that new, but it’s new to me.

It makes it really easy to add simple rules, like
ufw allow http (allow http access)

but once you have a machine with multiple NICs it is still a bit complex to work out how to allow some traffic and not others.

This is the basic syntax to limit service access based on ip address.

ufw allow from 10.11.12.0/24 to any port 5222
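
If the /24 notation is unfamiliar, a quick way to check which source addresses that rule covers is Python’s standard ipaddress module. This is just a sanity check of the address range, not anything ufw itself runs, and `is_allowed` is a name I made up.

```python
import ipaddress

# The source range from the ufw rule above.
ALLOWED = ipaddress.ip_network("10.11.12.0/24")


def is_allowed(source_ip):
    """True if source_ip falls inside the range the rule allows."""
    return ipaddress.ip_address(source_ip) in ALLOWED


if __name__ == "__main__":
    for candidate in ("10.11.12.1", "10.11.12.200", "10.11.13.1", "192.168.1.10"):
        print(candidate, is_allowed(candidate))
```

So the rule opens port 5222 to anything in 10.11.12.0 through 10.11.12.255 and nothing else.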

Hopefully I’ll remember this note is here and check back later, and for anyone else who had luck searching for this stuff, I hope these comments were useful.