23 Aug 2014, 21:24

Why do I do DefCon?

For some reason, I’m in a reflective mode, so this entry is going to be a bit on the narcissistic and cathartic side. Feel free to pass on this one as it’s mostly for me.

I’ve just spent the last few weeks doing a round of conferences and running around. This is that fine time of year for me where I’m away more than I’m present. It definitely takes its toll, both physically and mentally. A lot of the time, I don’t get to see talks, I don’t end up at parties (my choice), I don’t end up passed out drunk (only so much you can do when you limit yourself to froufrou drinks), and I don’t end up with anything interesting to show for it. At the same time, I put up with some assholes, and some people who aren’t too sure what they’re doing (to put it nicely).

So why do I do it?

I have to admit that I started it just to be a part of it. To be able to say that I was big and badass. I’ve always liked the cyberpunk motif, and being a part of it is quite a feeling. It’s not necessarily the best reason, but I have to be honest with myself and others about why.

As I said, I’m not really in it for the parties, or the debauchery (well, I could be talked into that one ;)), or the interaction with thousands of people, or the purposeful trolling.

For me, the best reason I can think of is that it’s really about the doing. Seeing what you can accomplish. Digging under the surface and seeing more than what others see. Finding out. Exploring. Basically, a bunch of the fun stuff that people ascribe to the “hacker ethos.” I consider myself a lifetime learner and tinkerer, and this reinforces it. From the world-class expert telling the world what she knows, to the newbie who is just there to find out what he can find, it’s all there.

I think I finally heard it this year. Not sure why it took me this long, but there’s a focus on The Community. DefCon is a conference that is better because of the attendees. If you show up, there’s a good chance that you’re going to be participating, and not just watching. If you’re just watching, you’re watching the other attendees as much as any organizer. As organizers, we’re there to create the space so that you can give all you want to give.

We want you to have fun.

I want to have fun. And I have fun by doing. So, that’s why I do DefCon.

I think one of the big things for me to work on over the coming years is to be more a part of The Community, and the family. Not sure how to make that happen, but the first step is recognizing it.

One last call-out. To my fellow goons, those that I know and those that I don’t, I salute you. Every one of you impresses me and inspires me to be a better goon.

27 Jul 2014, 00:13

Stepping Up the San Diego Computer Tech Community

I was at a tech conference this week, talking with some people. Most of them were former San Diego people who are now in SF or Austin. It came up that SD is lacking in the size of its computer tech community. Sure, there’s a lot of good stuff in biotech, but if you look at the number of computer startups, we don’t have the same feel as LA or NY (I’m not even including SF, as it’s a different category). There are 17 meetups in just the next week for “Tech,” and 110 groups in the area. Is it enough? I can’t say how active they are, but 17 seems like a low number of meetings. So, I think there is some truth to SD not being in the same league, but I think it’s also something we can change.

Though it’s nice to be a big fish, even in a small pond, one of the better ways to improve is to surround yourself with people who will bring you up. There’s some adage about not being the biggest fish in the pond.

Personally - leaving SD isn’t a huge option for me, and I’m still iffy on telecommuting. So, I’m left with making SD a better place. “Left with” is probably the wrong phrase, with its negative connotation. The reality is that this is just the place I am right now. So, all of that behooves me to make SD a better place for this.

So, “Why is SD like this?” Part of me wants to avoid that question completely, but you can’t really address a problem unless you identify it. So, here are my reasons:

  • Location - The county is spread out a bit, so it’s hard to build momentum when you can’t even build it in one place.

  • Support - Most of the big supporters are companies and, currently, those support what SD is known for: biotech and military.

  • Topics/Discussion - “There are few people willing to lead a discussion.” This is self-fulfilling crap. Momentum builds, but you gotta get it going first.

  • Ganas - There’s a concern that the culture here just isn’t as go-getting as SF or other places. That seems to put us into a Fixed Mindset rather than a Growth Mindset. But mindsets can change.

What can we do for each of these? In the groups that I’m working with (largely, LOPSA-SD and the SD Cassandra User Meetups):

Location: I think this is a matter of being a bit flexible (stealing this from some comments on SF Perl Mongers - interesting to see them talking about how to address not being as involved as they’d like). To that, I’m gonna start by making physical presence a bit optional. I think getting some video recordings or live streaming going is step 1. Beyond that, playing with the actual location or timing will probably help us find the right fit. I think long term it’s best to have some stability, but in the short term we probably have to experiment a bit to see what the community wants.

Support: I know that some company support has been coming. It’s time to extend this by reaching out to other groups and seeing what we can do to support each other. Maybe offering some goods and services, or a location, or joint meetings. I’m not sure, but this is an area to take a look at. Reaching out will be fun, since I know I’m an introvert.

Topics/Discussion: This one is kinda making me scratch my head. There’s so much out there. I think the key is to just take some topics and run with them.

Ganas: I really do think that there are enough people in SD who want to be out there, exploring and learning new things. The key here will be convincing others that they are interested, and addressing what’s in it for them. Still some work to sort out here.

Yeah - those second two are fairly lacking in concrete steps, so I’ll have to come back to those at a later point.

Fundamentally, I think it’s about creating a learning community (a la learning organizations). I want people to feel safe and encouraged to explore and improve. Yeah, it helps me, but in the end it also is the right thing to do.

12 Jul 2014, 23:44

OSCON 2014 - Full Stack Development

In other news, I’m giving a tutorial on the Go language at OSCON. This is a big one for me, so I’m hoping it goes well.

As part of the lead-up to it, they sent a call out to the speakers asking us to comment on “full-stack development.” This was supposed to be submitted as a video, but I didn’t get to it in time. The prompt was open ended, so I’m not sure if it was meant to be about the meaning of full-stack development or its impact. So, here are some quick thoughts on it.

Fundamentally, being able to address the full stack is about autonomy. Much has been done to show that people are driven by feelings of autonomy, mastery, and purpose (see Daniel Pink). The idea is that if we feel like we own the work, we are more motivated to complete it. I think we can all think of personal projects we’ve been caught up in and put extra effort into. Part (yeah, only part) of what drives us is that we get to set the terms of the project - how it’s conceived, how it’s developed, and how it’s delivered.

In many organizations, much of this work has been given to specialized functional units, so it’s harder to have ownership of those sections. Thinking about the full stack lets us insert ourselves into those parts of the project. Basically, being a full-stack developer gives you more chances to get that autonomy back.

13 Feb 2014, 01:20

What does The Cloud(tm) mean?

The Cloud™ - it’s a term that encompasses far too many concepts.

At first, I thought the problem with describing it was that it was like the image of ten blind people trying to say what an elephant is by each describing the one part they can feel. The more I think about it, that doesn’t even do it justice. The focus of that description is all about the “physical” description, but we’ve ascribed so much more into what we think of as The Cloud™. Not only do we talk about what it is, but also what it can do, and what it can allow others to do. It’d be the same as trying to describe how an elephant herd interacts, or how the use of domesticated elephants affected agriculture or helped win a war.

In short, its impact is just as important as, and probably more important than, what it is. So, let’s look at both of those in turn.

Physically, the cloud is a combination of the multiple *aaSes that exist, but largely focused on Software, Infrastructure, and Platform. Disclosure: in my realm, I end up interacting with the latter two, so this is largely concerned with those. To be clear, by Infrastructure-aaS I mean any product which provides an abstraction of compute, storage, and networking, and which allows a user to obtain resources with low latency (ideally a sub-minute SLA, with self-service and API interfaces). PaaS is similar, but focuses on the application container (e.g. servlet engine, dynamic web server backend, database) instead of infrastructure components. The Cloud™ can be public or private, it can be outsourced or internal, and it can even be service organizations in addition to true services.

We add confusion because all of these are “physical” descriptions, and so we tend to compare on that level first. Many look at The Cloud™ as a single solution (most of the time it’s AWS, but it can also happen on the other side with internal solutions). But really, we want to agree on which aspects of those solutions are important and the trade-offs that those require.

So what are those aspects? What can The Cloud™ enable? Well, in no particular order, and definitely not complete:

  • It can be a cash flow offset. It allows you to focus on a leveled burn (operational expenses) rather than big-bang spends with depreciation (capital expenditures). How much this matters depends on how your corporate finances are structured.

  • It can provide dynamic resource commitments. You can purchase resources for short-term usage. The dynamic capability leads to a need for rapidly provisioning and releasing those resources. How much this matters depends on your duty cycle, your bursts, and what margins are like with the provider.

  • It can provide rapid global ramp up of resources. From the last point, where you get those dynamic resources, you can choose where they go. How much this matters depends on your ability to configure those resources rapidly and the global properties of your application, as well as the provider capabilities (e.g. points of presence).

  • It can be an automation point. No talk about The Cloud™ can happen without some aspect of automation. Every cloud is built upon it. Every interaction asks the question “how can we automate this?” How much this matters - well, it just matters. Your ability to execute on this drives how helpful it is.

  • It can change the semantics of application deployments. You move from talking about a build of an application or code package, and towards building (at least for now) machine images or container images (with application and dependent code inside). How much this matters depends on how you do your application configuration.

  • It can change the semantics of host and system management. You move from talking about individual hosts to talking about abstract roles or clusters. See Pets and Cattle.

  • It can provide you a way to level your production. If you’re not familiar with Heijunka, it’s a way to smooth out the flow of inventory through the delivery pipeline. Virtual environments enable you to provide just the right resources just in time, by taking larger undifferentiated resources and honing them into what you need. Previously, you had to be very targeted and keep a lot of pre-differentiated products around that could be used when needed. This leveling helps speed everything up without keeping around too much inventory. How much this matters depends on how many different resource types you really need, and how much overhead you’re willing to take.

  • It can let users take care of themselves. It can provide self-service in very structured ways. You can replace people and teams and service catalogs with APIs. Replace is probably the wrong word, as someone or something needs to handle the underlying infrastructure of the service, and the service itself becomes a very codified service catalog. How much this matters depends on the level of responsibility being expected and accepted by the service users. (There’s a small sketch of what this looks like after this list.)

  • It can transfer work and risk to a third party. You can outsource what you deem to be noncritical and/or commoditized aspects of your business to others. The funny thing about risk is that it is rarely actually transferred. How much this matters depends on how tolerant of risk you are, how much you can negotiate, and how well you can handle this internally.
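
To make the self-service point a bit more concrete, here’s a minimal sketch of what “replacing a service catalog with an API” looks like from the user’s side. Everything here is hypothetical - the endpoint, the fields, the function name - any real provider has its own equivalents.

    import json
    import urllib.request

    # Hypothetical self-service endpoint; a stand-in for whatever your
    # IaaS/PaaS actually exposes.
    API = "https://cloud.example.com/v1/instances"

    def request_instance(role, size, region):
        """What used to be a ticket to a team is now a structured request."""
        body = json.dumps({"role": role, "size": size, "region": region}).encode()
        req = urllib.request.Request(API, data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read())  # e.g. {"id": ..., "state": "building"}

    # instance = request_instance("webserver", "small", "us-west")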

Ultimately, it’s a matter of gaining some level of real or perceived efficiency. That efficiency can be economic (as in bursting, or cash flow changes), or come in the form of faster changes, or shifted responsibilities, or probably others.

A lot of the above can be achieved without using The Cloud™, and many of the aspects run counter to each other (e.g. virtualization overhead versus flexibility). All in all, that makes it impossible to say that The Cloud™ is the goal. The goal is ultimately to make money; the question is which aspect(s) of The Cloud™ do the best to get you there?

26 Jan 2014, 22:00

Presenting at CascadiaIT 2014

Hey Kids! Big News!

I’ll be presenting at CascadiaIT 2014! I’ve got two items on the schedule.

Registration is now open:

http://casitconf.org/casitconf14/registration-is-now-open/

So, if you’re anywhere near the Seattle area the second weekend in March, come on down to CascadiaIT!

11 Jan 2014, 21:51

More than just Pets and Cattle

It’s been said many times, many ways, that (cloud) servers should be treated like cattle, and not like pets. It looks like the first reference is Bias, but there are quite a few others: here, here, here, here - just the top ones on a Google search. The main idea is that when servers were fewer and longer-lived, we had a tendency to treat them delicately: putting care and feeding into each of them; now that we (can) have large amounts of short-lived instances, we can’t be bothered with the same care.

That’s a completely valid way of thinking (it’s a great place to be), so I’m curious as to where its limits are. In some ways, applying it to just servers reflects one point in time, in capabilities, and in thought.

We’ve all had pet files. Remember that hand crafted config file that you spent days of your life tweaking to get it just right? Maybe it was specific to that host. At some point, you groomed it enough that it became a golden file for your entire environment and you could copy it and push it out to all of the other servers. Then you pushed it out using some higher level config management system. Then you moved up some semantic level and the file itself got abstracted into specific resources, and those were composited and pushed out. So, files started as pets, and by realizing that the file was only a model of something that we actually cared about, they moved to cattle.

Really, pets are pets because you’ve become attached to them - you can’t clone them, and it hurts to lose them. Cattle is cattle because it’s easy to get another and it’s not a big deal if you lose it. There’s a lot of different specific means to achieve these, but it’s these two fundamental classes of properties that enable this thinking:

1.) It’s easy to copy, and 2.) It’s easy to handle losing it (insert whatever you want to say about antifragility here).

But thinking about files and servers is so the 2000-noughts. What are our pets now?

Moving up from the server is the cluster. Are clusters now the new pets? Or can we treat them as cattle as well? Given sufficiently large IaaS services and strong configuration management systems and lots of variable substitution (well, probably more like locally realized global patterns), it’s actually fairly easy to fulfill property #1 above - copying. As for #2, if you have sufficient global load balancing of any form (DNS, anycast, etc.), you can easily route traffic to working clusters, or more precisely, away from failing (lost) clusters.

So, pulling further out, our clusters collapse into a service. Is that our new pet? With even more config and *aaS and some client-side service discovery (aka any sufficiently advanced delivery model), you can certainly copy it. Though, if you lose your source code, it would definitely take a bit to reproduce the service (get all those coders together again, etc.). What about losing it? Well, if it’s a single-feature service inside of a larger service, it might simply be disabled, so you can lose it. But what about that larger service? I think for most businesses, you can’t just lose it.

So, that’s your pet.

Maybe.

(One could examine businesses and business models and plans and use the same comparisons, but I think this first point - that what makes something pet versus cattle across various object domains is copying and dealing with loss - is done well enough, so on to my second point…)

There’s another way to slice (heh) this metaphor: milk. Not all cattle are used for steak. Some cattle are used to produce a product, bulked up again, and then produce more of the same product. That cycle time might be a little too short, so the metaphor might make a little more sense with different livestock: sheep. Some sheep are raised for mutton, some sheep are raised for wool (and yes, you can do both, but still). For the wool sheep, after the wool is shorn each year, you have to let it grow out again before you can shear it again, all the while caring for the sheep. The sheep itself stays around, but you continue to reuse it.

That being said, you can use other sheep for the same purpose, because lots of wool is the same; and sheep have their own way of copying themselves well enough.

But you still don’t really want to lose a sheep. You still gotta deal with it going away and getting the replacement there. The same really applies to larger services (or businesses) - maybe you can copy it, but you really don’t want to deal with it going away.

So, my second point is really that there’s a third category between pets (hard to copy, hard to deal with loss) and (steak) cattle (easy to copy, easy to deal with loss), and that’s the milk cattle (easy to copy, but still hard to deal with loss). This last category by its very nature persists and is modified, rather than being destroyed and rebuilt each time. All of those things that we had to think about when we wanted to change our pets still apply. Maybe not to servers, but the lessons learned are still valuable.

And lastly, not everyone is there. And not everyone who is there is there for everything that they do (there’s probably a mix of services made of cattle and services made of pets in a lot of organizations). So don’t feel bad. Just figure out which one it should be and work to improve.

PS: Interestingly enough, if we do the combinations of the above, there’s one last class: a service which is hard to copy, but whose failure you can deal with. I’m not really sure what that looks like, so I’m going to leave it as an exercise for the reader. I’d be curious if anyone comes up with something interesting. Contact me.

23 Mar 2013, 12:14

DNS: Inverting the View

This is something that I’ve thought about for a while; I’m just now trying to get the backlog of thoughts out.

I’ll admit that I run bind for DNS because it’s a “safe” default. There are many cons against it, but it does work well enough for many, many situations. But one of the issues I’ve run into is that the way views are managed is counter to how I would like them to be managed 99% of the time.

Specifically, zones are children of views, and when working with the underlying configuration files, you have to maintain two separate sets of files (or point both views at the same files and serve the same data). I want it to be the opposite, to some degree: I want to maintain one zone file which mixes views, with some markup on the records themselves.
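
For illustration, here’s roughly the shape of that in named.conf (a minimal sketch, not a complete config): the zone statement is repeated under each view, each pointing at its own copy of the data.

    view "internal" {
        match-clients { 10.0.0.0/8; };
        zone "example.com" {
            type master;
            file "internal/db.example.com";  // one set of files
        };
    };

    view "external" {
        match-clients { any; };
        zone "example.com" {
            type master;
            file "external/db.example.com";  // and a second, parallel set
        };
    };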

This isn’t necessarily a bind thing, though bind certainly sets the stage for others. It matches most DNS servers because it matches the mental model of DNS zone delegation, but that may not be the right mental model for record management in the context of views. There may also be technical limitations if you’re mixing delegation of child zones (e.g. delegating child.example.com) in one view with child records that exist directly in the parent zone in another view. But honestly, in those cases, I think it goes back to the “that’s an antipattern that you really shouldn’t be doing even when you think you should be doing it” category that gave rise to linting programs, so I’m going to discount that a bit.

The key here is that when I work on records that are split DNS, I’m doing it on a per-record basis. I have to ask the question, when I’m looking at the record, “where should this be visible from?” In the context of the single record, it’s much easier to answer. When I look at an entire zone, it’s a PITA to say “where should all of these be visible from?”

Another side effect is that, due to this extra overhead of managing the split zones, we end up with some internal.example.com zone. That “internal” starts polluting everything. Yes, it’s ugly, but the problem is more than just aesthetics. If you want to move a host from one side to the other, or have a host respond on both sides, or have any SSL certificates anywhere, or not expose that you have a secret “internal” domain, or avoid any of a whole host of problems because DNS is so critical to how we run networks, then, well, then you’re sunk.

Again, this isn’t necessarily limited to bind, and honestly, how bind or any other DNS server implements it doesn’t really matter. The DNS server should be responsible for serving out queries. A separate service should be responsible for the management of DNS. Everyone wants a nice, pretty (for whatever your definition of pretty is) administrative interface. That is something separate that builds on top of it. Realistically, the admin interface can be very different from the configuration files. That’s part of the whole point, isn’t it? The semantics and language used by the admin interface get mapped into different configuration formats. If you want to add a new DNS service, all you have to do is figure out the mapping. The biggest mistake we make sometimes is directly implementing the configuration format of the specific DNS service back in the administrative interface. While it’d be nice if they match up, it’s more critical to have the administrative interface match the model that you believe is the most appropriate for the people using it, and sometimes that doesn’t match the configuration files.

Along those lines, although I like bind from a practical standpoint, I also like the approach of tinydns, largely because it does this well. In the tinydns configuration, you name your views:

    %internal:10
    %external:

Then you can follow up with associating the actual records with each view:

    +www.example.com:1.2.3.4:::external
    +www.example.com:10.11.12.13:::internal

It’s all in one nice neat location to look at.

There are three extensions that I’d make here:

  1. The location part should be a tag - so you can have multiple locations specified in one line. Yes, this is an edge case, since most people only deal with two views (internal and external), but it does have meaning if you have more than two views. For instance, if you have public, sitea, and siteb, and don’t want “bothsites” to be public:

    +www.example.com:1.2.3.4:::external
    +www.example.com:10.11.12.13:::sitea,siteb
    +bothsites.example.com:10.11.12.14:::sitea,siteb
    
  2. The location indicator for tinydns is really an indicator of the view that you want the record to be present in, where client IP/location is only one of the properties that can define the view. Basically, I’m arguing that tinydns should support equivalents to the match-destinations parameters of bind*.

  3. Add some magic for GeoDNS and let it be its own “location” tag. This is some syntactic sugar that makes it easier to do since it’s so common today and interacts interestingly with views.

* Actually, I’m not arguing that tinydns should support that, just like I’m not saying that bind needs to support per-record view attributes. When it comes down to it, if the proper way for tinydns to handle the match-destinations parameter is to run multiple tinydns instances bound to different destinations, that’s perfectly fine. It’s the administrative wrapper or management interface around the DNS service itself that needs to support this semantic, and then map it out to the DNS service.

19 Mar 2013, 00:20

Firewall Rules from Models

I’m trying to put a little more meat on the bones of my ANCL discussion. Unfortunately, I can’t say that I have a tool for it all yet, but here’s a bit more of the theoretical basis for it. Some of the key requirements forming around ANCL are:

  • It abstracts the components involved - roles are used instead of source IPs and destination IPs.

  • It generalizes the communication patterns - models are built using the roles; these models can be used in different locations by identifying what composes the roles.

  • It can compose multiple models together - since the models are composed of roles, using the same role in multiple models connects them together (caveat in the side note below).

I was reviewing some slide decks and came across Ript, which has a good presentation. It is a powerful abstraction over iptables. Bonus: its abstraction can even be carried to other stateful packet filters, software or hardware.

Comparing it to ANCL, there are several key differences:

  • A partition is not a connection of models, but a combination of them. It’s an administrative domain which lumps multiple unrelated connections together. This is by far the biggest difference and is a matter of the mental models.

  • The labels aren’t the same as roles. They’re good for identifying, but they aren’t then instantiated. There’s a single mapping, when multiple would be useful. This seems like an implementation detail that could be easily addressed.

  • Its focus is on specific instantiations of the models. While the rules would be portable to different devices, this limits their ability to be ported to separate but similar situations. This seems like a matter of changing conventions (e.g. the naming of the labels), but might be more.

What the first item really comes down to is that these are two different mental and descriptive models for the problem. Ript is a firewall abstraction language, and not necessarily an application communication description language. That being said, there are a great many other pieces of Ript - not least that it is something concrete - that make it very useful, and that means I have some work to do here to move ANCL to something that can compare.


There are also two parts that Ript has incorporated which, I’ll be honest, I don’t know how to incorporate into ANCL:

  • NATs and SNATs and other translations. In many ways, these are not critical to the application communication pattern, at least not at an abstract level. It’s when the pattern is applied in specific contexts that translations come into play.

  • “Bad Guy” Rejects. Like translations, these aren’t critical to application communication patterns - in fact, these are anti-communication patterns. Necessary at times, but not something that is accounted for when building the patterns.


The side note from above is that the roles aren’t completely what get connected to each other. Consider two models:

  1. A three tier application model which has a “webserver” role, an “application” role, and a “DB” role.

  2. A DB model which has an “application” role and a “DB” role.

As the application owner of the first model, I would elaborate the nodes in the “application” role. Most likely, the “DB” role would be specified by the DBA. Conversely, as the DB owner, I’d elaborate the nodes in the “DB” role, and leave the “application” role to someone else. In either case, half of the equation is left unfilled. There are (at least) two ways to approach this:

  1. Since the key is to connect the two edges for the “application”-“DB” together, the real role association happens there. So, what language should be used to describe these “edge roles”?

  2. The ambiguity can be taken away if we insert “connection points” or “service” points into each model. In this case, in the application model, we replace the “DB” role with a “DB” connection point, and in the DB model, we replace the “application” role with a “DB” connection point. When joining these models together, we overlay the connection points.

Not sure which is the right way, so that’ll probably come down to implementation.
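
Since ANCL doesn’t have a concrete syntax yet, here’s a hypothetical sketch (in Python, with invented names) of the second approach: models expose named connection points, and composing two models overlays points with matching names to produce role-to-role flows.

    from dataclasses import dataclass, field

    # All names here are invented for illustration; ANCL has no tool yet.

    @dataclass(frozen=True)
    class Role:
        name: str

    @dataclass(frozen=True)
    class ConnectionPoint:
        name: str  # the service being offered/consumed, e.g. "DB"

    @dataclass
    class Model:
        name: str
        flows: list = field(default_factory=list)  # (src, dst, port) triples

    # Three-tier application model: the app owner elaborates webserver and
    # application; the DB side is left as a connection point.
    app_model = Model("three-tier", flows=[
        (Role("webserver"), Role("application"), 8080),
        (Role("application"), ConnectionPoint("DB"), 5432),
    ])

    # DB model: the DBA elaborates the DB role and offers the same point.
    db_model = Model("db-service", flows=[
        (ConnectionPoint("DB"), Role("DB"), 5432),
    ])

    def compose(a, b):
        """Overlay connection points with matching names: a flow into a point
        continues as the flow out of the same point in the other model."""
        everything = a.flows + b.flows
        flows = []
        for src, dst, port in everything:
            if isinstance(dst, ConnectionPoint):
                for src2, dst2, port2 in everything:
                    if isinstance(src2, ConnectionPoint) and src2.name == dst.name:
                        flows.append((src, dst2, port2))
            elif not isinstance(src, ConnectionPoint):
                flows.append((src, dst, port))
        return Model(a.name + "+" + b.name, flows)

    # compose(app_model, db_model).flows ==
    #   [(Role("webserver"), Role("application"), 8080),
    #    (Role("application"), Role("DB"), 5432)]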

13 Mar 2013, 20:38

A tale of two PaaSes

I spend a good amount of time trying to figure out if my operational team can do much to make the general engineering efforts more productive. We’ve followed the usual turns around self-service IaaS and the like, and we’re now exploring the next level of Platforms-as-a-Service. In exploring the options, I’m seeing two large patterns.

On one hand, there are the “middleware centric and injection based” PaaS models. These are the ones where the developer picks a development middleware (Java Servlet, PHP, Node, Rails, etc.), and adds other parts in. Almost as an afterthought, a static file service is added, or maybe a data persistence (i.e. database) service. On an implementation level, these usually involve allocating some compute and storage resource (e.g. a VM), installing the middleware container, doing a baseline install of the add-ons, and starting them all up inside that VM. There are some other configuration items, such as pointing it at a version control repository, but the developer is also able to log in to the VM via a shell.

On the other hand, there’s the “service focused” PaaS model. This feels like the lesser-named PaaS, though it probably has a larger install base, because this is the model that AWS largely is. In this case, the developer picks different service components (e.g. DB, cache, messaging bus, etc.) and composes them a bit more independently. Underneath the control layer, each of the component providers can implement their services in different ways - using different VMs, processes, or internal containers (e.g. DB schemas w/ authnz) - based on what makes sense for that provider. There’s more work for the developer here, as they have to compose services across different providers, and the developer doesn’t have direct access to the underlying system, but in exchange, they might have better options.
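
To make the contrast concrete, here’s a hypothetical sketch of the developer’s view in each model. All names and endpoints are invented for illustration; real platforms have their own shapes for this.

    # Middleware centric: one container you own, with add-ons injected in.
    middleware_paas_app = {
        "runtime": "java-servlet",
        "repo": "git@git.example.com:myapp.git",
        "addons": ["postgresql", "static-files"],  # installed alongside you
        "shell_access": True,                      # you can log in to the VM
    }

    # Service focused: independent components, composed by the developer.
    service_paas_app = {
        "web":   {"provider": "compute.example.com", "image": "myapp-1.2"},
        "db":    {"provider": "db.example.com",      "endpoint": "db-42.db.example.com:5432"},
        "cache": {"provider": "cache.example.com",   "endpoint": "c-7.cache.example.com:6379"},
        # wiring the pieces together is the developer's job here
    }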

From an implementor’s perspective, I think the service-focused model is easier to maintain. This may not necessarily be the right reason to go down that route, but when it comes to delivery, that matters a lot. It’s also a bit more transportable - at this point in the industry’s lifecycle, it’d be easier to migrate from one IaaS or traditional infrastructure to another. It’s also easier to extend this model to other (traditional?) services such as monitoring. You can see this in the industry - there are many different service providers focusing on a narrow niche offering around one specific service, but fewer middleware-centric vendors, and even those that exist tend to also include some service-based model for the add-ons.

As I said, most of the traditional services called PaaS are of the first form. So, what makes the application middleware so much different than a data store? Or a caching layer? Fundamentally, you have some level of “service” which you want to present a clean interface to. This is true for the database as well as for a Java servlet container, yet somehow we treat them a little differently in our heads. The only reason I can imagine is that it is where the time is spent. As a developer, I spend most of my time in the code, so that’s where my mind goes. But when I run it, I want to have a better idea of how it fits in with the other component services.

I think the jury is still out on which way has better long-term viability. And it may never be decided. It may just be a matter of preference.

Maybe PaaS isn’t the right term for this second model. These are more Services-as-a-Service, which seems likely to be a great way to confuse people. Maybe they’re more along the lines of Infrastructure, and are just a different take on that. I’ll admit, I’m not sure what the right way to refer to them is, but I believe that the use case they present is more than just an implementor’s fancy. It’s a valid use case based on how developers are expecting to work with it.

02 Oct 2012, 23:52

Monitoring Discipline 1: Meter

In the last post, I talked about four different monitoring disciplines. In this one, I’m talking about the first: Meters; and trying to distinguish some common Meter patterns.

I picked Meter as the first one for two reasons:

  • It is the first one in the pipeline. It tends to be the closest to the feature being monitored.
  • It tends to be the mechanism that I (and I believe many others) think about first.

The Meter is the basis of measurement. Whether it’s counting the number of bytes going through a network interface, a recording of events generated from SNMP traps, a log watcher, or a stream of metrics coming out of an application, the Meter is any item that can be measured or status-checked, plus the path for sending the measurements to the consumers who are interested in them. These are two fundamental and distinct sides. In earlier days, they were typically intermingled in one feature or tool, but now technologies allow for different distinctions between them.

There are two main categories of data types for Meters: numeric and event-based. The primary difference is between looking at a specific value (or an aggregate of values) and looking at a specific event. Numeric values have a continuous meaning - the reading at time t will have a different but just as significant meaning as the reading at time t+1. Events, on the other hand, are discrete in time - if I receive an event at time t, I may not know the state of the system at t+1 (sometimes I can do another poll, sometimes I can’t).

The Meter can use any number of mechanisms:

  • SNMP Poll.
  • JMX Poll.
  • SNMP Trap.
  • Reviewing application logs and counting ERROR lines.
  • An application that exports metrics as a stream to some endpoint.
  • An application that sends events to a collector via syslog, a chat protocol, an HTTP POST to a web service, or a generic mailbox mechanism.
  • An application that sends events via a message queue or bus to a collector.
  • A status/transactional check that grabs a web page and checks it for validity.

We tend to want to turn Meters into a push-versus-pull argument, but it ends up being a false dichotomy. In many cases it depends on the perspective. An application which emits metrics will feel as if it’s pushing those data points out, but the collection engine that uses them may pull those data points via JMX calls.
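
As a tiny illustration of that perspective point, here’s a minimal sketch (Python, standard library only; the collector URL is a hypothetical stand-in) of one Meter working both ways: the same counter can be polled over HTTP by a collector, or pushed to one on an interval.

    import json
    import threading
    import time
    import urllib.request
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # One numeric Meter: a monotonically increasing counter.
    metrics = {"requests_total": 0}
    lock = threading.Lock()

    def do_work():
        """Simulate the application doing work and metering it."""
        while True:
            with lock:
                metrics["requests_total"] += 1
            time.sleep(0.1)

    # Pull side: a collector polls us (the SNMP/JMX-poll shape).
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            with lock:
                body = json.dumps(metrics).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)

    # Push side: send readings to a collector on an interval (the
    # trap/stream shape). collector.example.com is a stand-in.
    def push_loop(interval=10):
        while True:
            time.sleep(interval)
            with lock:
                body = json.dumps(metrics).encode()
            req = urllib.request.Request(
                "http://collector.example.com/ingest", data=body,
                headers={"Content-Type": "application/json"})
            try:
                urllib.request.urlopen(req, timeout=5)
            except OSError:
                pass  # a real Meter would buffer, or count the drop

    if __name__ == "__main__":
        threading.Thread(target=do_work, daemon=True).start()
        threading.Thread(target=push_loop, daemon=True).start()
        HTTPServer(("", 8000), MetricsHandler).serve_forever()

The same process is both pushing (to the collector) and being pulled (by whoever polls port 8000); which verb applies depends on where you stand.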

The key with Meter is identifying what you want to be watching. Everything else is plumbing (as if saying that makes it all easier).

Note: While it can’t be discounted, the storage engine is not necessarily a part of the Meter. In many cases, the storage could also be used by the other disciplines. Its capabilities impact Meters, but it is a technology area more than an area of monitoring work.