20 Feb 2017, 12:37

ancl connecting multiple models

One use case not previous documented is the one of connecting two models together. This can be used to cover some of the combinations of the same model applied to a node in different ways and node in multiple models and self-referential. In our previous models, we fully connected everything or made it orthogonal (“node in multiple models”). The key difference here is that all of the previous ones don’t fundamental change the model world - it’s a matter of making sure that a single model at a time is applied to the nodes.

This is a matter of connecting two models in a way that could be seen as making a new unified model.

Note: below, I’m going to be using two new notations: a. There’s an added yaml dictionary level for the model name. b. Some of the model details (specifically: target ingresses) may be purposefully left out.

The simple example of an app talking to Cassandra:

    egress: [[app,appapi]]
    ingress: {}
    egress: [[db,binary]]
      appapi: [8009,8009,"tcp"]
    egress: []
      binary: [9042,9042,"tcp"]
    egress: [[db,binary]]
    ingress: {}
    - [server,plain-gossip]
    - [remote-server,encrypted-gossip]
      binary: [9042,9042,"tcp"]
      plain-gossip: [7000,7000,"tcp"]
      encrypted-gossip: [7001,7001,"tcp"]
    - [local-server,encrypted-gossip]
      encrypted-gossip: [7001,7001,"tcp"]

appmodel and cassandra

One way to handle this is to not. Basically, avoid some of the overlaps here - e.g. lose the db role inside of the appmodel. Make it look like:

no overlap

Then you could say for role assignments:

appnode01: [appmodel::app,cassandra::client]
cassadranode01: [cassandra::local-server]

While this works, it doesn’t seem like a good idea. Without the additional role assignment context, you don’t know that the app node requires DB access. The model really is incomplete.

Another approach is to be able to link the two models - to say that nodes in one model are equivalent to nodes in another model.


There’s a couple of ways to do this.

  1. Connect the edge: The overlap of edges is to recognize that appmodel::app----binary---->appmodel::db::binary is the same as cassandra::client----binary---->cassandra::local-server::binary
  2. Connect the node pair: Similar but just say appmodel::app == cassandra::client and appmodel::db == cassandra::local-server

The first might be more accurate assessment of what it actually is, but it is more verbose and I’m not sure it actually achieves any real difference. So, just going with this, a linkage might look like:

- [app,client]
- [db,local-server]

This doesn’t look like good yaml. I’m not sure every parser can handle it, so this might change. The key is to be able to descern the tuple.


The interesting part is that this can be implemented without fundamentally changing anything previous. The only piece to add is to extend the roles automatically - wherever there’s a linkage, add the linkage role equivalent to the node’s roles.

appnode01: [appmodel::app]
cassandra01: [cassandra::local-server]


appnode01: [appmodel::app,cassandra::client]
cassandra01: [cassandra::local-server,appmodel::db]

Inferred model roles

Above, the db::binary and local-server::binary are redundant: the port information is defined twice. In reality, the appmodel is more concerned about the app itself and less concerned about the specifics of the DB; it just cares that there is one. In the case of knowing that there will be linkage, you could consider stubbing out the role that is going to be the crux of the linkage. This might look like:

    egress: []
      binary: false

From here, it says “there’s a db role, but it needs a linkage to make it concrete.”

Given that the stub isn’t really saying much that isn’t already implied by the egress on the “app” role, it’s possible to not even define the db role - not even having a “db” dictionary key - and inferring that there’s a “db” role. Similarly, since the “client” role only has the egress which is references elsewhere, it’s possible to infer that as well.

From an automatic aspect, this means that the role expansion could be limited. However, since both sides are likely to have something stubbed out or inferred, it could be easy for this to fall into the trap of not expanding either side and we’re back to step one. So, one side has to remain. Since the realizations are going to happen from the target side, I think it makes sense to keep the source side roles. This means the above roles would look like:

appnode01: [appmodel::app,cassandra::client]
cassandranode01: [cassandra::local-server]

The added bonus of the inferred roles is that they can be automatically replaced when processing. This might make realizing the configuration output a little bit easier. E.g. don’t end up with many incoming policies for the cassandra::local-server connection; end up with one (since all of the inferred target roles are not expanded) with multiple sources (those roles are expanded) for each linked context/model.

Self-referential update

This can be used to connect two cassandra data centers together (dc1,dc2). The linkage:

- [local-server,remote-server]
- [remote-server,local-server]

This is basically saying that in the dc1::cassandra context, dc2::cassandra::local-server is a remote-server, and vice-versa. So node assignments go from:

cassandra-dc01: [dc1::cassandra::local-server]
cassandra-dc02: [dc2::cassandra::local-server]


cassandra-dc01: [dc1::cassandra::local-server,dc2::cassandra::remote-server]
cassandra-dc02: [dc2::cassandra::local-server,dc1::cassandra::remote-server]

This seems to work since the linkage doesn’t create any redundant roles in this use case. That may fall down if there is, but I’m not seeing how that would come up just yet.

Not done yet?

The downside of this is that since no new actual model is defined, contexts aren’t automatically done. So, the above works for what it is, but the linkage can’t necessarily be promoted. It can be, but that might be overdoing what should be done. This has some impact on the linkage of models vs linkage of context+models.

So, I have to go model a bunch of stuff at work and see if this can express in a simple way everything that I need there.

18 Feb 2017, 15:03

ANCL: Use Cases

I had a chance recently to revisit ANCL in two ways. I recently had to compare some firewall rules between two different firewalls that were setup to mirror each other. The original firewall was not set up using any modeling of firewall rules so very much fell under the issues that I originally commented on. Lessons learned:

  1. It’s much easier to collect/reason about the communication pattern when you look at it from the “what do I need to talk to” perspective instead of the “what talks to me” perspective. People are blocked when their downstreams aren’t working and so have a bit more motivation to make sure those are well described.
  2. Even JSON is a bit more verbose than that I wanted to deal with when working on the rules in mass.
  3. To make it happen, I simplified and didn’t attempt to build out any kind of hierarchy or dependency. Even though some would stem from the same model, I manually created the instantiation of those models. I don’t have a good approach for this yet, but working through the concrete example gave me a better understanding.
  4. It’s easy to use IP address in the context of firewalls, and you can overlap anything that has that address or containing CIDR.
  5. Naming is hard (1): There seems to be a bit of redundancy with model roles. If I have service, what do I specify for clients and what do I specify for the port which the clients connect to? In both cases, I want to use “client”
  6. Naming is hard (2): It’s still not clear to me what to use to describe the generic descriptions (e.g. models), the components in those descriptions (e.g. roles), and the instances of items in those roles (e.g. nodes?). I keep using the term roles in the place of the nodes - I think.

Separately from the firewall, I’ve been looking at using this to help figure out overall communication matters. I’m trying to bridge together different applications running by different groups and using different interconnect mechanisms. I need to get quality and slot information for what’s talking to what. That’s got me thinking. A few more items to postulate:

  1. Mental exercise: How does routing information play into all of this? Does different routing affect how the models are structured?
  2. Mental exercise: Can I use the models to influence aggregation and reporting on netflow data. Each netflow entry could be associated with a specific model which gives a lot more context than protocol, port, and subnet which ends up being the bulk of what I usually see?
  3. Mental exercise: What’s it looks like to add additional information to each model? Not just “443/tcp” but also “100Mbps”, “TLS”, and “client certificate required”?
  4. Mental exercise: In the first model, I associated roles to specific IPs. What’s it look like when instead of IPs, I use AWS instance IDs, or AWS security group IDs, or container IDs, processes, etc?

So, there’s a lot more interesting stuff beyond just the firewall, and it’ll be interesting to see what comes up. But I still worry about the complexity, so I want to figure out ways to reduce that complexity.

The first one is to not have specific models (“this application’s Oracle DB”) for everything and to be able to use more generic models (“Oracle DB communication”). This means having an ability to reference a model. I’m still not sure how to do that. So, I’m trying to take a step back and come up with some use cases to help noodle through this. So with that in mind, the remainder of this is about examining that. I’m not committing to anything so you’ll see possibly a few implementations below.

Use Cases

Simple 3 Tier Application

This is your classic three tier application.


A sample general model could look like:

  - [web,webapi]
  ingress: []
  - [app,appapi]
    webapi: [443,443,"tcp"]
  - [db,sqlnet]
    appapi: [8009,8009,"tcp"]
  egress: []
    sqlnet: [1521,1521,"tcp"]

Shared DB

This is the case of the same model being applied in two different context with one overlapping resource. The example is a shared DB resource (here shared between dev and prod, but probably shared across multiple DB)

prod/dev share db

A fully expanded model could look like:

  - [db,sqlnet]
  ingress: {}
  - [db,sqlnet]
  ingress: {}
  egress: []
    sqlnet: [1521,1521,"tcp"]

However, in reality, there’s a base model which looks like just:

  - [db,sqlnet]
  ingress: {}
  egress: []
    sqlnet: [1521,1521,"tcp"]

The question is really about how to relate multiples together. Looking at roles:

prod-app: ["prod::app"]
dev-app: ["dev::app"]
db: ["prod::db","dev::db"]

This works in this simple example, but I’m not sure it covers everything (see below).

Same model applied to node as two different roles

This is one that masks quite a bit so it’s not clear what the perfect setup is. The simple case is that there’s a DB that serves sqlnet, but in turn also connects to other DBs using sqlnet (e.g. replication).

main-db -> ro1-db -> ro2-db

This could looks like:

  - [db-server,sqlnet]
  ingress: {}
  egress: []
    sqlnet: [1521,1521,"tcp"]

main-db: [main2ro1::db-server]
ro1-db: [main2ro1::db-client,ro12ro2::db-server]
ro2-db: [ro12ro2::db-client]

The “db-server” and “db-client” part feels a bit weird. I kinda want to just have “server” and “client” but then feel like I need another name hierarchy - e.g. “db::client” and “db::server” - so the roles would look like:

main-db: [main2ro1::db::server]
ro1-db: [main2ro1::db::client,ro12ro2::db::server]
ro2-db: [ro12ro2::db::client]

This looks ok, but there’s two concern for me:

  1. How many things are “client” or “server” so would there be a way to simplify that?
  2. Having to have a context for all of the directed pairings seems a bit overdone. Is there a way to simplify that?

The latter concerns me more. Maybe not using the pairwise, and looking at the context to be a bit more on the node (in this case) itself:

main-db: [main::db::server]
ro1-db: [main::db::client, ro1::db::server]
ro2-db: [ro1::db::client]

Node as Multiple models

This is the case of having a node participate in multiple models. The example is that the node is part of its main role (app or db), but it’s also being monitored and logged into (so, “adminee” controlled by “adminbox”).

(insert app/db models above)
    ssh: [22,22,"tcp"]
    snmp: [161,161,"udp"]
  egress: []
  ingress: {}
    - [adminee,ssh]
    - [adminee,snmp]

With an example of multiple roles put together:

prod-app: ["prod::app","adminee"]

Overlapping attributes

This is more of “if there’s overlapping attributes it needs to get the roles of any roles that match that attribute.” Simple example of overlapping IP addresses/CIDRs:

"": [adminbox]
"": [adminee]

In this case, has both [adminbox,adminee]

Self Referential

Some models are a bit self referential. Nodes of the same role will talk to each other (cluster members). Nodes of the corresponding role (cluster members in differnt subsections of the cluster) will talk to each other in another way. The post child for this is Cassandra:

Cassandra Fun

So, a model might look like:

  - [server,binary]
  ingress: {}
  - [server,plain-gossip]
  - [remote-server,encrypted-gossip]
    binary: [9042,9042,"tcp"]
    plain-gossip: [7000,7000,"tcp"]
    encrypted-gossip: [7001,7001,"tcp"]
  - [local-server,encrypted-gossip]
    encrypted-gossip: [7001,7001,"tcp"]

And the roles might look like:

app-dc1: [dc1::cassandra::client]
app-db2: [dc2::cassandra::client]
cass-dc1: [dc1::cassandra::local-server,dc2::cassandra::remote-server]
cass-dc2: [dc2::cassandra::local-server,dc1::cassandra::remote-server]

I’m actually surprised by this model. It seems to be one of the cleanest but it’s also pretty complex. Feels like a trap but I’m not seeing it yet.

Uh… distinct items?

I’m having trouble describing this one, and a bit about reasoning about it.

The general idea is that there are cases where you need to have a general pattern, but replicated a lot of times with specific contexts. The simply example would be to have 30 nodes - each of which have a self referential pattern that only refers to them. This is kinda like the Cassandra situation with the subtle distinction each Cassandra node talks to all other Cassandra nodes and in this case each node would only talk to itself. Effectively each node is its own context for a role (for as ugly as that sounds) that follows the pattern.

There’s two practical answers for this right now:

  1. Since it’s self-referential, it actually is unlikely to be needed to be defined (most people can talk to themselves and processes are probably listening on localhost anyways - which has overlapping IP space and thar be dragons with trying to reason down that one right now).
  2. You can enumerate each as a separate context - this seems like a workaround, but it at least allows for it, just not efficiently.

So, that may be enough of a starting point.


I think that’s enough for now. Definitely something to help ponder through all of this…

08 May 2016, 22:22

X-Forwarded-For: You keep using that word...

I recently had a discussion around the X-Forwarded-For header and common usage. This wasn’t the first time I’ve had the discussion, and probably won’t be the last. I’m going to jot down some thoughts for future me and future others.

For the record, this is the perspective from running online services - so the focus is on the incoming requests.

tl;dr: X-Forwarded-For is not as standard as people believe, and even where it is standard, it’s not the standard that people think it is. And don’t use it for server-to-server calls - just for proxies.


I have to ask three questions every time I see it in use:

  1. Can I believe that the client (or middleman) was truthful about the value?
  2. Can I believe that nothing in the middle messed up the handling of the value?
  3. Is the service making the call in just a “proxy”?

For the first one, let’s go back to the first rule of internet fight club - don’t trust anything from the client. This can easily be spoofed, so I have to remember to strip it at my front door.

As for the second question, let’s just say that the chances are not insignificant that something isn’t handling it correctly. The current RFCs for HTTP 1.1 (2616 and the new proposal 7230) both allow for multiple headers as long as the header value is a list:

A sender MUST NOT generate multiple header fields with the same field
name in a message unless either the entire field value for that
header field is defined as a comma-separated list...


Most make the assumption that it’s on one line, and it’s comma separated (and I even had a case where some assumed it was a single value). The truth of the matter is that there are plenty of bugs in well know projects which don’t handle this correctly. There are more bugs in internal code.

This second question might have been handled by rfc7239, but it seems to contradict rfc7230. On the one hand, it’s header format is no longer just a list; on the other hand, rfc7239 explicitly permits (CAN) multiple headers. So, my vote is still out here.

So, regarding the first two questions, at all of your inspection points (including the implicit ones which are easily overlooked), you have to make sure the XFF is being handled correctly. If not, it’s worse than useless; it’s dangerous.

The philosophical question…

The last question is a bit harder to noodle through. What does it mean to be “forwarded”? The context here is a lot of proxying of requests where it is expected that the proxy isn’t making a meaningful change to the request itself (adding some tracking headers, converting from HTTPS to HTTP, caching, etc).

This is different than the case where one service is calling another service. The source service isn’t really proxying the request as it’s making a new request on behalf of the client. Even in the newer RFCs, I haven’t found a clear definition of “proxy” so I don’t think there’s a formal answer.

This may be a subtle distinction, but the meaning has consequences for how you manage it. When doing controls, there usually can only be one source: one value that gets used to compare.

As a simple example, I only want requests from a specific geography to come in, and I’m servicing both clients and other services. I have to decide if I want that geo restriction to apply to the original client or the geo of the last connection, which could be a service. If I choose last connection, then I’m going to be shutting down a lot of clients in that geography because they are dependent on a service in another geography. If I choose clients, in addition to making sure the client info gets to me, I have to make sure that the services are good handling that split good/bad responses.

In the case of rate controls, I have to make the same decision, and that’s got its own issues that I probably want both - one rate control for end clients, and another loose rate control for partner services. Are you supposed to parse the XFF chain and figure out which came from which? Can you even apply some

This leads me to the mindset that XFF should only be used to show a source connection via proxy, and something else should be used for requests made as part of a services chain: “X-Requested-For” or something similar.

26 Feb 2016, 13:34

But I don't want multi-tenant networking

I’m in a bit of a conundrum at work.

It’s coming to the point where I need to put some formality around how everything talks to everything else. I’m merging three different network administrative domains (at least, there’s some partners that really are another curveball).

The question comes how - how do I bridge our internal network IP spaces?

I believe - and that’s a funny item to be looked at in a minute - that the principle of least surprise says that the network which engineers are wanting to use in the environment is one that is as flat as possible from the ip route perspective. It is the one where I can reach any other part of it (ignoring security policy) without having to think about it. When people ask “Can I get to that from here?” they usually are asking “Are there policy permits that let me get to there from here?” and not usually “Would a packet leaving me be routed in the right way to get there (and getting routes back)?”

Now, there is a new generation of engineers who are growing up “cloud native” and recognize that managing the IP space

This might just be me, so I should probably ask around… that’s a blog article on it’s own. Or holding this one…

Of course, there’s always IPv6. In theory, that solves everything. But that’s not a space that I’m going to see soon. Hopefully, I can be a midwife to usher that in, but that means handling both.

13 Feb 2014, 01:20

What does The Cloud(tm) mean?

The Cloud ™ - it’s a term that is far too encompassing of too many concepts.

At first, I thought the problem with describing it was that it was like the image of 10 blind people trying to say what an elephant was by each describing the one part they could feel. The more I think about it, that doesn.t even do it. The focus of that description is all about the “physical” description, but we’ve ascribed so much more into what we think of as The Cloud ™. Not only do we talk about what it is, but also what it can do, and what it can allow others to do. It’d be the same as trying to describe how an elephant herd interacts, or how the use of domesticated elephants affected agriculture or helped win a war.

In short, it’s impact is just as and probably more important than just what it is. So, let’s look at both of those in turn.

Physically, the cloud is a combination of the multiple *aaSes that exist, but largely focused on Software, Infrastructure, and Platform. Disclosure: In my realm, I end up interacting with the latter two, so this is largely concerned with those. To be clear, I say Infrastructure-aaS and mean any product which provides an abstraction of compute, storage and networking, which allows a user to obtain resources in a low latency (ideally sub-minutes given with self-service and API interfaces) SLA. PaaS is similar to the above but focuses on the application container (e.g. servlet engine, dynamic web server backend, database) instead of infrastructure components. The Cloud ™ can be public or private, it can be outsourced or internal, and it can even be service organizations in addition to true services.

We add confusion because all of these are “physical” descriptions, and so we tend to first compare on that level. Many look at The Cloud ™ as a single solution (most of the time, it’s AWS, but it can also happen on the other side with internal solutions). But really, we want to agree on what aspects of those solutions are important and the trade off that those require.

So what are those aspects? What can The Cloud ™ enable? Well, in not particular order, and definitely not complete:

  • It can be a cash flow offset. It allows you to focus on leveled burn (operational expenses) rather than big bang spends with depreciation (capital expenditures). How much this matters depends how your corporate finances are structured.

  • It can provide dynamic resource commitments. You can purchase resources for short term usages. The dynamic capability leads to a need for rapidly providing and taking those away. How much this matters depends on your duty cycle, your bursts, and what margins are like with the provider.

  • It can provide rapid global ramp up of resources. From the last point, where you get those dynamic resources, you can choose where they go. How much this matters depends on your ability to configure those resources rapidly and the global properties of your application, as well as the provider capabilities (e.g. points of presence).

  • It can be automation point. Not talk about The Cloud ™ can happen without some aspect of automation. Every cloud is built upon it. Every interaction asks the question “how can we automate it?” How much this matters - well, it just matters. Your ability to execute on this drives how helpful it is.

  • It can change the semantics of application deployments. You move from talking about a build of an application or code package, and towards building (at least for now) machine images or container images (with application and dependent code inside). How much this matters depends on how you do your application configuration.

  • It can change the semantics of host and system management. You move from talking about individual hosts to talking about abstract roles or clusters. See Pets and Cattle.

  • It can provide you a way to level your production. If you’re not familiar with Heijunka, it’s a way to smooth out the flow of invetory through the delivery pipeline. Virtual environments enable you to provide the just the right resources just in time, by taking larger undifferentiated resources and honing them into what you need. Previously, you had to be very targeted and keep a lot of pre-differentiated products that can be used when need be. This leveling helps speed everything up without keeping around too much inventory. How much this matters depends on how many different resource types you really need, and how much overhead you’re willing to take.

  • It can let users take care of themselves. It can provide self-service in very structured ways. You can replace people and teams and service catalogs with APIs. Replace is probably the wrong word as someone or something needs to handle the underlying infrastructure of the service, and the service itself becomes a very codified service catalog. How much this matters depends on the level of responsibility being expected and accepted by the service users.

  • It can transfer work and risk to a third party. You can outsource what you deem to be noncritical and/or commoditized aspects of your business to others. The funny thing about risk is that it is rarely actually transferred. How much this matters depends on how tolerant of risk you are, how much you can negotiate, and how well you can handle this internally.

Ultimately, it’s a matter of gaining some level of real or perceived efficiency. That efficiency can come in the form of economic (as in using for bursting, or cash flow changes), or in the form of faster changes, or in the form of shifting responsibilities, or probably others.

A lot of the above can be achieved without using The Cloud ™, and many of the aspects run counter to each other (e.g. virtualization overhead versus flexibility). All in all that makes it impossible to say that The Cloud ™ is goal. The goal is ultimately to make money, but the question is which aspect(s) of The Cloud ™ do the best to get you that?

26 Jan 2014, 22:00

Presenting at CascadiaIT 2014

Hey Kids! Big News!

I’ll be presenting at CascadiaIT 2014! I’ve got two items:

Registration is now open:


So, if you’re anywhere near the Seattle area the second weekend in March, come on down to CascadiaIT!

11 Jan 2014, 21:51

More than just Pets and Cattle

wIt’s been said many times many ways that cloud) servers should be treated like cattle, and not like pets. Looks like the first reference is Bias, but there are quite a few others: here here here here just the top ones on a google search. The main idea being that we had this tendency when the servers were fewer yet more longitudinal to treat them delicately: putting care and feeding into each of them; now that we (can) have large amounts of short lived instances, we can’t be bothered with the same care.

That’s a completely valid way of thinking (it’s a great place to be), so I’m curious as to where its limits are. In some ways, looking at just servers that way is looking at a point in time and capabilities and thought.

We’ve all had pet files. Remember that hand crafted config file that you spent days of your life tweaking to get it just right? Maybe it was specific to that host. At some point, you groomed it enough that it became a golden file for your entire environment and you could copy it and push it out to all of the other servers. Then you pushed it out using some higher level config management system. Then you moved up some semantic level and the file itself got abstracted into specific resources, and those were composited and pushed out. So, files started as pets, and by realizing that the file was only a model of something that we actually cared about, they moved to cattle.

Really, pets are pets because you’ve become attached to them - you can’t clone them, and it hurts to lose them. Cattle is cattle because it’s easy to get another and it’s not a big deal if you lose it. There’s a lot of different specific means to achieve these, but it’s these two fundamental classes of properties that enable this thinking:

1.) It’s easy to copy, and 2.) It’s easy to handle losing it (enter whatever you want to say about antifrigilness here).

But thinking about files and servers is so the 2000-noughts. What are our pets now?

Moving up from the server, is the cluster. Are clusters now the new pets? or can we treat them as cattle as well? Given sufficiently large IaaS services and strong configuration management systems and lots of variable substitution (well, probably more like locally realized global patterns), it’s actually fairly easy to fulfill property #1 above - copying. As for #2, if you have sufficient global load balancing of any form (DNS, anycast, etc), you can easily route traffic to working clusters, or more precisely, away from failing (lost) clusters.

So, pulling further out, our clusters collapse into a service. Is that our new pet? With even more config and *aaS and some client service discovery (aka any sufficiently advanced delivery model), you can certainly copy it. Though, if you lose your source code, it would definitely take a bit to reproduce the service (get all those coders together again, etc). What about losing it? Well, if you are a single feature service inside of a larger service, you might be able to be disabled, so you can lose it. But what about that larger service? I think for most businesses, you can’t just lose it.

So, that’s your pet.


(One could examine businesses and business models and plans and use the same comparisons, but I think this first point - what makes something pet versus cattle across various object domains is copying and dealing with lose - is done well enough, so my second point…).

There’s another way to slice (heh) this metaphor: milk. Not all cattle is used for steak. Some cattle is used to produce a product, bulked up again, then produce more of the same product. That cycle time might be a little too short, so the metaphor might make a little more sense by using different livestock - sheep. Some sheep are raise for mutton, some sheep are raise for wool (and yes, you can do both, but still). For the wool sheep, after the wool is reaped each year, you have to let it grow out again before you can reap it again, all the while caring for the sheep. The sheep itself stays around, but you continue to reuse it.

That being said, you can use other sheep for the same purpose because lots of wool is the same; and sheep have their own way to easily copy each other well enough.

But you still don’t really want to lose a sheep. You still gotta deal with it going away and getting the replacement there. The same really applies to larger services (or businesses) - maybe you can copy it, but you really don’t want to deal with it going away.

So, my second point is really that there’s a third category between pets (hard to copy, hard to deal with loss) and (steak) cattle (easy to copy, easy to deal with loss), and that’s of the milk cattle (easy to copy, but still hard to deal with lose). This last category by its very nature persists and is modified, rather than being destroyed and rebuilt each time. All of those things that we had to think about for when we wanted to change our pets still apply. Maybe it’s not to servers, but the lessons learned are still valuable.

And lastly, not everyone is there. And not everyone who is there is there for everything that they do (there’s probably a mix of services made of cattle and services made of pets in a lot of organizations). So don’t feel bad. Just figure out which one it should be and work to improve.

PS Interesting enough, if we do the combinations of the above, there’s the last class: a service which is hard to copy, but you can deal with failure. I’m not really sure what that looks like, so I’m going to leave it as an exercise for the reader. I’d be curious if anyone comes up with something interesting. Contact me.

23 Mar 2013, 12:14

DNS: Inverting the View

This is something that I’ve thought for a while, but am trying to get the backlog of thoughts out.

I’ll admit that I run bind for DNS because it’s a “safe” default. There’s many cons against it, but it does work well enough for many-many situations. But one of the issues that I’ve run into is that how views are managed are counter to how I would like them to be managed 99% of the time.

Specifically, the Zones are children of Views and when working with the underlying configuration files, you have to maintain two separate sets of files (or point to the same files and have the same data). I want it to be the opposite to some degree. I want to be able to maintain one zone file which mixes views with some markup on the records themselves.

This isn’t necessarily a bind thing; though, bind certainly sets the stage for others. It matches most DNS servers because it matches how the mental model of DNS Zone delegation goes, but that may not be how the mental model of record management in the context of views should be. There may also be technical limitation, if you’re mixing zone delegation of children zones (e.g. record bad in zone child.example.com) in one view and child records that existing directly . But honestly, in those cases, I think it goes back to the “that’s an antipattern that you really shouldn’t be doing even when you think you should be doing” that gave rise to linting programs, so I’m going to discount that a bit.

The key here is that when I work on records that are split DNS, I’m doing it on a per-record basis. I have to ask the question when I’m looking at the record “where should this be visible from?” In the context of the per-record, it’s much easier to answer. When I look at an entire zone, it’s a PITA to say “where should all of these be visibile from?”

Another side effect we end up with is that due to this extra overhead of managing the split zones, we end up with some intenal.example.com zone. That “internal” starts polluting everything. Yes, it’s ugly, but the problem is more than just aesthetics. If you want to move a host from one side to the other, or have a host respond to both sides, or have any SSL certificates anywhere, or not expose that you have a secret “internal” domain, or avoid any of a whole host of problems because DNS is so critical to how we run networks, then, well, then you’re sunk.

Again, this isn’t necessarily limited to bind, and honestly, how bind or any other DNS server implements it doesn’t really matter. The DNS server should be responsible for serving out queries. A separate service should be responsible for the management of DNS. Everyone wants a nice pretty (for whatever your definition of pretty is) administrative interface. That is something that is separate and builds on top of it. Realistically, the admin interface can be very different from the configuration files. That’s part of the whole point, isn’t it? The semantics and languaged used by the admin interface get mapped into different configuration formats. If you want to add a new DNS service, all you have to do is figure out the mapping. The biggest mistake we make sometimes is directly implementing the configuration format of the specific DNS service back in the administrative interface. While it’d be nice if they match up, it’s more critical to have the administrative interface match the model that you believe is the most appropraite for the people using it, and sometimes that doesn’t match the configuration files.

Along those lines, although I like bind from a practical standpoint, I also like the make of tinydns, largely because it does this well. In tinydns configuration, you name your views:


Then you can follow up with associating the actual records with each view:


It’s all in one nice neat location to look at.

There’s three extensions that I’d make here:

  1. The location part should be a tag - so you can have multiple locations specified in one line. Yes, this is an edge case since most people only deal with two views (internal and external), but it does have meaning if you have more than 2 views. For instance, if you have public, sitea, and siteb, and don’t want “bothsites” to be public:

  2. The location indicator for tinydns is really an indicator of the view that you want it to be present in, for whatever properties, of which clientip/location is only one, apply to the view. Basically, I’m arguing that tiny-dns should support equivalents to the match-destinations parameters of bind*.

  3. Add some magic for GeoDNS and let it be its own “location” tag. This is some syntactic sugar that makes it easier to do since it’s so common today and interacts interestingly with views.

* Actually, I’m not arguing that tinsdns should support that, just like I’m not saying that bind needs to support per-record view attributes. When it comes down to it, if the proper way for tinydns to handle the match-destionations parameter is to run multiple tinsdns instances bound to different destinations, that’s perfectly fine. It’s the administrative wrappers or management interface around the DNS service itself that needs to support this semantic, and then map that out to the DNS service.

19 Mar 2013, 00:20

Firewall Rules from Models

I’m trying to put a little more meat to the bones of my ANCL discussion. Unfortunately, I can’t say that I have a tool for it all, yet, but there’s a bit more of the theoretical basis for it. Some of the key requirements forming around ANCL are:

  • It abstracts the components involved - roles are used instead of source IPs and destionation IPs.

  • It generalizes the communication patterns - models are built using the roles; these models can be used in different locations by identifying what composes the roles.

  • It can compose multiple models together - since the models are composed of roles, using the same role in multiple models connects them together (caveat in the second side note).

I was reviewing some slidedecks and came across Ript which has a good presentation. It is a powerful abstraction over iptables. Bonus: It’s abstraction can even be carried to other stateful packet filters, software or hardware.

Comparing it to ANCL, there’s several key differences:

  • A partition is not a connection of models, but a combination of them. It’s an administrative domain which lumps multiple unrelated connections together. This is by far the biggest difference and is a matter of the mental models.

  • The labels aren’t the same as roles. They’re good for identifying, but they aren’t then instantiated. There’s a single mapping, when multiple would be useful. This seems like an implementation detail that could be easily addressed.

  • It’s focus is on specific instantiations of the models. While the rules would be portable to different devices, this limits its ability to be ported to separate but similar situations. This seems like it is a matter of changing conventions (e.g. the naming of the labels), but might be more.

The first item is what it really comes down to is that these are two different mental and description models for the problem. Ript is a firewall abstraction language, and not necessarily an application communication description language. That being said, there are a great many other pieces of Ript, not the least of it being something concrete, that make it very useful and means I have some work here to move ANCL to something that can compare.

There’s also two parts that Ript has also incorporated which I’ll be honest that I don’t know how to incorporate into ANCL:

  • NATs and SNATs and other translations. In many ways, these are not critical to the application communication pattern, at least, not critical at an abstract level. It’s when it’s applied in specific contexts where translations come into play.

  • “Bad Guy” Rejects. Like transactions, these aren’t critical to application communication patterns - in fact, these are anti-communication patterns. Necessary at times, but not something that are accounted for when building the patterns.

The side note from above is that the roles aren’t completely what are connected to each other. Consider two models:

  1. A three tier application model which has a “webserver” role, an “application” role, and a “DB” role.

  2. A DB model which has an “application” role and a “DB” role.

As the application owner of the first model, I would elaborate the nodes in the “application” role. Most likely, the “DB” would be specified by the DBA. Conversely, as the DB owner, I’d elaborate the nodes in the “DB” role, and leave the “application” role to someone else. In either case, half of the equation is left unfilled. There’s (at least) two ways to approach this:

  1. Since the key is to connect the two edges for the “application”-“DB” together, the real role association happens there. So, what language should be used to describe these “edge roles”?

  2. The ambiguity can be taken away if we insert “connection points” or “service” points into each model. In this case, in the application model, we replace the “DB” role with a “DB” connection point, and in the DB model, we replace the “application” role with a “DB” connection point. When joining these models together, we overlay the connection points.

Not sure which is the right way, so that’ll probably comes down to implementation.

13 Mar 2013, 20:38

A tale of two PaaSes

I spend a good amount of time trying to figure out if my operational team can do much to make the general engineering efforts more productive. We’ve followed the usual turns around self-service IaaS and the like, and we’re now exploring the next level of Platforms-as-a-Service. In exploring the options, I’m seeing two large patterns.

On one hand, there are the “middleware centric and injection based” PaaS models. These are the ones where the developer picks a development middleware (Java Servlet, PHP, Node, Rails, etc), and adds other parts in. As if by an after thought, a static file service is added, or maybe a data persistance (i.e. database) service is added. On a implementation level, these usually involve allocating some compute and storage resource (e.g. a VM), installing the middleware container, doing a baseline install of the add-ons, and starting them all up inside that VM. There are some other configuration items such as pointing it at some version control repository, but also the developer is able to login onto the VM via shell.

On another hand, there’s a “service focused” PaaS model. This feels like the lesser named PaaS, though it probably has a larger install because this is the model that AWS largely is. In this case, the developer picks different service components (e.g. DB, cache, messaging bus, etc) and composites them a bit more independently. Underneath the control layer, each of the component providers can implement their services in different ways - using different VMs, processes, or internal containers (e.g. DB schemas w/ authnz) - based on what makes sense for that provider. There’s more work for the developer here as they have to compose services across different providers, and the developer doesn’t have direct access to the underlying system, but in exchange, might have better options.

From an implementor’s perspective, I think the service focused model is easier to maintain. This may not necessarily be the right reason to go down that route, but when it comes to delivery, that matters a lot. It’s also a bit more transportable - at this point in the industry’s lifecycle, it’d be easier to migrate from one IaaS or traditional Infrastructure to another. It’s also easier to extend this model to other (traditional?) services such as monitoring. You can see this in the industry - there’s many different service providers focusing on a narrow niche offering around one specific service, but fewer middleware centric vendors and even those that exist tend to also include some service based model for the add-ons.

As I said, most of the traditional services called PaaS are the form. So, what makes the application middleware so much different than a data store? or a caching layer? Fundamentally, you have some level of “service” which you want to present a clean interface to. This is true for the database as well as for a java servlet container, yet somehow we treat them a little differently in our heads. The only reason I can imagine is that is where time is spent. As a developer, I spend most of my time in the code, so that’s where my mind goes. But while I run it, I want to have a better idea of how it fits in with the other component services.

I think the vote is still out on which way has better long term viability. And it may never be decided on. It may just be a matter of preference.

Maybe PaaS isn’t the right term for this second model. These are more Services-as-a-Service, which seems likely to be a great way to confuse people. Mmaybe they’re more along the lines of Infrastructure, and are just a different take on that. I’ll admit, I’m not sure what the right way to refer to them is, but I believe that the use case that they present is more than just an implementor’s fancy. It’s a valid use case based on how developers are expecting to work with it.