
Query the data center like you query the Web


Say you want to query thousands of systems in your data center for something – e.g. the string “9.22.33.4”. Maybe you want to know what systems might be impacted if you were to change this IP address somewhere, like in a firewall rule. How would you implement it?

Most people would send this query to agents running on each of the thousands of computers, have them execute the query locally by inspecting their machine’s state, and have the results shipped back. Not only is this a terribly clunky approach in practice, it also scales poorly as the number of systems grows. Your query latency is gated by the slowest machine in your data center – you have to wait until every machine responds before you have your answer. And what if one of the machines is wedged and its response never comes back; how long should you wait?

Now let us change the context completely. Say you want to query millions of sites on the Web for the string “9.22.33.4”. How would you implement it?

This is a no-brainer. You query a central index, not the individual web sites. The index is constantly fed by crawlers that scan every web site periodically to extract changes made to that site since the last crawl. And here is the key: the crawlers have no knowledge of what queries will be asked of the index. Your query latency is independent of the current state of every website.

This approach is not only scalable, it also enables a more intuitive human interface. It is scalable because (a) crawling is a non-intrusive task (unlike running an agent inside a machine), enabling web sites to be monitored frequently enough to keep the index continuously refreshed, and (b) the data extraction and indexing process is decoupled from the query handling process, enabling each to be optimized independently. By decoupling queries from the crawling, there is no requirement to tune the query format to suit the needs of the data crawler – which in turn allows the query interface to be designed for human consumption, and the crawler interface to be designed for machine consumption.

Search engines like Google, Bing, and Yahoo are able to keep the index remarkably close to the real-time state of billions of web sites, debunking the myth that such an approach risks having the index become too stale to support real-time situational awareness requirements.

So, how can we query the data center like we query the Web?

We must begin by re-thinking how systems are monitored. In an earlier post I talked about “introspection” as an alternative way to monitor the real-time state of a system without the use of in-system agents. Introspection provides the foundation for building a new kind of “crawler”, one that continuously indexes the state of systems in a data center, similar to the way a Web crawler works on documents and web sites. This is because introspection enables crawling systems without disrupting their operation in any way.

In essence, introspection enables us to think about a running system as a series of point-in-time snapshots, where each snapshot is a document containing the metadata about that system’s state extracted by the crawler at a particular point in time. If you think about the system as a movie, you can think about this document as a frame. Frames are literally just documents. You can imagine translating all sorts of useful system state into a simple JSON dictionary for example, that would look something like this:

{
  '_frame': {
    JSON entry with timestamp and other metadata
  },
  'file': {
    one JSON entry per monitored file
  },
  'process': {
    one JSON entry per running process
  },
  'connection': {
    one JSON entry per open connection
  },
  'package': {
    one JSON entry per installed package
  },
  ...
}
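
To make this concrete, here is a minimal, purely illustrative sketch (in Python) of what a crawler might do to assemble such a frame. It assumes the guest’s file system is exposed read-only at some mount point (for example via a disk snapshot); the paths, field names, and mount point below are invented for illustration and are not the actual Origami crawler or frame schema.

import json, os, time

def crawl_frame(guest_root):
    # Assemble one point-in-time "frame" for a guest whose file system is
    # exposed read-only at guest_root (e.g. a mounted disk snapshot).
    # Illustrative only -- not the real Origami crawler or frame format.
    frame = {
        '_frame': {'timestamp': time.time(), 'guest_root': guest_root},
        'file': {},
        'process': {},
        'connection': {},
        'package': {},
    }
    # Example: capture a couple of interesting configuration files.
    for rel in ('etc/hosts', 'etc/resolv.conf'):
        path = os.path.join(guest_root, rel)
        if os.path.isfile(path):
            with open(path, errors='replace') as f:
                frame['file']['/' + rel] = f.read()
    # Process and connection entries would come from memory introspection,
    # and package entries from the guest's package database; both elided here.
    return frame

# One frame is just a document; serialize it for the indexer.
print(json.dumps(crawl_frame('/mnt/guest-snapshot'), indent=2))
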

This is the “frame” output by every crawl of a system: it is the document you have to index to provide a Google-like query interface. And yes, the query response can return faceted search results, rank-ordered by various heuristics that make intuitive sense in a data center context. Your mind immediately jumps to abstractions that are familiar in the Web search domain. Few tools to manage data centers look anything like this today – they are made for consumption by skilled IT Ops people, not regular humans like the rest of us. Why must this be so?
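
To sketch the index side of the idea (again illustrative only, in plain Python rather than a real search backend), a toy inverted index over frame documents is enough to answer the opening question – which systems mention “9.22.33.4”? – without touching a single live machine. The system names and frame contents below are invented.

import re
from collections import defaultdict

# token -> set of system names whose frames contain that token
index = defaultdict(set)

def index_frame(system, frame):
    # Flatten a frame document and add every token in it to the inverted index.
    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                walk(key)
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)
        else:
            for token in re.findall(r'[\w.]+', str(node)):
                index[token].add(system)
    walk(frame)

# Crawlers push frames into the index as they go (contents invented here)...
index_frame('web-01', {'connection': {'c1': {'remote': '9.22.33.4:443'}}})
index_frame('db-07', {'file': {'/etc/firewall.rules': 'allow from 9.22.33.4'}})

# ...and the query touches only the index, never the live systems.
print(sorted(index['9.22.33.4']))   # -> ['db-07', 'web-01']

The point of the sketch is the decoupling: the crawler knows nothing about the query, and the query knows nothing about how the frames were gathered.
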

The Origami project, which my team has been working on for the last couple of years, has been exploring this very question: why can’t systems in the data center be queried and indexed like documents on the Web? In fact, the state of many websites changes faster than the state of your typical production server, and yet we get reasonably good real-time query results from the index. There really is no good reason why these two worlds have to be so far apart.

Introspect, don’t monitor, your VMs


When you need to observe the state of a running computer, what do you do? If it’s just a few computers, you simply ssh into each one and inspect its state from the command line. If you have a lot of computers, you use an agent that locally monitors each computer’s state and periodically ships the results to you. In either case, you make a key assumption: that your inspection logic has to run within the OS context of the running machine in order to observe its state.

There are a number of problems with this approach, and most people are oblivious to them. Let’s take the case where you need to monitor thousands of computers – a modest data-center-scale operational environment. Virtually anyone managing systems at this scale or higher uses some kind of in-system monitoring solution. Data center monitoring is a billion-dollar business – it provides what Ops teams refer to as “situational awareness”, the lifeblood of data center operations.

Problem 1: When a system becomes unresponsive, so does the monitoring agent running within it

This is a far more common situation than you might think. Systems can become (intermittently) unresponsive for any number of reasons. A process may be thrashing the disk, or memory, or both, starving your monitoring agent of the cycles it needs to function. A system update may have modified a library on which your monitoring agent depends, causing the agent itself to malfunction or crash.

Google had such an outage in 2011, when graphs showing the global health of Google vanished from internal dashboards, pagers fell silent, and ssh stopped working. Because the paging system was itself affected, many site reliability engineers had no idea a major outage was in progress. The root cause turned out to be an accidental change to the permission flags of the file /lib/x86_64-linux-gnu/ld-2.15.so (the Linux dynamic loader needed to launch dynamically linked user programs) from -rwxr-xr-x to -rw-r--r--. As a result, user processes on these systems, including Google’s own monitoring agents, failed to start. That google.com did not fail despite what was apparently an outage that impacted “between 15 and 20% of Google’s production serving machines” is a testament to its resilient design. But it highlights the problem of relying on in-system monitoring agents for understanding system health.

Problem 2: In-system monitoring agents are vulnerable to security attacks

Monitoring agents are the foundation of another billion-dollar industry: data center security. Yet in-system security monitoring agents are themselves vulnerable to malicious attacks and accidental disruption, like any other process running inside these systems. That does not give a warm and fuzzy feeling.

Problem 3: In-system monitoring agents impact system performance

Every vendor or individual that writes a “lightweight monitoring agent” promises that the agent’s own operation will not impact the performance of the system it is monitoring. But agents are, after all, just another piece of software running in a complex system environment. It is not uncommon to encounter situations where the culprit for poor performance is the monitoring agent itself.

Ultimately, any monitoring logic (even the commands you type when you ssh into a running computer) has side-effects that you are not fully aware of. This is the classic Heisenberg effect: the very act of monitoring the system affects the state you are trying to monitor. Most people dismiss this as a problem not worth thinking about, until their systems become heavily loaded. It is usually under peak loads that the impact of monitoring agents becomes most noticeable. And that is just when the monitoring data they provide is most necessary.

Introspection: an alternative way to observe system health

Virtualization enables a different way to observe the state of a running system. Introspection refers to the process of inspecting the state of a VM from outside the guest OS context. This is fundamentally different from monitoring, in that there is no monitoring logic running within the VM.

Introspection of a VM’s file system and memory state is possible by leveraging the VMM (virtual machine monitor, aka hypervisor) that interposes between the VM’s guest state and its mapping to the underlying physical host state. The key question is: can out-of-VM introspection approach the robustness and fidelity of conventional in-VM monitoring?

There are two parts to this problem: (a) real-time introspection of guest file-system state, and (b) real-time introspection of guest memory. Each has its own challenges. Note that if we remove the “real-time” requirement, many good solutions already exist. For instance, backup solutions are now available that use VM disk snapshots to perform continuous backup without the use of an in-system backup agent. A benefit of this approach is that you do not have to schedule a downtime window for backing up your VMs (though your VMs may be briefly stunned while the guest OS quiesces any uncommitted state to disk, prior to taking the snapshot).
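
As a concrete taste of the file-system side, here is a small sketch using the open-source libguestfs Python bindings (not our own tooling) to read guest state from a disk image entirely outside the guest OS context; the image path is hypothetical.

import guestfs

# Inspect a guest's file system from outside the guest OS context.
# The disk image path is made up; opening it read-only means nothing
# running inside the guest is touched or disturbed by this code.
g = guestfs.GuestFS(python_return_dict=True)
g.add_drive_opts('/var/lib/libvirt/images/guest.qcow2', readonly=1)
g.launch()

root = g.inspect_os()[0]     # assume a single detected guest OS root
g.mount_ro(root, '/')

# Read guest state without any in-guest agent.
print(g.cat('/etc/hostname'))
print(g.ls('/etc/cron.d'))

g.umount_all()
g.shutdown()
g.close()

Near Field Monitoring goes well beyond this kind of one-shot inspection – it streams file-system and memory state continuously – but the principle is the same: no monitoring code runs inside the guest.
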

My team and I have been experimenting with a technique we refer to as “Near Field Monitoring”, that uses VM file-system and memory introspection as an alternative to in-system monitoring agents. After two years of R&D, and many dead-ends, I am proud to say we now have these techniques working in the IBM Research Compute Cloud (RC2) production environment.

The following publications give the technical details of our approach. The work was done with CMU and the University of Toronto, and originally started during summer internships by Wolfgang Richter and Sahil Suneja with my team at IBM Watson Labs. In subsequent posts, I will dive deeper into the technical intricacies of these two papers. The key result here is that introspection-based “Near Field Monitoring” techniques can approach the robustness and fidelity of in-system monitoring agents for most common monitoring tasks we have studied. This makes them a viable contender to disrupt the in-system monitoring model in widespread use today.

References

[1] Agentless Cloud-wide Streaming of Guest File System Updates. Wolfgang Richter (Carnegie Mellon University), Canturk Isci (IBM Research), Jan Harkes and Benjamin Gilbert (Carnegie Mellon University), Vasanth Bala (IBM Research), and Mahadev Satyanarayanan (Carnegie Mellon University). Best Paper Award, IEEE International Conference on Cloud Engineering, Boston, MA, March 2014.

[2] Non-intrusive, Out-of-band and Out-of-the-box Systems Monitoring in the Cloud. Sahil Suneja (University of Toronto), Canturk Isci (IBM Research), Vasanth Bala (IBM Research), Eyal de Lara (University of Toronto), and Todd Mummert (IBM Research). ACM SIGMETRICS, Austin, TX, June 2014.