This repository has been archived by the owner on Oct 5, 2018. It is now read-only.
bdha edited this page Nov 9, 2011 · 2 revisions

Overview

  • Client hits the API with a request
  • The request goes into the MQ
  • A Master Control Program runs on each Compute Node, and is subscribed to MQ
  • If the MCP determines its Compute Node has the resources, it takes the message and services the request
  • MCP hits the local Compute Node API and creates the zone.
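The flow above can be sketched roughly as follows. This is a single-process illustration, not Sprawl's implementation: the `ZoneRequest` and `ComputeNode` classes, their fields, and the `dispatch` function are all hypothetical, and an in-memory `queue.Queue` stands in for the real MQ, where each MCP would be an independent subscriber.

```python
import queue
from dataclasses import dataclass
from typing import List


@dataclass
class ZoneRequest:
    """Hypothetical request shape; the real message format is not specified."""
    name: str
    cpus: int
    ram_mb: int


@dataclass
class ComputeNode:
    """Hypothetical view of one Compute Node's free capacity."""
    hostname: str
    free_cpus: int
    free_ram_mb: int

    def can_host(self, req: ZoneRequest) -> bool:
        # The MCP's resource check: does this node have room for the zone?
        return req.cpus <= self.free_cpus and req.ram_mb <= self.free_ram_mb

    def create_zone(self, req: ZoneRequest) -> str:
        # Stand-in for the call to the local Compute Node API.
        self.free_cpus -= req.cpus
        self.free_ram_mb -= req.ram_mb
        return f"{req.name}@{self.hostname}"


def dispatch(mq: queue.Queue, nodes: List[ComputeNode]) -> List[str]:
    """Drain the queue; the first node with enough capacity takes each request."""
    placed = []
    while not mq.empty():
        req = mq.get()
        for node in nodes:
            if node.can_host(req):
                placed.append(node.create_zone(req))
                break
    return placed


mq = queue.Queue()
mq.put(ZoneRequest("web01", cpus=2, ram_mb=4096))
mq.put(ZoneRequest("db01", cpus=8, ram_mb=16384))
nodes = [ComputeNode("cn1", 4, 8192), ComputeNode("cn2", 16, 32768)]
placements = dispatch(mq, nodes)
```

In this sketch a request that no node can host is silently dropped; a real deployment would presumably leave it on the queue or report the failure.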

Examples

API

Message Queue

Codex

  • Contains information about customers, users.
  • Contains information about node configuration and zone state.
  • Sprawl Machine Manifest
  • Contains configuration information
  • When a system performs an update that requires an update in the Codex, an update request is submitted to the MQ
  • The Codex consumes the message and updates the requesting host.
  • Periodically does sanity checking to ensure the Catalog on each node corresponds to information in the Codex.
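The periodic sanity check could look something like the comparison below. It assumes, purely for illustration, that both the Codex and a node's Catalog can be reduced to a mapping of zone name to state; `find_drift` and the sample data are hypothetical.

```python
def find_drift(codex: dict, catalog: dict) -> dict:
    """Compare the Codex's record of a node's zones with the node's Catalog."""
    missing = sorted(set(codex) - set(catalog))   # in the Codex, not on the node
    unknown = sorted(set(catalog) - set(codex))   # on the node, not in the Codex
    mismatched = sorted(
        zone for zone in set(codex) & set(catalog) if codex[zone] != catalog[zone]
    )
    return {"missing": missing, "unknown": unknown, "mismatched": mismatched}


# Hypothetical sample data: zone name -> state.
codex_view = {"web01": "running", "db01": "running", "cache01": "stopped"}
catalog_view = {"web01": "running", "db01": "stopped", "tmp01": "running"}
drift = find_drift(codex_view, catalog_view)
```

Any non-empty list in the result would mark the node as needing reconciliation.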

Resource Catalog

  • A local sqlite database on each node, which contains node/zone configuration and metrics.
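As a rough idea of what such a database might hold, here is a minimal sketch using Python's `sqlite3` module. The table and column names are assumptions; the page does not define the Catalog's schema.

```python
import sqlite3

# Hypothetical schema: the page does not define the Catalog's tables.
SCHEMA = """
CREATE TABLE zones (
    name   TEXT PRIMARY KEY,
    state  TEXT NOT NULL,
    cpus   INTEGER NOT NULL,
    ram_mb INTEGER NOT NULL
);
CREATE TABLE metrics (
    zone        TEXT NOT NULL REFERENCES zones(name),
    metric      TEXT NOT NULL,
    value       REAL NOT NULL,
    recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""

conn = sqlite3.connect(":memory:")  # a real node would use a file on disk
conn.executescript(SCHEMA)
conn.execute("INSERT INTO zones VALUES (?, ?, ?, ?)", ("web01", "running", 2, 4096))
conn.execute(
    "INSERT INTO metrics (zone, metric, value) VALUES (?, ?, ?)",
    ("web01", "cpu_pct", 12.5),
)
conn.commit()

# Total RAM already committed to zones on this node.
allocated_ram = conn.execute("SELECT SUM(ram_mb) FROM zones").fetchone()[0]
```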

Chef Integration

  • A Sprawl Machine Manifest contains metadata for Chef, including a list of roles and recipes for each node and zone.

Progress

  • Sprawl Feature Map

Design

Stack

  • A specific configuration of management and compute nodes, typically consisting of a 42U rack.
  • A stack has public/private/admin networks.

Compute Node

  • A physical server, hosting many zones.
  • Runs SmartOS, which boots via PXE.
  • A Compute Node will need to advertise what kind of storage it has attached to it, in addition to its available CPU and RAM resources.
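One possible shape for such an advertisement is a small JSON document. The field names and the `build_advertisement` helper are hypothetical; the storage class names are borrowed from the Storage section of this page.

```python
import json


def build_advertisement(hostname, free_cpus, free_ram_mb, storage_classes):
    """Serialize what a Compute Node can offer (hypothetical message shape)."""
    return json.dumps(
        {
            "hostname": hostname,
            "free_cpus": free_cpus,
            "free_ram_mb": free_ram_mb,
            "storage": sorted(storage_classes),
        },
        sort_keys=True,
    )


advert = build_advertisement("cn1", 12, 49152, {"system", "bulk", "fast"})
```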

Zone

  • Containers living on a Compute Node.
  • Leg in the private network.
  • Customer services live in zones.
  • Zones can either be native (Solaris) or Virtual Machines (using KVM).

Admin Zone

  • Each customer gets a VPN server to access their private admin network.
  • VNC and system serial consoles are accessible here.

Chef Zone

  • Each customer gets a Chef Server.
  • When a new zone is instantiated, roles can be passed to the host.

Management

API Servers

Boot Servers

  • Live on the admin network.
  • Dedicated to each stack.
  • Each stack has redundant boot servers, which serve SmartOS to each Compute Node via PXE.

Router Zones

  • Live on each Compute Node.
  • Leg in the public/private networks.
  • Many VNICs into the private network (one for each customer).
  • Manage the mapping between public/private IPs when a public IP is allocated to a zone.
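The mapping a Router Zone maintains might be modelled as a simple one-to-one table. This is only a sketch of the bookkeeping, assuming a fixed pool of public addresses; the `IpMapper` class is hypothetical and does not configure any actual NAT rules.

```python
import ipaddress


class IpMapper:
    """Hypothetical one-to-one public/private mapping table for a Router Zone."""

    def __init__(self, public_pool):
        self._free = [ipaddress.ip_address(ip) for ip in public_pool]
        self._by_private = {}

    def allocate(self, private_ip):
        """Return the public IP for a zone, assigning one if it has none yet."""
        private = ipaddress.ip_address(private_ip)
        if private in self._by_private:
            return self._by_private[private]
        if not self._free:
            raise RuntimeError("no public IPs left in the pool")
        public = self._free.pop(0)
        self._by_private[private] = public
        return public

    def release(self, private_ip):
        """Drop a zone's mapping and return its public IP to the pool."""
        public = self._by_private.pop(ipaddress.ip_address(private_ip))
        self._free.append(public)


# Documentation-range addresses, used here only as sample data.
mapper = IpMapper(["203.0.113.10", "203.0.113.11"])
public_ip = mapper.allocate("10.0.0.5")
```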

Master Control Program

  • Runs in the Global Zone on each Compute Node.
  • MCP manages the creation of zones.
  • MCP ensures the Compute Node has required System Templates.

System Template

  • A ZFS dataset which contains a copy of SmartOS.
  • A ZFS volume which contains a copy of an OS for KVM.
  • System Templates are stored as files on an HTTP server, so it is trivial to wget | zfs recv them onto a Compute Node at instantiation time.
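If the MCP were to drive that pipeline from code, it might look like the sketch below. The helper names, URL, and dataset name are hypothetical; only the `wget -q -O -` and `zfs recv` invocations are real commands. `install_template` is defined but not called here, since it needs a host with ZFS.

```python
import subprocess


def template_commands(url, dataset):
    """Return the argv lists for `wget -q -O - URL | zfs recv DATASET`."""
    fetch = ["wget", "-q", "-O", "-", url]
    receive = ["zfs", "recv", dataset]
    return fetch, receive


def install_template(url, dataset):
    """Stream a template from HTTP straight into ZFS without a temp file."""
    fetch_cmd, recv_cmd = template_commands(url, dataset)
    fetch = subprocess.Popen(fetch_cmd, stdout=subprocess.PIPE)
    recv = subprocess.Popen(recv_cmd, stdin=fetch.stdout)
    fetch.stdout.close()  # let wget receive SIGPIPE if zfs exits early
    recv_status = recv.wait()
    fetch_status = fetch.wait()
    if fetch_status != 0 or recv_status != 0:
        raise RuntimeError(
            f"template install failed: wget={fetch_status}, zfs={recv_status}"
        )


# Hypothetical URL and dataset name, shown only to illustrate the argv shape.
fetch_cmd, recv_cmd = template_commands(
    "http://templates.example.com/smartos.zfs", "zones/templates/smartos"
)
```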

Storage

  • New storage volumes can be attached via the API and by hitting the qemu management console (see: attaching new PCI devices).

System

  • Directly attached SAS for zone roots, small data.

Local

  • Directly attached SAS, disassociated from root I/O.

Bulk

  • Directly attached SATA (hybrid storage with SSD log/cache devices for performance) for larger datasets.

Fast

  • Directly attached SSD.

Networking

Security Silo

  • Mirror port plugged into a security zone or host.
  • The new Sourcefire 40 Gbps box would be nice here. :)

Security Groups/Silos

VLAN

  • Per customer

vxlan

Layer 3 / gif/gre tunnels

  • gz manages the tunnel

  • ngz uses the endpoint as exclusive VNIC

    < dlg> its sometimes hard to get l2 domains to cover big DR things
    < dlg> so people are looking more at l3 solutions
    < dlg> eg
    < dlg> say you have a server on a particular ip
    < dlg> if you advertise that one ip as a /32 to the net
    < dlg> you could drop the ip in one location
    < dlg> bring it up in another
    < dlg> and advertise the route from the other site
    < dlg> and there a metric buttlod of tunneling over ip things around
    < LeftWing> IP is pretty popular. :P

Errors

  • vmadm foo.json parse error