RoCE - RDMA over Converged Ethernet

Introduction


Currently, I work for a mid-sized high-performance computing (HPC) shop.
For many of the scientific codes we run, communication performance matters -
both in terms of inter-machine (a.k.a., inter-node) bandwidth and latency.
Like most HPC shops, we have some experience with Infiniband, but in recent
years we’ve been using 10 Gbps Ethernet (10gigE) for a cluster interconnect.
Given ethernet’s prevalence, and general dominance in datacenter networking,
10gigE seems on the surface to be a general win, and a decent choice for
a cluster interconnect (particularly for a user base that historically
prefers gigabit ethernet for cost reasons).


I’ve designed three 10gigE clusters, two of which are on the current
(November 2011) Top 500 list. I do not recommend following my lead. 10gigE
has its place, but currently the economics favor Infiniband for
high-performance computing. If your code uses MPI, and you need more cores
than you can fit in one compute node (and your code isn’t embarrassingly
parallel - I’ve seen some that could operate nicely over 10 Mbps ethernet),
you should be looking at Infiniband.


Rather than delving into why I’ve been building 10gigE clusters, this page
discusses modern technology that can help you get the most performance from
a high-speed ethernet fabric. Be warned, the content from here on out gets
technical quickly. I’ve likely spent more time than is healthy examining
this space, and doing so requires a fair amount of expertise in TCP, IP,
ethernet, Infiniband (as well as general RDMA theory, and its multiple
incarnations), operating systems, MPI libraries, and several vendors’ product
lines.


To quote the xterm source code: “There be dragons here.”


Defining “slow”, and Why Plain TCP/IP is Bad


TCP/IP is great for most things - but the API pretty much requires kernel
intervention. Your app calls socket() and write(),
some library fires off a syscall, and the kernel starts formatting data to
go over the wire. Under Linux, a null syscall has an overhead of around 1000
instructions (if you’ll pardon the blind assertion), so at roughly one
instruction per clock you can do around 2.5 million syscalls per second on a
2.5 GHz CPU (using some vague hand-waving to avoid calculating the effects of
load-store queuing and superscalar processors). On paper, moving one
1500-byte frame per syscall, that means a hard max of around 30 Gbps of
throughput - more, with frame sizes over 1500 bytes.
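As a sanity check on that arithmetic, here is the same back-of-envelope model as a small Python sketch. It encodes only the assumptions stated in the paragraph: about 1000 instructions per null syscall, roughly one instruction per clock, and one frame moved per syscall. None of these values are measurements.

```python
"""Back-of-envelope ceiling on syscall-bound network throughput (a sketch)."""

INSTR_PER_SYSCALL = 1000  # the text's "blind assertion" for a null syscall


def max_syscalls_per_sec(clock_hz: float, instr_per_syscall: int = INSTR_PER_SYSCALL) -> float:
    """Syscalls/second if every clock goes to syscalls at ~1 instruction per clock."""
    return clock_hz / instr_per_syscall


def max_throughput_gbps(clock_hz: float, frame_bytes: int,
                        instr_per_syscall: int = INSTR_PER_SYSCALL) -> float:
    """Throughput ceiling in Gbps if each syscall moves exactly one frame."""
    bits_per_sec = max_syscalls_per_sec(clock_hz, instr_per_syscall) * frame_bytes * 8
    return bits_per_sec / 1e9


if __name__ == "__main__":
    clock = 2.5e9  # 2.5 GHz
    print(f"{max_syscalls_per_sec(clock):,.0f} syscalls/s")
    for frame in (1500, 9000):
        print(f"{frame}-byte frames: ~{max_throughput_gbps(clock, frame):.0f} Gbps")
```

The 9000-byte case puts a number on the "more, with frame sizes over 1500 bytes" caveat: the same model gives 180 Gbps. Real stacks land well below either ceiling.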


Unfortunately, that’s not reality. First off, a processor will need to do
some data formatting and copying beyond the time to enter the syscall. Second,
incoming data costs interrupts as well as the syscalls needed to read it. Some
of this can be ameliorated (e.g., with jumbo frames and interrupt coalescing),
but at the cost of tying up a processor to handle the kernel’s side of the
communication. If your application requires frequent data exchange (like most
HPC simulations), the added latency and processor overhead can greatly degrade
performance - even without fully utilizing the available bandwidth.

TOE NICs


TOE (TCP Offload Engine) NICs may help, to a limited degree. A TOE will
reduce the CPU’s workload, but won’t significantly reduce overall message
latency - unless the TOE vendor provides a wrapper library that replaces the
sockets API (Solarflare does this, for example).

iWARP


If you need to do RDMA over Ethernet, this is the easiest way to do it. It’s
not quite Infiniband, but many of the various IB-related commands in OFED
will work. Many RDMA apps will work with this, and as iWARP is encapsulated
by TCP/IP it can transit a router. Latency will be higher than RoCE (at least
with both Chelsio and Intel/NetEffect implementations), but still well under
10 μs. iWARP is reasonably stable with recent versions of the
OpenFabrics stack - the in-kernel drivers
may not be as stable (including those baked into Red Hat Enterprise Linux 5 and 6).
Caveat emptor.


RoCE


RoCE is RDMA over Converged Ethernet - but Infiniband over Ethernet would be
a more apt description. Take an IB packet, replace its link-layer header (the
LRH, which carries IB local IDs) with an Ethernet header carrying MAC
addresses, and send it over the wire. As of this writing, only Mellanox
(www.mellanox.com) makes RoCE-capable equipment (their ConnectX-2 and
ConnectX-3, a.k.a. CX2 and CX3, product lines).


Infiniband is lossless at the link layer, so RoCE requires lossless
Ethernet. Also, since RoCE rides directly on Ethernet with no IP header, it
cannot transit a router. It’s strictly a layer-2 protocol, and it needs a
complicated layer-2 configuration.



Lossless Ethernet: a Quick Review




Ethernet becomes lossless by re-using 802.3x PAUSE frames for explicit flow
control. This is timing-sensitive; a receiver must send a PAUSE early enough
that it is received and processed before its receive buffer fills.
That makes it hard to stretch over distance, since a longer link means more
data already in flight when the PAUSE goes out. Switches
must be internally lossless, and must be able to send PAUSE frames as well
as receive them. Such switches are usually marketed with acronyms like “DCB”
(Data Center Bridging) or “CEE” (Converged Enhanced Ethernet).
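To put rough numbers on the timing problem, here is a Python sketch. The 512-bit-time pause quantum and the 16-bit pause-time field come from the 802.3x PAUSE format. The ~5 ns/m propagation delay and the allowance of one maximum-size frame in each direction are simplifying assumptions, not a switch vendor's headroom formula.

```python
"""Rough PAUSE timing and buffer-headroom math for one link (a sketch)."""

QUANTUM_BITS = 512          # one pause quantum = 512 bit times (802.3x)
MAX_QUANTA = 0xFFFF         # pause_time is a 16-bit field
PROPAGATION_NS_PER_M = 5.0  # assumption: typical for fiber or copper


def quantum_ns(link_gbps: float) -> float:
    """Duration of one pause quantum in ns (1 Gbps == 1 bit per ns)."""
    return QUANTUM_BITS / link_gbps


def max_pause_us(link_gbps: float) -> float:
    """Longest pause a single PAUSE frame can request, in microseconds."""
    return MAX_QUANTA * quantum_ns(link_gbps) / 1000


def headroom_bytes(link_gbps: float, cable_m: float, mtu_bytes: int) -> float:
    """Bytes that may still arrive after the receiver decides to pause.

    Simplified model: the cable round trip (PAUSE goes out, in-flight data
    keeps coming back), plus one max-size frame the sender may be mid-way
    through, plus one max-size frame the receiver may be sending ahead of
    its PAUSE. Receiver processing time is ignored.
    """
    round_trip_ns = 2 * cable_m * PROPAGATION_NS_PER_M
    in_flight_bits = round_trip_ns * link_gbps
    return in_flight_bits / 8 + 2 * mtu_bytes


if __name__ == "__main__":
    print(f"quantum at 10 Gbps: {quantum_ns(10):.1f} ns")
    print(f"max single pause at 10 Gbps: {max_pause_us(10):.0f} us")
    for meters in (3, 100, 10_000):
        print(f"{meters:>6} m: ~{headroom_bytes(10, meters, 9216):,.0f} bytes of headroom")
```

At 10 Gbps a quantum is 51.2 ns, so one PAUSE can hold a link for at most about 3.4 ms. On a short rack cable the maximum-frame terms dominate. At 10 km, the in-flight data alone is about 125 KB, which shows why stretching lossless Ethernet over distance gets expensive in buffer memory.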


Obviously, this coarse-grained approach will pause all traffic over the link -
including any IP or FCoE traffic. As this can have a negative impact on
non-RoCE performance, Cisco has proposed Priority Flow Control (PFC, now
covered under IEEE 802.1Qbb). This is a PAUSE frame with a special payload,
indicating which Ethernet QoS classes (802.1p priorities) should be paused,
and for how long. It is accompanied by a negotiation protocol (DCBX, carried
over LLDP) so that both ends of a link (i.e., NIC and switch) agree on QoS
settings.
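Here is a Python sketch of what "a PAUSE frame with a special payload" looks like on the wire. The layout follows my reading of 802.1Qbb:

- destination 01:80:C2:00:00:01
- MAC Control Ethertype 0x8808
- opcode 0x0101 (plain PAUSE uses 0x0001)
- a class-enable vector
- eight per-priority pause times, in quanta

The source MAC is a made-up, locally administered address, and the FCS is left for the NIC to append.

```python
"""Assemble the bytes of an 802.1Qbb PFC frame, without FCS (a sketch)."""
import struct

PFC_DEST_MAC = bytes.fromhex("0180c2000001")  # MAC Control multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PFC_OPCODE = 0x0101         # classic 802.3x PAUSE uses opcode 0x0001
MIN_FRAME_NO_FCS = 60       # 64-byte Ethernet minimum minus the 4-byte FCS


def build_pfc_frame(src_mac: bytes, pause_quanta: dict) -> bytes:
    """Build a PFC frame pausing the given 802.1p priorities.

    pause_quanta maps priority (0-7) -> pause time in 512-bit-time quanta.
    Priorities not listed keep their enable bit clear and a time of 0.
    """
    if len(src_mac) != 6:
        raise ValueError("source MAC must be 6 bytes")
    enable_vector = 0
    times = [0] * 8
    for prio, quanta in pause_quanta.items():
        if not 0 <= prio <= 7:
            raise ValueError(f"priority {prio} out of range")
        if not 0 <= quanta <= 0xFFFF:
            raise ValueError(f"pause time {quanta} out of range")
        enable_vector |= 1 << prio  # low octet carries one bit per priority
        times[prio] = quanta
    frame = (PFC_DEST_MAC + src_mac
             + struct.pack("!HHH", MAC_CONTROL_ETHERTYPE, PFC_OPCODE, enable_vector)
             + struct.pack("!8H", *times))  # time[0] .. time[7]
    return frame.ljust(MIN_FRAME_NO_FCS, b"\x00")  # pad to the minimum size


if __name__ == "__main__":
    # Pause only priority 4 for the maximum time.
    frame = build_pfc_frame(bytes.fromhex("020000000001"), {4: 0xFFFF})
    print(len(frame), frame.hex())
```

Pausing only priority 4, as in the example, leaves the other seven priorities (and whatever IP traffic rides on them) flowing.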


Finally, each type of traffic on the link carries a different Ethernet frame
type, or Ethertype (IANA publishes a list of assignments): IPv4, IPv6, FCoE,
and RoCE all have different ID values.
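Because each protocol has its own Ethertype, a switch (or a packet capture) can tell RoCE apart from everything else by looking at two bytes. The Python sketch below shows that lookup, stepping over a single 802.1Q VLAN tag if one is present. The Ethertype values are the standard assignments. The is_lossless() policy is a hypothetical illustration of treating only RoCE losslessly, not any switch's actual configuration.

```python
"""Classify raw Ethernet frames by Ethertype (a sketch)."""
import struct

# Well-known Ethertype values (assigned by the IEEE; IANA publishes a list).
ETHERTYPES = {
    0x0800: "IPv4",
    0x86DD: "IPv6",
    0x8808: "MAC Control (PAUSE/PFC)",
    0x8906: "FCoE",
    0x8915: "RoCE",
}
VLAN_TPID = 0x8100  # an 802.1Q tag pushes the real Ethertype 4 bytes later


def classify(frame: bytes) -> str:
    """Name the traffic type of a raw Ethernet frame (FCS not included)."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    (ethertype,) = struct.unpack_from("!H", frame, 12)  # after dst + src MACs
    if ethertype == VLAN_TPID:
        if len(frame) < 18:
            raise ValueError("truncated 802.1Q tag")
        (ethertype,) = struct.unpack_from("!H", frame, 16)
    return ETHERTYPES.get(ethertype, f"unknown (0x{ethertype:04x})")


def is_lossless(frame: bytes) -> bool:
    """Hypothetical Ethertype-based policy: only RoCE gets lossless treatment."""
    return classify(frame) == "RoCE"


if __name__ == "__main__":
    macs = bytes(12)  # placeholder all-zero destination and source MACs
    roce = macs + b"\x89\x15" + bytes(46)
    tagged_ipv4 = macs + b"\x81\x00\x80\x64" + b"\x08\x00" + bytes(42)
    for frame in (roce, tagged_ipv4):
        print(classify(frame), is_lossless(frame))
```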

Reality


While RoCE is supported by
OFED, as of OFED 1.5.3 it isn’t
completely stable. You’ll want to use Mellanox’s OFED - version 1.5.3 or
higher. Stock OFED will work fine for small tests, but large applications
will have a tendency to crash.


PFC is a pain. The tools to auto-negotiate PFC settings may not exist for RoCE - the
only documentation I’ve found was limited to FCoE. Avoid it if at all possible.


Somehow, you’ll need to classify RoCE traffic as lossless. Here are some
suggestions, in my order of preference:

  1. Discriminate RoCE traffic by Ethertype - RoCE packets would be
    treated losslessly, and non-RoCE traffic could be dropped (during congestion).

  2. Classify ALL traffic as lossless (and deal with the performance impact, if
    any, on non-RoCE traffic).

  3. Assign a QoS class for lossless traffic. Unfortunately, Mellanox adapters will
    only set a QoS priority (the 802.1p bits) when they emit a VLAN tag, so you’ll need to do the following:

    • Set a default IB Service Level matching your QoS class by adding the line "options rdma_cm def_prec2sl=4" to a file in /etc/modprobe.d (obviously, I’m using the value 4)

    • Configure your Ethernet switch to treat that traffic as lossless

    • Create a tagged VLAN device on your RoCE NIC on all connected systems

    • Assign those VLAN devices a private IP address

    • Stick that IP address in /etc/mv2.conf, so MVAPICH2 will know what IP address to try for RoCE connections

    • Configure all other RDMA-aware applications to use a non-default GID (since VLAN interfaces will appear as additional GID indexes on the Infiniband HCA side of the RoCE adapter)
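Here is why a QoS class drags a VLAN along with it. The 3-bit 802.1p priority field (PCP) lives inside the 802.1Q tag, so an untagged frame has nowhere to carry a priority. This Python sketch packs and unpacks that tag. Priority 4 matches the def_prec2sl=4 example above, and VLAN 100 is hypothetical. The sketch does not model how the Mellanox driver maps an IB Service Level onto PCP; it only shows where the bits go.

```python
"""Pack and unpack an 802.1Q tag: TPID, then PCP(3) | DEI(1) | VID(12)."""
import struct

VLAN_TPID = 0x8100


def vlan_tag(pcp: int, vid: int, dei: int = 0) -> bytes:
    """Build the 4-byte tag that sits between the source MAC and Ethertype."""
    if not 0 <= pcp <= 7:
        raise ValueError("PCP is a 3-bit field (0-7)")
    if not 0 <= vid <= 0xFFF:
        raise ValueError("VID is a 12-bit field")
    tci = (pcp << 13) | ((dei & 1) << 12) | vid  # Tag Control Information
    return struct.pack("!HH", VLAN_TPID, tci)


def parse_tag(tag: bytes) -> tuple:
    """Inverse of vlan_tag(): return (pcp, dei, vid)."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != VLAN_TPID:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 1, tci & 0xFFF


if __name__ == "__main__":
    tag = vlan_tag(pcp=4, vid=100)  # hypothetical RoCE VLAN, priority 4
    print(tag.hex(), parse_tag(tag))
```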






    So you have Cisco Nexus switches…




    If you can, stop reading and go buy some Infiniband adapters. You’ll save a
    considerable amount of staff time.




    Fine. Keep reading. But don’t say I didn’t warn you.




    The Nexus 5000-series and the Nexus 7000-series switches are completely
    different products. The interface to building lossless queues is different,
    the command syntax is different, and different values can be used for lossless
    traffic classes on each series of switches. If you have environments with
    both, you’ll be picking different QoS values.




    The Nexus 7000 platform only supports lossless queuing on the newest “F”
    boards - the fabric boards that have no routing abilities. You’ll want to
    buy those, if you plan on having stable RoCE.




    Finally, be wary of ANY firmware updates. We had a functional RoCE
    configuration on a Nexus 7000 switch running firmware 5.1(3), using the
    third method above. That broke, however, when we upgraded to 5.1(5).
    Something changed in the default queuing config, and since you can only build
    on the default lossless queue config (rather than nuke it and define your
    own), you are subject to changes in the default. In our case, RoCE performance
    dropped to 30 Mbps (down from 9.91 Gbps). All was not lost, though - after
    the upgrade, all traffic was lossless (except what we’d previously tagged
    via QoS, of course). We just stopped using QoS, and now have reliable
    Ethernet. Absolutely bizarre.



    Making this all work for practical apps




    Making this work depends on how RoCE traffic was classified. If RoCE
    Ethertypes are lossless, or if all traffic is lossless (options #1 or #2,
    above) any RDMA application should just work - the RoCE adapter presents as an
    Infiniband HCA.




    If you picked option #3, you’ll need to jump through some extra hoops. First,
    set the def_prec2sl module parameter and /etc/mv2.conf
    as described above. At this point, MVAPICH2 applications should work. For
    OpenMPI, you’ll need OpenMPI 1.4.4 or newer (or 1.5.4 or newer, if you’re on
    the 1.5 series). It needs two additional command-line options, to set the IB
    service level and the IP address to use:
    -mca btl_openib_ib_service_level <number> and
    -mca btl_openib_ipaddr_include <ipaddr>, respectively.
    These can be baked into a config file (like openmpi-mca-params.conf
    in your OpenMPI installation’s etc directory). Note that
    btl_openib_ipaddr_include can take CIDR notation for a subnet to
    match, so you can use the same config file for all nodes in a cluster.
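Here is a Python sketch of why one config file can serve the whole cluster. The btl_openib_ipaddr_include value is a subnet, and every node's RoCE VLAN address falls inside it. The subnet 10.20.0.0/16 and service level 4 are hypothetical. Python's ipaddress module stands in for Open MPI's own matching, which isn't reproduced here. The rendered lines use the "name = value" format of openmpi-mca-params.conf.

```python
"""Render cluster-wide Open MPI MCA settings and check node coverage (a sketch)."""
import ipaddress


def mca_params(service_level: int, roce_subnet: str) -> str:
    """Return openmpi-mca-params.conf lines using the two parameters from the text."""
    ipaddress.ip_network(roce_subnet)  # raises ValueError on a malformed CIDR
    return (
        f"btl_openib_ib_service_level = {service_level}\n"
        f"btl_openib_ipaddr_include = {roce_subnet}\n"
    )


def node_matches(node_ip: str, roce_subnet: str) -> bool:
    """True if a node's VLAN-interface address falls inside the CIDR filter."""
    return ipaddress.ip_address(node_ip) in ipaddress.ip_network(roce_subnet)


if __name__ == "__main__":
    subnet = "10.20.0.0/16"  # hypothetical private RoCE VLAN subnet
    print(mca_params(4, subnet), end="")
    for ip in ("10.20.1.17", "10.20.3.5", "192.168.1.10"):
        print(ip, "matches" if node_matches(ip, subnet) else "does not match")
```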




    In theory, it may be possible to use RoCE for non-MPI applications - including
    kernel-level things like Lustre. I’d only attempt this if options #1 or #2
    are in use, though - setting extra VLANs, non-default GIDs, and custom IB
    service levels (mapped to Ethernet QoSes) is likely to be hard to integrate
    in anything other than OpenMPI and MVAPICH2.



    Additional Resources




    There isn’t a lot of documentation (practically zero, outside of Mellanox)
    on RoCE. Any useful links I can find will be added here.



    Time Machine, Meet Netatalk. But in Lion.

    Introduction


    This morning, I upgraded the first Mac around the house to MacOS 10.7 (aka,
    “Lion”). Went smoothly, and it’s re-indexing Spotlight now. Insert comments
    about how wonderful it is to have to get used to new trackpad finger gestures
    (gestures are nice, but it’ll be a few days before I’m used to the workflow
    change).


    Naturally, Time Machine is now horribly broken. Originally, I was using AFP
    and netatalk, as described here, but then I
    switched to SMB and Samba (since Netatalk 2.1.x wasn’t as stable). Lion no
    longer supports either of these methods; it only works with AFP 3.3, which
    only Netatalk 2.2 supports. As of this writing, Netatalk 2.2 was committed
    to git yesterday.


    This page serves to document my odyssey in setting up netatalk on a FreeBSD
    jail in the basement, from the latest source in git. Here are a couple of
    useful links:


    Throughout all this, I’m assuming a similar earlier Time Machine setup has
    already been done, and the previous netatalk packages have been removed.
    Right now, I’m mainly concerned with differences.

    Source Setup


    As shown in the links above, get git, grab the source, and start building:



    pkg_add -r git
    git clone git://netatalk.git.sourceforge.net/gitroot/netatalk/netatalk
    cd netatalk
    git checkout netatalk-2-2-0
    ./bootstrap
    ./configure --without-acls --without-pam --disable-ddp --disable-cups



    I didn’t have appropriate zeroconf headers on my FreeBSD jail, so I didn’t
    configure with --enable-zeroconf. I’ll use Avahi for that setup, if needed.
    My config ended up looking like this (printout from ./configure):

    Using libraries:
        LIBS =  -L$(top_srcdir)/libatalk
        CFLAGS = -I$(top_srcdir)/include -D_U_="__attribute__((unused))" -g -O2 -I$(top_srcdir)/sys
        SSL:
            LIBS   =  -lcrypto
            CFLAGS =  -I/usr/include/openssl
        LIBGCRYPT:
            LIBS   = -L/usr/local/lib -lgcrypt -lgpg-error
            CFLAGS = -I/usr/local/include
        BDB:
            LIBS   =  -L/usr/local/lib -ldb-4.6
            CFLAGS =  -I/usr/local/include/db46
    Configure summary:
        Install style:
             none
        AFP:
             AFP 3.x calls activated: 
             Extended Attributes: ad | sys
        CNID:
             backends:  dbd last tdb
        UAMS:
             DHX     ()
             DHX2    ()
             RANDNUM ()
             passwd  ()
             guest
        Options:
             DDP (AppleTalk) support: no
             CUPS support:            no
             SLP support:             no
             Zeroconf support:        no
             tcp wrapper support:     yes
             quota support:           no
             admin group support:     yes
             valid shell check:       yes
             cracklib support:        no
             dropbox kludge:          no
             force volume uid/gid:    no
             Apple 2 boot support:    no
             ACL support:             no
    


    The lack of CUPS and ACLs should be tolerable, since this is just going to
    be used for Time Machine (I use Samba for everything else). Note that
    initially I did leave ACL support to autodetect; it was enabled, but that led
    to compilation errors.


    Before running make, if you’re using FreeBSD like me, you’ll need to
    fix some compilation errors. I’m sure the ports folks will fix this in due
    time, but I’d rather not wait…


    First, at.h:



    --- sys/netatalk/at.h.orig 2011-07-24 12:28:55.823029116 -0400
    +++ sys/netatalk/at.h 2011-07-24 12:29:40.522913740 -0400
    @@ -24,6 +24,14 @@
    #include
    #include /* so that we can deal with sun's s_net #define */

    +typedef unsigned char u_char;
    +typedef unsigned short u_short;
    +typedef unsigned int u_int;
    +typedef unsigned long u_long;
    +
    +#include
    +#include
    +
    #ifdef MACOSX_SERVER
    #include
    #endif /* MACOSX_SERVER */



    Then cnid_metad.c:



    --- etc/cnid_dbd/cnid_metad.c.orig 2011-07-24 12:48:52.140103389 -0400
    +++ etc/cnid_dbd/cnid_metad.c 2011-07-24 12:49:21.195654454 -0400
    @@ -45,6 +45,7 @@
    #include
    #define _XPG4_2 1
    #include
    +#include
    #include
    #include




    Run make and make install, then move on. Be warned: since
    this install comes from source, there likely won’t be an init.d
    or rc.d script to start up daemons. A usable FreeBSD template is
    below (based on the most current port, as of this writing).

    #!/bin/sh
    #
    # $FreeBSD: ports/net/netatalk/files/netatalk.in,v 1.3 2010/03/27 00:13:49 dougb Exp $
    #
    # PROVIDE: atalkd papd cnid_metad timelord afpd
    # REQUIRE: DAEMON
    # KEYWORD: shutdown
    #
    # AppleTalk daemons. Make sure not to start atalkd in the background:
    # its data structures must have time to stabilize before running the
    # other processes.
    #
    
    # Set defaults. Please override these in /usr/local/etc/netatalk/netatalk.conf
    ATALK_ZONE=
    ATALK_NAME="`/bin/hostname -s`"
    AFPD_UAMLIST=
    AFPD_MAX_CLIENTS=50
    AFPD_GUEST=nobody
    
    # Load user config
    if [ -f /usr/local/etc/netatalk/netatalk.conf ]; then . /usr/local/etc/netatalk/netatalk.conf; fi
    
    netatalk_enable=${netatalk_enable-"NO"}
    atalkd_enable=${atalkd_enable-"NO"}
    papd_enable=${papd_enable-"NO"}
    cnid_metad_enable=${cnid_metad_enable-"NO"}
    afpd_enable=${afpd_enable-"NO"}
    timelord_enable=${timelord_enable-"NO"}
    
    . /etc/rc.subr
    
    name=netatalk
    rcvar=`set_rcvar`
    hostname=`hostname -s`
    
    start_cmd=netatalk_start
    stop_cmd=netatalk_stop
    
    netatalk_start() {
        checkyesno atalkd_enable && /usr/local/sbin/atalkd
        checkyesno atalkd_enable && \
            /usr/local/bin/nbprgstr -p 4 "${ATALK_NAME}:Workstation${ATALK_ZONE}" &
        checkyesno atalkd_enable && \
            /usr/local/bin/nbprgstr -p 4 "${ATALK_NAME}:netatalk${ATALK_ZONE}" &
        checkyesno papd_enable && /usr/local/sbin/papd
        checkyesno cnid_metad_enable && /usr/local/sbin/cnid_metad
        checkyesno timelord_enable && /usr/local/sbin/timelord
        checkyesno afpd_enable && \
            /usr/local/sbin/afpd -n "${ATALK_NAME}${ATALK_ZONE}" \
                    -s /usr/local/etc/netatalk/AppleVolumes.system \
                    -f /usr/local/etc/netatalk/AppleVolumes.default \
                    -g ${AFPD_GUEST} \
                    -c ${AFPD_MAX_CLIENTS} \
                    ${AFPD_UAMLIST}
    }
    
    netatalk_stop() {
        checkyesno timelord_enable && killall timelord
        checkyesno afpd_enable && killall afpd
        checkyesno cnid_metad_enable && killall cnid_metad
        checkyesno papd_enable && killall papd
        checkyesno atalkd_enable && killall atalkd
    }
    
    load_rc_config ${name}
    run_rc_command "$1"
    

    Netatalk


    A few extra options are needed, both for each mount and for the server itself.


    Here are the relevant (non-comment) bits at the end of AppleVolumes.default. Use your own paths and logins as appropriate.

    # The line below sets some DEFAULT, starting with Netatalk 2.1.
    :DEFAULT: options:upriv,usedots
    
    # The "~" below indicates that Home directories are visible by default.
    # If you do not wish to have people accessing their Home directories,
    # please put a pound sign in front of the tilde or delete it.
    #~
    /tm/laptop "Laptop Backup" allow:laptop_login cnidscheme:dbd options:usedots,upriv,tm
    /tm/desktop "Desktop Backup" allow:desktop_login cnidscheme:dbd options:usedots,upriv,tm
    
    # End of File
    


    And here’s the relevant pieces from afpd.conf. Obviously, use
    your own server name and IP.

    # default:
    # - -tcp -noddp -uamlist uams_dhx.so,uams_dhx2.so -nosavepassword
    SERVER -tcp -ipaddr 10.0.0.10 -noddp -uamlist uams_randnum.so,uams_dhx.so,uams_dhx2.so -nosavepassword
    
    
    

    Avahi

    Avahi is relatively unchanged. If you were using Avahi before Lion, it should work the same. I think.

    File System Bits

    Oddly enough, it looks like the .com.apple.timemachine.supported file is no longer required.

    Client Configuration

    I'm still using the preference for an unsupported Time Machine volume. Run the following on the client:

    defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1
    


    If you aren’t dealing with a recently-upgraded client and pre-existing backups,
    you may want to read the original notes on setting up sparsebundles on the
    client here.

    Caveats


    None so far, but then, I’m still in the middle of my first Time Machine backup
    under Lion. Things largely seem to work, though. Expect to spend some
    non-trivial time on the first backup, to re-index any pre-existing dumps, but
    then Time Machine appears to just do its thing normally.

    Time Machine, Meet Netatalk

    Introduction


    My family has started using MacOS on laptops. Apple’s been shipping a nifty
    little backup utility, Time Machine - it makes backups and restores so easy
    the wife can use it (and believe me, this is a big improvement over things
    like dirvish, or amanda, or other things that require a more extensive
    knowledge of UNIX). For more info, check out Apple’s site.


    Time Machine requires either a local disk, or a special (Apple-specific)
    networked disk called a Time Capsule. Since I’ve got a FreeBSD machine in
    the basement with a good-sized ZFS pool, I’d rather leverage that. In
    theory, if the laptops’ Time Machines can back up to the basement ZFS pool,
    they’ll get the benefit of dynamically grown disk space and offsite backups
    (well, once I figure out an offsite backup solution for the pool, at least).
    Sadly, making Time Machine use a non-Apple widget for storage is decidedly
    non-intuitive, and I can never seem to find appropriate documentation on
    this when I need to. Below are my notes on the subject, for future
    reference.


    Server Setup


    Personally, I’m using a FreeBSD file server these days (primarily for the
    availability of ZFS, and the lack of Solaris). These instructions are rather
    general, but be warned that you may have different results under different
    operating systems.


    You’ll need two pieces on the server:
    netatalk for serving the data,
    and Avahi for advertising your share.

    Netatalk


    Netatalk implements the Apple Filing Protocol (AFP) under UNIX. Install it
    however your OS installs software (it’s in FreeBSD’s ports, and is a package
    for most Linux distributions). Netatalk uses PAM for authentication, so
    take a look in /etc/pam.d and make sure it looks like netatalk
    will authenticate.


    We care about two config files, AppleVolumes.default (which
    lists the available shares) and afpd.conf (which controls the
    file sharing service). Netatalk can handle a variety of other ancillary
    AFP tasks (there’s a whole set of protocols for naming things, for instance),
    but we really don’t care about that now. The format of both files is simple:
    one directive per line, and lines starting with # are comments.


    I declare that a particular subdirectory off my ZFS pool is available as an
    AFP volume named “fmep-Tardis” (what else do you call fmepnet’s time machine?)
    That’s the name of the disk as it appears in Time Machine Preferences on your
    clients. Relevant AppleVolumes.default parts:


    # The "~" below indicates that Home directories are visible by default.
    # If you do not wish to have people accessing their Home directories,
    # please put a pound sign in front of the tilde or delete it.
    #~
    /pool/backup/time_machine "fmep-Tardis"


    I haven’t had need to modify afpd.conf from default. Since your
    default may vary, here’s what I’m using:


    # default:
    # - -transall -uamlist uams_clrtxt.so,uams_dhx.so -nosavepassword
    - -transall -uamlist uams_clrtxt.so,uams_dhx.so -nosavepassword

    Consult the afpd.conf man page for details. Basically, this sets
    the default options for all servers (as indicated by the server name
    "-"), and allows both cleartext (PAM-backed) and Diffie-Hellman (DHX)
    password authentication.

    Avahi


    Avahi is an mDNS responder, for zero-configuration service advertisements.
    Ever wonder how your Mac finds other Macs on the network? Here you go.
    Again, install as best suits your OS - it’s a FreeBSD port, and a package on
    a variety of Linuxes.


    Avahi’s advertisements are effectively layer-2 only: multicast DNS uses
    link-local multicast (224.0.0.251, UDP port 5353), which routers don’t
    forward. You’ll want to run Avahi on something that’s on the same network as
    your Mac clients.
    If you have a separate wireless and wired network in the house, as I do,
    you’ll want to put Avahi on something connected to each network (well, each
    network with clients, at least). There are no problems routing AFP over TCP,
    so it’s perfectly permissible to use Avahi to advertise an AFP server on a
    different subnet - in fact, that’s what I’m doing right now.
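Here is a Python sketch of the question a Mac asks when it browses for AFP servers: a DNS query for PTR records of _afpovertcp._tcp.local (the service type Avahi advertises for AFP), multicast to 224.0.0.251 on UDP port 5353. That destination is link-local multicast, which is why each client network needs its own responder. send_query() is included for completeness but never called. Actually receiving and parsing answers is Avahi's (or the Mac's) job.

```python
"""Build (and optionally send) an mDNS PTR query for AFP servers (a sketch)."""
import socket
import struct

MDNS_GROUP = "224.0.0.251"  # link-local multicast: routers don't forward it
MDNS_PORT = 5353
QTYPE_PTR = 12
QCLASS_IN = 1


def build_ptr_query(service: str = "_afpovertcp._tcp.local") -> bytes:
    """One-question DNS query asking for PTR records of a service type."""
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)  # id, flags, qd, an, ns, ar
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in service.split(".")
    ) + b"\x00"  # length-prefixed labels, terminated by the root label
    return header + qname + struct.pack("!HH", QTYPE_PTR, QCLASS_IN)


def send_query(packet: bytes) -> None:
    """Multicast the query on the local segment (needs a real network)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 255)
        sock.sendto(packet, (MDNS_GROUP, MDNS_PORT))


if __name__ == "__main__":
    pkt = build_ptr_query()
    print(len(pkt), pkt.hex())
```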


    Avahi has a top-level config file, avahi-daemon.conf, and a
    config file for every advertised service. Here’s my
    avahi-daemon.conf:


    # See avahi-daemon.conf(5) for more information on this configuration
    # file!

    [server]
    #host-name=foo
    #domain-name=local
    #browse-domains=0pointer.de, zeroconf.org
    use-ipv4=yes
    use-ipv6=no
    #check-response-ttl=no
    #use-iff-running=no
    #enable-dbus=yes
    #disallow-other-stacks=no
    #allow-point-to-point=no

    [wide-area]
    enable-wide-area=yes

    [publish]
    #disable-publishing=no
    #disable-user-service-publishing=no
    #add-service-cookie=yes
    #publish-addresses=yes
    #publish-hinfo=yes
    #publish-workstation=yes
    #publish-domain=yes
    #publish-dns-servers=192.168.50.1, 192.168.50.2
    #publish-resolv-conf-dns-servers=yes

    [reflector]
    #enable-reflector=no
    #reflect-ipv=no

    [rlimits]
    #rlimit-as=
    rlimit-core=0
    rlimit-data=4194304
    rlimit-fsize=0
    rlimit-nofile=30
    rlimit-stack=4194304
    rlimit-nproc=3

    Note that this is basically the default config on many OSes. Be sure to enable
    ipv4!


    When you installed Avahi, it should have created a “services”
    directory. Files in there are XML service descriptions, usually one per
    service name. Here’s my afp.service:


    <?xml version="1.0" standalone='no'?><!--*-nxml-*-->
    <!DOCTYPE service-group SYSTEM "avahi-service.dtd">

    <!-- $Id: time_machine.html,v 1.5 2009/10/06 03:47:50 shuey Exp $ -->

    <!--
    This file is part of avahi.

    avahi is free software; you can redistribute it and/or modify it
    under the terms of the GNU Lesser General Public License as
    published by the Free Software Foundation; either version 2 of the
    License, or (at your option) any later version.

    avahi is distributed in the hope that it will be useful, but
    WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
    General Public License for more details.

    You should have received a copy of the GNU Lesser General Public
    License along with avahi; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA
    02111-1307 USA.
    -->

    <!-- See avahi.service(5) for more information about this configuration file -->

    <service-group>

    <name replace-wildcards="yes">AFP on fergus</name>

    <service>
    <type>_afpovertcp._tcp</type>
    <port>548</port>
    <host-name>server.in.my.basement.fmepnet.org</host-name>
    </service>
    <service>
    <type>_device-info._tcp</type>
    <port>0</port>
    <txt-record>model=Xserve</txt-record>
    </service>

    </service-group>


    Most fields above are self-explanatory. The host-name field
    can be replaced with an IP address, if you don’t have DNS service for internal
    machines around the house. The _device-info service stanza
    contains metadata for the service. In this case, the model=Xserve
    causes more recent versions of MacOS to think your AFP service is an Apple
    Xserve, so you’ll get a pretty little icon (an Xserve RAID picture, I believe).

    File System Bits


    Obviously, make sure your Time Machine directory, referenced in Netatalk’s
    AppleVolumes.default, actually exists. It’ll also need to be
    writable by whatever user or group that should be using the volume - group
    read/write/execute permissions for “users” is probably a good
    idea.


    You’ll also need to create an empty file named
    .com.apple.timemachine.supported. This did it for me:


    touch /pool/backup/time_machine/.com.apple.timemachine.supported


    Client Configuration


    The bad news: Because you’re using a non-Apple server, this isn’t going to be
    very intuitive. The good news: Once a client is set up, it will just work as
    normal - just as if it were an Apple server on the other end. Open a
    terminal window, and let’s get started.


    Netatalk may be AFP, but it’s not quite supported. In a terminal window, run
    this:


    defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1


    Now go into System Preferences, into the Time Machine config. You should be
    able to see your shiny new Netatalk share (assuming your Avahi and Netatalk
    configurations are correct, and both sets of daemons are running, and nothing
    else is wrong), so you can configure Time Machine normally. Go ahead, start
    your first backup. Then watch it fail.


    Time Machine is trying to create a sparse bundle disk image - that’s a set of
    files that pretend to be a MacOS disk, and grow (or increase in numbers) as
    files are copied into the disk. That’s MacOS’s way of handling
    less-than-supported volumes. Problem is, Netatalk doesn’t support a couple
    of the AFP operations necessary to finish off the disk image. You can
    work around it by creating your own disk image locally, then copying the
    result to the Netatalk server.


    Fire up Disk Utility. Use File->New Blank Image… to create a new disk image.
    Fill out the form as follows:


    • Volume Name: computername_macaddr.sparsebundle

    • Volume Size: Custom (enter a size larger than your client’s disk)

    • Volume Format: Mac OS Extended (Journaled)

    • Encryption: none

    • Partitions: Single partition - Apple Partition Map

    • Image Format: sparse bundle disk image


    Note that the volume name is your client’s hostname, an underscore, followed
    by the MAC address of the on-board ethernet interface (in hex, all lower case,
    with no :s), with the .sparsebundle extension. To
    find your machine’s MAC address, just do an ifconfig en0 (or ifconfig -a) and
    look for the “ether” line in the en0 stanza.
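The naming rule is mechanical enough to script. This Python sketch assumes you already have the hostname and the en0 MAC address as strings; the values in the example are made up. It produces the bundle name described above: hostname, underscore, bare lower-case hex, then .sparsebundle.

```python
"""Derive the Time Machine sparsebundle name from hostname and MAC (a sketch)."""
import re


def sparsebundle_name(hostname: str, mac: str) -> str:
    """Return "<hostname>_<mac>.sparsebundle" with the MAC in bare lower-case hex."""
    hex_digits = re.sub(r"[^0-9A-Fa-f]", "", mac).lower()  # drop ':' or '-'
    if len(hex_digits) != 12:
        raise ValueError(f"expected a 48-bit MAC address, got {mac!r}")
    return f"{hostname}_{hex_digits}.sparsebundle"


if __name__ == "__main__":
    # Hypothetical values; on the Mac, use your hostname and en0's "ether" address.
    print(sparsebundle_name("laptop", "00:1E:C2:A1:B2:C3"))
```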


    If you have issues with the GUI, you may want to try the CLI version:

    hdiutil create -layout SPUD -size "$SIZE" -fs "Journaled HFS+" -type SPARSEBUNDLE -volname "${MACHINENAME}_${MAC_ADDRESS}" "${MACHINENAME}_${MAC_ADDRESS}.sparsebundle"
    


    Once you’ve created a sparse bundle, copy it over to your Netatalk server
    (scp -r works for me). The sparse bundle is actually just a directory
    of files, so this should work fine.


    With the sparsebundle in place, open Time Machine’s preferences and do another
    backup. Things should click, your Netatalk server should mount, and data
    should start flowing. Yay!


    Caveats

    Server Disk Fills Up


    If your server’s disk fills up, Time Machine will not be happy. Early versions
    of Time Machine would nuke all old backup images (except maybe the latest one).
    Supposedly this has been fixed, but... well, caveat emptor.

    Restores


    If you’re like me, you use a networked file server to back up your laptop.
    If your laptop’s disk dies, you’ll have trouble restoring - the Mac OS X
    install disk can “Restore from Time Machine Backup”, but it won’t find yours
    (since it’s on an unsupported volume). Supposedly, you can look under the
    Utilities menu and open up a Terminal, then run this:


    mkdir /Volumes/ShareMount
    mount -t afp afp://username:password@hostname/ShareName /Volumes/ShareMount

    Obviously, customize the above for your setup. If you can ls /Volumes
    and see your share, with your sparsebundle in it, it’s
    mounted correctly. Once that’s done, close the terminal and try “Restore
    from Time Machine Backup” again, and things should work.


    Well, in theory, at any rate.

    Acknowledgements


    Google is extremely helpful, as always. Also, Matthias Kretschmann’s blog
    has proven pretty handy, especially
    this entry.

    Syncing Apple's Addressbook with Google


    Apparently Apple’s decided to most irritatingly hide the “Sync with Google”
    option in their Addressbook, unless you buy an iPhone. I currently lack
    either an iPhone or an iPod Touch, so I’m apparently out of luck.


    Thankfully, someone else has already figured this out. Unfortunately, his
    blog seems to be down - but it’s still in Google’s cache. Here are his findings:


    Courtesy Eli’s Blog:

    Sync Google Contacts with Apple’s Address Book WITHOUT an iPhone/iPod Touch, and without MobileMe


    Step 1. Get the “Synchronize with Google” option to show up in Address Book. You will need 10.5.3 or later, and just the right entry in the com.apple.ipod.plist file stored in ~/Library/Preferences. If you have an iPhone or an iPod Touch, you won’t need to do this step. From the terminal, run the following:

    defaults write com.apple.iPod Devices -dict-add red-herring '{ "Family ID" = 10001; }'
    


    Step 2. Run Address Book. Go to Preferences, and enable “Synchronize with Google”. Accept the warning/agreement and put in your credentials.


    Step 3. We change the GoogleContactSync from being an ‘app’ which requires iTunes to trigger the sync, to a ‘server’ which can just be triggered on its own. Again, from the terminal:

    sudo defaults write /System/Library/PrivateFrameworks/GoogleContactSync.framework/Resources/ClientDescription Type 'server'
    
    sudo chmod 644 /System/Library/PrivateFrameworks/GoogleContactSync.framework/Resources/ClientDescription.plist
    


    Step 4. Run iSync. Go to Preferences, and enable “Show Status in Menu Bar”.


    Step 5. Very important. Reboot.


    …and that should do it. Choose “Sync Now” from the iSync menu bar item whenever you want to sync them.


    This entry was posted on Monday, February 9th, 2009 at 11:00 pm and is filed under Apple.

    Amanda RAIT Notes

    Introduction



    I recently acquired a Sun StorEdge L1800 tape library - a mid-sized, 48-slot,
    4-drive affair from the late 1990s. Eventually, I started using it with
    Amanda, a free backup and recovery
    scheduler, for basic home backups. Not too shabby.




    Since this thing only uses DLT-7000 drives (35 GB native tapes), it’s not able
    to hold loads of data. Currently, I’m working on using it solely for full
    (aka level-0) backups, with incrementals being done on other media. Of
    course, these are older tapes and drives, so I’d want some level of redundancy.
    Boosting the dump speed (native rate is just under 5 MB/sec) would be nice,
    too. And that’s how I started using RAIT in Amanda.



    RAIT basics


    RAIT, or Redundant Array of Inexpensive Tapes, is pretty well described in
    the Amanda online manual.
    It’s basically a RAID-3 using tape - two tapes are used for a data stripe,
    and a third for parity. Amanda’s implementation seems a bit limited, compared
    to some commercial products: the parity stripe doesn’t rotate among the
    tapes (rotation matters for hardware-compressing tape drives - if one drive
    always writes the parity, that drive will probably compress poorly and may
    limit the amount of storable data), and you can only use 3 or 5 drives (though
    you can do mirrored backup tapes in Amanda, so 2 drives are somewhat supported).
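    A toy model makes the parity idea easy to see. This bash sketch treats the data as a list of byte values. Even-numbered bytes go to tape A, odd-numbered bytes go to tape B, and A XOR B goes to the parity tape. Losing either data tape is survivable, because XOR-ing the surviving tape with the parity tape recovers it. The byte-level split and the function names are assumptions for illustration, not Amanda's actual RAIT code.

```shell
#!/usr/bin/env bash
# Toy model of RAID-3 style striping with a dedicated parity tape, as in a
# 3-drive RAIT set. Data is a list of byte values (0-255), even count assumed.
# Even-numbered bytes -> TAPE_A, odd-numbered -> TAPE_B, TAPE_P = A XOR B.

stripe() {              # usage: stripe b0 b1 b2 b3 ...
  local -a bytes=("$@")
  local i
  TAPE_A=(); TAPE_B=(); TAPE_P=()
  for (( i = 0; i < ${#bytes[@]}; i += 2 )); do
    TAPE_A+=( "${bytes[i]}" )
    TAPE_B+=( "${bytes[i+1]}" )
    TAPE_P+=( $(( bytes[i] ^ bytes[i+1] )) )
  done
}

# If tape B is lost, each of its bytes is A XOR P, since A ^ (A ^ B) = B.
rebuild_b() {
  local i
  local -a out=()
  for (( i = 0; i < ${#TAPE_A[@]}; i++ )); do
    out+=( $(( TAPE_A[i] ^ TAPE_P[i] )) )
  done
  echo "${out[*]}"
}
```

    Recovering tape A works the same way, from tape B and the parity tape.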


    There’s a brief article in
    the Amanda wiki about configuring RAIT,
    and some sample config files in this mailing list posting
    (http://www.mail-archive.com/amanda-users%40amanda.org/msg16415.html),
    so I won’t go into excessive detail here. Essentially, you use the
    chg-rait tape changer, with a config file that specifies a drive, changer
    type and config file, and changer device for each of the drives involved.
    Each drive gets its own changer config, so you can mix and match different
    changers freely - even use a vtape/disk “changer” if you desire. Pretty
    straightforward.


    If you need more info, read the chg-rait script. It’s just a shell script,
    is really short, and has some good comments.

    The Problem With RAIT In A Library


    So, you can pretty easily whip up three changer configs that each use the
    same SCSI generic device to control the changer, but use a different drive in
    the library and a different range of tapes. Be warned, you’ll probably want
    to have 3 copies of every tape label - Amanda will assume each tape in a
    RAIT set will have the same label, and if they don’t match things may complain.


    What they don’t tell you is that chg-rait will pass the same slot number to
    each of the three (or however many) sub-changers. If you’re using three
    configs for chg-zd-mtx, each with different ranges of tapes in the same
    library, this will result in one tape loading successfully (since the
    requested slot is actually in that config’s range of slots) and the other
    two configs failing to find a slot (since the requested slot isn’t in their
    ranges).


    To get around this, I wrote a small fake changer script in perl. It bumps
    the given slot number by a set amount, calls a real changer, and corrects
    the slot number in its output before returning. The changer script, and the
    bump value (number of slots to skip over in the changer) are both highly
    changer-dependent, and are both constants in the script. See below:

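    #!/usr/bin/perl -w
    #
    # Fake changer for chg-rait: invokes chg-zd-mtx, but bumps the slot
    # number by a constant amount, so each RAIT sub-changer works on its
    # own range of slots in the shared library.
    use strict;
    
    my $bump         = 11;             # slots to skip over; changer-dependent
    my $real_changer = "chg-zd-mtx";   # the real changer to invoke
    
    # Only a numeric "-slot N" request is translated; "-slot next",
    # "-info", "-reset", etc. are passed through untouched.
    my @args = @ARGV;
    if( @args >= 2 and $args[0] eq "-slot" and $args[1] =~ /^[0-9]+$/ ) {
        $args[1] += $bump;
    }
    
    # Run the real changer (list form, so no shell quoting issues) and
    # translate the slot number at the start of each output line back.
    open( my $cpipe, "-|", $real_changer, @args )
        or die "cannot run $real_changer: $!\n";
    while( my $line = <$cpipe> ) {
        if( $line =~ /^([0-9]+)\s+(.*)$/ ) {
            my $slot = $1 - $bump;
            print "$slot $2\n";
        } else {
            print $line;
        }
    }
    close($cpipe);
    
    # Hand the real changer's exit status back to Amanda.
    exit( $? >> 8 );
    
    # vi:set ai aw sw=4: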
    


    Obviously, you’ll need one copy of this for each offset into the library’s
    tape storage array, each with its own bump value (I use two: one for the
    second drive, and one for the third).
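    Picking the bump value for each copy is just arithmetic on the library layout. This is a minimal sketch. It assumes each drive's tapes occupy one contiguous run of the same number of slots, so the first sub-changer needs no bump, the second needs one range, and the third needs two. The helper names are hypothetical, and a range of 11 simply matches the $bump in the script above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, assuming equal-sized contiguous slot ranges per drive.

# usage: rait_physical_slots SLOT RANGE DRIVES
# Print the physical slot each sub-changer loads when chg-rait asks for SLOT.
rait_physical_slots() {
  local slot=$1 range=$2 drives=$3 k
  local -a out=()
  for (( k = 0; k < drives; k++ )); do
    out+=( $(( slot + k * range )) )   # sub-changer k is bumped by k * range
  done
  echo "${out[*]}"
}

# usage: rait_bump_for K RANGE  -> bump value for sub-changer K (from 0)
rait_bump_for() {
  echo $(( $1 * $2 ))
}
```

    Under that assumption, a request for slot 4 lands on slots 4, 15, and 26. The copy for the third drive would then use a bump of 22.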

    Quick Start Guide to Debian and WPA Wireless Security


    This document describes the basic elements of WPA wireless security, mainly
    because it’s somewhat confusing and the necessary information is scattered
    around the Internet. For implementation, I’m primarily concerned with
    Debian Linux - especially the current unstable (note: appropriate packages
    will likely be in “etch”, which is currently the testing release).


    *BSD users are, unfortunately, screwed.


    1. Introduction
    2. WPA basics (aka, what the hell is this?)
    3. Debian Installation


    1. Introduction


    Recently my employer changed their campus-wide wireless network. While
    the previous authentication system is still
    maintained, it requires setting a machine’s ESSID explicitly, and will likely
    not be maintained for much longer. The new authentication scheme is based
    on 802.1x and WPA, necessitating some end-user self-education. Given that
    the new system allows for seamless roaming, and supports 802.11g (as opposed
    to just 802.11b), learning about WPA seemed prudent. Since others at work
    also have Debian laptops, this little guide was born.


    Much of the text below constitutes some personal notes on WPA in general.
    If you’re looking for just the basic information, and you’re in a rush to
    just turn things on, you may want to skip to section 3.


    2. WPA Basics (aka, what the hell is this?)

    2.1 Overview


    In the beginning, there was WEP - Wired Equivalent Privacy. Problem is, it
    really wasn’t (equivalent to wired). The crypto used to ensure privacy
    wasn’t very robust, was deployed in an extremely poor manner, and relied on
    shared secrets for any authentication. Not quite what you’d want for a
    robust wireless system.


    This deficiency led to the development of 802.11i - an amendment to the
    802.11 standard that builds on 802.1x, a port-based authentication protocol
    for ethernet (wired or wireless). Together they allow for a few different
    mechanisms for password comparison, several types of encryption (including
    key rotation plans, where necessary) and much other goodness. Unfortunately,
    this took a while to develop. Companies developing wireless products grew
    somewhat impatient and decided to start shipping an agreed-upon subset of
    the draft 802.11i’s capabilities. This is now known as WPA - basically
    802.11i, but missing a few features and lacking heavy crypto for packet
    encryption (like AES, for example). Upcoming wireless products will
    (eventually) support WPA2, which is basically full-blown 802.11i.


    Today’s WPA offers both a shared secret and support for real user
    authentication, usually through a RADIUS server on the backend. While
    packet encryption options are a bit limited, it does offer extended key lengths
    over plain ol’ WEP, and provides a key rotation procedure (called TKIP) to
    avoid weaknesses inherent in both WEP’s crypto and its implementation. It’s
    generally considered to be a good enough solution for now, and should last
    until well after WPA2 support starts appearing in commercial products.

    2.2 Hardware Requirements


    Like WEP, WPA requires hardware support to help with the packet encryption.
    Unfortunately, that means only newer 802.11[abg] equipment will work with
    WPA - older gear won’t work, for a variety of reasons (key length, key
    rotation schedules, encryption types, etc.) 802.11g gear is likely to be
    a safer bet, although some of the very first 802.11g equipment predates
    WPA support.


    The Atheros 802.11abg NIC (using the “madwifi” driver) in my IBM ThinkPad X40
    works just fine with WPA.


    Naturally, WPA support must be present on both the NIC (wireless adapter) and
    on the access point. As this page is solely focused on the client drivers,
    I’ll assume whoever operates your favorite access points (you, perhaps)
    already knows what he or she is doing.

    2.3 Software Requirements


    Obviously, you’ll need a Linux driver for your WPA-enabled wireless card
    that supports the WPA features. I use the
    madwifi drivers for
    my Atheros 802.11[abg] card, but that’s just me. You’ll also need a recent
    Linux kernel, with a fairly recent wireless ethernet API. As of this writing,
    I’m using 2.6.13, so anything around that vintage (or newer) should work.


    You’ll also need a supplicant - a daemon program that authenticates your
    ethernet (wired or wireless) connection. The supplicant will set up all
    aspects of an authentication connection, and convince the authenticator
    on the network that your network port is valid. Since this occurs at the
    ethernet layer, you will probably need to DHCP after authenticating (unless
    your network uses static IP addressing) - but any standard DHCP client
    (pump or dhclient, for example) will work at that
    point.
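    The DHCP client should only run once the supplicant has finished, so a script can first poll wpa_cli status until it reports wpa_state=COMPLETED. This sketch assumes the ath0 interface, dhclient as the DHCP client, and an arbitrary 30-second limit. The parsing lives in its own function so it can be checked against captured output. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: wait for wpa_supplicant to finish authenticating, then DHCP.

# Extract the wpa_state value from "wpa_cli status" output on stdin.
wpa_state_from_status() {
  sed -n 's/^wpa_state=//p'
}

# usage: wait_then_dhcp [IFACE] [TIMEOUT_SECONDS]
# Poll once a second until the supplicant reports COMPLETED, then DHCP.
wait_then_dhcp() {
  local iface=${1:-ath0} timeout=${2:-30} i state
  for (( i = 0; i < timeout; i++ )); do
    state=$(wpa_cli -i "$iface" status | wpa_state_from_status)
    if [ "$state" = "COMPLETED" ]; then
      dhclient "$iface"
      return
    fi
    sleep 1
  done
  echo "$iface: not authenticated after ${timeout}s (state: ${state:-unknown})" >&2
  return 1
}
```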


    Like most things in the open-source world, there are two different packages
    to choose from for a supplicant:
    wpa_supplicant
    and open1x.

    2.3.1 wpa_supplicant


    wpa_supplicant started life as a WPA layer for open1x’s
    xsupplicant. It’s since evolved into its own 802.1x-compatible
    supplicant, and no longer needs (in fact, no longer even supports) open1x.
    wpa_supplicant supports nearly every WPA and WPA2 (and 802.1x)
    authentication mode, and is the supplicant described in the text below.

    2.3.2 open1x and xsupplicant


    Open1x is still maintained, and now includes proper WPA support (now that
    it is independent from wpa_supplicant). The open1x developers
    have made some progress toward a GUI to manage the supplicant, but open1x’s
    xsupplicant supposedly supports fewer authentication modes than
    wpa_supplicant.

    2.4 Links


    3. Debian Installation


    If you’re using Debian, congratulations. The good news is,
    wpa_supplicant is already in Debian’s package repository.
    The bad news is, you may need to get it from Debian unstable (aka “sid”).
    It’s not in the sarge release, but will likely be in the upcoming etch
    release.


    If you’re not using Debian, well, good luck. :-)

    3.1 Necessary Packages

    apt-get install wpasupplicant
    


    Make sure /etc/init.d/wpasupplicant starts at boot-time. As
    I write this, 0.4.4 is the current version. Earlier versions of the
    wpasupplicant package will have problems (with madwifi cards, at least).

    3.2 /etc/wpa_supplicant.conf


    wpa_supplicant uses a single config file to store descriptions
    of all wireless networks, along with (optionally) the credentials used to
    access each. Wireless networks are listed in order of increasing priority;
    SSIDs near the bottom of the file are preferred to those listed near the
    top. As an example, let’s consider the following config file:

    # Minimal /etc/wpa_supplicant.conf to associate with open
    #  access points. Please see
    #  /usr/share/doc/wpasupplicant/wpa_supplicant.conf.gz for more complete
    #  configuration parameters.
    
    ctrl_interface=/var/run/wpa_supplicant
    ctrl_interface_group=20
    
    #ap_scan=1
    #eapol_version=2
    #fast_reauth=1
    
    ### Associate with any open access point
    ###  Scans/ESSID changes can be done with wpa_cli
    network={
            ssid=""
            key_mgmt=NONE
    }
    
    network={
            ssid="PAL2.0"
            scan_ssid=1
            key_mgmt=WPA-EAP
            eap=PEAP
            identity="someguy"
            password="somepass"
            ca_cert="/etc/ssl/certs/ca-certificates.crt"
    }
    


    Ignoring the comment lines, here’s what everything means.

    ctrl_interface=/var/run/wpa_supplicant
    ctrl_interface_group=dialout
    


    wpa_supplicant uses a UNIX domain socket to communicate with a text-based
    client interface, wpa_cli. wpa_cli can be used to
    select networks, to provide authentication credentials (if they aren’t already
    in the config file), and to force the supplicant to jump through a variety of
    hoops. The above two lines indicate that the socket will be created in the
    directory /var/run/wpa_supplicant (one socket per interface), and that it will
    be owned by group “dialout”. The group ownership is merely personal preference
    on my part - the supplicant runs as root, so any group can be used for the
    socket. (On Debian, dialout is GID 20, which is why the sample file above
    says ctrl_interface_group=20.)
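    The socket is group-owned, so anyone in that group can run wpa_cli without being root. On Debian, adduser someguy dialout adds a user to the group, and the change takes effect at the next login. The sketch below just checks membership before you go hunting for other problems. The helper names are hypothetical, and someguy is a placeholder user.

```shell
#!/usr/bin/env bash
# Sketch: can a user talk to wpa_supplicant's control socket, i.e. are they
# in the ctrl_interface_group (dialout in the config above)?

# usage: in_group USER GROUP -> exit status 0 if USER is in GROUP
in_group() {
  id -nG "$1" | tr ' ' '\n' | grep -qx -- "$2"
}

# usage: check_wpa_cli_access [USER] [GROUP]
check_wpa_cli_access() {
  local user=${1:-$(id -un)} group=${2:-dialout}
  if in_group "$user" "$group"; then
    echo "$user can use wpa_cli"
  else
    echo "run as root: adduser $user $group (then log in again)"
  fi
}
```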

    #ap_scan=1
    #eapol_version=2
    


    These are a few useful global options. Both are commented out, so the
    defaults apply, but they’re worth mentioning.


    Modern access points (like those my employer has deployed) are able to
    broadcast more than one network ID (SSID), and potentially respond to
    several more. To find
    these alternate SSIDs, the supplicant must poll access points.
    ap_scan=1 (the default) turns on this capability, and indicates that
    wpa_supplicant (not the driver) should handle AP scanning.
    eapol_version selects the version of EAPOL (EAP over LAN, the 802.1x
    framing used during authentication). Version 2 exists (and is defined in
    the 802.1x spec), but not many access points support it yet - hence the
    default of version 1, and why the eapol_version=2 line stays commented out.


    If you operate in an environment with many hidden SSIDs, you’ll probably want
    to set ap_scan=2. With that, when a new access point is detected
    the supplicant will iterate through every network defined in the config file
    to see if a proper network connection can be made (rather than just relying
    on broadcast SSIDs).

    #fast_reauth=1
    


    Reauthenticate quickly, when asked. The default is to enable this, but it’s
    a useful tweakable parameter (which is why I’m mentioning it here).

    network={
            ssid=""
            key_mgmt=NONE
    }
    


    This stanza defines a network with an empty SSID, which will associate
    with any open broadcast network. With key_mgmt=NONE,
    no authentication will be attempted. Since this is the first network defined
    in the config, any other networks will take priority over this one.

    network={
            ssid="PAL2.0"
            scan_ssid=1
            key_mgmt=WPA-EAP
            eap=PEAP
            identity="someguy"
            password="somepass"
            ca_cert="/etc/ssl/certs/ca-certificates.crt"
    }
    


    This defines a network requiring authentication. Obviously, this network
    is named “PAL2.0”, and requires WPA-EAP to
    set up the authentication connection. Credentials are exchanged using the
    PEAP method (one of several EAP methods usable with WPA-EAP).
    PEAP protects the credential exchange inside a TLS session, authenticated by
    the server’s X.509 certificate. That certificate should be verified by the
    supplicant - otherwise you may be sending your credentials to any network
    with a matching SSID. Not good. That’s why you point ca_cert to Debian’s
    CA certificate file.


    The scan_ssid=1 line indicates that local access points will
    probably not broadcast this network by default - that access points must be
    polled to see if they support this SSID.


    Note that both the username and password to be used for this network are
    stored, in the clear, in the config file. If you’d rather not do this, you
    can omit one or both of these. You can then use wpa_cli to
    add an identity and password on a per-network basis at runtime, so they
    will only be cached in the supplicant daemon. Obviously, authentication
    won’t work until you do this, but it does avoid having cleartext credentials
    in the config file.
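    Both approaches can be scripted. This bash sketch assumes GNU stat (as on Debian) and wpa_cli's identity and password commands, which take a network number as shown by list_networks. The function names are hypothetical. The first helper warns if the config file is readable by anyone but its owner. The second reads a password without echoing it, so it stays off the screen and out of your shell history, and then hands both credentials to the running supplicant.

```shell
#!/usr/bin/env bash
# Sketch: keep cleartext credentials out of, or locked down in, the config.

# Warn (and return 1) if the config is readable by group or others.
check_conf_mode() {
  local conf=${1:-/etc/wpa_supplicant.conf} mode
  mode=$(stat -c %a "$conf") || return 1
  if (( 8#$mode & 8#077 )); then
    echo "$conf is mode $mode; consider: chmod 600 $conf"
    return 1
  fi
}

# usage: wpa_set_credentials NETWORK_NUMBER IDENTITY
# Prompts for the password; credentials live only in the daemon's memory.
wpa_set_credentials() {
  local id=$1 user=$2 pass
  read -rsp "Password for $user: " pass
  echo >&2
  # The control interface answers OK on success.
  wpa_cli identity "$id" "$user" | grep -qx OK &&
    wpa_cli password "$id" "$pass" | grep -qx OK
}
```

    For example, wpa_set_credentials 1 someguy, if PAL2.0 is network 1. The password does briefly appear in wpa_cli's argument list, where other local users could see it with ps. Typing it at the interactive wpa_cli prompt avoids that.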

    3.3 /etc/network/interfaces


    With the latest wpasupplicant packages, Debian controls daemon
    command-line options through options in your
    /etc/network/interfaces file.
    My interfaces file currently has a stanza like this:

    # Wireless interface
    #auto ath0
    iface ath0 inet dhcp
        wpa-driver madwifi
        wpa-conf /etc/wpa_supplicant.conf
    


    Most of this is self-explanatory. Note that ath0 is the
    network device for my madwifi-powered 802.11 NIC. If you need to change
    the driver, do so with a different wpa-driver option.


    There are a variety of extra options that are supported here. For more
    info, consult /usr/share/doc/wpasupplicant/README.Debian.gz.

    3.4 Using your wireless connection(s)


    With your supplicant running, you should be able to ifup
    <iface>
    and watch your NIC authenticate. Add an
    auto ath0 line to your /etc/network/interfaces
    to automatically bring up ath0 (or whatever your NIC is called)
    at boot - though that’s probably more trouble than it’s worth.


    At this point you should familiarize yourself with wpa_cli.
    Start it with no arguments, while wpa_supplicant is running, to
    get an interactive prompt. A few useful commands:


    help                  Prints a command summary
    scan                  Starts a scan for new SSIDs/networks
    scan_results          Lists the SSIDs wpa_supplicant can currently find
    list_networks         Lists networks defined in your config, and the number wpa_supplicant will use to refer to them
    select_network <num>  Select a particular network to use. Useful if wpa_supplicant isn’t authenticating to the network you want in a busy environment
    terminate             Causes wpa_supplicant (not wpa_cli) to quit
    quit                  Causes wpa_cli to exit

    3.5 Common problems


    I’ve already run into a few frequent hangups with wpa_supplicant,
    and WPA support in general. If you have problems, and solutions, please let
    me know and I’ll add them to the list below.

    3.5.1 wpa_supplicant isn’t locking on to my network


    If you have several networks defined - especially if one of them is an open
    broadcast network (no SSID specified) - or if the SSID you’re looking for
    isn’t the default broadcast SSID of a nearby access point,
    wpa_supplicant may not lock onto the network you want. Fire up
    wpa_cli, use the list_networks command to find the
    internal number of the network you want, and use the select_network
    command to choose that network (and disable all others). Then be patient
    while wpa_supplicant finds the network you want and authenticates.
    To hasten that along, use the scan command to start another SSID
    scan.


    Don’t forget that you’ve selected one network when you later try to use a
    different net. You’ll probably need to select another network, or use
    the enable_network command if you move to a different wireless
    coverage area.
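    The list_networks / select_network routine can be scripted. This sketch assumes that list_networks prints one tab-separated row per network (id, SSID, BSSID, flags) below a header line. Check your version's output before relying on that. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: select a configured network by SSID instead of by number.

# Read "wpa_cli list_networks" output on stdin; print the id of the network
# whose SSID is exactly $1. Rows assumed: id <TAB> ssid <TAB> bssid <TAB> flags
network_id_for_ssid() {
  awk -F'\t' -v want="$1" '$1 ~ /^[0-9]+$/ && $2 == want { print $1; exit }'
}

# usage: select_ssid SSID -> select that network (disabling others), rescan
select_ssid() {
  local ssid=$1 id
  id=$(wpa_cli list_networks | network_id_for_ssid "$ssid")
  if [ -z "$id" ]; then
    echo "no network with SSID '$ssid' in the config" >&2
    return 1
  fi
  wpa_cli select_network "$id" && wpa_cli scan
}
```

    For example, select_ssid PAL2.0 picks that network and starts a fresh scan.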

    3.5.2 wpa_supplicant authenticated to the right network, but I have the wrong IP


    If wpa_cli status indicates you’ve authenticated, and
    wpa_cli list_networks shows you’re using the network you think
    you should be using, but you’ve got the wrong IP, then you probably just need
    to re-DHCP.


    wpa_supplicant may not scan or authenticate correctly until
    the wireless NIC is up (note: I’m referring to the interface being active - it
    may not yet have an IP address). If you just started DHCPing on that interface,
    it’s possible that DHCP picked up an IP address from a network you weren’t
    intending before wpa_supplicant found and authenticated the correct
    network. DHCP will keep its erroneous IP until the lease timeout expires, so
    you’ll probably want to re-DHCP.


    If this keeps happening (which is likely if the network you want isn’t the
    default broadcast SSID on nearby access points) you may want to use the
    select_network command in wpa_cli to disallow all
    other networks while you DHCP.
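    Re-DHCPing without disturbing the supplicant can be as simple as releasing the lease and asking again. This sketch assumes ISC dhclient, whose -r flag releases the current lease, and the ath0 interface. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: drop a stale lease and request a fresh one on the same interface,
# leaving wpa_supplicant's association alone. Assumes ISC dhclient.
redhcp() {
  local iface=${1:-ath0}
  dhclient -r "$iface"   # release the lease picked up from the wrong network
  dhclient "$iface"      # ask again, now that the right network is up
}
```

    Run it as root, ideally after select_network has pinned the network you want.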

    3.5.3 After suspend-to-disk, wpa_supplicant’s scan doesn’t pick up any new access points


    This is most likely a minor driver problem. WPA support for Linux is fairly
    new, as is suspend-to-disk, so this may happen occasionally. Remove the
    driver module for your wireless NIC and reload it (e.g., rmmod ath_pci,
    then modprobe ath_pci). That usually clears things
    up for me, though your mileage may vary.
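    The reload can be wrapped in a small function to run after resume. This sketch uses the madwifi module name from the text (ath_pci) and asks wpa_supplicant for a fresh scan afterwards. The two-second pause is an arbitrary guess at how long the interface takes to reappear, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reload the wireless driver after resume, then rescan.
# ath_pci is the madwifi module from the text; other drivers differ.
reload_wifi_module() {
  local mod=${1:-ath_pci}
  rmmod "$mod" || return 1      # unload the driver (the interface goes away)
  modprobe "$mod" || return 1   # load it again
  sleep 2                       # arbitrary pause for the interface to return
  wpa_cli scan                  # request a fresh SSID scan
}
```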


    If that doesn’t work, check wpa_cli list_networks to make sure
    you haven’t inadvertently disabled all the networks that exist at your
    present location. Frequent use of select_network can cause some
    confusion.

    Quick Start Guide to Debian with Cisco VPNC Concentrators


    This document details how to make your Debian machine work with a Cisco VPN
    Concentrator (occasionally these may be called a PIX something-or-other) using
    vpnc, with resolvconf to make handling resolv.conf updates
    easier. It’s
    pretty much assumed that you know all the passwords necessary to use the Cisco
    box, but this may not always be the case. Cisco VPN client software uses a
    login/passwd pair to access the Concentrator. This login/passwd pair is
    separate from the credentials used to authenticate the user, and can be left in
    a hashed form in a config file, given out by your sysadmin. Unfortunately,
    vpnc cannot use the hashed password, and I don’t
    know how to reverse the
    hashing (though it is possible - they aren’t using a one-way hash), so you’ll
    have to bother your sysadmin to find out what the password really is.


    As usual, this guide pertains primarily to Debian Linux users. Other Linux
    users and *BSD users are, unfortunately, screwed.


    This guide is no longer being actively maintained. If you find an error,
    please let me know and I’ll update the site. However, as my use of
    vpnc has greatly declined recently, I’m no longer keeping this
    page up to date myself.


    1. Introduction
    2. VPNC configuration
    3. Resolvconf


    1. Introduction


    Recently my employer deployed a campus-wide 802.11b network, with connectivity
    in most buildings, common areas, and a few popular eating holes. Unfortunately,
    the powers that be require some sort of authenticated VPN system to access a
    working router - either Cisco’s VPNclient (which is closed-source, proprietary,
    X86-specific, and buggy) or PPTP (which is a Micro$oft protocol, has
    questionable crypto, and is proprietary). Initially, I used various Linux
    PPTP clients; for more details, see my notes.
    Of course, those wacky Debian people had added a new program, vpnc,
    by the time I had made everything work correctly. Figures.


    VPNC is an open-source client
    specifically designed to work with Cisco’s VPN Concentrator, and it doesn’t
    require any kernel modifications, or packages from weird APT servers, or
    patching. Obviously, I started using that. A little while later, I wrote
    down my notes from that, so random other folk could use it as well. Enjoy!


    2. VPNC Configuration


    VPNC’s upstream source seems to just include one program, vpnc,
    that takes a series of arcane options and builds a tunnel to your Cisco VPN
    Concentrator. No kernel patches are required; at most, you may just need to
    add generic tap and tunnel support to your kernel (though, since I use Debian’s
    packaged kernels, I didn’t need to do this - it’s already in there).


    The Debian package provides a few extra scripts, vpnc-connect and
    vpnc-disconnect that automate everything. Put all the VPN
    concentrator’s options in one config file, and everything will come up when
    you run the vpnc-connect script (and, of course, come down when
    you run vpnc-disconnect). This package is in unstable now, and
    should be a part of the upcoming Sarge (aka Debian 3.1) release. To get
    started, do this:

    apt-get install vpnc
    


    Then write a config file. vpnc-connect configs live in
    /etc/vpnc. The default config is
    /etc/vpnc/default.conf. Below is the config I use to connect to
    the wireless service at Purdue University (aka PAL, the
    Purdue Air Link). I keep
    this in /etc/vpnc/PAL.conf:

    Interface name pal
    IKE DH Group dh2
    Perfect Forward Secrecy nopfs
    IPSec gateway vpn.airlink.purdue.edu
    IPSec ID vpnusers
    IPSec secret vpnusers
    


    Obviously, replace vpnusers in the sample above with the VPN
    Concentrator credentials at your site.


    To bring up a tunnel, just run vpnc-connect [name] as root. Without
    arguments, vpnc-connect loads the config in
    /etc/vpnc/default.conf. To load PAL.conf instead,
    run vpnc-connect PAL.


    Pretty simple, eh? See why I stopped using PPTP?
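    If you switch between several configs, a thin wrapper can catch a typo in the name before vpnc-connect runs. This sketch mirrors the lookup rule described above: no argument means /etc/vpnc/default.conf, and NAME means /etc/vpnc/NAME.conf. It is not taken from vpnc-connect's source. The function names and the VPNC_DIR override are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: resolve a vpnc-connect config name the way the text describes,
# and refuse to connect if that file is missing.

# usage: vpnc_conf_path [NAME] -> path of the config vpnc-connect would load
vpnc_conf_path() {
  local dir=${VPNC_DIR:-/etc/vpnc}   # VPNC_DIR: hypothetical override
  echo "$dir/${1:-default}.conf"
}

# usage: vpn_up [NAME] -> run vpnc-connect only if the config exists
vpn_up() {
  local conf
  conf=$(vpnc_conf_path "$1")
  [ -r "$conf" ] || { echo "missing $conf" >&2; return 1; }
  vpnc-connect "$@"
}
```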

    3. Resolvconf


    Ever find yourself joining multiple networks simultaneously, and having to
    manually edit /etc/resolv.conf to reflect the “proper” name
    server for whatever network is used for most of your traffic? You might, if
    you use a VPN tunnel. For example, you may DHCP at home, get a DNS from your
    local broadband provider, then use a VPN to reach the office - where you need
    another DNS server configured in /etc/resolv.conf, to see machines
    on your employer’s internal network. Wouldn’t it be nice if there were a
    Debian package to manage resolv.conf, and select the right DNS
    depending on which networks are available? Well, there is. That’d be the
    resolvconf package. Go, install it:

    apt-get install resolvconf
    


    Yes, you want resolvconf to manage /etc/resolv.conf for
    you. Yes, you want it to create a symlink for /etc/resolv.conf
    for you - it should be linked to /etc/resolvconf/run/resolv.conf
    when the package is installed. Keep this in mind when answering debconf
    questions.


    So, once resolvconf is installed you’ll have an
    /etc/resolvconf directory. You’ll want to edit
    interface-order; it’s a config file that lists interfaces in
    their “preferred” order. As nameservers are identified to the system
    (via DHCP responses, or vpnc tunnels being created) they are
    sorted based on the interface they use and that interface’s order in this file.
    A nameserver using an interface near the top of this file will always be near
    the top of your resolv.conf. Simple. If you use vpnc,
    you’ll want to add the tunnel interface to interface-order.
    Mine, including the interface for Purdue’s wireless network (as defined in
    PAL.conf in the previous section) is:

    # interface-order(5)
    lo.inet*
    lo.dnsmasq
    lo.pdnsd
    lo*
    tun*
    tap*
    pal
    eth*
    ath*
    wlan*
    ppp*
    *
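    A rough bash model makes the ordering concrete. Each interface takes the position of the first pattern (a shell glob) that it matches, and interfaces are emitted in that order. This illustrates the rule under those assumptions and is not resolvconf's actual code. order_interfaces is a hypothetical name.

```shell
#!/usr/bin/env bash
# Rough model of interface-order ranking: each interface takes the position
# of the first glob pattern it matches; unmatched interfaces go last.

# usage: order_interfaces ORDER_FILE IFACE...
order_interfaces() {
  local order_file=$1; shift
  local pattern iface
  local -a remaining=("$@")
  local -a next=()
  while read -r pattern || [ -n "$pattern" ]; do
    case $pattern in ''|'#'*) continue ;; esac   # skip blanks and comments
    next=()
    for iface in "${remaining[@]}"; do
      # $pattern is unquoted on purpose: it is matched as a glob.
      case $iface in
        $pattern) echo "$iface" ;;
        *) next+=( "$iface" ) ;;
      esac
    done
    remaining=("${next[@]}")
  done < "$order_file"
  for iface in "${remaining[@]}"; do echo "$iface"; done
}
```

    With the file above and ath0, eth0, pal, and ppp0 all up, pal ranks first, so the VPN's nameserver lands at the top of resolv.conf.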
    


    The other file you’ll probably want to edit is
    /etc/resolvconf/resolv.conf.d/base. This lists options that
    should always be in your resolv.conf. Yes, resolvconf is
    smart enough to merge any search list you provide here with whatever a DHCP
    server provides. Mine is:

    search fmepnet.org
    options ndots:2
    


    Incidentally, the “options ndots:2” line means that a hostname
    must contain at least two dots (.) before it is tried as a fully qualified
    name first; names with fewer dots get the search list appended first. With
    this option, ping bob.cx will try to ping
    bob.cx.fmepnet.org, if it exists, rather than pinging a server
    on Christmas Island.
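    The query order is easy to model. This sketch follows the rule as resolv.conf(5) describes it. A name with at least ndots dots is tried as-is first, then with each search domain. A name with fewer dots gets the search domains first and is tried as-is last. A trailing dot marks a name as absolute, so no search domains are used. query_order is a hypothetical helper and is not part of any resolver.

```shell
#!/usr/bin/env bash
# Simplified model of resolver query order under "options ndots:N".

# usage: query_order NAME NDOTS SEARCH_DOMAIN...
query_order() {
  local name=$1 ndots=$2 d
  shift 2
  if [[ $name == *. ]]; then       # trailing dot: absolute, no search list
    echo "${name%.}"
    return
  fi
  local dots=${name//[!.]/}        # keep only the dots, then count them
  if (( ${#dots} >= ndots )); then
    echo "$name"                   # enough dots: try the name as-is first
  fi
  for d in "$@"; do
    echo "$name.$d"                # then (or first) each search domain
  done
  if (( ${#dots} < ndots )); then
    echo "$name"                   # too few dots: as-is only as a last resort
  fi
}
```

    So query_order bob.cx 2 fmepnet.org prints bob.cx.fmepnet.org, then bob.cx.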


    So, that’s all there is to it. Set up those few files, and your list of
    nameservers will be quietly, correctly updated any time you DHCP or set up a
    VPN tunnel. Nice. Gotta hand it to those Debian folks, don’t ya?

    Quick Start Guide to Debian PPTP Clients


    This document describes setting up a functional PPTP client with MPPE encryption
    enabled. While it does focus on getting things going with Debian Linux, some
    information may be useful for individuals trying to patch things together using
    other Linux-based systems.


    *BSD users are, unfortunately, screwed.


    At this point I’m no longer actively maintaining this page. If you find errors,
    please let me know and I’ll try to correct them. However, I now use vpnc to
    connect to the local VPN systems; since it is easier to set up than PPTP,
    I went that route. For details on setting up vpnc to work with your
    favorite Cisco VPN Concentrator, look here.


    1. Introduction
    2. PPTP basics
    3. Debian Installation


    1. Introduction


    Recently my employer deployed a campus-wide 802.11b network, with connectivity
    in most buildings, common areas, and a few popular eating holes. Unfortunately,
    the powers that be require some sort of authenticated VPN system to access a
    working router - either Cisco’s VPNclient (which is closed-source, proprietary,
    X86-specific, and buggy) or PPTP (which is a Micro$oft protocol, has
    questionable crypto, and is proprietary). Well, PPTP has an open-source
    implementation for Linux (which is 2.4-specific), so that’s the way to go.
    Since I use Debian, I built some binary packages and made everything work
    nicely. Since others at work also have Debian laptops, this little guide was
    born.


    2. PPTP Basics

    2.1 Overview


    PPTP (short for Point-to-Point Tunneling Protocol) is an increasingly common
    way of creating a VPN (Virtual Private Network) connection between two hosts,
    such as an employee’s laptop/home computer and a server deep in his/her
    employer’s network. An authentication connection between client and server is
    created over the existing default network connection. The PPTP client turns
    this network connection into a virtual network interface; any packets sent over
    the virtual interface are encrypted and sent to the endpoint machine.


    VPN setup starts with a control connection over TCP (port 1723) to the
    server. PPP link negotiation and authentication then run through the tunnel,
    usually with MS-CHAPv2 (Microsoft’s variant of CHAP), much as when setting
    up an ordinary PPP link. Actual packet data goes between client and server as
    (optionally encrypted) GRE/IP packets. Under Linux (and probably FreeBSD
    and/or NetBSD) pptp is used to set up and maintain the connection, but the
    kernel-level interaction is handled by the PPP subsystem. Authentication is
    done via pppd (and can be automated using Debian’s usual PPP
    mechanisms), and the network interface created will be a PPP device (eg,
    ppp0) - as far as the kernel’s concerned it’s just a PPP link
    that’s going over a socket connection (set up by pptp) rather than
    out a serial port device.

    2.2 MPPE Encryption


    Most PPTP servers will want the client to support MPPE, a sort of encryption
    (of dubious strength). That provides the “P” in “VPN”. :-)


    To do MPPE (either 40-bit or 128-bit, or the stateless versions) you need both
    a kernel patch (to encrypt and decrypt PPP frames as they cross the PPTP link) and
    a patch to pppd (to negotiate the type of MPPE encryption and set
    up the connection appropriately).


    There are currently two different implementations of MPPE for Linux. Both
    provide pppd patches and kernel patches, but they aren’t
    compatible - the pppd and kernel patches you use must come from
    the same site.


    Currently Debian’s pppd (well, the version in unstable, and
    probably the version that will be released with Sarge, the current testing)
    provides MPPE encryption (and MPPC compression, see
    below). The trick, of course, is finding the right kernel
    patch to match it. :-) Appropriate kernel patches for both Linux 2.4 and 2.6
    can be found at the polbox.com
    site. They work - I’m using them to type this now, in fact.

    2.3 Compression (MPPC)


    Along with MPPE there’s one other popular PPTP protocol: MPPC (Microsoft
    Point-to-Point Compression). It provides a fairly basic form of compression
    over PPP, usable by PPTP. Aside from that (and the fact that the polbox.com
    site has patches to support it) I know nothing about it.

    2.4 Links


    Obviously, Debian users will be most interested in the
    polbox.com site; the patches
    from polbox.com have already found their way into pppd’s upstream
    CVS. However, that
    author only provides implementations for Linux. FreeBSD (and maybe NetBSD)
    users may be more interested in the
    SourceForge site. It’s
    another set of patches (MPPE support only, though - no MPPC), but they do
    provide some notes for the BSDs.


    The observant Debian user may have noticed a kernel-patch-mppe
    package. This patch comes from the SourceForge site and will not work with
    the pppd in unstable/testing. Mildly frustrating…


    Both sets of patches should apply fairly cleanly (though the pppd
    patch from SourceForge may require a bit of tweaking). Each site has all the
    basic instructions necessary to make things work, should you feel like adding
    MPPE support by hand. The rest of this document will focus on getting prebuilt
    packages to work quickly.


    3. Debian Installation

    3.1 Necessary Packages


    So you’re using Debian. Congratulations. The good news is, PPTP has already
    made its way into the core distribution. The bad news is you’ll probably want
    to be running at least “testing” (aka sarge, as of this writing) and preferably
    “unstable” (aka sid). Testing and unstable have the latest upstream version of
    the pptp-linux package, as well as a pppd package
    with built-in MPPE (and MPPC) support. I’ve also made pre-built kernel
    packages to provide MPPE support in Linux 2.4.24. Feel free to add my APT
    archive to your sources.list:

    deb http://www.fmepnet.org/debian/ main main
    deb-src http://www.fmepnet.org/debian/ main main
    

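    If you script this setup, re-running it should not pile up duplicate
    entries in sources.list. A minimal sketch, assuming a POSIX shell with
    grep; add_apt_line is a hypothetical helper, not an APT tool:

    ```shell
    # Hypothetical helper: append a line to an APT sources file only if that
    # exact line is not already present, so the setup can be re-run safely.
    add_apt_line() {
        file=$1
        line=$2
        # -x: match the whole line; -F: treat the text literally, not as a regex.
        # A missing file makes grep fail, so the line is appended (creating it).
        if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
            printf '%s\n' "$line" >> "$file"
        fi
    }

    # Usage (as root):
    #   add_apt_line /etc/apt/sources.list 'deb http://www.fmepnet.org/debian/ main main'
    #   add_apt_line /etc/apt/sources.list 'deb-src http://www.fmepnet.org/debian/ main main'
    ```
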

    If you want basic PPTP client support with MPPE encryption, get these packages:

    apt-get update
    apt-get install pptp-linux ppp kernel-image-2.4.24-1-686+mppe
    


    If you intend to build your own MPPE-enabled kernel you’ll also want
    kernel-source-2.4.24 and the patches from
    polbox.com.


    The kernel-image-2.4.24-1-686+mppe package on my site is built
    using the config from kernel-image-2.4.24-1-686 in
    unstable. The package version ends in “+mppe” to differentiate
    it from the upstream version.

    3.2 Automating via PPP


    All of Debian’s PPP config files live in /etc/ppp. From here on
    in I’ll assume that you want to turn on MPPE encryption; if that’s not the case
    you’ll need to strip the appropriate options out of my config file examples.
    For demonstration purposes, I’ll give the config files I use to access
    PAL (the Purdue Air Link),
    the campus-wide wireless service at Purdue University.


    The next few sections are a file-by-file description of what you need to change
    to get this working quickly.

    3.2.1 /etc/ppp/peers/PAL


    This file defines the pppd options to use for a given service
    provider. The name of the service provider is the filename - in this case,
    PAL. You’ll want something along the lines of this:

    pty "pptp 10.1.1.14 --nolaunchpppd"
    name <login>
    remotename PPTP
    require-chap
    file /etc/ppp/options.pptp
    ipparam tunnel
    


    For full descriptions of these options consult the pppd man page.
    Naturally, 10.1.1.14 is the IP address of your PPTP server (DNS
    names can be used here, but DNS resolution when not yet authenticated can be
    troublesome on PAL, so it’s best to avoid this hassle).
    <login> should be changed to match your login credential on
    the PPTP server.
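
    If you maintain more than one PPTP account, a small generator keeps the
    peers files consistent. This is a sketch only: write_peer_file is a
    hypothetical helper that emits the same six options shown above, with the
    server address and login as parameters.

    ```shell
    # Hypothetical helper: write a pppd peers file such as /etc/ppp/peers/PAL.
    # Arguments: output path, PPTP server address, login on the PPTP server.
    write_peer_file() {
        out=$1
        server=$2
        login=$3
        # One pppd option per line, matching the example peers file.
        printf '%s\n' \
            "pty \"pptp $server --nolaunchpppd\"" \
            "name $login" \
            "remotename PPTP" \
            "require-chap" \
            "file /etc/ppp/options.pptp" \
            "ipparam tunnel" > "$out"
    }

    # Usage (as root; "jdoe" is a placeholder login):
    #   write_peer_file /etc/ppp/peers/PAL 10.1.1.14 jdoe
    ```
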


    Of course, the “file /etc/ppp/options.pptp” line means we need an
    options.pptp file, so let’s create that next…

    3.2.2 /etc/ppp/options.pptp


    This file gives a space-separated list of options to pppd for the
    PPTP link. Assuming you’re using the
    polbox.com patches you’ll want
    something like this:

    lock noauth nobsdcomp nodeflate require-mppe mppe-stateful
    


    If you’re using the
    SourceForge patches you’ll
    probably want this instead, since some of the option syntax is different:

    lock noauth nobsdcomp nodeflate mppe-40 mppe-128 mppe-stateless
    


    Once again I’ll stress that you probably want to use the polbox.com patches
    with Linux, since those have already made their way into pppd’s
    CVS repository.


    For descriptions of these options consult the pppd man page.
    These options disable compression (usually a problem for PPTP servers) and
    enable all forms of MPPE encryption. If you do not include the
    mppe options, pppd may fail to connect and will
    probably leave the confusing log message “Negotiated MPPE XX-bit
    encryption.” That means the server requires MPPE and pppd can
    support MPPE, but since you haven’t explicitly turned on MPPE support you
    won’t get encryption (or a VPN tunnel, for that matter).
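
    Since a missing MPPE option shows up only as that misleading log message,
    it can be worth checking the options file before running pon. A rough
    sketch (has_mppe_option is a hypothetical helper): it looks for
    require-mppe-style options (polbox syntax) or mppe-40/mppe-128
    (SourceForge syntax) as whole words, and it does not understand comments
    in the file.

    ```shell
    # Hypothetical helper: succeed only if an options.pptp-style file turns on
    # MPPE with either patch set's syntax (require-mppe... for polbox,
    # mppe-40 / mppe-128 for SourceForge).
    has_mppe_option() {
        # pppd options are whitespace-separated, so put one word per line
        # before matching whole words.
        tr ' \t' '\n\n' < "$1" | grep -Eq '^(require-mppe.*|mppe-(40|128))$'
    }

    # Usage:
    #   has_mppe_option /etc/ppp/options.pptp || echo "no MPPE option set" >&2
    ```
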

    3.2.3 /etc/ppp/chap-secrets


    This file contains credentials for all CHAP-authenticated PPP connections,
    including PPTP connections. Here’s an example:

    # Secrets for authentication using CHAP
    # client        server  secret                  IP addresses
    <login>        PPTP    <passwd>        *
    PPTP        <login>    <passwd>        *
    


    Obviously, replace <login> and <passwd>
    with your login and password on your PPTP server. Both PPTP lines will usually
    be necessary. If your chap-secrets file needs to contain several
    entries you’ll probably want to replace the *s with the actual
    IP addresses of your PPTP and PPP servers. Consult the pppd man
    page for more details.
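
    The chap-secrets file holds plaintext passwords, so it should be readable
    by root only. The sketch below (add_chap_secret is a hypothetical helper)
    appends both entries from the example and sets mode 0600. It assumes the
    login and password contain no whitespace or quote characters, which would
    need quoting in the real file.

    ```shell
    # Hypothetical helper: append the client->server and server->client CHAP
    # entries to a chap-secrets style file and restrict it to its owner.
    add_chap_secret() {
        file=$1
        login=$2
        server=$3   # the "remotename" from the peers file, e.g. PPTP
        secret=$4
        touch "$file"
        chmod 600 "$file"   # plaintext passwords: owner read/write only
        # Columns: client, server, secret, allowed IP addresses ("*" = any).
        printf '%s\t%s\t%s\t*\n' "$login" "$server" "$secret" >> "$file"
        printf '%s\t%s\t%s\t*\n' "$server" "$login" "$secret" >> "$file"
    }

    # Usage (as root; "jdoe" and "s3cret" are placeholders):
    #   add_chap_secret /etc/ppp/chap-secrets jdoe PPTP s3cret
    ```
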

    3.2.4 /etc/ppp/ip-up.d/000netfix


    This script isn’t strictly necessary. However, if you’re using a service like
    Purdue’s wireless network that requires you to use a PPTP tunnel as your
    default route, a script like this is essential:

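    #!/bin/sh
    
    # Only act when a route to Purdue's PPTP server (10.1.1.14) exists,
    # i.e. when this is the PAL tunnel coming up.
    if route -n | grep -q ^10.1.1.14; then
        # Drop the existing default route (via the local network).
        route del default
        # Find the local 10.x subnet and its ethernet device; assume the
        # router is the .1 address on that subnet.
        SUBNET=`route -n | grep ^10 | grep eth | awk '{print $1}'`
        DEVICE=`route -n | grep ^10 | grep eth | awk '{print $8}'`
        ROUTER=`echo $SUBNET | sed -e 's/\.0$/.1/'`
        # Keep a non-tunnel route to the PPTP server, then send everything
        # else through the tunnel, with a smaller MTU to leave room for the
        # PPTP encapsulation.
        route add -host 10.1.1.14 gw $ROUTER dev $DEVICE
        route add default dev ppp0
        ifconfig ppp0 mtu 1400
        # Replace the nameservers that are unreachable through the tunnel.
        cat <<END > /etc/resolv.conf
    search purdue.edu
    nameserver 128.210.11.57
    nameserver 128.210.11.5
    END
    fi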
    


    If the PPTP tunnel (in this case we’ll assume it’s ppp0) is to
    be the default route, there must still be a way to reach the PPTP server.
    Purdue’s wireless connection will put you on a non-routed, private network
    that can reach the PPTP server (10.1.1.14) and nothing else. Once you
    set up a PPTP tunnel as the default route, your machine will try to send all
    packets over the tunnel, including the encapsulated packets bound for
    the PPTP server itself. This little script detects that Purdue’s PPTP server
    is being used and makes certain (via the route add -host line)
    that there’s a non-PPTP route to the server. In addition, this script sets up
    proper nameservers (since after using PPTP with Purdue’s link you’ll wind up
    with bogus nameservers on the inaccessible 10.1.1.14 network).
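
    The subnet/device/router detection in 000netfix runs three separate
    pipelines over route -n, and if more than one 10.x route sits on an eth
    device, SUBNET and DEVICE end up holding several words. A minimal sketch of
    the same logic in a single awk pass that stops at the first match;
    find_local_gateway is a hypothetical name, and the .1 router address is the
    same assumption the original script makes.

    ```shell
    # Hypothetical helper: read `route -n` output on stdin and print the first
    # 10.x route on an eth device as "subnet device router", assuming (as
    # 000netfix does) that the router is the subnet's .1 address.
    find_local_gateway() {
        awk '$1 ~ /^10\./ && $8 ~ /^eth/ {
            router = $1
            sub(/\.0$/, ".1", router)   # e.g. 10.5.0.0 -> 10.5.0.1
            print $1, $8, router
            exit                        # first match only
        }'
    }

    # Usage:
    #   set -- `route -n | find_local_gateway`
    #   SUBNET=$1 DEVICE=$2 ROUTER=$3
    ```
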


    This script should be owned by root (user and group) with mode 0755.
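
    One easy mistake here: Debian’s /etc/ppp/ip-up runs the scripts in
    /etc/ppp/ip-up.d through run-parts, which silently skips any file whose
    name contains characters other than letters, digits, underscores and
    hyphens. A quick sanity check, sketched as a hypothetical
    check_ipup_script function:

    ```shell
    # Hypothetical sanity check for a script placed in /etc/ppp/ip-up.d.
    # run-parts ignores names with characters outside [A-Za-z0-9_-], so
    # "000netfix" runs but "000netfix.sh" would never be executed.
    check_ipup_script() {
        path=$1
        name=${path##*/}
        case $name in
            ''|*[!A-Za-z0-9_-]*)
                echo "$name: run-parts will ignore this file name" >&2
                return 1 ;;
        esac
        if [ ! -x "$path" ]; then
            echo "$path: not executable (expected mode 0755)" >&2
            return 1
        fi
        return 0
    }
    ```

    To install the script with the right owner and mode in one step,
    install -o root -g root -m 0755 000netfix /etc/ppp/ip-up.d/ works, and
    run-parts --test /etc/ppp/ip-up.d lists the scripts that would actually run.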

    3.3 Starting the link


    Assuming you’ve been following the examples closely thus far, you can now bring
    up the PPTP tunnel by typing pon PAL at a root prompt. To bring
    it down, use poff PAL. For the connection error logs, use the
    plog command.
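
    pon may return before the link is fully up, so scripts that need the
    tunnel may want to wait for the interface first. A minimal sketch, where
    wait_for_iface is a hypothetical helper and ppp0 is assumed to be the
    tunnel device; the interface can appear before IP negotiation finishes, so
    treat this as a coarse check only.

    ```shell
    # Hypothetical helper: poll /proc/net/dev (present on 2.4 and later
    # kernels) until the named interface appears or the timeout expires.
    wait_for_iface() {
        iface=$1
        timeout=${2:-30}   # seconds
        while [ "$timeout" -gt 0 ]; do
            if grep -q "^ *$iface:" /proc/net/dev; then
                return 0
            fi
            sleep 1
            timeout=$((timeout - 1))
        done
        echo "$iface did not appear; check plog for pppd messages" >&2
        return 1
    }

    # Usage (as root):
    #   pon PAL && wait_for_iface ppp0 30 && echo "tunnel interface is up"
    ```
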


    Unfortunately, because pppd uses several unusual extra programs to
    bring up a PPTP tunnel you probably won’t be able to make this work with most
    graphical connection managers (e.g., kppp). If you manage to make
    your favorite GUI PPP dialer start and stop a PPTP connection please drop me an
    email and I’ll post a little blurb here.