Playing with AWS Lambda

I started playing with AWS Lambda tonight. Eventually, I’d like to use Lambda and the API Gateway to provide a sort of DDNS (dynamic DNS); a script should run on my home server, touch the API Gateway, and Lambda should reprogram an address in Route 53 to match whatever was used for the origin IP. That involves passing a few parameters around, so I figured a good first step was to write a Lambda to collect any arguments and email them to me. There’s even a quick example on sending email in the Python smtplib docs. Should be easy, right?

The joys of Amazon email-handling

It turns out Amazon’s Lambda environment doesn’t allow connections to just any SMTP server; you need to use one of the servers that provide AWS’ SES (Simple Email Service). Amazon provides several servers, one per region; use what’s closest. Connections to all other mail servers will fail with a generic “Connection closed” message (presumably Amazon is simply resetting these connections as they’re opened).

Once I was able to open a server connection, I started getting failures due to a lack of authentication. Amazon charges by the email, so I needed to create an IAM user to handle my mail sending (and add Python code to turn on STARTTLS and actually log in). I used the SES credential creation wizard, but any IAM user with the AmazonSesSendingAccess inline policy will work as well. In a custom policy, ensure you’ve allowed the ses:SendRawEmail action.
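
For reference, a minimal custom policy granting just that action might look something like this (a sketch, not the SES wizard’s exact output):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ses:SendRawEmail",
      "Resource": "*"
    }
  ]
}
```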

After that, I started getting errors about my sending and receiving email addresses not being “verified”. It turns out Amazon won’t let you send email unless you’ve proven you own the addresses or domains involved. In my case, I verified my domain with Amazon SES, and (since these were largely testing emails) stuck to sending emails to myself.

By the way, the SES verification directions indicate that the verification is region-specific. If you use multiple SES endpoints, you’ll need to verify your email addresses or domains with each one. For domains hosted by Route 53, this process is easy - there’s even a button to propagate records to Route 53 right from the SES console. There’s also support for DKIM, a system for identifying the validity of emails. Must remember to look into that further someday…

Lambda and API Gateway

Creating an API Gateway interface to a Lambda function is pretty easy, once the Lambda already exists. Since I wanted to inspect the HTTP headers coming into the gateway, it was important to turn on the Lambda Proxy Integration checkbox. With that, AWS expects a dictionary (of headers, body, and statusCode) in return. Much of the API Gateway documentation indicates that this should be a JSON dictionary, but if the Lambda is written in Python, the Gateway expects a native Python dictionary back.

The API Gateway will pass useful things to your function in the event and context variables. event contains all the HTTP headers, browser info, etc., while context includes any additional information (including meta-parameters, like the permissible runtime). In Python, the context variable is actually an object of type LambdaContext; useful API client data is probably in context.client_context (though that will be None if nothing is passed).

For my purposes, I’m most interested in event['requestContext']['identity']['sourceIp'] - a string containing the client IP address. I’ll turn that into the basis of a dynamic DNS API in the near future. For now, here’s the code I’m using for my test lambda function:

[Test Lambda function]
import smtplib
import pprint
from email.mime.text import MIMEText

sender = ""
recipient = ""
username = ""
server = ""
password = "At8aj2lvnASuweAvKu3v49siaselinv492nn1jlHFadjJsjsjwl"
port = 587

def lambda_handler(event, context):
    pp = pprint.PrettyPrinter(indent=4)
    rdict = {}
    rdict['body'] = ("Hello from Lambda: <br><pre>" + pp.pformat(event) +
                     "</pre><br><pre>" + pp.pformat(context.client_context) + "</pre>")
    rdict['headers'] = { "Content-Type": "text/html" }
    rdict['statusCode'] = "200"

    # Mail a copy of the event via SES (STARTTLS and SMTP auth required)
    msg = MIMEText(pp.pformat(event))
    msg['Subject'] = "Test from lambda"
    msg['From'] = sender
    msg['To'] = recipient
    s = smtplib.SMTP(host=server, port=port)
    s.starttls()
    s.login(username, password)
    s.sendmail(sender, [recipient], msg.as_string())
    s.quit()

    return rdict

Replace the sender, recipient, username, server, and password variables with your own values; server should be your region’s SES SMTP endpoint.
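
Looking ahead to the DDNS goal, here’s a sketch (function and zone names are hypothetical; the boto3 call is left commented out) of how that sourceIp could drive a Route 53 update:

```python
def upsert_a_record(fqdn, ip, ttl=60):
    """Build the ChangeBatch for a Route 53 UPSERT of a single A record."""
    return {
        "Comment": "DDNS update from Lambda",
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": fqdn,
                "Type": "A",
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
            },
        }],
    }

# Inside the Lambda handler, something like:
#   import boto3
#   ip = event["requestContext"]["identity"]["sourceIp"]
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="YOUR_ZONE_ID",
#       ChangeBatch=upsert_a_record("home.example.com.", ip))
```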

When visiting the API Gateway with a web browser, I get output similar to the following:

[Lambda sample output]
Hello from Lambda:
{ u'body': None,
u'headers': { u'Accept': u'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
u'Accept-Encoding': u'gzip, deflate',
u'Accept-Language': u'en-us',
u'Cache-Control': u'max-age=0',
u'CloudFront-Forwarded-Proto': u'https',
u'CloudFront-Is-Desktop-Viewer': u'true',
u'CloudFront-Is-Mobile-Viewer': u'false',
u'CloudFront-Is-SmartTV-Viewer': u'false',
u'CloudFront-Is-Tablet-Viewer': u'false',
u'CloudFront-Viewer-Country': u'US',
u'Cookie': u'regStatus=pre-register; s_dslv=1482545852452; s_fid=023C0FA3C5B564D7-149E1840C1D08425; s_nr=1482545852462-New; s_vn=1514081675457%26vn%3D1',
u'DNT': u'1',
u'Host': u'',
u'Referer': u'',
u'User-Agent': u'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/602.3.12 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12',
u'Via': u'1.1 (CloudFront)',
u'X-Amz-Cf-Id': u'KNiwjv9wnadjwJFHbWJCjbdbdyyxx==',
u'X-Forwarded-For': u',',
u'X-Forwarded-Port': u'443',
u'X-Forwarded-Proto': u'https'},
u'httpMethod': u'GET',
u'isBase64Encoded': False,
u'path': u'/',
u'pathParameters': None,
u'queryStringParameters': None,
u'requestContext': { u'accountId': u'175919371',
u'apiId': u'81i44Fkwn',
u'httpMethod': u'GET',
u'identity': { u'accessKey': None,
u'accountId': None,
u'apiKey': None,
u'caller': None,
u'cognitoAuthenticationProvider': None,
u'cognitoAuthenticationType': None,
u'cognitoIdentityId': None,
u'cognitoIdentityPoolId': None,
u'sourceIp': u'',
u'user': None,
u'userAgent': u'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_2) AppleWebKit/602.3.12 (KHTML, like Gecko) Version/10.0.2 Safari/602.3.12',
u'userArn': None},
u'requestId': u'713e07f6-db86-11e6-9fb8-3b48cbecf41e',
u'resourceId': u'lasdu138cjc',
u'resourcePath': u'/',
u'stage': u'Production'},
u'resource': u'/',
u'stageVariables': None}

Took a bit of digging, but now I have a nice little URL I can visit that calls a Lambda, prints off the various arguments, and even emails me the event data. Nifty!

Moving to Route 53

I’ve run my own nameservers for years. A decade or two ago, setting up a DNS was a fun way to learn how bits of the internet worked; I even hosted backup name services for a couple friends. While nothing here is really broken, I’ve begun using Amazon’s CloudFront and need to make a change. CloudFront (Amazon’s web caching system) uses a randomly-generated hostname and dynamic set of IP addresses, so if you want a static name in your own domain you’ll either need to use a CNAME record or host the domain’s nameservers with Route 53. Unfortunately, since I’d like to have the toplevel of my domain itself point to a CloudFront address, the CNAME option is right out. CNAME records aren’t allowed at the apex of a domain, and can cause all sorts of practical problems.

Setting up a Route 53 DNS host was extremely easy - just go to the Route 53 dashboard, hit Create Hosted Zone, and fill out the (short) form. I just imported my zone config file verbatim (click the new hosted domain name in the list, then Import Zone File, and paste your existing content in the dialog box that appears). AWS automatically changed my NS and SOA records to match their servers and imported everything else (even the AAAA records). Once the hosted domains were in Route 53, all I had to do was navigate to my registrar (Dotster, for now) and enter Amazon’s provided nameservers instead of my own. After the TTL expired, my names began serving from Amazon’s infrastructure.

With all the plumbing re-routed, adding the CloudFront linkage was pretty simple. The only less-than-obvious piece was the Alias radio button in the Create Record Set interface; changing this from “No” to “Yes” changes the form, and provides a list of alias targets (CloudFront distributions, S3 buckets, and Elastic Load Balancers/IPs) to choose from. Save Record Set, and now you’ve got a geographically-distributed, highly redundant infrastructure for serving a low-traffic blog. Nice!

A Hexo Blog Part 3: Serve It with CloudFront

I’m still setting up a Hexo blog in Amazon’s AWS, and the next step on the game plan is to front the AWS S3 bucket with CloudFront. On the plus side, this seems to be incredibly easy. Amazon even has some very thorough documentation on how to set up CloudFront to be a basic web cache.

The first time I made a CloudFront distribution I forgot to include a default root object. It’s an easy fix; make sure the root object is set to index.html, and things should load up fine. It’s also worth noting that I picked my site’s S3 bucket for an origin (rather than the S3 website URL, as indicated in the CloudFront docs). Not sure why this isn’t recommended, but it seems to work fine.

2017-01-07 update: Using your S3 bucket directly from CloudFront (rather than the S3 website URL) doesn’t work fine. Sure, the toplevel page displays, but pages in subdirectories don’t show up. Oops! I see why Amazon says you should use the URL - as soon as I switched it over, everything behaved nicely.

Multiple Origins

In addition to this blog, I also use a webserver for a few other dynamically-created things. These don’t need to be public, or scale as broadly as a blog, but they do need to appear from my domain’s main web server. Thankfully, CloudFront lets you do that by creating multiple origins.

Once you have a distribution made, go back to the CloudFront dashboard and click its ID. Select the Origins tab, then Create Origin. I entered the domain name of my origin web server (not the CloudFront distribution’s address - that would be the CloudFront service) for the Origin Domain Name, and left Origin Path blank. Then go to the Behaviors tab, and click Create Behavior. I used example/* for a Path Pattern; be warned, this must agree with the web server’s configuration! The web server must be set to respond to GET requests to /example. If your server is set to serve its content on /, then the Origin Path for this origin would need to be set to /example. I mistakenly set both the Origin Path and the Path Pattern when setting this up, and got my web server’s error screen back from CloudFront. The server received a GET /example/example - the Origin Path and Path Pattern were combined. Best to pick one or the other, not both.
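
That pitfall can be modeled with a tiny (hypothetical) helper: CloudFront prepends the Origin Path to the path of every forwarded request, including the part that already matched the Path Pattern.

```python
def origin_request_path(origin_path, request_path):
    """Approximate the path CloudFront sends to the origin:
    the Origin Path is prepended to the full request path."""
    return origin_path.rstrip("/") + request_path

# Origin Path "/example" plus a request matching Path Pattern "example/*":
origin_request_path("/example", "/example/page.html")  # "/example/example/page.html" - oops
# Leaving Origin Path blank avoids the duplication:
origin_request_path("", "/example/page.html")          # "/example/page.html"
```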

By the way, if you do have dynamic content (like I do), it’s probably guided by a query string or set of cookies. There are options for the behaviors on your distribution to forward some or all of these on to the origin. You can even set a regex-style whitelist, if you only want a few query string elements or cookies to be sent along. In my case, I forwarded all query strings over (and ignored cookies) and things seemed to work perfectly. I say “seemed”, because a day later I realized my content was no longer updating: CloudFront’s default TTL is a bit too long for me. At some point I’ll need to modify that application to return HTTP Cache-Control tags.

Deployment Times

AWS docs indicate a deploy should take about 15 minutes (and editing origins and behaviors count as a re-deploy). In my experience, this can take considerably longer - often up to an hour or so. It’ll synchronize eventually, so give it a lot of time.

A Hexo Blog: Part 2 (Hexo into AWS S3)

After setting up a basic Hexo blog, the next logical step is to start publishing the blog to AWS S3. In the past, I’ve used a dedicated VM for this sort of thing, but that means I’ve still got a machine to patch, update, and care about. If I use S3, Amazon takes care of all that - and the costs are lower, to boot.

For the most part, I’m cribbing from Sven Flickinger. However, after following his directions I got some AWS permission errors, so I’m documenting my steps here as well. Your mileage may vary…

First, we need to add the deployer:

npm install --save hexo-deployer-s3

This requires a new config stanza for _config.yml:

deploy:
  type: s3
  bucket: <bucket>
  aws_key: <key>
  aws_secret: <secret>
  region: <region>

At some point, you’ll need to log into AWS and start making an S3 bucket. Buckets need a name, and a region; for my blog, I used the domain name for the bucket name and stuck it in us-east-1. Be sure to enable website hosting, and list index.html as the Index Document.

Once an empty bucket is created, we’ll also need an IAM user with appropriate permissions to upload the blog pieces. Creating an IAM user is simple; be sure to create an access key when you do (or go back into the IAM display, hit the Security Credentials tab, and click Create access key). The access key ID and secret key need to be plugged into _config.yml, or in environment variables AWS_KEY and AWS_SECRET (and removed from _config.yml).

The new IAM user is going to need permissions to manipulate the S3 bucket. Sven gave a short policy doc, but I found that to be incomplete - at least for the first deploy. Go into IAM, Create Policy, and use the Policy Generator. You can edit the policy document; I’m using this:

S3 Access Policy
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3blogFullAccess",
      "Effect": "Allow",
      "Action": [ ... ],
      "Resource": [ ... ]
    }
  ]
}
The Sid field is an arbitrary string (no whitespace) used to name the statement. The Resource field lists all things this policy can act upon; it’s important to list both the contents of the bucket (the bucket ARN with /* appended) as well as the top-level of the bucket itself (the bare bucket ARN). Without both, hexo deploy won’t be able to function.
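
Filled in for a hypothetical bucket named example-bucket (the action list here is an illustrative guess at a workable minimum, not necessarily the exact set I used), the whole policy would look something like:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "S3blogFullAccess",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject"
      ],
      "Resource": [
        "arn:aws:s3:::example-bucket",
        "arn:aws:s3:::example-bucket/*"
      ]
    }
  ]
}
```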

Once the policy is made, select it (from IAM’s Policies sidebar) and click the Attached Entities tab. Hit Attach, then pick your user from the list of IAM users. That should be it; hexo deploy should function now.

After the first deploy, your blog should be accessible via <bucket>.s3-website-<region> - a not-too-friendly domain name. At some point I’ll get around to integrating this with AWS CloudFront, to put it under a more human-readable name (and will probably type up another blog post as well).

A Hexo Blog: Part 1

For years, I haven’t done much of anything with a personal web site. Around the turn of the century, I played with all the popular PHP/MySQL-powered blog and photo album systems, but these have largely been left by the wayside. Up until recently, my personal web site was merely a few static HTML pages’ worth of notes, running on a hosted virtual machine.

Recently, I decided to start delving into serverless computing (mainly around AWS). Overhauling my personal site, and moving it into AWS, seemed like a nice way to do so. I like having a static web site, but more modern tools would be really handy. A friend recommended I try Hexo, so here we are.

Hexo is a Node.js engine for generating and deploying a set of static web pages for a blog site. Source can be comfortably checked into git, and the hexo CLI tool will generate web pages and deploy them as needed.

Hexo+AWS Game Plan

So here’s my initial plan:

  1. Set up Hexo in AWS’s Code Commit git tree.
  2. Get Hexo to deploy to AWS S3
  3. Front the web server with AWS CloudFront (for theoretically infinite scaling)
  4. Use AWS Lambda to automatically rebuild the blog on every git commit

For now, I’m just covering the first item. The rest are topics for future days…

Getting Started with Hexo

There’s lots of “getting started with Hexo” sorts of postings out there, plus a fairly fleshed-out bit of documentation on the Hexo web site, so I’m not going to go into tons of detail here. Similarly, there’s plenty of documentation on AWS CodeCommit, and how to set up an initial repository, so I’m only going to cover a few oddities here.

Hexo requires a slew of Node.js dependencies. npm is your friend, and will put things in node_modules by default. For future reference, I added these:

npm dependencies
npm install --save hexo-cli hexo hexo-renderer-jade hexo-renderer-pug hexo-renderer-stylus hexo-generator-archive hexo-generator-tag hexo-generator-feed hexo-generator-sitemap hexo-browsersync
git add package.json
git commit -a

That bit about package.json wasn’t obvious to me (a Node.js neophyte) initially. Apparently running npm install will parse your package.json file, and auto-install anything listed therein. Much easier than mucking about with system dependencies, or checking piles of Node scripts into your blog’s git tree. Really gotta learn more about Node one of these days…

Anyway, being a long-time UNIX fan I whipped up a quick Makefile to build everything:

all: node_modules public

node_modules: package.json
	npm install --save

public: source node_modules/*
	hexo generate

clean:
	rm -rf public

distclean: clean
	rm -rf node_modules

For now, my fledgling blog just lives in an AWS code tree. Eventually, though, I should get around to the other points listed above (though that will be several subsequent posts).

OS X and Dynamic DNS Updates

A while ago I found a couple notes on Dynamic DNS, using TSIG and dynamic
updates, and put together a dynamic subdomain for my domain. Apple gear (or
at least OS X) seems to require a couple particulars to function, and since
I keep forgetting what’s necessary I put together these notes.


Apple calls TSIG-signed dynamic updates “dynamic global hostname”. On OS X,
this can be turned on in System Preferences; hit the “Sharing” button, assign
a hostname (short name, not fully-qualified), and click the “Edit…” button.
Check the “Use dynamic global hostname” box.

Hostname should be the fully-qualified domain name you want to update. Oddly
enough, the User should also be the FQDN you wish to update. The
Password is the TSIG key.

DNS Config

For this to work, you’ll need a few special records in DNS; Apple calls this
“Bonjour”. Really, it’s a bunch of SRV records. Add the following:

; DDNS update glue
_dns-update._udp        IN      SRV     0 0 53
b._dns-sd._udp          IN      PTR     @    ; browse domain
db._dns-sd._udp         IN      PTR     @    ;
dr._dns-sd._udp         IN      PTR     @    ;
lb._dns-sd._udp         IN      PTR     @    ; legacy browse
r._dns-sd._udp          IN      PTR     @    ; registration domain

Explanations of all these SRV records can be found in the references, below.

In addition, you’ll need to configure your domain to support DDNS, and set up
a TSIG key for your machine. When you set up the TSIG key, you’ll need to
make a 256-bit HMAC-MD5 key:

dnssec-keygen -a HMAC-MD5 -b 256 -r /dev/urandom -n HOST

Don’t forget the trailing period on the hostname when using
dnssec-keygen. It’s not necessary for OS X, but bind
really likes it.
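
For context, the server side of this (zone and key names hypothetical) might look something like the following in named.conf; the grant line restricts the key to updating only its own name:

```
key "myhost.dyn.example.com." {
    algorithm hmac-md5;
    secret "<base64 secret from the dnssec-keygen .key file>";
};

zone "dyn.example.com" {
    type master;
    file "dyn.example.com.zone";
    update-policy {
        grant myhost.dyn.example.com. name myhost.dyn.example.com. A AAAA;
    };
};
```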

One last little oddity: it looks like Apple devices will only update a single
nameserver with their changes. If you have multiple DNSes listed as
authoritative for your dynamic zone, you’ll want to make the first one listed
(and the one listed in the update glue record) able to receive updates, then
fashion some method of replication to your other nameservers. It appears
that if Apple gets a successful submission from one server, it never bothers
to attempt injecting the update into other machines (but it will fail over
to other nameservers, and update them, if the first one fails to respond).


Bandwidth Delay Product Tuning & You

Wide-area networks abound, and fast networks abound (where fast is > 200 Mbps),
but your average consumer will never deal with both at the same time. As of
this writing (late 2013), typical US broadband connections are 40 Mbps or less.
Generally less. Most operating systems seem to be tuned to work acceptably
well across the Internet at these speeds by default. Unfortunately, users of
faster links (like gigabit ethernet, or 10 or 40 Gbps ethernet) are often left
at a loss to explain why their network connections seem amazingly fast on a
local connection (intra-building, across a small academic campus, etc.) but
fall to rather paltry speeds when covering any sort of distance. In my
experience, users generally chalk this up to “the network is slow”, and live
with it. If some network support engineers (ISP, corporate network group,
whatever) is engaged, you usually get some sort of finger-pointing; all sides
have plenty of evidence that both the client, server, and network are operating
just fine, thank you, and that something else must be broken.

In many cases, TCP itself is the limiting factor. TCP must present a lossless
byte stream, even in the face of packet loss and corruption. To support
that, a TCP implementation (read: your operating system kernel) must save
every byte of data it transmits until the recipient has explicitly acknowledged
it. If a packet is lost, the recipient will fail to acknowledge (ACK) it (or
will repeatedly ACK the last byte it did receive); the sender can use its
stored copy to re-transmit the missing data. So how big does this buffer
need to be, anyway? Yeah, that would be the bandwidth delay product.

Bits do not propagate instantly - the speed of light is finite. That means
a sender must buffer enough data for its network adapter to run at full speed
while waiting for the full round-trip delay to the recipient. The round-trip
delay can be measured via the UNIX ping command; typical values are
in tens of milliseconds. Multiply the bandwidth and the time (in seconds) for
a round trip, and you’ve got the amount of buffer space needed to keep a
connection busy at that distance. For example, for a 1 Gbps network connection
with a 54 ms ping latency (say, from the midwest to the west coast), we
require 1 Gb/s * 0.054 s = 54 Mb = 6.75 MB of buffer space. Obviously, a
10 Gbps ethernet connection (and appropriate routers) would require 67.5 MB
of buffer to fill the available bandwidth.
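
The arithmetic above, spelled out as a one-liner (helper name hypothetical):

```python
def bdp_bytes(bandwidth_bps, rtt_seconds):
    """Bandwidth-delay product: buffer needed to keep the pipe full, in bytes."""
    return bandwidth_bps * rtt_seconds / 8

# The 1 Gbps / 54 ms example from above - roughly 6.75 MB:
bdp_bytes(1e9, 0.054)
# And at 10 Gbps - roughly 67.5 MB:
bdp_bytes(10e9, 0.054)
```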

The remainder of this document outlines how to tune TCP stacks in a couple OSes
for high bandwidth delay product communication. There’s a wide array of
OS-specific TCP and IP tuning parameters; here, I’m only focusing on the ones
related to long-haul TCP sessions. For more info, check out the links
referenced below.

Linux
Linux’s TCP stack includes tunables for overall maximum socket memory, as well
as a three-part value for send and receive, listing minimum, initial, and
maximum memory use. There are many other tunables, but as of RedHat Enterprise
6 (kernel 2.6.32 or so) most of these default to usable values for a 10 Gbps
WAN connection. The socket memory settings, however, default to a maximum of
4 MB of buffer space - probably far too small for modern WAN things.

TCP tunables are controlled via sysctl (read the man page). Add
the following to /etc/sysctl.conf:

net.core.rmem_max = 524288000
net.ipv4.tcp_rmem = 8192 262144 131072000
net.core.wmem_max = 524288000
net.ipv4.tcp_wmem = 8192 262144 131072000

The rmem_max line allows up to 0.5 GB of memory to be used for a
socket. Technically, this is way overkill, as the next line (for
tcp_rmem) will limit this to 128 MB max (and 8 kB minimum, with
a default of 256 kB). If 128 MB proves insufficient, simply raise this third
value. Both are repeated for wmem (memory for a sending socket).

Apple OS X

Apple’s TCP stack is BSD-derived. It also uses sysctl for tuning,
but has different tunables from Linux. Total socket memory is limited by
the maxsockbuf parameter; unfortunately, as of OS 10.9, this is
limited to a mere 6 MB - and that must be split (statically!) between send
and receive memory. Honestly, that’s just not enough for long-distance
transfers, but we’ll make the most of it that we can.

Currently, I’m recommending tuning these parameters in /etc/sysctl.conf:


  • kern.ipc.maxsockbuf: This is the maximum amount of memory to
    use for a socket, including both read and write memory. Again, in 10.9, this is limited
    to 6 MB (and defaults to 6 MB) - rather disappointing, Apple. Note that this
    probably also affects SYSV IPC sockets (though, that’s unlikely to make a
    major difference for anyone).

  • net.inet.tcp.sendspace: Allow for up to 3 MB of memory for a send buffer.
    This, plus net.inet.tcp.recvspace, must be less than maxsockbuf.

  • net.inet.tcp.recvspace: Allow for up to 3 MB of memory for a receive buffer.
    This, plus net.inet.tcp.sendspace, must be less than maxsockbuf.

  • net.inet.tcp.doautorcvbuf,doautosndbuf: MacOS has a mechanism
    for auto-tuning buffer sizes. By default, this is limited to 512 kB for each
    of send and receive. Setting these to 0 will disable the buffer
    auto-tuning entirely.

  • net.inet.tcp.autorcvbufmax,autosndbufmax: If you’d rather
    keep the auto-tuning buffer logic enabled (see above), you’ll want to raise
    this maximum. The default (at least in 10.9) is 512 KB; a value of 3 MB
    (3145728) is more appropriate, and will allow your machine to hit higher
    transfer speeds. I suggest caution if your machine handles a lot of TCP
    connections: most users probably won’t care, but at up to 6 MB per TCP
    connection, you could burn through memory quickly if you’ve got hundreds of
    connections in progress.

  • net.inet.tcp.autorcvbufinc,autosndbufinc: Based on the name,
    I suspect this determines how aggressively buffer auto-tuning ramps up to its
    full buffer size. It defaults to 8 KB; if you do use buffer auto-tuning, and
    if you see poor performance on short-lived connections (but better performance
    on TCP transfers that take at least a couple minutes to complete), you might
    try increasing this value by a factor of 10-20.

  • net.inet.tcp.mssdflt: Yeah, this should be higher. MacOS
    defaults to 512 bytes for its maximum segment size (the largest packet it will
    attempt to send). “Normal” ethernet frames are up to 1500 bytes (and there
    are specs for yet larger packets). 512 bytes is appropriate for modems, but
    not for anything faster (and that includes cable modems). If you’re using
    ethernet, I’d recommend 1460 (that’s a 1500-byte ethernet frame, minus 40 bytes
    of TCP/IP headers). If your ethernet goes through a PPPoE relay (e.g., DSL,
    and maybe some cable modems) you probably want 1440 (to account for 20 bytes of
    PPPoE framing data). Note that this doesn’t really make your connection
    faster - you just use fewer packets (and therefore fewer network resources)
    to get the job done.

  • net.inet.tcp.win_scale_factor: Most TCP implementations
    automatically calculate the window scale factor. In case MacOS doesn’t, I
    set this to 8 - though I’m not certain this is required. Try it, try omitting
    it, see if there’s any difference. If you’re wondering what a window scale
    factor is, I suggest reading the wikipedia page. Essentially, it controls how large a buffer
    your machine can advertise to the other side of the TCP connection.

  • net.inet.tcp.delayed_ack: Delayed ACKs are generally a good
    idea - wait until a few packets have arrived, and acknowledge them all at once.
    Fewer reply packets, less network traffic, etc. This can result in slightly
    higher latency (since the receiver waits slightly for multiple packets to
    arrive, even if only one is on the wire). Worse still, in some not-so-rare
    circumstances, this can interact very badly with Nagle’s algorithm
    (a similar sender-side optimization) - so much so that you can get several
    orders of magnitude worse performance, with no obvious reason why. If you
    suspect this is a problem, turn it off; for more information, look up the
    interaction between delayed ACKs and Nagle’s algorithm.
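
Putting the parameters above together, an /etc/sysctl.conf stanza along these lines (values taken from the discussion above; treat it as a starting sketch, and adjust to taste):

```
kern.ipc.maxsockbuf=6291456
net.inet.tcp.sendspace=3145728
net.inet.tcp.recvspace=3145728
net.inet.tcp.autorcvbufmax=3145728
net.inet.tcp.autosndbufmax=3145728
net.inet.tcp.mssdflt=1460
net.inet.tcp.win_scale_factor=8
# Only if you hit the delayed-ACK/Nagle interaction described above:
# net.inet.tcp.delayed_ack=0
```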

References, Next Steps

There are a plethora of TCP tuning guides out there. If you’re tuning to a
specific application, or with certain high-end hardware (in particular,
Mellanox 10 and 40 Gbps adapters), I’d recommend looking at ethtool
settings as well.

Making an iPad print to a CUPS queue

Apple’s iPads (really, any recent iOS device - phones, iPods, etc.) use their
“Airprint™” system for discovering and using printers. If you have
something plugged into a Mac (or AirPort), and shared appropriately, this
is all automatic. My printers are all handled by a CUPS server, though, which
isn’t detected by default. Thankfully, this is amazingly easy to set up if
you install Avahi.

Set up Avahi to advertise your printer’s CUPS queue. Basically, you just need
to write an XML description of the service advertisement; mine looks like this:


<?xml version="1.0" standalone='no'?>
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
    <name replace-wildcards="yes">HP Color LaserJet CP2025dn on server</name>
    <service>
        <type>_ipp._tcp</type>
        <subtype>_universal._sub._ipp._tcp</subtype>
        <port>631</port>
        <txt-record>rp=printers/YourQueueName</txt-record>
        <txt-record>MDL=Color LaserJet CP2025dn</txt-record>
        <txt-record>product=(HP Color LaserJet CP2025dn)</txt-record>
        <txt-record>pdl=application/postscript,application/pdf,image/urf</txt-record>
        <txt-record>URF=none</txt-record>
        <txt-record>Duplex=T</txt-record>
    </service>
</service-group>


A couple important points:

  • You must advertise IPP (since that’s CUPS’ native language, this should be obvious).

  • The _universal subtype is required for AirPrint (though OSX will find the printer without this).

  • Printer capabilities are indicated by <txt-record> tags.

  • The pdl record lists all data types the printer (or in this case, CUPS) can natively handle.

  • The URF record is required. It can be equal to “none”, but if this record is missing AirPrint won’t recognize the printer.

  • The “Duplex” option will make a Duplex option appear in iOS when you try and print to this printer. Other useful boolean options may include:

    • Duplex

    • Copies

    • Transparent

    • Binary

This example is cobbled together from other sources. Ryan Finnie’s blog was
quite useful, as was tjfontaine’s airprint-generate script.

RoCE - RDMA over Converged Ethernet


Currently, I work for a mid-sized high-performance computing (HPC) shop.
For many of the scientific codes we run, communication performance matters -
both in terms of inter-machine (a.k.a., inter-node) bandwidth and latency.
Like most HPC shops, we have some experience with Infiniband, but in recent
years we’ve been using 10 Gbps Ethernet (10gigE) for a cluster interconnect.
Given ethernet’s prevalence, and general dominance in datacenter networking,
10gigE seems on the surface to be a general win, and a decent choice for
a cluster interconnect (particularly for a user base that historically
prefers gigabit ethernet for cost reasons).

I’ve designed three 10gigE clusters, two of which are on the current
(November, 2011) Top 500 list. I do
not recommend this. 10gigE has its place, but currently economics favor
Infiniband for high-performance computing. If your code uses MPI, and you
need more cores than you can fit in one compute node (and your code isn’t
embarassingly parallel - I’ve seen some that could operate nicely over
10 Mbps ethernet), you should be looking at Infiniband.

Rather than delving into why I’ve been building 10gigE clusters, this page
discusses modern technology that can help you get the most performance from
a high-speed ethernet fabric. Be warned, the content from here on out gets
technical quickly. I’ve likely spent more time than is healthy examining
this space, and doing so requires a fair amount of expertise in TCP, IP,
ethernet, Infiniband (as well as general RDMA theory, and its multiple
incarnations), operating systems, MPI libraries, and several vendors’ product
lines.
To quote the xterm source code: “There be dragons here.”

Defining “slow”, and Why Plain TCP/IP is Bad

TCP/IP is great, for most things - but the API pretty much requires kernel
intervention. Your app calls socket() and write(),
some library fires off a syscall, and the kernel starts formatting data to
go over the wire. Under Linux, a null syscall has an overhead of around 1000
instructions (if you’ll pardon the blind assertion), so that means you can
do around 2.5 million syscalls per second on a 2.5 GHz CPU (using some vague
hand-waving to avoid calculating effects of load-store queuing and superscalar
processors). On paper, that means a hard max of around 30 Gbps of throughput -
more, with frame sizes over 1500 bytes.
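The arithmetic above can be sketched in a few lines of awk; the 2.5 GHz clock, 1000-instruction syscall cost, and 1500-byte frames are the assumed inputs from this paragraph, not measurements:

```shell
# Rough throughput ceiling if every 1500-byte frame costs one null syscall.
# All three inputs are back-of-the-envelope assumptions.
awk -v hz=2500000000 -v insns=1000 -v frame=1500 'BEGIN {
    calls = hz / insns               # syscalls per second on one core
    gbps  = calls * frame * 8 / 1e9  # bits on the wire per second
    printf "%.1f million syscalls/s, ~%.0f Gbps ceiling\n", calls / 1e6, gbps
}'
```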

Unfortunately, that’s not reality. First off, a processor will need to do
some data formatting and copying beyond the time to enter the syscall. Second,
data arriving will also trigger syscalls. Some of this can be ameliorated
(e.g., jumbo frames, interrupt coalescing, etc.) but at a cost of tying up a
processor to handle the kernel’s side of the communication. If your application
requires frequent data exchange (like most HPC simulations), the added latency
and processor overhead can greatly degrade performance - even without fully
utilizing the available bandwidth.


TOE NICs

TOE (TCP Offload Engine) NICs may help, to a limited degree. A TOE will
reduce the CPU’s workload, but won’t significantly reduce overall message
latency - unless the TOE vendor ships a wrapper library to replace the
sockets API (Solarflare does this, for example).


iWARP

If you need to do RDMA over Ethernet, iWARP is the easiest way to do it. It’s
not quite Infiniband, but many of the various IB-related commands in OFED
will work. Many RDMA apps will work with this, and as iWARP is encapsulated
by TCP/IP it can transit a router. Latency will be higher than RoCE (at least
with both Chelsio and Intel/NetEffect implementations), but still well under
10 μs. iWARP is reasonably stable with recent versions of the
OpenFabrics stack - in-kernel drivers
may not be as stable (including those baked into Red Hat Enterprise Linux 5 and 6).
Caveat emptor.
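A few quick sanity checks for an iWARP setup, assuming the OFED userland is installed; the `-C` count, verbosity, and the hostname `server` are placeholder choices:

```shell
# The iWARP NIC should show up as an RDMA device:
ibv_devinfo

# OFED's RDMA ping-pong test: run the server side on one node...
rping -s -v -C 10

# ...and connect from a client (replace "server" with the server's name/IP):
rping -c -v -C 10 -a server
```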


RoCE

RoCE is RDMA over Converged Ethernet - but Infiniband over Ethernet would be
a more apt description. Strip the GUIDs out of the IB header, replace them
with Ethernet MAC addresses, and send it over the wire. As of this writing,
only Mellanox makes
RoCE-capable equipment (their CX2 and CX3 line of products).

Infiniband is a lossless physical-layer protocol, so RoCE requires lossless
Ethernet. Also, since it’s Ethernet, RoCE cannot transit a router. It’s
strictly a layer-2 protocol, and it needs a complicated layer-2 configuration.

Lossless Ethernet: a Quick Review

Ethernet becomes lossless by re-using 802.3x PAUSE frames for explicit flow
control. This is timing-sensitive; a receiver must send a PAUSE soon enough
that it is received and processed before the receive buffer can fill.
Obviously, there are issues stretching this over some distance. Switches
must be internally lossless, and must be able to send PAUSE frames as well
as receive them. Such switches are usually marketed with acronyms like “DCB”
(Data Center Bridging) or “CEE” (Converged Enhanced Ethernet).
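On the NIC side, plain (non-PFC) PAUSE flow control can typically be toggled with ethtool; the interface name here is an assumption, and the switch port must honor and generate PAUSE as well:

```shell
# Enable link-level PAUSE in both directions on the NIC (eth2 assumed):
ethtool -A eth2 rx on tx on

# Verify the current pause settings:
ethtool -a eth2
```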

Obviously, this coarse-grained approach will pause all traffic over the link -
including any IP or FCoE traffic. As this can have a negative impact on
non-RoCE performance, Cisco has proposed Priority Flow Control (PFC, now
covered under IEEE 802.1Qbb). This
is a PAUSE frame with a special payload, indicating which Ethernet QoS class
should be paused. This is accompanied by other protocols (such as DCBX) to
negotiate QoS values on either end of a link (i.e., between NIC and switch).

Finally, all types of traffic on the link will have different Ethernet frame
types (Ethertypes); IPv4, IPv6, FCoE, and RoCE all have different ID values.


RoCE Caveats

While RoCE is supported by OFED, as of OFED 1.5.3 it isn’t completely
stable. You’ll want to use Mellanox’s OFED - version 1.5.3 or higher. Stock
OFED will work fine for small tests, but large applications will have a
tendency to crash.

PFC is a pain. The tools to auto-negotiate it may not exist for RoCE - the
only documentation I’ve found was limited to FCoE. Avoid it if at all possible.

Somehow, you’ll need to classify RoCE traffic as lossless. Here are some
suggestions, in my order of preference:

  1. Discriminate RoCE traffic by Ethertype - RoCE packets would be
    treated losslessly, and non-RoCE traffic could be dropped (during congestion).

  2. Classify ALL traffic as lossless (and deal with the performance impact, if
    any, on non-RoCE traffic).

  3. Assign a QoS class for lossless traffic. Unfortunately, Mellanox adapters will
    only emit a QoS when they emit a VLAN tag, so you’ll need to do the following:

    • Set a default IB Service Level to match your QoS using options rdma_cm def_prec2sl=4 in /etc/modprobe.d (Obviously, I’m using the value 4)

    • Configure your Ethernet switch to treat that traffic as lossless

    • Create a tagged VLAN device on your RoCE NIC on all connected systems

    • Assign those VLAN devices a private IP address

    • Stick that IP address in /etc/mv2.conf, so MVAPICH2 will know what IP address to try for RoCE connections

    • Configure all other RDMA-aware applications to use a non-default GID (since VLAN interfaces will appear as additional GID indexes on the Infiniband HCA side of the RoCE adapter)
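The third option can be sketched as a handful of commands. Every name and number here - the service level, NIC name, VLAN ID, and subnet - is an assumed example; substitute your own:

```shell
# Map IB service level 4 onto traffic leaving the RoCE NIC
# (file name under /etc/modprobe.d is arbitrary).
cat > /etc/modprobe.d/rdma_cm.conf <<'EOF'
options rdma_cm def_prec2sl=4
EOF

# Tagged VLAN on the RoCE NIC (eth2 and VLAN 100 assumed), mapping
# priority 4 onto outgoing frames, with a private address.
ip link add link eth2 name eth2.100 type vlan id 100 egress-qos-map 0:4
ip addr add 10.100.0.1/24 dev eth2.100
ip link set eth2.100 up

# Point MVAPICH2 at that address for RoCE connections.
echo 10.100.0.1 > /etc/mv2.conf
```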

    So you have Cisco Nexus switches…

    If you can, stop reading and go buy some Infiniband adapters. You’ll save a
    considerable amount of staff time.

    Fine. Keep reading. But don’t say I didn’t warn you.

    The Nexus 5000-series and the Nexus 7000-series switches are completely
    different products. The interface to building lossless queues is different,
    the command syntax is different, and different values can be used for lossless
    traffic classes on each series of switches. If you have environments with
    both, you’ll be picking different QoS values.

    The Nexus 7000 platform only supports lossless queuing on the newest “F”
    boards - the fabric boards that have no routing abilities. You’ll want to
    buy those, if you plan on having stable RoCE.

    Finally, be wary of ANY firmware updates. We’ve had a functional RoCE
    configuration on a Nexus 7000 switch, using firmware 5.1(3), using the
    third method above. That broke, however, when we upgraded to 5.1(5).
    Something changed in the default queuing config, and since you can only build
    on the default lossless queue config (rather than nuke it and define your
    own), you are subject to changes in the default. In our case, RoCE performance
    dropped to 30 Mbps (down from 9.91 Gbps). All wasn’t lost, though - after
    the upgrade, all traffic was lossless (except what we’d previously tagged
    via QoS, of course). We just stopped using QoS, and now have reliable
    Ethernet. Absolutely bizarre.

    Making this all work for practical apps

    Making this work depends on how RoCE traffic was classified. If RoCE
    Ethertypes are lossless, or if all traffic is lossless (options #1 or #2,
    above) any RDMA application should just work - the RoCE adapter presents as an
    Infiniband HCA.

    If you picked option #3, you’ll need to jump through some extra hoops. First,
    set the def_prec2sl module parameter and /etc/mv2.conf
    as described above. At this point, MVAPICH2 applications should work. For
    OpenMPI, you’ll need to use OpenMPI 1.4.4 or 1.5.4 or newer. They need
    additional command-line options to set the IB service level and the IP address
    to use: -mca btl_openib_ib_service_level <number> and
    -mca btl_openib_ipaddr_include <ipaddr>, respectively.
    These can be baked into a config file (like openmpi-mca-params.conf
    in your OpenMPI’s share directory). Note that
    btl_openib_ipaddr_include can take CIDR notation for a subnet to
    match, so you can use the same config file for all nodes in a cluster.
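For example, a cluster-wide parameter file might look like the following; the service level and subnet are assumed values, and the file’s exact path depends on where your OpenMPI is installed:

```shell
# Append the RoCE settings to openmpi-mca-params.conf so every job
# picks them up without extra mpirun flags (path is a placeholder).
cat >> /path/to/openmpi/share/openmpi-mca-params.conf <<'EOF'
btl_openib_ib_service_level = 4
btl_openib_ipaddr_include = 10.100.0.0/24
EOF
```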

    In theory, it may be possible to use RoCE for non-MPI applications - including
    kernel-level things like Lustre. I’d only attempt this if options #1 or #2
    are in use, though - setting extra VLANs, non-default GIDs, and custom IB
service levels (mapped to Ethernet QoSes) is likely to be hard to integrate
into anything other than OpenMPI and MVAPICH2.

    Additional Resources

    There isn’t a lot of documentation (practically zero, outside of Mellanox)
    on RoCE. Any useful links I can find will be added here.

    Time Machine, Meet Netatalk. But in Lion.


    This morning, I upgraded the first Mac around the house to MacOS 10.7 (aka,
    “Lion”). Went smoothly, and it’s re-indexing Spotlight now. Insert comments
    about how wonderful it is to have to get used to new trackpad finger gestures
    (gestures are nice, but it’ll be a few days before I’m used to the workflow).

    Naturally, Time Machine is now horribly broken. Originally, I was using AFP
    and netatalk, as described here, but then I
    switched to SMB and Samba (since Netatalk 2.1.x wasn’t as stable). Lion no
    longer supports either of these methods; it only works with AFP 3.3. That’s
    only supported by Netatalk 2.2, which (as of this writing) was committed to
    git yesterday.

    This page serves to document my odyssey in setting up netatalk on a FreeBSD
    jail in the basement, from the latest source in git. Here are a couple of useful links:

    Throughout all this, I’m assuming a similar earlier setup
    of Time Machine has been done, and the previous netatalk packages
    have been removed. Right now, I’m mainly concerned with differences.

    Source Setup

    As shown in the links above, get git, grab the source, and start building:

    pkg_add -r git
    git clone git://
    cd netatalk
    git checkout netatalk-2-2-0
    ./configure --without-acls --without-pam --disable-ddp --disable-cups

    I didn’t have appropriate zeroconf headers on my FreeBSD jail, so I didn’t
    configure with --enable-zeroconf. I’ll use Avahi for that setup, if needed.
    My config ended up looking like this (printout from ./configure):

    Using libraries:
        LIBS =  -L$(top_srcdir)/libatalk
        CFLAGS = -I$(top_srcdir)/include -D_U_="__attribute__((unused))" -g -O2 -I$(top_srcdir)/sys
            LIBS   =  -lcrypto
            CFLAGS =  -I/usr/include/openssl
            LIBS   = -L/usr/local/lib -lgcrypt -lgpg-error
            CFLAGS = -I/usr/local/include
            LIBS   =  -L/usr/local/lib -ldb-4.6
            CFLAGS =  -I/usr/local/include/db46
    Configure summary:
        Install style:
             AFP 3.x calls activated: 
             Extended Attributes: ad | sys
             backends:  dbd last tdb
             DHX     ()
             DHX2    ()
             RANDNUM ()
             passwd  ()
             DDP (AppleTalk) support: no
             CUPS support:            no
             SLP support:             no
             Zeroconf support:        no
             tcp wrapper support:     yes
             quota support:           no
             admin group support:     yes
             valid shell check:       yes
             cracklib support:        no
             dropbox kludge:          no
             force volume uid/gid:    no
             Apple 2 boot support:    no
             ACL support:             no

    The lack of CUPS and ACLs should be tolerable, since this is just going to
    be used for Time Machine (I use Samba for everything else). Note that
    initially I did leave ACL support to autodetect; it was enabled, but that led
    to compilation errors.

    Before you make, if you’re using FreeBSD like me you’ll need to
    fix some compilation errors. I’m sure the ports folks will fix this in due
    time, but as I’d rather not wait…

    First, at.h:

    --- sys/netatalk/at.h.orig 2011-07-24 12:28:55.823029116 -0400
    +++ sys/netatalk/at.h 2011-07-24 12:29:40.522913740 -0400
    @@ -24,6 +24,14 @@
    #include /* so that we can deal with sun’s s_net #define */

    +typedef unsigned char u_char;
    +typedef unsigned short u_short;
    +typedef unsigned int u_int;
    +typedef unsigned long u_long;
    #ifdef MACOSX_SERVER
    #endif /* MACOSX_SERVER */

    Then cnid_metad.c:

    --- etc/cnid_dbd/cnid_metad.c.orig 2011-07-24 12:48:52.140103389 -0400
    +++ etc/cnid_dbd/cnid_metad.c 2011-07-24 12:49:21.195654454 -0400
    @@ -45,6 +45,7 @@
    +#define _XPG4_2 1

    make, make install, and move on. Be warned: since
    this install comes from source, there likely won’t be an init.d
    or rc.d script to start up daemons. A usable FreeBSD template is
    below (based off the most current port, as of this writing).

    # $FreeBSD: ports/net/netatalk/files/,v 1.3 2010/03/27 00:13:49 dougb Exp $
    # PROVIDE: atalkd papd cnid_metad timelord afpd
    # KEYWORD: shutdown
    # AppleTalk daemons. Make sure not to start atalkd in the background:
    # its data structures must have time to stabilize before running the
    # other processes.
    # Set defaults. Please override these in /usr/local/etc/netatalk.conf
    ATALK_NAME="`/bin/hostname -s`"
    # Load user config
    if [ -f /usr/local/etc/netatalk/netatalk.conf ]; then . /usr/local/etc/netatalk/netatalk.conf; fi
    . /etc/rc.subr
    hostname=`hostname -s`
    netatalk_start() {
        checkyesno atalkd_enable && /usr/local/sbin/atalkd
        checkyesno atalkd_enable && \
            /usr/local/bin/nbprgstr -p 4 "${ATALK_NAME}:Workstation${ATALK_ZONE}" &
        checkyesno atalkd_enable && \
            /usr/local/bin/nbprgstr -p 4 "${ATALK_NAME}:netatalk${ATALK_ZONE}" &
        checkyesno papd_enable && /usr/local/sbin/papd
        checkyesno cnid_metad_enable && /usr/local/sbin/cnid_metad
        checkyesno timelord_enable && /usr/local/sbin/timelord
        checkyesno afpd_enable && \
            /usr/local/sbin/afpd -n "${ATALK_NAME}${ATALK_ZONE}" \
                    -s /usr/local/etc/netatalk/AppleVolumes.system \
                    -f /usr/local/etc/netatalk/AppleVolumes.default \
                    -g ${AFPD_GUEST} \
                    -c ${AFPD_MAX_CLIENTS}
    }
    netatalk_stop() {
        checkyesno timelord_enable && killall timelord
        checkyesno afpd_enable && killall afpd
        checkyesno cnid_metad_enable && killall cnid_metad
        checkyesno papd_enable && killall papd
        checkyesno atalkd_enable && killall atalkd
    }
    name="netatalk"
    start_cmd=netatalk_start
    stop_cmd=netatalk_stop
    load_rc_config ${name}
    run_rc_command "$1"
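With the script installed under /usr/local/etc/rc.d, the daemons are switched on per-variable. A minimal sketch, assuming the variable names from the template above and that only Time Machine service is wanted:

```shell
# Enable just the daemons Time Machine needs; leave the rest off.
cat >> /usr/local/etc/netatalk/netatalk.conf <<'EOF'
atalkd_enable="NO"
papd_enable="NO"
timelord_enable="NO"
cnid_metad_enable="YES"
afpd_enable="YES"
EOF

# Start everything that's enabled (script name assumed):
/usr/local/etc/rc.d/netatalk start
```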


    A few extra options are needed, both for each mount and for the server itself.

    Here are the relevant (non-comment) bits at the end of AppleVolumes.default. Use your own paths and logins as appropriate.

    # The line below sets some DEFAULT, starting with Netatalk 2.1.
    :DEFAULT: options:upriv,usedots
    # The "~" below indicates that Home directories are visible by default.
    # If you do not wish to have people accessing their Home directories,
    # please put a pound sign in front of the tilde or delete it.
    /tm/laptop "Laptop Backup" allow:laptop_login cnidscheme:dbd options:usedots,upriv,tm
    /tm/desktop "Desktop Backup" allow:desktop_login cnidscheme:dbd options:usedots,upriv,tm
    # End of File

    And here’s the relevant pieces from afpd.conf. Obviously, use
    your own server name and IP.

    # default:
    # - -tcp -noddp -uamlist, -nosavepassword
    SERVER -tcp -ipaddr -noddp -uamlist,, -nosavepassword


    Avahi is relatively unchanged. If you were using Avahi before Lion, it should work the same. I think.

    File System Bits

    Oddly enough, it looks like the file is no longer required.

    Client Configuration

    I'm still using the preference for an unsupported time machine volume. Run the following on the client:

    defaults write TMShowUnsupportedNetworkVolumes 1

    If you aren’t dealing with a recently-upgraded client and pre-existing backups,
    you may want to read the original notes on setting up sparsebundles on the
    client here.


    None so far, but then, I’m still in the middle of my first Time Machine backup
    under Lion. Things largely seem to work, though. Expect to spend some
    non-trivial time on the first backup, to re-index any pre-existing dumps, but
    then Time Machine appears to just do its thing normally.