Ruin The assorted ramblings of Brendan Tobolaski

Jekyll Yet Yet Yet Again

title: Jekyll yet, yet, yet again
permalink: /2014/jekyll-yet-yet-yet-again/
type: posts
date: 2014-05-08 00:07
tags:
  • Blogging
  • Jekyll
categories:
  • Tech

As you may be able to tell, I’m back to using Jekyll. This is hardly the first time, or even the second. Why do I keep switching? Ghost, which I’ve been using since I launched this site, is great. It has by far the best interface I’ve ever seen for writing. I just always want something different.

The reason I keep coming back to Jekyll is that I like having all of my blog posts as text files. I work with databases every day, so I have no issues working with them. I just prefer having text files that I can do whatever I want with. I also really like the format that Jekyll uses. I’ve looked at other static site generators and I just don’t like the way that they work.

The one thing that I always miss when I go to Jekyll is having some sort of web interface. I do occasionally like to write from my iPad or iPhone and it’s pretty handy to have an interface that I can use. Unfortunately, I haven’t found anything that will work for this purpose. Maybe I should do something about that.

This is a fairly basic site at the moment. It’s just the default setup from running jekyll new. I’ll be making changes over the coming weeks. Hopefully, I didn’t break anything. Let me know if you see something broken and I’ll get it fixed.

Benchmarking Virtualbox Multiple Core Performance

A while ago, I made a claim that you shouldn’t use Virtualbox with a VM that has more than one CPU because it ends up slower than using just a single core. People rightly called me on it, as I didn’t provide any sort of evidence that this was the case.

To start off with, there are a few other people that have experienced the same thing. Also, I’m not saying that this is the case for all use cases, but in the context of a Drupal (or any PHP app, for that matter) dev environment, you don’t want to use more than one CPU.

The script I’m using to test is what we use to create an empty version of our Drupal sites. You won’t have anything that’s quite equivalent, but its performance should mirror that of a regular PHP / MySQL app, as it mostly uses PHP scripts to load the database with the required values.

Configuration    Time
4 Cores          61m37.419s
2 Cores          45m27.327s
1 Core           28m17.904s

I also ran a test where 25 pages were loaded sequentially. Here are the results:

Configuration    Time
4 Cores          1m01.97s
2 Cores          54.295s
1 Core           37.966s
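To put the two tables in perspective, here is a small sketch that converts the timings to seconds and expresses each one as a multiple of the single-core run. The helper names (toSeconds, slowdowns) are made up for this illustration. By this arithmetic, the 4 core build takes roughly 2.2 times as long as the 1 core build, and the 4 core page load roughly 1.6 times as long.

```javascript
// Convert a duration such as "61m37.419s" or "54.295s" into seconds.
function toSeconds(text) {
  var match = /^(?:(\d+)m)?(\d+(?:\.\d+)?)s$/.exec(text);
  if (!match) {
    throw new Error('Unrecognized duration: ' + text);
  }
  var minutes = match[1] ? parseInt(match[1], 10) : 0;
  return minutes * 60 + parseFloat(match[2]);
}

// Express every timing as a multiple of the single-core timing.
function slowdowns(times) {
  var baseline = toSeconds(times['1 core']);
  var result = {};
  Object.keys(times).forEach(function (name) {
    result[name] = toSeconds(times[name]) / baseline;
  });
  return result;
}

// Build script timings from the first table.
var build = slowdowns({
  '4 cores': '61m37.419s',
  '2 cores': '45m27.327s',
  '1 core': '28m17.904s'
});

// Page load timings from the second table (1:01.97 written as 1m01.97s).
var pageLoad = slowdowns({
  '4 cores': '1m01.97s',
  '2 cores': '54.295s',
  '1 core': '37.966s'
});

Object.keys(build).forEach(function (name) {
  console.log(name + ': build x' + build[name].toFixed(2) +
      ', page load x' + pageLoad[name].toFixed(2));
});
```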

Now, it’s important to note that everything that I did is sequential. The build script runs through everything sequentially. Basically, it processes the source files and then inserts whatever it needs to into MySQL. For the page load, it kind of does the same thing in reverse. It loads the required data from MySQL and then returns the page. The slight wrinkle in this is that our Drupal stack will cache things in Memcache, but that should be helped by the multiple core VMs.

You may want to point out that nothing I did actually does parallel processing, which is true. However, the additional cores should help with both Memcache and MySQL, as they would have more processing power available to them. Regardless, that isn’t the point. This particular test, I think, is representative of the kind of activities that you would be doing during development of a PHP app (or any other single-threaded application).

Can’t all of it be explained by turbo boost? I don’t think so. Here is the CPU that I did the testing on. Judging from the turbo frequency chart, I think it’s likely that the CPU was running between 3.1 and 3.4 GHz, given that the load was fairly low. I don’t think that is a big enough difference to explain the numbers that I saw.

So, if you use Virtualbox for your dev environment, don’t use more than 1 core and disable ioapic. Instead, I would recommend paying for the appropriate version of VMware and a license for the VMware provider. Not only is it more stable, it doesn’t have this particular problem. For comparison, here are the numbers for VMware Fusion.

  • Default build: 23m47.832s
  • Page Load: 44.529s

A Community for Vagrantfiles

I think there should be a community for Vagrantfiles (and their corresponding provisioning scripts), which I envision as being a sort of an awesome mashup of Google, Github, and Reddit. Basically, it would just be a CRUD app where people could submit and vote on Vagrant environments for particular stacks (want a MEAN stack? Here’s the definitive one, etc.). That way, if you wanted to start a new project with a particular stack, you could just git clone the project, optionally delete the .git directory to start fresh, run vagrant up and be done with it. This is the sort of workflow we used to get going on a Laravel app when we won Startup Weekend and it worked incredibly well.
Nathan Leclaire

I agree that this is needed. I’ve been using Vagrant for quite a while, and almost all of my projects have a Vagrant environment. In spite of all of that usage, every time I start a new project, it takes me a long time to figure out how I’m going to set up the provisioning. I would love to have some sort of directory for Vagrantfiles. Most likely, this would only be a starting point, but even that would be a huge improvement.

There are a couple of solutions to this problem, PuPHPet and Rove, but they aren’t enough. PuPHPet is focused on PHP, and even if it were extended to include other languages, the generated files are horrendous. Even if you wanted to use PuPHPet as a starting point and extend it, it would take far more work to extend it than to start from scratch. Rove has a different problem. I still haven’t quite figured out how to make it build a complete environment for an app.

I’m up for building it. Let me know if you’re interested.

Two Drupal 7 Taxonomy Tips

This is precisely what I was looking for when I wrote my rant about including the version number. I was trying to figure out how you specify the category and tags when programmatically creating nodes in Drupal 7 using node_save(). So here they are, assuming that you already have your $node set up with the other values that you need.

Adding a category to the node

This should be all you need:

$node->field_category[$node->language][]['tid'] = $category_tid;

That is, unless the category has a parent, in which case you will need to add the parents all the way up to the top level. Here is an example for a category with a single parent whose tid you already have:

$node->field_category[$node->language][]['tid'] = $parent_tid;
$node->field_category[$node->language][]['tid'] = $category_tid;

Adding a tag to the node

This one is pretty simple as you don’t need to worry about any sort of hierarchy.

$node->field_tags[$node->language][]['tid'] = $tag_tid;

Include the Version Number

If you’re going to post some sort of tip/tutorial for a piece of software, be sure to include the version that it works with. This is a daily occurrence for me:

  1. Search for how to do something with Drupal
  2. Find exactly what it is that I’m looking for
  3. It doesn’t specify what version of Drupal it works with
  4. Try it anyway
  5. Turns out it isn’t for Drupal 7
  6. Search again, this time specifying “drupal 7”
  7. Sort through multiple pages of things somewhat related to what I’m trying to find
  8. Find something close-ish and try it
  9. Repeat 8 an indefinite number of times until something works correctly

I could cut out a number of these steps if everyone would just include what version of Drupal their tutorial works with. Of course, this applies to pretty much any piece of software. I’m sure your readers will appreciate it.

Removing StartSSL from your trusted CAs

I’ve been using StartSSL for my SSL certificates because they are extremely cheap. They provide standard SSL certificates for free. They also allow you to validate your identity, after which they will allow you to make wildcard SSL certificates for free.

The catch is that revocation is not free. Normally, I wouldn’t have any issue with this, as having to revoke your certificate means that you did something wrong. It seems perfectly reasonable to make people responsible for their mistakes. In this case, though, an exception should be made, since not revoking the old certificates is bad for the public. StartSSL has chosen not to make an exception.

Because of that decision, I’ve switched certificate authorities. I’m now using Gandi. I am also removing StartSSL from my trusted CAs. I would suggest that you do the same.

Heartbleed

On Tuesday, a catastrophic bug in OpenSSL, Heartbleed, was disclosed. It abuses the heartbeat extension of SSL/TLS and gives an attacker access to a small portion of memory on the server.

This section of memory could be the private key, which would be the worst case scenario. It’s fairly unlikely that would actually happen, unless the bug is exploited right after the web server is restarted. (See the update below.) In other cases, this bug could reveal user session data. In some cases this could allow an attacker to impersonate a user. It could also reveal the user’s password.
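To make the mechanism a little more concrete, here is a toy simulation of the bug. It is not OpenSSL’s code, and the buffer contents are invented for the illustration. A heartbeat request carries a payload plus a field stating how long that payload is; the vulnerable versions echoed back as many bytes as the request claimed, without checking the claim against what was actually sent.

```javascript
// Toy model only: `memory` stands in for the server process's memory.
// The real payload ("PING") happens to sit right before unrelated secrets.
var payload = 'PING';
var memory = Buffer.from(payload + '|session=abc123;password=hunter2');

// Vulnerable handler: trusts the length claimed by the client.
function vulnerableHeartbeat(claimedLength) {
  return memory.slice(0, claimedLength).toString();
}

// Fixed handler: discards requests that claim more than they carry.
function fixedHeartbeat(claimedLength) {
  if (claimedLength > payload.length) {
    return null;
  }
  return memory.slice(0, claimedLength).toString();
}

// An honest request gets its own payload back.
console.log(vulnerableHeartbeat(payload.length));
// A request that lies about its length gets neighbouring memory as well.
console.log(vulnerableHeartbeat(memory.length));
// With the bounds check in place, the same request is dropped.
console.log(fixedHeartbeat(memory.length));
```

The real bug allowed up to 64 KB to be read per request, and what happened to be in that memory varied from request to request, which is why the impact ranges from nothing useful to session data, passwords, or keys.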

As Bruce Schneier says:

“Catastrophic” is the right word. On the scale of 1 to 10, this is an 11.

Recovery from this is extremely painful. Even though it’s unlikely that your private key leaked, the chance that it did is reason enough to replace your keys. You need to replace all of your private keys. I’m working on this part. It turns out StartSSL is awful in situations like this and I’ll be switching them out for a better CA. You should invalidate all user sessions. You should also reset user passwords.

Update

It turns out that it’s not all that unlikely that the private key can be extracted. CloudFlare issued a challenge for people to attempt to retrieve a private key using Heartbleed. Two separate people were able to extract the key. The server was rebooted around the time of the key extraction, so that may have played a part. There is also evidence of Heartbleed being exploited up to 2 years ago. The assumption has been that the NSA was the one doing it. They have denied it, of course, but that is hardly believable anymore.

Downloading a file using the node module Request

I needed to download a file using Node and I wasn’t sure how to do it. It turns out that it’s pretty much impossible to search for.

The first thing you will need is the request module: npm install --save request. Then this should work:

var fs = require('fs'),
    request = require('request');

// Download `url` to `path`, then call `callback` once the file is written.
function download(url, path, callback) {
  request({uri: url})
      .pipe(fs.createWriteStream(path))
      .on('close', function() {
        callback();
      });
}

This function takes in a url, as a string; the path, also as a string; and a callback function. It will write the file to the specified path and, when it’s complete, it will call your callback function.

I’m much more likely to return a promise instead of taking a callback parameter, but I thought this version might be helpful for more people. I haven’t figured out how to handle errors yet. If you happen to know, let me know.
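For what it’s worth, here is one possible sketch of a promise-returning version that also reports errors. To keep it runnable without extra dependencies, it swaps the request module for Node’s built-in http and https modules, and it leans on stream.pipeline to forward errors from either stream. Treating anything other than a 200 response as a failure is an assumption of this sketch, and it does not follow redirects.

```javascript
var fs = require('fs');
var http = require('http');
var https = require('https');
var stream = require('stream');

// Download `url` to `path`. Resolves with `path` once the file is written,
// rejects on network errors, non-200 responses, or file system errors.
function download(url, path) {
  return new Promise(function (resolve, reject) {
    var client = url.indexOf('https:') === 0 ? https : http;

    client.get(url, function (response) {
      if (response.statusCode !== 200) {
        response.resume(); // discard the body so the socket is released
        reject(new Error('Unexpected status code: ' + response.statusCode));
        return;
      }

      // pipeline calls back with an error if either stream fails.
      var file = fs.createWriteStream(path);
      stream.pipeline(response, file, function (err) {
        if (err) {
          reject(err);
        } else {
          resolve(path);
        }
      });
    }).on('error', reject); // connection-level errors (DNS, refused, ...)
  });
}
```

Calling it looks like download(url, path).then(onDone, onError), where onDone and onError are whatever handlers you supply.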

Debugging Node.js

Node.js has a great debugger, node-inspector. It’s extremely easy to use. Just run node-debug <start script> and then head to the url that it specifies. By default, it will tell you to connect to 127.0.0.1, but it will listen on all the interfaces at the specified port. Then all you have to do is connect with Chrome. It’s unfortunate that Chrome is required, but it’s not too onerous of a requirement. It will work even if the node process is running on a different machine. Once connected, you have all of the tools you could want from a debugger. The one downside is that the stack traces aren’t always helpful, due to the asynchronous nature of Node.js.

This is in stark contrast to the other debugger that I use at my job, Xdebug. Xdebug doesn’t have a built-in interface. It requires you to provide your own. I’ve tried quite a number of them, but I’ve only found one that is actually usable: PhpStorm/IDEA (which are both great). Here’s the thing: node-inspector is about even with the quality of JetBrains’ products, without the need to load a full IDE.

Clustering Ghost

Following from my post about running Ghost as an npm module, I decided to see if I could run Ghost using cluster. As you are probably aware, Node.js runs as a single asynchronous thread. While this works extremely well in most cases, sometimes you just run out of CPU power from a single core. Cluster changes that: it runs multiple instances of your node application and load balances between them. It is extremely simple to use.

While I was thinking about this, I decided to run some load tests to see how much of an improvement it would be. The standard Ghost setup, a single instance of Ghost with MySQL, was able to handle about 50 req/s. I was able to load test with up to 1000 simultaneous connections using blitz.io (gives me a credit if you sign up). Ghost performed pretty consistently at 50 req/s. Personally, I find that to be a bit too slow, but I’m willing to accept that considering how great Ghost is for writing.

To set up cluster, you’ll need to be running MySQL. SQLite is not thread safe in some functions, so I think it’s better to be safe and simply not try this with SQLite. You also need to be running Ghost as an npm module. Once you have all of that set up, it’s actually really simple. All you need to do is modify your app.js file to match:

"use strict";
/**
 * Module dependencies.
 */

var ghost = require('ghost');
var path = require('path');
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;
var environment = process.env.NODE_ENV;
// Fork one worker per CPU in production, a single worker otherwise.
var threads = environment === 'production' ? numCPUs : 1;

if (cluster.isMaster) {
  for (var i = 0; i < threads; i++) {
    cluster.fork();
  }

  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died. Restarting...');
    cluster.fork();
  });
} else {
  ghost({
    config: path.join(__dirname, 'config.js')
  });
}

That’s it. When you start your Ghost site with npm start --prod, it will spawn a worker for each core. If you start it in development mode (i.e. npm start), it will only run a single worker.

After setting this up, I did another load test. This time around, I managed to get 110 req/s. I run this site on a Mac Mini with a dual core, hyper-threading processor. It’s not surprising that I only get a scaling factor of about 2, since there are only 2 real cores.

For reference, I also ran the load test using SQLite. It managed 40 req/s, which isn’t terrible.