Blogging Daily is a lot of Work

If you write multiple blog posts a day, it's a big waste of time to visit each of the many publishing services out there and fill out all of their forms. Any task like that can be automated.

The problem is, most of these services don't want robots spamming their system with junk posts. So they offer no official way to automate it.

Enter Ruby's Mechanize.

Automate Filling Out Mixx's Forms With Mechanize

Mixx is a service that lets you submit a URL to a site that gets a lot of traffic. Readers can then browse the submitted posts by category and tag.

Here is the manual workflow for doing that:

  1. Open http://www.mixx.com/
  2. Scroll to the login box and fill it out.
  3. Click "Submit a Link" on the next page.
  4. Enter the URL on that page
  5. Click submit
  6. Enter the title, description, tags, and categories on the next page
  7. Fill out the captcha
  8. Submit

Do that multiple times a day for multiple services and you have a full-time job ;).

Here's what you can do to automate that with Mechanize:

require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'mechanize'
require 'highline/import'
require 'logger'

def to_mixx(options = {})
  # fields
  username = options[:username]
  password = options[:password]
  url = options[:url]
  title = options[:title]
  description = options[:description]
  tags = options[:tags].join(", ") if options[:tags]

  # crawler (and logging if desired)
  agent = Mechanize.new # agent.log = Logger.new(STDOUT)

  # login
  page = agent.get("http://www.mixx.com/")
  form = page.form_with(:action => "https://www.mixx.com/save_login")
  form["user[loginid]"] = username
  form["user[password]"] = password
  form.submit

  # submit post url
  page = agent.get("http://www.mixx.com/submit")
  form = page.form_with(:action => "http://www.mixx.com/submit/step2")
  form["thingy[page_url]"] = url
  page = form.submit

  # go to final page
  form = page.form_with(:action => "http://www.mixx.com/submit/save")

  # captcha
  iframe_url = page.parser.css("li.captcha iframe").first["src"]
  params = iframe_url.split("?").last
  captcha_iframe = agent.click(page.iframes.first)
  captcha_form = captcha_iframe.forms.first
  captcha_image = captcha_iframe.parser.css("img").first["src"]
  # open browser with captcha image
  system("open", "http://api.recaptcha.net/#{captcha_image}")
  # enter captcha response in terminal
  captcha_says = ask("Enter Captcha from Browser Image: ") { |q| q.echo = true }
  captcha_form["recaptcha_response_field"] = captcha_says
  # submit captcha
  captcha_form.action = "http://www.google.com/recaptcha/api/noscript?#{params}"
  captcha_response = captcha_form.submit
  # grab secret
  captcha_response = captcha_response.parser.css("textarea").first.text

  # submit title, description, tags, categories, and captcha
  form = page.form_with(:action => "http://www.mixx.com/submit/save")
  form["thingy[title]"] = title
  form["thingy[description]"] = description if description
  form["thingy[new_tags]"] = tags if tags
  form["recaptcha_challenge_field"] = captcha_response
  form["recaptcha_response_field"] = captcha_says
  done = form.submit
end

to_mixx(
  :username => "your name",
  :password => "your password",
  :url => "http://viatropos.com/blog/automatically-submit-blog-posts-to-mixx",
  :title => "Automatically Submit Blog Posts to Mixx"
)

The Cool Part: Programmatically Submitting a Captcha

In the middle of the above snippet, I programmatically submit a captcha with Ruby. Here's what happens...

First, look for the captcha in the current page. Sometimes it's just an image with an input, sometimes it's in an iframe, sometimes it's created with JavaScript... Whatever it is, you just need to be able to:

  1. Get the URL of the image for the Captcha
  2. Get the input field name for the Captcha that you're supposed to fill out

On Mixx, they create the Captcha with JavaScript, but they have a noscript version in an iframe. So I do this:

  1. Use Nokogiri and a CSS selector to grab the URL for the iframe (Mechanize is built around Nokogiri)
  2. Go to the iframe
  3. Scrape out the dynamically generated image URL from the captcha iframe
  4. Programmatically open the browser window to show you the image:
    • On Macs: system("open", "http://viatropos.com")
    • On Windows: system("start", "http://viatropos.com")
  5. The console asks you to enter the Captcha text, so you look at the image in the browser, type the text into the terminal, and press Enter.
  6. Mechanize submits the captcha form and scrapes out the captcha secret.
  7. I add that secret to the form on the original page.
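
The script pulls the query string off the iframe URL with split("?").last, which hands back the whole URL when there is no "?" in it. Here is a rough sketch of a slightly more defensive version using Ruby's standard URI library. The helper names captcha_params and noscript_action are made up for illustration, and the example URL in the usage note is a placeholder, not a real Mixx or reCAPTCHA key.

```ruby
require 'uri'

# Base of the noscript endpoint, copied from the script above.
NOSCRIPT_BASE = "http://www.google.com/recaptcha/api/noscript"

# Hypothetical helper: return the query string of the iframe URL
# (everything after "?"), or nil when the URL has no query string.
def captcha_params(iframe_url)
  URI.parse(iframe_url).query
end

# Hypothetical helper: build the URL the captcha form gets posted to.
def noscript_action(iframe_url)
  params = captcha_params(iframe_url)
  params ? "#{NOSCRIPT_BASE}?#{params}" : NOSCRIPT_BASE
end
```

With something like this in place, the captcha_form.action line in the script could become captcha_form.action = noscript_action(iframe_url). For a placeholder URL such as "http://api.recaptcha.net/noscript?k=PLACEHOLDER", captcha_params returns "k=PLACEHOLDER".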

Pretty cool: it cuts out every step in publishing to Mixx except interpreting the captcha. Hopefully that saves you time.
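
The script itself only calls "open", so as written it shows the captcha image on a Mac only. A minimal sketch of picking the command per platform could look like the following. The helper names browser_command and open_in_browser are my own, the "open" and "start" commands are the ones listed in step 4, and the xdg-open fallback for Linux is an assumption that was not in the original workflow.

```ruby
require 'rbconfig'

# Hypothetical helper: pick the command that opens a URL in the default
# browser, based on the host OS string Ruby was built for.
def browser_command(url, host_os = RbConfig::CONFIG["host_os"])
  case host_os
  when /darwin|mac os/i
    ["open", url]       # Mac, as in step 4
  when /mswin|mingw|cygwin/i
    ["start", url]      # Windows, as in step 4
  else
    ["xdg-open", url]   # assumption: a Linux desktop with xdg-open installed
  end
end

# Hypothetical helper: run that command; returns whatever Kernel#system returns.
def open_in_browser(url)
  system(*browser_command(url))
end
```

The system("open", ...) line in the script would then become open_in_browser("http://api.recaptcha.net/#{captcha_image}").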

Automating Blog Post Submission

This is currently a static blog hosted on GitHub Pages. That lets me write everything in TextMate in Markdown, write the metadata in YAML like Jekyll does, and use that metadata to automatically fill out the form on Mixx.

This document looks like this in Markdown:

---
title: Automatically Submit Blog Posts to Mixx
subtitle: Use Rubys Mechanize to Programmatically Fill out Captcha and Submit Posts to the Social Bookmarking Service Mixx
date: 2010-07-30 @ 06:14pm
tags:
- mixx
- captcha
- ruby
- mechanize
- social-bookmarking
---

## Blogging Daily is a lot of Work...

I wrote that information down when it was most fresh in my mind, and I don't want to think about it again. Thanks Mechanize, you rock.
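
For the curious, reading that front matter back out can be sketched roughly like this, using only Ruby's standard YAML library. The helper names read_front_matter and mixx_options are made up for illustration, and mapping the subtitle onto Mixx's description field is just one plausible choice, not something the script above requires.

```ruby
require 'yaml'

# Hypothetical helper: split a Markdown post into its YAML front matter
# (as a Hash) and the remaining body. Returns [{}, text] when the post
# does not start with a "---" block.
def read_front_matter(text)
  match = text.match(/\A---\s*\n(.*?)\n---\s*\n(.*)\z/m)
  return [{}, text] unless match
  [YAML.safe_load(match[1]) || {}, match[2]]
end

# Hypothetical helper: turn the front matter into the options hash that
# to_mixx takes. Using the subtitle as the description is an assumption.
def mixx_options(meta, url)
  {
    :url         => url,
    :title       => meta["title"],
    :description => meta["subtitle"],
    :tags        => meta["tags"]
  }
end
```

Usage would then be something like: meta, body = read_front_matter(File.read("post.md")), followed by to_mixx(mixx_options(meta, post_url).merge(:username => "your name", :password => "your password")), where post.md and post_url stand in for your own file and published URL.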