January 14, 2010

javascript: the good parts :: review

I recommend Douglas Crockford's book to anyone who wants to learn more about JavaScript. It consolidated my knowledge of the language, and it is short and enjoyable. The author comes across as credible, which matters because the book makes quite a few recommendations on proper usage and style. Here are a few notes that I took while reading it.

The prototype chain

The prototype is only used when retrieving values. Setting a property always sets it on the receiver directly. To know if the receiver has a property without looking at the prototype, use #hasOwnProperty.
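
To make the read/write asymmetry concrete, here is a minimal sketch. The objects base and derived are hypothetical names chosen for illustration and do not come from the book.

```javascript
// Hypothetical objects for illustration; not taken from the book.
var base = { greeting: 'hello' };
var derived = Object.create(base); // derived's prototype is base

// Reading walks up the prototype chain.
var inheritedRead = derived.greeting;                   // 'hello'
var ownBeforeSet = derived.hasOwnProperty('greeting');  // false

// Writing creates an own property on the receiver and leaves the prototype alone.
derived.greeting = 'hi';
var ownAfterSet = derived.hasOwnProperty('greeting');   // true
var baseUnchanged = base.greeting;                      // 'hello'
```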

Function calls

Every function has access to two magic objects: this and arguments. The value of this is the current object and depends on the invocation pattern of the function, while arguments is an array-like object containing every argument passed to the function.

There are four invocation patterns:

  • As a method: when called on a receiver object, this refers to the receiver.
  • As a function: when called without a receiver, this refers to the global object (or is undefined in ES5 strict mode).
  • As a constructor: when invoked with the new keyword, a new object is created whose prototype is the function's prototype property, and this refers to that new object.
  • Using #apply: this is explicit and is passed as the first argument to #apply.
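
The four patterns can be sketched roughly as follows. The names whoAmI, counter and Point are hypothetical and only serve the illustration.

```javascript
// Hypothetical helper that simply reports what `this` is bound to.
function whoAmI() {
  return this;
}

// 1. Method invocation: `this` is the receiver.
var counter = { whoAmI: whoAmI };
var asMethod = counter.whoAmI() === counter; // true

// 2. Function invocation: the global object in sloppy mode,
//    undefined when the code runs in strict mode.
var asFunction = whoAmI();

// 3. Constructor invocation: `this` is the freshly created object,
//    whose prototype is Point.prototype.
function Point(x) {
  this.x = x;
}
Point.prototype.describe = function () {
  return 'x=' + this.x;
};
var p = new Point(3);
var asConstructor = p.describe(); // 'x=3'

// 4. Apply invocation: `this` is whatever is passed as the first argument.
var other = { x: 7 };
var asApply = Point.prototype.describe.apply(other, []); // 'x=7'
```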

Partial application in JavaScript

Here is a clever snippet from the book that enables partial application.


Function.prototype.curry = function () {
  var slice = Array.prototype.slice;
  var args = slice.apply(arguments);
  var that = this;

  return function () {
    return that.apply(null, args.concat(slice.apply(arguments)));
  };
};

You can then use it like this:



>>> function add(a,b) {return a + b;}
>>> add.curry(1)(2)
3
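
For comparison, here is a sketch of the same idea that does not extend Function.prototype. The helper name partial is an assumption made for this example and is not from the book; the last line uses the built-in Function.prototype.bind from ES5, which offers partial application natively.

```javascript
// Hypothetical standalone helper; the name `partial` is not from the book.
function partial(fn) {
  var slice = Array.prototype.slice;
  var preset = slice.call(arguments, 1); // arguments fixed up front

  return function () {
    // Forward the caller's `this` and append the arguments supplied later.
    return fn.apply(this, preset.concat(slice.call(arguments)));
  };
}

function add(a, b) {
  return a + b;
}

var addOne = partial(add, 1);
var viaPartial = addOne(2);          // 3
var viaBind = add.bind(null, 1)(2);  // 3
```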

Recommendations and gotchas

  • JavaScript does not have block scope
  • Avoid the use of 'new'
  • Always pass in the radix parameter to #parseInt
  • Always use === and !==, not == or !=
  • Avoid the use of 'with', use a local variable instead
  • Avoid the use of 'void'
  • Avoid the use of typed wrappers like new Boolean, new String, or new Number
  • All characters are 16 bits wide
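
A few of these gotchas are easy to reproduce. The snippet below is a small sketch with made-up variable names, not code from the book.

```javascript
// var is function-scoped, so a declaration inside a block is visible after it.
function noBlockScope() {
  if (true) {
    var inner = 'visible';
  }
  return inner;
}
var leaked = noBlockScope();                  // 'visible'

// Passing the radix keeps parseInt from guessing the base from the prefix.
var withRadix = parseInt('08', 10);           // 8
var guessedHex = parseInt('0x10');            // 16, base inferred from "0x"

// == coerces its operands, === does not.
var looseEqual = ('' == 0);                   // true
var strictEqual = ('' === 0);                 // false

// Typed wrappers are objects, so even new Boolean(false) is truthy.
var wrapped = new Boolean(false);
var wrapperIsTruthy = wrapped ? true : false; // true
var wrapperType = typeof wrapped;             // 'object'

// Characters are 16-bit units, so a character outside the Basic
// Multilingual Plane takes two of them.
var astralLength = '\uD83D\uDE00'.length;     // 2
```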

January 10, 2010

textonyms

I was interested in textonyms -- words that can be represented by the same digits when texting. Rotating through the list of possible words for a certain input, a phone will turn lips into kiss. Or good into home. Book becomes cool.

So I wrote the following script to extract textonyms from a list of words. It should be used on a text file containing one word per line.




#!/usr/bin/env ruby

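def populate(dictfile)
  # Group words under the digit sequence that types them.
  digits2words = Hash.new { |hash, key| hash[key] = [] }

  File.open(dictfile) do |file|
    file.each do |line|
      # chomp rather than chomp!: chomp! returns nil when there is nothing to strip,
      # which happens on a last line without a trailing newline.
      word = line.chomp
      digits = word.split(//).map { |char| CHAR_TO_DIGIT[char.downcase.to_sym] }
      # join rather than to_s: on Ruby 1.9+ Array#to_s returns the inspect form.
      digits2words[digits.join] << word
    end
  end

  # Print the digit sequence, the number of matching words, and the words.
  digits2words.each do |key, values|
    puts "#{key}\t#{values.size}\t#{values.join ' '}"
  end
end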

CHAR_TO_DIGIT = {
  :a => 2,
  :b => 2,
  :c => 2,
  :d => 3,
  :e => 3,
  :f => 3,
  :g => 4,
  :h => 4,
  :i => 4,
  :j => 5,
  :k => 5,
  :l => 5,
  :m => 6,
  :n => 6,
  :o => 6,
  :p => 7,
  :q => 7,
  :r => 7,
  :s => 7,
  :t => 8,
  :u => 8,
  :v => 8,
  :w => 9,
  :x => 9,
  :y => 9,
  :z => 9
}

if __FILE__ == $0
  if !ARGV.empty?
    populate(ARGV[0])
  else
    puts "specify a file name"
  end
end

December 2, 2009

urban dictionary's greatest hits

I was curious about what words were the most popular on urbandictionary.com. So I scraped the popular section for each letter in the alphabet and came up with the following table where the ranking is based on the number of upvotes. I didn't want to have my blog associated with any of the words below and I didn't want to give backlinks to the urban dictionary so here's an image of my results.

Here is the code I used to gather the data. It follows every link in the popular section for each letter and writes a dump of all the entries it finds. The dumps can later be used to study the data.

require 'rubygems'
require 'hpricot'
require 'open-uri'
require 'entry'
require 'net/http'
require 'json'

def retrieve_votes(doc)
  # in-browser, votes are retrieved through an ajax call after the web page is loaded
  (doc/"td.tools").each do |tools|
    id = tools[:id]
    uncacheable_id = id.scan(/\d+/)[0]

    json_response = Net::HTTP.post_form(URI.parse('http://www.urbandictionary.com/uncacheable.php'), {'ids'=> uncacheable_id})
    thumbs = JSON.parse(json_response.body)['thumbs'][0]
    return [thumbs['thumbs_up'], thumbs['thumbs_down']]
  end
end

def retrieve_links letter
  anchors = []

  doc = Hpricot(open(letter))
  (doc/"table#columnist//tr").each do |row|
    (row/"td").each do |cell|
      (cell/"ul"/"li").each do |li|
        (li/"a").each do |anchor|
          anchors <<  anchor.get_attribute(:href)
        end
      end
    end
  end

  anchors
end

def build_entry(doc)
  word = "no words found"
  definition = "no definitions found"

  up, down = retrieve_votes(doc)
  (doc/"td.word").each do |wrd|
    word = wrd.to_plain_text
    break
  end
  (doc/"div.definition").each do |defined|
    definition = defined.to_plain_text
    break
  end

  Entry.new(word, definition, up, down)
end

"ABCDEFGHIJKLMNOPQRSTUVWXYZ".each_char do |letter|
  links = retrieve_links("http://www.urbandictionary.com/popular.php?character=#{letter}")
  entries = []
  links.each do |link|
    sleep 5
    puts "fetching #{link}"
    doc = Hpricot(open("http://www.urbandictionary.com" + link))
    entries << build_entry(doc)
  end

  File.open("letter#{letter}.dump",'w'){|file|
    file.write(Marshal.dump(entries))
  }
end

The Entry class is defined like so.

class Entry
  attr_accessor :word, :definition, :up, :down
  def initialize word, definition, up, down
    @word = word
    @definition = definition
    @up = up
    @down = down
  end

  def <=>(other)
    (other.up - other.down) - (@up - @down)
  end

  def to_s
    "#{word} (#{up} up, #{down} down): #{definition}"
  end
end

In order to get some kind of greatest hits, I used the following script. I had to filter out entries with too many down votes to get rid of the most childish entries.

require 'entry'

entries = []
Dir['letter*.dump'].sort.each do |filename|
  entries.concat(Marshal.load(File::open(filename).read))
end

entries.sort!
entries.reject! {|entry| entry.down > 1000}
html="<table>"
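# entries[0] is the top-ranked entry, so start from the beginning of the list
# and only add one to the index when displaying the rank.
entries.first(50).each_with_index do |entry, index|
  rank = index + 1
  html += "<tr><td><b>#{rank}</b></td><td style='padding: 0 2em 0 2em;'>#{entry.word}</td><td><a href=\"http://www.urbandictionary.com/define.php?term=#{entry.word}\">read more</a></td></tr>"
end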
html+="</table>"

puts html