February 23, 2008

handy 'and' and 'or' in Ruby

Here's a nice shortcut to verify if any member of an array satisfies a predicate, or to verify if all members of an array satisfy the predicate.

#!/usr/bin/env ruby -w

module Enumerable
def any sym
each {|item| return true if item.send sym }
return false
end

def all sym
each {|item| return false if !item.send sym }
return true
end
end

puts [1, 2, 3, 0, nil].any(:nil?)
puts [1, 2, 3, 0].any(:nil?)
puts ["", "", ""].all(:empty?)
puts ["", "", "dsa"].all(:empty?)

Output:
true
false
true
false

February 21, 2008

the craft of functional programming :: review

I went through most of The Craft of Functional Programming by Simon Thompson. All in all it was a great learning experience.


The book is aimed at beginners and the text is very dense, which saves a lot of time. The exercises were very useful to me and sometimes even enlightening. The one thing I found annoying was the lack of transparency in the sample code snippets: the code given in the book sometimes needs an import from the standard library to function, or it shadows a name from the standard library. All the problems can be solved by referencing the source files or hoogle but it's not as straightforward as it could be.


The book covers the basics fairly well but is very scarce on program complexity analysis, the type system, and monads.

If I had to recommend a "track" for learning haskell, it would go like this:

  1. Go through the haskell wikibook

  2. Read The Craft of Functional Programming up to chapter 18

  3. Read IO inside on the haskell wiki

  4. Read the rest of the book

  5. Find other more advanced resources...



All the while, there are exercises in the wikibook, in the book, on the haskell wiki at 99 problems, and at project euler.

February 17, 2008

news.yc statistics

I was thinking idle thoughts about the community at news.yc and I decided to pull out a few statistics. I found the following page with some stats on it but I wanted more.

The following was deduced from a sample of about 130 users, including the 50 leaders. This is a small sample and I can only access 180 posts per user, so this is not perfect, but still interesting. Note: for a few of those statistics, I only considered users with more than 8 posts.

  • Total number of posts: 8317

  • Total number of points on those posts: 42458

  • Minimum points per post ratio: 0.72

  • Maximum points per post ratio: 20.2

  • Mean points per post ratio: 5.75

  • Std deviation on the ratio: 3.22

  • Overall points per post ratio: 5.10

  • Mean number of posts: 62.00

  • Std deviation on the number of posts: 68.36


When a user has a high points per post ratio, he is probably representative of the community. All kinds of guidelines could be derived from this: if a user's points per post ratio is too low below the average, they probably are posting too much, or not a strong link in the community as a submitter. If the standard deviation on the mean ratio is low, then the community is rather cohesive: lots of people interact without leading or counting on others. If the standard deviation is high, than the users are possibly separated in two groups, one reading the posts of the other. These kinds of statistics might be useful as an early index of community 'dilution'.


Here's a graph showing the points per post against the number of posts.

We can see that it's very hard to maintain a good points per post ratio when the number of posts reaches about 40. Also, if a newcomer wanted to know what kind of users are on news YC, we should point them to paul, palish, sharpshoot, or pg.

Here's the same graph but only showing the leaders.


Zoom on the users with 180 posts:


Zoom on the bottom left cluster for all users:


Read the comments on this at news YC.


Here is the ruby code that I used. You can tweak it to find out your own points per post ratio.

go.rb

#!/usr/bin/env ruby
require 'open-uri'

HOSTNAME = "http://news.ycombinator.com/"
LOG = "... "

def compute(user, url, total_points, total_posts)
more_url = nil
open(HOSTNAME + url) {|doc|
doc.read.each {|line|
more_url = find_more_url line if more_url.nil?
line.scan(/(\d*) point[s]{0,1}<\/span>/) do |match|
#puts "analysing match: #{match}"
total_points += (match[0].to_i - 1) #posts start at one point
total_posts += 1
end
}
}
begin
sleep(5)
puts LOG + "page done"
$stdout.flush
#if there is a more button, follow it and recurse
total_points, total_posts = compute(user, more_url, total_points, total_posts) unless more_url.nil?
rescue
puts "caught #{$!}"
puts "stopped after #{total_posts} posts for user #{user}"
end
return [total_points, total_posts]
end

def go users
users.each do |user|
puts LOG + "processing #{user}"
$stdout.flush
begin
total_points, total_posts = compute(user, "submitted?id=#{user}", 0, 0)
ratio = total_points.to_f / total_posts
puts "#{user} #{total_points} #{total_posts} #{sprintf("%.2f", ratio)}"
rescue
puts "caught #{$!}"
puts "user #{user} not processed"
end
end
end

def find_more_url line
if line =~ /<a href="([^"]*?)" rel="nofollow">More<\/a>/
$1
else
nil
end
end

def file2list file
users = []
File.open(file) {|f|
f.each {|line|
users.push line.chomp unless line.nil?
}
}
users
end

def extract_names users, url=""
more_url = nil
open(HOSTNAME + url) {|doc|
puts 'transferring data'
doc.readlines.each {|line|
more_url = find_more_url line if more_url.nil?
line.scan(/<a href="user\?id=([^"]*)">/) do |match|
users.push match[0] unless users.include? match[0]
end
}
}

users = extract_names(users, more_url) unless more_url.nil?
return users
end

#extract_names([]).each {|u| puts u}
go(file2list(ARGV[0]))



extract.rb

#[0] = name
#[1] = points
#[2] = posts
#[3] = ratio

def std_deviation data, mean, index
sq_errors = 0.0
data.each {|row|
sq_errors += (row[index] - mean)**2
}
puts "Std deviation: #{Math::sqrt(sq_errors/data.length)}"
end

def num_posts_metrics data
min = 99999
max = 0
total_posts = 0
data.each {|row|
total_posts += row[2]
min = row[2] if row[2] < min
max = row[2] if row[2] > max
}
puts "Total posts: #{total_posts}"
puts "Min # posts: #{min}"
puts "Max # posts: #{max}"
mean_num_posts = total_posts/data.length
puts "Mean # posts: #{sprintf("%.2f", mean_num_posts)}"
std_deviation(data, mean_num_posts, 2)

end

def ratio_metrics data
min = 9999
max = 0
total_posts = 0
total_points = 0
sum_ratio = 0.0
data.each {|row|
total_posts += row[2]
total_points += row[1]
min = row[3] if row[3] < min
max = row[3] if row[3] > max
sum_ratio += row[3]
}
puts "Total posts: #{total_posts}"
puts "Total points: #{total_points}"
puts "Min ratio: #{min}"
puts "Max ratio: #{max}"
mean_ratio = sum_ratio/data.length
puts "Mean ratio: #{sprintf("%.2f", mean_ratio)}"
overall_ratio = total_points.to_f / total_posts
puts "Overall ratio: #{sprintf("%.2f", overall_ratio)}"

std_deviation(data, mean_ratio, 3)
end

#munge munge
data = []
File.open(ARGV[0]){|file|
file.each do |line|
data << line.chomp.split
end
}

data = data.map {|row|
[row[0], row[1].to_i, row[2].to_i, row[3].to_f]
}

#output
ratio_metrics(data.select {|row|
row[2] > 8
})
puts
num_posts_metrics data

February 14, 2008

command line quizzes part 2

Here's part two of the command line quizzes. Part one can be found here.
Q: Given that "top -b -n 1" gives the following output, write a script which kills all processes that use more than 25% of the processing power.


Tasks: 85 total, 1 running, 84 sleeping, 0 stopped, 0 zombie
Cpu(s): 4.7%us, 0.6%sy, 0.0%ni, 93.7%id, 0.8%wa, 0.2%hi, 0.1%si, 0.0%st
Mem: 506372k total, 495416k used, 10956k free, 12364k buffers
Swap: 0k total, 0k used, 0k free, 195588k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6159 root 19 -1 357m 38m 5380 S 2.0 7.9 7:02.21 X
6271 mats 20 0 123m 38m 25m S 2.0 7.8 1:33.09 amarokapp
6287 mats 20 0 25840 9916 7324 S 2.0 2.0 0:29.45 gkrellm
1 root 20 0 1612 544 472 S 0.0 0.1 0:00.84 init
2 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root RT -5 0 0 0 S 0.0 0.0 0:00.00 migration/0
4 root 15 -5 0 0 0 S 0.0 0.0 0:02.08 ksoftirqd/0
5 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0
6 root 15 -5 0 0 0 S 0.0 0.0 0:00.16 events/0
7 root 15 -5 0 0 0 S 0.0 0.0 0:00.00 khelper
42 root 15 -5 0 0 0 S 0.0 0.0 0:00.11 kblockd/0

Here's a solution.

#!/bin/bash
#script to kill all processes which use more than 25% of the cpu
top -b -n 1 | awk 'NR >= 8 && $9 > 25 {print $1}' | xargs -r kill

Thanks to Ralph for this solution.


Q: Given the following two CSS files, write a script to verify that every class is only defined once (catch duplicate declarations).

File 1: first.css

body {font-family:Verdana;}
.header {font-size:90%; }
.article {line-height:1.1em;}
.footer {font-size:80%; }
.main .left #contents {float:left;}

File 2: second.css

#article .title {font-size:110%;}
.header {font-weight:bold;}
.email {font-family:Courier;}
.address {line-height:0.8em;}
.left {text-align:left;}

Here's a solution.

#!/bin/bash
#script to find out if two css files contain the same class name

#Use sed to filter out the css rules and put all the ids and classes on their own line
#Use sed again to only keep the class names
#Use sort and uniq to figure out the dups
classnames=$(sed -e 's/\([^{]*\).*/\1/g' -e 's/ /\n/g' $@ | sed -n -e '/\.\w\+/ p' | sort | uniq -dc | sort -r)
echo -e "\nDuplicate classes:\n"
echo "$classnames"
echo -e "\nGrepped files:\n"
#grep the files in argument for the class names found above
grep --color -n "$(echo "$classnames" | awk '{print $2}' | sed -e 's/\./\\./g' -e 's/\n/\|/g')" $@

It can be called like so.
./dupcss.sh first.css second.css

And gives the following output.
Duplicate classes:

2 .left
2 .header

Grepped files:

first.css:2:.header {font-size:90%;}
first.css:5:.main .left #contents {float:left;}
second.css:2:.header {font-weight:bold;}
second.css:5:.left {text-align:left;}

Grepping the files and displaying the results is just a convenience to the user.


I don't have any other quizzes on the back burner but I do have a few good links:

February 13, 2008

command line quizzes part 1

When I wanted to learn about bash and the unix command line tools, I looked for some easy quizzes and challenges but could not find any. Here's two for a start. (edit: part two can be found here)
Q: Find the most common 4-letter word in the following text.


Ruby is a pink to blood red gemstone, a variety of the mineral corundum (aluminium oxide). The common red color is caused mainly by the element chromium. Its name comes from ruber, Latin for red. Other varieties of gem-quality corundum are called sapphires. It is considered one of the four precious stones, together with the sapphire, the emerald and the diamond. Improvements used include color alteration, improving transparency by dissolving rutile inclusions, healing of fractures (cracks) or even completely filling them.

Prices of rubies are primarily determined by color (the brightest and best "red" called Pigeon Blood Red, command a huge premium over other rubies of similar quality). After color follows clarity: similar to diamonds, a clear stone will command a premium, but a ruby without any needle-like rutile inclusions will indicate the stone has been treated one way or another. Cut and carat (size) will also determine the price.

Rubies have a hardness of 9.0 on the Mohs scale of mineral hardness. Among the natural gems only diamond is harder, with a Mohs 10.0 by definition.

Here's a solution.

#!/bin/bash
#a script to find the most common 4-letter word in a text file
sed 's/ /\n/g' text.txt | grep -E '^\w{4}$' | sort | uniq -c | sort -r | head -1

Giving the following output. The most common 4-letter word is 'will' with 3 occurrences.
3 will



Q: Write a 10x10 matrix with random integers from 0 to 9 to standard output.

Here's a solution.

#/bin/bash
#a script to generate a 10x10 matrix with random numbers from 0 to 9
for i in `seq 1 10`; do
for i in `seq 1 10`; do expr $RANDOM % 10; done | xargs
done

Giving the following output.
3 1 0 3 8 5 7 7 0 8
4 5 1 8 8 0 8 7 5 4
6 1 1 3 6 4 8 8 0 9
6 6 1 8 6 9 9 6 3 4
5 1 7 2 1 4 5 4 8 5
7 5 8 7 9 9 5 4 2 1
7 0 8 2 9 4 6 2 7 8
8 6 8 6 9 9 6 3 2 3
8 1 9 3 9 6 7 3 5 8
8 5 9 8 9 0 7 1 0 3

February 9, 2008

project euler in Haskell

I was looking for a way to practice my lowly haskell skills and I found Project Euler. All my solutions can be found at Google Code: myhseuler. I solved a bit more than 30 problems as of this writing.

You can check it out like so:
svn checkout http://myhseuler.googlecode.com/svn/trunk/ myhseuler-read-only

February 3, 2008

function composition in Ruby

Here is an easy way to compose functions:

#!/usr/bin/env ruby

# method composition example
# > require 'compose'
# > (method(:multwo) << method(:multwo) << method(:add)).call(2,3)
# => 20

class Proc
def << g
lambda {|*args| self.call(g.call(*args)) }
end
end

class Method
def << g
lambda {|*args| self.call(g.call(*args)) }
end

end

def add left, right
left + right
end

def multwo arg
arg * 2
end