It's been a while since I updated this blog, so I'm going to crosspost a literary thought from a private Facebook discussion:
Fahrenheit 451 has strong antifeminine sentiments; the main characters are men, the only thoughtful woman is a womaninthefreezer, women are always used as the examples of societal airheadednaivete, but criticising this seems to play into the book's own narratives of censorship:
"Now let's take up the minorities in our civilization, shall we? Bigger the population, the more minorities. Don't step on the toes of the doglovers, the catlovers, doctors, lawyers, merchants, chiefs, Mormons, Baptists, Unitarians, secondgeneration Chinese, Swedes, Italians, Germans, Texans, Brooklynites, Irishmen, people from Oregon or Mexico. The people in this book, this play, this TV serial are not meant to represent any actual painters, cartographers, mechanics anywhere. The bigger your market, Montag, the less you handle controversy, remember that! All the minor minor minorities with their navels to be kept clean. Authors, full of evil thoughts, lock up your typewriters. They did. Magazines became a nice blend of vanilla tapioca. Books, so the damned snobbish critics said, were dishwater. No wonder books stopped selling, the critics said. But the public, knowing what it wanted, spinning happily, let the comicbooks survive."
In the postscript to my edition, Bradbury explicitly addresses this complaint (among others) noting that "there's more than one way to burn a book". He further recounts that, ironically, many students in the US have been given editions which are partially censored (mostly swearwords like 'damn' and 'hell').
Here's an interesting idea: the book is improved because because it simultaneously criesout for criticism of it's genderstereotypes (or even a rewrite) while explicitly warning against the dangers of specialinterest censorship. A progressive reading this book in the 21st century needs to wage, in their own mind, the societalcensorship battle that Bradbury describes.
Note: It was written in the 50s, which is an explanation if not an excuse.
Note 2: Women aren't a minority, but they are an interest group in the same style as the others and Bradbury explicitly addresses 'feministas' as a minority in the postscript of my edition.
Elegant AlmostEverywhere
Programming, hacking, science and trivia
Thursday, 28 August 2014
Fahrenheit 451 (a nontechnical post)
Posted by
Daniel RodgersPryor
at
9:21 pm
Labels:
Bradbury,
Fahrenheit 451,
feminism,
free speech,
literature
Sunday, 9 February 2014
Pascal's Triangle
While smoothing some evenlyspaced time series data a while ago, I found the need to generate rows from Pascal's triangle; for my own entertainment, I tried to come up with some minimal (but not totally unreadable) code:
Gaussian blurring/smoothing is just another instance of the general rule that 'Gaussians come up everywhere'. When dealing with discreet data, it's important to remember that the binomial distribution is the discrete equivalent of the Gaussian. The $n^{th}$ row of Pascal's triangle is a list of the binomial coefficients $\binom nk$. These are the weightings (up to a normalisation factor of $\frac{1}{2^n}$), describing how each raw datapoint will contribute to it's neighbors:
$$x_i^{smoothed} = \sum\limits_{k = 0}^{n} \binom nk x_{i\frac{n}{2} + k}$$
Each smoothed point is thus heavily influenced by its near neighbours, but not by its more distant ones. Note that $n$ determines the radius of the smoothing filter (in units of your data's sampling period).
This kind of clunky, discrete sum is just a special case of a convolution. Convolutions generalise the notion of smoothing to filtering, enabling a huge range of operations (notably Fourier Transforms) to be performed with these same kinds of discrete sums. Convolutions generalise to to higher dimensions (and continuous spaces) as well. In 2D, the weights or 'filter kernel' become a matrix; filtering is then sliding every pixel in the kernel matrix over each pixel in the data matrix. This gives a complexity of $O(n^2m^2)$ for an $m \times m$ kernel. Gaussians (binomials) have some special properties that can make these sums more efficient however: multidimensional Gaussians are separable. A 2D Gaussian is just the product of two orthogonal 1D Gaussians: $G(x, y) = G(x)G(y)$ [Note that the separated components of a 2D Gaussian might not align with the x and y axes of your data matrix. This is only a problem if the Gaussian is anisotropic, ie. if it has different widths in different directions].
This means that convolutions with Gaussian kernels can be done one dimension at a time; sliding each datapixel over each pixel in two length $m$ binomial/Gaussian kernels, reducing the time complexity to $O(n^2m)$. This reduction extends to higher dimensions too  you can always break down a Gaussian into 1D components and convolute them sequentially.
def pascalRow(n): " Returns the nth row of Pascal's triangle" return [1] if n<=0 else reduce(lambda row, n: row[:1] + [(row[1] + n), n], pascalRow(n1), [0])This is short and quite fast, but severely lacking in readability; a trait only made slightly excusable by it's modularity, and clarity of function. The question you may be asking though, is 'why Pascal's triangle in the first place'?
Gaussian blurring/smoothing is just another instance of the general rule that 'Gaussians come up everywhere'. When dealing with discreet data, it's important to remember that the binomial distribution is the discrete equivalent of the Gaussian. The $n^{th}$ row of Pascal's triangle is a list of the binomial coefficients $\binom nk$. These are the weightings (up to a normalisation factor of $\frac{1}{2^n}$), describing how each raw datapoint will contribute to it's neighbors:
$$x_i^{smoothed} = \sum\limits_{k = 0}^{n} \binom nk x_{i\frac{n}{2} + k}$$
Each smoothed point is thus heavily influenced by its near neighbours, but not by its more distant ones. Note that $n$ determines the radius of the smoothing filter (in units of your data's sampling period).
This kind of clunky, discrete sum is just a special case of a convolution. Convolutions generalise the notion of smoothing to filtering, enabling a huge range of operations (notably Fourier Transforms) to be performed with these same kinds of discrete sums. Convolutions generalise to to higher dimensions (and continuous spaces) as well. In 2D, the weights or 'filter kernel' become a matrix; filtering is then sliding every pixel in the kernel matrix over each pixel in the data matrix. This gives a complexity of $O(n^2m^2)$ for an $m \times m$ kernel. Gaussians (binomials) have some special properties that can make these sums more efficient however: multidimensional Gaussians are separable. A 2D Gaussian is just the product of two orthogonal 1D Gaussians: $G(x, y) = G(x)G(y)$ [Note that the separated components of a 2D Gaussian might not align with the x and y axes of your data matrix. This is only a problem if the Gaussian is anisotropic, ie. if it has different widths in different directions].
This means that convolutions with Gaussian kernels can be done one dimension at a time; sliding each datapixel over each pixel in two length $m$ binomial/Gaussian kernels, reducing the time complexity to $O(n^2m)$. This reduction extends to higher dimensions too  you can always break down a Gaussian into 1D components and convolute them sequentially.
Posted by
Daniel RodgersPryor
at
2:48 am
Labels:
binomial,
combinatorics,
convolution,
filtering,
gaussian,
kernel,
math,
maths,
python,
smoothing
Saturday, 27 April 2013
3D Printing Nylon
Oozy, webbed nylon printing 
At first my prints were unbelievably oozy  the nylon just flowed from the nozzle and leaked all over the print. I also had problems getting it to adhere to standard blue tape. Apparently both of these are pretty common.
Damaged bed surface after print removal 
The oozing problem was far more insidious. Initially, I tried to use retraction, but it had no effect (actually it made things worse by causing the print head to sit still momentarily, melting the plastic beneath it and letting it flow). I then happened upon a great idea; get rid of retraction entirely. Get the print head moving so quickly that not much plastic has time to flow while it's moving. In combination with the 'avoid crossing perimeters' flag in slic3r (which forces the printer to complete each part separately, getting rid of that spiderweb of interpart trails), and the switch to a 0.2mm nozzle produced magnificent results:


InMoov index finger  the surface is still a bit rough, but the sanded joints are slipperysmooth and its many times stronger than the ABS versions I've managed to make (which suffer from cracking and shattering near the weak, stressbearing pivot points). 
Nylon loves to absorb water, water that boils and bubbles out of the extruder during printing; it's often recommended to dry it out in an oven at low heat before printing, but I haven't found that to be necessary. There's a small trail of steam from the print nozzle, but there appear to be no adverse effects, though this may not be true if you live in a highhumidity area.
I could comfortably hammer the printed lego parts without damage; a massive improvement over ABS. Despite some reports that gluing/bonding 618 is ineffective, I found that superglue worked perfectly on sanded nylon parts (see the indexfinger above).
Apparently, natural nylon is very easy to colour with fabric dyes too; this allows each printed part to be coloured separately, removing the need for multiple rolls in different predyed colours (though I have yet to test this).
All in all I'm quite impressed; nylon 618 is a fiddly material for printing, but the results are certainly worth it for any part that needs significant mechanical strength.
Posted by
Daniel RodgersPryor
at
10:44 pm
Labels:
3D printing,
inmoov,
nylon,
nylon 618,
ooze,
printrbot,
retration,
settings,
taulman,
temperature,
ubis
Thursday, 25 April 2013
Powersets
While looking over purported Google interview questions, I found one that was just 'write some code to make powersets'. It turns out that the powerset of a set $S$ is the set of all its subsets (including itself):
$$\mathcal{P}( S ) = \{s \subseteq S \}$$
My first attempt at a solution was an ugly iterative algorithm that used lists to represent sets. An hour or so of tinkering led me to this lazy, recursive solution using python's built in set type (which is a hash table and thus has constanttime lookups):
Length 3 subsets of S=[1, 5]
set([1, 2, 3])
set([1, 2, 4])
set([1, 2, 5])
set([1, 3, 4])
set([1, 3, 5])
set([1, 4, 5])
set([2, 3, 4])
set([2, 3, 5])
set([2, 4, 5])
set([3, 4, 5])
Powerset of S=[1, 3]
set([])
set([1])
set([2])
set([3])
set([1, 2])
set([1, 3])
set([2, 3])
set([1, 2, 3])
The length of the powerset of S with S = 12 is 4096
$$\mathcal{P}( S ) = \{s \subseteq S \}$$
My first attempt at a solution was an ugly iterative algorithm that used lists to represent sets. An hour or so of tinkering led me to this lazy, recursive solution using python's built in set type (which is a hash table and thus has constanttime lookups):
def subsetsOfLength(S, l): ''' Return a generator of all subsets of S that are of length l. Note: this function uses recursion, so l must be less than your python implementation's recursion limit (default 1000 for C Python). ''' if l > len(S): pass # Return a lengthzero iterator when subsets larger than S itself are requested. elif l == 0: yield set() # The empty set is the only set of length 0 elif l == 1: for s in S: yield {s} # Just yield each element if only lengthone elements are required # This clause isn't strictly required, but it saves len(S) redundant function calls that would all # return the empty set (to be unioned with x) else: S = S.copy() # Avoid mangling the original parameter while S: x = {S.pop()} # x is a set containing a single, arbitrary element from S # Yield the union of x with every possible subset of S\{x} (that has # the correct length to make the union be length l): for y in subsetsOfLength(S, l1): yield x.union(y) def powerset(S): ''' Return a generator for the powerset (the set of all [nonstrict] subsets) of some set S ''' for l in xrange(len(S)+1): for x in subsetsOfLength(S, l): yield x if __name__ == '__main__': # Module Test/Demo print 'Length 3 subsets of S=[1, 5]' print '\n'.join((str(x) for x in subsetsOfLength(set(range(1, 6)), 3))) print print 'Powerset of S=[1, 3]' print '\n'.join((str(x) for x in powerset(set(range(1, 4))))) print print 'The length of the powerset of S with S = 12 is %d' % sum(1 for _ in (powerset(set(range(12))))) # This could be calculated apriori, as 2^S, but that wouldn't be a testHere's the test output:
Length 3 subsets of S=[1, 5]
set([1, 2, 3])
set([1, 2, 4])
set([1, 2, 5])
set([1, 3, 4])
set([1, 3, 5])
set([1, 4, 5])
set([2, 3, 4])
set([2, 3, 5])
set([2, 4, 5])
set([3, 4, 5])
Powerset of S=[1, 3]
set([])
set([1])
set([2])
set([3])
set([1, 2])
set([1, 3])
set([2, 3])
set([1, 2, 3])
The length of the powerset of S with S = 12 is 4096
Posted by
Daniel RodgersPryor
at
9:57 pm
Labels:
generators,
hash table,
lazy,
powerset,
powersets,
python,
recursion,
set,
sets
Prime Numbers
A few months ago, during a lecture on Haskell and functional programming, we were introduced to the Sieve of Eratosthenes, an elegant way to generate prime numbers. We were shown a neat implementation in Haskell that used just a few lines; I wanted to see if I could replicate that neatness/laziness in Python.
Primes below 10:
2
3
5
7
First 10 primes:
2
3
5
7
11
13
17
19
23
29
10,000th prime:
104729
Neat, huh?
I love infinite generators  it's flatout cool to be able to create an object describing every prime number using just a few lines.
from itertools import takewhile, chain, count, islice from collections import deque knownPrimes = [2] # Numbers are tested by counting upwards from here. This also acts as a cache. def storeAndReturn(p): knownPrimes.append(p) return p def isNotDivisibleByKnownPrimes(n): return all((n % p) != 0 for p in takewhile(lambda p: p**2 <= n, knownPrimes)) def primeFilter(sequence): return (storeAndReturn(i) for i in sequence if isNotDivisibleByKnownPrimes(i)) def allPrimes(): return chain(knownPrimes, primeFilter(count(knownPrimes[1] + 1))) def primesUpTo(limit): return takewhile(lambda p: p < limit, allPrimes()) def firstNPrimes(n): return islice(allPrimes(), None, n, None) def nthPrime(n): return deque(firstNPrimes(n), maxlen=1).pop() # deque construction is apparently the # fastest way to consume an iterator (partly because it's raw C) if __name__ == '__main__': # Module Test print "Primes below 10:" print '\n'.join(map(str, primesUpTo(10))) print "First 10 primes:" print '\n'.join(map(str, firstNPrimes(10))) print "10,000th prime:" print nthPrime(10000)I succeeded, and even added some more interesting ways to use the allPrimes() generator (although a few of the oneliners should probably be brokenout to aid readability).
Primes below 10:
2
3
5
7
First 10 primes:
2
3
5
7
11
13
17
19
23
29
10,000th prime:
104729
Neat, huh?
I love infinite generators  it's flatout cool to be able to create an object describing every prime number using just a few lines.
Posted by
Daniel RodgersPryor
at
6:34 pm
Labels:
generators,
haskell,
lazy,
prime numbers,
prime sieve,
primes,
python
Cheating (sortof) at Draw Something
A year or so ago Draw Something was all the rage. For those who don't remember, it's basically pictionary  each turn you draw a word and your opponent/partner has to guess it from a given set of letters, then it alternates. Usually its lots of fun, but occasionally, you get something ridiculous and unintelligible.
Unwilling to accept defeat and too stubborn to ask for help, I set out to solve the problem automatically. On the assumption that bad draw something images contain basically no information, I decided to approach the problem using the text alone.
The idea: find all combinations and permutations of the given letters that were of the correct length and sort them by how frequently they are used. Initially I intended to use Google search hits to rank the letter combinations, obviously, this would be unbelievably slow and, as it turns out, violates Google's terms of service. Eventually I settled on a much better solution; use a local copy of Wikipedia. It's important to use something other than a dictionary because Draw Something frequently uses brands and popculture references that would't crop up in most standard dictionaries.
The article text alone was around 30GB (unzipped) when I grabbed it at the start of 2012, so lazy parsing is essential. I used python's etree module, but using it lazily turned out to be non obvious. Even when using the etree.iterparse function, all processed nodes are kept in memory forever, they're just loaded onebyone. The answer, obvious in retrospect, is to delete each node after reading its contents using myElement.clear(). After separating each word of text, I added it to a big dictionary mapping words to their frequencycounts, that dictionary then gets pickled and saved to disk, ready for later use.
Here's the code:
Speaking of permutations, Python's itertools library has some wonderfully simple functions for lazy combinatorics. Generating potential words of a certain length is just: permutations('abbdejmnorux', 7)
Combinatorial permutations are naturally enormous, but the fixed sizes for this problem make that insignificant on modern hardware. Here's the final code:
Also:
...
moderna
jourdan
bombard
broaden
majored
dunmore
unarmed
abdomen
rebound
bermuda
done!
Rebound. The answer was rebound. Obvious now, right?
Confusing Draw Something image taken from http://baddrawsomething.tumblr.com/
What? 
The idea: find all combinations and permutations of the given letters that were of the correct length and sort them by how frequently they are used. Initially I intended to use Google search hits to rank the letter combinations, obviously, this would be unbelievably slow and, as it turns out, violates Google's terms of service. Eventually I settled on a much better solution; use a local copy of Wikipedia. It's important to use something other than a dictionary because Draw Something frequently uses brands and popculture references that would't crop up in most standard dictionaries.
The article text alone was around 30GB (unzipped) when I grabbed it at the start of 2012, so lazy parsing is essential. I used python's etree module, but using it lazily turned out to be non obvious. Even when using the etree.iterparse function, all processed nodes are kept in memory forever, they're just loaded onebyone. The answer, obvious in retrospect, is to delete each node after reading its contents using myElement.clear(). After separating each word of text, I added it to a big dictionary mapping words to their frequencycounts, that dictionary then gets pickled and saved to disk, ready for later use.
Here's the code:
from glob import glob import cPickle as pickle from itertools import ifilter import xml.etree.ElementTree as etree import string import re import unicodedata wordCount = {} identity = lambda x: x def incrementCount(word): wordCount[word] = wordCount.get(word, 0) + 1 regexWordSeperators = "&[^;]+?;" charWordSeperators = "()[]{}.,/_=:&;\\^~" charSperatorTraslation = string.maketrans(charWordSeperators, " "*len(charWordSeperators)) rubbishChars = (string.punctuation + string.digits).translate(string.maketrans("",""), charWordSeperators) def countWords(text): global debugcount text = unicodedata.normalize('NFKD', u'%s' % text).encode('ascii','ignore') # Much nicer conversion! text = re.sub(regexWordSeperators, " ", text).replace("%20", " ") # replace html characters like < with spaces text = text.translate(charSperatorTraslation, rubbishChars).lower() # clean and prepare for # splitting (by whitespace) into words text = ifilter(identity, text.split()) # Remove empty strings if there are any map(incrementCount, text) def processElement(elem, event): txt = None if event == 'end' and elem.tag[4:] == 'text': txt = elem.text elem.clear() # Delete from element tree in memory!! root.clear() # Delete empty element from root (which # has some tiny memory footprint leading to a leak that matters with HUGE files) return txt # Parse XML Tree: itertree = etree.iterparse(glob("./*.xml")[0], events=('end', 'start')) # We don't care about any events except start and end # Define ArticleText Generator: firstEvent, root = itertree.next() textSnippets = ifilter(identity, (processElement(elem, event) for event, elem in itertree)) map(countWords, textSnippets) # Do the counting work here # Store with open("wordCount.pckl", 'wb') as f: pickle.dump(wordCount, f) print "All of Wikipedia processed! %d words found :)" % len(wordCount)It takes about an hour to run and the resulting file is 180MB or so; that's a lot of words! it turns out that when you have as much text as is on Wikipedia, occasional typos cause every word to permute into dozens of misspellings. Ideally a good threshold (say 5 occurrences) could be used to filter words before pickling them. Rather than waiting another hour, I just filtered the words after matching them to permutations  it makes every search slower because it has to load and search 180MB of dictionary before filtering.
Speaking of permutations, Python's itertools library has some wonderfully simple functions for lazy combinatorics. Generating potential words of a certain length is just: permutations('abbdejmnorux', 7)
Combinatorial permutations are naturally enormous, but the fixed sizes for this problem make that insignificant on modern hardware. Here's the final code:
import cPickle as pickle from itertools import permutations, ifilter, ifilterfalse, islice # Parameters # LETTERS = "abbdejmnorux" LENGTH = 7 # Definitions # with open("wordCount.pckl", 'rb') as f: wikiWordCount = pickle.load(f) def usageFrequency(word): return wikiWordCount.get(word, 0) def unique_everseen(iterable, key=None): "List unique elements, preserving order. Remember all elements ever seen." # unique_everseen('AAAABBBCCDAABBB') > A B C D # unique_everseen('ABBCcAD', str.lower) > A B C D # This python recipe is taken from: http://docs.python.org/2/library/itertools.html seen = set() seen_add = seen.add if key is None: for element in ifilterfalse(seen.__contains__, iterable): seen_add(element) yield element else: for element in iterable: k = key(element) if k not in seen: seen_add(k) yield element # Specification of Output # candidates = unique_everseen(permutations(LETTERS, LENGTH)) candidates = (reduce(lambda x,y: x+y, candidate) for candidate in candidates) # zip candidate strings to usage frequency, sort+filter by this value, then unzip and dump frequency info candidates = (pair[0] for pair in sorted( ifilter( lambda pair: pair[1], ((candidate, usageFrequency(candidate)) for candidate in candidates) ), key = lambda pair: pair[1] ) ) # Evaluation & Output # print '\n'.join(reversed(list(islice(candidates, 0, 200)))) print "\ndone!\n"This takes around 30 seconds to crawl though every possible mix of letters, sort them by usage frequency and print out the top few; that's pretty good!
Also:
...
moderna
jourdan
bombard
broaden
majored
dunmore
unarmed
abdomen
rebound
bermuda
done!
Rebound. The answer was rebound. Obvious now, right?
Confusing Draw Something image taken from http://baddrawsomething.tumblr.com/
Posted by
Daniel RodgersPryor
at
6:05 pm
Labels:
combinatorics,
Draw Something,
etree,
itertools,
permutations,
python,
wikipedia dump,
word count,
word frequency
This blog
For a long time I've been intending to document and share my various projects and hacks; this blog hopefully represents the end of my laziness in that regard. For a while I'm going to be retroactively documenting some recent projects, but soon, this blog should become a place where I can share progress and work though problems as I find them.
Subscribe to:
Posts (Atom)
Blog Archive
About Me
 Daniel RodgersPryor
 Melbourne, Victoria, Australia
 I'm hard at work on an MSc in Physics at the University of Melbourne, my research topic is 'building nanoscale neural networks'. In my (limited) spare time I tinker with 3D printing, electronics and programming.