Sonnet Algorithm Number 1
I would like to say up front that there are a number of things wrong with this algorithm. In my defense, I wrote it in two hours and had never used the Python NLTK (Natural Language Toolkit) before.
First the code and then the critique:
import nltk, re, pprint,sys,string
from urllib import urlopen
from sets import Set
try:
    filename = "input"
    # key words: one per 4-line stanza, plus one for the closing couplet
    stanza = ["evil","sin","punish"]#,"blessing"]
    couplet = ["redeem"]
    if sys.argv[1] == filename:
        # first pass: fetch the text and print concordance lines to stdout
        url = sys.argv[2]
        data = urlopen(url).read()
        tokens = nltk.word_tokenize(data)
        text = nltk.Text(tokens)
        #text = nltk.Text(nltk.corpus.gutenberg.words('bible-kjv.txt'))
        for i in stanza:
            text.concordance(i,lines=4)
            #text.similar(i)
        for i in couplet:
            text.concordance(i,lines=2)
            #text.similar(i)
    else:
        # second pass: read the saved concordance output and build the poem
        pattern = re.compile("[0-9]|\;|\:|\,")
        poemwords = re.compile("|".join(stanza+couplet))
        poem = []
        f = open(filename,"r")
        lines = f.readlines()
        f.close()
        lines = lines[1:len(lines)]
        f = open(sys.argv[1]+".txt","w")
        for l in lines:
            # skip the "Displaying ... matches:" header lines
            if l.strip()[-1] != ":" :
                # split on digits and punctuation, keep the first piece with a key word
                sense = pattern.split(l)
                newline = [s for s in sense if poemwords.search(s) != None][0].strip()
                print newline+"\n"
                f.write(newline+"\r")
        f.close()
except:
    print "Error"
    for arg in sys.argv:
        print arg
--------------
The bash script:
python damnengine.py input http://www.gutenberg.org/cache/epub/10/pg10.txt > input
python damnengine.py damn
-----------
The Algorithm
The overarching idea was to use the sonnet structure abab cdcd efef gg and to use a different word related to the subject of the poem for each stanza. Because I had no time to find rhyming words when I was writing this (perhaps I will do this in v2), I just used the basic structure of three 4-line stanzas and a 2-line couplet.
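The layout can also be written down as data. This is only a small Python 3 sketch: the key words are the ones from the script above, while the name sonnet_plan and the (word, line count) pairs are made up for illustration.

```python
# Hypothetical helper: pair each key word with the number of lines it supplies.
STANZA_WORDS = ["evil", "sin", "punish"]   # one key word per 4-line stanza
COUPLET_WORDS = ["redeem"]                 # key word for the closing 2 lines

def sonnet_plan(stanza_words, couplet_words):
    """Return a list of (key word, line count) pairs in poem order."""
    plan = [(word, 4) for word in stanza_words]
    plan += [(word, 2) for word in couplet_words]
    return plan

plan = sonnet_plan(STANZA_WORDS, COUPLET_WORDS)
total_lines = sum(count for _, count in plan)
print(plan)
print(total_lines)
```

Three stanzas of four lines plus a two-line couplet add up to the fourteen lines of a sonnet.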
What did I do?
- I picked the words for each stanza and for the couplet at the end (4 words total) that relate to my subject matter. These words were not terribly imaginative on my part. I could probably use the text.similar function to select words.
- Downloaded the text from Project Gutenberg (free) and loaded it into NLTK. Really I should have post-processed the text and divided it up by sentences rather than lines. There are plenty of lines that start with partial words.
- Used text.concordance to find the first 4 lines in the text for each stanza key word, and then 2 lines for the couplet word. Really what I should have done is find all lines with the selected words (or the similar words) and then select the lines with the appropriate rhymes, and/or manipulate the word structure of lines, or mix and match lines. The list goes on...
- Then, because concordance prints everything to the standard output, I ended up writing a shell script that pipes this output into a file and then processes the file again. I could have just redirected the standard output to a file and avoided the whole shell script thing, but then I wrote this in 2 hours, and spent 10 minutes saying f@#$@k why isn't the concordance output going into my hashtable.
- I ran the program a second time (this is handled in the second part of the massive conditional statement in the program). This time I processed the results of the concordance. I split each line on punctuation and selected the part of the sentence that contains my key word. Again this feels like a hack, and I had to do it because the line breaks fell in the middle of words, so I could not just take an entire line willy nilly. But when I run this script against something punctuation light, like Njal's saga, it does not work. This just means I would have to preparse for sentences.
- That's it. There is something buggy, because one time I ran this program it output 3 lines, 4 lines, 3 lines, 2 lines, and I think it has something to do with my splitting up the line based on punctuation.
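The two-pass shell script can be avoided by capturing whatever a function prints and processing it in the same run. Below is a minimal Python 3 sketch of that idea, not the script above. It assumes a stand-in function print_matches in place of text.concordance, with hard-coded sample lines; the helper names capture_output and pick_fragment are also invented for illustration. contextlib.redirect_stdout is part of the standard library. The fragment picker skips lines with no key word instead of failing on them.

```python
import io
import re
from contextlib import redirect_stdout

KEY_WORDS = ["evil", "sin", "punish", "redeem"]
SPLIT_PATTERN = re.compile(r"[0-9]|;|:|,")      # same split points as the script
KEY_PATTERN = re.compile("|".join(KEY_WORDS))   # matches substrings, like the script

def print_matches():
    """Stand-in for text.concordance: prints a header and two sample lines."""
    print("Displaying 2 of 2 matches:")
    print("ee of the knowledge of good and evil, thou shalt not eat of it: for in")
    print("a line with no key word at all")

def capture_output(func):
    """Run func and return everything it printed as a list of lines."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func()
    return buffer.getvalue().splitlines()

def pick_fragment(line):
    """Return the first punctuation-delimited fragment with a key word, or None."""
    for fragment in SPLIT_PATTERN.split(line):
        if KEY_PATTERN.search(fragment):
            return fragment.strip()
    return None

lines = capture_output(print_matches)
fragments = []
for line in lines:
    stripped = line.strip()
    if not stripped or stripped.endswith(":"):
        continue  # skip blank lines and the "Displaying ..." header
    fragment = pick_fragment(stripped)
    if fragment is not None:
        fragments.append(fragment)
print(fragments)
```

Because lines without a key word are dropped rather than raising an error, a stanza can come out shorter than planned, so a real version would probably want to ask for more matches than it needs.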
Conclusion
- Despite all these shortcomings I am actually quite happy with the poems that this generates and they do in some sense seem to be my poems - although slightly dada. I do feel like I have authorial power over these creations. First, by the selection of the format (the modified sonnet), and then by the selection of the key words and the decision to have each stanza contain one key word.
Next Steps
- Break out the key word selection to a function or external file
- Add functionality for finding rhyming words
- Look at all concordances not just the first 4 lines and find the 'best' match - whatever I decide best is.
- Preparse the text based on sentences. I do think however that I should not be matching entire lines and bringing them into a poem. So perhaps I need to...
- Work on some syllable analysis functionality AND
- I would say some sentence structure analysis but this seems antithetical to poetry - eventually I will probably incorporate sentence structure information so I can create my own grammatically incorrect lines with some intentionality
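Several of these steps can be prototyped without any extra libraries. The Python 3 sketch below splits a text into sentences with a naive regular expression, collects every sentence containing a key word, and picks the one whose estimated syllable count is closest to ten. Everything here is an assumption for illustration: the function names, the vowel-group syllable estimate (which is crude and miscounts many English words), the short hard-coded sample, and the choice of ten syllables as 'best'.

```python
import re

def split_sentences(text):
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def estimate_syllables(sentence):
    """Rough estimate: count runs of vowels (including y) in the sentence."""
    return len(re.findall(r"[aeiouy]+", sentence.lower()))

def best_sentence(text, key_word, target=10):
    """Return the sentence containing key_word closest to target syllables, or None."""
    candidates = [s for s in split_sentences(text) if key_word in s.lower()]
    if not candidates:
        return None
    return min(candidates, key=lambda s: abs(estimate_syllables(s) - target))

sample = ("Depart from evil and do good. "
          "The sun also rises. "
          "Evil shall slay the wicked and they that hate the righteous shall be desolate.")
print(best_sentence(sample, "evil"))
```

Swapping in a different scoring function (rhyme with the previous line, for instance) would only change the key passed to min.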
Final Thoughts
Sometimes less is more. This is a super simple algorithm, but it produces lovely cryptic poems. The artistry is in the selection of the text and the selection of the key words.