FastML

Machine learning made easy

^one weird trick for training char-^r^n^ns

Character-level recurrent neural networks are attractive for modelling text specifically because of their low input and output dimensionality. You have only so many chars to represent: lowercase letters, uppercase letters, digits and various auxiliary characters, so you end up with 50-100 dimensions (each char is represented in one-hot encoding).

Still, it’s a drag to model upper and lower case separately. It adds to dimensionality, and perhaps more importantly, a network gets no clue that ‘a’ and ‘A’ actually represent pretty much the same thing.
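A minimal pure-Python sketch of one-hot encoding makes the problem visible. The helper names build_vocab and one_hot are illustrative, not from any library: the vocabulary is just the set of distinct characters, and each one gets its own position, so 'H' and 'h' end up as vectors that share no non-zero element.

```python
def build_vocab(text):
    """Map each distinct character in `text` to an integer index (sorted for determinism)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}

def one_hot(ch, vocab):
    """Return a list of zeros with a single 1 at the character's index."""
    vec = [0] * len(vocab)
    vec[vocab[ch]] = 1
    return vec

vocab = build_vocab("Hello hello")
print(len(vocab))           # 6 dimensions: ' ', 'H', 'e', 'h', 'l', 'o'
print(one_hot("H", vocab))  # [0, 1, 0, 0, 0, 0]
print(one_hot("h", vocab))  # [0, 0, 0, 1, 0, 0]
```

As far as the input layer is concerned, the two forms of the letter are as unrelated as 'H' and 'o'.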

The simplest solution is to discard uppercase and just use lowercase, losing the case information altogether. We propose a more elegant way to deal with the two problems mentioned above: inserting special markers before each uppercase letter.

Hello World -> ^hello ^world

The resulting text is still quite readable. Of course you need to make sure there are no carets in your input to start with, but this is a minor matter: one could use any character as a marker, or invent one. Remember that a char is just a sparse vector: we can make it longer by one element, and that abstract element can be our marker.
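Here is a minimal sketch of the "abstract element" variant, assuming we encode straight to integer indices instead of going through a literal caret. The names build_lower_vocab and encode_indices are hypothetical: lowercase characters get ordinary indices, and one extra index, one past the last character, means "the next letter is uppercase".

```python
def build_lower_vocab(text):
    """Index the distinct characters of the lowercased text (sorted for determinism)."""
    return {ch: i for i, ch in enumerate(sorted(set(text.lower())))}

def encode_indices(text, vocab):
    """Encode text as integer indices, emitting the marker index before each uppercase letter.

    Assumes ch.lower() is a single character, which holds for ASCII letters.
    """
    marker = len(vocab)  # one extra index that corresponds to no real character
    out = []
    for ch in text:
        if ch.isupper():
            out.append(marker)
        out.append(vocab[ch.lower()])
    return out

vocab = build_lower_vocab("Hello World")
print(encode_indices("Hello World", vocab))  # [8, 3, 2, 4, 4, 5, 0, 8, 7, 5, 6, 4, 1]
```

Index 8 appears wherever a capital stood, and 'H' and 'W' are fed in as the same indices as 'h' and 'w'. Since the marker is not a real character, the input text can contain carets freely.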

Gents witnessing the emergence of R33, the very first char-RNN, back in the day

Here’s how to convert mixed-case text:

import re

s = 'Hello World'
re.sub( '([A-Z])', '^\\1', s ).lower()

What we do is insert a caret before each uppercase letter and then turn the whole string to lowercase (\1 is a backreference to the group captured by the parentheses in the pattern; in an ordinary string literal we need to escape the backslash, hence \\1). An alternative is to perform both operations inside sub(), passing a function that receives each match and returns its replacement:

re.sub( '([A-Z])', lambda match: "^" + match.group( 1 ).lower(), s )
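Both versions only match A-Z, so accented capitals such as 'É' keep their own dimension. If that matters, one option (our own variation, not from the original post) is to test str.isupper() per character instead of using a character class. mark_upper is a hypothetical name.

```python
def mark_upper(s, marker="^"):
    """Insert `marker` before every character Python considers uppercase, then lowercase it.

    Unlike the [A-Z] pattern this also covers non-ASCII capitals such as 'É'.
    Caveat: a few characters lowercase to more than one code point (e.g. 'İ'),
    so converting back is not guaranteed to be exact for them.
    """
    return "".join(marker + ch.lower() if ch.isupper() else ch for ch in s)

print(mark_upper("Élan Vital"))  # ^élan ^vital
```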

Should we need to convert stuff back, we’d use a similar construct:

s = '^hello ^world'
re.sub( r'\^(.)', lambda match: match.group( 1 ).upper(), s )

A caret means “start of the string” (or start of each line, with re.MULTILINE) in a regular expression, so we need to escape it with a backslash.
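Putting both directions together, here is a small sketch of an encode/decode pair with a guard for the precondition mentioned earlier: the input must not already contain carets. The function names to_marked and from_marked are our own. The round trip is exact only when every capital in the input is in A-Z.

```python
import re

def to_marked(s):
    """Encode: put '^' before each A-Z letter, then lowercase the whole string."""
    if "^" in s:
        raise ValueError("input already contains the marker character '^'")
    return re.sub(r"([A-Z])", r"^\1", s).lower()

def from_marked(s):
    """Decode: uppercase the character after each '^' and drop the caret.

    A stray caret at the very end or before a newline (possible in a
    generated sample) has nothing for '.' to match, so it is left as-is.
    """
    return re.sub(r"\^(.)", lambda m: m.group(1).upper(), s)

original = "Hello World. New Sentence"
encoded = to_marked(original)
print(encoded)  # ^hello ^world. ^new ^sentence
assert from_marked(encoded) == original
```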

Does it work? It does. The network is especially quick to learn the “. ^” combo, which represents the end of a sentence followed by an uppercase letter at the beginning of the next one.

The trick described above is meant for text, but people have used char-RNNs for modelling other stuff too. It is conceivable to use a similar gimmick for source code, or for music, for example by inserting bar markers, which might help a network learn the rhythm.

