Horoscope Analysis

18 Aug 2016

I have been interested in text analysis and recurrent neural networks for a while. If you don’t know what a recurrent neural network is, I highly encourage you to check out Andrej Karpathy’s blog post The Unreasonable Effectiveness of Recurrent Neural Networks. While recurrent neural networks (RNNs) can be used on a wide range of data, one of their most promising applications is generating conversations and other text. Encouraged by Andrej’s post and the wide variety of Twitter bots I’ve been seeing in the wild, I decided to take a stab at training an RNN on astrological horoscopes.

After downloading 10 years’ worth of daily horoscopes, I looked into them with the text analysis tools in scikit-learn. First, I wanted to see which words were most unique to each sign’s horoscopes and which were most general (while writing this post I found a post by billpmurphy looking at a very similar question). For this analysis I used term frequency-inverse document frequency (TF-IDF), which ranks words by how often they appear in a document but penalizes words that are common across the entire corpus.
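To make the scoring concrete, here is a minimal pure-Python sketch of the textbook formulation, tf × log(N / df), run on a made-up three-sign toy corpus. It is not the code behind this post: the function names and the toy text are invented for illustration, and scikit-learn's `TfidfVectorizer` applies a smoothed IDF and normalizes each document vector by default, so its numbers would differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return {doc_name: {word: score}} using tf * log(N / df).

    documents maps a name (here, a sign) to its full text.
    """
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(counts)
    # Document frequency: in how many documents each word appears.
    doc_freq = Counter()
    for word_counts in counts.values():
        doc_freq.update(word_counts.keys())
    scores = {}
    for name, word_counts in counts.items():
        total = sum(word_counts.values())
        scores[name] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in word_counts.items()
        }
    return scores


def top_words(word_scores, k=5):
    """Return the k highest-scoring (word, score) pairs."""
    return sorted(word_scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy stand-in corpus (assumption): one short made-up "document" per sign.
toy_horoscopes = {
    "aries": "the ram charges ahead today and the ram wins",
    "leo": "the lions roar today and the lions rest",
    "virgo": "the virgos plan today and the virgos tidy",
}

scores = tf_idf(toy_horoscopes)
for sign, word_scores in scores.items():
    print(sign, top_words(word_scores, k=2))
```

In this toy corpus, words shared by every sign ("the", "today", "and") score exactly zero, while a word repeated within a single sign's text rises to the top of that sign's list.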

So which words are most unique and most common? The following table lists the five most unique and five most common words for each sign, along with their TF-IDF scores.

Most Unique

| Aries | Leo | Sagittarius | Taurus | Virgo | Capricorn | Gemini | Libra | Aquarius | Cancer | Scorpio | Pisces |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| folk (0.20) | lions (0.69) | archer (0.45) | taurean (0.21) | virgos (0.87) | capricorns (0.64) | geminis (0.61) | libran (0.49) | aquarians (0.81) | geminis (0.11) | scorpios (0.89) | remote (0.10) |
| aggressive (0.11) | methodically (0.06) | sagittarian (0.21) | variety (0.13) | immerse (0.05) | goat (0.14) | versatile (0.09) | virgos (0.13) | loom (0.06) | fluctuate (0.11) | hunting (0.04) | contains (0.09) |
| pique (0.11) | im (0.06) | scorpios (0.10) | uninvited (0.10) | utter (0.05) | sagittariuss (0.07) | witted (0.08) | resulting (0.09) | shaking (0.06) | confront (0.10) | volcanic (0.04) | doldrums (0.09) |
| perseverance (0.10) | gusto (0.06) | classroom (0.08) | dedicated (0.10) | imposing (0.04) | rainbows (0.06) | fatigued (0.08) | disenchanted (0.09) | fighting (0.05) | refresh (0.09) | virgos (0.04) | rate (0.08) |
| intricate (0.08) | hate (0.05) | scale (0.08) | indulgent (0.08) | exasperated (0.04) | prompted (0.06) | ether (0.07) | covered (0.08) | designer (0.05) | emerges (0.09) | unstoppable (0.04) | temperamental (0.08) |

Most Common

| Aries | Leo | Sagittarius | Taurus | Virgo | Capricorn | Gemini | Libra | Aquarius | Cancer | Scorpio | Pisces |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tingling (0.00) | miscalculated (0.00) | miserly (0.00) | overnight (0.00) | outshining (0.00) | overloaded (0.00) | overestimating (0.00) | overreactions (0.00) | output (0.00) | overstating (0.00) | overactive (0.00) | overview (0.00) |
| merrily (0.00) | miscalculate (0.00) | miserably (0.00) | overloading (0.00) | outs (0.00) | overload (0.00) | overestimated (0.00) | overreaction (0.00) | outperformed (0.00) | overstated (0.00) | outwardly (0.00) | overuse (0.00) |
| merrier (0.00) | miracles (0.00) | miserable (0.00) | overload (0.00) | outrageously (0.00) | overindulging (0.00) | overestimate (0.00) | overreacted (0.00) | outlived (0.00) | oversights (0.00) | outward (0.00) | overturn (0.00) |
| tip (0.00) | tip (0.00) | misdirection (0.00) | overindulging (0.00) | outrageous (0.00) | overhauls (0.00) | overemotional (0.00) | overloading (0.00) | outline (0.00) | overshadowing (0.00) | outshining (0.00) | overstretched (0.00) |
| lain (0.00) | 13th (0.00) | lain (0.00) | 13th (0.00) | 13th (0.00) | 13th (0.00) | lain (0.00) | 13th (0.00) | 13th (0.00) | 13th (0.00) | 13th (0.00) | 13th (0.00) |

Not surprisingly, each sign talks about itself the most. Next, I fed the different horoscopes into an RNN. For this I used the Python version of char-rnn with an RNN size of 512, three layers, and 75 epochs.
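For readers new to character-level models, the sketch below shows roughly how text is usually prepared for this kind of training: each character is mapped to an integer, and every target sequence is the input shifted by one character, so the network learns to predict the next character. This is an illustration only, not char-rnn's actual code; the function names, the sample sentence, and the sequence length of 8 are all assumptions chosen to keep the example small.

```python
def build_vocab(text):
    """Map each distinct character to an integer index and back."""
    chars = sorted(set(text))
    char_to_idx = {ch: i for i, ch in enumerate(chars)}
    idx_to_char = {i: ch for ch, i in char_to_idx.items()}
    return char_to_idx, idx_to_char


def make_training_pairs(text, char_to_idx, seq_length):
    """Cut the text into (input, target) index sequences.

    Each target is the input shifted one character to the right, so at every
    position the model is asked to predict the next character.
    """
    encoded = [char_to_idx[ch] for ch in text]
    pairs = []
    for start in range(0, len(encoded) - seq_length, seq_length):
        inputs = encoded[start:start + seq_length]
        targets = encoded[start + 1:start + seq_length + 1]
        pairs.append((inputs, targets))
    return pairs


# Hypothetical sample text; a real run would use the full horoscope corpus.
sample_text = "friends prove to be worth it!"
char_to_idx, idx_to_char = build_vocab(sample_text)
pairs = make_training_pairs(sample_text, char_to_idx, seq_length=8)

# Decode the first pair back to text to see the one-character shift.
inputs, targets = pairs[0]
print(repr("".join(idx_to_char[i] for i in inputs)))
print(repr("".join(idx_to_char[i] for i in targets)))
```

Because the model only ever sees characters, it has to learn spelling, spacing, and punctuation from scratch, which is one reason undertrained models produce so many near-words.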

The algorithm sometimes spits out good advice: “This summer could lead to some exciting opportunities: stick rigidly to what you can do and no more! Friends prove to be worth it!” But there are lots of misspellings, fragments, and general nonsense: “That leonine indicates close to you might hove or to concentrate on what could be a lot happening side! The current influences working as it may sound look.”

As I train it more, I hope the results will improve. You can check out the daily horoscopes here.