A Teeny Story About Reinforcement Learning and Creativity

2025-02-22

Hello there, my good fraternal readers. As usual, allow me to post an obligatory random image I got outta nowhere - this time the nowhere is Discord!

Now, I was supposedly going to write this tomorrow, but I'm writing it today in case tomorrow's me gets a genius spark of inspiration. Frankly, I felt up to writing right until this very sentence, at which point I would love to do it tomorrow instead.

A bit of a disclaimer, however: I do not claim to be an expert on this topic; I am merely sharing what I found interesting about it. Now that that's out of the way, on with the post.

Anywho, I'd like to talk about a connection I noticed between the early pages of Marcus du Sautoy's The Creativity Code and recent research on Reinforcement Learning for reasoning models, published about a week ago:

Competitive Programming with Large Reasoning Models

Interestingly, if we cross-reference the book and the paper, the research corresponds to the development and nurturing of Transformational Creativity, one of Boden's three types of creativity.

2025-02-22: In the book, Boden's three types of creativity are as follows:

Exploratory - Discovering what lies at the bounds of your field while following its rules
Combinational - Combining different concepts into one novel concept
Transformational - Inventing a new paradigm by breaking assumptions and rules

While I would love to dive deeper into the topic, this post isn't about that. It's about MY thoughts and ideas. Now, while reading this, I wondered whether Transformational Creativity had something to do with the Generative Pre-trained Transformer (GPT), but then I realised I was likely comparing potatoes to apples. The two may overlap a little, but they are different concepts that just share the unfortunate base word "transform".

It's been shown that AlphaZero, which was trained without any human game data and learned solely through self-play Reinforcement Learning (RL), was able to discover genuinely revolutionary strategies that consistently beat top chess engines. Surprisingly, this insight didn't inspire researchers to apply RL training to LLMs until recently, when Chain of Thought reasoning became a thing.

Edit: I'd like to note that the me of a day later no longer shares the view in the last sentence of that paragraph. I now believe that a year or two is a relatively short timeframe for this kind of RL training to be applied. That part is pretty trivial, but I thought I should clear things up so that the future me feels nicer.

This insight could shed some light on generating better and better strategies and methods in dozens of fields, including Math, Science, Art, Music and whatnot. It could break our traditional paradigms of how something should be done and find better ways to represent the problem at hand. If I recall correctly, this exact thing is currently being developed at Google for doing math proofs.

2025-02-22: It's AlphaProof. It's not that I was unaware of the AI's name; I was honestly just too lazy to look it up yesterday. Still, I believe AlphaProof has really good prospects for contributing to the greater good of mankind. The ideas behind AlphaProof (and apparently AlphaGeometry 2 too? I didn't even know that existed.) could be developed further with more resources backing their training.

Regardless of what anyone is doing, though, it seems clear to me that RL has massive potential for generating creative thoughts. Could it be that creativity is algorithmic rather than spontaneous? That's a great thought experiment.

2025-02-22: Understandably, I was already quite tired around this time, so I ended my writing rather briefly. I do think it was a pretty good ending, but I have some things to add. Creativity isn't objectively well defined. When we talk about something intrinsic to human feelings, we find it honestly difficult to make it objective, and we often have to resort to philosophical platitudes, which do not jibe well with the requirements of research, where you have to define and quantify these subjective things. I believe that the inherent human touch in something doesn't really exist: you can't transfer emotions into a painting. You can convey them, but if an AI masters the art of conveying them, it could fake the human touch. If we are willing to regard these machines as capable of creativity, then they have already been creative since the advent of GPT-3.5. Verdict for today: AIs can be creative.

Now then, if I have any changes to make to this post, I'll be sure to update it. I'm just too trigger-happy to hold off on uploading this, and while it does not provide as much value as I'd like, my goal here is to get rid of perfectionism rather than to provide value. So! Remember not to expect great value from any of my writings (I predict that one day one of them will become my horcrux).

2025-02-22: Hello, me from one day ago. You're doing great. It's 05:40 here right now, but I came back to update the thoughts you had.