Using Python and a simple form of neuroevolution, I've built an algorithm that teaches itself to jump over obstacles.
The game is inspired by the Google dinosaur game that appears in Chrome when there is no internet connection.
The idea is that the red square has to avoid obstacles positioned at different heights and approaching at an increasing speed: the algorithm is given some information and must learn to jump at the right moment and survive as long as possible.
To build the game I used the tkinter module, which handles the graphical interface.
The obstacles are generated on the right side of the window at variable heights and they move towards the red square.
The game is handled entirely by the Game class, which contains several methods, such as:
jump() function:

    def jump(self):
        # While jumping, keep following the trajectory until back on the ground
        if self.cang_y <= 330 and self.jumping:
            # Parabolic motion; y grows downward and 330 is ground level
            self.new_y = 330 - 10.5 * self.time_count + 0.2 * self.time_count * self.time_count
            self.movement_y = self.new_y - self.cang_y
            self.w.move(self.canguro, 0, self.movement_y)
            #print ("-->jump " + str(self.cang_y) + " > " + str(self.new_y))
            self.cang_y = self.new_y
        elif self.cang_y > 330:
            # The square went below ground level: snap it back and end the jump
            #print ("-->FIX jump")
            self.new_y = 330
            self.movement_y = self.new_y - self.cang_y
            self.w.move(self.canguro, 0, self.movement_y)
            self.cang_y = self.new_y
            self.jumping = False
        else:
            # On the ground and not jumping: stay put and allow a new jump
            self.new_y = 330
            self.movement_y = self.new_y - self.cang_y
            self.w.move(self.canguro, 0, self.movement_y)
            self.cang_y = self.new_y
            self.jumping = False
            self.jump_triggered = False

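The constants in jump() describe a parabola, y(t) = 330 - 10.5 t + 0.2 t², in canvas coordinates where y grows downward and 330 is ground level. The standalone sketch below makes the shape of the jump explicit. It assumes that time_count grows by one per frame (the excerpt does not show where it is incremented), and the names GROUND_Y, V0, HALF_G and jump_y are hypothetical, introduced only for this illustration.

```python
GROUND_Y = 330  # ground level used in jump(), in canvas pixels (y grows downward)
V0 = 10.5       # initial upward speed, in pixels per tick
HALF_G = 0.2    # quadratic coefficient of the trajectory, in pixels per tick^2


def jump_y(t):
    """Vertical position t ticks after the jump started, clamped to the ground."""
    y = GROUND_Y - V0 * t + HALF_G * t * t
    return min(y, GROUND_Y)


# The parabola peaks where its derivative is zero: -V0 + 2 * HALF_G * t = 0
t_peak = V0 / (2 * HALF_G)
peak_height = GROUND_Y - jump_y(t_peak)

# The square is back at ground level when -V0 * t + HALF_G * t^2 = 0, t > 0
t_land = V0 / HALF_G

if __name__ == "__main__":
    print(f"peak after {t_peak:.2f} ticks, {peak_height:.1f} px above the ground")
    print(f"back on the ground after {t_land:.2f} ticks")
```

Under these assumptions a jump lasts about 52 ticks and clears roughly 138 pixels, which is the window the network has to learn to time against the approaching obstacle.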
Once the game is ready, the program has to learn to play. The algorithm has only one command, jump, and it has to rely on three pieces of information: the obstacle's distance, its height and its speed. These data are interpreted by neural networks with three inputs, two hidden layers of five neurons each and two outputs (the layer sizes match the weight arrays created in evolve()).
The algorithm I have built is a kind of genetic algorithm in which each population is a set of neural networks, each with its own weights. Each neural network plays the game and receives a score based on how long it survives.
Each neural network is an instance of the NeuralNetwork class, which, together with the Layer and Neuron classes, handles the calculations, the weights, the activation function and the outputs.
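The NeuralNetwork, Layer and Neuron classes are not shown here, so the following is only a minimal sketch of what such a network computes, not the original implementation. It assumes fully connected layers of sizes 3, 5, 5 and 2 (inferred from the weight shapes used in evolve()), a sigmoid activation and no bias terms; the activation and the absence of biases are assumptions, and the names random_weights, sigmoid and forward are hypothetical.

```python
import math
import random

LAYER_SIZES = [3, 5, 5, 2]  # inputs, two hidden layers, outputs


def random_weights(rng, layer_sizes=LAYER_SIZES):
    """One weight matrix per non-input layer, entries drawn uniformly from [-1, 1]."""
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]


def sigmoid(x):
    """Numerically stable logistic function (assumed activation)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def forward(weights, inputs):
    """Propagate the inputs through every layer and return the output activations."""
    activations = list(inputs)
    for matrix in weights:
        activations = [
            sigmoid(sum(w * a for w, a in zip(row, activations)))
            for row in matrix
        ]
    return activations


rng = random.Random(0)
net = random_weights(rng)
# Example inputs: distance, scaled speed, obstacle height (arbitrary values)
outputs = forward(net, [250.0, 300.0, 40.0])
```

The two values in outputs are what the game compares to decide whether to jump.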
The play() function constantly feeds the neural network with the current information and, according to the output, decides whether to trigger the jump.
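
    # Inputs: horizontal distance, scaled speed, obstacle height above the ground
    self.population[i].set_input([self.dist_x, self.obs_speed*100, 330-self.obs_y])
    received = self.population[i].calc_output()
    # Jump when the first output beats the second and no jump is already under way
    if received[0] > received[1] and not self.jump_triggered:
        self.time_count = 0
        self.jumping = True
        self.jump_triggered = True
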
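The decision logic in this fragment can be isolated into two small pure functions, which makes it easy to check without running the game. This is only a sketch: build_inputs and should_jump are hypothetical names, and the remark about the factor 100 (it appears to bring the speed to a magnitude comparable with the pixel-valued inputs) is my reading of the code, not something the fragment states.

```python
GROUND_Y = 330  # ground level used throughout the game code


def build_inputs(dist_x, obs_speed, obs_y):
    """Mirror the three values passed to set_input() in play()."""
    return [dist_x, obs_speed * 100, GROUND_Y - obs_y]


def should_jump(outputs, jump_triggered):
    """Jump only if output 0 beats output 1 and no jump is already in progress."""
    return outputs[0] > outputs[1] and not jump_triggered


# The same outputs trigger a jump only when none is already under way
first = should_jump([0.8, 0.3], jump_triggered=False)
second = should_jump([0.8, 0.3], jump_triggered=True)
```

The jump_triggered flag acts as a latch: without it, a network that keeps preferring the first output would reset time_count on every frame and the square would never complete its trajectory.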
The neural network that survived the longest is used to generate the new population. The first network of the new population is an exact copy of the best one found so far; in each of the others, every weight has a 15% chance of mutating, that is, of being replaced by a random value.
evolve() function:

    import copy

    def evolve(self):
        self.generation += 1
        self.best_score = 0
        self.best_nn = -1
        # Look for a network that beats the best score of all previous generations
        for i in range(0, self.n_entities):
            if self.scores[i] > self.best_score and self.scores[i] > self.last_best:
                self.best_score = self.scores[i]
                self.best_nn = i
        # Slot 0 always holds the best network found so far, unchanged
        if self.best_nn >= 0:
            self.population[0] = self.population[self.best_nn]
        if self.best_score > self.last_best:
            self.last_best = self.best_score
        # Pre-generate random replacement weights in [-1, 1) for every entity and layer
        pacchetto_pesi_rand_layer_1 = self.rand.rand(self.n_entities, 5, 3) * 2 - 1
        pacchetto_pesi_rand_layer_2 = self.rand.rand(self.n_entities, 5, 5) * 2 - 1
        pacchetto_pesi_rand_layer_3 = self.rand.rand(self.n_entities, 2, 5) * 2 - 1
        rands = self.rand.random(size=(100000))
        c = 0
        mutation_rate = 0.15
        for y in range(1, self.n_entities):
            # Deep-copy the best network: a plain assignment would make every slot
            # refer to the same object, so mutations would also alter slot 0
            self.population[y] = copy.deepcopy(self.population[0])
            for l in range(1, self.population[y].n_layers):
                for n in range(0, self.population[y].layers_sizes[l]):
                    for w in range(0, self.population[y].layers_sizes[l-1]):
                        r = rands[c]
                        c += 1
                        # Each weight is replaced with probability mutation_rate
                        if r < mutation_rate:
                            if l == 1:
                                self.population[y].layer[l].weights[n][w] = pacchetto_pesi_rand_layer_1[y][n][w]
                            elif l == 2:
                                self.population[y].layer[l].weights[n][w] = pacchetto_pesi_rand_layer_2[y][n][w]
                            elif l == 3:
                                self.population[y].layer[l].weights[n][w] = pacchetto_pesi_rand_layer_3[y][n][w]

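The selection and mutation step can also be sketched in isolation, independently of the Game and NeuralNetwork classes. The version below is a simplified illustration under stated assumptions: a network is represented as a plain nested list of weights, the names mutate and next_generation are hypothetical, and, unlike evolve(), it picks the best network of the current generation only and does not track last_best across generations.

```python
import copy
import random

MUTATION_RATE = 0.15  # per-weight mutation probability, as described in the text


def mutate(weights, rng, rate=MUTATION_RATE):
    """Return a deep copy of `weights` in which each entry is replaced,
    with probability `rate`, by a fresh random value in [-1, 1]."""
    child = copy.deepcopy(weights)
    for matrix in child:
        for row in matrix:
            for k in range(len(row)):
                if rng.random() < rate:
                    row[k] = rng.uniform(-1.0, 1.0)
    return child


def next_generation(population, scores, rng):
    """Keep the best network unchanged in slot 0 and fill the remaining
    slots with independently mutated copies of it."""
    best = max(range(len(population)), key=lambda i: scores[i])
    parent = population[best]
    return [copy.deepcopy(parent)] + [
        mutate(parent, rng) for _ in range(len(population) - 1)
    ]


rng = random.Random(1)
# Four toy "networks", each a single 5x3 weight matrix
population = [
    [[[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(5)]]
    for _ in range(4)
]
scores = [10, 42, 7, 30]
new_population = next_generation(population, scores, rng)
```

Copying the parent before mutating it is the important design choice: every child must be a separate object, otherwise a mutation applied to one slot would silently change the preserved best network as well.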
In the simulations I have run, the algorithm has almost always succeeded in finding a neural network able to survive forever.