Making XPilot-AI Super NEAT!!!

COM409 Final Project by Jun Yi He Wu

Introduction

What's XPilot-AI

XPilot is a 2-dimensional flight combat simulation game in which multiple agents/players compete. There are multiple game modes available; however, this project focuses on combat mode, where each agent in the environment must score points by shooting the enemy while avoiding getting hit. The controls are turn right, turn left, thrust, and shoot. The game is open source, and the control mechanics are simple, which makes it a great platform for experimenting with and implementing Artificial Intelligence research and concepts.

What's Neuroevolution of Augmenting Topologies (NEAT)

NeuroEvolution of Augmenting Topologies (NEAT) is a genetic algorithm (GA) for evolving artificial neural networks (a neuroevolution technique), developed by Kenneth Stanley and Risto Miikkulainen in 2002 while at The University of Texas at Austin. It alters both the weight parameters and the structures of networks, attempting to find a balance between the fitness of evolved solutions and their diversity. It is based on three key techniques: tracking genes with history markers to allow crossover among topologies, applying speciation (the evolution of species) to preserve innovations, and developing topologies incrementally from simple initial structures ("complexifying").

What is NEAT-Python

NEAT-Python is a library developed by Cesar Gomes Miguel, Carolina Feher da Silva, and Marcio Lobo Netto! It provides many useful tools and parameters for training agents with the NEAT algorithm, such as a config file that lets you configure a NEAT implementation without delving too deeply into the code. I will be using NEAT-Python as the main NEAT library to train my XPilot agent.

Project Goal:

The goal of this project is to implement the NeuroEvolution of Augmenting Topologies (NEAT) algorithm to evolve an XPilot agent for combat. Given that I am not great at coding expert systems, I want to delegate this task to the algorithm, because NEAT is truly NEAT!!!

Methodology

NEAT Configuration

Click here to see the full configuration text file. The NEAT config file is set up as a simple feed-forward neural network (with no recurrent links) that takes in 25 inputs, each normalized to 0-1, grouped into 5 categories:

  • Degree-based inputs, normalized from 0-360 to 0-1: selfTracking(), selfHeadingDeg(), and aimdir(0).
  • Coordinate-based inputs, normalized from 0-1100 (adjusted to the lifeless.xp map) to 0-1: selfX(), selfY(), screenEnemyX(), screenEnemyY(), shotX(0), and shotY(0).
  • Distance-based inputs, normalized from -1-250 to 0-1: wallFeeler(1000, 0), wallFeeler(1000, 45), wallFeeler(1000, 90), wallFeeler(1000, 135), wallFeeler(1000, 180), wallFeeler(1000, 225), wallFeeler(1000, 270), wallFeeler(1000, 315), enemyDistance(0), and shotDistance(0).
  • Speed-based inputs, normalized from 0-100 to 0-1: enemySpeed(0), selfVelX(), selfVelY(), shotVel(0), and selfSpeed().
  • shotAlert on its own. This input differs from the others because it occasionally jumps to a value of 3,000, so it's normalized from 0-3000 to 0-1.

All of these ranges were measured beforehand to understand the possible values before applying normalization. There are 4 outputs: thrust(), turnLeft(), turnRight(), and fireShot(). Each output is checked by a conditional controller: if the output is > 0.5, execute the corresponding action; otherwise, do nothing. The population is set to 20 individuals, with an elitism of 2 to keep the best two individuals. The compatibility threshold is set to 3 to facilitate reproduction, the minimum species size is set to 2 to avoid species of a single individual, and the activation function is sigmoid.
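To make the controller concrete, here is a minimal sketch of the normalization and thresholding logic described above. The XPilot-AI sensor and action calls are the ones named in this writeup, but their exact argument conventions, plus the normalize() helper and run_controller() wrapper, are my own illustration rather than the project's verbatim code:

    def normalize(value, lo, hi):
        # Clamp and scale a raw sensor reading from [lo, hi] into [0, 1].
        return max(0.0, min(1.0, (value - lo) / (hi - lo)))

    def run_controller(net):
        # Build the 25-input vector (only one input per category shown).
        inputs = [
            normalize(selfHeadingDeg(), 0, 360),      # degree-based
            normalize(selfX(), 0, 1100),              # coordinate-based
            normalize(wallFeeler(1000, 0), -1, 250),  # distance-based
            normalize(selfSpeed(), 0, 100),           # speed-based
            normalize(shotAlert(), 0, 3000),          # shot alert
            # ...the remaining inputs follow the same pattern...
        ]
        out = net.activate(inputs)  # 4 sigmoid outputs in (0, 1)
        if out[0] > 0.5: thrust(1)
        if out[1] > 0.5: turnLeft(1)
        if out[2] > 0.5: turnRight(1)
        if out[3] > 0.5: fireShot()

Clamping in normalize() guards against readings that drift slightly outside the measured ranges.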

Fitness Functions

There are six fitness components. These are (in order of decreasing magnitude): the Reward for Killing the Enemy, the Reward for Escaping a Shot, the Reward for Remaining Close to the Enemy, the Punishment for Remaining Far from the Enemy, the Reward for Remaining Alive, and the Punishment for Dying. Each agent has 30 seconds to prove its fitness, about 480 frames or in-game clock ticks, and the robot it plays against is Terminator, a pretty decent XPilot bot.

1. Reward for Killing the Enemy

Since this agent is being trained for combat, landing a shot on the enemy is one of the most important fitness rewards. This is done by keeping track of a global variable PrevScore and then obtaining the delta between PrevScore and the current score. If the change in score is > 15, then we assume that the agent landed a shot, and we reward it with 100 fitness points.
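A minimal sketch of this check; selfScore() is an assumed helper standing in for however the project reads the current score:

    PrevScore = 0  # global score tracker, as described above

    def reward_kill(genome):
        # selfScore() is an assumed helper for reading the current score.
        global PrevScore
        current = selfScore()
        if current - PrevScore > 15:  # a jump this large implies a landed shot
            genome.fitness += 100
        PrevScore = current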

2. Reward for Escaping a Shot

Landing a shot on the enemy is amazing, but constantly getting hit isn't, so it's important to reward the agent for "escaping the shot" behavior. This is done through another global variable called ShotWasClose. If shotAlert returns a value less than 50, a shot was close, and we set the variable to True. On the next frame we check whether the agent is still alive; if it is, we reward it with 30 fitness points.
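A sketch of this two-frame check; selfAlive() is an assumed helper for the liveness test, which the writeup doesn't name:

    ShotWasClose = False  # global flag, as described above

    def reward_escape(genome):
        global ShotWasClose
        # Reward surviving a shot that was flagged close on the previous frame.
        if ShotWasClose and selfAlive():  # selfAlive() is an assumed helper
            genome.fitness += 30
        ShotWasClose = shotAlert() < 50  # flag close shots for the next frame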

3. Reward for Remaining Close to Enemy

Because we want the agent to eventually learn to engage in combat, we reward it for staying close to the enemy. This is as simple as checking whether enemyDistance(0) is less than 150; if it is, we reward the agent 10 fitness points.

4. Punishment for Remaining Far from Enemy

To encourage this behavior even more, we add a punishment when the agent is far from the enemy. This is done by checking whether enemyDistance(0) is more than 150; if it is, we subtract 10 fitness points from the agent.
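Components 3 and 4 together form a simple distance-shaping term; a sketch:

    def shape_distance(genome):
        # Reward staying within combat range; punish keeping away.
        if enemyDistance(0) < 150:
            genome.fitness += 10
        elif enemyDistance(0) > 150:
            genome.fitness -= 10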

5. Reward for Remaining Alive

This is an essential fitness function, but not the most important one, so it sits on the last tier of fitness functions. If the agent is alive, reward it with 5 points.

6. Punishment for Dying

Similarly to remaining close to the enemy, we reinforce this behavior with a negative case: if the agent is not alive, we punish it by subtracting 5 points. Furthermore, it takes the agent a couple of frames to respawn, which accumulates additional punishment for dying.
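Components 5 and 6 pair the same way; a sketch, again assuming a selfAlive() helper:

    def shape_survival(genome):
        # Small per-frame reward for staying alive, mirrored penalty for death.
        # Respawning takes several frames, so dying accrues the -5 repeatedly,
        # which is the extra punishment described above.
        if selfAlive():  # selfAlive() is an assumed helper
            genome.fitness += 5
        else:
            genome.fitness -= 5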

Scripts Set-Up

There are three main files: TestNEAT, TrainNEAT, and NEATManager.

TestNEAT is used for testing the best neural network. It automatically obtains the best NEAT genome object (from a relative directory) by opening a pickle file saved through TrainNEAT, then uses that genome object to create a net. This net runs inside the AI_loop, driven by the conditional controller that dictates thrust, turnLeft, turnRight, and fire. The inputs are normalized automatically.

TrainNEAT is used for training XPilot agents with NEAT-Python. It obtains the config text file from the relative directory, then either creates a new population or continues an existing one. It also creates a NEAT reporter (which displays relevant data in the terminal, such as fitness) and automatically saves a checkpoint when a generation is over. It then takes the genome objects within the population and creates their neural networks, which are appended to a global variable called Nets. After that, the AI_loop is activated, and the nets are deployed one at a time (by index, every 480 frames), each controlling the conditional controllers for thrust, fire, turnLeft, and turnRight. The inputs are likewise normalized. Once all the genomes within Nets have had their fitness tested, the script quits the AI_loop and saves the population to a pickle file.

NEATManager is used to run TrainNEAT for an "x" number of generations. It does that by using Python's subprocess module to call TrainNEAT. In the next section, I will explain why I did this.
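Here is a condensed sketch of the TrainNEAT flow using real NEAT-Python calls (neat.Config, neat.Checkpointer, Population.run); the config and pickle file names and the glob-based checkpoint lookup are illustrative assumptions, and the AI_loop integration is elided:

    import glob, os, pickle
    import neat

    local_dir = os.path.dirname(__file__)
    config = neat.Config(neat.DefaultGenome, neat.DefaultReproduction,
                         neat.DefaultSpeciesSet, neat.DefaultStagnation,
                         os.path.join(local_dir, 'config.txt'))

    # Continue from the newest checkpoint if one exists, else start fresh.
    checkpoints = glob.glob('neat-checkpoint-*')
    if checkpoints:
        latest = max(checkpoints, key=os.path.getmtime)
        pop = neat.Checkpointer.restore_checkpoint(latest)
    else:
        pop = neat.Population(config)

    pop.add_reporter(neat.StdOutReporter(True))  # fitness etc. in the terminal
    pop.add_reporter(neat.Checkpointer(generation_interval=1))

    Nets = []  # global list of nets, as described above

    def eval_genomes(genomes, config):
        # Build one net per genome; the AI_loop then scores each net during
        # its 480-frame evaluation window and writes back genome.fitness.
        for genome_id, genome in genomes:
            genome.fitness = 0
            Nets.append(neat.nn.FeedForwardNetwork.create(genome, config))
        # ...start the AI_loop here; quit it once every net has been scored...

    winner = pop.run(eval_genomes, 1)  # only one generation per process run
    with open('best_genome.pickle', 'wb') as f:
        pickle.dump(winner, f)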

Challenges

There are conflicting coding architectures between NEAT-Python and XPilot's core AI_loop. In essence, both of these are loops. NEAT-Python requires that the fitness function "report" back to Pop.run(someFitnessFunction, generations); otherwise, it cannot perform its NEAT operations. AI_loop, on the other hand, is XPilot's core main loop. It cannot take in parameters because of the way it's designed (hence the use of global variables), and when you use the API to quit the AI_loop, the core automatically dumps. For NEAT-Python to work with the AI_loop, the AI_loop would have to quit and pass the fitnesses back to Pop.run(someFitnessFunction, generations) so that NEAT can perform its algorithm and iterate to the next generation, which is impossible for the AI_loop. That's why it took a long time to figure out a way around this problem.

The method I used to resolve it is to not use NEAT-Python's built-in multi-generation functionality: in Pop.run(someFitnessFunction, generations), I programmed it to run for only one generation. After discovering that NEAT-Python can save progress as a checkpoint, I just needed a script that could repeatedly call TrainNEAT to continue the generations without creating a core-dump situation. That's why I created the NEATManager script, which subprocess.runs TrainNEAT for a number of generations, simulating the component of NEAT-Python I am not using.

Another challenge is the accessibility of NEAT-Python's documentation. The website's documentation is limited and unconventional; at least for me, a beginner in NEAT, it was extremely counterintuitive. I did not know how to set it up, how to train, or how to update fitness. Thankfully, there were a couple of tutorials on YouTube that used the library for simple applications, which helped me understand its essential functionality.
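A sketch of the NEATManager loop (the script name and generation count are illustrative):

    import subprocess
    import sys

    GENERATIONS = 200  # the "x" number of generations to simulate

    for gen in range(GENERATIONS):
        # Each call runs exactly one generation and exits, sidestepping the
        # core dump that quitting the AI_loop in-process would cause.
        subprocess.run([sys.executable, 'TrainNEAT.py'], check=True)
        print(f'Finished generation {gen + 1}/{GENERATIONS}')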

Results + Improvements

The config file is set up to stop the algorithm once it reaches a maximum of 20,000 fitness points; however, reaching that is highly unlikely. I ran TrainNEAT for a total of 200 generations. Given the number of inputs (25 in total), this is a very complex problem for the NEAT algorithm to solve, and 200 generations is not enough training time to create a great combat robot, or even a decent one. Because of how long each generation takes (10 minutes), and because of unforeseen challenges such as the AI_loop and NEAT loop incompatibility, I ended up not having enough time to train the robot for long enough. To improve, it would need to run for at least 10,000 generations to finally be good enough for combat.

Currently, most of the population's performance is far from what I expect. Most of the robots in the population start thrusting rapidly upwards and continuously shooting until they hit the front wall. Given the number of inputs to the neural net, I suspect the NEAT algorithm has not "reached enough complexity," meaning the neural network is still pretty simple. However, there was one specific genome that occasionally exhibited interesting behavior, such as thrusting while continuously spinning to the left or right (reminiscent of the Spinner strategy), usually staying still in the first round until the enemy came nearby. I think that's the genome that occasionally scores higher than the rest of the population: while most score around -5000 fitness points, there is occasionally a fitness value between -150 and -15, and I believe this particular genome is responsible. I will submit the project as it is now, but I will let it continue running over Winter Break and check for further improvements.

Here's the described unwanted behavior of most genomes within the population: continuously thrusting until they hit a wall.

Here's the described behavior of "Spinner": occasionally, depending on where it spawns, the bot is able to crash into the enemy ship.

References + Credits

  • NEAT-Python Documentation
  • Nick's Repository: Applying NEAT to XPilot
  • NEAT Original Paper
  • Simple AI Tutorial with NEAT-python
  • Neat AI does Flappy Birds using NEAT and a Genetic Algorithm