Alex Towle & William Zhang COM 407 Final Project

Goals

The goal of our project was to train an Xpilot agent to play capture the flag in the presence of enemy agents using a genetic algorithm controlled expert system. In Xpilot's capture the flag mode, a large ball sits inside a treasure on the map. An agent must find this ball, tow it with a breakable cable to a goal, and throw it in.

Motivations

We were inspired by the papers "Evolving expert agent parameters for capture the flag agent in Xpilot" (Parker and Penrose, 2012) and "The Incremental Evolution of Attack Agents in Xpilot" (Parker and Parker, 2006).

Overview

Our initial concepts for the project revolved around creating an Xpilot agent capable of collaborating with other instances of itself to fill multiple distinct roles, such as a "ball runner" that "defenders" would protect. After some discussion, this was deemed too ambitious to achieve within our time frame. As such, we ultimately settled on incrementally evolving an agent that would play capture the flag while being capable of responding to the presence of an enemy agent. It would not be capable of counterattacking by shooting and would thus be entirely reliant on avoiding enemy shots to score goals.

Our agent's expert system has components that control turning, thrusting, and tethering. There were three steps to our incremental evolution process. First, we created a GA to train the expert system on capturing the flag. Then we made a second GA to train shot avoidance, and finally a third GA that would learn when to switch between capturing and shot avoidance behaviors. All of our GAs used roulette wheel selection.
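All three GAs selected parents the same way; the snippet below is a minimal sketch of roulette wheel selection in Python with illustrative names, not our exact implementation.

    import random

    def roulette_select(population, fitnesses):
        """Pick one parent with probability proportional to its fitness.

        Assumes non-negative fitness values; negative scores would need
        to be shifted or clamped before selection.
        """
        total = sum(fitnesses)
        if total <= 0:
            # fall back to a uniform pick if no individual earned fitness
            return random.choice(population)
        pick = random.uniform(0, total)
        running = 0.0
        for individual, fit in zip(population, fitnesses):
            running += fit
            if running >= pick:
                return individual
        return population[-1]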

Part 1: Capturing the Flag

Chromosome length: 173
Population size: 50
Generations run: 168
Average fitness: 2000-2500 in final generations

For the first part of our project, we initially designed a GA with a fitness function that rewarded the ball's proximity to the goal, worth up to 1000 fitness, with a large additional bonus of 5000 for successfully scoring a goal. The rules for capturing the flag took into account whether the ball's trajectory was on track to reach the goal, as well as the ball's distance from the goal and the ship's proximity to the goal so that it would not crash into it. We also created our own map to use for training: an empty box based on the simple.xp map, with an empty goal on the agent's end, a ball on the other end, and no obstacles. Our expert system was based on ones we developed earlier in the semester for combat bots, so we reused those wall avoidance parameters and had the GA learn them alongside the new rules.

We did not have many real difficulties getting the training running, but the learning process was unnecessarily slow. Our GA gave each individual three trials, each allotting the agent three minutes to score as many goals as possible. When training finished, the resulting agent displayed little in the way of "aggression": it would usually idly drift around the map while carrying the ball, only firing its thrusters when it came close to a wall. Despite this, the agent would still actively navigate towards the goal whenever it did thrust, and many later individuals succeeded in scoring several goals. We determined this behavior to be the result of the wall avoidance rules being too restrictive. Around this time, we also spoke to Jim about our progress and future steps to take.

As a result of this discussion, we redesigned the GA and expert system with the goals of training a more aggressive bot more quickly. The new GA ran only one trial for each individual, which ended as soon as the agent successfully captured the ball once. It granted individuals that managed to get the ball into the goal a very high fixed fitness, while awarding the rest fitness based on the ball's proximity to the goal. Additionally, we relaxed the thrust restrictions in the expert system and added rules that make it actively navigate towards the ball and goal so long as doing so would not immediately lead to a crash. While the resulting agents quickly learned how to capture the flag, some individuals would simply crash immediately afterwards while others were capable of avoiding a crash. This presented a mild challenge when selecting a "final" chromosome for future use.
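A minimal sketch of the redesigned fitness calculation is below; the goal bonus and distance scaling are illustrative placeholders rather than our exact values.

    def capture_fitness(scored_goal, ball_to_goal_dist, max_dist=1000.0):
        """Fitness for part 1's redesigned GA (illustrative constants)."""
        GOAL_FITNESS = 5000.0  # placeholder for the "very high fixed fitness"
        if scored_goal:
            return GOAL_FITNESS
        # otherwise reward how close the ball ended up to the goal
        return max(0.0, max_dist - ball_to_goal_dist)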

Difficulties with this stage include:
Conducting only one trial, from one location, for each chromosome. This makes it hard for the ship to cope when it finds itself in a different position later in training; we could address this by running trials in parallel.
If the chosen chromosome misses the goal, it recovers poorly: it learned a very specific path to the goal and will float passively if disturbed.

This video showcases the bot's ability to capture the flag.

Part 2: Shot Avoidance

Chromosome length: 125
Population size: 50
Generations run: 114
Average fitness: 200-300 in final generations

We designed a new expert system for shot avoidance that tracks nearby bullets and attempts to dodge when one is directly in front of or behind the ship, turning and thrusting away. It is built around the shotAlert function in the Xpilot AI library. We also designed another map purpose-built for shot avoidance: a small box with an enemy goal in the middle holding a ball, with an immobile enemy agent also present. The agent is expected to immediately grab the ball and then dodge shots from the enemy while the ball is attached. An individual's fitness is the length of time it is able to survive. The resulting agent successfully learned to dodge several bullets while carrying the ball, using the weight of the ball as a pivot point.
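A simplified sketch of the core dodge rule is shown below. The threshold stands in for one of the GA-evolved parameters, the turn-direction logic is abbreviated, and we treat shotAlert as returning a time-to-impact estimate for the nearest shot (negative when nothing threatens), which is how we used it.

    import libpyAI as ai  # Xpilot-AI Python bindings (module name per our install)

    def dodge_if_threatened(danger_frames):
        """Fire the dodge behavior when a shot is about to hit.

        danger_frames is a GA-evolved threshold; the real expert system
        also picks a turn direction based on whether the shot is ahead
        of or behind the ship, which is omitted here.
        """
        alert = ai.shotAlert(0)
        if 0 <= alert < danger_frames:
            ai.turnLeft(1)   # illustrative: always turning one way
            ai.thrust(1)     # burn away from the incoming shot
            return True
        return False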

Difficulties with this stage include:
Being unable to modify the lethality of the goal. This eliminates potentially good chromosomes because the goal destroys them right away.

This video showcases the bot's ability to avoid shots. However, it also demonstrates a major difficulty of our project: the bot dies to the goal, since we were unable to modify the goal's lethality for this stage of the project.

Part 3: Combining

Chromosome length: 18
Population size: 50
Generations run: 42
Average fitness: 45-70 in final generations

The final step of the project was to combine the two previously-trained expert systems into one. We created a third expert system that takes both sets of variables and the chosen chromosomes from the first two parts into account and determines which expert system and chromosome is active at any given time. A third GA was used to train the antecedents for this expert system, with the goal of learning the ideal conditions for switching between modes based on shotAlert and the ball's distance to the goal. The map for this part is a variation on the map for part 1, the only change being that there are now enemy spawns. An agent that scores a goal receives a fitness of 5000 plus the length of time it survived; an agent that does not receives 1000 minus the ball's distance to the goal.
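A simplified view of the switching layer and the part 3 fitness is sketched below. The cutoff parameters and the exact form of the switching condition are illustrative stand-ins for the evolved antecedents, while the fitness constants come from the description above.

    def choose_mode(alert, ball_to_goal_dist, alert_cutoff, dist_cutoff):
        """Decide which evolved chromosome drives the ship this frame.

        alert_cutoff and dist_cutoff stand in for the GA-evolved
        antecedents; the real rule structure is more involved.
        """
        if 0 <= alert < alert_cutoff and ball_to_goal_dist > dist_cutoff:
            return "avoid_shots"    # run the part 2 expert system
        return "capture_flag"       # run the part 1 expert system

    def combined_fitness(scored_goal, survival_time, ball_to_goal_dist):
        # constants from the fitness description above
        if scored_goal:
            return 5000 + survival_time
        return 1000 - ball_to_goal_dist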

Difficulties in this stage include:
An error in earlier code required us to restart training on this part the night before the due date. More time would be necessary to see better emergent behaviors.
Compounding issues carried over from parts 1 and 2 make the bot very unreliable. Our fitness scores seemed almost completely random because of the bot's unreliability when navigating an unfamiliar location. If we were to continue this project, we would like to find a way to modify the lethality of the goal to improve dodging behaviors, as well as run multiple trials from multiple spawns for the capturing portion of training. We chose to run only one trial from a single spawn to save time, but this turned out to be a poor decision, as it put bots that got lucky on equal footing with bots that were genuinely well trained for the task.

This video showcases the bot's ability to get the ball into the goal while also avoiding bullets. However, it also showcases the issue that the bot is unable to recover and repeat the task, except in very rare cases.

Overall Assessment

While the results of the first two parts of training seemed very good, the third stage did not go so well. This was due to issues in the first two stages that did not become apparent until the third. In part 1, we gave each individual too few trials from unvaried spawn locations in an attempt to save time, which often let lucky individuals dominate the population even though they could not repeat their feat. In part 2, we would have liked to make the goal non-lethal to give each individual a better chance of demonstrating its ability. Despite these drawbacks and an unsatisfying, mostly unsuccessful part 3, we believe that if these issues were fixed, the agent would be able to learn to reliably capture the flag every time while also dodging bullets.