Reinforcement Learning Competition

Total Award:

  • Time to register:

    07/14/2020 - 08/05/2020
  • Time to upload source code:

    08/08/2020 - 09/26/2020
In Vietnam for the first time!


In this game, you control a character moving across a map represented as a two-dimensional matrix, overcoming obstacles and digging for as much gold as possible. Pay close attention to the obstacles on the map and watch your energy bar to come up with the right tactics!


You are a treasure hunter walking into the forest to search for gold mines buried deep underground. Don't hesitate, because other hunters are also rushing to find the treasure. The sacred forest is full of dangerous traps, but don't be discouraged: be the winner and be proud of yourself!


You need to dig for as much gold as possible using the least amount of energy. There are many traps and obstacles in the forest, so move wisely and use the right command at the right time.


  • n treasure hunters (1 ≤ n ≤ 4) start at the same point in the forest.
  • The forest map is represented as a w*h two-dimensional matrix. (w=21, h =9)
  • The original amount of energy of each hunter is 50.
  • The game has 100 turns and for each turn, the player has to perform one of the following actions.
    GoLeft    Go to the left ←
    GoRight   Go to the right →
    GoUp      Go up ↑
    GoDown    Go down ↓
    Free      Take a rest
    Craft     Dig for gold
  • After each turn, the system will send the actions performed by all players and update each player's gold and energy.
  • To win, players must pass the obstacles and dig for the most amount of gold.


Going to one of the following cells or digging for gold consumes the following amount of energy:


-1 energy


-4 energy

Dig for gold

-5 energy


A random value in the range [-5, -20] energy


-5 -> -20 -> -40 -> -100 energy. The amount of energy needed increases with the number of times the treasure hunters enter the swamp cell. The first entry costs 5 energy, the second 20, and so on (entries are counted by turns).


-10 energy. Each trap can be used only once (in a single turn). If more than one player goes into an unused trap cell in the same turn, each of them loses 10 energy. After being used, the trap turns into land.


In order to regain energy, the adventurers must take time to rest:

Number of turns               Energy regained
1 turn                        [E/4]
2 consecutive turns           [E/4] + [E/3]
3 consecutive turns           [E/4] + [E/3] + [E/2]
In case energy exceeds E      No regain

Note: the result is rounded down to the nearest integer
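
The regain table can be sketched in Python as below. This is a minimal illustration, not the organizer's implementation: it assumes E is the hunter's initial energy (50, per the rules above) and that each bracketed term is floored separately. The function name `energy_regained` is hypothetical.

```python
def energy_regained(consecutive_rest_turns, E=50):
    """Total energy regained after 1, 2 or 3 consecutive rest turns.

    Assumption: E is the initial energy of a hunter (50 in the rules).
    """
    # Divisors taken from the table: [E/4], then + [E/3], then + [E/2].
    divisors = [4, 3, 2]
    if not 1 <= consecutive_rest_turns <= len(divisors):
        raise ValueError("the table only covers 1 to 3 consecutive rest turns")
    # Each term is rounded down to the nearest integer, as the note states.
    return sum(E // d for d in divisors[:consecutive_rest_turns])
```

With E = 50 the terms are 12, 16 and 25, so three consecutive rest turns total 53. The "no regain" case once energy exceeds E is not modelled here.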


Assume that the amount of gold in the mine is G and that m people are digging for gold. The amount of gold each person collects is calculated as in the table below:

Gold         Gold each person gains   Remaining gold                         Energy consumed
G >= m*50    50                       G - m*50                               5
G < m*50     ceil(G/m)                0 (this gold mine turns into land)     5
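
As a sketch of the table above, the split could be computed as follows. The function name `dig_result` and its return shape are hypothetical; only the two formulas come from the rules.

```python
def dig_result(G, m, per_dig=50):
    """Return (gold_per_digger, remaining_gold) when m hunters dig a mine holding G gold."""
    if m < 1:
        raise ValueError("at least one hunter must be digging")
    if G >= m * per_dig:
        # Enough gold for everyone: each digger takes the full 50.
        return per_dig, G - m * per_dig
    # Not enough gold: each digger gets ceil(G/m) and the mine is exhausted.
    # -(-G // m) is integer ceiling division, avoiding floating point.
    return -(-G // m), 0
```

For example, two hunters digging a mine with 70 gold receive 35 each and the mine becomes land.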


    • Not standing at a gold mine but digging for gold: -10
    • No action is taken at the current turn: Eliminated
    • Player's energy <= 0: Eliminated


  • If only one player is left because the other players have “died” (run out of energy or moved off the map), that player is the winner.
  • If the mines run out of gold before T turns have elapsed, or once T turns have elapsed:
    • The player with the highest amount of gold will be the winner.
    • If there is more than one player with the same amount of gold and that amount is the highest, the player who has the most energy is the winner.
    • If there are several players with the same amount of gold and energy and those amounts are the highest, the system will randomly select a winner.
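
The tie-breaking rules above can be sketched as below. This is only an illustration under assumptions: the input is a list of the players still eligible to win, each a dict with hypothetical keys `playerId`, `gold` and `energy`, and the last-survivor case is handled elsewhere.

```python
import random

def pick_winner(players, rng=random):
    """Pick a winner by most gold, then most energy, then at random."""
    # Tuples compare element by element, so this finds the highest gold
    # and, among those, the highest energy.
    best = max((p["gold"], p["energy"]) for p in players)
    tied = [p for p in players if (p["gold"], p["energy"]) == best]
    # Any remaining tie is broken randomly.
    return rng.choice(tied)["playerId"]
```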


  • Operating System: Ubuntu 18.04.4 LTS (4 vCPUs, 2.5 GHz, No GPU, 2 GiB memory)
  • Languages: Python3 (ver. 3.6.9)
  • Code Size + Library: < 30MB (players must use the available libraries of the coding language or embed the libraries into their source code so that it can be run on the server.)
  • Tensorflow 1.14.0 and 2.2.0
  • Keras 2.3.1
  • Numpy 1.18.4
  • Pandas 0.15
  • PyTorch 1.5.0
  • joblib 0.16.0
  • ray 0.8.6 (ray[rllib], ray[tune])
  • requests 2.24.0
  • semver 2.10.2 
  • tf-agents 0.3.0 (0.5.0 on tensorflow 2.2.0)
  • Pyqlearning v1.2.4
  • Mushroom-RL v1.4.0
  • gym 0.17.2
  • opencv-python
  • prettytable 0.7.2
  • yacs 0.1.7


  • The structure is as below:

    • Your project needs 2 folders src and build
    • Folder src includes the source code and the file build.sh. When build.sh is run, the compiled source code is placed in folder build. Note that the source code is considered successfully built only if no message (including warnings) is returned when executing build.sh. The run command is python3.
    • Folder build needs file run.sh
    • Your project will be run when calling file run.sh
  • How to upload source code:
    • When your client connects successfully to server and gets at least 50 points, it is considered satisfactory and the status is Verified
    • Compress the project into a .zip file with any name and upload the file to system
    • After uploading the code, the system will return one of the following status: Building, Build Failed, Verifying, Failed, Verified
      • Build Failed, Failed: the system will allow users to download the log file.
      • Verifying, Failed, Verified: users will be able to view the game play screen.


Server and player communicate via socket. The socket information (host, port) is passed as input parameters when file run.sh is executed.
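
A minimal sketch of how a Python client might pick up the socket information is shown below. The exact form of the parameters is not specified here, so this assumes run.sh forwards the host and port as the first two positional arguments; the function names are hypothetical.

```python
import socket

def parse_endpoint(argv):
    """Return (host, port) from a command line such as ['client.py', 'localhost', '1111'].

    Assumption: host and port arrive as the first two positional arguments.
    """
    if len(argv) < 3:
        raise ValueError("expected: <script> <host> <port>")
    return argv[1], int(argv[2])

def connect(host, port, timeout=60.0):
    """Open a TCP connection to the game server."""
    # create_connection resolves the host and returns a connected socket.
    return socket.create_connection((host, port), timeout=timeout)
```

A client entry point would then call `parse_endpoint(sys.argv)` and pass the result to `connect`.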

  • Messages between player and server:
    • When the player connects to the server successfully, the server will send a message to the player with the following content:
          {
              "playerId": int,
              "posx": int,
              "posy": int,
              "energy": int,
              "gameinfo": {
                  "numberOfPlayers": int,
                  "width": int,
                  "height": int,
                  "steps": int,
                  "golds": [
                      {
                          "posx": int,
                          "posy": int,
                          "amount": int
                      }
                  ],
                  "obstacles": [
                      {
                          "type": int, //0: land, 1: forest, 2: trap, 3: swamp
                          "posx": int,
                          "posy": int,
                          "value": int
                      }
                  ]
              }
          }

Player information includes:
- playerId: the player's ID
- (posx, posy): the player's initial coordinates
- energy: the player's initial energy

Game information includes:
- numberOfPlayers: the number of players participating in the game
- (width, height): size of the map
- steps: the maximum number of steps (turns) in the game 
- golds: information about the gold mines, including coordinates (posx, posy) and initial amount of gold (amount)
- obstacles: information about the obstacles along the way, including coordinates (posx, posy), type of obstacle (type), and value of the obstacle (value <= 0; players will lose (-value) energy if they move into the cell). If the obstacle's type is forest (type = 1), value is 0.

Each type of obstacle is represented by a different integer, as follows: 0: land, 1: forest, 2: trap, 3: swamp
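
As an illustration of the fields described above, the initial message could be decoded as follows. This sketch assumes the message arrives as JSON text and that the gold and obstacle lists sit under the keys `golds` and `obstacles` inside `gameinfo`; the function name and the shape of the returned dict are hypothetical.

```python
import json

def parse_init_message(text):
    """Decode the initial server message into plain Python structures."""
    msg = json.loads(text)
    info = msg["gameinfo"]
    return {
        "player_id": msg["playerId"],
        "position": (msg["posx"], msg["posy"]),
        "energy": msg["energy"],
        "size": (info["width"], info["height"]),
        "steps": info["steps"],
        # Gold mines keyed by (posx, posy), mapped to the amount of gold.
        "golds": {(g["posx"], g["posy"]): g["amount"] for g in info["golds"]},
        # Obstacles keyed by (posx, posy), mapped to (type, value).
        "obstacles": {
            (o["posx"], o["posy"]): (o["type"], o["value"])
            for o in info["obstacles"]
        },
    }
```

Keying the map by coordinates makes it cheap to look up the energy cost of a cell before moving into it.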

  • Each turn, the player sends one of the following commands: "0", "1", "2", "3", "4", "5". The commands represent the following actions:
     "0"  go left
     "1"  go right
     "2"  go up
     "3"  go down
     "4"  take a rest
     "5"  dig for gold
    Any other command causes the player to be eliminated.
  • After the player sends an action, the server responds with the new status of the map as follows:
            {
                "players": [
                    {
                        "playerId": int,
                        "posx": int,
                        "posy": int,
                        "score": int,
                        "energy": int,
                        "lastAction": int, //0: go left, 1: go right, 2: go up, 3: go down, 4: free, 5: craft, 6: eliminated (in case action = null)
                        "status": int
                    }
                ],
                "golds": [
                    {
                        "posx": int,
                        "posy": int,
                        "amount": int
                    }
                ],
                "changedObstacles": [
                    {
                        "posx": int,
                        "posy": int,
                        "type": int,
                        "value": int
                    }
                ]
            }

players: the status of all players in the match, including:
- playerId: the player's ID
- (posx, posy): the current coordinates of the player
- score: the player's score (amount of gold mined)
- energy: the remaining energy of the player
- lastAction: the player's action at the previous step (lastAction = 6 means the player is eliminated)
- status: the player's status (playing, eliminated, or stop, i.e. end of game)

golds: the current information of golds on the map

changedObstacles: list of obstacles which have been changed by the previous action (gold -> land, trap -> land, or new value for swamp)
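
A client might fold each state message into its local view of the map roughly as sketched below. This assumes the message is JSON text whose top-level keys are `players`, `golds` and `changedObstacles`; the key `players` in particular is inferred from the description above, and the function name is hypothetical.

```python
import json

def apply_state(text, my_id, obstacles):
    """Return (my_player, golds, obstacles) after applying one state message.

    obstacles maps (posx, posy) to (type, value); a new dict is returned
    so the caller's copy is left untouched.
    """
    state = json.loads(text)
    # Find this client's own entry in the list of players (None if absent).
    me = next((p for p in state["players"] if p["playerId"] == my_id), None)
    # The message carries the full current list of gold mines.
    golds = {(g["posx"], g["posy"]): g["amount"] for g in state["golds"]}
    # Only changed obstacles are sent, so patch them into the existing map.
    updated = dict(obstacles)
    for o in state["changedObstacles"]:
        updated[(o["posx"], o["posy"])] = (o["type"], o["value"])
    return me, golds, updated
```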

The value of lastAction is one of the following:

 "0"  go left
 "1"  go right
 "2"  go up
 "3"  go down
 "4"  free
 "5"  craft
 "6"  eliminated

The value of status is one of the following:

 "0"  playing
 "1"  eliminated (went out of map)
 "2"  eliminated (empty energy)
 "3"  eliminated (wrong action)
 "4"  stop (empty golds)
 "5"  stop (no more steps)
  • If a player is eliminated, the server will send a last message to the player and then disconnect the player.
  • When the game is over, the server will disconnect the players.
  • How to calculate the coordinates of the map:
    • (0,0) is the top-left point of the map.
    • (posx, posy) is calculated from (0,0), where posx is the posx-th column and posy is the posy-th row on the map.
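
The coordinate convention and the movement commands can be combined as in the sketch below. It assumes that posy grows downward (since (0,0) is the top-left point), so that "go up" decreases posy, and that coordinates are 0-indexed. The names `MOVES`, `next_position` and `in_map` are hypothetical.

```python
# Offsets (dx, dy) for the movement commands.
# Assumption: posy grows downward because (0,0) is the top-left corner.
MOVES = {
    "0": (-1, 0),  # go left
    "1": (1, 0),   # go right
    "2": (0, -1),  # go up
    "3": (0, 1),   # go down
}

def next_position(posx, posy, command):
    """Return the cell reached by a command; rest and dig leave it unchanged."""
    dx, dy = MOVES.get(command, (0, 0))
    return posx + dx, posy + dy

def in_map(posx, posy, width=21, height=9):
    """True if (posx, posy) lies inside a width x height map."""
    return 0 <= posx < width and 0 <= posy < height
```

Checking `in_map(*next_position(x, y, cmd))` before sending a command is one way to avoid walking off the map.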


  • The time allowed to connect to the server: 1 minute. That is, if the server does not receive a connection from the player within 1 minute of the project being run (run file run.sh), the connection is considered failed and the player will be eliminated from the competition.
  • Waiting time between steps: 1000 ms. This means that within 1000 ms from the moment the server sends the status of the game to the player, if the server does not receive action from the player, the player will be eliminated from the match. The connection between the player and the server will be disconnected.


  • Right after receiving the .zip file, the server will compile the code (run file build.sh) and run the project (run file run.sh) to ensure that the source structure is correct and the connection to the server socket is successful. Note that at this stage the server will neither score nor evaluate whether the message content that the player sent to the server is right or wrong.

  • Sample source code:

  Download sample source code: sample-source

  Environment settings guide: MInerAI-Set-up-Environment.pdf 

  Local training guide: MinerAI-CodeAISample_en.pdf

  Version Date: 2020/09/08 10:26

  Change Log: Change logs.docx

  (You can also use Git to clone source code and documents from following repository: https://github.com/xphongvn/rlcomp2020)


Teams are allowed to upload an unlimited number of submissions to the server system, but can only play a maximum of 20 turns with the Organizer's Bot.

The turn with the Bot will be automatically activated after the code is verified successfully. Each turn has 5 matches. 
