Molecules can be represented as graphs in which nodes correspond to atoms and edges correspond to bonds. Each node is assigned a feature vector describing its atom, for example the atom type (e.g. carbon) or the number of bonds it forms (e.g. 3). However, the contribution of these individual features to property prediction performance has received little systematic study. Often, all features provided by a chemical software package are used without much consideration of the risk of overfitting or the presence of redundant features.
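To make the graph abstraction concrete, the following is a minimal sketch of how a small molecule might be turned into node and edge feature vectors. The atom and bond vocabularies, the `MolGraph` container, and the `build_graph` helper are hypothetical and chosen only for illustration; real pipelines usually obtain atoms and bonds from a cheminformatics toolkit and use far richer vocabularies. Hydrogens are assumed to be implicit, so the "number of bonds" feature here counts only bonds to other explicit (heavy) atoms.

```python
from dataclasses import dataclass

# Hypothetical, deliberately small vocabularies for illustration only.
ATOM_TYPES = ["C", "N", "O"]
BOND_TYPES = ["single", "double", "triple", "aromatic"]


def one_hot(value, choices):
    """Return a one-hot list encoding `value` over `choices`."""
    if value not in choices:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in choices]


@dataclass
class MolGraph:
    node_features: list[list[int]]  # one row per atom
    edges: list[tuple[int, int]]    # undirected bonds as (atom_i, atom_j)
    edge_features: list[list[int]]  # one row per bond


def build_graph(atoms, bonds):
    """atoms: element symbols; bonds: (i, j, bond_type) triples."""
    # Count explicit bonds per atom (implicit hydrogens are not counted).
    degree = [0] * len(atoms)
    for i, j, _ in bonds:
        degree[i] += 1
        degree[j] += 1
    # Node feature = one-hot atom type followed by the bond count.
    node_features = [one_hot(a, ATOM_TYPES) + [degree[k]] for k, a in enumerate(atoms)]
    edges = [(i, j) for i, j, _ in bonds]
    # Edge feature = one-hot bond type.
    edge_features = [one_hot(t, BOND_TYPES) for _, _, t in bonds]
    return MolGraph(node_features, edges, edge_features)


# Ethanol (CH3-CH2-OH) with hydrogens left implicit.
ethanol = build_graph(["C", "C", "O"], [(0, 1, "single"), (1, 2, "single")])
print(ethanol.node_features)  # [[1, 0, 0, 1], [1, 0, 0, 2], [0, 0, 1, 1]]
```

Each additional atom property (formal charge, aromaticity, hybridization, and so on) would append further columns to every node row, which is exactly where the redundancy and overfitting concerns above come from.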
In this project, different sets of node and edge features, selected on the basis of chemical reasoning, will be investigated, with the goal of establishing, with justification, which information is required to achieve optimal predictive performance.
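One way to organize such an investigation is a feature-group ablation: features are grouped into named blocks of columns, and a model is trained and evaluated once per combination of blocks. The sketch below only enumerates the combinations and slices the corresponding columns; it assumes a hypothetical column layout (`FEATURE_GROUPS`) and keeps atom type in every subset, and it leaves out the model training and evaluation step entirely.

```python
from itertools import combinations

# Hypothetical layout: each named group occupies a contiguous block of columns,
# in this order, within every node feature row.
FEATURE_GROUPS = {
    "atom_type": 3,      # one-hot over C, N, O
    "degree": 1,
    "formal_charge": 1,
    "aromatic": 1,
}


def column_ranges(groups):
    """Map each group name to its (start, stop) column range."""
    ranges, start = {}, 0
    for name, width in groups.items():
        ranges[name] = (start, start + width)
        start += width
    return ranges


def select_columns(rows, subset, ranges):
    """Keep only the columns belonging to the groups named in `subset`."""
    cols = [c for name in subset for c in range(*ranges[name])]
    return [[row[c] for c in cols] for row in rows]


def ablation_subsets(groups, always_keep=("atom_type",)):
    """Yield every combination of groups that contains `always_keep`."""
    optional = [g for g in groups if g not in always_keep]
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            yield tuple(always_keep) + extra


ranges = column_ranges(FEATURE_GROUPS)
row = [1, 0, 0, 2, 0, 0]  # a carbon with two bonds, neutral, non-aromatic
subsets = list(ablation_subsets(FEATURE_GROUPS))
print(len(subsets))                                        # 8
print(select_columns([row], ("atom_type", "degree"), ranges))  # [[1, 0, 0, 2]]
```

Exhaustive enumeration grows as 2^n in the number of optional groups, so with many groups a greedy forward or backward selection guided by chemical reasoning may be more practical. The same slicing applies unchanged to edge feature rows.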