Tech Thoughts

Deep Learning for Solving Math Problems

07.02.2023


Tensor-Product multi-head attention

This post presents a university project developed for our Deep Learning course exam, in which we explored how deep learning models can be applied to answering mathematical questions. Our work started by analyzing the Mathematics Dataset introduced in prior research by DeepMind, which helped us understand the challenges and guided our experimental setup.


To establish meaningful comparisons, we implemented three different models:

- an LSTM-based sequence-to-sequence baseline,
- a standard Transformer, and
- a TP-Transformer, a Transformer variant whose multi-head attention is extended with tensor-product role binding.

The goal was to evaluate how sequence-to-sequence architectures built on different design strategies perform on mathematical reasoning tasks.
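Since the TP-Transformer is the least familiar of the three, here is a minimal PyTorch sketch of its distinguishing component: multi-head attention in which each head's attention output (the "filler") is bound to a learned role vector via an elementwise product, a diagonal approximation of the full tensor product. All names and dimensions are illustrative, not the exact configuration we used.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class TPMultiHeadAttention(nn.Module):
    """Multi-head self-attention with tensor-product (Hadamard) role binding,
    in the spirit of the TP-Transformer. Sizes are illustrative."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Standard query/key/value projections plus an extra role projection.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_r = nn.Linear(d_model, d_model)  # role vectors (TP-specific)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape

        # Project and split into heads: (batch, heads, time, d_head).
        def split(z):
            return z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v, r = map(split, (self.w_q(x), self.w_k(x),
                                 self.w_v(x), self.w_r(x)))
        # Scaled dot-product attention yields the "filler" per position.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        filler = F.softmax(scores, dim=-1) @ v
        # Bind each filler to its role via elementwise product.
        bound = filler * r
        out = bound.transpose(1, 2).contiguous().view(b, t, -1)
        return self.w_o(out)

# Usage: one attention layer over a toy batch of embedded sequences.
attn = TPMultiHeadAttention(d_model=512, n_heads=8)
y = attn(torch.randn(2, 10, 512))  # -> shape (2, 10, 512)
```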


Due to hardware limitations (we worked with the free version of Google Colab), we could not use the full dataset or run extensive training. Instead, we selected three modules ("numbers round," "calculus differentiate," and "polynomials evaluate") totaling around 6 million samples. This setup forced the models to learn multiple types of problems without specializing too narrowly.
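For reference, the released version of the Mathematics Dataset ships each module as a plain-text file in which questions and answers alternate line by line. Below is a minimal loading sketch; the file paths follow the dataset's module naming scheme but are assumptions about our exact setup, as is the character-level encoding.

```python
from pathlib import Path

# Assumed paths to three pre-generated module files from the
# DeepMind Mathematics Dataset (questions and answers alternate per line).
MODULE_FILES = [
    "train-easy/numbers__round_number.txt",
    "train-easy/calculus__differentiate.txt",
    "train-easy/polynomials__evaluate.txt",
]

def load_pairs(path):
    """Return (question, answer) pairs from one module file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return list(zip(lines[0::2], lines[1::2]))

# Pool all modules so the model sees several problem types during training.
pairs = [p for f in MODULE_FILES for p in load_pairs(f)]

# Character-level vocabulary, a common choice for this dataset
# (index 0 is reserved for padding).
chars = sorted({c for q, a in pairs for c in q + a})
stoi = {c: i + 1 for i, c in enumerate(chars)}

def encode(text):
    """Map a question or answer string to a list of integer token ids."""
    return [stoi[c] for c in text]
```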


Despite the limited resources, our experiments confirmed some key findings from prior research:

- the attention-based models handled these tasks markedly better than the recurrent LSTM baseline, and
- the TP-Transformer was the most promising of the three architectures, even under our reduced training budget.
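For context, performance on this dataset is typically reported as exact-match accuracy: a predicted answer counts as correct only if it reproduces the reference answer character for character. A minimal scoring helper might look like this (illustrative):

```python
def exact_match_accuracy(predictions, references):
    """Fraction of answers reproduced exactly, character for character."""
    assert len(predictions) == len(references)
    hits = sum(p == r for p, r in zip(predictions, references))
    return hits / len(references)

# A near-miss ("12" vs "21") still counts as wrong: accuracy is 0.5.
print(exact_match_accuracy(["-3*x**2", "12"], ["-3*x**2", "21"]))
```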

Although constrained by training time and compute power, this project clearly showed the potential of modern Transformer architectures for structured reasoning tasks like mathematics. We are confident that with more resources, the models, especially the TP-Transformer, would have achieved even stronger results.


Check out the GitHub repo for the full PDF report and the Colab notebook with the complete implementation.

Tags: Deep Learning, Mathematics Dataset, Transformer, LSTM, Tensor-Product Transformer, Sequence-to-Sequence Models, Machine Learning