AI-enabled Document Classification for Document Management System

ABTO Software

Abto Software built an AI document classification service for a construction management system.

Overview of the Client’s Document Management System

Our client is a European provider of project and document management tools used in construction and engineering. Its comprehensive cloud-based solution is available on both mobile and desktop platforms and is used mostly by architects, engineers, real estate developers, housebuilders, and contractors.

The company has more than two decades of experience providing digital solutions for the construction industry. It strives to keep up with the times and offer its customers the most efficient construction management solutions. It approached Abto Software to implement an automated document classification service for its construction document management system (DMS).

AI Document Classification

Our solutions range from customer support automation for FinTech to demand forecasting for retail. This time, we had to implement an AI service for automatic document classification within a construction DMS.

Our client wanted to solve the main problem faced by its end users: the annoying and time-consuming manual entry of document details into the DMS. Because the task is so tedious, users often skipped it, which caused many malfunctions in other modules of the document management system. We aimed to automate the classification process and ensure a smooth user journey.

Benefits of the Delivered Document Classification Solution

  • Smooth user journey. Our solution automates the tedious and often frustrating manual entry of document details into the DMS.
  • Extensive document support. The delivered document classification solution supports multipage documents, image-only PDF files, and other documents without machine-readable text.
  • Robust multilabel classification. The solution classifies each document across three different labels, with 31 industry-specific classes in total.
  • High classification accuracy. We have achieved 96% classification accuracy on the document level and 98% classification accuracy on the label level.
  • Enhanced accessibility. Because the document classification service had to be integrated into the client’s document management system, we deployed it on the AWS cloud.
  • Reliable scalability. Cloud deployment of the developed APIs allows our client to scale the solution using Amazon scaling services.
  • GDPR compliance. The selected development and deployment methods ensure the highest security and customer data protection standards.
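
The two accuracy figures in the list above measure different things. The case study does not define them, so the sketch below shows one plausible reading: label-level accuracy counts every (document, label) pair separately, while document-level accuracy counts a document as correct only when all of its labels are correct. The label names (`discipline`, `doc_type`, `phase`), the class names, and the sample data are hypothetical and are not taken from the client's system.

```python
from typing import Dict, List

# One document's labels: label name -> class (hypothetical names).
Labels = Dict[str, str]


def label_level_accuracy(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Share of individual (document, label) pairs predicted correctly."""
    total = 0
    correct = 0
    for truth, pred in zip(y_true, y_pred):
        for name, value in truth.items():
            total += 1
            if pred.get(name) == value:
                correct += 1
    return correct / total if total else 0.0


def document_level_accuracy(y_true: List[Labels], y_pred: List[Labels]) -> float:
    """Share of documents for which every label is predicted correctly."""
    if not y_true:
        return 0.0
    hits = 0
    for truth, pred in zip(y_true, y_pred):
        if all(pred.get(name) == value for name, value in truth.items()):
            hits += 1
    return hits / len(y_true)


# Hypothetical ground truth: four documents, three labels each.
y_true = [
    {"discipline": "structural", "doc_type": "drawing", "phase": "design"},
    {"discipline": "electrical", "doc_type": "specification", "phase": "design"},
    {"discipline": "structural", "doc_type": "report", "phase": "construction"},
    {"discipline": "electrical", "doc_type": "drawing", "phase": "construction"},
]

# Hypothetical predictions: only the third document has one wrong label.
y_pred = [dict(doc) for doc in y_true]
y_pred[2]["doc_type"] = "drawing"

print(f"label-level:    {label_level_accuracy(y_true, y_pred):.3f}")     # 0.917
print(f"document-level: {document_level_accuracy(y_true, y_pred):.3f}")  # 0.750
```

Under this reading, document-level accuracy can never exceed label-level accuracy, because a single wrong label makes the whole document count as wrong. That is consistent with the reported 96% and 98%.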

Team & Technologies

  • Team: project manager, solution architect, 2 data scientists, Python developer, DevOps engineer
  • Project duration: 2.5 months
  • Tech stack and data science tools: Python, scikit-learn, Tesseract OCR, Amazon Web Services (AWS)
  • Investigated text vectorization methods: Word2vec, fastText, GloVe, TF-IDF, Universal Sentence Encoder, BERT
  • Investigated classification algorithms: LSTM, GRU, RNN, Bidirectional RNN, SVM, KNN, XGBoost, AdaBoost, Logistic Regression, Decision Trees, Naïve Bayes methods (Gaussian Naïve Bayes, Multinomial Naïve Bayes, Categorical Naïve Bayes)
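
To illustrate how two of the investigated building blocks fit together, here is a minimal scikit-learn sketch that pairs TF-IDF vectorization with Logistic Regression and trains one classifier per label. The case study does not say which vectorizer and algorithm were chosen for the delivered service, so this combination is an assumption made for illustration only. The toy corpus, the label names, and the helper functions `train_per_label` and `classify` are hypothetical; in practice, the input text would come from an OCR step such as Tesseract.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for OCR output.
texts = [
    "structural drawing foundation plan reinforcement",
    "structural drawing beam layout steel sections",
    "electrical specification cable schedule voltage",
    "electrical specification lighting circuits wiring",
]

# Hypothetical labels; two are used here to keep the example short.
labels = {
    "discipline": ["structural", "structural", "electrical", "electrical"],
    "doc_type": ["drawing", "drawing", "specification", "specification"],
}


def train_per_label(texts, labels):
    """Fit one TF-IDF + Logistic Regression pipeline for each label."""
    models = {}
    for name, y in labels.items():
        model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
        model.fit(texts, y)
        models[name] = model
    return models


def classify(models, text):
    """Predict one class per label for a single document's text."""
    return {name: str(model.predict([text])[0]) for name, model in models.items()}


models = train_per_label(texts, labels)
prediction = classify(models, "steel beam reinforcement drawing")
print(prediction)
```

Training an independent model per label keeps each classifier simple and lets the labels use different class sets. A single multi-output model is an alternative design that this sketch does not cover.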