
LLMs in izDOX AI Platform

Project associated with:

bizAmica

Context

izDOX is an AI platform developed by bizAmica for document processing and extraction. Originally built using deep learning models, the platform required enhancements to improve the accuracy and comprehensiveness of data extraction from financial and logistics documents. By leveraging Large Language Models (LLMs), the document processing pipelines were optimized, resulting in a 25% improvement in information extraction accuracy while requiring less training data.

Requirements

Enhanced Data Extraction:

  1. Improve the accuracy and comprehensiveness of data extraction from various document types, specifically financial and logistics documents.


Reduction in Training Data:

  1. Leverage LLMs to reduce the amount of training data required compared to traditional deep learning models.


Seamless Integration:

  1. Ensure that the new LLM-based pipelines integrate smoothly with the existing document processing infrastructure.


Versatile Processing Capabilities:

  1. Develop pipelines capable of handling both text and vision-based document processing tasks.


Scalability and Performance:

  1. Maintain scalability and performance efficiency with the new LLM-based enhancements.

Approach

Requirement Analysis:

  1. Collaborating with stakeholders to understand the specific needs and expectations for the new pipelines.

  2. Identifying key areas where traditional models underperformed and where LLMs could provide significant improvements.


Design and Architecture:

  1. Designing two new pipelines: one for text extraction using LLMs and another for vision-based document processing.

  2. Planning the integration of these pipelines with the existing document processing architecture.


Implementation:

  1. Text Extraction Pipeline:
    Utilizing LLMs to extract text from documents.
    Parsing the extracted text into a structured format compatible with the current pipelines.

  2. Vision-based Pipeline:
    Implementing vision LLMs to process document images.
    Parsing the structured output for seamless integration with existing systems.
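
The parsing step in both pipelines can be illustrated with a minimal sketch. It assumes the LLM is prompted to reply with a JSON object and that the downstream pipeline expects a fixed record type. The `InvoiceRecord` schema, its field names, and the `parse_llm_output` helper are hypothetical and chosen for illustration only; the actual izDOX schema is not described here.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InvoiceRecord:
    # Hypothetical target schema, not the real izDOX format.
    invoice_number: Optional[str] = None
    total_amount: Optional[float] = None
    currency: Optional[str] = None


def parse_llm_output(raw: str) -> InvoiceRecord:
    """Pull the JSON object out of an LLM reply and map it onto the schema.

    Returns an empty record when no valid JSON object is found, so the
    caller can route the document to a fallback or manual review.
    """
    # LLM replies often wrap the JSON in extra prose, so search for the braces.
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)
    if match is None:
        return InvoiceRecord()
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return InvoiceRecord()

    # Amounts frequently arrive as strings with thousands separators.
    amount = data.get("total_amount")
    try:
        amount = float(str(amount).replace(",", "")) if amount is not None else None
    except ValueError:
        amount = None

    return InvoiceRecord(
        invoice_number=data.get("invoice_number"),
        total_amount=amount,
        currency=data.get("currency"),
    )
```

Returning an empty record instead of raising keeps a malformed reply from stopping a batch; a stricter pipeline might prefer to raise and retry the request.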


Testing and Validation:

  1. Conducting extensive testing to ensure that the new pipelines accurately and comprehensively extract data from financial and logistics documents.

  2. Validating the reduction in training data requirements and overall performance improvements.
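
One common way to quantify extraction accuracy during validation is field-level exact match against hand-labelled reference records. The sketch below assumes that approach; the metric actually used for izDOX is not stated, and the `field_accuracy` function and the sample field names are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def _normalize(value: Optional[str]) -> str:
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return "" if value is None else str(value).strip().lower()


def field_accuracy(predictions: List[Record], references: List[Record]) -> float:
    """Fraction of reference fields whose predicted value matches exactly."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    correct = 0
    total = 0
    for pred, ref in zip(predictions, references):
        for field, expected in ref.items():
            total += 1
            if _normalize(pred.get(field)) == _normalize(expected):
                correct += 1
    return correct / total if total else 0.0


references = [
    {"invoice_number": "INV-001", "currency": "USD"},
    {"invoice_number": "INV-002", "currency": "EUR"},
]
predictions = [
    {"invoice_number": "inv-001 ", "currency": "USD"},
    {"invoice_number": "INV-002", "currency": "GBP"},
]
score = field_accuracy(predictions, references)  # 3 of 4 fields match
```

Running the same function over the outputs of the earlier deep learning models and the LLM-based pipelines, on the same labelled set, gives a like-for-like comparison.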


Deployment and Monitoring:

  1. Deploying the new LLM-based pipelines within the izDOX AI Platform.

  2. Implementing monitoring and logging to track performance and accuracy, and to identify any issues for prompt resolution.
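
As a rough illustration of the monitoring step, the sketch below wraps an extraction call with Python's standard `logging` module to record latency, empty fields, and failures. The logger name, the `run_with_monitoring` helper, and the log message layout are assumptions; the monitoring stack used in izDOX is not specified.

```python
import logging
import time
from typing import Any, Callable, Dict

# Hypothetical logger name for illustration.
logger = logging.getLogger("extraction_pipeline")


def run_with_monitoring(
    extract: Callable[[str], Dict[str, Any]],
    document_id: str,
    text: str,
) -> Dict[str, Any]:
    """Run an extraction function and log its latency and empty fields."""
    start = time.perf_counter()
    try:
        result = extract(text)
    except Exception:
        # Log the stack trace, then let the caller decide how to recover.
        logger.exception("extraction failed document_id=%s", document_id)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    empty_fields = [key for key, value in result.items() if value in (None, "")]
    logger.info(
        "extraction ok document_id=%s latency_ms=%.1f empty_fields=%s",
        document_id,
        elapsed_ms,
        empty_fields,
    )
    return result
```

A rising count of empty fields for a given document type is a simple early signal that extraction quality has degraded and needs attention.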

Technologies Used

Large Language Models (LLMs):

  • Languages: Python

  • Libraries: Hugging Face Transformers, LangChain

  • Cloud Platforms: AWS


Deep Learning Models:

  • Languages: Python

  • Libraries: TensorFlow, PyTorch


Text Processing Pipelines:

  • Languages: Python

  • Libraries: Hugging Face Transformers, spaCy

  • Cloud Platforms: AWS (Textract), Azure
