Retrieval-Augmented Generation (RAG) depends heavily on how documents are stored, indexed, and retrieved. For high-performance customer support agents, scanning and ingestion speed directly determine how quickly your bots learn from updated software manuals, FAQs, and enterprise knowledge-base (KB) articles.
The Bottleneck of Traditional Document Parsing
Traditional PDF/HTML ingestion pipelines load entire files, slice them naively into segments, and send those segments to an embedding endpoint one at a time. This sequential approach can take minutes for moderately sized databases and breaks down entirely on enterprise datasets with more than 10,000 files.
Hierarchical Chunking & Thread-Safe Parallel Parsing
Our engineering team solved the ingestion bottleneck by designing a highly concurrent hierarchical parsing architecture. When a file is uploaded to CustomerGPT:
- The pipeline divides each document into logical, semantic sections based on its heading hierarchy (H1, H2, and H3 tags) and file layout.
- Text is split into dense, overlapping chunks (typically 800 tokens, with a 150-token overlap between consecutive chunks) to preserve structural context.
- Embeddings are generated concurrently by thread pools that send batched requests to high-performance remote embedding models.
- Vectors are stored in a multi-tenant PostgreSQL database, where pgvector indexes accelerate cosine-distance search.
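The heading-based sectioning step can be sketched with Python's standard-library `html.parser`. This is a minimal illustration under stated assumptions, not CustomerGPT's implementation: it assumes reasonably clean HTML, treats only `h1`, `h2`, and `h3` as section boundaries, and records each section's heading trail so later chunks keep their structural context. `SectionSplitter` and `split_sections` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional

# Heading tags treated as section boundaries, mapped to their depth.
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3}


class SectionSplitter(HTMLParser):
    """Collects body text under its H1/H2/H3 heading trail (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.sections: list = []
        self._trail: list = []                      # e.g. ["Setup", "Linux"]
        self._heading_level: Optional[int] = None   # set while inside a heading tag
        self._heading_parts: list = []
        self._body_parts: list = []

    def _flush(self) -> None:
        # Emit accumulated body text (whitespace-normalized) as one section.
        text = " ".join(" ".join(self._body_parts).split())
        if text:
            self.sections.append({"trail": list(self._trail), "text": text})
        self._body_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            self._flush()  # a new heading closes the previous section
            self._heading_level = HEADING_LEVELS[tag]
            self._heading_parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS and self._heading_level is not None:
            title = " ".join(" ".join(self._heading_parts).split())
            # Keep only the ancestors above this level, then append this heading.
            self._trail = self._trail[: self._heading_level - 1] + [title]
            self._heading_level = None

    def handle_data(self, data):
        if self._heading_level is not None:
            self._heading_parts.append(data)
        else:
            self._body_parts.append(data)

    def close(self) -> None:
        super().close()
        self._flush()  # emit whatever follows the last heading


def split_sections(html: str) -> list:
    parser = SectionSplitter()
    parser.feed(html)
    parser.close()
    return parser.sections
```

Carrying the trail (for example `["Setup", "Linux"]`) alongside each section lets a retriever or prompt show where a chunk came from, even after the section is cut into smaller pieces.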
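The overlapping-chunk step amounts to a sliding window over a token sequence. In this sketch, whitespace-separated words stand in for model tokens (an assumption; a real pipeline would count tokens with the embedding model's own tokenizer). With the defaults of 800 and 150, each window starts 650 tokens after the previous one, so neighboring chunks share 150 tokens. `chunk_tokens` is a hypothetical name.

```python
def chunk_tokens(tokens: list, size: int = 800, overlap: int = 150) -> list:
    """Split tokens into windows of `size`; consecutive windows share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # 650 with the defaults
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end, so stop
    return chunks


# Usage: approximate tokens by splitting a section's text on whitespace.
section_text = "word " * 2000
chunks = chunk_tokens(section_text.split())
```

The overlap means a sentence that straddles a boundary appears whole in at least one chunk (as long as it is shorter than the overlap), at the cost of embedding roughly 150/650 ≈ 23% more tokens than non-overlapping chunks would.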
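Concurrent, batched embedding can be sketched with `concurrent.futures.ThreadPoolExecutor`. Threads suit this step because calls to a remote embedding model are I/O-bound, so they overlap network waits despite Python's GIL. `embed_batch` is a hypothetical stand-in for whatever client call sends one batch of texts to the model and returns one vector per text; the batch size and worker count are illustrative assumptions, not CustomerGPT's settings.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Vector = List[float]
EmbedBatch = Callable[[Sequence[str]], List[Vector]]


def make_batches(texts: Sequence[str], batch_size: int) -> list:
    """Split texts into consecutive batches of at most `batch_size` items."""
    return [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]


def embed_all(
    texts: Sequence[str],
    embed_batch: EmbedBatch,  # hypothetical remote-model call: one request per batch
    batch_size: int = 64,
    max_workers: int = 8,
) -> List[Vector]:
    """Embed texts in parallel batches, returning vectors in input order."""
    batches = make_batches(texts, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in submission order, so the flattened
        # list lines up one-to-one with `texts`.
        per_batch = list(pool.map(embed_batch, batches))
    return [vector for vectors in per_batch for vector in vectors]


# Usage with a fake embedder: a one-dimensional "vector" holding the text length.
def fake_embed_batch(batch: Sequence[str]) -> List[Vector]:
    return [[float(len(text))] for text in batch]


vectors = embed_all(["a", "bb", "ccc"], fake_embed_batch, batch_size=2)
```

Batching reduces per-request overhead, while the pool keeps several batches in flight at once; the right values for both knobs depend on the embedding provider's rate limits and payload size limits.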
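The retrieval side can be illustrated in plain Python. pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), and an HNSW index built with `vector_cosine_ops` (available since pgvector 0.5.0) speeds up `ORDER BY embedding <=> query` searches. This sketch computes the same ranking by brute force over in-memory rows; the `chunks` table, the `tenant_id` column used for multi-tenant isolation, and the function names are assumptions for illustration, not CustomerGPT's schema.

```python
import math
from typing import Sequence

# Roughly the SQL this mirrors (table and column names are hypothetical):
#   CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
#   SELECT id FROM chunks
#   WHERE tenant_id = %s
#   ORDER BY embedding <=> %s
#   LIMIT %s;


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        # This sketch refuses zero vectors, where the cosine is undefined.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - sum(x * y for x, y in zip(a, b)) / norm


def search(rows: list, tenant_id: str, query: Sequence[float], k: int = 5) -> list:
    """Exact top-k search restricted to one tenant's rows."""
    own_rows = [row for row in rows if row["tenant_id"] == tenant_id]
    own_rows.sort(key=lambda row: cosine_distance(row["embedding"], query))
    return [row["id"] for row in own_rows[:k]]
```

Note that an HNSW index returns approximate nearest neighbors, whereas this brute-force version is exact; the exact version is useful as a reference when checking an index's recall on a sample of queries.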
Performance Benchmarks
Using asynchronous batch generators, our system ingests datasets of up to 50,000 document pages in under 2 seconds:
| Dataset Size (Pages) | Standard Pipeline Ingestion | CustomerGPT Parallel Ingestion |
|---|---|---|
| 1,000 | 14.2 seconds | 0.25 seconds |
| 10,000 | 2.4 minutes | 0.82 seconds |
| 50,000 | 11.8 minutes | 1.96 seconds |
This ingestion speed lets our Scale and Enterprise users keep their chatbots synchronized with their documentation in near real time, without blocking the dashboard or stalling active support sessions.
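The asynchronous batch generators mentioned above can be approximated with an async generator that feeds a bounded set of concurrent requests. In this minimal sketch, an `asyncio.Semaphore` caps how many batches are in flight at once and `asyncio.gather` returns results in submission order. `embed_batch` is a hypothetical coroutine standing in for a remote embedding call, and the batch size and concurrency limit are illustrative assumptions rather than CustomerGPT's actual settings.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable, List

Vector = List[float]


async def batch_stream(pages: Iterable[str], batch_size: int) -> AsyncIterator[List[str]]:
    """Yield pages in batches of at most `batch_size`."""
    batch: List[str] = []
    for page in pages:
        batch.append(page)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller batch


async def ingest(
    pages: Iterable[str],
    embed_batch: Callable[[List[str]], Awaitable[List[Vector]]],  # hypothetical remote call
    batch_size: int = 32,
    max_in_flight: int = 4,
) -> List[Vector]:
    """Embed all pages, with at most `max_in_flight` batch requests running at once."""
    limit = asyncio.Semaphore(max_in_flight)

    async def run(batch: List[str]) -> List[Vector]:
        async with limit:
            return await embed_batch(batch)

    tasks = [asyncio.create_task(run(batch)) async for batch in batch_stream(pages, batch_size)]
    per_batch = await asyncio.gather(*tasks)  # results come back in submission order
    return [vector for vectors in per_batch for vector in vectors]


# Usage with a fake coroutine that "embeds" each page as its length.
async def fake_embed(batch: List[str]) -> List[Vector]:
    await asyncio.sleep(0.01)  # simulate network latency
    return [[float(len(page))] for page in batch]


vectors = asyncio.run(ingest(["x" * i for i in range(1, 51)], fake_embed, batch_size=8))
```

Compared with a thread pool, the async version keeps all waiting requests on one event loop, which scales to many concurrent calls cheaply; the semaphore is what keeps that concurrency within a provider's rate limits.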
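For a sense of scale, the benchmark table's figures can be turned into speedup and throughput ratios. The inputs are copied from the table; the calculation only divides them and adds no new measurements.

```python
# (pages, standard pipeline seconds, CustomerGPT parallel seconds), from the benchmark table
BENCHMARKS = [
    (1_000, 14.2, 0.25),
    (10_000, 2.4 * 60, 0.82),   # 2.4 minutes
    (50_000, 11.8 * 60, 1.96),  # 11.8 minutes
]


def ratios(rows):
    """Return (pages, speedup factor, parallel pages per second) for each row."""
    return [(pages, seq_s / par_s, pages / par_s) for pages, seq_s, par_s in rows]


for pages, speedup, pages_per_s in ratios(BENCHMARKS):
    print(f"{pages:>6,} pages: {speedup:6.1f}x faster, {pages_per_s:,.0f} pages/s")
```

The 50,000-page row, for instance, works out to roughly a 361x speedup and about 25,500 pages per second.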