
Scaling Vector Databases: How We Ingest 50k Documentation Pages in Under 2 Seconds

Elena Rostova
Published on May 20, 2026 • 8 min read
TL;DR / Quick Summary: Naive sequential parsing of large document sets creates ingestion bottlenecks for RAG. CustomerGPT solves this with parallel hierarchical document parsing, overlapping token-based chunking, and batched storage of vector embeddings in a multi-tenant PostgreSQL instance indexed with pgvector.

Retrieval-Augmented Generation (RAG) depends heavily on how documents are stored, indexed, and retrieved. For high-performance customer support agents, ingestion speed directly determines how quickly your bots learn from updated software manuals, FAQs, and enterprise knowledge-base (KB) documents.

The Bottleneck of Traditional Document Parsing

Traditional PDF/HTML ingestion pipelines load entire files, slice them into naive fixed-size segments, and send each segment to an embedding endpoint one at a time. Because every request waits for the previous one to finish, this sequential approach often takes minutes for moderately sized document sets and breaks down entirely on enterprise datasets containing more than 10,000 files.
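A minimal sketch of this baseline makes the bottleneck concrete. The `embed` callable (a stand-in for one blocking request to an embedding endpoint) and the 1,000-character slice width are hypothetical choices for illustration, not details of any specific pipeline.

```python
from typing import Callable, List

Vector = List[float]


def naive_slices(text: str, size: int = 1000) -> List[str]:
    """Fixed-width character slicing that ignores headings, sentences and words."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def naive_ingest(docs: List[str], embed: Callable[[str], Vector]) -> List[Vector]:
    """Embed every slice of every document, one blocking call at a time.

    `embed` is a hypothetical wrapper around a single embedding request.
    """
    vectors: List[Vector] = []
    for doc in docs:                      # documents handled strictly in order
        for piece in naive_slices(doc):   # slices handled strictly in order
            vectors.append(embed(piece))  # each request waits for the previous one
    return vectors


# Usage with a dummy embedder that "embeds" a slice as its length.
print(naive_ingest(["a" * 2500], lambda s: [float(len(s))]))  # [[1000.0], [1000.0], [500.0]]
```

Because each call blocks until the previous one returns, wall time grows linearly with the number of slices: at an assumed 10 ms per request, 10,000 slices spend roughly 100 seconds waiting on the network alone. Fixed-width slicing also cuts through words, sentences, and headings, which is the context loss that hierarchical chunking is meant to avoid.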

Hierarchical Chunking & Thread-Safe Parallel Parsing

Our engineering team solved the ingestion bottleneck by designing a highly concurrent hierarchical parsing architecture. When a file is uploaded to CustomerGPT:

  1. The pipeline divides each document into logical sections based on its heading hierarchy (H1, H2, and H3 tags) and file layout.
  2. The text is split into overlapping chunks (typically 800 tokens, with 150 tokens of overlap between consecutive chunks) so that context carries across chunk boundaries.
  3. Embeddings are generated concurrently in thread pools, with chunks sent in batches to high-performance remote embedding models.
  4. The embeddings are stored in a multi-tenant PostgreSQL database, where pgvector indexes accelerate cosine-distance similarity search.
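Steps 1 and 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not CustomerGPT's implementation: Markdown-style `#`, `##`, and `###` lines stand in for H1 to H3 tags, whitespace-separated words stand in for model tokens (a real pipeline would count tokens with the embedding model's tokenizer), and prefixing each chunk with its heading trail is one possible way to preserve structural context. All function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: Markdown '#', '##', '###' headings stand in for H1-H3 tags.
HEADING = re.compile(r"^(#{1,3})\s+(.*)$")


@dataclass
class Section:
    path: List[str]  # heading trail, e.g. ["Guide", "Install"]
    body: str


def split_by_headings(text: str) -> List[Section]:
    """Step 1: cut a document into sections at H1-H3 boundaries."""
    sections: List[Section] = []
    path: List[str] = []
    lines: List[str] = []

    def flush() -> None:
        body = "\n".join(lines).strip()
        if body:
            sections.append(Section(list(path), body))
        lines.clear()

    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the section that the new heading ends
            level = len(match.group(1))
            # Keep ancestors above this level, replace everything at or below it.
            path[:] = path[:level - 1] + [match.group(2).strip()]
        else:
            lines.append(line)
    flush()
    return sections


def chunk_tokens(tokens: List[str], size: int = 800, overlap: int = 150) -> List[List[str]]:
    """Step 2: sliding window of `size` tokens sharing `overlap` tokens with the previous chunk."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window already reached the end
            break
    return chunks


def chunk_document(text: str, size: int = 800, overlap: int = 150) -> List[str]:
    """Chunk each section separately and prefix chunks with their heading trail."""
    out: List[str] = []
    for section in split_by_headings(text):
        prefix = " > ".join(section.path)
        # Assumption: whitespace-split words approximate model tokens.
        for piece in chunk_tokens(section.body.split(), size, overlap):
            body = " ".join(piece)
            out.append(f"{prefix}\n{body}" if prefix else body)
    return out
```

With the defaults, consecutive chunks share 150 tokens, so the window advances 650 tokens at a time; a 1,000-token section therefore yields two chunks covering tokens 0 to 799 and 650 to 999. Chunks never cross section boundaries, so each one stays within a single topic.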

Performance Benchmarks

Using asynchronous batch generators, the system typically ingests datasets of up to 50,000 document pages in under 2 seconds:

| Dataset size (pages) | Standard pipeline ingestion | CustomerGPT parallel ingestion |
|---|---|---|
| 1,000 | 14.2 seconds | 0.25 seconds |
| 10,000 | 2.4 minutes | 0.82 seconds |
| 50,000 | 11.8 minutes | 1.96 seconds |
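The asynchronous batch generators mentioned above, combined with concurrent embedding requests (step 3), can be sketched with `asyncio`. This is an illustrative sketch, not the production code: it bounds in-flight requests with an `asyncio.Semaphore` rather than the thread pools described earlier (a blocking client could use `concurrent.futures.ThreadPoolExecutor` instead), and `embed_batch`, `fake_embed`, the batch size of 64, and the limit of 8 concurrent requests are all hypothetical.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Iterable, List, Sequence

Vector = List[float]
# Hypothetical signature: one request embeds a whole batch of chunks.
EmbedBatch = Callable[[Sequence[str]], Awaitable[List[Vector]]]


async def batches(chunks: Iterable[str], size: int) -> AsyncIterator[List[str]]:
    """Asynchronous batch generator: yields chunks in groups of `size`."""
    batch: List[str] = []
    for chunk in chunks:
        batch.append(chunk)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # final, possibly smaller batch
        yield batch


async def embed_all(
    chunks: Iterable[str],
    embed_batch: EmbedBatch,
    batch_size: int = 64,
    max_in_flight: int = 8,
) -> List[Vector]:
    """Embed all chunks with at most `max_in_flight` batch requests at once."""
    limit = asyncio.Semaphore(max_in_flight)

    async def run(batch: List[str]) -> List[Vector]:
        async with limit:  # wait for a free request slot
            return await embed_batch(batch)

    tasks = [asyncio.create_task(run(b)) async for b in batches(chunks, batch_size)]
    per_batch = await asyncio.gather(*tasks)  # results come back in task order
    return [vec for vectors in per_batch for vec in vectors]


async def fake_embed(batch: Sequence[str]) -> List[Vector]:
    """Stand-in for a remote embedding API: 10 ms latency, 1-dim vectors."""
    await asyncio.sleep(0.01)
    return [[float(len(text))] for text in batch]


vectors = asyncio.run(embed_all([f"chunk-{i}" for i in range(1000)], fake_embed))
print(len(vectors))  # 1000
```

With 1,000 chunks the generator produces 16 batches, and with 8 requests in flight the simulated 10 ms latency is paid about twice rather than 16 times. Because `asyncio.gather` returns results in task order, the vectors line up with their input chunks.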

Sub-two-second ingestion lets Scale and Enterprise users keep their chatbots synchronized in near real time, without blocking the dashboard or stalling ongoing user support sessions.
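Once ingested, each tenant's chunks are searched by cosine distance (step 4 in the list above). The sketch below is an exact, in-memory reference for what such a pgvector query computes. The SQL in the comments uses real pgvector syntax (the `vector` type, an HNSW index with `vector_cosine_ops`, and the `<=>` cosine-distance operator), but the `chunks` table, its columns, and the tenant filter are hypothetical, not CustomerGPT's schema. An HNSW index returns approximate nearest neighbors, so this exact scan shows the result the index approximates, not how the index works internally.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

# Equivalent pgvector setup and query (hypothetical schema, 3-dim vectors for brevity):
#   CREATE EXTENSION IF NOT EXISTS vector;
#   CREATE TABLE chunks (id bigserial PRIMARY KEY, tenant_id bigint NOT NULL,
#                        content text, embedding vector(3));
#   CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);
#   SELECT id, content FROM chunks
#   WHERE tenant_id = $1 ORDER BY embedding <=> $2 LIMIT 5;


@dataclass
class Chunk:
    id: int
    tenant_id: int
    content: str
    embedding: List[float]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


def search(rows: List[Chunk], tenant_id: int, query: Sequence[float], k: int = 5) -> List[Chunk]:
    """Exact top-k for one tenant: filter by tenant, then rank by cosine distance."""
    own = [r for r in rows if r.tenant_id == tenant_id]  # never mix tenants' data
    return sorted(own, key=lambda r: cosine_distance(r.embedding, query))[:k]


rows = [
    Chunk(1, 1, "Reset your password", [1.0, 0.0, 0.0]),
    Chunk(2, 1, "Billing FAQ", [0.0, 1.0, 0.0]),
    Chunk(3, 2, "Other tenant's doc", [1.0, 0.0, 0.0]),
    Chunk(4, 1, "Account security", [1.0, 1.0, 0.0]),
]
print([r.id for r in search(rows, tenant_id=1, query=[1.0, 0.0, 0.0], k=2)])  # [1, 4]
```

Row 3 matches the query exactly but belongs to tenant 2, so it never appears in tenant 1's results; in PostgreSQL the `WHERE tenant_id = $1` clause provides that isolation, while the HNSW index supplies the speed.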

Ready to deploy secure, custom AI agents?

Train your ChatGPT experts in seconds on manual links, files, and PDFs. Get started for free.
