The Evolution of Data Ingestion in AWS Bedrock Knowledge Base
How I sped up indexing, removed blocking, and made the UX sane
Let me start with why we decided to use AWS Bedrock Knowledge Base in the first place. Our original plan was to upload documents directly to Bedrock and work with them as-is. But there was a hard limit we couldn’t ignore: Bedrock does not accept files larger than 5 MB.
Our client needed to upload documents up to 50 MB. Splitting or recompressing them would only create more complexity. The cleaner solution was to use the Knowledge Base as a place to store and index large documents, and then let Bedrock work with the resulting chunks.
Once we made that shift, the main challenge became speed and stability of indexing. That led us to rethink the entire ingestion flow.
The old way: StartIngestionJob
Blocking behavior, full-bucket rescans, and a clunky UX
The old flow looked simple but was far from efficient:
Upload a file to S3.
Run StartIngestionJob.
The job rescans the entire bucket.
The KB stays blocked until the job finishes.
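In code, the old flow boiled down to a single call. A minimal sketch using StartIngestionJobCommand from @aws-sdk/client-bedrock-agent (the region and IDs are placeholders):

import { BedrockAgentClient, StartIngestionJobCommand } from "@aws-sdk/client-bedrock-agent";

const client = new BedrockAgentClient({ region: "us-east-1" });

// Kicks off a sync of the ENTIRE data source: every object in the bucket
// is rescanned, and the KB stays busy until the job completes.
await client.send(new StartIngestionJobCommand({
  knowledgeBaseId: "KB_ID",       // placeholder
  dataSourceId: "DATA_SOURCE_ID", // placeholder
}));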
The issues were obvious:
The whole KB gets blocked during indexing.
The entire bucket is scanned even if you add just one new file.
Users have to wait 3–5 minutes.
No visibility into the status of individual files.
Documents get reindexed even if nothing changed.
Fine for a prototype, painful for a real product.
The new way: IngestKnowledgeBaseDocuments
Fast, granular, non-blocking indexing
Switching to IngestKnowledgeBaseDocuments changed everything. Now:
Upload the file to S3.
Send it directly for indexing.
The KB stays available.
Only that specific file is indexed.
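In code, the direct path is one call per document. A minimal sketch using IngestKnowledgeBaseDocumentsCommand from @aws-sdk/client-bedrock-agent; the IDs, bucket, and attribute values are placeholders, and the exact request shape is worth double-checking against your SDK version:

import { BedrockAgentClient, IngestKnowledgeBaseDocumentsCommand } from "@aws-sdk/client-bedrock-agent";

const client = new BedrockAgentClient({ region: "us-east-1" });

// Index ONE S3 object, with per-document metadata attached inline.
await client.send(new IngestKnowledgeBaseDocumentsCommand({
  knowledgeBaseId: "KB_ID",       // placeholder
  dataSourceId: "DATA_SOURCE_ID", // placeholder
  documents: [{
    content: {
      dataSourceType: "S3",
      s3: { s3Location: { uri: "s3://my-bucket/report.pdf" } },
    },
    metadata: {
      type: "IN_LINE_ATTRIBUTE",
      inlineAttributes: [
        { key: "userId", value: { type: "STRING", stringValue: "user-email@example.com" } },
        { key: "fileName", value: { type: "STRING", stringValue: "report.pdf" } },
      ],
    },
  }],
}));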
The benefits are immediate:
Multiple files can be indexed in parallel.
Indexing starts instantly, without waiting for a job.
The user can keep working.
Each file has its own status.
No full-bucket rescans.
The system went from “heavy and slow” to something much closer to real-time.
Custom metadata: the backbone of multi-tenancy
To support multiple users on a single KB index, each document gets a metadata block:
{
  userId: "user-email@example.com",
  fileName: "report.pdf",
  fileHash: "sha256-hash",
  uploadedAt: "2024-01-15T10:30:00Z",
  fileSize: 5242880,
  contentType: "application/pdf"
}
This solves several problems at once:
Every user sees only their own documents.
Filtering with userId + fileName keeps results precise.
We track who uploaded what and when.
fileHash prevents duplicate indexing.
Search becomes more relevant.
Example filter:
const filter = {
  andAll: [
    { equals: { key: "userId", value: "user@example.com" } },
    { equals: { key: "fileName", value: "contract.pdf" } }
  ]
};
Tricks that improved speed and UX
A. Batching: up to 10 documents per request
// chunk() splits the upload list into groups of 10, the per-request limit;
// ingestDocumentsBatch() wraps the IngestKnowledgeBaseDocuments call.
const batches = chunk(files, 10);
for (const batch of batches) {
  await ingestDocumentsBatch(batch);
}
Fewer API calls, faster throughput, AWS limits respected.
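chunk here is our own helper, not a library import. A minimal implementation:

// Split an array into groups of at most `size` items (10 per request in our case).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}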
B. Caching file status
Avoids reindexing a document if it’s already indexed:
// Keyed by content hash; a renamed file still hits the same entry.
const cache = new Map<string, {
  s3Key: string;
  indexed: boolean;
  checkedAt: number; // epoch ms
}>();
TTL is one hour. Hash-based, so renaming files doesn’t matter.
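The hash itself comes from the file contents, which is why renames don't matter. A sketch using Node's built-in crypto module:

import { createHash } from "node:crypto";

// Content-addressed key: the same bytes always map to the same cache entry.
function fileHash(buf: Buffer): string {
  return createHash("sha256").update(buf).digest("hex");
}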
C. Asynchronous status polling
The user doesn’t wait:
waitForDocumentsIndexed(s3Uris, 60000 /* timeout in ms */)
  .then(results => updateCache(results))
  .catch(() => scheduleRetry());
Polling every 2–3 seconds.
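waitForDocumentsIndexed is our own wrapper, not an SDK call. A simplified sketch built on GetKnowledgeBaseDocumentsCommand from @aws-sdk/client-bedrock-agent; the status names and request shape should be verified against your SDK version, and client is the BedrockAgentClient instance created earlier:

import { GetKnowledgeBaseDocumentsCommand } from "@aws-sdk/client-bedrock-agent";

async function waitForDocumentsIndexed(s3Uris: string[], timeoutMs: number) {
  const deadline = Date.now() + timeoutMs;
  const inFlight = new Set(["PENDING", "STARTING", "IN_PROGRESS"]);
  while (Date.now() < deadline) {
    const { documentDetails } = await client.send(new GetKnowledgeBaseDocumentsCommand({
      knowledgeBaseId: "KB_ID",       // placeholder
      dataSourceId: "DATA_SOURCE_ID", // placeholder
      documentIdentifiers: s3Uris.map(uri => ({ dataSourceType: "S3", s3: { uri } })),
    }));
    // Done once nothing is still in flight.
    if (documentDetails?.every(d => !inFlight.has(String(d.status)))) {
      return documentDetails;
    }
    await new Promise(resolve => setTimeout(resolve, 2500)); // 2–3 second poll
  }
  throw new Error("Indexing did not finish within the timeout");
}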
D. Retry with backoff
Smooths out rate limit spikes:
async function retryWithBackoff(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err) {
      // Retry only on throttling, and only while attempts remain.
      if (err.statusCode !== 429 || i === maxRetries - 1) throw err;
      // Exponential backoff with jitter: ~1s, ~2s, ~4s.
      const delay = Math.pow(2, i) * 1000 + Math.random() * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
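In practice every ingest call goes through this wrapper, e.g. retryWithBackoff(() => ingestDocumentsBatch(batch)).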
E. Cache cleanup
Keeps the cache lean:
function cleanExpiredCache() {
  const now = Date.now();
  for (const [hash, entry] of cache.entries()) {
    if (now - entry.checkedAt > CACHE_TTL) {
      cache.delete(hash);
    }
  }
}
Full workflow: from upload to model response
The user uploads a PDF.
The file goes to S3.
We call IngestKnowledgeBaseDocuments.
KB starts indexing.
The user sees a “file is being processed” message.
Background polling checks status.
Once indexing is done, we update the cache.
For the next query, KB returns only the relevant chunks.
The model generates the final answer.
Fast, predictable, no blocking.
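To make the retrieval step concrete: the userId + fileName filter from earlier goes into the retrieval configuration at query time. A minimal sketch using RetrieveCommand from @aws-sdk/client-bedrock-agent-runtime (the knowledge base ID and query text are placeholders):

import { BedrockAgentRuntimeClient, RetrieveCommand } from "@aws-sdk/client-bedrock-agent-runtime";

const runtime = new BedrockAgentRuntimeClient({ region: "us-east-1" });

// Return only chunks from this user's document; `filter` is the
// andAll filter shown earlier.
const { retrievalResults } = await runtime.send(new RetrieveCommand({
  knowledgeBaseId: "KB_ID", // placeholder
  retrievalQuery: { text: "What does the contract say about termination?" },
  retrievalConfiguration: {
    vectorSearchConfiguration: {
      numberOfResults: 5,
      filter,
    },
  },
}));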
Results after the migration
Before (StartIngestionJob)
3–5 minutes to index a single file.
KB completely blocked.
No user isolation.
Only job-level status.
After (IngestKnowledgeBaseDocuments)
30–60 seconds per file.
KB remains responsive.
Full multi-tenant metadata separation.
File-level status.
Batching up to 10 files.
Cache removes redundant work.
Roughly a 5x speed improvement.
What comes next
Indexing metrics: latency, failures, distribution.
Webhooks for “file indexed” events.
Document versioning via metadata.
More filtering options: by size, date, type.
Predictive indexing for frequently used files.
Key takeaways
The KB solves Bedrock’s file-size limit and handles documents up to 50 MB.
Direct ingestion removes blocking and boosts speed significantly.
Metadata enables true multi-tenant behavior.
Batching, caching, and async polling keep the UX smooth.
Backoff logic keeps the system stable during heavy load.
If you rely on Bedrock KB, switching to direct ingestion will make the whole system feel lighter and more responsive.


