The Evolution of Data Ingestion in AWS Bedrock Knowledge Base
How I sped up indexing, removed blocking, and made the UX sane
Let me start with why we decided to use AWS Bedrock Knowledge Base in the first place. Our original plan was to upload documents directly to Bedrock and work with them as-is. But there was a hard limit we couldn’t ignore: Bedrock does not accept files larger than 5 MB.
Our client needed to upload documents up to 50 MB. Splitting or recompressing them would only create more complexity. The cleaner solution was to use the Knowledge Base as a place to store and index large documents, and then let Bedrock work with the resulting chunks.
Once we made that shift, the main challenge became speed and stability of indexing. That led us to rethink the entire ingestion flow.
The old way: StartIngestionJob
Blocking behavior, full-bucket rescans, and a clunky UX
The old flow looked simple but was far from efficient:
Upload a file to S3.
Run StartIngestionJob.
The job rescans the entire bucket.
The KB stays blocked until the job finishes.
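In code, the old flow boiled down to a single call. A minimal sketch using StartIngestionJobCommand from @aws-sdk/client-bedrock-agent (the region and IDs are placeholders):

import { BedrockAgentClient, StartIngestionJobCommand } from "@aws-sdk/client-bedrock-agent";

const client = new BedrockAgentClient({ region: "us-east-1" });

// Kicks off a sync of the ENTIRE data source: every object in the bucket
// is rescanned, and the KB stays busy until the job completes.
await client.send(new StartIngestionJobCommand({
  knowledgeBaseId: "KB_ID",       // placeholder
  dataSourceId: "DATA_SOURCE_ID", // placeholder
}));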
The issues were obvious:
The whole KB gets blocked during indexing.
The entire bucket is scanned even if you add just one new file.
Users have to wait 3–5 minutes.
No visibility into the status of individual files.
Documents get reindexed even if nothing changed.
Fine for a prototype, painful for a real product.
The new way: IngestKnowledgeBaseDocuments
Fast, granular, non-blocking indexing
Switching to IngestKnowledgeBaseDocuments changed everything. Now:
Upload the file to S3.
Send it directly for indexing.
The KB stays available.
Only that specific file is indexed.
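In code, the direct path is one call per document. A minimal sketch using IngestKnowledgeBaseDocumentsCommand from @aws-sdk/client-bedrock-agent; the IDs, bucket, and attribute values are placeholders, and the exact request shape is worth double-checking against your SDK version:

import { BedrockAgentClient, IngestKnowledgeBaseDocumentsCommand } from "@aws-sdk/client-bedrock-agent";

const client = new BedrockAgentClient({ region: "us-east-1" });

// Index ONE S3 object, with per-document metadata attached inline.
await client.send(new IngestKnowledgeBaseDocumentsCommand({
  knowledgeBaseId: "KB_ID",       // placeholder
  dataSourceId: "DATA_SOURCE_ID", // placeholder
  documents: [{
    content: {
      dataSourceType: "S3",
      s3: { s3Location: { uri: "s3://my-bucket/report.pdf" } },
    },
    metadata: {
      type: "IN_LINE_ATTRIBUTE",
      inlineAttributes: [
        { key: "userId", value: { type: "STRING", stringValue: "user-email@example.com" } },
        { key: "fileName", value: { type: "STRING", stringValue: "report.pdf" } },
      ],
    },
  }],
}));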
The benefits are immediate:
Multiple files can be indexed in parallel.
Indexing starts instantly, without waiting for a job.
The user can keep working.
Each file has its own status.
No full-bucket rescans.
The system went from “heavy and slow” to something much closer to real-time.
Custom metadata: the backbone of multi-tenancy
To support multiple users on a single KB index, each document gets a metadata block:
{
  userId: "user-email@example.com",
  fileName: "report.pdf",
  fileHash: "sha256-hash",
  uploadedAt: "2024-01-15T10:30:00Z",
  fileSize: 5242880,
  contentType: "application/pdf"
}
This solves several problems at once:
Every user sees only their own documents.
Filtering with userId + fileName keeps results precise.
We track who uploaded what and when.
fileHash prevents duplicate indexing.
Search becomes more relevant.
Example filter:
const filter = {
  andAll: [
    { equals: { key: "userId", value: "user@example.com" } },
    { equals: { key: "fileName", value: "contract.pdf" } }
  ]
};
Tricks that improved speed and UX
A. Batching: up to 10 documents per request
// chunk() splits the upload list into groups of 10, the per-request limit;
// ingestDocumentsBatch() wraps the IngestKnowledgeBaseDocuments call.
const batches = chunk(files, 10);
for (const batch of batches) {
  await ingestDocumentsBatch(batch);
}
Fewer API calls, faster throughput, AWS limits respected.
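chunk here is our own helper, not a library import. A minimal implementation:

// Split an array into groups of at most `size` items (10 per request in our case).
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}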
B. Caching file status
Avoids reindexing a document if it’s already indexed:
// Keyed by content hash; a renamed file still hits the same entry.
const cache = new Map<string, {
  s3Key: string;
  indexed: boolean;
  checkedAt: number; // epoch ms
}>();
TTL is one hour. Hash-based, so renaming files doesn’t matter.
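The hash itself comes from the file contents, which is why renames don't matter. A sketch using Node's built-in crypto module:

import { createHash } from "node:crypto";

// Content-addressed key: the same bytes always map to the same cache entry.
function fileHash(buf: Buffer): string {
  return createHash("sha256").update(buf).digest("hex");
}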
C. Asynchronous status polling
The user doesn’t wait:
waitForDocumentsIndexed(s3Uris, 60000 /* timeout in ms */)
  .then(results => updateCache(results))
  .catch(() => scheduleRetry());
Polling every 2–3 seconds.
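waitForDocumentsIndexed is our own wrapper, not an SDK call. A simplified sketch built on GetKnowledgeBaseDocumentsCommand from @aws-sdk/client-bedrock-agent; the status names and request shape should be verified against your SDK version, and client is the BedrockAgentClient instance created earlier:

import { GetKnowledgeBaseDocumentsCommand } from "@aws-sdk/client-bedrock-agent";

async function waitForDocumentsIndexed(s3Uris: string[], timeoutMs: number) {
  const deadline = Date.now() + timeoutMs;
  const inFlight = new Set(["PENDING", "STARTING", "IN_PROGRESS"]);
  while (Date.now() < deadline) {
    const { documentDetails } = await client.send(new GetKnowledgeBaseDocumentsCommand({
      knowledgeBaseId: "KB_ID",       // placeholder
      dataSourceId: "DATA_SOURCE_ID", // placeholder
      documentIdentifiers: s3Uris.map(uri => ({ dataSourceType: "S3", s3: { uri } })),
    }));
    // Done once nothing is still in flight.
    if (documentDetails?.every(d => !inFlight.has(String(d.status)))) {
      return documentDetails;
    }
    await new Promise(resolve => setTimeout(resolve, 2500)); // 2–3 second poll
  }
  throw new Error("Indexing did not finish within the timeout");
}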
D. Retry with backoff
Smooths out rate limit spikes:
async function retryWithBackoff(fn, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err) {
      // Retry only on throttling, and only while attempts remain.
      if (err.statusCode !== 429 || i === maxRetries - 1) throw err;
      // Exponential backoff with jitter: ~1s, ~2s, ~4s.
      const delay = Math.pow(2, i) * 1000 + Math.random() * 1000;
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
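In practice every ingest call goes through this wrapper, e.g. retryWithBackoff(() => ingestDocumentsBatch(batch)).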
E. Cache cleanup
Keeps the cache lean:
function cleanExpiredCache() {
  const now = Date.now();
  for (const [hash, entry] of cache.entries()) {
    if (now - entry.checkedAt > CACHE_TTL) {
      cache.delete(hash);
    }
  }
}
Full workflow: from upload to model response
The user uploads a PDF.
The file goes to S3.
We call IngestKnowledgeBaseDocuments.
KB starts indexing.
The user sees a “file is being processed” message.
Background polling checks status.
Once indexing is done, we update the cache.
For the next query, KB returns only the relevant chunks.
The model generates the final answer.
Fast, predictable, no blocking.
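To make the retrieval step concrete: the userId + fileName filter from earlier goes into the retrieval configuration at query time. A minimal sketch using RetrieveCommand from @aws-sdk/client-bedrock-agent-runtime (the knowledge base ID and query text are placeholders):

import { BedrockAgentRuntimeClient, RetrieveCommand } from "@aws-sdk/client-bedrock-agent-runtime";

const runtime = new BedrockAgentRuntimeClient({ region: "us-east-1" });

// Return only chunks from this user's document; `filter` is the
// andAll filter shown earlier.
const { retrievalResults } = await runtime.send(new RetrieveCommand({
  knowledgeBaseId: "KB_ID", // placeholder
  retrievalQuery: { text: "What does the contract say about termination?" },
  retrievalConfiguration: {
    vectorSearchConfiguration: {
      numberOfResults: 5,
      filter,
    },
  },
}));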
Results after the migration
Before (StartIngestionJob)
3–5 minutes to index a single file.
KB completely blocked.
No user isolation.
Only job-level status.
After (IngestKnowledgeBaseDocuments)
30–60 seconds per file.
KB remains responsive.
Full multi-tenant metadata separation.
File-level status.
Batching up to 10 files.
Cache removes redundant work.
Roughly a 5x speed improvement.
What comes next
Indexing metrics: latency, failures, distribution.
Webhooks for “file indexed” events.
Document versioning via metadata.
More filtering options: by size, date, type.
Predictive indexing for frequently used files.
Key takeaways
The KB solves Bedrock’s file-size limit and handles documents up to 50 MB.
Direct ingestion removes blocking and boosts speed significantly.
Metadata enables true multi-tenant behavior.
Batching, caching, and async polling keep the UX smooth.
Backoff logic keeps the system stable during heavy load.
If you rely on Bedrock KB, switching to direct ingestion will make the whole system feel lighter and more responsive.


