Cloudflare’s Vectorize is a globally distributed vector database designed for building AI-powered applications with Cloudflare Workers[1].

Setup and Configuration

Create a Vectorize Index

Vectorize indexes are created with the Wrangler CLI rather than through a client library. The dimensions and distance metric must match the embedding model you plan to use:

npx wrangler vectorize create my-index --dimensions=768 --metric=cosine
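
Once created, the index is exposed to a Worker through a binding in the project's wrangler.toml. A minimal sketch (the binding name matches the env.TEXT_EMBEDDINGS used in the examples below; the index name is a placeholder):

```toml
[[vectorize]]
binding = "TEXT_EMBEDDINGS"   # available as env.TEXT_EMBEDDINGS in the Worker
index_name = "my-index"
```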

Core Implementation Steps

1. Generate Embeddings

import { Ai } from '@cloudflare/ai';

const ai = new Ai(env.AI);
// bge-base-en-v1.5 produces 768-dimensional embeddings; the result
// has the shape { data: [[...]] }, one vector per input string
const embedding = await ai.run('@cf/baai/bge-base-en-v1.5', {
    text: [userQuery]
});

2. Insert Vectors

await env.TEXT_EMBEDDINGS.upsert([{
    id: someId,              // unique string identifier for this vector
    values: vector,          // length must match the index's configured dimensions
    metadata: {
        text: originalText   // stored alongside the vector for retrieval
    }
}]);

3. Query Similar Vectors

// ai.run() returns one vector per input string, so pass the
// first (and only) vector to query()
let matches = await env.TEXT_EMBEDDINGS.query(embedding.data[0], {
    topK: 1
});
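
Taken together, the three steps can be wrapped in a single helper. A sketch, assuming the AI and TEXT_EMBEDDINGS bindings used above (newer runtimes expose env.AI.run directly; the semanticSearch name is illustrative):

```javascript
// Sketch: embed a query with Workers AI, then find the nearest
// stored vectors in the Vectorize index.
async function semanticSearch(env, userQuery, topK = 1) {
  // 1. Embed the query (returns { data: [[...]] }, one vector per input)
  const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text: [userQuery],
  });

  // 2. Query the index with the single query vector
  const results = await env.TEXT_EMBEDDINGS.query(embedding.data[0], { topK });

  // 3. Matches are ordered by similarity score, best first
  return results.matches;
}
```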

Key Features

Storage Capabilities

  • Stores vector embeddings from various ML models[1]
  • Supports integration with OpenAI and Cohere embeddings[5]
  • Enables storage of metadata alongside vectors[9]

Query Functionality

  • Performs semantic similarity searches[4]
  • Supports classification and recommendation systems[7]
  • Enables fast nearest-neighbor searches with response times under 100ms[4]
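
Under the hood, nearest-neighbor search ranks stored vectors by a distance metric such as cosine similarity. A minimal sketch of that computation, for intuition only (Vectorize performs this server-side):

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
// Higher values mean the vectors point in more similar directions.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```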

Use Cases

Primary Applications

  • Semantic search implementation
  • Classification tasks
  • Recommendation systems
  • Anomaly detection
  • Retrieval Augmented Generation (RAG)[7]

Integration Options

  • Works with Cloudflare Workers AI for embedding generation
  • Connects with R2 for image storage
  • Integrates with D1 for structured data storage[1]

The system automatically optimizes and regenerates indexes when new data is inserted, making it efficient for production deployments[7]. For optimal performance, vectors should be generated once and stored, rather than regenerating them on every request[5].

Sources

[1] Overview | Vectorize - Cloudflare Docs https://developers.cloudflare.com/vectorize/
[2] Cloudflare Workers Vector Demo https://community.cloudflare.com/t/cloudflare-workers-vector-demo/572199
[3] Vector Embedding Tutorial & Example - Nexla https://nexla.com/ai-infrastructure/vector-embedding/
[4] From prototype to production: Vector databases in generative AI … https://stackoverflow.blog/2023/10/09/from-prototype-to-production-vector-databases-in-generative-ai-applications/
[5] Vectorize: a vector database for shipping AI-powered applications to … https://blog.cloudflare.com/vectorize-vector-database-open-beta/
[6] The 5 Best Vector Databases | A List With Examples - DataCamp https://www.datacamp.com/blog/the-top-5-vector-databases
[7] Vector databases - Cloudflare Docs https://developers.cloudflare.com/vectorize/reference/what-is-a-vector-database/
[8] What Is A Vector Database? Top 12 Use Cases - lakeFS https://lakefs.io/blog/what-is-vector-databases/
[9] Creating a Smart Second Brain: Leveraging Cloudflare Workers … https://dev.to/andyjessop/building-an-ai-powered-second-brain-in-a-cloudflare-worker-with-cloudflare-vectorize-and-openai-23di


Base Recommendations

  • Standard chunk size: 1024 characters
  • Overlap setting: 128 characters
  • Starting baseline: 250 tokens (~1000 characters)

Chunking Strategies for Cloudflare Workers

Fixed-Size Chunking

function chunkText(text, chunkSize = 1024, overlap = 128) {
  // Keep the chunk list local so repeated calls don't accumulate results
  const chunks = [];
  // Advance by chunkSize minus overlap so consecutive chunks share context
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

Performance Considerations

Document Size        Chunk Size (chars)   Overlap (chars)
Small (<10KB)        512                  64
Medium (10-100KB)    1024                 128
Large (>100KB)       2048                 256
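
The table above can be expressed as a small helper that picks chunking parameters from the document's size. A sketch (thresholds in bytes; the function name is illustrative):

```javascript
// Map a document's size to the chunk size and overlap from the table above.
function chunkParamsFor(docSizeBytes) {
  if (docSizeBytes < 10 * 1024) return { chunkSize: 512, overlap: 64 };    // small
  if (docSizeBytes <= 100 * 1024) return { chunkSize: 1024, overlap: 128 }; // medium
  return { chunkSize: 2048, overlap: 256 };                                 // large
}
```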

Key Features

  • Automatic index regeneration when new data is inserted
  • Support for metadata storage alongside vectors
  • Integration with Cloudflare Workers AI for embedding generation
  • Compatibility with R2 for document storage

For optimal vector database performance in Cloudflare Workers, maintain chunk sizes that balance context preservation with retrieval efficiency. Smaller chunks improve search precision but require more storage space, while larger chunks preserve more context but may reduce retrieval accuracy.

Sources

[1] Chunking: Let’s Break It Down | DataStax https://www.datastax.com/blog/chunking-to-get-your-data-ai-ready
[2] How to Choose the Right Chunking Strategy for Your LLM Application https://www.mongodb.com/developer/products/atlas/choosing-chunking-strategy-rag/
[3] 7 Chunking Strategies in RAG You Need To Know - F22 Labs https://www.f22labs.com/blogs/7-chunking-strategies-in-rag-you-need-to-know/
[4] Considerations for Chunking for Optimal RAG Performance – Unstructured https://unstructured.io/blog/chunking-for-rag-best-practices
[5] Chunking Strategies for RAG in Generative AI https://adasci.org/chunking-strategies-for-rag-in-generative-ai/
[6] How to Chunk Documents for RAG https://www.multimodal.dev/post/how-to-chunk-documents-for-rag


Here are the optimal vector dimensions for Cloudflare Vectorize implementations:

Standard Dimensions

Model Type   Dimensions   Use Case
BGE Base     768          Text embeddings with Workers AI[1][4]
OpenAI       1536         Text with higher precision[4]
Cohere       768          Multilingual text[4]

Technical Constraints

Maximum Limits

  • Upper limit: 1536 dimensions per vector[3]
  • Metadata limit: 10KiB per vector[3]
  • Storage capacity: Up to 5,000,000 vectors per index[3]

Performance Considerations

  • Smaller dimensions offer faster search performance
  • Larger dimensions provide better accuracy for similar content
  • More dimensions increase compute and memory usage[1]

Cost Optimization

For cost-effective implementation, consider starting with 384-768 dimensions for experimental workloads, as this provides a good balance between accuracy and resource usage[2]. Scale up to higher dimensions only when needed for specific accuracy requirements.
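
The resource impact of dimension choice is easy to estimate, since memory scales linearly with vectors × dimensions. A rough sketch, assuming 32-bit floats (4 bytes per dimension):

```javascript
// Rough in-memory footprint of raw vectors: count × dims × 4 bytes (float32).
// Index structures add overhead, so treat this as a lower bound.
function indexBytes(numVectors, dims) {
  return numVectors * dims * 4;
}

// e.g. 1,000,000 vectors at 768 dims ≈ 3.1 GB; at 384 dims ≈ 1.5 GB
```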

Sources

[1] Vector databases - Cloudflare Docs https://developers.cloudflare.com/vectorize/reference/what-is-a-vector-database/
[2] Pricing | Vectorize - Cloudflare Docs https://developers.cloudflare.com/vectorize/platform/pricing/
[3] Limits | Vectorize - Cloudflare Docs https://developers.cloudflare.com/vectorize/platform/limits/
[4] Create indexes | Vectorize - Cloudflare Docs https://developers.cloudflare.com/vectorize/best-practices/create-indexes/


Core Chunking Methods

Fixed-Size Chunking

The simplest approach uses character-based splitting with a defined chunk size and overlap. While basic, it provides a foundation for more sophisticated methods[1][2]. It is best suited for initial prototyping and for simple use cases where semantic coherence is less critical.

Recursive Chunking

This more sophisticated approach splits text hierarchically using multiple separators in descending order (paragraphs, sentences, words)[3][4]. It preserves document structure better than fixed-size chunking while maintaining reasonable chunk sizes.
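
The recursive idea can be sketched in a few lines (separator list and size limit are illustrative; production libraries such as LangChain's RecursiveCharacterTextSplitter also re-merge small pieces, which this sketch omits):

```javascript
// Recursively split on the coarsest separator, falling back to finer
// separators for pieces that are still over the size limit.
function recursiveChunk(text, maxLen = 1000, separators = ['\n\n', '\n', '. ', ' ']) {
  if (text.length <= maxLen) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separators left: fall back to hard character splits
    const chunks = [];
    for (let i = 0; i < text.length; i += maxLen) chunks.push(text.slice(i, i + maxLen));
    return chunks;
  }
  // Split on the current separator, then recurse into oversized pieces
  return text
    .split(sep)
    .filter((piece) => piece.length > 0)
    .flatMap((piece) => recursiveChunk(piece, maxLen, rest));
}
```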

Advanced Techniques

Semantic Chunking

This method splits text based on meaning rather than fixed sizes by analyzing sentence embeddings and semantic similarity[1][4]. It ensures chunks maintain topical coherence and logical flow, though it requires more computational resources.
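
A minimal sketch of the idea: start a new chunk whenever consecutive sentence embeddings drop below a similarity threshold. The embed function is injected (in practice a Workers AI or OpenAI call); function names and threshold are illustrative:

```javascript
// Group sentences into chunks, breaking where the embedding of the
// next sentence diverges from the previous one.
function semanticChunk(sentences, embed, threshold = 0.8) {
  const cosine = (a, b) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  const chunks = [];
  let current = [];
  let prevVec = null;
  for (const sentence of sentences) {
    const vec = embed(sentence);
    // Low similarity to the previous sentence marks a topic boundary
    if (prevVec && cosine(prevVec, vec) < threshold) {
      chunks.push(current.join(' '));
      current = [];
    }
    current.push(sentence);
    prevVec = vec;
  }
  if (current.length) chunks.push(current.join(' '));
  return chunks;
}
```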

Smart Chunking

This approach offers multiple strategies:

  • Basic: Combines sequential elements while respecting size limits
  • By Title: Preserves section boundaries
  • By Page: Maintains page-level separation
  • By Similarity: Groups topically similar content[3]

Specialized Methods

Document-Specific Chunking

  • Markdown: Splits based on headings and formatting
  • LaTeX: Chunks by document structure and commands
  • HTML: Preserves element hierarchy and metadata[2][5]
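
For Markdown, a heading-aware splitter is only a few lines. A sketch that starts a new chunk at each heading (real implementations also record the heading hierarchy as metadata):

```javascript
// Split Markdown into chunks at headings, keeping each heading with
// the content that follows it.
function chunkMarkdown(markdown) {
  const lines = markdown.split('\n');
  const chunks = [];
  let current = [];
  for (const line of lines) {
    // An ATX heading (#, ##, ... up to ######) starts a new chunk
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      chunks.push(current.join('\n'));
      current = [];
    }
    current.push(line);
  }
  if (current.length) chunks.push(current.join('\n'));
  return chunks;
}
```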

Best Practices

  • Start with smaller chunks (500-1000 tokens) and adjust based on performance[3]
  • Maintain overlap between chunks to preserve context
  • Use structure-aware chunking when possible
  • Evaluate chunking impact on retrieval performance[3]
  • Consider semantic boundaries over fixed-size limits[4]

Sources

[1] Mastering Text Splitting & Chunking Techniques - GoPenAI https://blog.gopenai.com/mastering-text-splitting-chunking-techniques-b95dad5b5a7b?gi=c672ac65e2c8
[2] Chunking Strategies for LLM Applications - Pinecone https://www.pinecone.io/learn/chunking-strategies/
[3] Considerations for Chunking for Optimal RAG Performance https://unstructured.io/blog/chunking-for-rag-best-practices
[4] 7 Chunking Strategies in RAG You Need To Know - F22 Labs https://www.f22labs.com/blogs/7-chunking-strategies-in-rag-you-need-to-know/
[5] A Primer on Text Chunking and its Types - LanceDB Blog https://blog.lancedb.com/a-primer-on-text-chunking-and-its-types-a420efc96a13/
[6] Best Practices For Text Chunking | Restackio https://www.restack.io/p/text-chunking-best-practices-answer-cat-ai
[7] How to Chunk Text Data: A Comparative Analysis - GeeksforGeeks https://www.geeksforgeeks.org/how-to-chunk-text-data-a-comparative-analysis/
[8] How to Chunk Text Data — A Comparative Analysis https://towardsdatascience.com/how-to-chunk-text-data-a-comparative-analysis-3858c4a0997a?gi=a373fd042164