Cloudflare’s Vectorize is a globally distributed vector database designed for building AI-powered applications with Cloudflare Workers[1].

Setup and Configuration

Create a Vectorize Index

Vectorize indexes are created with the Wrangler CLI rather than through a client library. The dimensions and distance metric must match the embedding model you plan to use:

npx wrangler vectorize create my-index --dimensions=768 --metric=cosine
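
Once created, the index is exposed to a Worker through a binding in the project's wrangler.toml. A minimal sketch (the binding name matches the env.TEXT_EMBEDDINGS used in the examples below; the index name is a placeholder):

```toml
[[vectorize]]
binding = "TEXT_EMBEDDINGS"   # available as env.TEXT_EMBEDDINGS in the Worker
index_name = "my-index"
```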

Core Implementation Steps

1. Generate Embeddings

import { Ai } from '@cloudflare/ai';

const ai = new Ai(env.AI);
// bge-base-en-v1.5 produces 768-dimensional embeddings; the result
// has the shape { data: [[...]] }, one vector per input string
const embedding = await ai.run('@cf/baai/bge-base-en-v1.5', {
    text: [userQuery]
});

2. Insert Vectors

await env.TEXT_EMBEDDINGS.upsert([{
    id: someId,              // unique string identifier for this vector
    values: vector,          // length must match the index's configured dimensions
    metadata: {
        text: originalText   // stored alongside the vector for retrieval
    }
}]);

3. Query Similar Vectors

// ai.run() returns one vector per input string, so pass the
// first (and only) vector to query()
let matches = await env.TEXT_EMBEDDINGS.query(embedding.data[0], {
    topK: 1
});
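
Taken together, the three steps can be wrapped in a single helper. A sketch, assuming the AI and TEXT_EMBEDDINGS bindings used above (newer runtimes expose env.AI.run directly; the semanticSearch name is illustrative):

```javascript
// Sketch: embed a query with Workers AI, then find the nearest
// stored vectors in the Vectorize index.
async function semanticSearch(env, userQuery, topK = 1) {
  // 1. Embed the query (returns { data: [[...]] }, one vector per input)
  const embedding = await env.AI.run('@cf/baai/bge-base-en-v1.5', {
    text: [userQuery],
  });

  // 2. Query the index with the single query vector
  const results = await env.TEXT_EMBEDDINGS.query(embedding.data[0], { topK });

  // 3. Matches are ordered by similarity score, best first
  return results.matches;
}
```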

Key Features

Storage Capabilities

  • Stores vector embeddings from various ML models[1]
  • Supports integration with OpenAI and Cohere embeddings[5]
  • Enables storage of metadata alongside vectors[9]

Query Functionality

  • Performs semantic similarity searches[4]
  • Supports classification and recommendation systems[7]
  • Enables fast nearest-neighbor searches with response times under 100ms[4]
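
Under the hood, nearest-neighbor search ranks stored vectors by a distance metric such as cosine similarity. A minimal sketch of that computation, for intuition only (Vectorize performs this server-side):

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|), in the range [-1, 1].
// Higher values mean the vectors point in more similar directions.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```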

Use Cases

Primary Applications

  • Semantic search implementation
  • Classification tasks
  • Recommendation systems
  • Anomaly detection
  • Retrieval Augmented Generation (RAG)[7]

Integration Options

  • Works with Cloudflare Workers AI for embedding generation
  • Connects with R2 for image storage
  • Integrates with D1 for structured data storage[1]

The system automatically optimizes and regenerates indexes when new data is inserted, making it efficient for production deployments[7]. For optimal performance, vectors should be generated once and stored, rather than regenerating them on every request[5].

Sources

[1] Overview | Vectorize - Cloudflare Docs https://developers.cloudflare.com/vectorize/
[2] Cloudflare Workers Vector Demo https://community.cloudflare.com/t/cloudflare-workers-vector-demo/572199
[3] Vector Embedding Tutorial & Example - Nexla https://nexla.com/ai-infrastructure/vector-embedding/
[4] From prototype to production: Vector databases in generative AI … https://stackoverflow.blog/2023/10/09/from-prototype-to-production-vector-databases-in-generative-ai-applications/
[5] Vectorize: a vector database for shipping AI-powered applications to … https://blog.cloudflare.com/vectorize-vector-database-open-beta/
[6] The 5 Best Vector Databases | A List With Examples - DataCamp https://www.datacamp.com/blog/the-top-5-vector-databases
[7] Vector databases - Cloudflare Docs https://developers.cloudflare.com/vectorize/reference/what-is-a-vector-database/
[8] What Is A Vector Database? Top 12 Use Cases - lakeFS https://lakefs.io/blog/what-is-vector-databases/
[9] Creating a Smart Second Brain: Leveraging Cloudflare Workers … https://dev.to/andyjessop/building-an-ai-powered-second-brain-in-a-cloudflare-worker-with-cloudflare-vectorize-and-openai-23di


Base Recommendations

  • Standard chunk size: 1024 characters
  • Overlap setting: 128 characters
  • Starting baseline: 250 tokens (~1000 characters)

Chunking Strategies for Cloudflare Workers

Fixed-Size Chunking

function chunkText(text, chunkSize = 1024, overlap = 128) {
  // Keep the chunk list local so repeated calls don't accumulate results
  const chunks = [];
  // Advance by chunkSize minus overlap so consecutive chunks share context
  for (let i = 0; i < text.length; i += chunkSize - overlap) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

Performance Considerations

Document Size        Chunk Size (chars)   Overlap (chars)
Small (<10KB)        512                  64
Medium (10-100KB)    1024                 128
Large (>100KB)       2048                 256
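
The table above can be expressed as a small helper that picks chunking parameters from the document's size. A sketch (thresholds in bytes; the function name is illustrative):

```javascript
// Map a document's size to the chunk size and overlap from the table above.
function chunkParamsFor(docSizeBytes) {
  if (docSizeBytes < 10 * 1024) return { chunkSize: 512, overlap: 64 };    // small
  if (docSizeBytes <= 100 * 1024) return { chunkSize: 1024, overlap: 128 }; // medium
  return { chunkSize: 2048, overlap: 256 };                                 // large
}
```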

Key Features

  • Automatic index regeneration when new data is inserted
  • Support for metadata storage alongside vectors
  • Integration with Cloudflare Workers AI for embedding generation
  • Compatibility with R2 for document storage

For optimal vector database performance in Cloudflare Workers, maintain chunk sizes that balance context preservation with retrieval efficiency. Smaller chunks improve search precision but require more storage space, while larger chunks preserve more context but may reduce retrieval accuracy.

Sources

[1] Chunking: Let’s Break It Down | DataStax https://www.datastax.com/blog/chunking-to-get-your-data-ai-ready
[2] How to Choose the Right Chunking Strategy for Your LLM Application https://www.mongodb.com/developer/products/atlas/choosing-chunking-strategy-rag/
[3] 7 Chunking Strategies in RAG You Need To Know - F22 Labs https://www.f22labs.com/blogs/7-chunking-strategies-in-rag-you-need-to-know/
[4] Considerations for Chunking for Optimal RAG Performance – Unstructured https://unstructured.io/blog/chunking-for-rag-best-practices
[5] Chunking Strategies for RAG in Generative AI https://adasci.org/chunking-strategies-for-rag-in-generative-ai/
[6] How to Chunk Documents for RAG https://www.multimodal.dev/post/how-to-chunk-documents-for-rag


Here are the optimal vector dimensions for Cloudflare Vectorize implementations:

Standard Dimensions

Model Type   Dimensions   Use Case
BGE Base     768          Text embeddings with Workers AI[1][4]
OpenAI       1536         Text with higher precision[4]
Cohere       768          Multilingual text[4]

Technical Constraints

Maximum Limits

  • Upper limit: 1536 dimensions per vector[3]
  • Metadata limit: 10KiB per vector[3]
  • Storage capacity: Up to 5,000,000 vectors per index[3]

Performance Considerations

  • Smaller dimensions offer faster search performance
  • Larger dimensions provide better accuracy for similar content
  • More dimensions increase compute and memory usage[1]

Cost Optimization

For cost-effective implementation, consider starting with 384-768 dimensions for experimental workloads, as this provides a good balance between accuracy and resource usage[2]. Scale up to higher dimensions only when needed for specific accuracy requirements.
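
The resource impact of dimension choice is easy to estimate, since memory scales linearly with vectors × dimensions. A rough sketch, assuming 32-bit floats (4 bytes per dimension):

```javascript
// Rough in-memory footprint of raw vectors: count × dims × 4 bytes (float32).
// Index structures add overhead, so treat this as a lower bound.
function indexBytes(numVectors, dims) {
  return numVectors * dims * 4;
}

// e.g. 1,000,000 vectors at 768 dims ≈ 3.1 GB; at 384 dims ≈ 1.5 GB
```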

Sources

[1] Vector databases - Cloudflare Docs https://developers.cloudflare.com/vectorize/reference/what-is-a-vector-database/
[2] Pricing | Vectorize - Cloudflare Docs https://developers.cloudflare.com/vectorize/platform/pricing/
[3] Limits | Vectorize - Cloudflare Docs https://developers.cloudflare.com/vectorize/platform/limits/
[4] Create indexes | Vectorize - Cloudflare Docs https://developers.cloudflare.com/vectorize/best-practices/create-indexes/


Core Chunking Methods

Fixed-Size Chunking

The simplest approach uses character-based splitting with a defined chunk size and overlap. While basic, it provides a foundation for more sophisticated methods[1][2]. It is best suited for initial prototyping and for simple use cases where semantic coherence is less critical.

Recursive Chunking

This more sophisticated approach splits text hierarchically using multiple separators in descending order (paragraphs, sentences, words)[3][4]. It preserves document structure better than fixed-size chunking while maintaining reasonable chunk sizes.
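
The recursive idea can be sketched in a few lines (separator list and size limit are illustrative; production libraries such as LangChain's RecursiveCharacterTextSplitter also re-merge small pieces, which this sketch omits):

```javascript
// Recursively split on the coarsest separator, falling back to finer
// separators for pieces that are still over the size limit.
function recursiveChunk(text, maxLen = 1000, separators = ['\n\n', '\n', '. ', ' ']) {
  if (text.length <= maxLen) return [text];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separators left: fall back to hard character splits
    const chunks = [];
    for (let i = 0; i < text.length; i += maxLen) chunks.push(text.slice(i, i + maxLen));
    return chunks;
  }
  // Split on the current separator, then recurse into oversized pieces
  return text
    .split(sep)
    .filter((piece) => piece.length > 0)
    .flatMap((piece) => recursiveChunk(piece, maxLen, rest));
}
```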

Advanced Techniques

Semantic Chunking

This method splits text based on meaning rather than fixed sizes by analyzing sentence embeddings and semantic similarity[1][4]. It ensures chunks maintain topical coherence and logical flow, though it requires more computational resources.
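
A minimal sketch of the idea: start a new chunk whenever consecutive sentence embeddings drop below a similarity threshold. The embed function is injected (in practice a Workers AI or OpenAI call); function names and threshold are illustrative:

```javascript
// Group sentences into chunks, breaking where the embedding of the
// next sentence diverges from the previous one.
function semanticChunk(sentences, embed, threshold = 0.8) {
  const cosine = (a, b) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  const chunks = [];
  let current = [];
  let prevVec = null;
  for (const sentence of sentences) {
    const vec = embed(sentence);
    // Low similarity to the previous sentence marks a topic boundary
    if (prevVec && cosine(prevVec, vec) < threshold) {
      chunks.push(current.join(' '));
      current = [];
    }
    current.push(sentence);
    prevVec = vec;
  }
  if (current.length) chunks.push(current.join(' '));
  return chunks;
}
```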

Smart Chunking

This approach offers multiple strategies:

  • Basic: Combines sequential elements while respecting size limits
  • By Title: Preserves section boundaries
  • By Page: Maintains page-level separation
  • By Similarity: Groups topically similar content[3]

Specialized Methods

Document-Specific Chunking

  • Markdown: Splits based on headings and formatting
  • LaTeX: Chunks by document structure and commands
  • HTML: Preserves element hierarchy and metadata[2][5]
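
For Markdown, a heading-aware splitter is only a few lines. A sketch that starts a new chunk at each heading (real implementations also record the heading hierarchy as metadata):

```javascript
// Split Markdown into chunks at headings, keeping each heading with
// the content that follows it.
function chunkMarkdown(markdown) {
  const lines = markdown.split('\n');
  const chunks = [];
  let current = [];
  for (const line of lines) {
    // An ATX heading (#, ##, ... up to ######) starts a new chunk
    if (/^#{1,6}\s/.test(line) && current.length > 0) {
      chunks.push(current.join('\n'));
      current = [];
    }
    current.push(line);
  }
  if (current.length) chunks.push(current.join('\n'));
  return chunks;
}
```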

Best Practices

  • Start with smaller chunks (500-1000 tokens) and adjust based on performance[3]
  • Maintain overlap between chunks to preserve context
  • Use structure-aware chunking when possible
  • Evaluate chunking impact on retrieval performance[3]
  • Consider semantic boundaries over fixed-size limits[4]

Sources

[1] Mastering Text Splitting & Chunking Techniques - GoPenAI https://blog.gopenai.com/mastering-text-splitting-chunking-techniques-b95dad5b5a7b?gi=c672ac65e2c8
[2] Chunking Strategies for LLM Applications - Pinecone https://www.pinecone.io/learn/chunking-strategies/
[3] Considerations for Chunking for Optimal RAG Performance https://unstructured.io/blog/chunking-for-rag-best-practices
[4] 7 Chunking Strategies in RAG You Need To Know - F22 Labs https://www.f22labs.com/blogs/7-chunking-strategies-in-rag-you-need-to-know/
[5] A Primer on Text Chunking and its Types - LanceDB Blog https://blog.lancedb.com/a-primer-on-text-chunking-and-its-types-a420efc96a13/
[6] Best Practices For Text Chunking | Restackio https://www.restack.io/p/text-chunking-best-practices-answer-cat-ai
[7] How to Chunk Text Data: A Comparative Analysis - GeeksforGeeks https://www.geeksforgeeks.org/how-to-chunk-text-data-a-comparative-analysis/
[8] How to Chunk Text Data — A Comparative Analysis https://towardsdatascience.com/how-to-chunk-text-data-a-comparative-analysis-3858c4a0997a?gi=a373fd042164