Vector Tools¶

The vector tools (hanzo-tools-vector) provide semantic search capabilities using vector embeddings and the Infinity embedded database.

Overview¶

These tools enable indexing documents and searching by semantic similarity rather than keyword matching. Perfect for finding conceptually related content.

Installation¶

# Basic install
pip install hanzo-tools-vector

# Full install with all dependencies
pip install hanzo-tools-vector[full]

index - Project Indexing¶

Index project files for search:

# Index current project
index(path=".")

# Index specific directory
index(path="/project/src")

# Index with file filter
index(path=".", include="*.py,*.md")

# Re-index (force update)
index(path=".", force=True)

Parameters¶

Parameter	Type	Default	Description
`path`	str	`.`	Directory to index
`include`	str	-	File patterns to include
`exclude`	str	-	File patterns to exclude
`force`	bool	`False`	Force re-indexing

vector_index - Document Indexing¶

Add documents to the vector index:

# Index a single document
vector_index(
    content="This is the document content...",
    file_path="/docs/guide.md"
)

# Index with metadata
vector_index(
    content="API documentation...",
    file_path="/docs/api.md",
    metadata={"category": "api", "version": "2.0"}
)

# Index multiple documents
vector_index(
    documents=[
        {"content": "Doc 1...", "file_path": "/a.md"},
        {"content": "Doc 2...", "file_path": "/b.md"}
    ]
)

Parameters¶

Parameter	Type	Default	Description
`content`	str	-	Document content
`file_path`	str	-	Source file path
`metadata`	dict	-	Additional metadata
`documents`	list	-	Batch of documents

vector_search - Semantic Search¶

Search indexed documents by meaning:

# Basic semantic search
vector_search(query="How do I authenticate users?")

# Search with limit
vector_search(query="error handling patterns", limit=5)

# Search with score threshold
vector_search(query="database optimization", score_threshold=0.7)

# Search specific project
vector_search(query="API endpoints", search_scope="my-project")

# Filter by file pattern
vector_search(query="testing patterns", file_filter="test_*.py")

# Search all projects
vector_search(query="configuration", search_scope="all")

Parameters¶

Parameter	Type	Default	Description
`query`	str	required	Search query
`limit`	int	`10`	Max results to return
`score_threshold`	float	`0.0`	Min similarity score (0-1)
`include_content`	bool	`True`	Include document content
`file_filter`	str	-	Filter by file path pattern
`project_filter`	list	-	Filter by project names
`search_scope`	str	`all`	`all`, `global`, `current`, or project name

Search Scopes¶

Scope	Description
`all`	Search all indexed projects
`global`	Search only global index
`current`	Search current project (auto-detected)
`<name>`	Search specific project by name

Output Format¶

Found 3 results for query: 'authentication patterns'

Result 1 (Score: 87.3%) - Project: my-api - src/auth/handler.py [Chunk 2]
------------------------------------------------------------------
Metadata: {"category": "auth"}
Content:
def authenticate_user(token: str) -> User:
    """Authenticate user from JWT token..."""
    ...

Result 2 (Score: 82.1%) - Project: my-api - docs/auth.md [Chunk 0]
------------------------------------------------------------------
Content:
# Authentication Guide

This guide explains how to authenticate users...

How It Works¶

Embedding Generation¶

Documents are converted to vector embeddings that capture semantic meaning:

Chunking: Large documents split into smaller chunks
Embedding: Each chunk converted to vector representation
Storage: Vectors stored in Infinity database
Indexing: HNSW index for fast similarity search

Similarity Search¶

Queries are embedded and compared against document vectors:

Query Embedding: Convert query to vector
Nearest Neighbors: Find most similar document vectors
Scoring: Calculate similarity scores (0-1)
Ranking: Return top results by score

Project Detection¶

Projects are automatically detected by looking for LLM.md files:

/home/user/projects/
├── project-a/
│   ├── LLM.md          <- Project root detected
│   └── src/
├── project-b/
│   ├── LLM.md          <- Project root detected
│   └── lib/

Each project gets its own isolated vector index.

Best Practices¶

1. Index Before Searching¶

# Initial indexing
index(path="/project")

# Then search
vector_search(query="your query")

2. Use Appropriate Score Thresholds¶

# High precision (fewer results, more relevant)
vector_search(query="...", score_threshold=0.8)

# High recall (more results, some noise)
vector_search(query="...", score_threshold=0.5)

3. Combine with Keyword Search¶

# Semantic search for concepts
vector_search(query="error handling best practices")

# Keyword search for exact matches
grep(pattern="raise ValueError")

4. Filter by File Type¶

# Search only documentation
vector_search(query="setup instructions", file_filter="*.md")

# Search only code
vector_search(query="authentication", file_filter="*.py")

5. Use Project Scope for Speed¶

# Faster: search specific project
vector_search(query="...", search_scope="my-project")

# Slower: search all projects
vector_search(query="...", search_scope="all")

Comparison: Vector vs Keyword Search¶

Feature	vector_search	grep
Match type	Semantic similarity	Exact text pattern
Finds synonyms	Yes	No
Requires indexing	Yes	No
Speed (large corpus)	Fast (indexed)	Slower
Best for	Concepts, questions	Exact code, identifiers

Example:

# Semantic search finds conceptually related content
vector_search(query="how to handle errors")
# Finds: exception handling, try/catch blocks, error recovery

# Keyword search finds exact patterns
grep(pattern="except Exception")
# Finds: only lines with "except Exception"