Building Knowledge-Powered Agents with Azure AI Search: RAG, Hybrid Search, and Agentic Retrieval - Microsoft Ignite 2025
Microsoft Ignite 2025 - BRK193 provides a deep technical dive into building knowledge-powered agents using Azure AI Search's latest capabilities. This code-focused session demonstrates how to connect agents to diverse knowledge sources—SharePoint, web crawlers, Azure Blob Storage—and optimize retrieval performance through advanced query planning, hybrid search strategies, and semantic re-ranking. As organizations move beyond simple chatbots to agentic retrieval systems that autonomously plan queries, select knowledge sources, and iteratively refine results, Azure AI Search emerges as the foundational platform enabling this transformation. With new reasoning effort modes and Foundry IQ's unified integration layer, developers gain precise control over the cost-accuracy tradeoff in knowledge-intensive AI applications.
RAG Fundamentals: Grounding Agents in Organizational Knowledge
Retrieval-Augmented Generation (RAG) solves the fundamental limitation of language models: they only know what they learned during training. RAG combines retrieval (finding relevant documents from knowledge bases) with generation (synthesizing answers using retrieved context), enabling agents to answer questions about proprietary data, current information, and domain-specific content while providing citations for verification.
How RAG Works: The Three-Stage Pipeline
Query Planning & Reformulation
A user query may be ambiguous, vague, or poorly structured for search. The query planning agent analyzes intent, generates search-optimized queries, and determines which knowledge sources to query (a code sketch follows the example below).
Example:
User Query: "What did we decide about the cloud migration?"
Reformulated Queries:
- • "cloud migration decision meeting minutes"
- • "cloud provider selection criteria 2024"
- • "azure migration timeline board approval"
Retrieval & Re-Ranking
Execute queries against knowledge sources using hybrid search (keyword + vector), retrieve candidate documents (typically 10-50), then re-rank using semantic models to surface most relevant results (top 3-5).
Retrieval Stages:
- • Initial Search: Keyword (BM25) + Vector (cosine similarity) → 50 candidates
- • Fusion: Reciprocal Rank Fusion (RRF) combines keyword and vector scores
- • Re-Ranking: Semantic model evaluates query-document relevance → top 5 results
- • Filtering: Apply metadata filters (date range, department, permissions)
Answer Generation with Citations
Language model synthesizes answer using retrieved documents as context. Each claim in answer is linked to source document with specific page/section reference, enabling users to verify accuracy.
Generation Prompt Structure:
System: You are an assistant that answers questions using
only the provided documents. Always cite sources.
Context Documents:
[1] Board Meeting Minutes - 2024-03-15: "...approved Azure
migration with 18-month timeline..."
[2] Cloud Strategy Memo: "...AWS initially considered but
Azure selected for Microsoft 365 integration..."
User Question: What did we decide about cloud migration?
Answer with citations:
RAG vs. Fine-Tuning: Complementary Approaches
| Characteristic | RAG (Retrieval-Augmented Generation) | Fine-Tuning |
|---|---|---|
| Best For | Answering questions about specific documents, current data, frequently updated knowledge | Teaching model new vocabulary, domain terminology, output formatting styles |
| Knowledge Updates | Real-time: Update index, immediately available | Slow: Requires retraining (hours to days) |
| Traceability | High: Every answer cites source documents | Low: Knowledge embedded in model weights, no citations |
| Cost | Per-query: Retrieval + LLM inference ($0.01-0.10/query) | Upfront: Training cost ($500-10K), then standard inference |
| Typical Use Case | Internal knowledge base Q&A, document search, support chatbot | Medical terminology adoption, legal document formatting, brand voice |
Azure AI Search: The Foundation for Agentic Retrieval
Azure AI Search (formerly Azure Cognitive Search) provides enterprise-grade search infrastructure optimized for RAG applications. With support for hybrid search, semantic ranking, vector embeddings, and integrated skillsets for document processing, it serves as the knowledge retrieval engine for Azure OpenAI, Microsoft Copilot, and custom agentic applications.
Documents Indexed (Scale)
- • Petabyte-scale indexing capacity
- • 10K+ queries per second per replica
- • Automatic scaling and load balancing
- • 99.9% SLA for search availability
- • Geo-replication across Azure regions
Better Relevance (vs. Keyword Only)
- • Hybrid search: Keyword (BM25) + Vector (embeddings)
- • Semantic ranking with cross-encoder models
- • Query expansion and synonym handling
- • Relevance tuning with scoring profiles
- • A/B testing for search quality optimization
Data Source Connectors
- • Azure Blob Storage, Data Lake, Cosmos DB
- • SharePoint Online, OneDrive for Business
- • SQL databases (Azure SQL, SQL Server)
- • Web crawler for public/authenticated sites
- • Custom data sources via indexer API
Hybrid Search Architecture: Keyword + Vector + Semantic
Azure AI Search's hybrid approach combines three complementary search methods, each capturing different aspects of relevance. The fusion of these methods delivers superior results compared to any single technique.
🔤 Keyword Search (BM25)
Traditional full-text search using BM25 ranking algorithm. Excellent for exact term matches, acronyms, product codes, and queries where specific words matter. Fast (millisecond latency) and interpretable.
Strengths & Weaknesses:
✓ Strengths
- • Exact matches (model numbers, IDs)
- • Boolean logic (AND, OR, NOT)
- • Fast execution (<10ms typical)
- • No embedding computation needed
✗ Weaknesses
- • Misses semantic meaning ("car" ≠ "automobile")
- • Requires exact word matches
- • Poor on paraphrased queries
- • No understanding of context
🎯 Vector Search (Embeddings)
Documents and queries converted to high-dimensional vectors (embeddings) using models like text-embedding-3-large. Search finds documents with vectors closest to query vector (cosine similarity). Captures semantic meaning regardless of exact wording.
Strengths & Weaknesses:
✓ Strengths
- • Semantic understanding ("car" ~ "automobile")
- • Works with paraphrases and synonyms
- • Cross-lingual retrieval (query EN, doc SV)
- • Contextual relevance
✗ Weaknesses
- • May miss exact term matches
- • Embedding computation cost
- • Slower than keyword (50-100ms)
- • Requires vector index storage
🧠 Semantic Re-Ranking
After initial retrieval (keyword + vector → top 50 results), semantic ranker applies cross-encoder model to compute precise relevance score for each query-document pair. Promotes most relevant results to top 3-5 positions.
How It Works:
- • Takes query + document text as input to transformer model
- • Computes query-document relevance score (0-1) using attention mechanism
- • Reorders initial results by semantic relevance score
- • Typical improvement: 15-35% better top-3 relevance vs. hybrid alone
- • Cost: ~20ms additional latency, negligible cost per query
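In the Python SDK, the three methods correspond to different parameters of the same SearchClient.search call. A minimal sketch, assuming an index with a contentVector field, a semantic configuration named "default", and the service/key names shown (all illustrative):

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

# Assumed service details -- replace with your own
search_client = SearchClient(
    endpoint="https://my-knowledge-search.search.windows.net",
    index_name="documents-index",
    credential=AzureKeyCredential("<search-query-key>"),
)
openai_client = AzureOpenAI(
    azure_endpoint="https://my-openai.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-06-01",
)

query = "automobile maintenance schedule"
query_vector = openai_client.embeddings.create(
    model="text-embedding-3-large", input=query
).data[0].embedding

# 1. Keyword-only (BM25): exact terms, Boolean operators
keyword_results = search_client.search(search_text=query, top=5)

# 2. Vector-only: semantic similarity against the contentVector field
vector_results = search_client.search(
    search_text=None,
    vector_queries=[VectorizedQuery(vector=query_vector,
                                    k_nearest_neighbors=50,
                                    fields="contentVector")],
    top=5,
)

# 3. Hybrid + semantic re-ranking: RRF fusion of both, then cross-encoder re-rank
hybrid_results = search_client.search(
    search_text=query,
    vector_queries=[VectorizedQuery(vector=query_vector,
                                    k_nearest_neighbors=50,
                                    fields="contentVector")],
    query_type="semantic",
    semantic_configuration_name="default",  # assumed semantic config name
    top=5,
)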
Reciprocal Rank Fusion (RRF): Combining Keyword and Vector Scores
Keyword and vector search produce different relevance scores (BM25 score vs. cosine similarity) that can't be directly compared. Reciprocal Rank Fusion elegantly combines rankings without needing score normalization.
RRF Formula and Example
Formula:
RRF_Score(doc) = Σ [ 1 / (k + rank_i(doc)) ]
where:
- k = constant (typically 60)
- rank_i(doc) = rank of document in search method i
- Σ sums across all search methods (keyword, vector)
Example Calculation:
| Document | Keyword Rank | Vector Rank | RRF Score | Final Rank |
|---|---|---|---|---|
| Doc A | 1 | 5 | 1/(60+1) + 1/(60+5) = 0.0318 | 2 |
| Doc B | 3 | 2 | 1/(60+3) + 1/(60+2) = 0.0320 | 1 |
| Doc C | 2 | 8 | 1/(60+2) + 1/(60+8) = 0.0308 | 4 |
| Doc D | 7 | 1 | 1/(60+7) + 1/(60+1) = 0.0313 | 3 |
Result: Doc B wins despite not ranking #1 in either method, because it ranks high in both (balanced relevance). This demonstrates RRF's strength: rewarding consistency across multiple signals.
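A short Python sketch of the formula that reproduces the scores and ranks in the table above:

def rrf_scores(rankings: dict[str, dict[str, int]], k: int = 60) -> dict[str, float]:
    """Combine per-method ranks into a single RRF score per document."""
    scores: dict[str, float] = {}
    for method_ranks in rankings.values():
        for doc, rank in method_ranks.items():
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores

rankings = {
    "keyword": {"Doc A": 1, "Doc B": 3, "Doc C": 2, "Doc D": 7},
    "vector":  {"Doc A": 5, "Doc B": 2, "Doc C": 8, "Doc D": 1},
}

for doc, score in sorted(rrf_scores(rankings).items(), key=lambda x: -x[1]):
    print(f"{doc}: {score:.4f}")
# Doc B: 0.0320, Doc A: 0.0318, Doc D: 0.0313, Doc C: 0.0308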
🇸🇪 Technspire Perspective: Swedish Manufacturing Company
Västerås-based industrial equipment manufacturer (1,200 employees, 45,000 technical documents) deployed Azure AI Search for internal knowledge base powering engineering support chatbot. Previous SharePoint search had 28% user satisfaction due to poor relevance.
Implementation Details
- Data Sources: 45,000 PDFs (technical specs, CAD drawings, maintenance manuals), SharePoint (project wikis, design decisions), SQL database (part specifications). Total indexed content: 2.8TB.
- Hybrid Search Configuration: Keyword (BM25) with Swedish analyzer + Vector search (text-embedding-3-large, 3072 dimensions) + Semantic re-ranking. RRF with k=60 for score fusion.
- Query Enhancement: Query expansion using domain-specific synonyms (e.g., "lager" → "bearing", "ventil" → "valve"). Multilingual support (Swedish queries search English documents).
- Answer Generation: GPT-4o generates answers with citations. Average 3.2 source documents cited per answer. Engineers can click to view exact PDF page referenced.
- Relevance Tuning: Scoring profile boosts recent documents (+20% if <6 months old), frequently accessed docs (+10%), and documents matching user's department (+15%).
- Results: 42,000 queries/month, 84% satisfaction, 6.8-min avg search time (from 18 min), 92% citation accuracy, -48% support tickets, 38× ROI.
Knowledge Sources: Connecting Agents to Data
BRK193 demonstrated connecting agents to three critical enterprise knowledge sources: SharePoint Online, web content, and Azure Blob Storage. Each source requires different indexing strategies, permission handling, and update cadences.
📁 SharePoint Online Integration
SharePoint is the most common enterprise knowledge repository, containing project documentation, team wikis, policies, and collaboration spaces. Azure AI Search's SharePoint connector handles authentication, incremental indexing, and permission preservation.
Setup Steps
- Register Azure AD app with SharePoint API permissions
- Create data source in Azure AI Search pointing to SharePoint site(s)
- Define indexer with field mappings (title, content, metadata)
- Configure skillset for document cracking (PDF, Office docs)
- Schedule incremental updates (every 15 min typical)
- Enable security trimming for permission-aware search
Key Capabilities
- • Permission Preservation: Users only see documents they have access to
- • Metadata Extraction: Author, modified date, file type, SharePoint taxonomy
- • Incremental Updates: Only index changed documents (change tracking)
- • Document Processing: Extract text from PDF, DOCX, PPTX, XLSX
- • Multi-Site Support: Index multiple SharePoint sites into single search index
Code Sample: SharePoint Data Source
{
"name": "sharepoint-datasource",
"type": "sharepoint",
"credentials": {
"connectionString": "SharePointOnlineEndpoint=https://contoso.sharepoint.com;..."
},
"container": {
"name": "defaultSiteLibrary",
"query": "/sites/Engineering"
},
"dataChangeDetectionPolicy": {
"@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
"highWaterMarkColumnName": "_metadata_storage_last_modified"
}
}
🌐 Web Crawler for Public and Authenticated Sites
Web crawler indexer enables agents to search public websites (documentation, support articles, forums) or authenticated internal sites. Supports sitemap-based crawling, robots.txt compliance, and custom crawl rules.
Crawling Strategies
- Sitemap-Based: Provide sitemap.xml URL, crawler discovers all pages automatically. Best for documentation sites with comprehensive sitemaps.
- Seed URLs with Depth Limit: Start from seed URLs, follow links up to N hops deep. Use for sites without sitemaps or to limit scope.
- URL Pattern Filtering: Include/exclude regex patterns. Example: include /docs/*, exclude /archive/* (a filtering sketch follows this list)
- Content Type Filtering: Index only HTML, PDF, or specific MIME types
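The include/exclude logic can be expressed with a couple of regular expressions. A minimal, generic sketch (this is illustrative Python, not the crawler's actual configuration syntax):

import re

# Illustrative rules: index pages under /docs/ but skip anything under /archive/
INCLUDE_PATTERNS = [re.compile(r"^https://contoso\.com/docs/")]
EXCLUDE_PATTERNS = [re.compile(r"/archive/")]

def should_crawl(url: str) -> bool:
    """Return True if the URL matches an include rule and no exclude rule."""
    if any(p.search(url) for p in EXCLUDE_PATTERNS):
        return False
    return any(p.search(url) for p in INCLUDE_PATTERNS)

# should_crawl("https://contoso.com/docs/setup")      -> True
# should_crawl("https://contoso.com/docs/archive/v1") -> False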
Authentication & Compliance
- Authentication: Support for basic auth, OAuth, API keys, custom headers for authenticated crawling
- Robots.txt: Respects robots.txt directives, honors crawl delays
- Rate Limiting: Configurable request rate (1-10 requests/sec) to avoid overwhelming sites
- Freshness: Incremental re-crawl based on Last-Modified headers or fixed schedule
Use Case: Documentation Search Agent
Company indexes Microsoft Learn docs, Azure documentation, and internal Confluence wiki via web crawler. Agent answers questions about Azure services with citations to official documentation. Weekly re-crawl keeps content current. Result: 78% of developer questions answered without leaving IDE, -52% internal support requests.
💾 Azure Blob Storage: Unstructured Data at Scale
Azure Blob Storage is ideal for large-scale document repositories: scanned PDFs, audio transcripts, images with OCR, JSON/CSV datasets. Blob indexer supports all Azure Storage features (hot/cool tiers, lifecycle management, soft delete).
Document Processing Skillset
- • Text Extraction: PDF, Office docs, plain text
- • OCR: Extract text from scanned images/PDFs (90+ languages)
- • Entity Recognition: Extract people, organizations, locations, dates
- • Key Phrase Extraction: Identify main topics and concepts
- • Language Detection: Auto-detect document language
- • Custom Skills: Call Azure Functions for domain-specific processing
Metadata and Organization
- • Blob Metadata: Index custom metadata tags (department, classification, version)
- • Folder Hierarchy: Use blob path as filterable field (e.g., /legal/contracts/2024/)
- • Container Partitioning: Separate indexes for different security zones or departments
- • Change Detection: Automatic reindexing on blob create/update/delete
- • Soft Delete Support: Removed blobs automatically removed from index
Code Sample: Blob Indexer with OCR
{
"name": "blob-indexer",
"dataSourceName": "blob-datasource",
"targetIndexName": "documents-index",
"skillsetName": "document-processing-skillset",
"parameters": {
"batchSize": 50,
"configuration": {
"dataToExtract": "contentAndMetadata",
"imageAction": "generateNormalizedImages",
"parsingMode": "default"
}
},
"fieldMappings": [
{ "sourceFieldName": "metadata_storage_path", "targetFieldName": "id" },
{ "sourceFieldName": "metadata_storage_name", "targetFieldName": "fileName" }
],
"outputFieldMappings": [
{ "sourceFieldName": "/document/merged_content", "targetFieldName": "content" },
{ "sourceFieldName": "/document/organizations", "targetFieldName": "organizations" }
]
}
Agentic Retrieval: Query Planning and Knowledge Source Selection
Unlike traditional search where users manually select sources and refine queries, agentic retrieval systems autonomously plan multi-step search strategies. Agents analyze user intent, determine which knowledge sources to query, generate optimized queries for each source, and merge results intelligently.
Agentic Retrieval Workflow
Intent Analysis
Agent analyzes user query to understand: (1) information need type (factual Q&A, how-to, comparison, analysis), (2) domain/topic, (3) required recency (historical vs. current), (4) expected answer format.
Example:
Query: "How does our security policy compare to industry best practices?"
Intent: Comparison task, requires internal policy docs + external best practice references, answer format: structured comparison table
Knowledge Source Selection
Agent determines which sources to query based on intent. Internal sources (SharePoint, databases) for organizational data. External sources (web, public datasets) for industry benchmarks. May query multiple sources in parallel.
Decision Logic:
- • Internal policy → SharePoint index (site: /policies)
- • Industry best practices → Web search (NIST, ISO standards sites)
- • Parallel execution: Both searches run simultaneously
Query Generation & Execution
Agent generates source-specific queries optimized for each knowledge base. Internal search may use metadata filters (department, classification). Web search uses natural language queries for search engines.
Generated Queries:
- • SharePoint: "information security policy" + filter: path=/policies, modified>2023-01-01
- • Web: "NIST cybersecurity framework best practices 2024"
- • Web: "ISO 27001 information security controls checklist"
Result Merging & Deduplication
Agent receives results from multiple sources (total 15-30 documents typically). Merges results by relevance, removes duplicates, and groups by theme. May perform additional filtering or re-ranking based on user context (a code sketch of this step follows the workflow).
Merging Strategy:
- • Deduplicate: Same document from multiple sources → keep highest-scoring instance
- • Diversity: Ensure results include both internal and external perspectives
- • Thematic grouping: Cluster results by sub-topic (access control, encryption, audit logging)
Synthesis & Answer Generation
Agent synthesizes final answer using merged results as context. Structures response according to identified format (comparison table in this case). Cites sources inline, distinguishing internal policies from external standards.
Answer Format:
Structured comparison table with columns: [Security Control, Our Policy, NIST Recommendation, ISO 27001 Control, Gap Analysis]. Each cell includes citations to source documents. Summary paragraph highlights major gaps and strengths.
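A minimal sketch of the merge-and-deduplicate step (step 4 above), assuming each result carries an id, a relevance score, a source label, and a theme tag; the field names are illustrative:

from collections import defaultdict

def merge_results(result_lists: list[list[dict]],
                  top_k: int = 10) -> tuple[list[dict], dict[str, list[dict]]]:
    """Deduplicate by document id (keep the highest-scoring copy), sort by
    relevance, then group the merged results by theme for structured answers."""
    best: dict[str, dict] = {}
    for results in result_lists:              # one list per knowledge source
        for doc in results:
            current = best.get(doc["id"])
            if current is None or doc["score"] > current["score"]:
                best[doc["id"]] = doc
    merged = sorted(best.values(), key=lambda d: d["score"], reverse=True)[:top_k]

    by_theme: dict[str, list[dict]] = defaultdict(list)
    for doc in merged:
        by_theme[doc.get("theme", "other")].append(doc)
    return merged, dict(by_theme)

# Example: merge_results([internal_docs, web_docs]) where each doc looks like
# {"id": ..., "score": ..., "source": "internal" or "web", "theme": ...}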
Indexed vs. Remote Knowledge Sources
Agentic retrieval systems combine two types of knowledge sources, each with distinct characteristics and use cases.
📚 Indexed Knowledge Sources
Pre-indexed in Azure AI Search. Documents ingested, processed, embedded, and indexed before query time. Fast retrieval (50-200ms) but requires upfront indexing and storage.
Characteristics:
- • Latency: Fast (50-200ms query time)
- • Cost: Index storage + compute, predictable per-query cost
- • Freshness: Depends on indexing cadence (15 min to daily)
- • Scale: Excellent for large corpora (millions of documents)
- • Control: Full control over ranking, filtering, faceting
Best For:
- • Internal knowledge bases (SharePoint, wikis)
- • Product documentation and manuals
- • Historical data and archives
- • Structured databases (product catalogs)
🌐 Remote Knowledge Sources
Queried in real-time via APIs (Bing Search, external databases, live systems). No pre-indexing required. Higher latency (500-2000ms) but always current.
Characteristics:
- • Latency: Slower (500-2000ms, depends on API)
- • Cost: Per-API-call pricing (e.g., Bing Search $5/1K queries)
- • Freshness: Real-time, always current information
- • Scale: Limited by API rate limits and cost
- • Control: Limited to API capabilities, less customization
Best For:
- • Current news and events (Bing News API)
- • Real-time data (stock prices, weather)
- • External knowledge (public websites, forums)
- • Systems of record (CRM, ERP via APIs)
Reasoning Effort Modes: Balancing Cost, Latency, and Accuracy
BRK193 revealed three reasoning effort modes that control how aggressively agents search for information and refine results. These modes provide developers with a dial to tune the cost-accuracy tradeoff based on application requirements.
Minimal Effort Mode
Fast, cost-optimized retrieval for simple queries
Behavior
- • Single query to knowledge source
- • No query reformulation
- • Top 3-5 results returned
- • No iterative retrieval
- • Basic hybrid search
Performance
- • Latency: 200-500ms
- • Cost: $0.01-0.03/query
- • Accuracy: 70-80% (simple queries)
- • Retrieval rate: ~1 query/request
Best For
- • FAQ chatbots
- • Simple factual questions
- • High-volume applications
- • Cost-sensitive scenarios
Low Effort Mode
Balanced approach with query expansion and multiple sources
Behavior
- • Query expansion (2-3 variants)
- • Multi-source search (indexed + remote)
- • Top 10 results, re-ranked
- • Result deduplication and merging
- • Hybrid search + semantic ranking
Performance
- • Latency: 800-1500ms
- • Cost: $0.05-0.12/query
- • Accuracy: 82-90% (moderate complexity)
- • Retrieval rate: ~3 queries/request
Best For
- • General-purpose assistants
- • Multi-domain questions
- • Internal knowledge + web research
- • Most production applications
Medium Effort Mode
Comprehensive retrieval with iterative refinement and semantic classification
Behavior
- • Semantic query classification
- • Multi-hop retrieval (iterative)
- • Completeness verification
- • Gap identification → additional queries
- • Top 15-20 results, multi-stage ranking
Performance
- • Latency: 2000-5000ms
- • Cost: $0.15-0.35/query
- • Accuracy: 92-96% (complex queries)
- • Retrieval rate: ~5-8 queries/request
Best For
- • Complex research questions
- • High-stakes decisions
- • Comprehensive analysis
- • Expert-level assistants
Semantic Classification & Iterative Retrieval
Medium effort mode introduces semantic classification: the agent analyzes retrieved results to identify information gaps, then generates follow-up queries to fill those gaps. This iterative process continues until the agent determines the answer is complete or an iteration limit is reached (a simplified sketch follows the example below).
Example Iteration:
- Initial Query: "Compare cloud providers for healthcare workloads" → retrieves general comparison articles
- Gap Analysis: Agent identifies missing: HIPAA compliance details, specific healthcare use cases
- Refinement Query 1: "Azure AWS GCP HIPAA compliance healthcare" → retrieves compliance documentation
- Gap Analysis: Still missing: pricing for healthcare-typical workloads
- Refinement Query 2: "cloud cost healthcare PACS imaging storage" → retrieves pricing case studies
- Completeness Check: Agent verifies all aspects covered → synthesizes comprehensive answer
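A simplified sketch of this gap-driven loop, assuming a search(query) helper that returns text snippets and an Azure OpenAI chat deployment named gpt-4o; the prompt wording and iteration limit are illustrative:

import json
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-openai.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-06-01",
)

def iterative_retrieve(question: str, search, max_iterations: int = 3) -> list[str]:
    """Retrieve, check for gaps, and issue refinement queries until complete."""
    snippets = search(question)
    for _ in range(max_iterations):
        gap_check = client.chat.completions.create(
            model="gpt-4o",  # deployment name (assumption)
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": (
                    "Given a question and retrieved snippets, decide whether the "
                    "snippets fully answer it. Return JSON: "
                    '{"complete": true/false, "followup_query": "..."}'
                )},
                {"role": "user", "content": json.dumps(
                    {"question": question, "snippets": snippets})},
            ],
        )
        verdict = json.loads(gap_check.choices[0].message.content)
        if verdict["complete"]:
            break
        snippets += search(verdict["followup_query"])  # fill the identified gap
    return snippets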
Choosing the Right Reasoning Effort Mode
| Application Scenario | Query Complexity | Volume | Recommended Mode | Rationale |
|---|---|---|---|---|
| FAQ Chatbot | Simple, well-defined | Very High (10K+/day) | Minimal | Cost optimization critical, queries predictable |
| Employee Help Desk | Moderate, varied topics | Medium (1K-5K/day) | Low | Balance between cost and coverage, multi-source common |
| Research Assistant | High, open-ended | Low (50-200/day) | Medium | Completeness matters, users expect thorough answers |
| Legal Document Analysis | Very high, multi-faceted | Low (10-50/day) | Medium | High stakes, missing information unacceptable |
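In practice, routing can be a lightweight classifier placed in front of the retrieval pipeline. A hedged sketch using simple heuristics; the cue words and thresholds are illustrative, not from the session:

from enum import Enum

class EffortMode(Enum):
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"

COMPARISON_CUES = ("compare", "versus", "vs", "difference", "trade-off")
ANALYSIS_CUES = ("why", "evidence", "differential", "implications", "analyze")

def choose_effort(query: str) -> EffortMode:
    """Heuristic router: escalate effort with query length and analytical cues."""
    q = query.lower()
    if any(cue in q for cue in ANALYSIS_CUES) or len(q.split()) > 20:
        return EffortMode.MEDIUM
    if any(cue in q for cue in COMPARISON_CUES) or len(q.split()) > 8:
        return EffortMode.LOW
    return EffortMode.MINIMAL

# choose_effort("Opening hours for IT support?")             -> MINIMAL
# choose_effort("Compare our VPN options for remote sites")  -> LOW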
🇸🇪 Technspire Perspective: Swedish Healthcare Provider
Örebro-based regional healthcare provider (8 hospitals, 2,400 clinicians) deployed clinical knowledge assistant using Azure AI Search with reasoning effort modes. Low effort for routine queries (medication lookup), medium effort for complex diagnostic support.
Mode Selection Logic & Results
- Knowledge Sources: Internal clinical guidelines (3,200 documents), UpToDate medical reference, PubMed via API, Swedish national drug formulary (FASS). Total indexed content: 280K documents.
- Query Classification: Automatic classifier routes queries: (1) Medication/dosing lookup → Minimal effort, FASS only, (2) General clinical questions → Low effort, guidelines + UpToDate, (3) Differential diagnosis/complex cases → Medium effort, all sources + iterative retrieval.
- Low Effort Queries (68%): Examples: "First-line treatment for hypertension in pregnancy", "Normal pediatric vital signs age 5". Avg latency 1.2s, cost SEK 0.08/query, 91% accuracy.
- Medium Effort Queries (32%): Examples: "Differential diagnosis fever + rash + joint pain recent travel SE Asia", "Evidence for biologic therapy psoriatic arthritis contraindications". Avg latency 3.8s, cost SEK 0.28/query, 94% accuracy. Iterative retrieval resolves initial gaps (avg 2.3 refinement queries).
- Citation Verification: Medical librarian spot-checks 5% of answers monthly. 96% citation accuracy (answer claims match cited sources). 4% errors primarily outdated guidelines (fixed with more frequent reindexing).
- Results: 128K queries in 12 months, 92% overall accuracy, 3.8-min avg retrieval time, 89% clinician satisfaction, SEK 42M time savings, 58× ROI.
Foundry IQ & MCP Protocol: Unified Knowledge Integration
Foundry IQ introduces a unified abstraction layer for connecting agents to diverse knowledge sources using the Model Context Protocol (MCP). Rather than building custom integrations for each data source, developers define MCP-compatible connectors that agents can discover and use dynamically.
Model Context Protocol (MCP) Overview
MCP standardizes how agents request information from knowledge sources. Each source exposes capabilities via MCP interface: available search methods, supported filters, data format. Agents query using MCP commands, receive structured responses.
MCP Benefits
- Standardization: Single interface for all knowledge sources—SharePoint, databases, APIs, web. Reduces integration complexity from O(n²) to O(n).
- Dynamic Discovery: Agents discover available sources and capabilities at runtime. New sources added without agent code changes.
- Composability: Agents combine multiple MCP sources in single workflow. Example: query internal docs + external API + database in parallel.
- Permission Awareness: MCP includes user context, sources enforce permissions. User only retrieves data they're authorized to access.
MCP Connector Structure
{
"name": "sharepoint-engineering",
"type": "mcp-connector",
"capabilities": {
"search": {
"methods": ["keyword", "vector", "hybrid"],
"filters": ["path", "fileType", "modifiedDate"],
"maxResults": 50
},
"retrieve": {
"supportedFormats": ["text", "markdown", "raw"]
}
},
"authentication": {
"type": "oauth2",
"scopes": ["Sites.Read.All"]
},
"metadata": {
"description": "Engineering documentation from SharePoint",
"dataClassification": "internal",
"updateFrequency": "15min"
}
}
Foundry IQ Workflow: Agent-Driven Knowledge Selection
1. User Query Received
Agent receives query: "What's our current Azure spending trend and how does it compare to budget forecasts?"
2. Source Discovery via MCP
Agent queries Foundry IQ for available MCP connectors matching query intent. Finds:
- • Azure Cost Management API (current spending data)
- • SharePoint Finance folder (budget documents)
- • Power BI API (existing cost analysis reports)
3. Parallel MCP Queries
Agent issues parallel queries to each source via MCP:
Azure Cost API: GET /subscriptions/{id}/costAnalysis?timeframe=last90days
SharePoint: SEARCH "FY2025 budget forecast Azure cloud" + filter: path=/finance
Power BI: GET /reports?filter=name contains 'Azure Cost'
4. Result Synthesis
Agent receives structured data from all sources, synthesizes: "Current Azure spending $142K/month (↑18% vs. Q3). FY2025 budget forecasts $1.68M annual. Current trend projects $1.704M actual (101% of budget). Recommend cost optimization review."
5. Citations & Provenance
Agent includes citations: [1] Azure Cost Management (accessed 2025-01-24), [2] FY2025 IT Budget.xlsx (SharePoint), [3] Azure Cost Trends Report (Power BI). User can verify each claim.
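Step 3's parallel fan-out can be expressed with asyncio. A sketch with three hypothetical async connector stubs standing in for the MCP calls; real implementations would call the Azure Cost Management API, Azure AI Search, and the Power BI REST API:

import asyncio

# Hypothetical connector stubs -- return shapes are illustrative
async def query_cost_api(timeframe: str) -> dict:
    return {"source": "Azure Cost Management", "monthly_spend_usd": 142_000}

async def query_sharepoint(query: str) -> dict:
    return {"source": "SharePoint /finance", "doc": "FY2025 IT Budget.xlsx"}

async def query_power_bi(report_filter: str) -> dict:
    return {"source": "Power BI", "report": "Azure Cost Trends"}

async def gather_knowledge() -> list[dict]:
    """Issue all source queries concurrently and collect structured results."""
    return await asyncio.gather(
        query_cost_api("last90days"),
        query_sharepoint("FY2025 budget forecast Azure cloud"),
        query_power_bi("name contains 'Azure Cost'"),
    )

results = asyncio.run(gather_knowledge())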
Implementation Guide: Building Your First Knowledge Agent
This step-by-step guide walks through building a production-ready knowledge agent using Azure AI Search, Azure OpenAI, and the patterns demonstrated in BRK193.
Create Azure AI Search Service & Index (Day 1)
Provision search service, define index schema with fields for content, metadata, and vector embeddings.
Azure CLI Commands
# Create search service (Standard tier for hybrid search)
az search service create \
--name my-knowledge-search \
--resource-group knowledge-rg \
--sku Standard \
--location westeurope
# Create index with hybrid search fields
# (Use Azure Portal or REST API for detailed schema)
# Key fields: id, title, content, contentVector (1536 or 3072 dims),
# metadata (author, date, department), url
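Alternatively, the index schema can be defined programmatically with the azure-search-documents Python SDK (version 11.4 or later assumed; field, profile, and configuration names below are illustrative):

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration, SearchableField, SearchField, SearchFieldDataType,
    SearchIndex, SemanticConfiguration, SemanticField, SemanticPrioritizedFields,
    SemanticSearch, SimpleField, VectorSearch, VectorSearchProfile,
)

client = SearchIndexClient("https://my-knowledge-search.search.windows.net",
                           AzureKeyCredential("<search-admin-key>"))

index = SearchIndex(
    name="documents-index",
    fields=[
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="title", type=SearchFieldDataType.String),
        SearchableField(name="content", type=SearchFieldDataType.String),
        SimpleField(name="url", type=SearchFieldDataType.String),
        # metadata fields (author, date, department) omitted for brevity
        SearchField(
            name="contentVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=3072,  # text-embedding-3-large
            vector_search_profile_name="vector-profile",
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw")],
        profiles=[VectorSearchProfile(name="vector-profile",
                                      algorithm_configuration_name="hnsw")],
    ),
    semantic_search=SemanticSearch(configurations=[
        SemanticConfiguration(
            name="default",
            prioritized_fields=SemanticPrioritizedFields(
                title_field=SemanticField(field_name="title"),
                content_fields=[SemanticField(field_name="content")],
            ),
        ),
    ]),
)
client.create_or_update_index(index)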
Estimated Time: 30 minutes (service provisioning + index creation)
Set Up Data Sources & Indexers (Days 1-2)
Configure SharePoint, Blob, or web crawler data sources. Create indexers with skillsets for document processing and embedding generation.
Example: SharePoint Data Source + Indexer
# Create SharePoint data source
POST https://[servicename].search.windows.net/datasources?api-version=2024-07-01
{
"name": "sharepoint-docs",
"type": "sharepoint",
"credentials": {
"connectionString": "SharePointOnlineEndpoint=https://contoso.sharepoint.com;ApplicationId=...;TenantId=..."
},
"container": { "name": "defaultSiteLibrary", "query": "/sites/KnowledgeBase" }
}
# Create skillset with text extraction + embedding generation
# Skillset includes: DocumentExtraction skill, SplitText skill (chunking),
# AzureOpenAIEmbedding skill (generate contentVector)
# Create indexer to run skillset and populate index
# Schedule: every 15 minutes for near-real-time updates
Estimated Time: 2-4 hours (configuration + initial indexing)
Deploy Azure OpenAI for Embeddings & Generation (Day 2)
Provision Azure OpenAI service, deploy embedding model (text-embedding-3-large) and generation model (GPT-4o).
Deployment Steps
# Create Azure OpenAI resource
az cognitiveservices account create \
--name my-openai \
--resource-group knowledge-rg \
--kind OpenAI \
--sku S0 \
--location westeurope
# Deploy models via Azure OpenAI Studio
# 1. text-embedding-3-large (for query and document embeddings)
# 2. gpt-4o (for answer generation)
Estimated Time: 1 hour (provisioning + model deployment)
Build RAG Application with Hybrid Search (Days 3-5)
Implement application code: query embedding generation, hybrid search execution, result processing, answer generation with citations.
Python Code Sample (Simplified)
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

# Initialize clients (replace placeholder endpoints and keys with your own)
search_client = SearchClient(
    endpoint="https://my-knowledge-search.search.windows.net",
    index_name="documents-index",
    credential=AzureKeyCredential("<search-query-key>"),
)
openai_client = AzureOpenAI(
    azure_endpoint="https://my-openai.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-06-01",
)

def answer_question(query: str) -> dict:
    # 1. Generate query embedding
    query_vector = openai_client.embeddings.create(
        model="text-embedding-3-large",
        input=query
    ).data[0].embedding

    # 2. Hybrid search (keyword + vector fused with RRF) + semantic re-ranking
    results = list(search_client.search(
        search_text=query,  # keyword search
        vector_queries=[VectorizedQuery(
            vector=query_vector,
            k_nearest_neighbors=50,
            fields="contentVector"
        )],
        query_type="semantic",  # enable semantic ranking
        semantic_configuration_name="default",  # semantic config defined on your index
        top=5
    ))

    # 3. Build context from top results
    context = "\n\n".join(f"[{i+1}] {r['title']}: {r['content']}"
                          for i, r in enumerate(results))

    # 4. Generate answer with citations
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using provided documents. Always cite sources."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
        ]
    )
    return {
        "answer": completion.choices[0].message.content,
        "sources": [{"title": r["title"], "url": r["url"]} for r in results]
    }
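A quick usage example (the question is illustrative):

result = answer_question("What is our approved cloud migration timeline?")
print(result["answer"])
for source in result["sources"]:
    print(f"- {source['title']} ({source['url']})")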
Estimated Time: 3-5 days (development + testing)
Test, Tune, and Deploy (Days 6-10)
Evaluate answer quality on test questions, tune ranking parameters, optimize costs, and deploy to production.
Quality Evaluation & Optimization
- Test Set: Create 50-100 representative questions with ground truth answers
- Metrics: Measure accuracy (answer correctness), citation accuracy (claims match sources), latency, cost per query (a simple harness sketch follows this list)
- Relevance Tuning: Adjust RRF k parameter (default 60), semantic ranker thresholds, scoring profile weights
- Cost Optimization: Use GPT-4o-mini for simple queries (classifier routes), cache embeddings for common queries
- Deployment: Azure App Service, Container Apps, or AKS depending on scale requirements
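A minimal evaluation harness over such a test set, reusing answer_question from the previous step. The CSV columns, module name, and keyword-match grading are illustrative; production evaluation typically uses human review or LLM-based grading:

import csv
import time

from rag_app import answer_question  # the function from step 4 (module name is illustrative)

def evaluate(test_set_path: str) -> None:
    """Run test questions through answer_question and report hit rate and latency."""
    hits, latencies = 0, []
    with open(test_set_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # assumed columns: question, expected_phrase
    for row in rows:
        start = time.perf_counter()
        result = answer_question(row["question"])
        latencies.append(time.perf_counter() - start)
        if row["expected_phrase"].lower() in result["answer"].lower():
            hits += 1
    print(f"Accuracy (keyword match): {hits / len(rows):.0%}")
    print(f"Avg latency: {sum(latencies) / len(latencies):.2f}s")

# evaluate("test_questions.csv")  # hypothetical file with 50-100 questions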
Estimated Time: 5-7 days (evaluation + tuning + deployment)
Conclusion: The Knowledge-Powered Agent Future
Microsoft Ignite 2025 BRK193 demonstrated that the next generation of AI agents will be defined not by model size or parameter counts, but by their ability to access, reason over, and synthesize knowledge from diverse sources. Azure AI Search's hybrid search, semantic ranking, and agentic retrieval capabilities provide the foundation for building agents that deliver accurate, cited, and trustworthy answers grounded in organizational knowledge.
Four Key Takeaways for Developers
1. Hybrid Search is Non-Negotiable
Keyword-only or vector-only search each miss critical relevance signals. Hybrid search with RRF fusion and semantic re-ranking delivers 2-3× better top-result relevance. Always implement all three layers.
2. Reasoning Effort Modes Enable Cost Control
Not every query needs comprehensive retrieval. Use minimal effort for FAQs, low effort for general questions, medium effort for complex analysis. Classifier-based routing cuts costs 40-60% vs. always using highest effort.
3. Agentic Retrieval Beats Static Pipelines
Agents that autonomously plan queries, select sources, and iteratively refine results deliver 15-25% better answer completeness than fixed retrieval pipelines. Invest in query planning and gap detection logic.
4. Citations Build Trust and Enable Verification
Every claim in agent responses should cite source documents with specific page/section references. Users verify 12-18% of citations in enterprise applications—accuracy directly impacts adoption.
The Road Ahead: From Search to Understanding
As knowledge agents evolve, the distinction between "search" and "understanding" will blur. Future agents won't just retrieve documents—they'll synthesize insights across thousands of sources, identify contradictions, track provenance through citation chains, and proactively surface relevant knowledge users didn't know to ask for. The infrastructure demonstrated in BRK193—hybrid search, agentic retrieval, MCP integration—lays the foundation for this intelligent knowledge layer.
Organizations investing in Azure AI Search today build not just better search experiences, but the knowledge infrastructure that will power the next decade of AI innovation.
🚀 Ready to Build Knowledge-Powered Agents?
Technspire helps Swedish organizations design and deploy production-ready RAG applications using Azure AI Search and Azure OpenAI. Our expertise spans hybrid search optimization, agentic retrieval architecture, and reasoning effort mode tuning—delivering measurable improvements in answer accuracy, retrieval speed, and cost efficiency.
Contact us for a complimentary knowledge architecture assessment and custom RAG implementation roadmap.