Building Knowledge-Powered Agents with Azure AI Search: RAG, Hybrid Search, and Agentic Retrieval - Microsoft Ignite 2025
Microsoft Ignite 2025 - BRK193 provides a deep technical dive into building knowledge-powered agents using Azure AI Search's latest capabilities. This code-focused session demonstrates how to connect agents to diverse knowledge sources—SharePoint, web crawlers, Azure Blob Storage—and optimize retrieval performance through advanced query planning, hybrid search strategies, and semantic re-ranking. As organizations move beyond simple chatbots to agentic retrieval systems that autonomously plan queries, select knowledge sources, and iteratively refine results, Azure AI Search emerges as the foundational platform enabling this transformation. With new reasoning effort modes and Foundry IQ's unified integration layer, developers gain precise control over the cost-accuracy tradeoff in knowledge-intensive AI applications.
RAG Fundamentals: Grounding Agents in Organizational Knowledge
Retrieval-Augmented Generation (RAG) solves the fundamental limitation of language models: they only know what they learned during training. RAG combines retrieval (finding relevant documents from knowledge bases) with generation (synthesizing answers using retrieved context), enabling agents to answer questions about proprietary data, current information, and domain-specific content while providing citations for verification.
How RAG Works: The Three-Stage Pipeline
Query Planning & Reformulation
A user query may be ambiguous, vague, or poorly structured for search. The query planning agent analyzes intent, generates search-optimized queries, and determines which knowledge sources to query (a code sketch follows the example below).
Example:
User Query: "What did we decide about the cloud migration?"
Reformulated Queries:
- • "cloud migration decision meeting minutes"
- • "cloud provider selection criteria 2024"
- • "azure migration timeline board approval"
Retrieval & Re-Ranking
Execute queries against knowledge sources using hybrid search (keyword + vector), retrieve candidate documents (typically 10-50), then re-rank using semantic models to surface most relevant results (top 3-5).
Retrieval Stages:
- • Initial Search: Keyword (BM25) + Vector (cosine similarity) → 50 candidates
- • Fusion: Reciprocal Rank Fusion (RRF) combines keyword and vector scores
- • Re-Ranking: Semantic model evaluates query-document relevance → top 5 results
- • Filtering: Apply metadata filters (date range, department, permissions)
Answer Generation with Citations
Language model synthesizes answer using retrieved documents as context. Each claim in answer is linked to source document with specific page/section reference, enabling users to verify accuracy.
Generation Prompt Structure:
System: You are an assistant that answers questions using
only the provided documents. Always cite sources.
Context Documents:
[1] Board Meeting Minutes - 2024-03-15: "...approved Azure
migration with 18-month timeline..."
[2] Cloud Strategy Memo: "...AWS initially considered but
Azure selected for Microsoft 365 integration..."
User Question: What did we decide about cloud migration?
Answer with citations:
RAG vs. Fine-Tuning: Complementary Approaches
| Characteristic | RAG (Retrieval-Augmented Generation) | Fine-Tuning |
|---|---|---|
| Best For | Answering questions about specific documents, current data, frequently updated knowledge | Teaching model new vocabulary, domain terminology, output formatting styles |
| Knowledge Updates | Real-time: Update index, immediately available | Slow: Requires retraining (hours to days) |
| Traceability | High: Every answer cites source documents | Low: Knowledge embedded in model weights, no citations |
| Cost | Per-query: Retrieval + LLM inference ($0.01-0.10/query) | Upfront: Training cost ($500-10K), then standard inference |
| Typical Use Case | Internal knowledge base Q&A, document search, support chatbot | Medical terminology adoption, legal document formatting, brand voice |
Azure AI Search: The Foundation for Agentic Retrieval
Azure AI Search (formerly Azure Cognitive Search) provides enterprise-grade search infrastructure optimized for RAG applications. With support for hybrid search, semantic ranking, vector embeddings, and integrated skillsets for document processing, it serves as the knowledge retrieval engine for Azure OpenAI, Microsoft Copilot, and custom agentic applications.
Documents Indexed (Scale)
- • Petabyte-scale indexing capacity
- • 10K+ queries per second per replica
- • Automatic scaling and load balancing
- • 99.9% SLA for search availability
- • Geo-replication across Azure regions
Better Relevance (vs. Keyword Only)
- • Hybrid search: Keyword (BM25) + Vector (embeddings)
- • Semantic ranking with cross-encoder models
- • Query expansion and synonym handling
- • Relevance tuning with scoring profiles
- • A/B testing for search quality optimization
Data Source Connectors
- • Azure Blob Storage, Data Lake, Cosmos DB
- • SharePoint Online, OneDrive for Business
- • SQL databases (Azure SQL, SQL Server)
- • Web crawler for public/authenticated sites
- • Custom data sources via indexer API
Hybrid Search Architecture: Keyword + Vector + Semantic
Azure AI Search's hybrid approach combines three complementary search methods, each capturing different aspects of relevance. The fusion of these methods delivers superior results compared to any single technique.
🔤 Keyword Search (BM25)
Traditional full-text search using BM25 ranking algorithm. Excellent for exact term matches, acronyms, product codes, and queries where specific words matter. Fast (millisecond latency) and interpretable.
Strengths & Weaknesses:
✓ Strengths
- • Exact matches (model numbers, IDs)
- • Boolean logic (AND, OR, NOT)
- • Fast execution (<10ms typical)
- • No embedding computation needed
✗ Weaknesses
- • Misses semantic meaning ("car" ≠ "automobile")
- • Requires exact word matches
- • Poor on paraphrased queries
- • No understanding of context
🎯 Vector Search (Embeddings)
Documents and queries converted to high-dimensional vectors (embeddings) using models like text-embedding-3-large. Search finds documents with vectors closest to query vector (cosine similarity). Captures semantic meaning regardless of exact wording.
Strengths & Weaknesses:
✓ Strengths
- • Semantic understanding ("car" ~ "automobile")
- • Works with paraphrases and synonyms
- • Cross-lingual retrieval (query EN, doc SV)
- • Contextual relevance
✗ Weaknesses
- • May miss exact term matches
- • Embedding computation cost
- • Slower than keyword (50-100ms)
- • Requires vector index storage
🧠 Semantic Re-Ranking
After initial retrieval (keyword + vector → top 50 results), semantic ranker applies cross-encoder model to compute precise relevance score for each query-document pair. Promotes most relevant results to top 3-5 positions.
How It Works:
- • Takes query + document text as input to transformer model
- • Computes query-document relevance score (0-1) using attention mechanism
- • Reorders initial results by semantic relevance score
- • Typical improvement: 15-35% better top-3 relevance vs. hybrid alone
- • Cost: ~20ms additional latency, negligible cost per query
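In the Python SDK, the three methods correspond to different parameters of the same SearchClient.search call. A minimal sketch, assuming an index with a contentVector field, a semantic configuration named "default", and the service/key names shown (all illustrative):

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

# Assumed service details -- replace with your own
search_client = SearchClient(
    endpoint="https://my-knowledge-search.search.windows.net",
    index_name="documents-index",
    credential=AzureKeyCredential("<search-query-key>"),
)
openai_client = AzureOpenAI(
    azure_endpoint="https://my-openai.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-06-01",
)

query = "automobile maintenance schedule"
query_vector = openai_client.embeddings.create(
    model="text-embedding-3-large", input=query
).data[0].embedding

# 1. Keyword-only (BM25): exact terms, Boolean operators
keyword_results = search_client.search(search_text=query, top=5)

# 2. Vector-only: semantic similarity against the contentVector field
vector_results = search_client.search(
    search_text=None,
    vector_queries=[VectorizedQuery(vector=query_vector,
                                    k_nearest_neighbors=50,
                                    fields="contentVector")],
    top=5,
)

# 3. Hybrid + semantic re-ranking: RRF fusion of both, then cross-encoder re-rank
hybrid_results = search_client.search(
    search_text=query,
    vector_queries=[VectorizedQuery(vector=query_vector,
                                    k_nearest_neighbors=50,
                                    fields="contentVector")],
    query_type="semantic",
    semantic_configuration_name="default",  # assumed semantic config name
    top=5,
)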
Reciprocal Rank Fusion (RRF): Combining Keyword and Vector Scores
Keyword and vector search produce different relevance scores (BM25 score vs. cosine similarity) that can't be directly compared. Reciprocal Rank Fusion elegantly combines rankings without needing score normalization.
RRF Formula and Example
Formula:
RRF_Score(doc) = Σ [ 1 / (k + rank_i(doc)) ]
where:
- k = constant (typically 60)
- rank_i(doc) = rank of document in search method i
- Σ sums across all search methods (keyword, vector)
Example Calculation:
| Document | Keyword Rank | Vector Rank | RRF Score | Final Rank |
|---|---|---|---|---|
| Doc A | 1 | 5 | 1/(60+1) + 1/(60+5) = 0.0318 | 2 |
| Doc B | 3 | 2 | 1/(60+3) + 1/(60+2) = 0.0320 | 1 |
| Doc C | 2 | 8 | 1/(60+2) + 1/(60+8) = 0.0308 | 4 |
| Doc D | 7 | 1 | 1/(60+7) + 1/(60+1) = 0.0313 | 3 |
Result: Doc B wins despite not ranking #1 in either method, because it ranks high in both (balanced relevance). This demonstrates RRF's strength: rewarding consistency across multiple signals.
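A short Python sketch of the formula that reproduces the scores and ranks in the table above:

def rrf_scores(rankings: dict[str, dict[str, int]], k: int = 60) -> dict[str, float]:
    """Combine per-method ranks into a single RRF score per document."""
    scores: dict[str, float] = {}
    for method_ranks in rankings.values():
        for doc, rank in method_ranks.items():
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return scores

rankings = {
    "keyword": {"Doc A": 1, "Doc B": 3, "Doc C": 2, "Doc D": 7},
    "vector":  {"Doc A": 5, "Doc B": 2, "Doc C": 8, "Doc D": 1},
}

for doc, score in sorted(rrf_scores(rankings).items(), key=lambda x: -x[1]):
    print(f"{doc}: {score:.4f}")
# Doc B: 0.0320, Doc A: 0.0318, Doc D: 0.0313, Doc C: 0.0308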
🇸🇪 Technspire Perspective: Swedish Manufacturing Company
Västerås-based industrial equipment manufacturer (1,200 employees, 45,000 technical documents) deployed Azure AI Search for internal knowledge base powering engineering support chatbot. Previous SharePoint search had 28% user satisfaction due to poor relevance.
Implementation Details
- Data Sources: 45,000 PDFs (technical specs, CAD drawings, maintenance manuals), SharePoint (project wikis, design decisions), SQL database (part specifications). Total indexed content: 2.8TB.
- Hybrid Search Configuration: Keyword (BM25) with Swedish analyzer + Vector search (text-embedding-3-large, 3072 dimensions) + Semantic re-ranking. RRF with k=60 for score fusion.
- Query Enhancement: Query expansion using domain-specific synonyms (e.g., "lager" → "bearing", "ventil" → "valve"). Multilingual support (Swedish queries search English documents).
- Answer Generation: GPT-4o generates answers with citations. Average 3.2 source documents cited per answer. Engineers can click to view exact PDF page referenced.
- Relevance Tuning: Scoring profile boosts recent documents (+20% if <6 months old), frequently accessed docs (+10%), and documents matching user's department (+15%).
- Results: 42,000 queries/month, 84% satisfaction, 6.8-min avg search time (from 18 min), 92% citation accuracy, -48% support tickets, 38× ROI.
Knowledge Sources: Connecting Agents to Data
BRK193 demonstrated connecting agents to three critical enterprise knowledge sources: SharePoint Online, web content, and Azure Blob Storage. Each source requires different indexing strategies, permission handling, and update cadences.
📁 SharePoint Online Integration
SharePoint is the most common enterprise knowledge repository, containing project documentation, team wikis, policies, and collaboration spaces. Azure AI Search's SharePoint connector handles authentication, incremental indexing, and permission preservation.
Setup Steps
- Register Azure AD app with SharePoint API permissions
- Create data source in Azure AI Search pointing to SharePoint site(s)
- Define indexer with field mappings (title, content, metadata)
- Configure skillset for document cracking (PDF, Office docs)
- Schedule incremental updates (every 15 min typical)
- Enable security trimming for permission-aware search
Key Capabilities
- • Permission Preservation: Users only see documents they have access to
- • Metadata Extraction: Author, modified date, file type, SharePoint taxonomy
- • Incremental Updates: Only index changed documents (change tracking)
- • Document Processing: Extract text from PDF, DOCX, PPTX, XLSX
- • Multi-Site Support: Index multiple SharePoint sites into single search index
Code Sample: SharePoint Data Source
{
"name": "sharepoint-datasource",
"type": "sharepoint",
"credentials": {
"connectionString": "SharePointOnlineEndpoint=https://contoso.sharepoint.com;..."
},
"container": {
"name": "defaultSiteLibrary",
"query": "/sites/Engineering"
},
"dataChangeDetectionPolicy": {
"@odata.type": "#Microsoft.Azure.Search.HighWaterMarkChangeDetectionPolicy",
"highWaterMarkColumnName": "_metadata_storage_last_modified"
}
}
🌐 Web Crawler for Public and Authenticated Sites
Web crawler indexer enables agents to search public websites (documentation, support articles, forums) or authenticated internal sites. Supports sitemap-based crawling, robots.txt compliance, and custom crawl rules.
Crawling Strategies
- Sitemap-Based: Provide sitemap.xml URL, crawler discovers all pages automatically. Best for documentation sites with comprehensive sitemaps.
- Seed URLs with Depth Limit: Start from seed URLs, follow links up to N hops deep. Use for sites without sitemaps or to limit scope.
- URL Pattern Filtering: Include/exclude regex patterns. Example: include /docs/*, exclude /archive/* (a filtering sketch follows this list)
- Content Type Filtering: Index only HTML, PDF, or specific MIME types
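The include/exclude logic can be expressed with a couple of regular expressions. A minimal, generic sketch (this is illustrative Python, not the crawler's actual configuration syntax):

import re

# Illustrative rules: index pages under /docs/ but skip anything under /archive/
INCLUDE_PATTERNS = [re.compile(r"^https://contoso\.com/docs/")]
EXCLUDE_PATTERNS = [re.compile(r"/archive/")]

def should_crawl(url: str) -> bool:
    """Return True if the URL matches an include rule and no exclude rule."""
    if any(p.search(url) for p in EXCLUDE_PATTERNS):
        return False
    return any(p.search(url) for p in INCLUDE_PATTERNS)

# should_crawl("https://contoso.com/docs/setup")      -> True
# should_crawl("https://contoso.com/docs/archive/v1") -> False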
Authentication & Compliance
- Authentication: Support for basic auth, OAuth, API keys, custom headers for authenticated crawling
- Robots.txt: Respects robots.txt directives, honors crawl delays
- Rate Limiting: Configurable request rate (1-10 requests/sec) to avoid overwhelming sites
- Freshness: Incremental re-crawl based on Last-Modified headers or fixed schedule
Use Case: Documentation Search Agent
Company indexes Microsoft Learn docs, Azure documentation, and internal Confluence wiki via web crawler. Agent answers questions about Azure services with citations to official documentation. Weekly re-crawl keeps content current. Result: 78% of developer questions answered without leaving IDE, -52% internal support requests.
💾 Azure Blob Storage: Unstructured Data at Scale
Azure Blob Storage is ideal for large-scale document repositories: scanned PDFs, audio transcripts, images with OCR, JSON/CSV datasets. Blob indexer supports all Azure Storage features (hot/cool tiers, lifecycle management, soft delete).
Document Processing Skillset
- • Text Extraction: PDF, Office docs, plain text
- • OCR: Extract text from scanned images/PDFs (90+ languages)
- • Entity Recognition: Extract people, organizations, locations, dates
- • Key Phrase Extraction: Identify main topics and concepts
- • Language Detection: Auto-detect document language
- • Custom Skills: Call Azure Functions for domain-specific processing
Metadata and Organization
- • Blob Metadata: Index custom metadata tags (department, classification, version)
- • Folder Hierarchy: Use blob path as filterable field (e.g., /legal/contracts/2024/)
- • Container Partitioning: Separate indexes for different security zones or departments
- • Change Detection: Automatic reindexing on blob create/update/delete
- • Soft Delete Support: Removed blobs automatically removed from index
Code Sample: Blob Indexer with OCR
{
"name": "blob-indexer",
"dataSourceName": "blob-datasource",
"targetIndexName": "documents-index",
"skillsetName": "document-processing-skillset",
"parameters": {
"batchSize": 50,
"configuration": {
"dataToExtract": "contentAndMetadata",
"imageAction": "generateNormalizedImages",
"parsingMode": "default"
}
},
"fieldMappings": [
{ "sourceFieldName": "metadata_storage_path", "targetFieldName": "id" },
{ "sourceFieldName": "metadata_storage_name", "targetFieldName": "fileName" }
],
"outputFieldMappings": [
{ "sourceFieldName": "/document/merged_content", "targetFieldName": "content" },
{ "sourceFieldName": "/document/organizations", "targetFieldName": "organizations" }
]
}
Agentic Retrieval: Query Planning and Knowledge Source Selection
Unlike traditional search where users manually select sources and refine queries, agentic retrieval systems autonomously plan multi-step search strategies. Agents analyze user intent, determine which knowledge sources to query, generate optimized queries for each source, and merge results intelligently.
Agentic Retrieval Workflow
Intent Analysis
Agent analyzes user query to understand: (1) information need type (factual Q&A, how-to, comparison, analysis), (2) domain/topic, (3) required recency (historical vs. current), (4) expected answer format.
Example:
Query: "How does our security policy compare to industry best practices?"
Intent: Comparison task, requires internal policy docs + external best practice references, answer format: structured comparison table
Knowledge Source Selection
Agent determines which sources to query based on intent. Internal sources (SharePoint, databases) for organizational data. External sources (web, public datasets) for industry benchmarks. May query multiple sources in parallel.
Decision Logic:
- • Internal policy → SharePoint index (site: /policies)
- • Industry best practices → Web search (NIST, ISO standards sites)
- • Parallel execution: Both searches run simultaneously
Query Generation & Execution
Agent generates source-specific queries optimized for each knowledge base. Internal search may use metadata filters (department, classification). Web search uses natural language queries for search engines.
Generated Queries:
- • SharePoint: "information security policy" + filter: path=/policies, modified>2023-01-01
- • Web: "NIST cybersecurity framework best practices 2024"
- • Web: "ISO 27001 information security controls checklist"
Result Merging & Deduplication
Agent receives results from multiple sources (total 15-30 documents typically). Merges results by relevance, removes duplicates, and groups by theme. May perform additional filtering or re-ranking based on user context (a code sketch of this step follows the workflow).
Merging Strategy:
- • Deduplicate: Same document from multiple sources → keep highest-scoring instance
- • Diversity: Ensure results include both internal and external perspectives
- • Thematic grouping: Cluster results by sub-topic (access control, encryption, audit logging)
Synthesis & Answer Generation
Agent synthesizes final answer using merged results as context. Structures response according to identified format (comparison table in this case). Cites sources inline, distinguishing internal policies from external standards.
Answer Format:
Structured comparison table with columns: [Security Control, Our Policy, NIST Recommendation, ISO 27001 Control, Gap Analysis]. Each cell includes citations to source documents. Summary paragraph highlights major gaps and strengths.
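A minimal sketch of the merge-and-deduplicate step (step 4 above), assuming each result carries an id, a relevance score, a source label, and a theme tag; the field names are illustrative:

from collections import defaultdict

def merge_results(result_lists: list[list[dict]],
                  top_k: int = 10) -> tuple[list[dict], dict[str, list[dict]]]:
    """Deduplicate by document id (keep the highest-scoring copy), sort by
    relevance, then group the merged results by theme for structured answers."""
    best: dict[str, dict] = {}
    for results in result_lists:              # one list per knowledge source
        for doc in results:
            current = best.get(doc["id"])
            if current is None or doc["score"] > current["score"]:
                best[doc["id"]] = doc
    merged = sorted(best.values(), key=lambda d: d["score"], reverse=True)[:top_k]

    by_theme: dict[str, list[dict]] = defaultdict(list)
    for doc in merged:
        by_theme[doc.get("theme", "other")].append(doc)
    return merged, dict(by_theme)

# Example: merge_results([internal_docs, web_docs]) where each doc looks like
# {"id": ..., "score": ..., "source": "internal" or "web", "theme": ...}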
Indexed vs. Remote Knowledge Sources
Agentic retrieval systems combine two types of knowledge sources, each with distinct characteristics and use cases.
📚 Indexed Knowledge Sources
Pre-indexed in Azure AI Search. Documents ingested, processed, embedded, and indexed before query time. Fast retrieval (50-200ms) but requires upfront indexing and storage.
Characteristics:
- • Latency: Fast (50-200ms query time)
- • Cost: Index storage + compute, predictable per-query cost
- • Freshness: Depends on indexing cadence (15 min to daily)
- • Scale: Excellent for large corpora (millions of documents)
- • Control: Full control over ranking, filtering, faceting
Best For:
- • Internal knowledge bases (SharePoint, wikis)
- • Product documentation and manuals
- • Historical data and archives
- • Structured databases (product catalogs)
🌐 Remote Knowledge Sources
Queried in real-time via APIs (Bing Search, external databases, live systems). No pre-indexing required. Higher latency (500-2000ms) but always current.
Characteristics:
- • Latency: Slower (500-2000ms, depends on API)
- • Cost: Per-API-call pricing (e.g., Bing Search $5/1K queries)
- • Freshness: Real-time, always current information
- • Scale: Limited by API rate limits and cost
- • Control: Limited to API capabilities, less customization
Best For:
- • Current news and events (Bing News API)
- • Real-time data (stock prices, weather)
- • External knowledge (public websites, forums)
- • Systems of record (CRM, ERP via APIs)
Reasoning Effort Modes: Balancing Cost, Latency, and Accuracy
BRK193 revealed three reasoning effort modes that control how aggressively agents search for information and refine results. These modes provide developers with a dial to tune the cost-accuracy tradeoff based on application requirements.
Minimal Effort Mode
Fast, cost-optimized retrieval for simple queries
Behavior
- • Single query to knowledge source
- • No query reformulation
- • Top 3-5 results returned
- • No iterative retrieval
- • Basic hybrid search
Performance
- • Latency: 200-500ms
- • Cost: $0.01-0.03/query
- • Accuracy: 70-80% (simple queries)
- • Retrieval rate: ~1 query/request
Best For
- • FAQ chatbots
- • Simple factual questions
- • High-volume applications
- • Cost-sensitive scenarios
Low Effort Mode
Balanced approach with query expansion and multiple sources
Behavior
- • Query expansion (2-3 variants)
- • Multi-source search (indexed + remote)
- • Top 10 results, re-ranked
- • Result deduplication and merging
- • Hybrid search + semantic ranking
Performance
- • Latency: 800-1500ms
- • Cost: $0.05-0.12/query
- • Accuracy: 82-90% (moderate complexity)
- • Retrieval rate: ~3 queries/request
Best For
- • General-purpose assistants
- • Multi-domain questions
- • Internal knowledge + web research
- • Most production applications
Medium Effort Mode
Comprehensive retrieval with iterative refinement and semantic classification
Behavior
- • Semantic query classification
- • Multi-hop retrieval (iterative)
- • Completeness verification
- • Gap identification → additional queries
- • Top 15-20 results, multi-stage ranking
Performance
- • Latency: 2000-5000ms
- • Cost: $0.15-0.35/query
- • Accuracy: 92-96% (complex queries)
- • Retrieval rate: ~5-8 queries/request
Best For
- • Complex research questions
- • High-stakes decisions
- • Comprehensive analysis
- • Expert-level assistants
Semantic Classification & Iterative Retrieval
Medium effort mode introduces semantic classification: the agent analyzes retrieved results to identify information gaps, then generates follow-up queries to fill those gaps. This iterative process continues until the agent determines the answer is complete or an iteration limit is reached (a simplified sketch follows the example below).
Example Iteration:
- Initial Query: "Compare cloud providers for healthcare workloads" → retrieves general comparison articles
- Gap Analysis: Agent identifies missing: HIPAA compliance details, specific healthcare use cases
- Refinement Query 1: "Azure AWS GCP HIPAA compliance healthcare" → retrieves compliance documentation
- Gap Analysis: Still missing: pricing for healthcare-typical workloads
- Refinement Query 2: "cloud cost healthcare PACS imaging storage" → retrieves pricing case studies
- Completeness Check: Agent verifies all aspects covered → synthesizes comprehensive answer
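A simplified sketch of this gap-driven loop, assuming a search(query) helper that returns text snippets and an Azure OpenAI chat deployment named gpt-4o; the prompt wording and iteration limit are illustrative:

import json
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://my-openai.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-06-01",
)

def iterative_retrieve(question: str, search, max_iterations: int = 3) -> list[str]:
    """Retrieve, check for gaps, and issue refinement queries until complete."""
    snippets = search(question)
    for _ in range(max_iterations):
        gap_check = client.chat.completions.create(
            model="gpt-4o",  # deployment name (assumption)
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": (
                    "Given a question and retrieved snippets, decide whether the "
                    "snippets fully answer it. Return JSON: "
                    '{"complete": true/false, "followup_query": "..."}'
                )},
                {"role": "user", "content": json.dumps(
                    {"question": question, "snippets": snippets})},
            ],
        )
        verdict = json.loads(gap_check.choices[0].message.content)
        if verdict["complete"]:
            break
        snippets += search(verdict["followup_query"])  # fill the identified gap
    return snippets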
Choosing the Right Reasoning Effort Mode
| Application Scenario | Query Complexity | Volume | Recommended Mode | Rationale |
|---|---|---|---|---|
| FAQ Chatbot | Simple, well-defined | Very High (10K+/day) | Minimal | Cost optimization critical, queries predictable |
| Employee Help Desk | Moderate, varied topics | Medium (1K-5K/day) | Low | Balance between cost and coverage, multi-source common |
| Research Assistant | High, open-ended | Low (50-200/day) | Medium | Completeness matters, users expect thorough answers |
| Legal Document Analysis | Very high, multi-faceted | Low (10-50/day) | Medium | High stakes, missing information unacceptable |
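In practice, routing can be a lightweight classifier placed in front of the retrieval pipeline. A hedged sketch using simple heuristics; the cue words and thresholds are illustrative, not from the session:

from enum import Enum

class EffortMode(Enum):
    MINIMAL = "minimal"
    LOW = "low"
    MEDIUM = "medium"

COMPARISON_CUES = ("compare", "versus", "vs", "difference", "trade-off")
ANALYSIS_CUES = ("why", "evidence", "differential", "implications", "analyze")

def choose_effort(query: str) -> EffortMode:
    """Heuristic router: escalate effort with query length and analytical cues."""
    q = query.lower()
    if any(cue in q for cue in ANALYSIS_CUES) or len(q.split()) > 20:
        return EffortMode.MEDIUM
    if any(cue in q for cue in COMPARISON_CUES) or len(q.split()) > 8:
        return EffortMode.LOW
    return EffortMode.MINIMAL

# choose_effort("Opening hours for IT support?")             -> MINIMAL
# choose_effort("Compare our VPN options for remote sites")  -> LOW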
🇸🇪 Technspire Perspective: Swedish Healthcare Provider
Örebro-based regional healthcare provider (8 hospitals, 2,400 clinicians) deployed clinical knowledge assistant using Azure AI Search with reasoning effort modes. Low effort for routine queries (medication lookup), medium effort for complex diagnostic support.
Mode Selection Logic & Results
- Knowledge Sources: Internal clinical guidelines (3,200 documents), UpToDate medical reference, PubMed via API, Swedish national drug formulary (FASS). Total indexed content: 280K documents.
- Query Classification: Automatic classifier routes queries: (1) Medication/dosing lookup → Minimal effort, FASS only, (2) General clinical questions → Low effort, guidelines + UpToDate, (3) Differential diagnosis/complex cases → Medium effort, all sources + iterative retrieval.
- Low Effort Queries (68%): Examples: "First-line treatment for hypertension in pregnancy", "Normal pediatric vital signs age 5". Avg latency 1.2s, cost SEK 0.08/query, 91% accuracy.
- Medium Effort Queries (32%): Examples: "Differential diagnosis fever + rash + joint pain recent travel SE Asia", "Evidence for biologic therapy psoriatic arthritis contraindications". Avg latency 3.8s, cost SEK 0.28/query, 94% accuracy. Iterative retrieval resolves initial gaps (avg 2.3 refinement queries).
- Citation Verification: Medical librarian spot-checks 5% of answers monthly. 96% citation accuracy (answer claims match cited sources). 4% errors primarily outdated guidelines (fixed with more frequent reindexing).
- Results: 128K queries in 12 months, 92% overall accuracy, 3.8-min avg retrieval time, 89% clinician satisfaction, SEK 42M time savings, 58× ROI.
Foundry IQ & MCP Protocol: Unified Knowledge Integration
Foundry IQ introduces a unified abstraction layer for connecting agents to diverse knowledge sources using the Model Context Protocol (MCP). Rather than building custom integrations for each data source, developers define MCP-compatible connectors that agents can discover and use dynamically.
Model Context Protocol (MCP) Overview
MCP standardizes how agents request information from knowledge sources. Each source exposes capabilities via MCP interface: available search methods, supported filters, data format. Agents query using MCP commands, receive structured responses.
MCP Benefits
- Standardization: Single interface for all knowledge sources—SharePoint, databases, APIs, web. Reduces integration complexity from O(n²) to O(n).
- Dynamic Discovery: Agents discover available sources and capabilities at runtime. New sources added without agent code changes.
- Composability: Agents combine multiple MCP sources in single workflow. Example: query internal docs + external API + database in parallel.
- Permission Awareness: MCP includes user context, sources enforce permissions. User only retrieves data they're authorized to access.
MCP Connector Structure
{
"name": "sharepoint-engineering",
"type": "mcp-connector",
"capabilities": {
"search": {
"methods": ["keyword", "vector", "hybrid"],
"filters": ["path", "fileType", "modifiedDate"],
"maxResults": 50
},
"retrieve": {
"supportedFormats": ["text", "markdown", "raw"]
}
},
"authentication": {
"type": "oauth2",
"scopes": ["Sites.Read.All"]
},
"metadata": {
"description": "Engineering documentation from SharePoint",
"dataClassification": "internal",
"updateFrequency": "15min"
}
}
Foundry IQ Workflow: Agent-Driven Knowledge Selection
1. User Query Received
Agent receives query: "What's our current Azure spending trend and how does it compare to budget forecasts?"
2. Source Discovery via MCP
Agent queries Foundry IQ for available MCP connectors matching query intent. Finds:
- • Azure Cost Management API (current spending data)
- • SharePoint Finance folder (budget documents)
- • Power BI API (existing cost analysis reports)
3. Parallel MCP Queries
Agent issues parallel queries to each source via MCP:
Azure Cost API: GET /subscriptions/{id}/costAnalysis?timeframe=last90days
SharePoint: SEARCH "FY2025 budget forecast Azure cloud" + filter: path=/finance
Power BI: GET /reports?filter=name contains 'Azure Cost'
4. Result Synthesis
Agent receives structured data from all sources, synthesizes: "Current Azure spending $142K/month (↑18% vs. Q3). FY2025 budget forecasts $1.68M annual. Current trend projects $1.704M actual (101% of budget). Recommend cost optimization review."
5. Citations & Provenance
Agent includes citations: [1] Azure Cost Management (accessed 2025-01-24), [2] FY2025 IT Budget.xlsx (SharePoint), [3] Azure Cost Trends Report (Power BI). User can verify each claim.
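Step 3's parallel fan-out can be expressed with asyncio. A sketch with three hypothetical async connector stubs standing in for the MCP calls; real implementations would call the Azure Cost Management API, Azure AI Search, and the Power BI REST API:

import asyncio

# Hypothetical connector stubs -- return shapes are illustrative
async def query_cost_api(timeframe: str) -> dict:
    return {"source": "Azure Cost Management", "monthly_spend_usd": 142_000}

async def query_sharepoint(query: str) -> dict:
    return {"source": "SharePoint /finance", "doc": "FY2025 IT Budget.xlsx"}

async def query_power_bi(report_filter: str) -> dict:
    return {"source": "Power BI", "report": "Azure Cost Trends"}

async def gather_knowledge() -> list[dict]:
    """Issue all source queries concurrently and collect structured results."""
    return await asyncio.gather(
        query_cost_api("last90days"),
        query_sharepoint("FY2025 budget forecast Azure cloud"),
        query_power_bi("name contains 'Azure Cost'"),
    )

results = asyncio.run(gather_knowledge())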
Implementation Guide: Building Your First Knowledge Agent
This step-by-step guide walks through building a production-ready knowledge agent using Azure AI Search, Azure OpenAI, and the patterns demonstrated in BRK193.
Create Azure AI Search Service & Index (Day 1)
Provision search service, define index schema with fields for content, metadata, and vector embeddings.
Azure CLI Commands
# Create search service (Standard tier for hybrid search)
az search service create \
--name my-knowledge-search \
--resource-group knowledge-rg \
--sku Standard \
--location westeurope
# Create index with hybrid search fields
# (Use Azure Portal or REST API for detailed schema)
# Key fields: id, title, content, contentVector (1536 or 3072 dims),
# metadata (author, date, department), url
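Alternatively, the index schema can be defined programmatically with the azure-search-documents Python SDK (version 11.4 or later assumed; field, profile, and configuration names below are illustrative):

from azure.core.credentials import AzureKeyCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    HnswAlgorithmConfiguration, SearchableField, SearchField, SearchFieldDataType,
    SearchIndex, SemanticConfiguration, SemanticField, SemanticPrioritizedFields,
    SemanticSearch, SimpleField, VectorSearch, VectorSearchProfile,
)

client = SearchIndexClient("https://my-knowledge-search.search.windows.net",
                           AzureKeyCredential("<search-admin-key>"))

index = SearchIndex(
    name="documents-index",
    fields=[
        SimpleField(name="id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="title", type=SearchFieldDataType.String),
        SearchableField(name="content", type=SearchFieldDataType.String),
        SimpleField(name="url", type=SearchFieldDataType.String),
        # metadata fields (author, date, department) omitted for brevity
        SearchField(
            name="contentVector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            vector_search_dimensions=3072,  # text-embedding-3-large
            vector_search_profile_name="vector-profile",
        ),
    ],
    vector_search=VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="hnsw")],
        profiles=[VectorSearchProfile(name="vector-profile",
                                      algorithm_configuration_name="hnsw")],
    ),
    semantic_search=SemanticSearch(configurations=[
        SemanticConfiguration(
            name="default",
            prioritized_fields=SemanticPrioritizedFields(
                title_field=SemanticField(field_name="title"),
                content_fields=[SemanticField(field_name="content")],
            ),
        ),
    ]),
)
client.create_or_update_index(index)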
Estimated Time: 30 minutes (service provisioning + index creation)
Set Up Data Sources & Indexers (Days 1-2)
Configure SharePoint, Blob, or web crawler data sources. Create indexers with skillsets for document processing and embedding generation.
Example: SharePoint Data Source + Indexer
# Create SharePoint data source
POST https://[servicename].search.windows.net/datasources?api-version=2024-07-01
{
"name": "sharepoint-docs",
"type": "sharepoint",
"credentials": {
"connectionString": "SharePointOnlineEndpoint=https://contoso.sharepoint.com;ApplicationId=...;TenantId=..."
},
"container": { "name": "defaultSiteLibrary", "query": "/sites/KnowledgeBase" }
}
# Create skillset with text extraction + embedding generation
# Skillset includes: DocumentExtraction skill, SplitText skill (chunking),
# AzureOpenAIEmbedding skill (generate contentVector)
# Create indexer to run skillset and populate index
# Schedule: every 15 minutes for near-real-time updates
Estimated Time: 2-4 hours (configuration + initial indexing)
Deploy Azure OpenAI for Embeddings & Generation (Day 2)
Provision Azure OpenAI service, deploy embedding model (text-embedding-3-large) and generation model (GPT-4o).
Deployment Steps
# Create Azure OpenAI resource
az cognitiveservices account create \
--name my-openai \
--resource-group knowledge-rg \
--kind OpenAI \
--sku S0 \
--location westeurope
# Deploy models via Azure OpenAI Studio
# 1. text-embedding-3-large (for query and document embeddings)
# 2. gpt-4o (for answer generation)
Estimated Time: 1 hour (provisioning + model deployment)
Build RAG Application with Hybrid Search (Days 3-5)
Implement application code: query embedding generation, hybrid search execution, result processing, answer generation with citations.
Python Code Sample (Simplified)
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

# Initialize clients (replace placeholder endpoints and keys with your own)
search_client = SearchClient(
    endpoint="https://my-knowledge-search.search.windows.net",
    index_name="documents-index",
    credential=AzureKeyCredential("<search-query-key>"),
)
openai_client = AzureOpenAI(
    azure_endpoint="https://my-openai.openai.azure.com",
    api_key="<azure-openai-key>",
    api_version="2024-06-01",
)

def answer_question(query: str) -> dict:
    # 1. Generate query embedding
    query_vector = openai_client.embeddings.create(
        model="text-embedding-3-large",
        input=query
    ).data[0].embedding

    # 2. Hybrid search (keyword + vector fused with RRF) + semantic re-ranking
    results = list(search_client.search(
        search_text=query,  # keyword search
        vector_queries=[VectorizedQuery(
            vector=query_vector,
            k_nearest_neighbors=50,
            fields="contentVector"
        )],
        query_type="semantic",  # enable semantic ranking
        semantic_configuration_name="default",  # semantic config defined on your index
        top=5
    ))

    # 3. Build context from top results
    context = "\n\n".join(f"[{i+1}] {r['title']}: {r['content']}"
                          for i, r in enumerate(results))

    # 4. Generate answer with citations
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using provided documents. Always cite sources."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
        ]
    )
    return {
        "answer": completion.choices[0].message.content,
        "sources": [{"title": r["title"], "url": r["url"]} for r in results]
    }
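A quick usage example (the question is illustrative):

result = answer_question("What is our approved cloud migration timeline?")
print(result["answer"])
for source in result["sources"]:
    print(f"- {source['title']} ({source['url']})")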
Estimated Time: 3-5 days (development + testing)
Test, Tune, and Deploy (Days 6-10)
Evaluate answer quality on test questions, tune ranking parameters, optimize costs, and deploy to production.
Quality Evaluation & Optimization
- Test Set: Create 50-100 representative questions with ground truth answers
- Metrics: Measure accuracy (answer correctness), citation accuracy (claims match sources), latency, cost per query (a simple harness sketch follows this list)
- Relevance Tuning: Adjust RRF k parameter (default 60), semantic ranker thresholds, scoring profile weights
- Cost Optimization: Use GPT-4o-mini for simple queries (classifier routes), cache embeddings for common queries
- Deployment: Azure App Service, Container Apps, or AKS depending on scale requirements
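A minimal evaluation harness over such a test set, reusing answer_question from the previous step. The CSV columns, module name, and keyword-match grading are illustrative; production evaluation typically uses human review or LLM-based grading:

import csv
import time

from rag_app import answer_question  # the function from step 4 (module name is illustrative)

def evaluate(test_set_path: str) -> None:
    """Run test questions through answer_question and report hit rate and latency."""
    hits, latencies = 0, []
    with open(test_set_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # assumed columns: question, expected_phrase
    for row in rows:
        start = time.perf_counter()
        result = answer_question(row["question"])
        latencies.append(time.perf_counter() - start)
        if row["expected_phrase"].lower() in result["answer"].lower():
            hits += 1
    print(f"Accuracy (keyword match): {hits / len(rows):.0%}")
    print(f"Avg latency: {sum(latencies) / len(latencies):.2f}s")

# evaluate("test_questions.csv")  # hypothetical file with 50-100 questions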
Estimated Time: 5-7 days (evaluation + tuning + deployment)
Conclusion: The Knowledge-Powered Agent Future
Microsoft Ignite 2025 BRK193 demonstrated that the next generation of AI agents will be defined not by model size or parameter counts, but by their ability to access, reason over, and synthesize knowledge from diverse sources. Azure AI Search's hybrid search, semantic ranking, and agentic retrieval capabilities provide the foundation for building agents that deliver accurate, cited, and trustworthy answers grounded in organizational knowledge.
Four Key Takeaways for Developers
1. Hybrid Search is Non-Negotiable
Keyword-only or vector-only search each miss critical relevance signals. Hybrid search with RRF fusion and semantic re-ranking delivers 2-3× better top-result relevance. Always implement all three layers.
2. Reasoning Effort Modes Enable Cost Control
Not every query needs comprehensive retrieval. Use minimal effort for FAQs, low effort for general questions, medium effort for complex analysis. Classifier-based routing cuts costs 40-60% vs. always using highest effort.
3. Agentic Retrieval Beats Static Pipelines
Agents that autonomously plan queries, select sources, and iteratively refine results deliver 15-25% better answer completeness than fixed retrieval pipelines. Invest in query planning and gap detection logic.
4. Citations Build Trust and Enable Verification
Every claim in agent responses should cite source documents with specific page/section references. Users verify 12-18% of citations in enterprise applications—accuracy directly impacts adoption.
The Road Ahead: From Search to Understanding
As knowledge agents evolve, the distinction between "search" and "understanding" will blur. Future agents won't just retrieve documents—they'll synthesize insights across thousands of sources, identify contradictions, track provenance through citation chains, and proactively surface relevant knowledge users didn't know to ask for. The infrastructure demonstrated in BRK193—hybrid search, agentic retrieval, MCP integration—lays the foundation for this intelligent knowledge layer.
Organizations investing in Azure AI Search today build not just better search experiences, but the knowledge infrastructure that will power the next decade of AI innovation.
🚀 Ready to Build Knowledge-Powered Agents?
Technspire helps Swedish organizations design and deploy production-ready RAG applications using Azure AI Search and Azure OpenAI. Our expertise spans hybrid search optimization, agentic retrieval architecture, and reasoning effort mode tuning—delivering measurable improvements in answer accuracy, retrieval speed, and cost efficiency.
Contact us for a complimentary knowledge architecture assessment and custom RAG implementation roadmap.