AI & Cloud Infrastructure

Hybrid Search for Manufacturing Knowledge Bases

By Technspire Team
February 6, 2026

Manufacturing knowledge bases contain two kinds of signal at once. Exact part numbers and material codes behave like database keys and need verbatim matching. Descriptions of symptoms, failure modes, and procedural instructions behave like natural language and benefit from semantic understanding. Hybrid search, when tuned for the corpus, combines both. When left to defaults, it performs worse than either extreme.

Why Neither Search Mode Alone Works

Keyword search (BM25 on Azure AI Search) finds documents that contain the exact tokens in the query. Tokens are stemmed and analyzed, but the matching stays lexical: semantic near-matches are missed. A query for "torque specification M12 bolt" retrieves documents mentioning those tokens. It does not retrieve a document titled "Fastener tightening procedure for flange assemblies" even if that document describes the exact procedure the user needs.

Vector search on dense embeddings captures semantic proximity. "Fastener tightening procedure" and "torque specification" live near each other in embedding space. A well-trained embedding model retrieves the relevant document. But vector search struggles with exact-match requirements: a query for part number PN-5428-B may return documents about similar-looking numbers because the token is diluted in a 1,536-dimensional space.

Manufacturing queries routinely combine both shapes. A technician types "torque spec PN-5428-B M12 assembly line 4" and needs both semantic understanding ("torque spec" is a procedure) and exact matching (PN-5428-B is a specific part, M12 is a size, "assembly line 4" is a location).
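One way to see the two shapes in a single query is to pull out the tokens that must match verbatim before the rest goes to semantic retrieval. A minimal sketch, assuming part numbers and metric sizes follow the patterns in the example query (the regexes are illustrative, not from the article):

```python
import re

# Hypothetical patterns for exact-match tokens in this corpus:
# part numbers like PN-5428-B, metric fastener sizes like M12.
PART_NUMBER = re.compile(r"\bPN-\d{4}-[A-Z]\b")
METRIC_SIZE = re.compile(r"\bM\d{1,2}\b")

def split_query(query: str) -> tuple[list[str], str]:
    """Separate verbatim tokens from the free-text remainder."""
    exact = PART_NUMBER.findall(query) + METRIC_SIZE.findall(query)
    remainder = PART_NUMBER.sub("", METRIC_SIZE.sub("", query))
    return exact, " ".join(remainder.split())

exact, semantic = split_query("torque spec PN-5428-B M12 assembly line 4")
# exact -> ['PN-5428-B', 'M12']; semantic -> 'torque spec assembly line 4'
```

The exact tokens are candidates for filters or boosted keyword fields; the remainder is what the embedding model is actually good at.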

How Azure AI Search Combines Them

Azure AI Search executes both a keyword query and a vector query in parallel, then combines results using Reciprocal Rank Fusion (RRF). RRF scores each document by the reciprocal of its rank in each ranked list; documents appearing high in both lists float to the top. The default RRF constant of 60 works well for most corpora.

// Hybrid query on Azure AI Search
POST https://{svc}.search.windows.net/indexes/mfg-kb/docs/search?api-version=2024-07-01

{
  "search": "torque spec PN-5428-B M12 assembly line 4",
  "queryType": "semantic",
  "semanticConfiguration": "mfg-semantic",
  "vectorQueries": [{
    "kind": "vector",
    "vector": [/* 1536 floats: embedding of the query string */],
    "fields": "contentVector",
    "k": 50,
    "weight": 1.0
  }],
  "top": 10,
  "select": "id,title,partNumbers,content,sourcePath"
}
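The fusion step itself is simple enough to sketch outside the service. A minimal RRF implementation over ranked lists of document IDs, using the default constant of 60 mentioned above (the document IDs are made up for illustration):

```python
from collections import defaultdict

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score each doc by the sum of
    1/(k + rank) over every list it appears in, then sort."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword = ["doc-torque-table", "doc-bolt-catalog", "doc-flange-proc"]
vector  = ["doc-flange-proc", "doc-torque-table", "doc-maintenance"]
fused = rrf_fuse([keyword, vector])
# doc-torque-table (ranks 1 and 2) edges out doc-flange-proc (ranks 3 and 1);
# both beat documents that appear in only one list
```

Rank position is all RRF sees; the raw BM25 and cosine scores never mix, which is exactly why the fusion is robust to the two scorers living on different scales.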

The Three Levers That Matter

  • Vector candidate count (k). Controls how many vector candidates feed the RRF fusion. Low values (10) constrain to high-confidence vector matches; high values (100) cast a wider net. For manufacturing corpora, 50 is usually the right trade-off.
  • Field weights and boosts. Boost the title and part-number fields in the keyword half so they carry more weight. Use scoring profiles to double the weight of exact matches on filterable metadata.
  • Semantic ranker on top. After RRF fusion produces the top 50, the semantic ranker re-scores them using a deep model. This is often the largest single quality lever — more than vector tuning, more than weighting.

Scoring Profiles for Structured Boosts

Scoring profiles apply deterministic boosts based on field values. For manufacturing, the two most useful boosts are recency (newer revisions rank higher) and locality (documents from the user's plant rank higher):

// Scoring profile fragment
{
  "scoringProfiles": [{
    "name": "mfg-boost",
    "text": {
      "weights": {
        "title": 5.0,
        "partNumbers": 10.0,
        "content": 1.0
      }
    },
    "functions": [
      {
        "type": "freshness",
        "fieldName": "revisionDate",
        "boost": 3.0,
        "interpolation": "quadratic",
        "freshness": { "boostingDuration": "P365D" }
      },
      {
        "type": "tag",
        "fieldName": "plant",
        "boost": 2.0,
        "tag": { "tagsParameter": "userPlant" }
      }
    ],
    "functionAggregation": "sum"
  }]
}

The Semantic Ranker

After RRF fusion produces the top 50 candidates, Microsoft's semantic ranker re-scores each against the query using a deep model. The improvement over raw BM25 + vector RRF on a typical manufacturing evaluation set is 15–30% on MRR. Two operational notes:

  • Semantic re-ranking is metered. It is not free. Budget the queries you expect to semantic-rank and verify the quota in the Azure portal.
  • Semantic ranker accepts multi-field input. Configure title, content, and keyword fields. The ranker uses them jointly. Including part numbers in keyword fields gives them explicit weight beyond inclusion in the content text.

Measuring the Fusion

Before you commit to hybrid, measure each mode in isolation. Run your eval set three times: pure keyword, pure vector, hybrid. If hybrid underperforms either extreme on your specific queries, the tuning is off — typically because the embedding model does not handle the vocabulary well, or the keyword analyzer is stripping important tokens.

Typical manufacturing results on a well-tuned index:

  • Keyword-only recall@10: 55–70%
  • Vector-only recall@10: 65–80%
  • Hybrid RRF recall@10: 75–88%
  • Hybrid + semantic ranker recall@10: 85–93%
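Numbers like these only mean something against a labelled eval set. A minimal harness for recall@10 and MRR, assuming each query in the set is labelled with its relevant document IDs (the metric definitions are standard; the data shapes are an assumption):

```python
def recall_at_k(results: list[str], relevant: set[str], k: int = 10) -> float:
    """Fraction of the relevant docs that appear in the top-k results."""
    return len(set(results[:k]) & relevant) / len(relevant)

def mrr(results: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant result (0 if none found)."""
    for rank, doc_id in enumerate(results, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def evaluate(run: dict[str, list[str]], labels: dict[str, set[str]]) -> dict[str, float]:
    """Average recall@10 and MRR over every labelled query.
    `run` maps query -> ranked result IDs from one retrieval mode."""
    n = len(labels)
    return {
        "recall@10": sum(recall_at_k(run[q], labels[q]) for q in labels) / n,
        "mrr": sum(mrr(run[q], labels[q]) for q in labels) / n,
    }
```

Running evaluate() once per mode — keyword, vector, hybrid — is the three-run comparison the section describes.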

Common Tuning Mistakes

  • Embedding too coarse. Embedding the whole document produces a single dense vector representing an average of everything in it. Chunk first, embed each chunk, store all vectors.
  • Keyword analyzer too aggressive. Part numbers and material codes get mangled when a language analyzer stems and splits them. Keep those fields on the non-stemming standard.lucene analyzer (or a keyword analyzer for codes that must match verbatim) and rely on exact-match boosts via scoring profiles.
  • Vector k mismatched to semantic re-rank top-n. If you semantic-rank only the top 10, setting vector k to 5 throws away half the candidates that would have re-ranked well. Always retrieve at least 50 candidates before semantic re-ranking.
  • No evaluation set. Tuning without measurement produces superstitions. Build the eval set before the first tuning decision.
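The chunk-then-embed fix in the first bullet can be sketched as an overlapping window over words. Real pipelines usually split on token counts or section boundaries; the window sizes here are illustrative:

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into overlapping word windows. Each chunk gets
    its own embedding and its own vector in the index, so a single
    relevant paragraph is not averaged away into a whole-document vector."""
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, max(len(words) - overlap, 1), step):
        chunks.append(" ".join(words[start:start + size]))
    return chunks
```

The overlap exists so a sentence that straddles a boundary still lands whole inside at least one chunk.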

Query Rewriting for Consumer Interfaces

Technicians type short, telegraphic queries. LLM-assisted query rewriting (ask a small model to expand the query before sending to the search service) improves recall on short or ambiguous queries by 10–20%. The pattern: lightweight LLM takes the raw user input, returns an expanded query with synonyms, stemmed variants, and related domain terms. That expanded query feeds the hybrid search. Cost is one small-model call per search, measured in milliseconds and fractions of a cent.
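The rewriting step can be sketched without the LLM by standing in a static synonym table for the small-model call (the table and its terms are hypothetical; in production the expansions would come from the model described above):

```python
# Hypothetical domain synonym table standing in for the small LLM.
EXPANSIONS = {
    "spec": ["specification"],
    "torque": ["tightening", "fastener torque"],
}

def expand_query(query: str, table: dict[str, list[str]] = EXPANSIONS) -> str:
    """Append known expansions for each query word; the expanded
    string feeds the keyword half of the hybrid search."""
    words = query.split()
    extra = [term for w in words for term in table.get(w.lower(), [])]
    return " ".join(words + extra)

expand_query("torque spec PN-5428-B")
# -> 'torque spec PN-5428-B tightening fastener torque specification'
```

Note the original query stays intact at the front: expansion only adds recall candidates, it never rewrites the exact-match tokens the technician typed.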

Pulling It All Together

Hybrid search on Azure AI Search is four moving parts cooperating: the keyword analyzer producing candidates on exact terms, the vector index producing candidates on semantic proximity, RRF fusion combining them, and the semantic ranker re-scoring the top 50. Each part has a handful of settings that matter. Tune them against a labelled eval set and the improvement over a default configuration is dramatic. Skip the eval set and the tuning becomes folklore. The platform rewards attention.
