Full-text search finds documents that contain the words you typed. Semantic search finds documents that mean what you meant.
Neither is sufficient alone. A keyword search for "memory consolidation" misses articles that discuss the same concept under "replay buffer" or "episodic compression." A pure vector search finds topically adjacent documents but can miss exact phrase matches that matter in technical writing.
The right answer is hybrid: run both, then merge the ranked lists.
The Client
The search module wraps the OpenSearch client and exposes three search functions: textSearch (full-text), semanticSearch (vector), and hybridSearch (both, fused).
import { Client } from '@opensearch-project/opensearch'

const client = new Client({
  node: process.env['OPENSEARCH_URL'] ?? 'http://localhost:9200',
})

export interface SearchHit {
  slug: string
  title: string
  description: string
  tags: string[]
  score: number
  highlights?: string[] // textSearch only: matching fragments of `content`
}

// Full-text BM25 search across blog post fields
export async function textSearch(query: string, size = 10): Promise<SearchHit[]> {
  const response = await client.search({
    index: 'blog-posts',
    body: {
      size,
      query: {
        multi_match: {
          query,
          fields: ['title^3', 'description^2', 'content', 'tags'],
          type: 'best_fields',
          fuzziness: 'AUTO',
        },
      },
      highlight: {
        fields: { content: { fragment_size: 150, number_of_fragments: 2 } },
      },
    },
  })
  return response.body.hits.hits.map(
    (hit: {
      _source: SearchHit
      _score: number
      highlight?: { content?: string[] }
    }) => ({
      ...hit._source,
      score: hit._score,
      // Surface the highlight fragments requested above instead of discarding them
      highlights: hit.highlight?.content,
    })
  )
}
// k-NN vector search using pre-computed embeddings (semantic retrieval)
export async function semanticSearch(
  embedding: number[],
  size = 10,
  filter?: { tags: string[] }
): Promise<SearchHit[]> {
  const fieldQuery: Record<string, unknown> = {
    vector: embedding,
    k: size,
  }
  // The filter goes inside the field object, not next to `knn`: a query may
  // have only one top-level clause. (Efficient filtering needs the Lucene
  // engine, or Faiss on OpenSearch 2.9+.)
  if (filter?.tags.length) {
    fieldQuery['filter'] = {
      terms: { tags: filter.tags },
    }
  }
  const response = await client.search({
    index: 'blog-posts',
    body: { size, query: { knn: { content_embedding: fieldQuery } } },
  })
  return response.body.hits.hits.map(
    (hit: { _source: SearchHit; _score: number }) => ({
      ...hit._source,
      score: hit._score,
    })
  )
}
// Hybrid: fuse the BM25 and k-NN result lists with reciprocal rank fusion (RRF)
export async function hybridSearch(
  query: string,
  embedding: number[],
  size = 10
): Promise<SearchHit[]> {
  // Over-fetch from both retrievers so the fused top `size` has candidates to draw on
  const [textResults, vectorResults] = await Promise.all([
    textSearch(query, size * 2),
    semanticSearch(embedding, size * 2),
  ])
  const scores = new Map<string, number>()
  const k = 60 // RRF constant
  // forEach ranks are 0-based, so rank + 1 turns them into 1-based ranks
  textResults.forEach((hit, rank) => {
    scores.set(hit.slug, (scores.get(hit.slug) ?? 0) + 1 / (k + rank + 1))
  })
  vectorResults.forEach((hit, rank) => {
    scores.set(hit.slug, (scores.get(hit.slug) ?? 0) + 1 / (k + rank + 1))
  })
  // Keep one document body per slug for the final output
  const allHits = new Map<string, SearchHit>()
  ;[...textResults, ...vectorResults].forEach((h) => allHits.set(h.slug, h))
  return [...scores.entries()]
    .sort(([, a], [, b]) => b - a)
    .slice(0, size)
    .map(([slug, score]) => ({ ...allHits.get(slug)!, score }))
}
BM25 Text Search
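The textSearch function issues a multi_match query across four fields with different weights. The title field is boosted 3× and the description 2×, because a match in the title is a much stronger signal than a match in the body:
query: {
  multi_match: {
    query,
    fields: ['title^3', 'description^2', 'content', 'tags'],
    type: 'best_fields',
    fuzziness: 'AUTO',
  },
}
With best_fields, a document is scored by its single best-matching field, so the boosts decide which field wins. Fuzziness handles typos: AUTO allows no edits for terms of one or two characters, one edit for terms of three to five characters, and two edits for longer terms. A query for "opensearh" therefore still finds "opensearch", which is one edit away.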
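To make the two multi_match settings concrete, here is a minimal sketch of what they compute. It is a simplification, not OpenSearch's implementation: the helper names are hypothetical, per-field scores are supplied by hand rather than computed by BM25, and the distance is plain Levenshtein, whereas OpenSearch's fuzzy matching also counts a transposition of two adjacent characters as a single edit by default.

```typescript
// Hypothetical helpers illustrating multi_match behaviour; not the OpenSearch API.

// Maximum edits that fuzziness AUTO allows for a query term of this length:
// 1-2 chars must match exactly, 3-5 chars allow 1 edit, longer terms allow 2.
function autoFuzziness(term: string): number {
  if (term.length <= 2) return 0
  if (term.length <= 5) return 1
  return 2
}

// Classic dynamic-programming Levenshtein distance (insert, delete, substitute).
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j)
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i]
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost)
    }
    prev = curr
  }
  return prev[b.length]
}

// A query term fuzzily matches an indexed term if it is within AUTO's edit budget.
function fuzzyMatches(queryTerm: string, indexedTerm: string): boolean {
  return levenshtein(queryTerm, indexedTerm) <= autoFuzziness(queryTerm)
}

// best_fields: the best single boosted field score, plus tie_breaker times
// the sum of the other matching fields (tie_breaker defaults to 0).
function bestFieldsScore(
  fieldScores: Record<string, number>,
  boosts: Record<string, number>,
  tieBreaker = 0
): number {
  const boosted = Object.entries(fieldScores).map(([field, s]) => s * (boosts[field] ?? 1))
  if (boosted.length === 0) return 0
  const best = Math.max(...boosted)
  const rest = boosted.reduce((sum, s) => sum + s, 0) - best
  return best + tieBreaker * rest
}

console.log(fuzzyMatches('opensearh', 'opensearch'))
console.log(bestFieldsScore({ title: 1.0, content: 2.0 }, { title: 3 }))
```

With a title score of 1.0 boosted 3× and a raw content score of 2.0, bestFieldsScore returns 3: the boosted title wins, and with the default tie_breaker of 0 the content match adds nothing.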
k-NN Vector Search
The semanticSearch function queries the content_embedding field with a pre-computed dense vector. OpenSearch's k-NN plugin (backed by the Faiss, Lucene, or NMSLIB engine) finds the approximate k nearest neighbors in the embedding space:
knn: {
  content_embedding: {
    vector: embedding, // float32[1536] from an embedding model
    k: size,
  },
}
The optional tag filter restricts the k-NN search to documents carrying at least one of the given tags, which is useful when searching within a series or topic area.
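For intuition about what the k-NN query returns, the following sketch performs exact, brute-force nearest-neighbor search over an in-memory array, applying the tag filter before ranking. Everything in it is hypothetical (the Doc type, exactKnn, and the choice of squared Euclidean distance as the metric); HNSW approximates this result without comparing the query against every stored vector.

```typescript
// Hypothetical in-memory model of the index; not the OpenSearch API.
interface Doc {
  slug: string
  tags: string[]
  embedding: number[]
}

// Squared Euclidean distance: lower means closer. Metric choice is an assumption.
function squaredL2(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('dimension mismatch')
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]
    sum += d * d
  }
  return sum
}

// Exact k-NN: filter first (like a `terms` filter, any tag may match),
// then rank every remaining document by distance and keep the k closest.
function exactKnn(docs: Doc[], query: number[], k: number, tags?: string[]): string[] {
  const wanted = tags ?? []
  const candidates =
    wanted.length > 0 ? docs.filter((doc) => doc.tags.some((t) => wanted.includes(t))) : docs
  return candidates
    .map((doc) => ({ slug: doc.slug, dist: squaredL2(doc.embedding, query) }))
    .sort((x, y) => x.dist - y.dist)
    .slice(0, k)
    .map((c) => c.slug)
}

const docs: Doc[] = [
  { slug: 'a', tags: ['search'], embedding: [0, 0] },
  { slug: 'b', tags: ['ml'], embedding: [1, 0] },
  { slug: 'c', tags: ['search', 'ml'], embedding: [3, 0] },
]
console.log(exactKnn(docs, [0.9, 0], 2))
console.log(exactKnn(docs, [0.9, 0], 2, ['search']))
```

Filtering before ranking returns k results whenever at least k documents match the filter. Filtering the top k afterwards (post-filtering) can return fewer: here, post-filtering the unfiltered top 2 (b, a) by the search tag would leave only a.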
Reciprocal Rank Fusion
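Merging two ranked lists is harder than it looks, because their scores are not comparable: BM25 scores are unbounded and depend on the corpus, while k-NN scores are derived from a vector distance. Reciprocal rank fusion (RRF) sidesteps this by ignoring raw scores and using only ranks. Each result receives 1 / (k + rank), where rank is 1-based (hence rank + 1 in the zero-indexed forEach below) and k = 60 flattens the curve so that a single top position cannot dominate. Scores from both lists are summed:
textResults.forEach((hit, rank) => {
  scores.set(hit.slug, (scores.get(hit.slug) ?? 0) + 1 / (k + rank + 1))
})
vectorResults.forEach((hit, rank) => {
  scores.set(hit.slug, (scores.get(hit.slug) ?? 0) + 1 / (k + rank + 1))
})
Documents that appear in both lists collect two contributions, so they tend to outrank documents that appear in only one. With the default size = 10, each list holds at most 20 hits and the effect is strict: a document at rank 20 in both lists scores 2/80 = 0.025, more than the 1/61 ≈ 0.0164 that first place in a single list earns. Because RRF needs no score normalization and only one constant, it is a common default for hybrid search.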
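The same fusion generalizes to any number of ranked lists. The following is a minimal standalone sketch; the name reciprocalRankFusion is hypothetical, it works on plain ID arrays rather than SearchHit objects, and it uses the 1-based rank directly instead of rank + 1.

```typescript
// Reciprocal rank fusion over any number of ranked lists of document IDs.
// Each occurrence at 1-based position `rank` contributes 1 / (k + rank).
function reciprocalRankFusion(lists: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>()
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank))
    })
  }
  // Maps iterate in insertion order, so rebuild the map sorted by fused score.
  return new Map([...scores.entries()].sort(([, a], [, b]) => b - a))
}

const fused = reciprocalRankFusion([
  ['a', 'b', 'c'], // e.g. BM25 order
  ['b', 'd'], // e.g. k-NN order
])
console.log([...fused.keys()])
```

Fusing ['a', 'b', 'c'] with ['b', 'd'] puts b first, since it is the only ID in both lists; a (1/61) then edges out d (1/62), and c (1/63) comes last.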
Index Design
The blog-posts index settings and mapping look like this. The index.knn setting must be true; without it the Faiss engine builds no HNSW graph and approximate k-NN search is disabled:
{
  "settings": {
    "index": { "knn": true }
  },
  "mappings": {
    "properties": {
      "slug": { "type": "keyword" },
      "title": { "type": "text", "analyzer": "english" },
      "description": { "type": "text", "analyzer": "english" },
      "content": { "type": "text", "analyzer": "english" },
      "tags": { "type": "keyword" },
      "content_embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": { "name": "hnsw", "engine": "faiss" }
      }
    }
  }
}
The HNSW (Hierarchical Navigable Small World) graph gives sub-linear, approximate k-NN search at query time. At build time, its parameters trade index size and indexing speed against recall.
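That build-time tradeoff is set through the method's parameters. A hedged sketch, assuming a hypothetical knnVectorField helper: m is the number of bidirectional links each new node gets in the graph, and ef_construction is the size of the candidate list used while building it. Larger values generally improve recall at the cost of memory and indexing time. The numbers below are illustrative, not tuned recommendations.

```typescript
// Hypothetical helper: builds a content_embedding mapping with explicit
// HNSW build parameters for the Faiss engine.
interface HnswBuildParams {
  m: number // bidirectional links per node: higher = better recall, more memory
  efConstruction: number // build-time candidate list: higher = better graph, slower indexing
}

function knnVectorField(dimension: number, params: HnswBuildParams) {
  return {
    type: 'knn_vector',
    dimension,
    method: {
      name: 'hnsw',
      engine: 'faiss',
      parameters: {
        m: params.m,
        ef_construction: params.efConstruction,
      },
    },
  }
}

// Illustrative values only; tune against a recall benchmark on your own data.
const contentEmbedding = knnVectorField(1536, { m: 16, efConstruction: 128 })
console.log(JSON.stringify(contentEmbedding, null, 2))
```

The returned object would take the place of the content_embedding entry under mappings.properties in the body passed to client.indices.create({ index: 'blog-posts', body }).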