Use case
Semantic search API for embeddings and grounded answers
Semantic search is usually the first production step before a RAG assistant. Teams need embeddings for retrieval, a generation model for answers and a way to compare model cost as the corpus grows. PEKPIK LLM gives search and RAG teams one OpenAI-compatible gateway for supported embedding and generation models.
Why teams search for this
Where PEKPIK fits
Good fit
- Knowledge-base search, help center search and internal document discovery.
- RAG products that need both retrieval and answer generation.
- Teams comparing embedding models before a large indexing job.
Check first
- Embedding model changes can require a full re-index.
- Chunk size, metadata and ranking often matter as much as model choice.
- Search quality should be measured with real queries and expected documents.
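One way to act on the last point is a small hit-rate check over a labeled query set. The sketch below is a minimal illustration, not a PEKPIK feature: `hit_rate_at_k`, `toy_search` and the document IDs are hypothetical names, and `toy_search` stands in for whatever search backend is under test.

```python
from typing import Callable, Dict, List, Set


def hit_rate_at_k(
    search: Callable[[str, int], List[str]],
    expected: Dict[str, Set[str]],
    k: int = 5,
) -> float:
    """Fraction of queries whose top-k results contain at least one expected document ID."""
    if not expected:
        raise ValueError("expected must contain at least one query")
    hits = 0
    for query, relevant_ids in expected.items():
        top_ids = search(query, k)[:k]
        if relevant_ids.intersection(top_ids):
            hits += 1
    return hits / len(expected)


# Hypothetical ranked results standing in for a real search backend.
_TOY_RESULTS = {
    "reset password": ["doc-login", "doc-billing", "doc-reset"],
    "cancel plan": ["doc-login", "doc-faq"],
}


def toy_search(query: str, k: int) -> List[str]:
    """Return the first k canned results for a query (empty list if unknown)."""
    return _TOY_RESULTS.get(query, [])[:k]


# Real queries paired with the documents that should come back for them.
expected = {
    "reset password": {"doc-reset"},
    "cancel plan": {"doc-billing"},
}
score = hit_rate_at_k(toy_search, expected, k=3)
```

Running the same query set against each candidate embedding model, chunk size or ranking change gives a comparable number before any answer generation is added.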
OpenAI-compatible example
base_url swap
from openai import OpenAI

# Point the standard OpenAI client at the PEKPIK gateway.
client = OpenAI(
    base_url="https://aiapiv2.pekpik.com/v1",
    api_key="sk-...",
)

# Send a chat completion request through the gateway.
response = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Summarize this for a product team."}],
)
Suggested rollout
- 01 Create a query set with expected matching documents.
- 02 Choose candidate embedding models and confirm vector dimensions.
- 03 Measure retrieval quality before adding answer generation.
- 04 Route final answer generation by complexity, cost and risk.
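Step 02 can be sketched as a short probe that embeds one string per candidate model and reports the vector length. This assumes the gateway exposes the OpenAI-style embeddings endpoint for its supported embedding models. `make_client`, `embedding_dimensions` and `mismatched_models` are hypothetical helper names, and the model names passed in would have to come from the provider's own model list.

```python
from typing import Dict, Iterable, List


def make_client(api_key: str):
    """Build an OpenAI-compatible client pointed at the gateway URL from the example above."""
    from openai import OpenAI  # imported lazily so the helpers below work with any compatible client

    return OpenAI(base_url="https://aiapiv2.pekpik.com/v1", api_key=api_key)


def embedding_dimensions(
    client, models: Iterable[str], probe: str = "dimension probe"
) -> Dict[str, int]:
    """Embed one short string per model and record the length of the returned vector."""
    dims: Dict[str, int] = {}
    for model in models:
        response = client.embeddings.create(model=model, input=probe)
        dims[model] = len(response.data[0].embedding)
    return dims


def mismatched_models(dims: Dict[str, int], index_dimension: int) -> List[str]:
    """List the models whose vectors would not fit an index built for index_dimension."""
    return [model for model, size in dims.items() if size != index_dimension]
```

A model that appears in `mismatched_models` cannot reuse the existing index, which is the case where a full re-index has to be budgeted.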
FAQ
Is semantic search the same as RAG?
No. Semantic search retrieves relevant content. RAG adds a generation step that uses retrieved content to produce an answer.
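The difference can be shown with a toy sketch: `semantic_search` only ranks and returns passages, while `build_grounded_messages` is the extra RAG step that packs those passages into a prompt for a generation model. The two-dimensional vectors, document IDs and function names are hypothetical; in practice the vectors would come from an embedding model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(
    query_vector: Sequence[float],
    index: Dict[str, Tuple[Sequence[float], str]],
    k: int = 2,
) -> List[Tuple[str, str]]:
    """Retrieval only: return the k (doc_id, text) pairs closest to the query vector."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1][0]),
        reverse=True,
    )
    return [(doc_id, text) for doc_id, (_, text) in ranked[:k]]


def build_grounded_messages(question: str, passages: List[Tuple[str, str]]) -> List[dict]:
    """RAG step: wrap retrieved passages and the question into chat messages."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in passages)
    return [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]


# Hypothetical toy index: doc_id -> (vector, text).
index = {
    "doc-reset": ([1.0, 0.0], "Reset your password from the login page."),
    "doc-billing": ([0.0, 1.0], "Invoices are issued monthly."),
    "doc-sso": ([0.8, 0.6], "SSO users reset passwords through their identity provider."),
}
passages = semantic_search([0.9, 0.1], index, k=2)
messages = build_grounded_messages("How do I reset my password?", passages)
```

A search product can stop after `semantic_search` and show the passages directly. A RAG product would pass `messages` to a chat completion call.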
Should I optimize embedding cost or chat model cost first?
It depends on volume. Large corpora can make indexing cost important, while high query traffic can make generation cost dominate.
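A rough way to see which side dominates is to put both costs in the same unit. The sketch below is back-of-the-envelope arithmetic only: every price and volume in it is a made-up placeholder, not a PEKPIK or model-provider rate, and the function names are hypothetical.

```python
def indexing_cost(corpus_tokens: int, embed_price_per_million: float) -> float:
    """One-time cost of embedding the whole corpus."""
    return corpus_tokens / 1_000_000 * embed_price_per_million


def monthly_generation_cost(
    queries_per_month: int,
    prompt_tokens_per_query: int,
    completion_tokens_per_query: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Recurring cost of answer generation at a given query volume."""
    per_query = (
        prompt_tokens_per_query * input_price_per_million
        + completion_tokens_per_query * output_price_per_million
    ) / 1_000_000
    return queries_per_month * per_query


# Placeholder numbers for illustration only.
one_time = indexing_cost(corpus_tokens=500_000_000, embed_price_per_million=0.10)
recurring = monthly_generation_cost(
    queries_per_month=100_000,
    prompt_tokens_per_query=2_000,
    completion_tokens_per_query=300,
    input_price_per_million=3.0,
    output_price_per_million=15.0,
)
```

With these placeholder inputs, indexing comes to roughly 50 once and generation to roughly 1,050 per month in the same currency unit, so generation would be the first thing to optimize. A larger corpus with light query traffic would flip that.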