Use case
RAG LLM API for retrieval, embeddings and generation
RAG systems need more than a single chat model. A production retrieval-augmented generation stack usually needs embedding models, generation models, fallback choices, cost controls and a way to test model changes without rewriting the application. PEKPIK LLM gives RAG teams one OpenAI-compatible gateway for those model decisions.
Why teams search for this
Where PEKPIK fits
Good fit
- Document search, help center assistants and internal knowledge copilots.
- Teams testing embeddings, reranking and chat models together.
- Products that need to control RAG cost as document volume grows.
Check first
- Changing embedding models can require re-indexing vectors.
- RAG quality depends on chunking, metadata and retrieval strategy, not only the LLM.
- Citation behavior and hallucination risk should be evaluated with real documents.
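The re-indexing caveat can be checked before any vectors are written. The following is a minimal sketch, not a PEKPIK feature: `IndexMeta`, `needs_reindex` and the model names are hypothetical, and it assumes you record the embedding model and vector dimension alongside your index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexMeta:
    """Hypothetical metadata stored next to a vector index."""
    embedding_model: str
    dimensions: int


def needs_reindex(index: IndexMeta, candidate_model: str, sample_vector: list[float]) -> bool:
    """Return True if vectors from candidate_model should not be mixed into the index."""
    # A dimension mismatch makes the new vectors unusable with the existing index.
    if len(sample_vector) != index.dimensions:
        return True
    # Equal dimensions do not imply a shared embedding space, so a different
    # model name is also treated as requiring a rebuild.
    return candidate_model != index.embedding_model


# Placeholder model names and a toy 4-dimensional vector, for illustration only.
current = IndexMeta(embedding_model="embedding-model-a", dimensions=4)
same_model = needs_reindex(current, "embedding-model-a", [0.1, 0.2, 0.3, 0.4])
new_model = needs_reindex(current, "embedding-model-b", [0.1, 0.2, 0.3, 0.4])
```

Treating any model change as a rebuild is a deliberately conservative choice: two models can emit vectors of the same length that are still not comparable.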
OpenAI-compatible example
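base_url swap
from openai import OpenAI

# Point the standard OpenAI client at the gateway instead of the default endpoint.
client = OpenAI(
    base_url="https://aiapiv2.pekpik.com/v1",
    api_key="sk-...",
)

# The request shape is unchanged; only the base URL and model name differ.
response = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Summarize this for a product team."}],
)

# Print the text of the first returned choice.
print(response.choices[0].message.content)
Suggested rollout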
- 01 Choose an embedding model and confirm vector dimensions before indexing.
- 02 Build a query set from real user questions and expected source documents.
- 03 Evaluate retrieval quality before judging the generation model.
- 04 Compare final answers across candidate chat models and route by workload risk.
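Steps 02 and 03 can be made concrete with a small retrieval metric. This sketch uses recall@k, one common choice among several; the function names, the toy query set and the stand-in retriever are assumptions for illustration, and in practice `retrieve` would call your own vector database or search system.

```python
from typing import Callable


def recall_at_k(retrieved_ids: list[str], expected_ids: set[str], k: int) -> float:
    """Fraction of the expected source documents that appear in the top k results."""
    if not expected_ids:
        return 0.0
    hits = set(retrieved_ids[:k]) & expected_ids
    return len(hits) / len(expected_ids)


def mean_recall_at_k(
    query_set: list[tuple[str, set[str]]],
    retrieve: Callable[[str], list[str]],
    k: int = 5,
) -> float:
    """Average recall@k over (question, expected document ids) pairs."""
    if not query_set:
        return 0.0
    scores = [recall_at_k(retrieve(q), expected, k) for q, expected in query_set]
    return sum(scores) / len(scores)


# Toy query set: real user questions paired with the documents that should be found.
query_set = [
    ("How do I reset my password?", {"doc-auth"}),
    ("What is the refund window?", {"doc-billing", "doc-terms"}),
]

# Stand-in retriever with canned rankings; replace with a call to your retrieval layer.
toy_rankings = {
    "How do I reset my password?": ["doc-auth", "doc-faq"],
    "What is the refund window?": ["doc-billing", "doc-faq", "doc-terms"],
}


def toy_retrieve(question: str) -> list[str]:
    return toy_rankings.get(question, [])


score = mean_recall_at_k(query_set, toy_retrieve, k=2)
```

Measuring retrieval separately keeps a weak retriever from being mistaken for a weak generation model when you later compare chat models.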
FAQ
Does PEKPIK replace a vector database for RAG?
No. PEKPIK provides model access through a gateway. You still need your own retrieval layer, such as a vector database or a search system.
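To illustrate the split, the sketch below shows one way your application might turn chunks returned by its own retrieval layer into chat messages. `build_messages`, the chunk keys `id` and `text`, and the prompt wording are all assumptions, not a PEKPIK-defined format.

```python
def build_messages(question: str, chunks: list[dict]) -> list[dict]:
    """Assemble chat messages that ground the answer in retrieved chunks.

    Each chunk is assumed to be a dict with hypothetical keys 'id' and 'text'.
    """
    # Label every chunk with its id so the model can cite it.
    context = "\n\n".join(f"[{chunk['id']}] {chunk['text']}" for chunk in chunks)
    system = (
        "Answer using only the sources provided. "
        "Cite source ids in square brackets. "
        "If the sources do not contain the answer, say so."
    )
    user = f"Sources:\n{context}\n\nQuestion: {question}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]


# Example chunks as they might come back from a vector database query.
example_chunks = [
    {"id": "doc-1", "text": "Refunds are available within 30 days of purchase."},
    {"id": "doc-2", "text": "Refund requests are handled by the billing team."},
]
messages = build_messages("What is the refund window?", example_chunks)
```

The returned list can be passed as the `messages` argument of `client.chat.completions.create`. Whether the model actually cites sources correctly still has to be evaluated with real documents.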
Should RAG use the same model for every answer?
Not always. Many teams use lower-cost models for simple grounded answers and stronger models for complex synthesis or high-risk responses.
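A routing rule of this kind can start very small. In this sketch the model IDs are placeholders, and the threshold and the `high_risk` flag are illustrative assumptions that your own evaluation results should replace.

```python
# Placeholder model IDs: substitute names that your gateway account exposes.
LOW_COST_MODEL = "low-cost-model-id"
STRONG_MODEL = "stronger-model-id"


def choose_model(num_sources: int, high_risk: bool, max_simple_sources: int = 2) -> str:
    """Pick a chat model for one RAG answer.

    num_sources: how many retrieved documents the answer must combine.
    high_risk: set by the application, for example for legal or billing topics.
    max_simple_sources: illustrative cutoff above which the answer counts as synthesis.
    """
    if high_risk or num_sources > max_simple_sources:
        return STRONG_MODEL
    return LOW_COST_MODEL


simple_choice = choose_model(num_sources=1, high_risk=False)
synthesis_choice = choose_model(num_sources=5, high_risk=False)
risky_choice = choose_model(num_sources=1, high_risk=True)
```

Because the gateway is OpenAI-compatible, the returned name can be passed as the `model` argument of the chat completion call without other code changes.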