PEKPIK LLM

Use case

Semantic search API for embeddings and grounded answers

Semantic search is usually the first production step before a RAG assistant. Teams need embeddings for retrieval, a generation model for answers and a way to compare model cost as the corpus grows. PEKPIK LLM gives search and RAG teams one OpenAI-compatible gateway for supported embedding and generation models.

Why teams search for this

  • Use embedding models for retrieval and chat models for grounded answers.
  • Evaluate retrieval quality before changing the generation model.
  • Plan cost separately for indexing, query embeddings and answer generation.
  • Keep model routing flexible as your corpus and query volume grow.

Where PEKPIK fits

Good fit

  • Knowledge-base search, help center search and internal document discovery.
  • RAG products that need both retrieval and answer generation.
  • Teams comparing embedding models before a large indexing job.

Check first

  • Changing the embedding model usually means re-embedding and re-indexing the whole corpus, because vectors from different models are not comparable.
  • Chunk size, metadata and ranking often matter as much as model choice.
  • Measure search quality with real queries and the documents you expect them to return.
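Measuring search quality with real queries and expected documents usually needs only a small offline harness. A minimal sketch, assuming each query is labeled with a set of expected document IDs and your retriever returns a ranked list of IDs. Recall@k and mean reciprocal rank (MRR) are common choices here, not metrics this page prescribes.

```python
# Offline evaluation of a retriever against a labeled query set.
# Assumption: labels map each query to the set of document IDs it should
# retrieve; results map each query to the retriever's ranked document IDs.

def recall_at_k(ranked_ids, expected_ids, k):
    """Fraction of expected documents that appear in the top-k results."""
    if not expected_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(expected_ids))
    return hits / len(expected_ids)


def reciprocal_rank(ranked_ids, expected_ids):
    """1 / rank of the first expected document, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in expected_ids:
            return 1.0 / rank
    return 0.0


def evaluate(results, labels, k=5):
    """Average recall@k and MRR over every labeled query.

    A query missing from results counts as an empty result list.
    """
    if not labels:
        raise ValueError("labels must contain at least one query")
    queries = list(labels)
    recall = sum(recall_at_k(results.get(q, []), labels[q], k) for q in queries)
    mrr = sum(reciprocal_rank(results.get(q, []), labels[q]) for q in queries)
    return {"recall@k": recall / len(queries), "mrr": mrr / len(queries)}
```

Running the same labeled set against each candidate embedding model (and each chunking setting) gives directly comparable numbers before any large indexing job.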

OpenAI-compatible example

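Point an existing OpenAI client at the gateway by changing base_url:
from openai import OpenAI

# Standard OpenAI Python client, pointed at the PEKPIK gateway.
client = OpenAI(
    base_url="https://aiapiv2.pekpik.com/v1",
    api_key="sk-...",  # your PEKPIK key
)

response = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Summarize this for a product team."}],
)
print(response.choices[0].message.content)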
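The example above calls a chat model. For the retrieval side, the same client can call the OpenAI SDK's client.embeddings.create, assuming the gateway exposes the /v1/embeddings route for its supported embedding models. A minimal sketch that embeds a handful of documents and a query, checks that vector dimensions match, and ranks by cosine similarity. EMBEDDING_MODEL is a hypothetical placeholder; use a model ID the gateway actually lists.

```python
import math

# Hypothetical placeholder: replace with an embedding model the gateway supports.
EMBEDDING_MODEL = "your-embedding-model"


def embed_texts(client, texts, model=EMBEDDING_MODEL):
    """Return one vector per input text, in input order."""
    if not texts:
        return []
    response = client.embeddings.create(model=model, input=texts)
    # Sort by index so vectors line up with the input texts.
    vectors = [item.embedding for item in sorted(response.data, key=lambda d: d.index)]
    dims = {len(v) for v in vectors}
    if len(dims) != 1:
        raise ValueError(f"inconsistent vector dimensions: {sorted(dims)}")
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(client, query, docs, top_k=3, model=EMBEDDING_MODEL):
    """Rank docs (dict of doc_id -> text) for a query.

    Returns a list of (doc_id, score) pairs, best match first.
    """
    if not docs:
        return []
    ids = list(docs)
    doc_vecs = embed_texts(client, [docs[i] for i in ids], model)
    query_vec = embed_texts(client, [query], model)[0]
    if len(query_vec) != len(doc_vecs[0]):
        raise ValueError("query and document vectors have different dimensions")
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in zip(ids, doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

For example, search(client, "How do I reset my password?", {"kb-1": "...", "kb-2": "..."}) with the client above. This brute-force ranking is only practical for small evaluation sets; a production corpus would store the vectors in a vector index, but the dimension check still applies when choosing candidate models.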

Suggested rollout

  1. Create a query set with expected matching documents.
  2. Choose candidate embedding models and confirm vector dimensions.
  3. Measure retrieval quality before adding answer generation.
  4. Route final answer generation by complexity, cost and risk.
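Step 4 can start as a simple rule: default to a cheaper model and escalate to a stronger one when the question is long, needs many retrieved chunks, or is flagged as high risk. A sketch under those assumptions only; the model IDs, thresholds and the high_risk flag are hypothetical and should be tuned against the query set from step 1.

```python
from dataclasses import dataclass

# Hypothetical model tiers: replace with generation models the gateway supports.
FAST_MODEL = "your-fast-model"
STRONG_MODEL = "your-strong-model"


@dataclass
class AnswerRequest:
    question: str
    retrieved_chunks: list
    high_risk: bool = False  # hypothetical flag, e.g. set for legal or billing topics


def choose_model(req, max_fast_chunks=4, max_fast_words=40):
    """Pick a generation model from simple risk and complexity signals."""
    if req.high_risk:
        return STRONG_MODEL
    long_question = len(req.question.split()) > max_fast_words
    many_chunks = len(req.retrieved_chunks) > max_fast_chunks
    if long_question or many_chunks:
        return STRONG_MODEL
    # Cheapest option for short, low-risk questions with little context.
    return FAST_MODEL
```

Keeping the rule in one function makes it easy to swap a model ID later without touching retrieval code.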

FAQ

Is semantic search the same as RAG?

No. Semantic search retrieves relevant content. RAG adds a generation step that uses retrieved content to produce an answer.
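The extra generation step can be as small as placing the retrieved chunks in the prompt. A minimal sketch, assuming chunks arrive as ranked (doc_id, text) pairs from the retriever; the instruction wording and the max_chunks cutoff are illustrative choices, not a PEKPIK requirement.

```python
def build_grounded_messages(question, chunks, max_chunks=5):
    """Build chat messages asking the model to answer only from retrieved text.

    chunks: list of (doc_id, text) pairs, already ranked by the retriever.
    """
    context = "\n\n".join(f"[{doc_id}] {text}" for doc_id, text in chunks[:max_chunks])
    system = (
        "Answer using only the context below and cite document IDs in brackets. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context:\n{context}"
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": question},
    ]
```

The returned list can be passed as messages= to client.chat.completions.create with any supported generation model.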

Should I optimize embedding cost or chat model cost first?

It depends on volume. A large corpus makes indexing cost significant, and every embedding model change repeats it. High query traffic makes per-query embedding and answer generation cost dominate.
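The volume trade-off is easy to check with back-of-the-envelope arithmetic. A minimal sketch, assuming token-based pricing quoted per 1M tokens; every price and volume is a placeholder you supply, not PEKPIK pricing.

```python
def estimate_costs(
    corpus_tokens,
    queries_per_month,
    query_tokens,
    answer_input_tokens,
    answer_output_tokens,
    embed_price_per_m,
    gen_input_price_per_m,
    gen_output_price_per_m,
):
    """Rough cost split across indexing, query embeddings and generation.

    All prices are per 1M tokens and are inputs you provide.
    """
    # One-time cost, repeated on every full re-index.
    indexing = corpus_tokens / 1e6 * embed_price_per_m
    # Recurring: each query is embedded once.
    query_embedding = queries_per_month * query_tokens / 1e6 * embed_price_per_m
    # Recurring: prompt (question plus retrieved context) and answer tokens.
    generation = queries_per_month * (
        answer_input_tokens * gen_input_price_per_m
        + answer_output_tokens * gen_output_price_per_m
    ) / 1e6
    return {
        "indexing_one_time": indexing,
        "query_embedding_monthly": query_embedding,
        "generation_monthly": generation,
    }
```

With made-up inputs such as a 500M-token corpus at 0.02 per 1M embedding tokens and 1M monthly queries of 2,000 input and 300 output tokens at 1.0 and 4.0 per 1M, indexing comes to about 10 once while generation comes to about 3,200 per month: the high-traffic case above.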