☕🐘 PLUG JAVA INTO PGVECTOR

🔸 TL;DR

pgvector is not “a magic AI database”.

It is a PostgreSQL extension that lets you store embeddings and search by vector similarity.

In other words:

  ▪️ PostgreSQL stores your data
  ▪️ pgvector stores the semantic representation
  ▪️ Java / Spring sends documents and queries
  ▪️ Spring AI can hide most of the plumbing

That makes pgvector a very pragmatic first step for Java developers who want to build semantic search or RAG without introducing a dedicated vector database too early.

🔸 WHAT IS A VECTOR HERE?

In this context, a vector is just a list of numbers that represents the meaning of a piece of text.

Example:

"How to secure a REST API?"
→ [0.12, -0.44, 0.87, 0.03, ...]

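No single number means much on its own. The embedding model learns to place texts in a high-dimensional space, and it is the position of the whole vector that encodes the meaning.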

So when two texts have similar vectors, it usually means they are semantically close.

"How to secure a REST API?"
"How to protect an endpoint?"

Different words. Similar meaning. Close vectors.

That is why vector search is useful: it helps the application search by meaning, not only by exact keywords.
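To make "close vectors" concrete, here is a minimal plain-Java sketch that compares vectors with cosine similarity. The four-dimensional vectors are made up for illustration. Real embeddings come from a model and have hundreds or thousands of dimensions.

```java
class CosineSimilarityDemo {

    // Cosine similarity: dot(a, b) / (|a| * |b|). Ranges from -1 to 1; higher means closer.
    public static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Hypothetical vectors, not produced by any real embedding model.
        double[] secureApi = {0.12, -0.44, 0.87, 0.03};        // "How to secure a REST API?"
        double[] protectEndpoint = {0.10, -0.40, 0.90, 0.05};  // "How to protect an endpoint?"
        double[] bakeBread = {-0.70, 0.55, -0.10, 0.40};       // an unrelated text

        System.out.println("secure vs protect: " + cosineSimilarity(secureApi, protectEndpoint));
        System.out.println("secure vs bread:   " + cosineSimilarity(secureApi, bakeBread));
    }
}
```

The first pair scores close to 1, the second pair scores below 0. That gap is what a vector search ranks on.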

🔸 WHAT IS PGVECTOR?

When you transform text into an embedding, you get a vector:

"Spring Security OAuth2 tutorial"
→ [0.12, -0.44, 0.87, ...]

pgvector allows PostgreSQL to store that vector and ask questions like:

SELECT *
FROM documents
ORDER BY embedding <=> '[0.12, -0.44, 0.87]'
LIMIT 5;

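The operator <=> returns the cosine distance (1 minus the cosine similarity), so smaller values mean closer vectors and the ascending ORDER BY puts the best matches first.

The three-number literal is shortened for readability. A real query vector must have the same dimension as the embedding column.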

So instead of asking:

“Does this text contain the exact keyword Spring?”

you can ask:

“What content is semantically close to this question?”
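As a mental model, the ORDER BY ... LIMIT query can be sketched in plain Java as a brute-force sort. This is only an in-memory illustration under toy assumptions: the Row type and the three-dimensional vectors are hypothetical, and PostgreSQL does this work inside the database, possibly with an approximate index instead of a full scan.

```java
import java.util.Comparator;
import java.util.List;

class NearestByCosineDistance {

    // Hypothetical stand-in for one row of the documents table.
    record Row(String content, double[] embedding) {}

    // Cosine distance, the quantity the <=> operator returns: 1 - cosine similarity.
    public static double cosineDistance(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // In-memory equivalent of: ORDER BY embedding <=> :query LIMIT :limit
    public static List<Row> nearest(List<Row> rows, double[] query, int limit) {
        return rows.stream()
            .sorted(Comparator.comparingDouble((Row r) -> cosineDistance(r.embedding(), query)))
            .limit(limit)
            .toList();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("Spring Security OAuth2 tutorial", new double[] {0.12, -0.44, 0.87}),
            new Row("Securing REST endpoints with JWT", new double[] {0.10, -0.40, 0.90}),
            new Row("Sourdough bread recipe", new double[] {-0.70, 0.55, -0.10})
        );
        double[] query = {0.11, -0.42, 0.88};

        // Prints the two security-related rows; the bread recipe is too far away.
        nearest(rows, query, 2).forEach(r -> System.out.println(r.content()));
    }
}
```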

🔸 DATABASE SETUP

A minimal PostgreSQL setup can look like this:

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding VECTOR(1536)
);

CREATE INDEX documents_embedding_idx
ON documents
USING hnsw (embedding vector_cosine_ops);

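Important detail: 1536 depends on the embedding model you use. It matches models such as OpenAI's text-embedding-3-small, but other models produce vectors of other sizes.

Do not copy/paste that number blindly. Your database dimension must match your embedding model dimension, because pgvector rejects a vector whose length differs from the one declared on the column.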
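If you write vectors with plain JDBC rather than through Spring AI, a small guard can catch a dimension mismatch before the INSERT fails. The sketch below is an assumption about how you might structure it: the class and method names are hypothetical, and the constant simply mirrors the VECTOR(1536) column above.

```java
import java.util.StringJoiner;

class EmbeddingDimensionGuard {

    // Assumed to mirror the VECTOR(1536) column; change both together.
    public static final int EXPECTED_DIMENSIONS = 1536;

    // Fails fast in the application instead of waiting for PostgreSQL to reject the row.
    public static void requireDimensions(float[] embedding, int expected) {
        if (embedding.length != expected) {
            throw new IllegalArgumentException(
                "Expected " + expected + " dimensions but got " + embedding.length);
        }
    }

    // Formats a vector in pgvector's text form, for example [0.12,-0.44,0.87].
    public static String toVectorLiteral(float[] embedding) {
        StringJoiner joiner = new StringJoiner(",", "[", "]");
        for (float value : embedding) {
            joiner.add(Float.toString(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(toVectorLiteral(new float[] {0.12f, -0.44f, 0.87f}));
        requireDimensions(new float[1536], EXPECTED_DIMENSIONS); // passes silently
        requireDimensions(new float[3], EXPECTED_DIMENSIONS);    // throws IllegalArgumentException
    }
}
```

The literal can then be bound as a string parameter and cast in SQL with ?::vector.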

🔸 SPRING AI DEPENDENCY

With Spring AI, you can use pgvector through the VectorStore abstraction:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-vector-store-pgvector</artifactId>
</dependency>

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-starter-model-openai</artifactId>
</dependency>

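The first dependency connects Spring AI to pgvector. The second one gives you an embedding model. Neither declares a version, which assumes the Spring AI BOM is imported in your dependencyManagement section.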

You can also use another embedding provider.

🔸 APPLICATION CONFIGURATION

Example with Spring Boot:

spring:
  datasource:
    url: jdbc:postgresql://localhost:5432/postgres
    username: postgres
    password: postgres

  ai:
    openai:
      api-key: ${OPENAI_API_KEY}

    vectorstore:
      pgvector:
        initialize-schema: true
        index-type: HNSW
        distance-type: COSINE_DISTANCE
        dimensions: 1536

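For a POC, this is enough to start. Note that with initialize-schema: true, Spring AI creates its own table at startup (named vector_store by default), so it does not write to the hand-made documents table from the previous section.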

For production, you need to think about:

  ▪️ embedding model stability
  ▪️ schema migrations
  ▪️ index strategy
  ▪️ metadata filtering
  ▪️ cost of embedding generation
  ▪️ security of stored content

🔸 ADD DOCUMENTS FROM JAVA

Once configured, Spring AI lets you inject VectorStore:

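import java.util.List;
import java.util.Map;

import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class KnowledgeBaseService {

    private final VectorStore vectorStore;

    // Spring Boot auto-configures the pgvector-backed VectorStore and injects it here.
    public KnowledgeBaseService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void addContent() {
        // Each Document carries its text plus free-form metadata.
        List<Document> documents = List.of(
            new Document(
                "Spring Security protects endpoints with authentication and authorization.",
                Map.of("topic", "security")
            ),
            new Document(
                "pgvector stores embeddings directly inside PostgreSQL.",
                Map.of("topic", "database")
            )
        );

        // Embeds the texts with the configured model, then writes them to PostgreSQL.
        vectorStore.add(documents);
    }
}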

Behind the scenes:

  1. the text is transformed into embeddings
  2. the embeddings are stored in PostgreSQL
  3. the metadata can be used later for filtering
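The three steps above can be sketched without Spring or PostgreSQL. Everything in this example is a stand-in: the class is hypothetical, the list replaces the table, and the "embedder" is a toy function that has nothing to do with a real embedding model. It only shows the order of operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class InMemoryVectorStoreSketch {

    // One stored row: text, metadata, and the vector computed from the text.
    record StoredDocument(String text, Map<String, Object> metadata, float[] embedding) {}

    private final Function<String, float[]> embedder;            // stand-in for an embedding model
    private final List<StoredDocument> rows = new ArrayList<>(); // stand-in for the table

    public InMemoryVectorStoreSketch(Function<String, float[]> embedder) {
        this.embedder = embedder;
    }

    public void add(String text, Map<String, Object> metadata) {
        float[] embedding = embedder.apply(text);                           // step 1: embed
        rows.add(new StoredDocument(text, Map.copyOf(metadata), embedding)); // step 2: store
    }

    // Step 3: metadata narrows the candidate rows, independently of the vectors.
    public List<StoredDocument> findByMetadata(String key, Object value) {
        return rows.stream()
            .filter(doc -> value.equals(doc.metadata().get(key)))
            .toList();
    }

    public int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        // Toy embedder: NOT semantic, just enough to produce a vector.
        InMemoryVectorStoreSketch store = new InMemoryVectorStoreSketch(
            text -> new float[] {text.length(), text.split("\\s+").length});

        store.add("Spring Security protects endpoints with authentication and authorization.",
            Map.of("topic", "security"));
        store.add("pgvector stores embeddings directly inside PostgreSQL.",
            Map.of("topic", "database"));

        System.out.println(store.findByMetadata("topic", "security").size()); // prints 1
    }
}
```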

🔸 SEARCH BY MEANING

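Now you can search semantically. Add this method to KnowledgeBaseService (it needs an import of org.springframework.ai.vectorstore.SearchRequest):

public List<Document> search(String question) {
    return vectorStore.similaritySearch(
        SearchRequest.builder()
            .query(question)             // embedded with the same model as the documents
            .topK(5)                     // return at most 5 results
            .similarityThreshold(0.75)   // drop weak matches (score from 0 to 1, higher is closer)
            .build()
    );
}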

Example:

Question:
"How do I secure an API endpoint?"

The results may include content about:

  ▪️ authentication
  ▪️ authorization
  ▪️ Spring Security
  ▪️ OAuth2
  ▪️ JWT

This works even when the question's exact words do not appear in the stored text.

That is the real value.

Not keyword search. Semantic search.
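The two knobs in the search request, topK and similarityThreshold, are easy to confuse. Here is a plain-Java sketch of how they combine, assuming a similarity score between 0 and 1 where higher means closer. The scores are invented, and this sketch treats the threshold as inclusive; the exact boundary behavior may depend on the store implementation.

```java
import java.util.Comparator;
import java.util.List;

class TopKWithThreshold {

    // Hypothetical search candidate with an already computed similarity score.
    record Scored(String text, double similarity) {}

    // Keep candidates at or above the threshold, best first, capped at topK.
    public static List<Scored> select(List<Scored> candidates, int topK, double threshold) {
        return candidates.stream()
            .filter(c -> c.similarity() >= threshold)
            .sorted(Comparator.comparingDouble(Scored::similarity).reversed())
            .limit(topK)
            .toList();
    }

    public static void main(String[] args) {
        List<Scored> candidates = List.of(
            new Scored("JWT basics", 0.78),
            new Scored("Spring Security overview", 0.91),
            new Scored("Sourdough bread recipe", 0.40),
            new Scored("OAuth2 login flow", 0.82),
            new Scored("PostgreSQL backups", 0.60)
        );

        // Three candidates pass the 0.75 threshold; topK = 2 keeps the best two.
        select(candidates, 2, 0.75).forEach(c -> System.out.println(c.text()));
    }
}
```

The threshold decides what is good enough. topK decides how much you are willing to read.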

🔸 SIMPLE REST ENDPOINT

A very small endpoint could look like this:

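import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/knowledge")
public class KnowledgeController {

    private final KnowledgeBaseService service;

    public KnowledgeController(KnowledgeBaseService service) {
        this.service = service;
    }

    // Returns only the text of each matching document, closest first.
    @GetMapping("/search")
    public List<String> search(@RequestParam String q) {
        return service.search(q).stream()
            .map(Document::getText)
            .toList();
    }
}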

Call it like this:

GET /knowledge/search?q=How%20do%20I%20protect%20a%20REST%20API%3F

And you get the closest documents according to meaning.
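A question typed by a user contains spaces and punctuation, so a client has to percent-encode it before putting it in the query string. Here is a minimal sketch using only the JDK. The class name is hypothetical, and the base URL assumes the application runs locally on Spring Boot's default port 8080.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SearchUriBuilder {

    // URLEncoder produces form encoding, where a space becomes '+'.
    // Replacing '+' with %20 is safe because a literal '+' is already encoded as %2B.
    public static String encodeQueryValue(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Builds the URI for the /knowledge/search endpoint shown above.
    public static URI searchUri(String baseUrl, String question) {
        return URI.create(baseUrl + "/knowledge/search?q=" + encodeQueryValue(question));
    }

    public static void main(String[] args) {
        // Assumed local base URL; adjust to your environment.
        System.out.println(searchUri("http://localhost:8080", "How do I protect a REST API?"));
    }
}
```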

🔸 WHY JAVA DEVELOPERS SHOULD CARE

pgvector is interesting because it does not force you to leave the PostgreSQL world immediately.

You keep:

  ▪️ SQL
  ▪️ transactions
  ▪️ backups
  ▪️ joins
  ▪️ metadata
  ▪️ existing operational knowledge

And you add:

  ▪️ embeddings
  ▪️ similarity search
  ▪️ semantic retrieval
  ▪️ a foundation for RAG

🔸 TAKEAWAYS

  ▪️ pgvector adds vector search to PostgreSQL
  ▪️ embeddings are numbers representing semantic meaning
  ▪️ Spring AI provides a clean VectorStore abstraction
  ▪️ Java apps can add and search documents with very little code
  ▪️ the embedding dimension must match the model
  ▪️ pgvector is a great POC/default choice before adding a dedicated vector database
  ▪️ for serious production usage, indexing, filtering, data privacy, and embedding lifecycle matter a lot

pgvector is not “AI in the database”.

It is more pragmatic than that:

PostgreSQL keeps your data. pgvector helps your app find meaning inside it. 🧠

#Java #SpringBoot #SpringAI #PostgreSQL #pgvector #VectorDatabase #RAG #AIEngineering #BackendDevelopment #SoftwareArchitecture
