
Sentence Transformers: Architecture, Working Principles, and Practical Examples

Introduction

Traditional NLP systems treat text as sequences of words, making it difficult to capture semantic meaning. Sentence Transformers solve this problem by converting sentences, paragraphs, or documents into dense vector representations (embeddings) that preserve semantic relationships.

These embeddings allow machines to perform semantic search, clustering, similarity comparison, recommendation, and retrieval-augmented generation (RAG) efficiently.

What Is a Sentence Transformer?

A Sentence Transformer is a neural network model designed to convert sentences, paragraphs, or documents into dense numerical vectors (embeddings) that capture semantic meaning.

These embeddings allow machines to understand how similar two pieces of text are in meaning, not just in words.

Simple Example

Sentence 1: "How do I reset my password?"
Sentence 2: "What is the process to change my account password?" 

Though the words differ, their meanings are almost the same. A sentence transformer converts both into vectors that are very close in vector space.
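A minimal sketch of that comparison, assuming the all-MiniLM-L6-v2 model used later in this article:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

embedding1 = model.encode("How do I reset my password?")
embedding2 = model.encode("What is the process to change my account password?")

# A cosine similarity close to 1 means the two sentences are nearly identical in meaning
print(util.cos_sim(embedding1, embedding2))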

How Sentence Transformers Work

Sentence Transformers are typically built using Transformer architectures such as:

  • BERT

  • RoBERTa

  • DistilBERT

  • MiniLM

Key Steps

  1. Tokenization

  2. Transformer Encoding

  3. Pooling (mean/max/CLS pooling)

  4. Fixed-size embedding output
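The sketch below makes these steps explicit by assembling the same pipeline from the library's Transformer and Pooling modules, using all-MiniLM-L6-v2 as an illustrative backbone:

from sentence_transformers import SentenceTransformer, models

# Steps 1-2: tokenization and transformer encoding (one vector per token)
word_embedding = models.Transformer("sentence-transformers/all-MiniLM-L6-v2")

# Step 3: mean pooling collapses the token vectors into one sentence vector
pooling = models.Pooling(
    word_embedding.get_word_embedding_dimension(),
    pooling_mode="mean",
)

# Step 4: the combined model returns one fixed-size embedding per sentence
model = SentenceTransformer(modules=[word_embedding, pooling])
print(model.encode("How do I reset my password?").shape)  # (384,) for this backbone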

Example Output

"How do I reset my password?"
→ [0.021, -0.932, 0.118, ..., 0.441]

This vector can be stored, compared, indexed, and searched efficiently.

Common Use Cases

Use Case                  Description
Semantic Search           Find documents based on meaning
Document Clustering       Group similar documents
Chatbots                  Match user queries to intents
Recommendation Systems    Recommend similar content
Duplicate Detection       Identify near-duplicate text
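As a sketch of the semantic-search use case (the documents and query below are made up for illustration):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Reset your password from the account settings page.",
    "Our office is closed on public holidays.",
    "Invoices are emailed at the end of each month.",
]
document_embeddings = model.encode(documents, convert_to_tensor=True)

query_embedding = model.encode("How do I change my password?", convert_to_tensor=True)

# Rank the documents by semantic similarity to the query
hits = util.semantic_search(query_embedding, document_embeddings, top_k=2)[0]
for hit in hits:
    print(documents[hit["corpus_id"]], round(hit["score"], 3))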

Examples

Installing Sentence Transformers

pip install sentence-transformers 

The above command installs the sentence-transformers Python library, originally developed by the UKP Lab at TU Darmstadt.

The library provides:

  • Sentence Transformer class

  • Utilities for encoding text

  • Training, fine-tuning, evaluation helpers

  • Pooling strategies (mean, CLS, max pooling, etc.)

It does NOT download any pretrained sentence embedding model. Models are fetched on demand, for example:

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")

At this point:

  • The model is downloaded from Hugging Face Hub

  • Cached locally (usually in ~/.cache/huggingface/)

  • Reused automatically in future runs
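If you want the download to land somewhere other than the default cache, the SentenceTransformer constructor accepts a cache_folder argument; the path below is only an illustration:

from sentence_transformers import SentenceTransformer

# "./models" is an arbitrary example path; any writable directory works
model = SentenceTransformer("all-MiniLM-L6-v2", cache_folder="./models")

# Later constructions with the same name reuse the cached files
# instead of downloading them again
model = SentenceTransformer("all-MiniLM-L6-v2", cache_folder="./models")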

Some popular embedding models include:

OpenAI

  • text-embedding-3-small

  • text-embedding-3-large

Hugging Face / Open Source

  • sentence-transformers/all-MiniLM-L6-v2

  • BAAI/bge-large-en

  • intfloat/e5-large-v2

Local (Ollama)

  • nomic-embed-text

  • mxbai-embed-large
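Of these, the Hugging Face / open-source entries can be loaded directly through the sentence-transformers library by repository id; OpenAI and Ollama models are served through their own APIs instead. A hedged sketch with intfloat/e5-large-v2 (its model card recommends "query: " and "passage: " prefixes on the inputs):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/e5-large-v2")

# e5 models expect role prefixes on the text being embedded
query_embedding = model.encode("query: How do I reset my password?")
passage_embedding = model.encode("passage: Reset your password from the account settings page.")

print(query_embedding.shape)  # 1024 dimensions for this model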

Basic Sentence Embedding Example

from sentence_transformers import SentenceTransformer
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",
    "What is the process to change my account password?",
    "The weather is sunny today"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
print(embeddings[0])

Output

(3, 384)
[ 0.0214, -0.1189, 0.4471, ... , -0.0321 ]

What happens internally:

  1. Text → tokens

  2. Tokens → transformer layers

  3. Contextual representations → pooled

  4. Final fixed-size vectors returned

Result:

  • One vector per sentence

  • Same length for all sentences

In the output shape (3, 384):

  • 3 is the number of sentences

  • 384 is the size of each embedding vector
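Continuing the same example, the embeddings can be compared directly; the two password questions should score much closer to each other than either does to the weather sentence (exact values depend on the model):

from sentence_transformers import util

# Pairwise cosine similarities between the three sentence embeddings above
similarities = util.cos_sim(embeddings, embeddings)
print(similarities)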