Linear Algebra
I always thought linear algebra was for game developers and data scientists—people who rotated 3D models or trained neural networks. Then I worked on a search feature that used embeddings, and I had to understand what "cosine similarity" meant and why two vectors being "close" corresponded to two documents being related. Then I built a recommendation system and realized that matrix factorization was the engine behind "users who liked X also liked Y." Linear algebra wasn't abstract math—it was the language for working with structured data at scale.
This post covers linear algebra through the lens of practical engineering: vectors, matrices, transformations, and the applications you'll actually encounter.
Vectors
A vector is an ordered list of numbers. That's it. No need for physics arrows or abstract spaces—a vector is a point in n-dimensional space, or equivalently, a direction with magnitude.
const userRating = [4, 5, 2, 0, 3] // ratings for 5 movies
const embedding = [0.23, -0.87, 0.45, 0.12] // a word's position in semantic space
const pixel = [255, 128, 0] // RGB color
Each of these is a vector. The number of elements is the vector's dimension. A 3D vector lives in 3D space. A 768-dimensional embedding lives in 768-dimensional space (we can't visualize it, but the math works the same).
Vector Operations
Addition: Add corresponding elements. Geometrically, it's placing one arrow at the tip of another.
const a = [1, 2, 3]
const b = [4, 5, 6]
const sum = a.map((val, i) => val + b[i]) // [5, 7, 9]
Scalar multiplication: Multiply every element by a number. Scales the vector's magnitude.
const scaled = a.map((val) => val * 2) // [2, 4, 6]
Magnitude (length): The Euclidean distance from the origin.
function magnitude(v) {
return Math.sqrt(v.reduce((sum, val) => sum + val * val, 0))
}
magnitude([3, 4]) // 5 (the 3-4-5 triangle)
Unit vector (normalization): A vector with magnitude 1, pointing in the same direction. Used when you care about direction but not magnitude.
function normalize(v) {
const mag = magnitude(v)
return v.map((val) => val / mag)
}
The Dot Product
The dot product of two vectors is the sum of their element-wise products.
function dot(a, b) {
return a.reduce((sum, val, i) => sum + val * b[i], 0)
}
dot([1, 2, 3], [4, 5, 6]) // 1×4 + 2×5 + 3×6 = 32
The dot product has a geometric interpretation:
a · b = |a| × |b| × cos(θ)
Where θ is the angle between the vectors. This means:
- Parallel vectors (same direction): dot product is maximized (cos 0° = 1).
- Perpendicular vectors: dot product is 0 (cos 90° = 0).
- Opposite vectors: dot product is negative (cos 180° = -1).
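Rearranging that formula recovers the angle itself, which makes a handy sanity check. Here's a small sketch reusing the dot and magnitude helpers from above (redefined so the snippet runs on its own):

```javascript
function dot(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0)
}
function magnitude(v) {
  return Math.sqrt(v.reduce((sum, val) => sum + val * val, 0))
}

// Angle between two vectors, derived from a · b = |a| × |b| × cos(θ).
function angleBetween(a, b) {
  const cos = dot(a, b) / (magnitude(a) * magnitude(b))
  // Clamp to [-1, 1] to guard against floating-point drift.
  return (Math.acos(Math.min(1, Math.max(-1, cos))) * 180) / Math.PI
}

angleBetween([1, 0], [0, 1]) // 90 — perpendicular
angleBetween([1, 0], [1, 1]) // ≈ 45
```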
Cosine Similarity
Cosine similarity is the dot product of two normalized vectors. It measures how similar two directions are, regardless of magnitude.
function cosineSimilarity(a, b) {
return dot(a, b) / (magnitude(a) * magnitude(b))
}
// Result ranges from -1 (opposite) to 1 (identical direction)
cosineSimilarity([1, 0], [0, 1]) // 0 — perpendicular, unrelated
cosineSimilarity([1, 1], [2, 2]) // 1 — same direction, identical
cosineSimilarity([1, 0], [-1, 0]) // -1 — opposite
This is the foundation of semantic search. When you convert documents to embedding vectors (using models like OpenAI's text-embedding-ada-002), semantically similar documents have high cosine similarity. A search query is also converted to an embedding, and the closest document embeddings are returned as results.
const queryEmbedding = embed("how to deploy a Next.js app")
const docEmbeddings = documents.map((doc) => ({
doc,
similarity: cosineSimilarity(queryEmbedding, doc.embedding),
}))
const results = docEmbeddings
.sort((a, b) => b.similarity - a.similarity)
.slice(0, 10)
This is how vector databases (Pinecone, Weaviate, pgvector) work under the hood: they store vectors and efficiently find the nearest neighbors by cosine similarity (or Euclidean distance).
Matrices
A matrix is a rectangular grid of numbers—a 2D array. An m×n matrix has m rows and n columns.
// A 2×3 matrix
const A = [
[1, 2, 3],
[4, 5, 6],
]
A vector is a special case: a matrix with one column (column vector) or one row (row vector).
Matrix Multiplication
Matrix multiplication is the core operation in linear algebra. To multiply matrix A (m×n) by matrix B (n×p), each element of the result is a dot product of a row from A and a column from B.
function matMul(A, B) {
const rows = A.length
const cols = B[0].length
const inner = B.length
const result = Array.from({ length: rows }, () => new Array(cols).fill(0))
for (let i = 0; i < rows; i++) {
for (let j = 0; j < cols; j++) {
for (let k = 0; k < inner; k++) {
result[i][j] += A[i][k] * B[k][j]
}
}
}
return result
}
Key property: Matrix multiplication is NOT commutative. A × B ≠ B × A in general. But it IS associative: (A × B) × C = A × (B × C).
Why it matters: Matrix multiplication is the computational bottleneck in neural networks, graphics rendering, and scientific computing. GPUs are essentially matrix multiplication accelerators—they can multiply large matrices orders of magnitude faster than CPUs because of their massively parallel architecture.
Transformations
Matrices represent linear transformations—operations that map vectors to new vectors while preserving straight lines and the origin.
2D Transformations
Every 2D linear transformation can be expressed as a 2×2 matrix (translations need a 3×3 matrix and homogeneous coordinates):
Scaling:
[sx 0 ] × [x] = [sx × x]
[0 sy] [y] [sy × y]
const scale2x = [
[2, 0],
[0, 2],
]
// Doubles both x and y coordinates
Rotation by angle θ:
[cos θ -sin θ] × [x] = [x cos θ - y sin θ]
[sin θ cos θ] [y] [x sin θ + y cos θ]
function rotationMatrix(degrees) {
const rad = (degrees * Math.PI) / 180
return [
[Math.cos(rad), -Math.sin(rad)],
[Math.sin(rad), Math.cos(rad)],
]
}
Reflection across the x-axis:
[1 0] × [x] = [ x]
[0 -1] [y] [-y]
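To actually apply one of these 2×2 matrices to a vector, you take a matrix-vector product: each output component is the dot product of a matrix row with the vector. A minimal sketch:

```javascript
// Apply a 2×2 transformation matrix to a 2D vector.
function matVec(M, v) {
  return [
    M[0][0] * v[0] + M[0][1] * v[1],
    M[1][0] * v[0] + M[1][1] * v[1],
  ]
}

const reflectX = [
  [1, 0],
  [0, -1],
]
matVec(reflectX, [3, 4]) // [3, -4] — flipped across the x-axis
```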
Why this matters for engineers: CSS transform uses matrix operations internally. transform: rotate(45deg) scale(2) computes the product of a rotation matrix and a scaling matrix. Canvas 2D and WebGL explicitly use transformation matrices. Three.js and every 3D engine represent object positions, rotations, and scales as 4×4 matrices.
Composing Transformations
The power of matrices is composition: applying multiple transformations in sequence is just multiplying their matrices.
First rotate 45°, then scale 2x:
Combined = Scale × Rotate
(In the column-vector convention, the rightmost matrix is applied first: the vector is rotated, then scaled.)
One matrix multiplication replaces any number of sequential transformations. This is why game engines and renderers pre-compute combined transformation matrices—applying one matrix per vertex is much faster than applying multiple transformations sequentially.
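Here's that composition in code, a sketch built from the matMul and rotationMatrix functions defined earlier (repeated so the snippet is self-contained):

```javascript
function matMul(A, B) {
  const rows = A.length
  const cols = B[0].length
  const result = Array.from({ length: rows }, () => new Array(cols).fill(0))
  for (let i = 0; i < rows; i++)
    for (let j = 0; j < cols; j++)
      for (let k = 0; k < B.length; k++) result[i][j] += A[i][k] * B[k][j]
  return result
}
function rotationMatrix(degrees) {
  const rad = (degrees * Math.PI) / 180
  return [
    [Math.cos(rad), -Math.sin(rad)],
    [Math.sin(rad), Math.cos(rad)],
  ]
}
function matVec(M, v) {
  return M.map((row) => row.reduce((s, val, i) => s + val * v[i], 0))
}

const scale2x = [
  [2, 0],
  [0, 2],
]
// Rightmost matrix applies first: rotate 90°, then scale 2×.
const combined = matMul(scale2x, rotationMatrix(90))
matVec(combined, [1, 0]) // ≈ [0, 2]
```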
Eigenvalues and Eigenvectors
An eigenvector of a matrix A is a vector that, when multiplied by A, only changes in magnitude—not direction. The scaling factor is the eigenvalue.
A × v = λ × v
v is the eigenvector
λ (lambda) is the eigenvalue
Geometric intuition: Most vectors change direction when you apply a transformation. Eigenvectors are the special directions that remain unchanged—they just get stretched or shrunk.
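You can see this numerically with power iteration, the simplest eigenvector algorithm (and the conceptual core of PageRank-style computations): repeatedly multiply a vector by the matrix and normalize, and it converges to the dominant eigenvector. A sketch:

```javascript
function matVec(M, v) {
  return M.map((row) => row.reduce((s, val, i) => s + val * v[i], 0))
}
function normalize(v) {
  const mag = Math.sqrt(v.reduce((s, x) => s + x * x, 0))
  return v.map((x) => x / mag)
}

// Power iteration: converges to the eigenvector whose eigenvalue
// has the largest magnitude (assuming one dominates).
function dominantEigen(M, iterations = 100) {
  let v = normalize(M[0].map(() => 1))
  for (let i = 0; i < iterations; i++) v = normalize(matVec(M, v))
  // Rayleigh quotient v · (M v) estimates the eigenvalue.
  const Mv = matVec(M, v)
  const lambda = v.reduce((s, x, i) => s + x * Mv[i], 0)
  return { vector: v, value: lambda }
}

// [[2, 0], [0, 1]] stretches x by 2: the dominant eigenvector is the
// x-axis, with eigenvalue 2.
dominantEigen([
  [2, 0],
  [0, 1],
]) // value ≈ 2, vector ≈ [1, 0]
```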
Why this matters:
- Principal Component Analysis (PCA): PCA finds the eigenvectors of the data's covariance matrix. The eigenvector with the largest eigenvalue is the direction of greatest variance in the data. This is used for dimensionality reduction—compressing 768-dimensional embeddings to 50 dimensions while preserving the most information.
- Google's PageRank: PageRank models the web as a matrix where each link is a connection. The dominant eigenvector of this matrix gives the "importance" of each page. Google's original algorithm was essentially computing an eigenvector of the web graph.
- Stability analysis: In control systems and simulations, eigenvalues determine whether a system is stable (all eigenvalue magnitudes below 1, for a discrete system), oscillating, or diverging.
You don't need to compute eigenvalues by hand. But knowing that "finding the most important directions in data" is an eigenvalue problem helps you understand PCA, SVD, and many ML techniques at a conceptual level.
Practical Applications
Embeddings and Similarity Search
Modern AI represents text, images, and other data as high-dimensional vectors (embeddings). Similarity between items is measured by the distance between their vectors.
The embedding model compresses semantic meaning into a vector. "King" and "queen" are close. "King" and "banana" are far apart. Retrieval-Augmented Generation (RAG) systems use this: your documents are embedded and stored, queries are embedded and matched against them.
Recommendation Systems
Collaborative filtering via matrix factorization: represent users and items as a matrix of ratings (with many missing values). Factor this sparse matrix into two lower-rank matrices: one for users and one for items.
Ratings matrix (users × items):
Item1 Item2 Item3 Item4
User1 [ 5 3 ? 1 ]
User2 [ 4 ? ? 1 ]
User3 [ 1 1 5 4 ]
User4 [ ? 1 4 ? ]
≈ UserMatrix × ItemMatrix^T
User vectors and item vectors are in the same space.
User1's predicted rating for Item3 = dot(User1_vector, Item3_vector)
Netflix, Spotify, and Amazon all use variations of this. The "latent factors" discovered by factorization correspond to abstract qualities—genre preferences, tempo preferences, style preferences—that the model learns without being told about them.
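A prediction is then just a dot product. The latent vectors below are made-up numbers for illustration, not output from a real factorization:

```javascript
function dot(a, b) {
  return a.reduce((sum, val, i) => sum + val * b[i], 0)
}

// Hypothetical 2-factor latent vectors (say, "action-ness" and
// "comedy-ness") that a factorization might learn. Numbers invented.
const user1 = [1.2, 0.3]
const item3 = [0.2, 1.5]

// Predicted rating = dot product of the user and item vectors.
dot(user1, item3) // ≈ 0.69 — a low predicted rating
```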
3D Graphics and Game Engines
Every 3D object's position, rotation, and scale is encoded in a 4×4 transformation matrix. Rendering a scene means multiplying each vertex by a chain of matrices:
Final Position = Projection Matrix × View Matrix × Model Matrix × Vertex
(As with 2D composition, the rightmost matrix applies first.)
Model: object's position/rotation in the world
View: camera's position/orientation
Projection: perspective (things farther away appear smaller)
WebGL, Three.js, and Unity all expose these as matrix operations. Understanding them helps when debugging camera positioning, object transforms, or shader math.
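Why 4×4 and not 3×3? Homogeneous coordinates: appending a 1 to each 3D point lets a single matrix encode translation alongside rotation and scale. A minimal sketch:

```javascript
// Apply a 4×4 matrix to a homogeneous 4-vector [x, y, z, 1].
function matVec4(M, v) {
  return M.map((row) => row.reduce((s, val, i) => s + val * v[i], 0))
}

// Translation lives in the last column — impossible in a plain 3×3,
// because linear transformations must keep the origin fixed.
function translationMatrix(tx, ty, tz) {
  return [
    [1, 0, 0, tx],
    [0, 1, 0, ty],
    [0, 0, 1, tz],
    [0, 0, 0, 1],
  ]
}

matVec4(translationMatrix(10, 0, -5), [1, 2, 3, 1]) // [11, 2, -2, 1]
```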
Image Processing
Images are matrices of pixel values. A grayscale image is a 2D matrix. A color image is three 2D matrices (R, G, B channels) or equivalently a 3D tensor.
Convolution—the core operation in CNNs (Convolutional Neural Networks)—is a matrix operation: sliding a small filter matrix across the image matrix, computing dot products at each position. Edge detection, blurring, sharpening—all are convolutions with specific filter matrices.
Sharpen filter: Edge detection filter:
[ 0 -1 0] [-1 -1 -1]
[-1 5 -1] [-1 8 -1]
[ 0 -1 0] [-1 -1 -1]
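Here's a naive convolution sketch (strictly speaking cross-correlation, which is what deep-learning libraries compute anyway) applying the sharpen filter. Note that a flat region passes through unchanged, because the filter's weights sum to 1:

```javascript
// Slide a kernel over an image matrix, taking a dot product at each
// position. No padding, so the output is smaller than the input.
function convolve2d(image, kernel) {
  const kh = kernel.length
  const kw = kernel[0].length
  const out = []
  for (let i = 0; i <= image.length - kh; i++) {
    const row = []
    for (let j = 0; j <= image[0].length - kw; j++) {
      let sum = 0
      for (let ki = 0; ki < kh; ki++)
        for (let kj = 0; kj < kw; kj++)
          sum += image[i + ki][j + kj] * kernel[ki][kj]
      row.push(sum)
    }
    out.push(row)
  }
  return out
}

const sharpen = [
  [0, -1, 0],
  [-1, 5, -1],
  [0, -1, 0],
]
// A uniform patch is a fixed point of sharpening.
convolve2d(
  [
    [10, 10, 10],
    [10, 10, 10],
    [10, 10, 10],
  ],
  sharpen
) // [[10]]
```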
Key Concepts Summary
| Concept | What it does | Where it shows up |
|---|---|---|
| Vector dot product | Measures alignment between vectors | Similarity search, recommendations |
| Cosine similarity | Direction-based similarity (ignores magnitude) | Embeddings, NLP, search |
| Matrix multiplication | Composes linear transformations | Neural networks, graphics, physics |
| Transformation matrices | Rotate, scale, translate objects | 3D graphics, CSS transforms, robotics |
| Eigenvalues/eigenvectors | Find principal directions in data | PCA, PageRank, stability analysis |
| Matrix factorization | Decompose into lower-rank components | Recommendations, compression, topic modeling |
| Convolution | Local pattern matching in matrices | Image processing, CNNs |
The Pragmatic Takeaway
Linear algebra is the language of structured, multi-dimensional data. Once you see that language, you see it everywhere: database tables are matrices, embeddings are vectors, similarity is a dot product, recommendations are factorizations, and rendering is matrix multiplication.
You don't need to solve systems of equations by hand or prove theorems about vector spaces. You need three things:
- Geometric intuition for vectors: Two vectors pointing in the same direction are similar. Orthogonal vectors are unrelated. This is the foundation of semantic search and embeddings.
- Understanding that matrices are transformations: A matrix takes a vector and maps it to a new vector. Multiplying matrices composes transformations. This is the foundation of graphics, neural networks, and data processing.
- Awareness of the toolbox: When someone says "we used PCA to reduce dimensionality" or "the recommendation engine uses matrix factorization," you know what they mean at a conceptual level, even if you let numpy or TensorFlow handle the computation.
Linear algebra turns "magic" into machinery. The embedding model isn't doing something mysterious—it's mapping text to vectors in a space where geometry represents meaning. The recommendation system isn't using black magic—it's factoring a sparse matrix into dense representations. Understanding the math doesn't mean doing the math—it means having the right mental model to design, debug, and communicate.