Algorithm Analysis
Exploring how the Gale-Shapley algorithm for stable matching optimizes resource allocation and pairing in Amazon's vast ecosystem of products, services, and logistics.
The Stable Matching Problem is a fundamental algorithm in combinatorial optimization that addresses the challenge of finding a stable matching between two equally sized sets of elements, each with their own preferences.
A matching is considered "stable" when there are no two elements that would prefer each other over their current partners.
// C++ implementation of the Gale-Shapley algorithm
#include <iostream>
#include <vector>
#include <unordered_map>
#include <queue>
class StableMatching {
private:
std::vector<std::vector<int>> proposerPrefs;
std::vector<std::vector<int>> receiverPrefs;
std::vector<std::vector<int>> receiverRanking;
int n; // number of elements in each set
public:
StableMatching(const std::vector<std::vector<int>>& propPrefs,
const std::vector<std::vector<int>>& recvPrefs) {
proposerPrefs = propPrefs;
receiverPrefs = recvPrefs;
n = proposerPrefs.size();
// Precompute rankings for O(1) preference comparisons
receiverRanking.resize(n, std::vector<int>(n));
for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
receiverRanking[i][receiverPrefs[i][j]] = j;
}
}
}
std::vector<int> findStableMatching() {
std::vector<int> matches(n, -1); // Stores receiver->proposer matches
std::vector<int> nextProposal(n, 0); // Next preference to propose to
std::queue<int> freeProposers;
// Initially, all proposers are free
for (int i = 0; i < n; i++) {
freeProposers.push(i);
}
// While there are free proposers
while (!freeProposers.empty()) {
int proposer = freeProposers.front();
freeProposers.pop();
// Get next preferred receiver for this proposer
int receiver = proposerPrefs[proposer][nextProposal[proposer]++];
// If receiver is free, match them
if (matches[receiver] == -1) {
matches[receiver] = proposer;
}
// If receiver prefers new proposer over current match
else if (receiverRanking[receiver][proposer] <
receiverRanking[receiver][matches[receiver]]) {
// Current match becomes free again
freeProposers.push(matches[receiver]);
matches[receiver] = proposer;
}
// If receiver rejects, proposer remains free
else {
freeProposers.push(proposer);
}
}
return matches;
}
};
Amazon uses stable matching principles to optimize the assignment of products to warehouse locations based on complex variables:
Amazon Web Services applies stable matching concepts for cloud resource allocation:
Where n is the number of participants in each set
Improvement in storage utilization
Reduction in delivery time variance
Algorithm Analysis
Exploring how Depth-First Search and Breadth-First Search algorithms power critical pathfinding, optimization, and discovery operations across Amazon's vast digital and physical infrastructure.
Graph traversal algorithms are fundamental to solving complex problems involving connected structures. Amazon uses these algorithms extensively to optimize operations across its global network.
DFS explores as far as possible along each branch before backtracking, using a stack data structure (or recursion).
BFS explores all neighbors at the present depth before moving to vertices at the next depth level, using a queue data structure.
// C++ implementations of DFS and BFS
#include <iostream>
#include <vector>
#include <queue>
#include <unordered_set>
class GraphTraversal {
private:
std::vector<std::vector<int>> adjacencyList;
int vertices;
public:
GraphTraversal(int v) {
vertices = v;
adjacencyList.resize(v);
}
void addEdge(int src, int dest) {
adjacencyList[src].push_back(dest);
}
// Depth-First Search implementation
void DFS(int startVertex) {
std::vector<bool> visited(vertices, false);
DFSUtil(startVertex, visited);
}
void DFSUtil(int vertex, std::vector<bool>& visited) {
// Mark current node as visited
visited[vertex] = true;
std::cout << vertex << " ";
// Recursively visit all adjacent vertices
for (int adjacent : adjacencyList[vertex]) {
if (!visited[adjacent]) {
DFSUtil(adjacent, visited);
}
}
}
// Breadth-First Search implementation
void BFS(int startVertex) {
// Mark all vertices as not visited
std::vector<bool> visited(vertices, false);
// Create a queue for BFS
std::queue<int> queue;
// Mark the current node as visited and enqueue it
visited[startVertex] = true;
queue.push(startVertex);
while (!queue.empty()) {
// Dequeue a vertex and print it
startVertex = queue.front();
std::cout << startVertex << " ";
queue.pop();
// Get all adjacent vertices of the dequeued vertex
for (int adjacent : adjacencyList[startVertex]) {
// If adjacent is not visited, mark it visited and enqueue it
if (!visited[adjacent]) {
visited[adjacent] = true;
queue.push(adjacent);
}
}
}
}
};
Amazon uses tree traversal algorithms to optimize product browsing and search:
Amazon's automated fulfillment centers rely on graph algorithms for robot navigation:
Amazon's global supply chain leverages graph traversal for logistics:
Amazon's "Customers who bought this also bought" feature:
For both BFS/DFS where V = vertices, E = edges
Increase in picking efficiency using BFS pathfinding
Improvement in category navigation using hybrid approach
Algorithm Analysis
Exploring how Dijkstra's algorithm, Bellman-Ford, and other shortest path algorithms form the backbone of Amazon's global logistics network, enabling efficient delivery routing, warehouse navigation, and optimizing the entire supply chain.
Shortest path algorithms solve the fundamental problem of finding the most efficient path between two points in a weighted graph. Amazon leverages these algorithms extensively to optimize its vast logistics and delivery operations, saving millions in operational costs.
Dijkstra's algorithm finds the shortest path from a source node to all other nodes in a graph with non-negative edge weights, making it ideal for road networks and delivery planning.
Bellman-Ford handles graphs with negative edge weights and detects negative cycles, making it suitable for more complex routing problems with varying constraints.
// C++ implementation of Dijkstra's algorithm
#include <iostream>
#include <vector>
#include <queue>
#include <limits>
class ShortestPath {
private:
int vertices;
std::vector<std::vector<std::pair<int, int>>> adjacencyList;
public:
ShortestPath(int v) {
vertices = v;
adjacencyList.resize(v);
}
void addEdge(int u, int v, int weight) {
adjacencyList[u].push_back({v, weight});
adjacencyList[v].push_back({u, weight}); // For undirected graph
}
std::vector<int> dijkstra(int source) {
// Min-heap priority queue to extract vertex with minimum distance
std::priority_queue<std::pair<int, int>,
std::vector<std::pair<int, int>>,
std::greater<std::pair<int, int>>> pq;
// Vector to store distances from source to all vertices
std::vector<int> distance(vertices, std::numeric_limits<int>::max());
// Distance from source to itself is 0
distance[source] = 0;
pq.push({0, source});
while (!pq.empty()) {
int dist = pq.top().first;
int u = pq.top().second;
pq.pop();
// Skip if we've already found a better path
if (dist > distance[u]) continue;
// Examine all neighbors of current vertex
for (auto edge : adjacencyList[u]) {
int v = edge.first;
int weight = edge.second;
// If we found a shorter path to v through u
if (distance[u] + weight < distance[v]) {
distance[v] = distance[u] + weight;
pq.push({distance[v], v});
}
}
}
return distance;
}
// Bellman-Ford algorithm implementation
std::pair<std::vector<int>, bool> bellmanFord(int source) {
std::vector<int> distance(vertices, std::numeric_limits<int>::max());
distance[source] = 0;
// Relax all edges V-1 times
for (int i = 0; i < vertices - 1; i++) {
for (int u = 0; u < vertices; u++) {
for (auto edge : adjacencyList[u]) {
int v = edge.first;
int weight = edge.second;
if (distance[u] != std::numeric_limits<int>::max() &&
distance[u] + weight < distance[v]) {
distance[v] = distance[u] + weight;
}
}
}
}
// Check for negative weight cycles
bool hasNegativeCycle = false;
for (int u = 0; u < vertices; u++) {
for (auto edge : adjacencyList[u]) {
int v = edge.first;
int weight = edge.second;
if (distance[u] != std::numeric_limits<int>::max() &&
distance[u] + weight < distance[v]) {
hasNegativeCycle = true;
break;
}
}
}
return {distance, hasNegativeCycle};
}
};
Amazon uses sophisticated shortest path algorithms to optimize delivery routes for its fleet of vehicles:
Inside Amazon's massive fulfillment centers, shortest path algorithms guide both human pickers and robots:
Amazon uses shortest path algorithms at the macro level to optimize its entire distribution network:
Increase in packages delivered per driver-hour
Annual reduction in fuel costs from optimized routes
Decrease in carbon emissions per package delivered
Algorithm Analysis
Exploring how advanced range query algorithms power Amazon's database systems, inventory management, pricing strategies, and recommendation engines, enabling efficient processing of billions of queries across petabytes of data.
Range queries involve retrieving data within specific intervals or boundaries. Amazon's scale demands specialized algorithms that can efficiently search, filter, and analyze massive datasets without examining every record. These algorithms create optimized data structures that support logarithmic or better time complexity for range operations.
Segment trees provide efficient solutions for range queries and updates through a hierarchical tree structure that partitions the data into segments.
R-Trees organize multi-dimensional data for efficient spatial range queries, making them ideal for geographical and product attribute searches.
// C++ implementation of a Segment Tree for range sum queries
#include <iostream>
#include <vector>
#include <cmath>
class RangeQueryProcessor {
private:
std::vector<int> segmentTree;
int n;
// Build segment tree from input array
void buildSegmentTree(const std::vector<int>& arr, int node, int start, int end) {
if (start == end) {
// Leaf node will have a single element
segmentTree[node] = arr[start];
return;
}
int mid = start + (end - start) / 2;
// Recursively build left and right children
buildSegmentTree(arr, 2*node+1, start, mid);
buildSegmentTree(arr, 2*node+2, mid+1, end);
// Internal node will hold the sum of both of its children
segmentTree[node] = segmentTree[2*node+1] + segmentTree[2*node+2];
}
// Helper function to perform range sum query
int rangeQuerySum(int node, int start, int end, int left, int right) {
if (left > end || right < start) {
// Range represented by a node is completely outside the queried range
return 0;
}
if (left <= start && right >= end) {
// Range represented by a node is completely inside the queried range
return segmentTree[node];
}
// Range represented by a node is partially inside and partially outside the queried range
int mid = start + (end - start) / 2;
int leftSum = rangeQuerySum(2*node+1, start, mid, left, right);
int rightSum = rangeQuerySum(2*node+2, mid+1, end, left, right);
return leftSum + rightSum;
}
// Helper function to update a value
void updateValue(int node, int start, int end, int idx, int val) {
if (start == end) {
// Leaf node
segmentTree[node] = val;
return;
}
int mid = start + (end - start) / 2;
if (idx <= mid) {
// If idx is in the left child, recurse on the left child
updateValue(2*node+1, start, mid, idx, val);
} else {
// If idx is in the right child, recurse on the right child
updateValue(2*node+2, mid+1, end, idx, val);
}
// Update current node based on children
segmentTree[node] = segmentTree[2*node+1] + segmentTree[2*node+2];
}
public:
RangeQueryProcessor(const std::vector<int>& arr) {
n = arr.size();
// Height of segment tree
int height = (int)(ceil(log2(n)));
// Maximum size of segment tree
int maxSize = 2 * (int)pow(2, height) - 1;
// Allocate memory for segment tree
segmentTree.resize(maxSize, 0);
// Build the segment tree
buildSegmentTree(arr, 0, 0, n-1);
}
// Function to get sum of range [left, right]
int queryRange(int left, int right) {
if (left < 0 || right >= n || left > right) {
std::cout << "Invalid range" << std::endl;
return -1;
}
return rangeQuerySum(0, 0, n-1, left, right);
}
// Function to update a value at position idx
void update(int idx, int val) {
if (idx < 0 || idx >= n) {
std::cout << "Invalid position" << std::endl;
return;
}
updateValue(0, 0, n-1, idx, val);
}
// Simulate Amazon's parallel range query processing
int parallelRangeQuery(int left, int right, int numThreads) {
if (left < 0 || right >= n || left > right) {
return -1;
}
// Range size to process
int rangeSize = right - left + 1;
// Simple case: not worth parallelizing
if (rangeSize <= 1000 || numThreads <= 1) {
return rangeQuerySum(0, 0, n-1, left, right);
}
// Divide range into equal subranges for parallel processing
int subRangeSize = rangeSize / numThreads;
std::vector<int> results(numThreads, 0);
// In a real implementation, these would be processed in parallel
// Here we simulate the parallel processing
for (int i = 0; i < numThreads; i++) {
int subLeft = left + i * subRangeSize;
int subRight = (i == numThreads - 1) ? right : subLeft + subRangeSize - 1;
results[i] = rangeQuerySum(0, 0, n-1, subLeft, subRight);
}
// Combine results from all threads
int totalSum = 0;
for (int result : results) {
totalSum += result;
}
return totalSum;
}
};
Amazon's DynamoDB uses specialized range query algorithms to deliver consistent single-digit millisecond performance:
Amazon's competitive pricing system uses range query optimization to analyze market positions:
Amazon's fulfillment centers leverage advanced range query algorithms for inventory tracking:
Amazon's product search engine relies on range queries to power the shopping experience:
Percentage of range queries completed in under 50ms
Reduction in compute resources compared to brute-force scanning
Annual savings from optimized query processing
Algorithm Analysis
Exploring how Amazon adapts PageRank algorithms and web crawling technologies to power its product search engine, recommendations, and marketplace intelligence systems, organizing billions of products and customer interactions.
Originally developed by Google's founders, PageRank revolutionized search by ranking web pages based on their link structure. Amazon has adapted these principles to its e-commerce domain, creating sophisticated ranking systems that determine product visibility, recommendations, and marketplace dynamics.
PageRank assigns importance to items in a network based on the quantity and quality of connections pointing to them, using recursive probability distribution.
Crawling systematically explores and indexes information by following connections between items, building a comprehensive knowledge graph for searching.
// C++ implementation of Amazon-style ProductRank algorithm
#include <iostream>
#include <vector>
#include <unordered_map>
#include <cmath>
class ProductRank {
private:
int numProducts;
double dampingFactor;
double convergenceThreshold;
int maxIterations;
// Graph representation: outgoing edges from each product
std::vector<std::vector<std::pair<int, double>>> productGraph;
// Reverse graph for incoming connections
std::vector<std::vector<std::pair<int, double>>> reverseGraph;
// Additional product metadata
std::vector<double> conversionRate;
std::vector<double> customerRating;
std::vector<double> recency;
public:
ProductRank(int n, double d = 0.85, double thresh = 1e-8, int maxIter = 100)
: numProducts(n), dampingFactor(d), convergenceThreshold(thresh), maxIterations(maxIter) {
productGraph.resize(numProducts);
reverseGraph.resize(numProducts);
conversionRate.resize(numProducts, 1.0);
customerRating.resize(numProducts, 3.0); // Default 3-star rating
recency.resize(numProducts, 1.0);
}
void addRelationship(int fromProduct, int toProduct, double weight) {
productGraph[fromProduct].push_back({toProduct, weight});
reverseGraph[toProduct].push_back({fromProduct, weight});
}
void setProductMetrics(int productId, double conversion,
double rating, double rec) {
conversionRate[productId] = conversion;
customerRating[productId] = rating;
recency[productId] = rec;
}
std::vector<double> computeProductRank() {
// Initialize ranks
std::vector<double> ranks(numProducts, 1.0 / numProducts);
std::vector<double> newRanks(numProducts, 0.0);
double diff = 1.0;
int iterations = 0;
// Iterative computation until convergence
while (diff > convergenceThreshold && iterations < maxIterations) {
// Reset new ranks
for (int i = 0; i < numProducts; i++) {
newRanks[i] = 0.0;
}
// Compute contribution from each product
for (int i = 0; i < numProducts; i++) {
// Sum of outgoing edge weights
double totalWeight = 0.0;
for (const auto& edge : productGraph[i]) {
totalWeight += edge.second;
}
// Distribute rank to connected products
if (totalWeight > 0) {
for (const auto& edge : productGraph[i]) {
int toProduct = edge.first;
double weight = edge.second;
newRanks[toProduct] += ranks[i] * weight / totalWeight;
}
}
}
// Apply damping factor
double leaked = 0.0;
for (int i = 0; i < numProducts; i++) {
newRanks[i] = dampingFactor * newRanks[i];
leaked += (1.0 - dampingFactor) / numProducts;
}
// Redistribute leaked probability
for (int i = 0; i < numProducts; i++) {
newRanks[i] += leaked;
}
// Check for convergence
diff = 0.0;
for (int i = 0; i < numProducts; i++) {
diff += std::abs(newRanks[i] - ranks[i]);
}
diff /= numProducts;
// Update ranks
ranks.swap(newRanks);
iterations++;
}
// Combine with other ranking signals
for (int i = 0; i < numProducts; i++) {
double qualityScore = 0.5 * customerRating[i] / 5.0;
double performanceScore = 0.3 * conversionRate[i];
double freshnessScore = 0.2 * recency[i];
// Final score is a weighted combination
ranks[i] = 0.6 * ranks[i] + 0.4 * (qualityScore + performanceScore + freshnessScore);
}
return ranks;
}
// Simple crawler simulation for product catalog
void crawlProductCatalog(int startProductId, int maxProducts) {
std::queue<int> frontier;
std::unordered_set<int> visited;
frontier.push(startProductId);
visited.insert(startProductId);
while (!frontier.empty() && visited.size() < maxProducts) {
int currentProduct = frontier.front();
frontier.pop();
// Process product (extract metadata, update index, etc.)
// Add related products to frontier
for (const auto& relation : productGraph[currentProduct]) {
int relatedProduct = relation.first;
if (visited.find(relatedProduct) == visited.end()) {
frontier.push(relatedProduct);
visited.insert(relatedProduct);
}
}
}
}
};
Amazon's A9 search engine uses PageRank-inspired algorithms to rank products in search results:
Amazon deploys sophisticated web crawlers to monitor the e-commerce landscape:
The "Customers who bought this also bought" feature combines collaborative filtering with PageRank concepts:
Third-party sellers benefit from PageRank-inspired seller rankings and market analysis:
Improvement in click-through rate on search results
Annual revenue directly attributed to recommendations
Percentage of active catalog analyzed daily by crawlers
Algorithm Analysis
Exploring how Amazon leverages Best First Search algorithms, A* variants, and efficient heap data structures to power its personalized recommendation systems, delivering tailored product suggestions that drive customer engagement and increase revenue across billions of interactions daily.
Amazon's recommendation systems face a computational challenge: finding the most relevant products from a catalog of over 350 million items in milliseconds. Best First Search algorithms and efficient heap implementations allow Amazon to intelligently navigate this vast product space, prioritizing the most promising recommendations without exhaustively evaluating every possibility.
Best First Search is a search algorithm that explores a graph by expanding the most promising node according to a specified evaluation function. A* extends this by combining actual cost and estimated cost to goal.
Heaps provide efficient priority queue operations, crucial for maintaining and retrieving the highest-priority product recommendations in real-time.
// C++ implementation of Amazon-style recommendation system using Best First Search and heaps
#include <iostream>
#include <vector>
#include <queue>
#include <unordered_map>
#include <unordered_set>
#include <algorithm>
class ProductRecommender {
private:
struct Product {
int id;
std::string category;
double price;
double rating;
int salesRank;
std::vector<int> relatedProducts;
};
struct CustomerProfile {
int id;
std::unordered_map<std::string, double> categoryPreferences;
std::vector<int> purchaseHistory;
std::vector<int> viewHistory;
double averagePurchasePrice;
};
struct RecommendationCandidate {
int productId;
double gScore; // Known relevance based on history
double hScore; // Estimated future interest
double fScore; // Combined score (g + h)
bool operator<(const RecommendationCandidate& other) const {
return fScore < other.fScore; // For max-heap (highest priority first)
}
};
std::unordered_map<int, Product> productCatalog;
std::unordered_map<int, CustomerProfile> customerProfiles;
// Collaborative filtering model (simplified)
std::unordered_map<int, std::unordered_map<int, double>> productSimilarities;
// Cache for computed recommendations
std::unordered_map<int, std::vector<int>> recommendationCache;
// Calculate known relevance (g score) based on customer history
double calculateGScore(int customerId, int productId) {
const CustomerProfile& profile = customerProfiles[customerId];
const Product& product = productCatalog[productId];
double score = 0.0;
// Category preference match
if (profile.categoryPreferences.find(product.category) != profile.categoryPreferences.end()) {
score += profile.categoryPreferences.at(product.category) * 0.4;
}
// Price match (closer to average purchase price = higher score)
double priceDiff = std::abs(product.price - profile.averagePurchasePrice);
double priceMatch = std::max(0.0, 1.0 - priceDiff / profile.averagePurchasePrice);
score += priceMatch * 0.2;
// Rating influence
score += (product.rating / 5.0) * 0.3;
// Sales rank influence (normalized)
score += (1.0 / (1.0 + product.salesRank * 0.001)) * 0.1;
return score;
}
// Calculate heuristic (h score) based on predicted future interest
double calculateHScore(int customerId, int productId) {
const CustomerProfile& profile = customerProfiles[customerId];
double score = 0.0;
// Collaborative filtering signal
for (int purchasedProduct : profile.purchaseHistory) {
if (productSimilarities[purchasedProduct].find(productId) !=
productSimilarities[purchasedProduct].end()) {
score += productSimilarities[purchasedProduct][productId] * 0.6;
}
}
// Recency boost for products related to recently viewed items
if (!profile.viewHistory.empty()) {
for (int i = 0; i < std::min(5, (int)profile.viewHistory.size()); i++) {
int recentViewedProduct = profile.viewHistory[profile.viewHistory.size() - 1 - i];
if (productSimilarities[recentViewedProduct].find(productId) !=
productSimilarities[recentViewedProduct].end()) {
// Higher weight for more recently viewed products
double recencyWeight = 0.4 * (1.0 - i * 0.1);
score += productSimilarities[recentViewedProduct][productId] * recencyWeight;
}
}
}
return score;
}
public:
// Constructor and initialization methods omitted for brevity
// Generate personalized recommendations using Best First Search
std::vector<int> getRecommendations(int customerId, int k = 10, bool useCache = true) {
// Check cache first if enabled
if (useCache && recommendationCache.find(customerId) != recommendationCache.end()) {
return recommendationCache[customerId];
}
// Ensure customer exists
if (customerProfiles.find(customerId) == customerProfiles.end()) {
return {};
}
const CustomerProfile& profile = customerProfiles[customerId];
// Max heap priority queue for Best First Search
std::priority_queue<RecommendationCandidate> pq;
// Track visited products to avoid duplicates
std::unordered_set<int> visited;
// Track products in final recommendation list
std::unordered_set<int> recommended;
// Track category diversity
std::unordered_map<std::string, int> categoryCount;
// Seed the search with recent view history or purchase history
std::vector<int> seedProducts;
if (!profile.viewHistory.empty()) {
// Use last 3 viewed products as seeds
for (int i = 0; i < std::min(3, (int)profile.viewHistory.size()); i++) {
seedProducts.push_back(profile.viewHistory[profile.viewHistory.size() - 1 - i]);
}
} else if (!profile.purchaseHistory.empty()) {
// Use last 3 purchased products as seeds
for (int i = 0; i < std::min(3, (int)profile.purchaseHistory.size()); i++) {
seedProducts.push_back(profile.purchaseHistory[profile.purchaseHistory.size() - 1 - i]);
}
}
// Initialize search with seed products and their related products
for (int seedProduct : seedProducts) {
if (productCatalog.find(seedProduct) != productCatalog.end()) {
for (int relatedProduct : productCatalog[seedProduct].relatedProducts) {
if (visited.find(relatedProduct) == visited.end() &&
std::find(profile.purchaseHistory.begin(),
profile.purchaseHistory.end(),
relatedProduct) == profile.purchaseHistory.end()) {
double gScore = calculateGScore(customerId, relatedProduct);
double hScore = calculateHScore(customerId, relatedProduct);
RecommendationCandidate candidate;
candidate.productId = relatedProduct;
candidate.gScore = gScore;
candidate.hScore = hScore;
candidate.fScore = gScore + hScore;
pq.push(candidate);
visited.insert(relatedProduct);
}
}
}
}
// Best First Search to find top recommendations
std::vector<int> recommendations;
const int MAX_CATEGORIES_PER_RECOMMENDATION = 3;
const int MAX_CANDIDATES_TO_EXPLORE = 100; // Limit exploration for performance
int candidatesExplored = 0;
while (!pq.empty() && recommendations.size() < k && candidatesExplored < MAX_CANDIDATES_TO_EXPLORE) {
RecommendationCandidate current = pq.top();
pq.pop();
candidatesExplored++;
if (recommended.find(current.productId) != recommended.end()) {
continue; // Skip if already recommended
}
const Product& product = productCatalog[current.productId];
// Apply diversity constraints
if (categoryCount[product.category] >= MAX_CATEGORIES_PER_RECOMMENDATION) {
continue; // Skip to maintain category diversity
}
// Add to recommendations
recommendations.push_back(current.productId);
recommended.insert(current.productId);
categoryCount[product.category]++;
// Explore related products
for (int relatedProduct : product.relatedProducts) {
if (visited.find(relatedProduct) == visited.end() &&
recommended.find(relatedProduct) == recommended.end() &&
std::find(profile.purchaseHistory.begin(),
profile.purchaseHistory.end(),
relatedProduct) == profile.purchaseHistory.end()) {
double gScore = calculateGScore(customerId, relatedProduct);
double hScore = calculateHScore(customerId, relatedProduct);
RecommendationCandidate candidate;
candidate.productId = relatedProduct;
candidate.gScore = gScore;
candidate.hScore = hScore;
candidate.fScore = gScore + hScore;
pq.push(candidate);
visited.insert(relatedProduct);
}
}
}
// Update cache
recommendationCache[customerId] = recommendations;
return recommendations;
}
// Additional methods for updating similarities, customer profiles, etc.
};
Amazon's homepage uses Best First Search to create a tailored entry point for every customer:
Amazon's bundle recommendations leverage modified Best First Search with custom heuristics:
Amazon's marketing emails use heap-prioritized recommendations to maximize engagement:
Amazon's mobile app uses memory-efficient heap implementations for on-device recommendations:
Percentage of Amazon sales directly attributed to personalized recommendations
Reduction in processing time compared to exhaustive methods
Higher product browsing depth with personalized recommendations
Fibonacci heap implementation allows updates in O(1) amortized time when new user actions occur
Epsilon-greedy approach occasionally injects serendipitous recommendations to avoid filter bubbles
Heuristic functions continuously improve through machine learning from customer interactions
Dynamically adjusts search breadth based on available compute resources and request urgency
Algorithm Analysis
Exploring how specialized tree data structures power Amazon's product search engine, enabling lightning-fast lookups, auto-completion, and faceted navigation across hundreds of millions of products with millisecond response times.
Amazon's product catalog contains over 350 million items, with customers executing billions of search queries daily. Tree-based data structures are fundamental to Amazon's ability to deliver relevant results at scale, balancing search speed, memory efficiency, and result quality through specialized hierarchical organizations.
Tries optimize prefix-based operations, making them ideal for Amazon's search auto-completion and predictive search features.
B+ Trees provide optimized disk-based indexing for Amazon's catalog database, balancing search performance with storage efficiency.
// C++ implementation of Amazon-style product search using tries and B+ trees
#include <iostream>
#include <vector>
#include <unordered_map>
#include <string>
#include <memory>
#include <algorithm>
class ProductCatalogSearch {
private:
// Trie node structure for auto-completion
struct TrieNode {
bool isEndOfWord;
std::unordered_map<char, std::shared_ptr<TrieNode>> children;
std::vector<int> productIds; // Products associated with this prefix
int searchFrequency; // How often this prefix is searched
TrieNode() : isEndOfWord(false), searchFrequency(0) {}
};
// B+ Tree node structure for range-based searches
struct BPlusTreeNode {
bool isLeaf;
std::vector<double> keys; // e.g., price points
std::vector<std::shared_ptr<BPlusTreeNode>> children;
std::vector<std::vector<int>> productIds; // Only in leaf nodes
std::shared_ptr<BPlusTreeNode> nextLeaf; // For efficient range queries
BPlusTreeNode(bool leaf) : isLeaf(leaf) {}
};
// Category tree node for hierarchical navigation
struct CategoryNode {
int categoryId;
std::string categoryName;
std::vector<std::shared_ptr<CategoryNode>> subcategories;
std::vector<int> productIds;
std::unordered_map<std::string, std::vector<std::string>> attributes;
CategoryNode(int id, const std::string& name)
: categoryId(id), categoryName(name) {}
};
// Root nodes for our tree structures
std::shared_ptr<TrieNode> autoCompleteTrieRoot;
std::shared_ptr<TrieNode> spellCheckTrieRoot;
std::shared_ptr<BPlusTreeNode> priceIndexRoot;
std::shared_ptr<CategoryNode> categoryTreeRoot;
// Product database
std::unordered_map<int, std::unordered_map<std::string, std::string>> productDatabase;
// Insert a term into the auto-complete trie
void insertIntoTrie(
const std::string& term,
int productId,
std::shared_ptr<TrieNode> root) {
std::shared_ptr<TrieNode> current = root;
for (char c : term) {
// Convert to lowercase for case-insensitive search
char lc = std::tolower(c);
// Create node if it doesn't exist
if (current->children.find(lc) == current->children.end()) {
current->children[lc] = std::make_shared<TrieNode>();
}
current = current->children[lc];
// Add product to this prefix node
if (std::find(current->productIds.begin(), current->productIds.end(), productId)
== current->productIds.end()) {
current->productIds.push_back(productId);
}
}
// Mark end of word
current->isEndOfWord = true;
}
// Insert a product into the B+ tree by price
void insertIntoBPlusTree(double price, int productId) {
// B+ tree insertion logic
// 1. Traverse to appropriate leaf node
// 2. Insert key in sorted order
// 3. If node overflows, split and propagate up
// 4. Update parent pointers
// Implementation omitted for brevity
}
// Find products by prefix in trie
std::vector<int> findProductsByPrefix(
const std::string& prefix,
std::shared_ptr<TrieNode> root,
int limit = 10) {
std::shared_ptr<TrieNode> current = root;
// Traverse to prefix node
for (char c : prefix) {
char lc = std::tolower(c);
if (current->children.find(lc) == current->children.end()) {
return {}; // Prefix not found
}
current = current->children[lc];
}
// Collect products for this prefix
std::vector<int> result = current->productIds;
// Update search frequency for this prefix
current->searchFrequency++;
// Limit results
if (result.size() > limit) {
result.resize(limit);
}
return result;
}
// Find products in price range using B+ tree
std::vector<int> findProductsByPriceRange(double minPrice, double maxPrice) {
std::vector<int> result;
// B+ tree range query logic
// 1. Find leaf node containing minPrice
// 2. Collect all products within range
// 3. Follow leaf node pointers until maxPrice is exceeded
// Implementation omitted for brevity
return result;
}
// Find products by category traversal
std::vector<int> findProductsByCategory(int categoryId) {
// Find category node by ID
// Recursively collect products from this category and all subcategories
// Implementation omitted for brevity
return std::vector<int>();
}
// Generate spell check suggestions using edit distance in trie
std::vector<std::string> generateSpellingSuggestions(const std::string& query) {
// Trie-based spelling correction
// 1. Find nodes within edit distance threshold
// 2. Return complete words from those nodes
// 3. Sort by frequency and edit distance
// Implementation omitted for brevity
return std::vector<std::string>();
}
public:
ProductCatalogSearch() {
// Initialize tree roots
autoCompleteTrieRoot = std::make_shared<TrieNode>();
spellCheckTrieRoot = std::make_shared<TrieNode>();
priceIndexRoot = std::make_shared<BPlusTreeNode>(true);
categoryTreeRoot = std::make_shared<CategoryNode>(0, "Root");
}
// Add a product to all indices
void indexProduct(int productId, const std::unordered_map<std::string, std::string>& attributes) {
// Store in database
productDatabase[productId] = attributes;
// Add to tries for title, brand, etc.
if (attributes.find("title") != attributes.end()) {
insertIntoTrie(attributes.at("title"), productId, autoCompleteTrieRoot);
insertIntoTrie(attributes.at("title"), productId, spellCheckTrieRoot);
}
if (attributes.find("brand") != attributes.end()) {
insertIntoTrie(attributes.at("brand"), productId, autoCompleteTrieRoot);
}
// Add to B+ tree by price
if (attributes.find("price") != attributes.end()) {
try {
double price = std::stod(attributes.at("price"));
insertIntoBPlusTree(price, productId);
} catch (...) {
// Handle price conversion error
}
}
// Add to category tree
if (attributes.find("category_id") != attributes.end()) {
try {
int categoryId = std::stoi(attributes.at("category_id"));
// Logic to insert into proper category tree location
// Implementation omitted for brevity
} catch (...) {
// Handle category ID conversion error
}
}
}
// Comprehensive product search function
struct SearchResult {
std::vector<int> productIds;
std::vector<std::string> spellingSuggestions;
std::unordered_map<std::string, std::vector<std::string>> facets;
};
SearchResult search(const std::string& query,
double minPrice = 0.0,
double maxPrice = std::numeric_limits<double>::max(),
int categoryId = -1) {
SearchResult result;
// 1. Get products matching text query from trie
std::vector<int> textMatchProducts = findProductsByPrefix(query, autoCompleteTrieRoot, 1000);
// 2. Get products in price range from B+ tree
std::vector<int> priceRangeProducts = findProductsByPriceRange(minPrice, maxPrice);
// 3. Get products in category from category tree
std::vector<int> categoryProducts;
if (categoryId >= 0) {
categoryProducts = findProductsByCategory(categoryId);
}
// 4. Intersect results from different tree structures
// This is a simplified approach; in practice, Amazon uses more sophisticated ranking
std::unordered_set<int> productSet;
if (!query.empty()) {
productSet.insert(textMatchProducts.begin(), textMatchProducts.end());
}
if (minPrice > 0 || maxPrice < std::numeric_limits<double>::max()) {
if (productSet.empty()) {
productSet.insert(priceRangeProducts.begin(), priceRangeProducts.end());
} else {
// Filter existing results by price
std::unordered_set<int> priceSet(priceRangeProducts.begin(), priceRangeProducts.end());
for (auto it = productSet.begin(); it != productSet.end();) {
if (priceSet.find(*it) == priceSet.end()) {
it = productSet.erase(it);
} else {
++it;
}
}
}
}
if (categoryId >= 0) {
if (productSet.empty()) {
productSet.insert(categoryProducts.begin(), categoryProducts.end());
} else {
// Filter existing results by category
std::unordered_set<int> categorySet(categoryProducts.begin(), categoryProducts.end());
for (auto it = productSet.begin(); it != productSet.end();) {
if (categorySet.find(*it) == categorySet.end()) {
it = productSet.erase(it);
} else {
++it;
}
}
}
}
// 5. Generate spell check suggestions if results are sparse
if (productSet.size() < 5 && !query.empty()) {
result.spellingSuggestions = generateSpellingSuggestions(query);
}
// 6. Convert set to vector for final results
result.productIds.insert(result.productIds.end(), productSet.begin(), productSet.end());
// 7. Generate facets for navigation (simplified)
// In practice, Amazon computes facets based on the result set
// Implementation omitted for brevity
return result;
}
// Autocomplete suggestions based on prefix
std::vector<std::string> getAutocompleteSuggestions(const std::string& prefix, int limit = 10) {
// Find products matching the prefix
std::vector<int> matchingProducts = findProductsByPrefix(prefix, autoCompleteTrieRoot, 100);
// Extract and rank suggestions
std::unordered_map<std::string, int> suggestions;
for (int productId : matchingProducts) {
if (productDatabase.find(productId) != productDatabase.end() &&
productDatabase[productId].find("title") != productDatabase[productId].end()) {
std::string title = productDatabase[productId]["title"];
suggestions[title]++;
}
}
// Convert to vector for sorting
std::vector<std::pair<std::string, int>> suggestionPairs;
for (const auto& pair : suggestions) {
suggestionPairs.push_back(pair);
}
// Sort by frequency
std::sort(suggestionPairs.begin(), suggestionPairs.end(),
[](const std::pair<std::string, int>& a, const std::pair<std::string, int>& b) {
return a.second > b.second;
});
// Extract top suggestions
std::vector<std::string> result;
for (size_t i = 0; i < suggestionPairs.size() && i < limit; i++) {
result.push_back(suggestionPairs[i].first);
}
return result;
}
};
Amazon's lightning-fast search suggestions rely on optimized trie structures to predict user intent:
Amazon's extensive product taxonomy is powered by specialized tree structures:
Amazon's forgiving search experience relies on trie-based spelling correction:
Amazon's powerful filtering system uses specialized tree indices for fast constraint application:
Average response time for catalog search queries using tree-based indexing
Increase in conversion rate from intelligent auto-completion and spelling correction
Percentage of active catalog searchable within milliseconds using tree indices
Partitioning massive tree structures across server clusters based on prefix or category hierarchies
Specialized node layouts optimized for CPU cache performance, reducing memory access latency by 47%
Disk-optimized tree structures with incremental updates to maintain performance during catalog changes
Algorithm that anticipates tree traversal paths and preloads nodes before they're needed
Algorithm Analysis
Exploring how Amazon leverages topological sorting algorithms to manage complex dependencies across its microservices architecture, supply chain operations, and software deployment systems, enabling reliable execution of interdependent processes at unprecedented scale.
Topological sorting is a fundamental algorithm for ordering the vertices of a directed acyclic graph (DAG) such that for every directed edge u→v, vertex u comes before v in the ordering. This makes it the perfect algorithm for solving dependency resolution problems, where certain tasks must be completed before others can begin.
At Amazon's scale, with millions of interdependent services, products, and operations, efficiently resolving dependencies is critical for maintaining system reliability and operational efficiency.
Kahn's algorithm uses a breadth-first approach to topological sorting, incrementally removing nodes with no incoming edges.
This approach uses depth-first search to build the topological ordering in reverse, adding vertices to the result after all their dependencies have been processed.
// C++ implementation of dependency resolution using topological sort
#include <iostream>
#include <vector>
#include <queue>
#include <unordered_map>
#include <string>
class DependencyResolver {
private:
// Graph representation
std::unordered_map<std::string, std::vector<std::string>> adjacencyList;
std::unordered_map<std::string, int> inDegree;
std::vector<std::string> services;
public:
// Add a service to the dependency graph
void addService(const std::string& serviceName) {
if (adjacencyList.find(serviceName) == adjacencyList.end()) {
adjacencyList[serviceName] = std::vector<std::string>();
inDegree[serviceName] = 0;
services.push_back(serviceName);
}
}
// Add a dependency between services (dependent depends on dependency)
void addDependency(const std::string& dependency, const std::string& dependent) {
// Ensure both services exist
addService(dependency);
addService(dependent);
// Add the edge dependency -> dependent
adjacencyList[dependency].push_back(dependent);
// Increment in-degree of the dependent service
inDegree[dependent]++;
}
// Resolve dependencies using Kahn's topological sort algorithm
std::pair<std::vector<std::string>, bool> resolveDependencies() {
std::vector<std::string> result;
std::queue<std::string> zeroInDegree;
// Find all services with no dependencies (in-degree = 0)
for (const auto& service : services) {
if (inDegree[service] == 0) {
zeroInDegree.push(service);
}
}
// Process the services
while (!zeroInDegree.empty()) {
// Get the next service with no dependencies
std::string current = zeroInDegree.front();
zeroInDegree.pop();
// Add to result order
result.push_back(current);
// Process all services that depend on the current service
for (const auto& dependent : adjacencyList[current]) {
// Decrement in-degree of dependent services
inDegree[dependent]--;
// If all dependencies are satisfied, add to queue
if (inDegree[dependent] == 0) {
zeroInDegree.push(dependent);
}
}
}
// Check if all services were included in the result
bool isValidResolution = (result.size() == services.size());
return {result, isValidResolution};
}
// Find and report circular dependencies
std::vector<std::string> findCircularDependencies() {
std::vector<std::string> circularDependencies;
std::unordered_map<std::string, bool> visited;
std::unordered_map<std::string, bool> inStack;
for (const auto& service : services) {
if (!visited[service]) {
detectCycle(service, visited, inStack, circularDependencies);
}
}
return circularDependencies;
}
private:
bool detectCycle(const std::string& service,
std::unordered_map<std::string, bool>& visited,
std::unordered_map<std::string, bool>& inStack,
std::vector<std::string>& circularDependencies) {
visited[service] = true;
inStack[service] = true;
for (const auto& dependent : adjacencyList[service]) {
if (!visited[dependent]) {
if (detectCycle(dependent, visited, inStack, circularDependencies)) {
if (std::find(circularDependencies.begin(), circularDependencies.end(), service) == circularDependencies.end()) {
circularDependencies.push_back(service);
}
return true;
}
} else if (inStack[dependent]) {
// Cycle detected
if (std::find(circularDependencies.begin(), circularDependencies.end(), service) == circularDependencies.end()) {
circularDependencies.push_back(service);
}
return true;
}
}
inStack[service] = false;
return false;
}
};
Amazon's distributed architecture relies on topological sorting to manage service dependencies:
Amazon's infrastructure-as-code service uses topological sorting to manage resource dependencies:
Amazon's fulfillment network leverages topological sorting for complex logistics operations:
Amazon's internal build systems use topological sorting to optimize continuous integration/deployment:
First-time success rate for complex service deployments using topological ordering
Reduction in recovery time during outages through optimized service restart sequencing
Increase in build parallelism through intelligent dependency analysis
Grouping services into logical layers to simplify dependency management and increase system resilience
Runtime resolution of service dependencies based on system health, load, and availability zones
Proactive identification and refactoring of problematic circular dependencies before they cause production issues
Incorporating time-based constraints when resolving dependencies for time-sensitive operations
Algorithm Analysis
Exploring how Amazon leverages network flow algorithms like Ford-Fulkerson to optimize customer traffic distribution across its digital platforms, physical facilities, and service systems, ensuring optimal resource utilization while maintaining exceptional customer experience even during peak demand periods.
Amazon handles billions of customer interactions daily across its websites, mobile apps, fulfillment centers, and customer service channels. Network flow algorithms provide the mathematical foundation for intelligently distributing this massive traffic load, treating the flow of customers, orders, and requests as a network optimization problem.
The Ford-Fulkerson algorithm finds the maximum flow in a flow network by iteratively identifying augmenting paths and increasing flow until no more paths exist.
The Edmonds-Karp algorithm implements Ford-Fulkerson using breadth-first search to find the shortest augmenting path, improving efficiency for Amazon's large-scale systems.
// C++ implementation of Ford-Fulkerson algorithm for Amazon's traffic management
#include <iostream>
#include <vector>
#include <queue>
#include <limits>
#include <string>
#include <unordered_map>
class TrafficFlowManager {
private:
struct Node {
std::string id;
std::string type; // "entry", "server", "service", "fulfillment", etc.
int currentLoad;
int maxCapacity;
Node(std::string _id, std::string _type, int _maxCapacity)
: id(_id), type(_type), currentLoad(0), maxCapacity(_maxCapacity) {}
};
int numNodes;
std::vector<Node> nodes;
std::vector<std::vector<int>> capacity; // Capacity matrix
std::vector<std::vector<int>> flow; // Flow matrix
// Map node IDs to indices
std::unordered_map<std::string, int> nodeIndex;
// Source and sink indices
int source;
int sink;
public:
TrafficFlowManager() : numNodes(0), source(-1), sink(-1) {}
// Add a node to the network
void addNode(const std::string& id, const std::string& type, int maxCapacity) {
nodeIndex[id] = numNodes;
nodes.push_back(Node(id, type, maxCapacity));
// Resize matrices to include new node
capacity.resize(numNodes + 1, std::vector<int>(numNodes + 1, 0));
flow.resize(numNodes + 1, std::vector<int>(numNodes + 1, 0));
for (int i = 0; i < numNodes; i++) {
capacity[i].resize(numNodes + 1, 0);
flow[i].resize(numNodes + 1, 0);
}
numNodes++;
// Set source and sink if appropriate
if (type == "entry" && source == -1) {
source = numNodes - 1;
}
if (type == "service" && sink == -1) {
sink = numNodes - 1;
}
}
// Add a connection between nodes with a given capacity
void addConnection(const std::string& fromId, const std::string& toId, int connectionCapacity) {
int u = nodeIndex[fromId];
int v = nodeIndex[toId];
capacity[u][v] = connectionCapacity;
}
// Set custom source and sink nodes
void setSourceAndSink(const std::string& sourceId, const std::string& sinkId) {
source = nodeIndex[sourceId];
sink = nodeIndex[sinkId];
}
// BFS to find an augmenting path (Edmonds-Karp implementation)
bool bfs(std::vector<int>& parent) {
std::vector<bool> visited(numNodes, false);
std::queue<int> q;
q.push(source);
visited[source] = true;
parent[source] = -1;
while (!q.empty()) {
int u = q.front();
q.pop();
for (int v = 0; v < numNodes; v++) {
// If there's remaining capacity and not visited
if (!visited[v] && capacity[u][v] > flow[u][v]) {
q.push(v);
parent[v] = u;
visited[v] = true;
}
}
}
// If we reached sink in BFS
return visited[sink];
}
// Ford-Fulkerson algorithm with Edmonds-Karp path finding
int computeMaximumFlow() {
if (source == -1 || sink == -1) {
std::cerr << "Source or sink not defined" << std::endl;
return -1;
}
int maxFlow = 0;
std::vector<int> parent(numNodes);
// Augment flow while there is a path from source to sink
while (bfs(parent)) {
// Find the maximum flow through the augmenting path
int pathFlow = std::numeric_limits<int>::max();
// Find bottleneck capacity
for (int v = sink; v != source; v = parent[v]) {
int u = parent[v];
pathFlow = std::min(pathFlow, capacity[u][v] - flow[u][v]);
}
// Update residual capacities and reverse edges
for (int v = sink; v != source; v = parent[v]) {
int u = parent[v];
flow[u][v] += pathFlow;
flow[v][u] -= pathFlow; // For residual graph
}
maxFlow += pathFlow;
}
return maxFlow;
}
// Get current flow distribution
std::vector<std::pair<std::string, std::string>> getCurrentFlowDistribution() {
std::vector<std::pair<std::string, std::string>> flowDistribution;
for (int u = 0; u < numNodes; u++) {
for (int v = 0; v < numNodes; v++) {
if (flow[u][v] > 0) {
flowDistribution.push_back({nodes[u].id, nodes[v].id});
}
}
}
return flowDistribution;
}
// Adapt to changing traffic conditions
void updateCapacity(const std::string& fromId, const std::string& toId, int newCapacity) {
int u = nodeIndex[fromId];
int v = nodeIndex[toId];
capacity[u][v] = newCapacity;
// Ensure flow doesn't exceed new capacity
if (flow[u][v] > capacity[u][v]) {
flow[u][v] = capacity[u][v];
}
}
// Simulate traffic distribution during peak load
void simulatePeakTraffic(int expectedTraffic) {
int maxCapacity = computeMaximumFlow();
std::cout << "Maximum traffic capacity: " << maxCapacity << std::endl;
std::cout << "Expected peak traffic: " << expectedTraffic << std::endl;
if (expectedTraffic > maxCapacity) {
std::cout << "Warning: Expected traffic exceeds capacity by "
<< (expectedTraffic - maxCapacity) << " units" << std::endl;
// Implement overflow handling strategies
// ...
} else {
std::cout << "Network can handle expected traffic" << std::endl;
// Calculate utilization percentage
double utilization = (double)expectedTraffic / maxCapacity * 100.0;
std::cout << "Utilization: " << utilization << "%" << std::endl;
}
}
};
Amazon Web Services uses network flow algorithms to optimize global traffic distribution across its massive infrastructure:
Amazon's global website and app infrastructure leverages flow algorithms to handle massive traffic spikes:
Amazon's customer service system uses Ford-Fulkerson principles to optimally route customer inquiries:
Amazon's massive logistics network uses flow algorithms to manage order processing capacity:
Daily web requests processed with optimized flow distribution
Reduction in server requirements through optimized traffic flow
Customer requests processed without perceptible delay during peak periods
Incremental flow updates that adjust to changing traffic patterns without full recalculation
ML-enhanced flow models that anticipate traffic spikes before they occur and pre-allocate resources
Enhanced algorithms that balance throughput, latency, and cost considerations simultaneously
Nested flow networks that optimize at global, regional, and local levels for better scalability
Algorithm Analysis
Exploring how Amazon leverages Skip Lists to power efficient search indexing across its massive catalog, enabling blazing-fast lookups and range queries while maintaining exceptional flexibility for frequent updates in high-throughput environments.
Skip Lists are probabilistic data structures that provide an elegant alternative to balanced trees, offering similar performance characteristics with simpler implementation and maintenance. At Amazon's scale, these properties make Skip Lists particularly valuable for high-throughput search indexing where millions of updates occur alongside billions of queries daily.
The structure combines the simplicity of linked lists with logarithmic search time by maintaining multiple layers of linked lists, with each higher layer "skipping" over elements to accelerate search operations.
Skip Lists organize elements in a hierarchy of linked lists, allowing searches to skip unnecessary comparisons by traversing at higher levels first.
Skip Lists offer several key advantages that make them ideal for Amazon's dynamic search infrastructure.
// C++ implementation of a Skip List optimized for search indexing
#include <iostream>
#include <vector>
#include <random>
#include <string>
#include <memory>
#include <limits>
class SearchIndexSkipList {
private:
struct SkipNode {
std::string key; // Search term or product ID
std::string value; // Document reference or metadata
int score; // Relevance score for ranking
std::vector<SkipNode*> forward; // Pointers to nodes at each level
SkipNode(const std::string& k, const std::string& v, int s, int level)
: key(k), value(v), score(s), forward(level + 1, nullptr) {}
};
static constexpr int MAX_LEVEL = 16; // Maximum level for skip list
static constexpr float P = 0.5f; // Probability for level promotion
SkipNode* head; // Head of the skip list
int currentLevel; // Current maximum level in the skip list
int size; // Number of elements in the skip list
// Random level generator with p=0.5 probability
int randomLevel() {
static std::random_device rd;
static std::mt19937 gen(rd());
static std::uniform_real_distribution<float> dis(0.0f, 1.0f);
int level = 0;
while (dis(gen) < P && level < MAX_LEVEL - 1) {
level++;
}
return level;
}
public:
SearchIndexSkipList() : currentLevel(0), size(0) {
// Create head node with maximum level
head = new SkipNode("", "", std::numeric_limits<int>::min(), MAX_LEVEL);
}
~SearchIndexSkipList() {
// Clean up all nodes
SkipNode* current = head;
while (current) {
SkipNode* next = current->forward[0];
delete current;
current = next;
}
}
// Insert a key-value pair with relevance score
void insert(const std::string& key, const std::string& value, int score) {
std::vector<SkipNode*> update(MAX_LEVEL + 1, nullptr);
SkipNode* current = head;
// Find the position to insert at each level
for (int i = currentLevel; i >= 0; i--) {
while (current->forward[i] &&
(current->forward[i]->key < key ||
(current->forward[i]->key == key && current->forward[i]->score < score))) {
current = current->forward[i];
}
update[i] = current;
}
// Move to the next node at level 0
current = current->forward[0];
// If key exists with same score, update value
if (current && current->key == key && current->score == score) {
current->value = value;
return;
}
// Generate random level for new node
int newLevel = randomLevel();
// Update skip list's level if necessary
if (newLevel > currentLevel) {
for (int i = currentLevel + 1; i <= newLevel; i++) {
update[i] = head;
}
currentLevel = newLevel;
}
// Create and link the new node
SkipNode* newNode = new SkipNode(key, value, score, newLevel);
for (int i = 0; i <= newLevel; i++) {
newNode->forward[i] = update[i]->forward[i];
update[i]->forward[i] = newNode;
}
size++;
}
// Search for a specific key
std::pair<std::string, int> search(const std::string& key) {
SkipNode* current = head;
// Search from top level down
for (int i = currentLevel; i >= 0; i--) {
while (current->forward[i] && current->forward[i]->key < key) {
current = current->forward[i];
}
}
// Move to the potential match
current = current->forward[0];
// Check if we found the key
if (current && current->key == key) {
return {current->value, current->score};
}
return {"", -1}; // Not found
}
// Find top k results within a range
std::vector<std::pair<std::string, int>> rangeQuery(
const std::string& startKey,
const std::string& endKey,
int topK) {
std::vector<std::pair<std::string, int>> results;
SkipNode* current = head;
// Find the starting position
for (int i = currentLevel; i >= 0; i--) {
while (current->forward[i] && current->forward[i]->key < startKey) {
current = current->forward[i];
}
}
// Move to the first potential match
current = current->forward[0];
// Collect results within range
while (current && current->key <= endKey && results.size() < topK) {
results.push_back({current->value, current->score});
current = current->forward[0];
}
// Sort by score (descending)
std::sort(results.begin(), results.end(),
[](const std::pair<std::string, int>& a,
const std::pair<std::string, int>& b) {
return a.second > b.second;
});
return results;
}
// Prefix search implementation (for auto-complete)
std::vector<std::pair<std::string, int>> prefixSearch(const std::string& prefix, int limit) {
SkipNode* current = head;
std::vector<std::pair<std::string, int>> results;
// Find the first key >= prefix
for (int i = currentLevel; i >= 0; i--) {
while (current->forward[i] && current->forward[i]->key < prefix) {
current = current->forward[i];
}
}
current = current->forward[0];
// Collect all keys with the given prefix
while (current && results.size() < limit) {
// Check if current key has the prefix
if (current->key.size() >= prefix.size() &&
current->key.substr(0, prefix.size()) == prefix) {
results.push_back({current->key, current->score});
} else if (current->key.substr(0, prefix.size()) > prefix) {
// We've moved past potential matches
break;
}
current = current->forward[0];
}
// Sort by score (descending)
std::sort(results.begin(), results.end(),
[](const std::pair<std::string, int>& a,
const std::pair<std::string, int>& b) {
return a.second > b.second;
});
return results;
}
// Get current size of the skip list
int getSize() const {
return size;
}
};
Amazon's product catalog search utilizes Skip Lists to enable fast lookups and complex filtering:
Amazon's DynamoDB utilizes Skip List structures for its sorted index operations:
Amazon's monitoring and IoT systems use Skip Lists for time-series data management:
Amazon's recommendation engines leverage Skip Lists for ranked item retrieval:
Reduction in p99 latency for range-based catalog searches using Skip List indexes
Average time to update search indexes with new products or price changes
Improvement in RAM utilization compared to previous index structures
Specialized lock-free Skip List implementation enabling hundreds of thousands of concurrent operations per second
Memory-optimized variants that reduce pointer overhead through run-length encoding of skip pointers
Adaptive level distribution that optimizes skip list height based on actual access patterns
Disk-backed implementations providing durability while maintaining logarithmic access performance