Build a Fast Autocomplete API

Aug 16, 2024

Creating an efficient autocomplete API that can handle a million rows of customer data and return suggestions quickly is a challenging task. This blog post will walk you through setting up such an API using FastAPI, incorporating fuzzy matching, and optimizing data structures for fast retrieval.

Building an Autocomplete API with FastAPI

Prerequisites

Ensure you have the following installed:

Python 3.7+
FastAPI
Uvicorn
Pandas
RapidFuzz (for fuzzy matching)

You can install these packages using pip:

pip install fastapi uvicorn pandas rapidfuzz

Step 1: Setting Up the Data

Assume you have a CSV file containing customer IDs and names. We’ll load this data using Pandas and structure it for quick access.

import pandas as pd

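# Load customer data; 'customers.csv' is assumed to have columns 'customer_id' and 'name'.
# Reading both columns as strings keeps leading zeros in IDs and matches the API's string fields.
df = pd.read_csv('customers.csv', dtype={'customer_id': str, 'name': str})
df = df.dropna(subset=['customer_id', 'name'])  # Rows with missing values cannot be matched or returned
customer_data = df.to_dict(orient='records')  # List of dictionaries, one per customer
customer_names = df['name'].tolist()  # Names in the same order as customer_data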

Step 2: Implementing Fuzzy Matching

We’ll use RapidFuzz for fuzzy matching. It provides a fast and efficient way to find similar strings.

from rapidfuzz import process

def get_fuzzy_matches(query, choices, limit=10, score_cutoff=75):
    # Returns up to `limit` tuples of (matched string, score, index into choices),
    # keeping only matches that score at least score_cutoff (on a 0-100 scale)
    return process.extract(query, choices, limit=limit, score_cutoff=score_cutoff)

Step 3: Creating the FastAPI Application

Now, let’s create the FastAPI application with an endpoint for autocomplete. This endpoint will return both the customer name and ID.

from fastapi import FastAPI, HTTPException
from typing import List, Dict

app = FastAPI()

@app.get("/autocomplete/", response_model=List[Dict[str, str]])
async def autocomplete(query: str):
    if not query.strip():
        raise HTTPException(status_code=400, detail="Query parameter is required")

    matches = get_fuzzy_matches(query, customer_names)
    result = []
    # RapidFuzz returns (match, score, index) tuples. The index points into
    # customer_names, which is in the same order as customer_data, so we can
    # look the customer up directly instead of scanning the whole list.
    for match, score, index in matches:
        customer = customer_data[index]
        result.append({"customer_id": str(customer["customer_id"]), "name": customer["name"]})
    return result

Step 4: Running the API

Save the code from the previous steps in a file named autocomplete_api.py, then run the FastAPI application using Uvicorn.

uvicorn autocomplete_api:app --reload

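This command starts the server, and you can access the API at http://127.0.0.1:8000/autocomplete/?query=<customer_name>. The --reload flag restarts the server whenever the code changes, so it is meant for development rather than production.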

Step 5: Testing the API

You can test the API using a web browser or a tool like curl or Postman. For example:

curl "http://127.0.0.1:8000/autocomplete/?query=John"

This should return a JSON array of objects, each containing the customer ID and name of a customer whose name closely matches “John”.
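Names that contain spaces or punctuation need to be URL-encoded before they go into the query string. Here is a minimal sketch using only the standard library. The helpers build_autocomplete_url and fetch_suggestions are hypothetical names introduced for illustration, and the base URL assumes the local server from Step 4 is running.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8000/autocomplete/"  # assumed local dev address from Step 4


def build_autocomplete_url(query, base_url=BASE_URL):
    # urlencode escapes spaces and special characters in the query value
    return base_url + "?" + urlencode({"query": query})


def fetch_suggestions(query):
    # Requires the API server to be running; returns the decoded JSON list
    with urlopen(build_autocomplete_url(query)) as response:
        return json.load(response)


print(build_autocomplete_url("John Smith"))
# http://127.0.0.1:8000/autocomplete/?query=John+Smith
```

Calling fetch_suggestions("John Smith") against a running server should then give you the same list that curl prints.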

Optimizing Performance

Data Structures

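Preprocessing: Preprocess the data to create a dictionary that maps customer names to their corresponding IDs. A dictionary lookup takes roughly constant time, whereas scanning the list of records for a matching name takes time proportional to the number of customers. Keep in mind that a plain dictionary holds only one ID per name, so customers who share a name will overwrite each other.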

customer_map = {customer['name']: customer['customer_id'] for customer in customer_data}
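With a million rows, some customers will probably share a name. One way to handle that, sketched below with made-up sample records, is to map each name to a list of IDs. The function name build_name_index is an assumption for illustration and not part of the code above.

```python
from collections import defaultdict

# Made-up sample records standing in for customer_data
sample_customers = [
    {"customer_id": "C001", "name": "John Smith"},
    {"customer_id": "C002", "name": "Jane Doe"},
    {"customer_id": "C003", "name": "John Smith"},
]


def build_name_index(customers):
    # Map each name to every customer ID that carries it
    index = defaultdict(list)
    for customer in customers:
        index[customer["name"]].append(customer["customer_id"])
    return dict(index)


name_index = build_name_index(sample_customers)
print(name_index["John Smith"])  # ['C001', 'C003']
```

A side benefit of this layout is that the keys of name_index are unique, so matching against them avoids scoring the same name more than once.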

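Efficient Search: For pure prefix-based searches, a trie (also called a prefix tree) finds all names starting with a given prefix without scanning the full list. A trie only matches exact prefixes, though, so it cannot tolerate typos. RapidFuzz compares the query against every name on each request, which is slower on very large lists but handles misspellings, and it keeps the implementation simple.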
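To make the trie idea concrete, here is a minimal sketch of a case-insensitive prefix tree. The class names TrieNode and PrefixTrie and the sample names are assumptions for illustration. This is not how RapidFuzz works internally, just one way to implement exact prefix lookup.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.names = []     # original names that end at this node


class PrefixTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, name):
        node = self.root
        for ch in name.lower():  # lowercase keys for case-insensitive lookup
            node = node.children.setdefault(ch, TrieNode())
        node.names.append(name)

    def search(self, prefix, limit=10):
        # Walk down to the node for the prefix, then collect names below it
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        results, stack = [], [node]
        while stack and len(results) < limit:
            current = stack.pop()
            results.extend(current.names)
            # Push children in reverse order so they pop alphabetically
            stack.extend(child for _, child in sorted(current.children.items(), reverse=True))
        return results[:limit]


trie = PrefixTrie()
for name in ["John Smith", "Johnny Cash", "Jane Doe", "Joan Baez"]:
    trie.insert(name)
print(trie.search("jo"))  # ['Joan Baez', 'John Smith', 'Johnny Cash']
```

One possible design is to try the trie first and fall back to fuzzy matching only when the prefix lookup returns too few results.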

Caching

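Implement caching to store the results of frequently repeated queries. Autocomplete traffic tends to repeat the same short inputs, so serving them from memory instead of rerunning the fuzzy match can significantly reduce response times for common queries.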

Pagination

Implement pagination to handle large result sets efficiently. Raise the limit passed to get_fuzzy_matches so that it covers the requested page, then return only the slice of results that belongs to that page.
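A minimal sketch of that slicing logic follows. The function name paginate, the page and page_size parameters, and the response shape are assumptions for illustration. Returning this dictionary from the endpoint would also mean changing the response_model used earlier.

```python
def paginate(matches, page=1, page_size=10):
    # matches: the ranked results, for example the output of
    # get_fuzzy_matches(query, customer_names, limit=page * page_size + 1),
    # where the one extra result is only used to detect a further page
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    items = matches[start:start + page_size]
    return {
        "page": page,
        "page_size": page_size,
        "items": items,
        "has_more": len(matches) > start + page_size,
    }


ranked = [f"Customer {i}" for i in range(1, 26)]  # made-up ranked results
second_page = paginate(ranked, page=2, page_size=10)
print(second_page["items"][0], second_page["has_more"])  # Customer 11 True
```

Because every deeper page reruns the match with a larger limit, this approach suits the first few pages of suggestions better than deep paging.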

Conclusion

Building an autocomplete API with FastAPI that incorporates fuzzy matching and returns customer IDs and names can greatly enhance user experience by providing relevant suggestions quickly. By structuring data efficiently and leveraging libraries like RapidFuzz, you can handle large datasets effectively. Further optimizations, such as using prefix-oriented data structures like tries or integrating with a search engine like Elasticsearch, can enhance performance even more.

About — The GenAI POD — GenAI Experts

GenAIPOD is a specialized consulting team of VerticalServe, helping clients with GenAI Architecture, Implementations etc.

VerticalServe Inc: Niche Cloud, Data & AI/ML Premier Consulting Company, partnered with Google Cloud, Confluent, AWS, Azure… 50+ customers and many success stories.

Website: http://www.VerticalServe.com

Contact: contact@verticalserve.com