Build a fast Auto Complete API
Creating an efficient autocomplete API that can handle a million rows of customer data and return suggestions quickly is a challenging task. This blog post will walk you through setting up such an API using FastAPI, incorporating fuzzy matching, and optimizing data structures for fast retrieval.
Building an Autocomplete API with FastAPI
Prerequisites
Ensure you have the following installed:
- Python 3.7+
- FastAPI
- Uvicorn
- Pandas
- RapidFuzz (for fuzzy matching)
You can install these packages using pip:
pip install fastapi uvicorn pandas rapidfuzz
Step 1: Setting Up the Data
Assume you have a CSV file containing customer IDs and names. We’ll load this data using Pandas and structure it for quick access.
import pandas as pd
# Load customer data
df = pd.read_csv('customers.csv') # Assume 'customers.csv' has columns 'customer_id' and 'name'
customer_data = df.to_dict(orient='records') # Convert to a list of dictionaries
customer_names = df['name'].tolist() # List of customer names for fuzzy matching
Step 2: Implementing Fuzzy Matching
We’ll use RapidFuzz for fuzzy matching. It provides a fast and efficient way to find similar strings.
from rapidfuzz import process
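def get_fuzzy_matches(query, choices, limit=10, score_cutoff=75):
    # Return up to `limit` matches scoring at least score_cutoff.
    # When choices is a list, each result is a (match, score, index) tuple,
    # where index is the position of the match in choices.
    results = process.extract(query, choices, limit=limit, score_cutoff=score_cutoff)
    return results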
Step 3: Creating the FastAPI Application
Now, let’s create the FastAPI application with an endpoint for autocomplete. This endpoint will return both the customer name and ID.
from fastapi import FastAPI, HTTPException
from typing import Any, Dict, List

app = FastAPI()

# customer_id is typically parsed from the CSV as an integer, so the values are typed as Any rather than str
@app.get("/autocomplete/", response_model=List[Dict[str, Any]])
async def autocomplete(query: str):
    if not query:
        raise HTTPException(status_code=400, detail="Query parameter is required")
    matches = get_fuzzy_matches(query, customer_names)
    result = []
    # RapidFuzz yields (match, score, index); index points into customer_names,
    # which is in the same row order as customer_data, so no scan is needed
    for match, score, index in matches:
        customer = customer_data[index]
        result.append({"customer_id": customer["customer_id"], "name": customer["name"]})
    return result
Step 4: Running the API
You can run the FastAPI application using Uvicorn. The command below assumes the code above is saved in a file named autocomplete_api.py.
uvicorn autocomplete_api:app --reload
This command will start the server, and you can access the API at http://127.0.0.1:8000/autocomplete/?query=<customer_name>.
Step 5: Testing the API
You can test the API using a web browser or a tool like curl or Postman. For example:
curl "http://127.0.0.1:8000/autocomplete/?query=John"
This should return a list of dictionaries containing customer IDs and names that closely match “John”.
Optimizing Performance
Data Structures
Preprocessing: Preprocess the data to create a dictionary that maps customer names to their corresponding IDs. This allows for faster lookups when retrieving customer details.
customer_map = {customer['name']: customer['customer_id'] for customer in customer_data}
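One caveat with a plain name-to-ID dictionary is that two customers sharing the same name overwrite each other, so only the last ID survives. A minimal sketch of one way around this, assuming duplicate names can occur in your data, is to map each name to a list of IDs. The sample records and the build_name_index helper below are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical sample records; in the post these come from customers.csv.
customer_data = [
    {"customer_id": 1, "name": "John Smith"},
    {"customer_id": 2, "name": "Jane Doe"},
    {"customer_id": 3, "name": "John Smith"},
]


def build_name_index(records):
    """Map each name to the list of every customer ID that uses it."""
    index = defaultdict(list)
    for record in records:
        index[record["name"]].append(record["customer_id"])
    return dict(index)


name_index = build_name_index(customer_data)
```

With this structure, name_index["John Smith"] holds both IDs instead of only the last one loaded.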
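Efficient Search: A trie (prefix tree) supports fast prefix-based searches, because lookup cost depends on the length of the prefix rather than on the number of names. It only matches exact prefixes, though, so it does not tolerate typos. RapidFuzz compares the query against every name on each request, which is simpler and handles misspellings, and for that reason it is the approach used in this post.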
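To make the trie option concrete, here is a minimal sketch of a prefix tree over customer names. It is not part of the API built above: the NameTrie class, its method names, and the case-insensitive matching are all assumptions made for illustration.

```python
class TrieNode:
    __slots__ = ("children", "entries")

    def __init__(self):
        self.children = {}  # next character -> TrieNode
        self.entries = []   # (name, customer_id) pairs whose name ends here


class NameTrie:
    """Case-insensitive prefix tree over customer names."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, name, customer_id):
        node = self.root
        for ch in name.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.entries.append((name, customer_id))

    def search(self, prefix, limit=10):
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix.lower():
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first traversal below that node, visiting children in
        # alphabetical order, until enough results are collected.
        results = []
        stack = [node]
        while stack and len(results) < limit:
            current = stack.pop()
            results.extend(current.entries)
            for ch in sorted(current.children, reverse=True):
                stack.append(current.children[ch])
        return results[:limit]
```

A trie like this could serve exact-prefix queries first, with the fuzzy matcher used as a fallback when the prefix search returns too few results. Whether that split is worthwhile depends on how often users mistype.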
Caching
Implement caching to store frequently accessed results. This can significantly reduce response times for common queries.
Pagination
Implement pagination to handle large result sets efficiently. Modify the limit parameter in the get_fuzzy_matches function and return paginated results.
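One simple way to do this, sketched below, is to fetch enough matches to cover the requested page (for example, limit=page * page_size) and then slice out that page. The paginate helper and the shape of its return value are assumptions for illustration, not part of the original API.

```python
def paginate(matches, page=1, page_size=10):
    """Return one page of an already-ranked list of matches."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive integers")
    start = (page - 1) * page_size
    end = start + page_size
    return {
        "page": page,
        "page_size": page_size,
        "total": len(matches),  # number of matches fetched, not all possible matches
        "items": matches[start:end],
    }
```

In the endpoint, page and page_size could be exposed as additional query parameters. Requesting a page beyond the available matches yields an empty items list instead of an error.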
Conclusion
Building an autocomplete API with FastAPI that incorporates fuzzy matching and returns customer IDs and names can greatly enhance user experience by providing relevant suggestions quickly. By structuring data efficiently and leveraging libraries like RapidFuzz, you can handle large datasets effectively. Further optimizations, such as using advanced data structures or integrating with databases like Elasticsearch, can enhance performance even more.