Implementing "fuzzy search" in a Flutter app
Typos shouldn't stop users from finding results. Learn how to make that happen on the example of Bunny Search š°š
So one fact that everyone associates with me at this point is that besides loving Flutter, Iām also obsessed with cats. But thatās only one part of the story, as I have a soft spot for all animals in general. Thatās why Iām very conscious about what cosmetics I use, since many brands still practice testing on animals. There is a whole community dedicated to classifying brands with ācruelty-free (CF)ā status, as well as many official certifications such as PETA and Leaping Bunny. The problem, though, is that there are many nuances in defining a brand as CF, and to help users in their search, Iāve developed an app called Bunny Search. The name comes from the symbol you can find on the packaging of cosmetics, since many CF certifications have a bunny as their symbol.Ā I told the story behind developing Bunny Search as a pet project quite some time ago, you can check it out here. Yes, I remember that this is a newsletter about Flutter š So letās talk about it.
One thing that I mentioned in that article is that Iāve tried to implement āfuzzy searchā, but didnāt succeed. I tried again, and this time everything worked, so letās dive into implementation.
Search is the key feature in my app, as the user searches for the brand name to find details about it. A very common problem for humans while typing is, well, typos. In a regular search, which in its simplest form is just a contains function, if the user makes a single mistake in their query, then the results wonāt contain what they were looking for. Hence, the solution to this is to implement approximate string matching (commonly known as fuzzy search) as opposed to exact string matching. A long time ago, in one of the work projects, Iād already implemented this feature, so I knew exactly what I wanted to doā¦
Finding edit distance
Many algorithms exist to target this problem, and they usually involve counting the edit distance between the source and the target string. Edit distance is the amount of operations required to perform, which include substitution, insertion, and deletion, to transform one string into another. Sometimes also transposition, but we will talk about it later.
For example:
cot
ācat
: the edit distance will be 1 since we only need to substitute o for a to transformcot
tocat
.cot
ācats
: the edit distance is 2, since in addition to one substitution, we also need one insertion at the end (the s letter). And yes, I have to insert š everywhere š¤Ŗ
And so on. Of course, using these 3 operations, we can come up with many different ways to transform the source string into the target string. The goal of the efficient edit distance calculating algorithm is to find the minimum edit distance. If youāre into hacking algorithms, you can stop here and go to LeetCode to try yourself at this task š
Implementing the Levenshtein Distance algorithm
One of the popular algorithms is the Levenshtein distance, named after the mathematicianĀ Vladimir Levenshtein. The idea is basically the same that we have just discussed, finding the minimal edit distance by performing insertion, deletion, and substitution on the source & target strings. The most optimal way to implement this algorithm is via the dynamic programming approach. Iām not going to go through it here because it has been very well explained in many YouTube videos, for example here or here. Off-topic: I generally prefer text, such as articles or the docs to grok technical concepts, but for algorithm explanations, I feel like videos do a better job of demonstrating. If I ever start YouTube, these are the types of videos Iād probably like to createā¦
Anyway, in a nutshell, the idea is to construct such a matrix:

We start with the base cases, and we arrive at the minimum edit distance. So for each position, we calculate the minimum distance of transforming the strings, and by the end, we have the minimum distance for the whole string. If you want, you can take a look at the code, or scroll further and see the result, well, at the end. But the vanilla implementation in Dart looks like this:
static int levenshteinDistance(String a, String b) {
if (a == b) {
return 0;
}
if (a.isEmpty) {
return b.length;
}
if (b.isEmpty) {
return a.length;
}
int aLength = a.length;
int bLength = b.length;
List<List<int>> matrix = List.generate(
bLength + 1,
(i) => List.generate(aLength + 1, (j) => 0, growable: false),
growable: false,
);
for (int i = 0; i <= aLength; i++) {
matrix[0][i] = i;
}
for (int i = 0; i <= bLength; i++) {
matrix[i][0] = i;
}
for (int i = 1; i <= bLength; i++) {
for (int j = 1; j <= aLength; j++) {
int cost = (a[j - 1] == b[i - 1]) ? 0 : 1;
matrix[i][j] = _min(
matrix[i - 1][j] + 1, // deletion
matrix[i][j - 1] + 1, // insertion
matrix[i - 1][j - 1] + cost, // substitution
);
}
}
return matrix[bLength][aLength];
}
Great, but what to do with this minimal distance in our search? Because strings can be of various lengths, I decided to allow a different amount of mistakes depending on the length of the query. So, if the query.length
> 7, I allow 4 typos, otherwise 2. Of course, this is quite arbitrary, but I decided to start with this and refine it from here if required. So with this in mind, the actual search function can look like this:
static List<BrandEntity> searchSync(SearchQuery query) {
final searchTerm = query.query.toLowerCase();
// Filter & map the results
final filtered = query.allBrands.expand<SearchResult>((brand) {
final title = brand.title.toLowerCase();
if (title == searchTerm) { // Shortcut 1
return [SearchResult(0, brand)];
}
if (title.contains(searchTerm)) { // Shortcut 2
return [SearchResult(1, brand)];
}
final maxDistance = title.length > 7 ? 4 : 2;
final nameDistance = levenshteinDistance(title, searchTerm, maxDistance);
if (nameDistance <= maxDistance) {
return [SearchResult(nameDistance, brand)];
} else {
return [];
}
}).toList();
// Sort by edit distance
filtered.sort((a, b) {
return a.distance.compareTo(b.distance);
});
return filtered.map((result) => result.brand).toList();
}
Iām not really a fan of the code formatting of Substack, so if you want, you can jump straight into the source code.
Step 1:
Firstly, there are 2 shortcuts:
If the source and the target match exactly, obviously the edit distance is 0, just return it.
If the target contains the source, we donāt really care about the exact edit distance, so just return 1. Moreover, even if the target string fully contains the source string, but itself is very long, we could run into a situation, when per the algorithm the edit distance > 4, so it wonāt even get into the results.
Step 2:
If those conditions are not fulfilled, then count the Levenshtein distance, and if itās less than maxDistance
, include that brand in the search result.
Step 3:
Finally, sort by relevance (the less the edit distance, the more relevant) and return the result.
But if you look closely, you can see that I pass the maxDistance
to levenshteinDistance
function. I do that because if you think about it, the target strings can be very long, and if we have already calculated, that the minimum edit distance will be greater than the maximum amount of typos we allow, then there is no reason for us to continue calculating the whole matrix, hence we can return early. For that, we can find the minimum item in each row, and once that exceeds the max distance, just return it. So our updated function can look like this:
static int levenshteinDistance(String a, String b, int maxDistance) {
if (a == b) {
return 0;
}
if (a.isEmpty) {
return b.length <= maxDistance ? b.length : maxDistance + 1;
}
if (b.isEmpty) {
return a.length <= maxDistance ? a.length : maxDistance + 1;
}
int aLength = a.length;
int bLength = b.length;
List<List<int>> matrix = List.generate(
bLength + 1,
(i) => List.generate(aLength + 1, (j) => 0, growable: false),
growable: false,
);
for (int i = 0; i <= aLength; i++) {
matrix[0][i] = i;
}
for (int i = 0; i <= bLength; i++) {
matrix[i][0] = i;
}
for (int i = 1; i <= bLength; i++) {
// 1. Init our `minRowValue`
int minRowValue = maxDistance + 1;
for (int j = 1; j <= aLength; j++) {
int cost = (a[j - 1] == b[i - 1]) ? 0 : 1;
matrix[i][j] = _min(
matrix[i - 1][j] + 1, // deletion
matrix[i][j - 1] + 1, // insertion
matrix[i - 1][j - 1] + cost, // substitution
);
// 2. Find the current row min
minRowValue = min(minRowValue, matrix[i][j]);
}
// 3. If it's already larger than `maxDistance`, return early
if (minRowValue > maxDistance) {
return maxDistance + 1;
}
}
return matrix[bLength][aLength];
}
Now our search function should perform even better when long strings are involved. If you want to see it in action, you can check out the source code and play around with tests.
We have one last step to do. Remember, how in the beginning I said that the previous time I tried to implement this feature, I didnāt succeed? My database has around 6k records in it, which isnāt even that big, but running the Levenshtein distance algorithm with the complexity O(n*m) (where n & m are source and target string lengths) on that dataset still caused some jank.
So I offset it all to an Isolateā¦ and it consistently took a whopping 10 seconds to perform a search. Obviously, the users would never wait that long, so it wasnāt an option. I donāt exactly know why it happened, but it was before Flutter 2.8, which updated to use Dartās Isolate Groups and promised performance gains, so maybe thatās related. Anyway, Iām happy that now it works, and even though for smaller datasets you might get away with calculating everything on the main Isolate, Iād rather stay safe and avoid jank by using the convenient compute
function. So finally, our end function looks like this:
Future<List<BrandEntity>> search({
required List<BrandEntity> allBrands,
required String query,
}) => compute(searchSync, SearchQuery(allBrands, query));
Before wrapping this up, here are a couple of things that could be even better:
The vanilla Levenshtein distance doesnāt consider an operation called transposition. This is when two adjacent symbols could be swapped places and this would be considered a single operation, e.g.
ab
āba
= 1. But the DamerauāLevenshtein distance does. It is a more complex algorithm, and you can read more about it in the good old Wikipedia.Currently, this test fails:
test(
'Typo in Long Brand Name (Multiple characters off): Anastasia as Anastsia',
() {
var query = SearchQuery(makeupBrands, 'Anastsia');
var results = SearchService.searchSync(query).map((b) => b.title).toList();
expect(results, contains(matches('Anastasia Beverly Hills')));
});
This is because the whole string āAnastasia Beverly Hillsā is very long, and it reaches max distance. A better logic would be to also ignore extra symbols at the beginning and the end of the string, as long as part of it matches the query. I will look into it š
Also, I wanted to mention that I tried working with ChatGPT, and while it gave a solid implementation of the vanilla Levenshtein distance algorithm, it made many mistakes in explaining how it works, analyzing examples, and optimizing. So I wouldnāt fully trust it with algorithms just yet.
So to summarize, to implement āfuzzy searchā, follow these steps:
Implement a minimum edit distance searching algorithm, such as Levenshtein distance.
Define the maximum allowed typos based on the string length and sort the filtered list based on that.
Offload the computation to another
Isolate
via the utilitycompute
function.
And hereās a small list of completely random things to read, not exactly related to todayās topic, but may be interesting to you:
My article about the
Set
data structure inDart
. Iāve never continued the series, but now I think maybe I should?Article from Slava Egorov about microbenchmarking Dart. Itās like reading a detective story but about programming.
New official docs on Isolates with step-by-step explanations of more complex use cases.
I hope you found this issue interesting and see you next time! If youāve found any errors or have suggestions, theyāre very welcome āŗļø And letās connect on Twitter (X š)
-Daria š
P.S. - how do you prefer to digest content about algorithms & data structures, via text or via videos? š
Hi, Daria...Lovely article. Came to your Substack after watching your talk at Flutter Con 24; Thanks for putting this up. Also, i wanted to say, like the Youtube style videos you mentioned, this article would really pop if you had a gif demonstrating the accuracy of the search in helping users as you help us arrive at the solution, even more preferably using your app