Benchmark: Search - Intl.Collator vs localeCompare vs normalize

Script Preparation code:

var subject = 'Éclate lécrou, déchire la feuille, repara la máquina, recalienta la sopa, decora la habitación, despierta, spettacolo, überlegen, piękny, szczęście, υπέροχος.';
var needle = 'on';
var options = {
  usage: 'search',
  sensitivity: 'base'
};

var collator = new Intl.Collator(undefined, options);

function normalizeText(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

function searchWithNormalizedText(string, query) {
  const nString = normalizeText(string);
  const nQuery = normalizeText(query);
  return nString.includes(nQuery);
}

Tests:

Intl.Collator
collator.compare(subject.toLowerCase(), needle.toLowerCase());
localeCompare
subject.toLowerCase().localeCompare(needle.toLowerCase(), undefined, options);
normalized
searchWithNormalizedText(subject, needle);

Latest run results:

Run details: (Test run date: 21 days ago)

User agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/148.0.0.0 Safari/537.36

Browser/OS: Chrome 148 on Windows

Test name	Executions per second
Intl.Collator	2060662.6 Ops/sec
localeCompare	211109.2 Ops/sec
normalized	597183.1 Ops/sec
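To make the figures above easier to compare, the following sketch expresses each result relative to the slowest test. The numbers are copied from the run above; the variable names are illustrative only. Keep in mind that the three tests do not perform identical work, so the ratios describe this benchmark as written rather than the general cost of each API.

```javascript
// Ops/sec copied from the latest run results.
const results = {
  'Intl.Collator': 2060662.6,
  localeCompare: 211109.2,
  normalized: 597183.1,
};

// Use the slowest test as the baseline (1x).
const slowest = Math.min(...Object.values(results));

// Ratio of each test to the baseline, rounded to two decimals.
const relative = Object.fromEntries(
  Object.entries(results).map(([name, ops]) => [
    name,
    Number((ops / slowest).toFixed(2)),
  ])
);

console.log(relative);
```

In this run the reused collator comes out a little under ten times faster than `localeCompare`, and the normalized search a little under three times faster.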

Autogenerated LLM Summary (model llama3.2:3b, generated one year ago):

I'll break down the provided benchmark and its various components.

**Benchmark Overview**

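The benchmark is presented as a comparison of three approaches to accent- and case-insensitive text search: an `Intl.Collator` object, the `localeCompare()` method, and a custom normalization function. Only the normalization test actually performs a substring search (via `includes()`). The other two tests compare the whole `subject` string against `needle` and return an ordering number, so the three tests do not do equivalent work and the timings should be read with that in mind.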

**Approaches Compared**

1. **Intl.Collator**: This approach uses the `Intl.Collator` API to compare two strings. The collator is constructed once in the preparation code with `usage: 'search'` and `sensitivity: 'base'`, so strings that differ only in accents or case compare as equal.
2. **localeCompare()**: This approach uses the `localeCompare()` method of a string, which performs a locale-dependent comparison. The same `options` object is passed as an argument on every call.
3. **Normalized Search**: This approach normalizes both the input text and query using a custom function (`normalizeText`) before performing a substring search.
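As a rough illustration of how the three approaches differ in what they return, the sketch below reuses the benchmark's setup with a shorter subject string. It pins the locale to `'en'` instead of `undefined` so the result does not depend on the machine's default locale; that choice is an assumption of this example, not part of the benchmark.

```javascript
const options = { usage: 'search', sensitivity: 'base' };
// Assumption: 'en' is pinned here for reproducibility; the benchmark uses undefined.
const collator = new Intl.Collator('en', options);

function normalizeText(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

const subject = 'decora la habitación';
const needle = 'on';

// compare() and localeCompare() order the two whole strings against each other
// and return a number (negative, zero, or positive).
const collatorResult = collator.compare(subject, needle);
const localeCompareResult = subject.localeCompare(needle, 'en', options);

// With sensitivity 'base', accent and case differences are ignored,
// so these two whole strings compare as equal.
const equalIgnoringAccents = collator.compare('habitación', 'HABITACION') === 0;

// Only the normalized approach answers "does subject contain needle?".
const containsResult = normalizeText(subject).includes(normalizeText(needle));

console.log(typeof collatorResult, typeof localeCompareResult);
console.log(equalIgnoringAccents, containsResult);
```

A zero result from `compare()` means the strings are equivalent as a whole under the chosen sensitivity; it says nothing about whether one string occurs inside the other.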

**Pros and Cons**

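1. **Intl.Collator**:
	* Pros: Fastest in the recorded run (about 2.06 million ops/sec). The collator is built once and reused, and it applies locale-aware rules for accents and case.
	* Cons: `compare()` tests whole strings for ordering or equality. It does not look for a substring, so it cannot answer a "contains" query by itself.
2. **localeCompare()**:
	* Pros: Available on every string with no setup, and accepts the same `locales` and `options` arguments as `Intl.Collator`.
	* Cons: Slowest in the recorded run (about 211 thousand ops/sec). The locale and options are supplied on every call; for repeated comparisons, reusing an `Intl.Collator` is generally the better choice. Like `compare()`, it does not search for substrings.
3. **Normalized Search**:
	* Pros: The only test here that performs a real substring search. The normalization step is plain code, so it can be adapted to other strategies or search algorithms.
	* Cons: Both strings are normalized on every call (about 597 thousand ops/sec in the recorded run). Removing `\u0300-\u036f` only strips combining marks, so letters with no canonical decomposition, such as `ł` or `ß`, are left unchanged.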
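One way to reduce the normalization overhead mentioned above, when the same text is searched many times, is to normalize the subject once and reuse it. The following is a minimal sketch under that assumption; `createSearcher` is a hypothetical helper name and is not part of the benchmark.

```javascript
function normalizeText(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

// Hypothetical helper: normalizes the subject once and returns a search function.
function createSearcher(subject) {
  const normalizedSubject = normalizeText(subject); // cost paid once
  return function search(query) {
    // Only the (usually short) query is normalized per call.
    return normalizedSubject.includes(normalizeText(query));
  };
}

const search = createSearcher('decora la habitación, überlegen');
console.log(search('ON'), search('uber'), search('xyz'));
```

This changes what is being measured compared with the benchmark, which normalizes both strings on every iteration, so its timings would not be directly comparable.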

**Library and Syntax**

`Intl.Collator` is not an external library. It is a built-in object defined by the ECMAScript Internationalization API (ECMA-402), a standard maintained separately from the core ECMAScript language specification. It provides a way to compare strings in a culturally sensitive manner, taking into account factors like language, region, and script.

The normalized test relies on `String.prototype.normalize()` and `String.prototype.includes()`, both introduced in ECMAScript 2015. `Intl.Collator` and the `locales`/`options` arguments of `localeCompare()` depend on the Internationalization API, which may be missing or incomplete in older browsers or environments.
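Where support is uncertain, a feature check before constructing the collator is a common precaution. The sketch below is one possible way to do it; `createSearchCollator` is a hypothetical name, and returning `null` as the fallback signal is an assumption of this example.

```javascript
// Hypothetical helper: returns a search collator, or null if Intl.Collator is unavailable.
function createSearchCollator(locales) {
  const options = { usage: 'search', sensitivity: 'base' };
  if (typeof Intl === 'object' && typeof Intl.Collator === 'function') {
    return new Intl.Collator(locales, options);
  }
  return null; // caller can fall back to the normalize-based search
}

const searchCollator = createSearchCollator('en');
if (searchCollator) {
  // resolvedOptions() reports the settings the engine actually applied.
  const resolved = searchCollator.resolvedOptions();
  console.log(resolved.usage, resolved.sensitivity);
}
```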

**Other Alternatives**

Some alternative approaches for substring searching could include:

1. **Regular expressions**: Using the built-in `RegExp` engine to search for substrings, for example with the case-insensitive `i` flag.
2. **String manipulation libraries**: Using a dedicated string-matching library that provides optimized search algorithms.
3. **Hash tables or data structures**: Implementing a custom hash table or data structure to store and query substrings efficiently.

Keep in mind that these alternatives may have their own trade-offs and performance characteristics, which would need to be evaluated on a case-by-case basis.
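As an example of the regular-expression alternative, the sketch below strips diacritics the same way the benchmark's `normalizeText` does and then runs a case-insensitive `RegExp` built from the escaped query. The function names are hypothetical, and this variant was not part of the measured tests.

```javascript
// Same diacritic stripping as the benchmark, without the lowercasing step.
function stripDiacritics(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: accent- and case-insensitive "contains" via RegExp.
function regexSearch(subject, query) {
  const pattern = new RegExp(escapeRegExp(stripDiacritics(query)), 'i');
  return pattern.test(stripDiacritics(subject));
}

console.log(regexSearch('decora la habitación', 'ON'));
console.log(regexSearch('szczęście', 'SZCZES'));
console.log(regexSearch('abc', '.'));
```

Escaping the query matters here: without it, a query such as `.` would match any character instead of a literal dot.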
