Benchmark: eliminate duplicates from an array of strings

Tests:

Set
const str = 'Hello world nice to meet you hello again world'; console.log('unique words:', [...new Set(str.toLowerCase().split(/\s+/))]);
Lookup
const str = 'Hello world nice to meet you hello again world'; const words = {}; for ( const word of str.toLowerCase().split(/\s+/)) { words[word] = word; } console.log('unique words:', Object.keys(words));
filter
const str = 'Hello world nice to meet you hello again world'; console.log('unique words:',str.toLowerCase().split(/\s+/).filter((v, i, m) => i === m.indexOf(v)));
lookup let
const str = 'Hello world nice to meet you hello again world'; const words = {}; for ( let word of str.toLowerCase().split(/\s+/)) { words[word] = word; } console.log('unique words:', Object.keys(words));
regex
const str = 'Hello world nice to meet you hello again world'; console.log('unique words:',str.replace(/(\b\S.+\b)(?=.*\1)/g, "").trim());

This benchmark does not have any results yet.

Autogenerated LLM Summary (model llama3.2:3b, generated one year ago):


Let's look at what this JavaScript microbenchmark tests and how each approach works.

**Benchmark Definition**

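The benchmark definition is straightforward: it removes duplicate words from a single sentence. Four of the five test cases lowercase the string and split it on whitespace into an array of words before deduplicating; the regex test works on the original string instead. No script preparation code is defined, so each test case is self-contained.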

**Test Cases**

There are five individual test cases:

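1. **Set**: This test passes the array of words to the `new Set()` constructor, which keeps only unique values, and spreads the Set back into an array. A JavaScript Set iterates in insertion order, so the result lists each word in the order it first appears.
2. **Lookup**: This test uses an object literal (`{}`) as a lookup table, iterates over the array of words with a for...of loop, and stores each word as both key and value. `Object.keys()` then returns the unique keys (i.e., words).
3. **Filter**: This test uses `Array.prototype.filter()` to keep an element only if its index equals the index returned by `indexOf()` for that value, i.e., only at its first occurrence.
4. **Lookup let**: Identical to the Lookup test, except that the for...of loop declares its variable with `let` instead of `const`. Since `word` is never reassigned, the two are semantically equivalent.
5. **Regex**: This test applies the regular expression `(\b\S.+\b)(?=.*\1)` to the original string and replaces with an empty string every word-bounded substring that occurs again later in the string. Unlike the other tests, it is case-sensitive (both "Hello" and "hello" survive), keeps the last occurrence rather than the first, leaves extra spaces behind, and returns a string rather than an array.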
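To see what each test case actually produces, the following sketch wraps each test's expression in a function that returns its value instead of logging it. The function names (`tokens`, `viaSet`, `viaLookup`, `viaFilter`, `viaRegex`) are hypothetical labels, and Lookup let is omitted because it differs from Lookup only in `let` versus `const`.

```javascript
// Same input string as every test case in the benchmark.
const str = 'Hello world nice to meet you hello again world';

// Shared tokenization used by Set, Lookup and filter: lowercase, then split on whitespace.
const tokens = () => str.toLowerCase().split(/\s+/);

// Set: spread the Set back into an array (Sets iterate in insertion order).
const viaSet = () => [...new Set(tokens())];

// Lookup: a plain object used as a table; each word is both key and value.
const viaLookup = () => {
  const words = {};
  for (const word of tokens()) {
    words[word] = word;
  }
  return Object.keys(words);
};

// filter: keep an element only at the index where indexOf() first finds it.
const viaFilter = () => tokens().filter((v, i, m) => i === m.indexOf(v));

// regex: operates on the original (not lowercased) string and returns a string.
const viaRegex = () => str.replace(/(\b\S.+\b)(?=.*\1)/g, '').trim();

console.log(JSON.stringify(viaSet()));    // ["hello","world","nice","to","meet","you","again"]
console.log(JSON.stringify(viaLookup())); // ["hello","world","nice","to","meet","you","again"]
console.log(JSON.stringify(viaFilter())); // ["hello","world","nice","to","meet","you","again"]
console.log(JSON.stringify(viaRegex()));  // "Hello  nice to meet you hello again world"
```

The first three agree on seven words in first-occurrence order. The regex version keeps the capitalized "Hello" next to "hello", keeps the last "world" rather than the first, and leaves a double space where the first "world" was removed.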

**Pros and Cons**

Here's a brief summary of each approach:

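* **Set**: Pros: concise; engines typically implement `Set` with hash tables, so deduplication takes roughly linear time, and the result preserves first-occurrence order. Cons: allocates an intermediate `Set` in addition to the final array.
* **Lookup**: Pros: also roughly linear, using only a plain object. Cons: object keys are always strings, `Object.keys()` lists integer-like keys (e.g. "10") first in ascending numeric order, and assigning the key "__proto__" on a plain `{}` does not create a property.
* **Filter**: Pros: a single, readable expression. Cons: `indexOf()` rescans the array for every element, which makes the approach quadratic (O(n²)) in the number of words.
* **Lookup let**: Same pros and cons as Lookup; `let` versus `const` for the loop variable is not expected to affect performance meaningfully.
* **Regex**: Pros: flexible, since the pattern can be adapted to other definitions of a duplicate. Cons: the greedy `.+` combined with the `.*\1` lookahead backtracks heavily, so its cost grows quickly with input length, and its output is not equivalent to that of the other tests.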
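The key-handling caveats of a plain-object lookup never show up with this benchmark's input, which contains only ordinary lowercase words. The sketch below uses a hypothetical word list to show them side by side with a Set; `lookupPlain`, `lookupNullProto`, and `viaSet` are illustrative names, and the null-prototype variant is an added comparison rather than one of the benchmarked cases.

```javascript
// Hypothetical word list (not from the benchmark): two integer-like tokens,
// the special key "__proto__", and a duplicate "world".
const words = ['world', '2024', 'hello', '__proto__', '10', 'world'];

// Plain-object lookup, written like the benchmark's Lookup test.
function lookupPlain(list) {
  const seen = {};
  for (const w of list) {
    seen[w] = w; // "__proto__" hits the inherited setter, which ignores string values
  }
  return Object.keys(seen); // integer-like keys come first, in ascending order
}

// Same idea with a null-prototype object, so "__proto__" is an ordinary key.
function lookupNullProto(list) {
  const seen = Object.create(null);
  for (const w of list) {
    seen[w] = w;
  }
  return Object.keys(seen);
}

// Set-based version for comparison: plain insertion order for every value.
const viaSet = (list) => [...new Set(list)];

console.log(JSON.stringify(lookupPlain(words)));     // ["10","2024","world","hello"]
console.log(JSON.stringify(lookupNullProto(words))); // ["10","2024","world","hello","__proto__"]
console.log(JSON.stringify(viaSet(words)));          // ["world","2024","hello","__proto__","10"]
```

A null-prototype object fixes the "__proto__" case but not the ordering of integer-like keys; only the Set keeps every word in the order it first appeared.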

**Libraries Used**

No external libraries are used; every test relies only on built-in JavaScript features:

* `Set` is a built-in JavaScript object.
* `Object.keys()`, `Array.prototype.filter()`, and `Array.prototype.indexOf()` are built-in methods.
* `String.prototype.replace()` and `String.prototype.trim()` are the built-ins used by the regex test.

**Special JS Features or Syntax**

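The benchmark definition does not call out any special features, but the code relies on ES2015 syntax (array spread, `for...of`, `let`/`const`, arrow functions), and the regex pattern `(\b\S.+\b)(?=.*\1)` uses several advanced regex features:

* `\b` matches a word boundary.
* `\S` matches any non-whitespace character.
* `.+` matches one or more characters other than line terminators (including spaces), so a single capture can span several words.
* `(?=.*\1)` is a positive lookahead assertion: it succeeds if the text captured by group 1 (the backreference `\1`) appears again anywhere after the current position. Because a lookahead consumes nothing, only the earlier copy is part of the match and gets removed.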
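One way to see how the pattern behaves is to list its matches with `String.prototype.matchAll()`, which requires the `g` flag and exposes each capture with its index. The sketch also tries a variant with the `i` flag so that the backreference ignores case; that variant is an illustration added here, not part of the benchmark, and `matches` is a hypothetical helper name.

```javascript
// The benchmark's input and pattern (matchAll requires the g flag).
const str = 'Hello world nice to meet you hello again world';
const re = /(\b\S.+\b)(?=.*\1)/g;

// List what each match captured (group 1) and where it starts.
const matches = (pattern) => [...str.matchAll(pattern)].map((m) => [m[1], m.index]);

console.log(JSON.stringify(matches(re)));
// [["world",6]]: only the first "world" has a later, case-sensitive copy

// Same pattern with the i flag, so the backreference \1 also ignores case.
const reI = /(\b\S.+\b)(?=.*\1)/gi;
console.log(JSON.stringify(matches(reI)));
// [["Hello ",0],["world",6]]: greedy .+ made the capture include a trailing space

console.log(JSON.stringify(str.replace(reI, '').trim()));
// "nice to meet you hello again world"
```

With the `i` flag, the capture at position 0 is "Hello " (with its trailing space) because "hello " recurs later, which shows that a match is not necessarily a single word.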

**Other Alternatives**

Some alternative approaches to eliminate duplicates from an array of strings could be:

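* Using a `Map` instead of an object literal for Lookup, which avoids the ordering and special-key quirks of plain-object keys.
* Using a custom data structure such as a trie.
* Using a library function such as Lodash's `_.uniq()` (or `_.uniqBy()` when duplicates are defined through an iteratee).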
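A `Map` version of the Lookup test might look like the sketch below. It is not one of the benchmarked cases, and `uniqueWordsMap` is a hypothetical name; it follows the original test's lowercase-and-split tokenization.

```javascript
// Hypothetical Map-based variant of the Lookup test.
function uniqueWordsMap(text) {
  const words = new Map();
  for (const word of text.toLowerCase().split(/\s+/)) {
    // Setting an existing key updates its value but keeps its original position.
    words.set(word, word);
  }
  // Unlike object keys, Map keys keep insertion order even when they look like
  // integers, and "__proto__" is an ordinary key.
  return [...words.keys()];
}

const str = 'Hello world nice to meet you hello again world';
console.log(JSON.stringify(uniqueWordsMap(str)));
// ["hello","world","nice","to","meet","you","again"]
console.log(JSON.stringify(uniqueWordsMap('world 2024 __proto__ 10 world')));
// ["world","2024","__proto__","10"]
```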

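Because no results have been recorded, this page cannot rank the approaches. Set and Lookup have roughly linear cost while filter is quadratic, but on an input as small as nine words, constant overheads may matter more than these asymptotic differences.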
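With no recorded results, a rough local timing loop can give a first impression. This sketch assumes a global `performance.now()` (available in browsers and in Node.js 16 and later); `candidates` and `timeAll` are hypothetical names, Lookup let is omitted, and a plain loop like this has no warm-up, no statistics, and no protection against dead-code elimination, so treat its numbers as indicative only.

```javascript
// Rough timing loop; assumes a global performance.now() (browsers, Node.js 16+).
const base = 'Hello world nice to meet you hello again world';

// The benchmark's test bodies, rewritten as functions of the input string.
const candidates = {
  set: (s) => [...new Set(s.toLowerCase().split(/\s+/))],
  lookup: (s) => {
    const words = {};
    for (const word of s.toLowerCase().split(/\s+/)) words[word] = word;
    return Object.keys(words);
  },
  filter: (s) => s.toLowerCase().split(/\s+/).filter((v, i, m) => i === m.indexOf(v)),
  regex: (s) => s.replace(/(\b\S.+\b)(?=.*\1)/g, '').trim(),
};

// Runs each candidate `iterations` times on `text`; returns elapsed milliseconds per candidate.
function timeAll(text, iterations) {
  const elapsed = {};
  for (const [name, fn] of Object.entries(candidates)) {
    const start = performance.now();
    for (let i = 0; i < iterations; i++) fn(text);
    elapsed[name] = performance.now() - start;
  }
  return elapsed;
}

// The benchmark's own nine-word input.
console.log(timeAll(base, 1000));
// A larger input (the sentence repeated 10 times) to compare how each approach scales.
console.log(timeAll(Array(10).fill(base).join(' '), 100));
```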

Related benchmarks:

eliminate duplicates from an array of strings (version: 1)


Comparing performance of: Set vs Lookup vs filter vs lookup let vs regex

Created: 4 years ago by: Registered User
