Set vs Filter for unique (Array of Objects)
(version: 1)
Comparing performance of:
Filter + Set vs Array from Map vs Naive Filter
Created: one year ago by Guest
Script Preparation code:
// 500 unique ids; each id appears in 3 objects, so `array` holds 1500 objects with 500 unique ids
const ids = Array.from({ length: 500 }, () => Math.random().toString(16).slice(2));
var array = [...ids.map((id) => ([
  { id: id, content: "foo" },
  { id: id, content: "bar" },
  { id: id, content: "baz" }
]))].flat();
Tests:
Filter + Set
const seen = new Set();
const deduplicatedArray1 = array.filter(item => {
  if (seen.has(item.id)) {
    return false;
  }
  seen.add(item.id);
  return true;
});
Array from Map
const deduplicatedArray2 = [ ...new Map(array.map(item => [item.id, item])).values() ];
Naive Filter
const deduplicatedArray3 = array.filter((item, index, self) =>
  index === self.findIndex(obj => obj.id === item.id)
);
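For readers who want to reproduce the comparison outside MeasureThat.net, the sketch below wires the preparation code and the three test snippets into a simple `performance.now()` timing loop. The helper name `timeApproach` and the iteration count are illustrative assumptions, not part of the original benchmark harness, so the numbers it prints will not match the site's runner exactly.

```javascript
// Minimal local reproduction sketch (assumed harness, not the MeasureThat.net runner).
const ids = Array.from({ length: 500 }, () => Math.random().toString(16).slice(2));
const array = ids.flatMap((id) => [
  { id, content: "foo" },
  { id, content: "bar" },
  { id, content: "baz" },
]);

// Times a function over a fixed number of iterations and prints approximate ops/sec.
function timeApproach(label, fn, iterations = 200) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsedMs = performance.now() - start;
  console.log(`${label}: ~${((iterations / elapsedMs) * 1000).toFixed(1)} ops/sec`);
}

timeApproach("Filter + Set", () => {
  const seen = new Set();
  return array.filter((item) => (seen.has(item.id) ? false : (seen.add(item.id), true)));
});

timeApproach("Array from Map", () =>
  [...new Map(array.map((item) => [item.id, item])).values()]
);

timeApproach("Naive Filter", () =>
  array.filter((item, index, self) => index === self.findIndex((obj) => obj.id === item.id))
);
```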
Latest run results:
Run details: test run date: one year ago
User agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36
Browser/OS: Chrome 132 on Windows
| Test name | Executions per second |
| --- | --- |
| Filter + Set | 4552.1 Ops/sec |
| Array from Map | 4001.5 Ops/sec |
| Naive Filter | 12.7 Ops/sec |
Autogenerated LLM Summary (model: gpt-4o-mini, generated one year ago):

The benchmark titled "Set vs Filter for unique (Array of Objects)" compares three approaches to deduplicating an array of objects by a unique identifier (here, `id`). It evaluates each method by how many executions it can complete per second.

### Methods Compared

1. **Filter + Set**
   - **Implementation**:
     ```javascript
     const seen = new Set();
     const deduplicatedArray1 = array.filter(item => {
       if (seen.has(item.id)) {
         return false;
       }
       seen.add(item.id);
       return true;
     });
     ```
   - **Description**: Uses a `Set` to track `id`s already seen while `Array.prototype.filter` keeps only the first object with each `id`.
   - **Pros**: Efficient for large datasets, since `Set` operations such as `add` and `has` run in average O(1) time.
   - **Cons**: Extra memory is used for the `Set`, which may matter when memory is limited.

2. **Array from Map**
   - **Implementation**:
     ```javascript
     const deduplicatedArray2 = [
       ...new Map(array.map(item => [item.id, item])).values()
     ];
     ```
   - **Description**: Builds a `Map` keyed by `id`; because `Map` keys are inherently unique, spreading `.values()` yields the deduplicated array.
   - **Pros**: Produces the unique list directly by leveraging the `Map` object's guarantee of unique keys.
   - **Cons**: Slightly less obvious to read than the `Set` approach, with a similar memory footprint. Note that later entries overwrite earlier ones, so it keeps the last object seen for each `id`, whereas the other two methods keep the first.

3. **Naive Filter**
   - **Implementation**:
     ```javascript
     const deduplicatedArray3 = array.filter((item, index, self) =>
       index === self.findIndex(obj => obj.id === item.id)
     );
     ```
   - **Description**: Combines `filter` with `findIndex` to keep only the first occurrence of each `id`.
   - **Pros**: Straightforward and easy to read, since it needs no auxiliary data structure.
   - **Cons**: Performs poorly on larger datasets because the nested `findIndex` inside `filter` gives O(n²) complexity, which sharply reduces throughput compared to the other methods.

### Performance Results

The measured results above show "Filter + Set" as the fastest method at roughly 4,550 executions per second, with "Array from Map" close behind at about 4,000 executions per second. The "Naive Filter" method manages only around 13 executions per second, which highlights the cost of its quadratic approach.

### Alternatives

- A simple loop that tracks seen `id`s in a plain object behaves much like the `Set` approach and may feel more familiar to some developers; a sketch follows this summary.
- Utility libraries such as Lodash offer helpers like `_.uniqBy()`, which can improve readability, though they add a dependency and their performance depends on their own implementation.

Overall, developers should weigh performance, memory usage, readability, and maintainability when choosing how to deduplicate objects in an array.
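Following up on the first alternative above, here is a minimal sketch of deduplication that tracks seen `id`s in a plain object instead of a `Set`. The function name `dedupeById` and the sample data are illustrative assumptions, not part of the original benchmark.

```javascript
// Minimal sketch of the plain-object alternative (assumed helper name, not part of the benchmark).
// Object.create(null) gives an object with no prototype keys, acting as a simple hash set.
function dedupeById(items) {
  const seen = Object.create(null);
  const result = [];
  for (const item of items) {
    if (!seen[item.id]) {
      seen[item.id] = true;
      result.push(item); // keep the first object for each id, like the Filter + Set approach
    }
  }
  return result;
}

// Usage with the same object shape as the benchmark's `array`:
const sample = [
  { id: "a", content: "foo" },
  { id: "a", content: "bar" },
  { id: "b", content: "baz" },
];
console.log(dedupeById(sample)); // [{ id: "a", content: "foo" }, { id: "b", content: "baz" }]

// With Lodash, the equivalent call would typically be _.uniqBy(array, 'id').
```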
Related benchmarks:
Set vs Filter for unique vs for string and numbers
unique array 3
Unique values
Unique values (large list)
Deduplicate array test
Deduplicate array test v2
Trey - unique array
Filtering duplicates
Dedup Array of Objects (4 strats)