Benchmark: split string by newlines (fixed)

Script Preparation code:

var NEWLINE = /\x0d\x0a|[\x0a\x0d\u2028\u2029]/gu;

function splitRegex(str) {
  const strings = [];

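  // NEWLINE is a shared global (g flag) regex, so reset its cursor before scanning.
  NEWLINE.lastIndex = 0;
  let prevIndex = 0;
  let match;

  while ((match = NEWLINE.exec(str)) !== null) {
    // Emit the text before the terminator (if any), then the terminator itself.
    if (match.index > prevIndex) strings.push(str.substring(prevIndex, match.index));
    strings.push(match[0]);
    prevIndex = match.index + match[0].length;
    NEWLINE.lastIndex = prevIndex;
  }

  // Emit any text that follows the last terminator.
  if (prevIndex < str.length) strings.push(str.substring(prevIndex, str.length));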

  return strings;
}

function splitManual(str) {
	const strings = [];

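	// Cached index of the next occurrence of each terminator; -1 means "search again".
	const cache1 = { i: -1 }; // '\n'
	const cache2 = { i: -1 }; // '\r'
	const cache3 = { i: -1 }; // '\u2028'
	const cache4 = { i: -1 }; // '\u2029'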

	let position = 0;
	while (position < str.length) {
		if (cache1.i < 0) {
			const i = str.indexOf('\n', position);
			cache1.i = i < 0 ? str.length : i;
		}
		if (cache2.i < 0) {
			const i = str.indexOf('\r', position);
			cache2.i = i < 0 ? str.length : i;
		}
		if (cache3.i < 0) {
			const i = str.indexOf('\u2028', position);
			cache3.i = i < 0 ? str.length : i;
		}
		if (cache4.i < 0) {
			const i = str.indexOf('\u2029', position);
			cache4.i = i < 0 ? str.length : i;
		}

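		// Pick whichever terminator occurs first.
		let winner = cache1;
		if (cache2.i < winner.i) winner = cache2;
		if (cache3.i < winner.i) winner = cache3;
		if (cache4.i < winner.i) winner = cache4;

		const index = winner.i;
		if (index === str.length) {
			// No terminator left: the rest of the string is the final chunk.
			strings.push(str.substring(position, index));
			break;
		}

		// '\r\n' counts as one terminator; the cached '\n' position is consumed with it.
		let length = 1;
		if (winner === cache2 && str[index + 1] === '\n') {
			cache1.i = -1;
			length = 2;
		}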

		if (index > position) {
			strings.push(str.substring(position, index));
		}
		strings.push(str.substring(index, index + length));

		position = index + length;
		winner.i = -1;
	}

	return strings;
}

var NO_NEWLINE = 'foobarbaz!'.repeat(1000);
var LAST_NEWLINE = 'foobarbaz!'.repeat(1000) + '\n';
var LAST_NEWLINE_WIN32 = 'foobarbaz!'.repeat(1000) + '\r\n';
var LAST_NEWLINE_UNICODE = 'foobarbaz!'.repeat(1000) + '\u2029';
var MANY_NEWLINES = 'foobarbaz\n'.repeat(1000);
var MANY_NEWLINES_WIN32 = 'foobarbaz\r\n'.repeat(1000);
var MANY_NEWLINES_UNICODE = 'foobarbaz\u2029'.repeat(1000);

// Side-effect to prevent code from being optimized away
window.result = 0;
var register = (results) => { window.result += (results[results.length - 1] || '').slice(-1).charCodeAt(0); };

Tests:

regex
register(splitRegex(NO_NEWLINE)); register(splitRegex(LAST_NEWLINE)); register(splitRegex(LAST_NEWLINE_WIN32)); register(splitRegex(LAST_NEWLINE_UNICODE)); register(splitRegex(MANY_NEWLINES)); register(splitRegex(MANY_NEWLINES_WIN32)); register(splitRegex(MANY_NEWLINES_UNICODE));
manual
register(splitManual(NO_NEWLINE)); register(splitManual(LAST_NEWLINE)); register(splitManual(LAST_NEWLINE_WIN32)); register(splitManual(LAST_NEWLINE_UNICODE)); register(splitManual(MANY_NEWLINES)); register(splitManual(MANY_NEWLINES_WIN32)); register(splitManual(MANY_NEWLINES_UNICODE));

Latest run results:

No previous run results


Autogenerated LLM Summary (model llama3.2:3b, generated one year ago):

LLMs can make mistakes. Check important info.

Let's break down the provided benchmark definition and test cases to understand what is being tested.

**Benchmark Definition**

The benchmark definition is a JSON object that provides metadata about the benchmark, including its name, description, script preparation code, and HTML preparation code (which is empty in this case). The script preparation code defines two functions:

1. `splitRegex`: This function takes a string and scans it with a global regular expression that matches the line terminators `\x0d\x0a` (CRLF), `\x0a` (LF), `\x0d` (CR), `\u2028`, and `\u2029`. It returns an array in which the text chunks and the terminators appear as separate elements, in their original order.
2. `splitManual`: This function produces the same kind of array without a regular expression. It uses `indexOf` to find the next occurrence of each of the four terminator characters, caches those positions, repeatedly takes the nearest one, and treats `\r` followed by `\n` as a single terminator.
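To make the shape of the returned array concrete, here is a minimal sketch that reproduces the page's `splitRegex` and runs it on a small input. The sample string is an assumption chosen for illustration; it is not one of the benchmark inputs.

```javascript
// Same pattern as the benchmark's NEWLINE regex.
const NEWLINE = /\x0d\x0a|[\x0a\x0d\u2028\u2029]/gu;

function splitRegex(str) {
  const strings = [];
  NEWLINE.lastIndex = 0;
  let prevIndex = 0;
  let match;
  while ((match = NEWLINE.exec(str)) !== null) {
    // Text before the terminator (if any), then the terminator itself.
    if (match.index > prevIndex) strings.push(str.substring(prevIndex, match.index));
    strings.push(match[0]);
    prevIndex = match.index + match[0].length;
  }
  // Text after the last terminator, if any.
  if (prevIndex < str.length) strings.push(str.substring(prevIndex));
  return strings;
}

// Illustrative input (not from the benchmark): CRLF, lone CR, and LF.
const chunks = splitRegex('a\r\nb\rc\n');
console.log(JSON.stringify(chunks)); // ["a","\r\n","b","\r","c","\n"]
```

Note that CRLF stays together as one element, and that no empty string is emitted after the trailing `\n`.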

The HTML preparation code is empty and plays no role in this benchmark. The side effect that keeps the engine from optimizing the calls away comes from the script preparation code, which defines `window.result` and the `register` helper.

**Individual Test Cases**

There are two test cases:

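1. `regex`: This test case runs the `splitRegex` function on all seven inputs: `NO_NEWLINE`, `LAST_NEWLINE`, `LAST_NEWLINE_WIN32`, `LAST_NEWLINE_UNICODE`, `MANY_NEWLINES`, `MANY_NEWLINES_WIN32`, and `MANY_NEWLINES_UNICODE`. Each result is passed to `register`, which does not measure anything; it adds the character code of the last character of the last array element to `window.result` so that the calls cannot be optimized away.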
2. `manual`: This test case runs the `splitManual` function with the same inputs as the `regex` test case.
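The effect of `register` can be seen in a small sketch. It assumes `globalThis` as a stand-in for the page's `window` so that it also runs outside a browser, and the arrays passed in are made-up examples rather than real split results.

```javascript
// Assumption: globalThis stands in for the browser's `window`.
globalThis.result = 0;

// Adds the char code of the last character of the last element to a global,
// giving every call an observable side effect.
const register = (results) => {
  globalThis.result += (results[results.length - 1] || '').slice(-1).charCodeAt(0);
};

register(['foo', '\n']); // last element is '\n', char code 10
register(['bar']);       // last character is 'r', char code 114
console.log(globalThis.result); // 124
```

One caveat: an empty array would add `NaN`, because `''.charCodeAt(0)` is `NaN`. None of the benchmark inputs produces an empty array, since all of them are non-empty strings.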

**What is being tested?**

The benchmark tests two approaches to splitting a string into substrings based on newline characters:

1. **Regex-based approach (`splitRegex`)**: This approach uses regular expressions to match and split the input string.
2. **Manual approach (`splitManual`)**: This approach manually checks for specific newline characters in the input string.
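The core idea of the manual approach, taking the nearest of several `indexOf` hits, can be sketched without the caching. This is a simplified illustration, not the benchmark's `splitManual`: the names `TERMINATORS`, `nextTerminator`, and `splitSimple` are hypothetical, and the sketch rescans for every terminator on each step, which is exactly the work the benchmark's caches are there to avoid.

```javascript
const TERMINATORS = ['\n', '\r', '\u2028', '\u2029'];

// Hypothetical helper: index of the nearest terminator at or after `from`,
// or -1 when none remains.
function nextTerminator(str, from) {
  let best = -1;
  for (const ch of TERMINATORS) {
    const i = str.indexOf(ch, from);
    if (i >= 0 && (best < 0 || i < best)) best = i;
  }
  return best;
}

// Uncached variant of the manual strategy (simplification for illustration).
function splitSimple(str) {
  const out = [];
  let pos = 0;
  while (pos < str.length) {
    const i = nextTerminator(str, pos);
    if (i < 0) {
      // No terminator left: the rest is the final chunk.
      out.push(str.substring(pos));
      break;
    }
    // Treat "\r\n" as a single two-character terminator.
    const len = str[i] === '\r' && str[i + 1] === '\n' ? 2 : 1;
    if (i > pos) out.push(str.substring(pos, i));
    out.push(str.substring(i, i + len));
    pos = i + len;
  }
  return out;
}

console.log(JSON.stringify(splitSimple('a\r\nb\u2029c')));
```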

**Options compared**

The two test cases compare the performance of these two approaches:

* `regex`: Tests the regular expression-based approach with various inputs.
* `manual`: Tests the manual approach with various inputs.

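**Pros and Cons**

Here are some pros and cons of each approach. They are based on the code alone, because the page records no timings:

**Regex-based approach (`splitRegex`)**

Pros:

* A single pattern covers every terminator, and its `\x0d\x0a` alternative comes first, so CRLF is matched as one unit
* Compact: the whole split is one `exec` loop

Cons:

* The shared global regex carries `lastIndex` state, which has to be in a known position before each scan
* Every match produces a new match array, which may add overhead on inputs with many newlines

**Manual approach (`splitManual`)**

Pros:

* Uses only `indexOf` and `substring`, with no regex engine involved
* Caches the next position of each terminator, so a terminator that is absent from the input is searched for only once

Cons:

* Considerably more code, with more room for off-by-one and stale-cache mistakes
* Supporting another terminator means adding another cache and comparison by hand

Both functions recognize the same terminators (`\n`, `\r`, `\r\n`, `\u2028`, `\u2029`) and return the same kind of array, so which one is faster has to be measured.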
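One regex-specific pitfall is worth illustrating: a regex with the `g` flag remembers where its last match ended in `lastIndex`, which is why `splitRegex` sets `NEWLINE.lastIndex = 0` before scanning. The sketch below uses made-up input strings to show what happens when a scan is abandoned partway and the same regex is reused.

```javascript
const NEWLINE = /\x0d\x0a|[\x0a\x0d\u2028\u2029]/gu;

// First scan, abandoned after one match: the cursor stays behind the match.
const first = NEWLINE.exec('a\nb\nc'); // matches the '\n' at index 1
const resumeAt = NEWLINE.lastIndex;    // 2

// Reusing the regex on another string starts searching at index 2,
// so the newline at index 0 is missed.
const stale = NEWLINE.exec('\nxyz');   // null

// With the cursor reset, the same call finds the newline.
NEWLINE.lastIndex = 0;
const fresh = NEWLINE.exec('\nxyz');   // match at index 0

console.log(resumeAt, stale, fresh.index);
```

A failed `exec` also resets `lastIndex` to 0, so a loop that always runs until `exec` returns `null` leaves the regex clean. The explicit reset in `splitRegex` is therefore best read as a defensive measure.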

**Device, browser, and operating system variations**

The only environment mentioned is a desktop running Firefox 110 on Mac OS X 10.15, and no run results are recorded for it, so no performance difference between the two approaches can be reported. Regex and string-search performance depends on the JavaScript engine, so the comparison should be rerun on each browser and device of interest.
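Because timings depend on the engine and the machine, a rough local measurement is one way to get numbers for a given platform. The helper below is a hypothetical sketch, not the benchmark site's harness: `timeIt` is an invented name, the iteration count is arbitrary, and `input.split('\n')` is only a stand-in workload to be replaced by the function under test.

```javascript
// Hypothetical helper: average milliseconds per call over `iterations` runs.
// The accumulated `sink` keeps the calls from being optimized away.
function timeIt(fn, iterations = 200) {
  let sink = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) sink += fn().length;
  const elapsed = performance.now() - start;
  return { msPerCall: elapsed / iterations, sink };
}

// Stand-in workload; swap in the function you actually want to time.
const input = 'foobarbaz\n'.repeat(1000);
const report = timeIt(() => input.split('\n'));
console.log(report.msPerCall.toFixed(4) + ' ms per call');
```

Numbers from a loop like this are only indicative: browsers may coarsen the resolution of `performance.now()`, and the first iterations run before the engine has optimized the code.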

**Conclusion**

In summary, this benchmark compares two ways of splitting a string at line terminators while keeping the terminators in the output: a regular-expression approach (`splitRegex`) and a manual `indexOf`-based approach (`splitManual`). With no recorded results, the trade-offs visible so far are in code size and complexity rather than in measured speed.

Related benchmarks:

split string by newlines (fixed) (version: 0)

Fastest way to split a string by newlines

Comparing performance of: regex vs manual

Created: 3 years ago by: Guest
