MeasureThat.net
split string by newlines (fixed)
(version: 0)
Fastest way to split a string by newlines
Comparing performance of: regex vs manual
Created: 3 years ago by: Guest
Script Preparation code:
var NEWLINE = /\x0d\x0a|[\x0a\x0d\u2028\u2029]/gu;

function splitRegex(str) {
  const strings = [];
  NEWLINE.lastIndex = 0;
  let prevIndex = 0;
  let match;
  while (match = NEWLINE.exec(str)) {
    if (match.index > prevIndex) strings.push(str.substring(prevIndex, match.index));
    strings.push(match[0]);
    prevIndex = match.index + match[0].length;
    NEWLINE.lastIndex = prevIndex;
  }
  if (prevIndex < str.length) strings.push(str.substring(prevIndex, str.length));
  return strings;
}

function splitManual(str) {
  const strings = [];
  const cache1 = { i: -1 };
  const cache2 = { i: -1 };
  const cache3 = { i: -1 };
  const cache4 = { i: -1 };
  let position = 0;
  while (position < str.length) {
    if (cache1.i < 0) {
      const i = str.indexOf('\n', position);
      cache1.i = i < 0 ? str.length : i;
    }
    if (cache2.i < 0) {
      const i = str.indexOf('\r', position);
      cache2.i = i < 0 ? str.length : i;
    }
    if (cache3.i < 0) {
      const i = str.indexOf('\u2028', position);
      cache3.i = i < 0 ? str.length : i;
    }
    if (cache4.i < 0) {
      const i = str.indexOf('\u2029', position);
      cache4.i = i < 0 ? str.length : i;
    }
    let winner = cache1;
    if (cache2.i < winner.i) winner = cache2;
    if (cache3.i < winner.i) winner = cache3;
    if (cache4.i < winner.i) winner = cache4;
    const index = winner.i;
    if (index === str.length) {
      strings.push(str.substring(position, index));
      break;
    }
    let length = 1;
    if (winner === cache2 && str[index + 1] === '\n') {
      cache1.i = -1;
      length = 2;
    }
    if (index > position) {
      strings.push(str.substring(position, index));
    }
    strings.push(str.substring(index, index + length));
    position = index + length;
    winner.i = -1;
  }
  return strings;
}

var NO_NEWLINE = 'foobarbaz!'.repeat(1000);
var LAST_NEWLINE = 'foobarbaz!'.repeat(1000) + '\n';
var LAST_NEWLINE_WIN32 = 'foobarbaz!'.repeat(1000) + '\r\n';
var LAST_NEWLINE_UNICODE = 'foobarbaz!'.repeat(1000) + '\u2029';
var MANY_NEWLINES = 'foobarbaz\n'.repeat(1000);
var MANY_NEWLINES_WIN32 = 'foobarbaz\r\n'.repeat(1000);
var MANY_NEWLINES_UNICODE = 'foobarbaz\u2029'.repeat(1000);

// Side-effect to prevent code from being optimized away
window.result = 0;
var register = (results) => {
  window.result += (results[results.length - 1] || '').slice(-1).charCodeAt(0);
};
Tests:
regex
register(splitRegex(NO_NEWLINE));
register(splitRegex(LAST_NEWLINE));
register(splitRegex(LAST_NEWLINE_WIN32));
register(splitRegex(LAST_NEWLINE_UNICODE));
register(splitRegex(MANY_NEWLINES));
register(splitRegex(MANY_NEWLINES_WIN32));
register(splitRegex(MANY_NEWLINES_UNICODE));
manual
register(splitManual(NO_NEWLINE));
register(splitManual(LAST_NEWLINE));
register(splitManual(LAST_NEWLINE_WIN32));
register(splitManual(LAST_NEWLINE_UNICODE));
register(splitManual(MANY_NEWLINES));
register(splitManual(MANY_NEWLINES_WIN32));
register(splitManual(MANY_NEWLINES_UNICODE));
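The `register` calls in both tests exist only to consume each result so that the split calls cannot be discarded as dead code. The sketch below illustrates what that accumulator does. It assumes a plain object named `sink` (a hypothetical stand-in for the browser's `window`) so that it also runs outside a browser.

```javascript
// Sketch of the benchmark's side-effect sink.
// Assumption: `sink` is a hypothetical stand-in for `window`.
const sink = { result: 0 };

const register = (results) => {
  // Take the last chunk, then its last character, and add that character's code.
  sink.result += (results[results.length - 1] || '').slice(-1).charCodeAt(0);
};

register(['foobarbaz!', '\n']); // last character is '\n' (code 10)
register(['foobarbaz!']);       // last character is '!' (code 33)
console.log(sink.result);       // 43
```

For an empty array the expression evaluates to `NaN`, which would then stick in the accumulator. None of the seven benchmark inputs is empty, so this case does not arise in the tests.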
Latest run results:
No previous run results
This benchmark does not have any results yet. Be the first one to run it!
Autogenerated LLM Summary (model llama3.2:3b, generated one year ago):
Let's break down the provided benchmark definition and test cases to understand what is being tested.

**Benchmark Definition**

The benchmark definition is a JSON object that provides metadata about the benchmark, including its name, description, script preparation code, and HTML preparation code (which is empty in this case).

The script preparation code defines two functions:

1. `splitRegex`: This function takes a string as input and splits it using a regular expression that matches newline sequences (`\x0d\x0a`, `\x0a`, `\x0d`, `\u2028`, and `\u2029`). It returns an array that contains both the text between newlines and the newline sequences themselves, in their original order.
2. `splitManual`: This function builds the same kind of array without a regular expression. It calls `indexOf` for each of `\n`, `\r`, `\u2028`, and `\u2029`, caches the next position of each character, and repeatedly cuts the string at the nearest one, treating `\r\n` as a single separator.

The preparation code also defines seven input strings and a `register` function. The HTML preparation code is empty and plays no role in this benchmark.

**Individual Test Cases**

There are two test cases:

1. `regex`: This test case runs the `splitRegex` function on seven inputs: `NO_NEWLINE`, `LAST_NEWLINE`, `LAST_NEWLINE_WIN32`, `LAST_NEWLINE_UNICODE`, `MANY_NEWLINES`, `MANY_NEWLINES_WIN32`, and `MANY_NEWLINES_UNICODE`. Each result is passed to `register`, which adds the character code of the last character of the last element to `window.result`. This is a side effect that keeps the calls from being optimized away; `register` does not measure anything itself.
2. `manual`: This test case runs the `splitManual` function on the same seven inputs, also through `register`.

**What is being tested?**

The benchmark tests two approaches to splitting a string at newline sequences while keeping the separators:

1. **Regex-based approach (`splitRegex`)**: This approach uses a global regular expression and `exec` to find each newline sequence.
2. **Manual approach (`splitManual`)**: This approach uses `indexOf` and cached positions to find the nearest newline character.

**Options compared**

The two test cases compare the performance of these two approaches:

* `regex`: Tests the regular expression-based approach with all seven inputs.
* `manual`: Tests the manual approach with all seven inputs.
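To make the expected output shape concrete, here is a minimal sketch that is not part of the benchmark. It uses `String.prototype.split` with a capturing group so that the separators stay in the result, which is the shape both `splitRegex` and `splitManual` return. The helper name `splitKeepingNewlines` is hypothetical.

```javascript
// Hypothetical helper, not part of the benchmark: splits on the same newline
// sequences and keeps each separator as its own array element.
function splitKeepingNewlines(str) {
  // The capturing group makes split() include the matched separators.
  // filter() drops the empty strings that split() emits between adjacent
  // separators and at the ends of the input.
  return str.split(/(\r\n|[\n\r\u2028\u2029])/u).filter((part) => part !== '');
}

const parts = splitKeepingNewlines('a\r\nb\n\nc\u2029');
// Elements, in order: 'a', '\r\n', 'b', '\n', '\n', 'c', '\u2029'
console.log(parts.length); // 7
```

Note that `\r\n` appears as a single element, while the two consecutive `\n` characters each get their own element.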
**Pros and Cons**

No run results are recorded for this benchmark, so the points below are general considerations rather than measured outcomes.

**Regex-based approach (`splitRegex`)**

Pros:

* A single pattern handles every newline sequence, including `\r\n`, in one scan
* Short, with little state to keep track of

Cons:

* Regular expression matching carries per-call overhead, which may matter for small inputs
* The order of alternatives matters: `\x0d\x0a` must come before the single-character class so that `\r\n` is matched as one separator

**Manual approach (`splitManual`)**

Pros:

* Uses only `indexOf` and `substring`, so it may be faster where regex overhead dominates
* Does not allocate a match array for every newline found

Cons:

* More code and more state (four cached indices), so it is more prone to errors in edge cases
* Supporting another newline character requires another cache and another `indexOf` call

**Device, browser, and operating system variations**

The only platform mentioned is a Desktop device running Firefox 110 on Mac OS X 10.15. Because the benchmark has no recorded run results, no performance difference between the two approaches can be stated for that or any other platform, and results can be expected to vary between JavaScript engines.

**Conclusion**

In summary, this benchmark compares two approaches to splitting a string at newline sequences: a regular expression-based approach (`splitRegex`) and a manual approach (`splitManual`). Both are intended to produce the same output, so the comparison isolates the cost of the search strategy. Which one is faster remains to be measured.
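Since no results are recorded, one way to get a rough local comparison is a small timing loop. The sketch below is not MeasureThat.net's measurement code. The names `timeIt`, `viaSplit`, and `viaMatch` are hypothetical, and the two candidates are simple stand-ins that can be replaced with `splitRegex` and `splitManual` from the preparation code. It assumes a runtime with a global `performance.now()`.

```javascript
// Hypothetical local harness: times `iterations` calls of `fn` and consumes
// each result so the calls have an observable effect.
function timeIt(label, fn, iterations = 200) {
  let sink = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    sink += fn().length;
  }
  const elapsedMs = performance.now() - start;
  return { label, elapsedMs, sink };
}

// Same shape as MANY_NEWLINES_WIN32 in the preparation code.
const input = 'foobarbaz\r\n'.repeat(1000);

// Stand-in candidates (assumptions, not the benchmark's functions).
const viaSplit = () =>
  input.split(/(\r\n|[\n\r\u2028\u2029])/u).filter((s) => s !== '');
const viaMatch = () =>
  input.match(/\r\n|[\n\r\u2028\u2029]|[^\n\r\u2028\u2029]+/gu) || [];

const runs = [timeIt('split', viaSplit), timeIt('match', viaMatch)];
for (const run of runs) {
  console.log(`${run.label}: ${run.elapsedMs.toFixed(2)} ms`);
}
```

A loop like this gives only a coarse indication. It does no warm-up and no repeated sampling, so timings can shift noticeably between runs and engines.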