MeasureThat.net
DomParser vs Regex for HOCR (version: 2)
Comparing performance of: DOMParser vs Regex
Created: 3 years ago by: Registered User
Tests:
DOMParser
console.time("test"); const hocr = `<div class='ocr_page' id='page_1' title='image ""; bbox 0 0 2460 3486; ppageno 0'>\n <div class='ocr_carea' id='block_1_1' title="bbox 3 0 2459 3486">\n <p class='ocr_par' id='par_1_1' lang='deu' title="bbox 0 0 2459 3486">\n <span class='ocr_line' id='line_1_1' title="bbox 11 0 2052 19; baseline -0.004 -1; x_size 25.174164; x_descenders 5.1741633; x_ascenders 5">\n <span class='ocrx_word' id='word_1_1' title='bbox 229 0 261 24; x_wconf 10'>111</span>\n <span class='ocrx_word' id='word_1_2' title='bbox 1351 0 1639 24; x_wconf 34'>EEE</span>\n </span>\n <span class='ocr_line' id='line_1_2' title="bbox 990 100 2355 217; baseline -0.012 -50; x_size 76; x_descenders 19; x_ascenders 25">\n <span class='ocrx_word' id='word_1_3' title='bbox 990 152 1125 217; x_wconf 24'>ER</span>\n <span class='ocrx_word' id='word_1_4' title='bbox 1438 123 1630 181; x_wconf 92'>ARD®</span>\n <span class='ocrx_word' id='word_1_5' title='bbox 1686 102 1880 212; x_wconf 70'>Cr</span>\n <span class='ocrx_word' id='word_1_6' title='bbox 1977 115 2258 163; x_wconf 15'>EEE</span>\n </span>\n <span class='ocr_line' id='line_1_3' title="bbox 448 168 2357 296; baseline -0.007 -50.027; x_size 40.362125; x_descenders 8.210145; x_ascenders 8.3506098">\n <span class='ocrx_word' id='word_1_7' title='bbox 448 168 628 275; x_wconf 35'>USE</span>\n <span class='ocrx_word' id='word_1_8' title='bbox 627 198 764 252; x_wconf 0'>TE</span>\n <span class='ocrx_word' id='word_1_9' title='bbox 778 168 924 275; x_wconf 0'>Ldlhe</span>\n <span class='ocrx_word' id='word_1_10' title='bbox 1045 210 1128 296; x_wconf 39'>At</span>\n <span class='ocrx_word' id='word_1_11' title='bbox 1352 199 1916 249; x_wconf 5'>ee</span>\n <span class='ocrx_word' id='word_1_12' title='bbox 2315 199 2357 249; x_wconf 0'>E</span>\n </span>\n <span class='ocr_line' id='line_1_4' title="bbox 716 253 2067 317; baseline -0.022 1; x_size 67.038216; x_descenders 11.038215; x_ascenders 24">\n <span 
class='ocrx_word' id='word_1_13' title='bbox 716 283 839 317; x_wconf 22'>RS]</span>\n <span class='ocrx_word' id='word_1_14' title='bbox 925 260 959 314; x_wconf 0'>a1</span>\n <span class='ocrx_word' id='word_1_15' title='bbox 992 253 1045 302; x_wconf 7'>5</span>\n <span class='ocrx_word' id='word_1_16' title='bbox 1713 255 2067 293; x_wconf 92'>BEITRAGSSERVICE</span>\n </span>\n <span class='ocr_line' id='line_1_5' title="bbox 452 264 866 360; baseline -0.058 -15; x_size 66; x_descenders 13; x_ascenders 19">\n <span class='ocrx_word' id='word_1_17' title='bbox 452 280 825 360; x_wconf 90'>rundfunkbeitrag</span>\n <span class='ocrx_word' id='word_1_18' title='bbox 829 264 866 339; x_wconf 0'>\\)</span>\n </span>\n <span class='ocr_line' id='line_1_6' title="bbox 1713 489 2083 516; baseline -0.011 0; x_size 32.55394; x_descenders 6.5539398; x_ascenders 7">\n <span class='ocrx_word' id='word_1_19' title='bbox 1713 491 1761 516; x_wconf 96'>Sie</span>\n <span class='ocrx_word' id='word_1_20' title='bbox 1772 489 1918 516; x_wconf 96'>erreichen</span>\n <span class='ocrx_word' id='word_1_21' title='bbox 1932 495 1989 514; x_wconf 96'>uns</span>\n <span class='ocrx_word' id='word_1_22' title='bbox 2000 489 2083 513; x_wconf 96'>unter</span>\n </span>\n <span class='ocr_line' id='line_1_7' title="bbox 485 523 2108 587; baseline -0.013 -19; x_size 40.362125; x_descenders 8.210145; x_ascenders 8.3506098">\n <span class='ocrx_word' id='word_1_23' title='bbox 485 544 521 568; x_wconf 90'>07</span>\n <span class='ocrx_word' id='word_1_24' title='bbox 532 544 610 568; x_wconf 72'>2FC7</span>\n <span class='ocrx_word' id='word_1_25' title='bbox 622 542 747 568; x_wconf 60'>F7B11C</span>\n <span class='ocrx_word' id='word_1_26' title='bbox 759 542 829 567; x_wconf 39'>DO21</span>\n <span class='ocrx_word' id='word_1_27' title='bbox 845 541 925 566; x_wconf 34'>A8BSD</span>\n <span class='ocrx_word' id='word_1_28' title='bbox 1030 568 1055 578; x_wconf 26'>...</span>\n <span 
class='ocrx_word' id='word_1_29' title='bbox 1712 526 1828 587; x_wconf 46'>Teen,</span>\n <span class='ocrx_word' id='word_1_30' title='bbox 1844 526 1870 586; x_wconf 0'>1</span>\n <span class='ocrx_word' id='word_1_31' title='bbox 1873 526 1926 585; x_wconf 24'>Er</span>\n <span class='ocrx_word' id='word_1_32' title='bbox 1938 525 1996 584; x_wconf 48'>=</span>\n <span class='ocrx_word' id='word_1_33' title='bbox 2004 524 2038 583; x_wconf 0'>ae</span>\n <span class='ocrx_word' id='word_1_34' title='bbox 2069 523 2108 582; x_wconf 78'>2</span>\n </span>\n <span class='ocr_line' id='line_1_8' title="bbox 326 563 2121 633; baseline -0.007 -11; x_size 29; x_descenders 4; x_ascenders 11">\n <span class='ocrx_word' id='word_1_35' title='bbox 326 563 378 632; x_wconf 91'>P</span>\n <span class='ocrx_word' id='word_1_36' title='bbox 400 577 454 608; x_wconf 95'>DV</span>\n <span class='ocrx_word' id='word_1_37' title='bbox 482 584 517 608; x_wconf 94'>06</span>\n <span class='ocrx_word' id='word_1_38' title='bbox 545 583 608 612; x_wconf 96'>0,70</span>\n <span class='ocrx_word' id='word_1_39' title='bbox 648 582 783 609; x_wconf 93'>Deutsche</span>\n <span class='ocrx_word' id='word_1_40' title='bbox 791 584 853 609; x_wconf 93'>Post</span>\n <span class='ocrx_word' id='word_1_41' title='bbox 871 564 903 627; x_wconf 6'>oo</span>\n <span class='ocrx_word' id='word_1_42' title='bbox 965 573 1070 633; x_wconf 53'>IE:</span>\n <span class='ocrx_word' id='word_1_43' title='bbox 1713 567 1843 617; x_wconf 27'>Bean</span>\n <span class='ocrx_word' id='word_1_44' title='bbox 1882 598 1921 612; x_wconf 94'>aus</span>\n <span class='ocrx_word' id='word_1_45' title='bbox 1929 592 1976 611; x_wconf 96'>dem</span>\n <span class='ocrx_word' id='word_1_46' title='bbox 1985 592 2008 610; x_wconf 77'>dt.</span>\n <span class='ocrx_word' id='word_1_47' title='bbox 2020 590 2121 620; x_wconf 95'>Festnetz,</span>\n </span>\n <span class='ocr_line' id='line_1_9' title="bbox 325 615 2195 
693; baseline -0.008 -43; x_size 27; x_descenders 5; x_ascenders 8">\n <span class='ocrx_word' id='word_1_48' title='bbox 325 641 337 651; x_wconf 90'>*</span>\n <span class='ocrx_word' id='word_1_49' title='bbox 346 641 418 666; x_wconf 80'>4557</span>\n <span class='ocrx_word' id='word_1_50' title='bbox 429 642 440 651; x_wconf 80'>*</span>\n <span class='ocrx_word' id='word_1_51' title='bbox 452 642 575 666; x_wconf 78'>0137861</span>\n <span class='ocrx_word' id='word_1_52' title='bbox 579 623 588 670; x_wconf 78'>*</span>\n <span class='ocrx_word' id='word_1_53' title='bbox 947 623 1045 693; x_wconf 35'>Se</span>\n <span class='ocrx_word' id='word_1_54' title='bbox 1712 622 1739 640; x_wconf 93'>60</span>\n <span class='ocrx_word' id='word_1_55' title='bbox 1747 619 1865 640; x_wconf 89'>Cent/Anruf</span>\n <span class='ocrx_word' id='word_1_56' title='bbox 1874 625 1911 639; x_wconf 96'>aus</span>\n <span class='ocrx_word' id='word_1_57' title='bbox 1921 619 1960 638; x_wconf 96'>den</span>\n <span class='ocrx_word' id='word_1_58' title='bbox 1969 619 1994 638; x_wconf 96'>dt.</span>\n <span class='ocrx_word' id='word_1_59' title='bbox 2005 615 2195 638; x_wconf 90'>Mobilfunknetzen)</span>\n </span>\n <span class='ocr_line' id='line_1_10' title="bbox 326 668 1922 699; baseline -0.002 0; x_size 36.55394; x_descenders 6.5539398; x_ascenders 11">\n <span class='ocrx_word' id='word_1_60' title='bbox 326 680 418 699; x_wconf 10'>*.nDns</span>\n <span class='ocrx_word' id='word_1_61' title='bbox 429 681 440 692; x_wconf 72'>*</span>\n <span class='ocrx_word' id='word_1_62' title='bbox 996 678 1065 693; x_wconf 23'>ne:</span>\n <span class='ocrx_word' id='word_1_63' title='bbox 1712 670 1922 697; x_wconf 92'>Servicezeiten</span>\n </span>\n <span class='ocr_line' id='line_1_11' title="bbox 1712 703 2114 740; baseline -0.01 -7; x_size 34; x_descenders 8; x_ascenders 7">\n <span class='ocrx_word' id='word_1_64' title='bbox 1712 708 1820 740; x_wconf 
93'>Montag</span>\n <span class='ocrx_word' id='word_1_65' title='bbox 1834 721 1842 724; x_wconf 93'>-</span>\n <span class='ocrx_word' id='word_1_66' title='bbox 1854 706 1953 738; x_wconf 95'>Freitag</span>\n <span class='ocrx_word' id='word_1_67' title='bbox 1966 706 1982 730; x_wconf 92'>7</span>\n <span class='ocrx_word' id='word_1_68' title='bbox 1995 719 2003 722; x_wconf 92'>-</span>\n <span class='ocrx_word' id='word_1_69' title='bbox 2018 706 2050 730; x_wconf 96'>19</span>\n <span class='ocrx_word' id='word_1_70' title='bbox 2063 703 2114 730; x_wconf 96'>Uhr</span>\n </span>\n <span class='ocr_line' id='line_1_12' title="bbox 1713 762 1923 790; baseline -0.01 0; x_size 32.55394; x_descenders 6.5539398; x_ascenders 7">\n <span class='ocrx_word' id='word_1_71' title='bbox 1713 762 1923 790; x_wconf 96'>Postanschrift</span>\n </span>\n <span class='ocr_line' id='line_1_13' title="bbox 3 796 2124 860; baseline -0.013 -11.04; x_size 32.208996; x_descenders 6.2089958; x_ascenders 8">\n <span class='ocrx_word' id='word_1_72' title='bbox 3 819 45 860; x_wconf 29'>er</span>\n <span class='ocrx_word' id='word_1_73' title='bbox 1712 800 1780 826; x_wconf 96'>ARD</span>\n <span class='ocrx_word' id='word_1_74' title='bbox 1791 799 1853 825; x_wconf 96'>ZDF</span>\n <span class='ocrx_word' id='word_1_75' title='bbox 1867 796 2124 824; x_wconf 96'>Deutschlandradio</span>\n </span>\n <span class='ocr_line' id='line_1_14' title="bbox 1713 832 2126 868; baseline -0.01 -6; x_size 33; x_descenders 7; x_ascenders 8">\n <span class='ocrx_word' id='word_1_76' title='bbox 1713 834 1946 868; x_wconf 92'>Beitragsservice,</span>\n <span class='ocrx_word' id='word_1_77' title='bbox 1959 834 2049 859; x_wconf 96'>50656</span>\n <span class='ocrx_word' id='word_1_78' title='bbox 2063 832 2126 858; x_wconf 96'>Köln</span>\n </span>\n <span class='ocr_line' id='line_1_15' title="bbox 1711 890 2067 922; baseline -0.014 -3; x_size 33; x_descenders 7; x_ascenders 8">\n <span 
class='ocrx_word' id='word_1_79' title='bbox 1711 893 1779 919; x_wconf 92'>Web</span>\n <span class='ocrx_word' id='word_1_80' title='bbox 1791 890 2067 922; x_wconf 92'>rundfunkbeitrag.de</span>\n </span>\n <span class='ocr_line' id='line_1_16' title="bbox 1712 948 1998 976; baseline -0.014 0; x_size 32.208996; x_descenders 6.2089958; x_ascenders 8">\n <span class='ocrx_word' id='word_1_81' title='bbox 1712 950 1811 976; x_wconf 96'>Datum</span>\n <span class='ocrx_word' id='word_1_82' title='bbox 1834 948 1998 974; x_wconf 96'>26.06.2018</span>\n </span>\n <span class='ocr_line' id='line_1_17' title="bbox 1712 1005 2039 1038; baseline -0.012 -6; x_size 32; x_descenders 7; x_ascenders 7">\n <span class='ocrx_word' id='word_1_83' title='bbox 1712 1007 1974 1038; x_wconf 92'>Beitragsnummer</span>\n <span class='ocrx_word' id='word_1_84' title='bbox 1985 1005 2039 1029; x_wconf 96'>250</span>\n </span>\n <span class='ocr_line' id='line_1_18' title="bbox 325 1295 1363 1338; baseline 0.005 -12; x_size 39; x_descenders 8; x_ascenders 7">\n <span class='ocrx_word' id='word_1_85' title='bbox 325 1295 484 1335; x_wconf 96'>Zahlung</span>\n <span class='ocrx_word' id='word_1_86' title='bbox 499 1297 563 1328; x_wconf 93'>der</span>\n <span class='ocrx_word' id='word_1_87' title='bbox 577 1298 924 1337; x_wconf 92'>Rundfunkbeiträge</span>\n <span class='ocrx_word' id='word_1_88' title='bbox 937 1315 949 1321; x_wconf 92'>-</span>\n <span class='ocrx_word' id='word_1_89' title='bbox 964 1300 1284 1338; x_wconf 92'>Beitragsnummer</span>\n <span class='ocrx_word' id='word_1_90' title='bbox 1297 1300 1363 1331; x_wconf 96'>250</span>\n </span>\n <span class='ocr_line' id='line_1_19' title="bbox 325 1427 672 1467; baseline 0 -8; x_size 40; x_descenders 8; x_ascenders 8">\n <span class='ocrx_word' id='word_1_91' title='bbox 325 1427 413 1459; x_wconf 96'>Sehr</span>\n <span class='ocrx_word' id='word_1_92' title='bbox 425 1429 579 1467; x_wconf 96'>geehrter</span>\n <span 
class='ocrx_word' id='word_1_93' title='bbox 593 1429 672 1460; x_wconf 96'>Herr</span>\n </span>\n <span class='ocr_line' id='line_1_20' title="bbox 326 1516 1207 1557; baseline 0.001 -10; x_size 40; x_descenders 9; x_ascenders 7">\n <span class='ocrx_word' id='word_1_94' title='bbox 326 1516 395 1548; x_wconf 93'>Ihre</span>\n <span class='ocrx_word' id='word_1_95' title='bbox 411 1516 734 1556; x_wconf 92'>Rundfunkbeiträge</span>\n <span class='ocrx_word' id='word_1_96' title='bbox 748 1518 820 1548; x_wconf 96'>sind</span>\n <span class='ocrx_word' id='word_1_97' title='bbox 834 1524 887 1548; x_wconf 96'>am</span>\n <span class='ocrx_word' id='word_1_98' title='bbox 906 1518 1103 1549; x_wconf 95'>15.07.2018</span>\n <span class='ocrx_word' id='word_1_99' title='bbox 1116 1518 1207 1557; x_wconf 95'>fällig.</span>\n </span>\n <span class='ocr_line' id='line_1_21' title="bbox 311 1598 2167 1658; baseline 0.003 -23.038; x_size 39; x_descenders 8; x_ascenders 7">\n <span class='ocrx_word' id='word_1_100' title='bbox 311 1601 402 1658; x_wconf 96'>Bitte</span>\n <span class='ocrx_word' id='word_1_101' title='bbox 420 1601 545 1658; x_wconf 96'>zahlen</span>\n <span class='ocrx_word' id='word_1_102' title='bbox 556 1604 613 1636; x_wconf 96'>Sie</span>\n <span class='ocrx_word' id='word_1_103' title='bbox 626 1605 692 1636; x_wconf 96'>den</span>\n <span class='ocrx_word' id='word_1_104' title='bbox 707 1605 824 1644; x_wconf 96'>Betrag</span>\n <span class='ocrx_word' id='word_1_105' title='bbox 837 1612 901 1636; x_wconf 96'>von</span>\n <span class='ocrx_word' id='word_1_106' title='bbox 917 1605 1014 1640; x_wconf 96'>52,50</span>\n <span class='ocrx_word' id='word_1_107' title='bbox 1031 1605 1122 1636; x_wconf 96'>EUR.</span>\n <span class='ocrx_word' id='word_1_108' title='bbox 1140 1606 1198 1637; x_wconf 96'>Für</span>\n <span class='ocrx_word' id='word_1_109' title='bbox 1209 1606 1261 1637; x_wconf 96'>die</span>\n <span class='ocrx_word' id='word_1_110' 
title='bbox 1276 1598 1510 1645; x_wconf 96'>Überweisung</span>\n <span class='ocrx_word' id='word_1_111' title='bbox 1526 1607 1636 1638; x_wconf 96'>haben</span>\n <span class='ocrx_word' id='word_1_112' title='bbox 1649 1608 1701 1638; x_wconf 95'>wir</span>\n <span class='ocrx_word' id='word_1_113' title='bbox 1714 1608 1767 1638; x_wconf 95'>ein</span>\n <span class='ocrx_word' id='word_1_114' title='bbox 1780 1608 2106 1647; x_wconf 91'>Zahlungsformular</span>\n <span class='ocrx_word' id='word_1_115' title='bbox 2117 1610 2167 1641; x_wconf 96'>für</span>\n </span>\n <span class='ocr_line' id='line_1_22' title="bbox 324 1648 601 1680; baseline 0 0; x_size 40.27866; x_descenders 8.2786608; x_ascenders 8">\n <span class='ocrx_word' id='word_1_116' title='bbox 324 1648 382 1680; x_wconf 96'>Sie</span>\n <span class='ocrx_word' id='word_1_117' title='bbox 395 1649 601 1680; x_wconf 96'>vorbereitet.</span>\n </span>\n <span class='ocr_line' id='line_1_23' title="bbox 324 1735 2019 1776; baseline 0.002 -8; x_size 39; x_descenders 7; x_ascenders 8">\n <span class='ocrx_word' id='word_1_118' title='bbox 324 1737 481 1768; x_wconf 96'>Möchten</span>\n <span class='ocrx_word' id='word_1_119' title='bbox 497 1735 555 1767; x_wconf 96'>Sie</span>\n <span class='ocrx_word' id='word_1_120' title='bbox 568 1737 634 1767; x_wconf 93'>den</span>\n <span class='ocrx_word' id='word_1_121' title='bbox 651 1736 947 1774; x_wconf 92'>Rundfunkbeitrag</span>\n <span class='ocrx_word' id='word_1_122' title='bbox 960 1736 1089 1767; x_wconf 96'>einfach</span>\n <span class='ocrx_word' id='word_1_123' title='bbox 1103 1737 1168 1767; x_wconf 96'>und</span>\n <span class='ocrx_word' id='word_1_124' title='bbox 1182 1737 1326 1775; x_wconf 96'>bequem</span>\n <span class='ocrx_word' id='word_1_125' title='bbox 1341 1744 1399 1776; x_wconf 96'>per</span>\n <span class='ocrx_word' id='word_1_126' title='bbox 1412 1738 1595 1769; x_wconf 96'>Lastschrift</span>\n <span class='ocrx_word' 
id='word_1_127' title='bbox 1610 1738 1750 1770; x_wconf 95'>zahlen?</span>\n <span class='ocrx_word' id='word_1_128' title='bbox 1767 1740 1818 1771; x_wconf 95'>Mit</span>\n <span class='ocrx_word' id='word_1_129' title='bbox 1832 1741 1891 1771; x_wconf 93'>der</span>\n <span class='ocrx_word' id='word_1_130' title='bbox 1907 1741 2019 1772; x_wconf 91'>Erteilu</span>\n </span>\n <span class='ocr_line' id='line_1_24' title="bbox 483 1742 2285 1785; baseline -0.007 0; x_size 39.27866; x_descenders 8.2786608; x_ascenders 7">\n <span class='ocrx_word' id='word_1_131' title='bbox 483 1781 488 1785; x_wconf 50'>2</span>\n <span class='ocrx_word' id='word_1_132' title='bbox 855 1780 860 1784; x_wconf 76'>:</span>\n <span class='ocrx_word' id='word_1_133' title='bbox 1154 1780 1166 1785; x_wconf 2'>rn</span>\n <span class='ocrx_word' id='word_1_134' title='bbox 1255 1781 1324 1785; x_wconf 37'>ET</span>\n <span class='ocrx_word' id='word_1_135' title='bbox 1739 1765 1744 1769; x_wconf 21'>‘</span>\n <span class='ocrx_word' id='word_1_136' title='bbox 2025 1749 2044 1772; x_wconf 96'>ng</span>\n <span class='ocrx_word' id='word_1_137' title='bbox 2081 1743 2146 1773; x_wconf 93'>des</span>\n <span class='ocrx_word' id='word_1_138' title='bbox 2163 1742 2285 1774; x_wconf 91'>SEPA-</span>\n </span>\n <span class='ocr_line' id='line_1_25' title="bbox 323 1745 2249 1831; baseline 0.003 -19; x_size 40; x_descenders 9; x_ascenders 8">\n <span class='ocrx_word' id='word_1_139' title='bbox 323 1781 670 1829; x_wconf 91'>Lastschriftmandats</span>\n <span class='ocrx_word' id='word_1_140' title='bbox 684 1780 816 1811; x_wconf 96'>werden</span>\n <span class='ocrx_word' id='word_1_141' title='bbox 830 1780 884 1810; x_wconf 93'>die</span>\n <span class='ocrx_word' id='word_1_142' title='bbox 899 1780 1216 1819; x_wconf 91'>Rundfunkbeiträge</span>\n <span class='ocrx_word' id='word_1_143' title='bbox 1230 1781 1346 1819; x_wconf 96'>künftig</span>\n <span class='ocrx_word' 
id='word_1_144' title='bbox 1360 1788 1424 1812; x_wconf 96'>von</span>\n <span class='ocrx_word' id='word_1_145' title='bbox 1441 1782 1540 1812; x_wconf 95'>Ihrem</span>\n <span class='ocrx_word' id='word_1_146' title='bbox 1556 1783 1659 1814; x_wconf 96'>Konto</span>\n <span class='ocrx_word' id='word_1_147' title='bbox 1673 1783 1895 1823; x_wconf 96'>eingezogen.</span>\n <span class='ocrx_word' id='word_1_148' title='bbox 1912 1784 2026 1817; x_wconf 96'>Gerne</span>\n <span class='ocrx_word' id='word_1_149' title='bbox 2040 1745 2173 1831; x_wconf 25'>Können</span>\n <span class='ocrx_word' id='word_1_150' title='bbox 2191 1787 2249 1819; x_wconf 58'>sie</span>\n </span>\n <span class='ocr_line' id='line_1_26' title="bbox 323 1823 2251 1866; baseline 0.004 -11; x_size 40; x_descenders 9; x_ascenders 8">\n <span class='ocrx_word' id='word_1_151' title='bbox 323 1825 441 1856; x_wconf 96'>hierfür</span>\n <span class='ocrx_word' id='word_1_152' title='bbox 452 1824 518 1855; x_wconf 93'>das</span>\n <span class='ocrx_word' id='word_1_153' title='bbox 533 1823 724 1862; x_wconf 92'>beigefügte</span>\n <span class='ocrx_word' id='word_1_154' title='bbox 739 1823 900 1853; x_wconf 96'>Formular</span>\n <span class='ocrx_word' id='word_1_155' title='bbox 912 1823 1109 1854; x_wconf 96'>verwenden</span>\n <span class='ocrx_word' id='word_1_156' title='bbox 1123 1823 1203 1854; x_wconf 94'>oder</span>\n <span class='ocrx_word' id='word_1_157' title='bbox 1218 1823 1284 1854; x_wconf 96'>Ihre</span>\n <span class='ocrx_word' id='word_1_158' title='bbox 1300 1824 1403 1854; x_wconf 96'>Daten</span>\n <span class='ocrx_word' id='word_1_159' title='bbox 1418 1827 1510 1855; x_wconf 93'>unter</span>\n <span class='ocrx_word' id='word_1_160' title='bbox 1524 1826 1898 1866; x_wconf 92'>rundfunkbeitrag.de</span>\n <span class='ocrx_word' id='word_1_161' title='bbox 1912 1829 2022 1860; x_wconf 96'>online</span>\n <span class='ocrx_word' id='word_1_162' title='bbox 2036 
1830 2251 1862; x_wconf 96'>übermitteln.</span>\n </span>\n <span class='ocr_line' id='line_1_27' title="bbox 322 1910 766 1944; baseline -0.007 0; x_size 40.27866; x_descenders 8.2786608; x_ascenders 8">\n <span class='ocrx_word' id='word_1_163' title='bbox 322 1914 375 1944; x_wconf 96'>Mit</span>\n <span class='ocrx_word' id='word_1_164' title='bbox 388 1912 615 1944; x_wconf 95'>freundlichen</span>\n <span class='ocrx_word' id='word_1_165' title='bbox 629 1910 766 1942; x_wconf 96'>Grüßen</span>\n </span>\n <span class='ocr_line' id='line_1_28' title="bbox 323 1998 1351 2039; baseline -0.003 -6; x_size 39; x_descenders 8; x_ascenders 7">\n <span class='ocrx_word' id='word_1_166' title='bbox 323 2002 368 2033; x_wconf 93'>Ihr</span>\n <span class='ocrx_word' id='word_1_167' title='bbox 383 1999 665 2039; x_wconf 91'>Beitragsservice</span>\n <span class='ocrx_word' id='word_1_168' title='bbox 677 2005 742 2030; x_wconf 96'>von</span>\n <span class='ocrx_word' id='word_1_169' title='bbox 755 1998 850 2033; x_wconf 95'>ARD,</span>\n <span class='ocrx_word' id='word_1_170' title='bbox 865 1998 943 2029; x_wconf 96'>ZDF</span>\n <span class='ocrx_word' id='word_1_171' title='bbox 958 1999 1022 2030; x_wconf 96'>und</span>\n <span class='ocrx_word' id='word_1_172' title='bbox 1039 1998 1351 2031; x_wconf 96'>Deutschlandradio</span>\n </span>\n <span class='ocr_line' id='line_1_29' title="bbox 320 2138 985 2186; baseline -0.009 -11; x_size 44; x_descenders 12; x_ascenders 8">\n <span class='ocrx_word' id='word_1_173' title='bbox 320 2142 367 2175; x_wconf 96'>So</span>\n <span class='ocrx_word' id='word_1_174' title='bbox 382 2141 558 2174; x_wconf 96'>errechnet</span>\n <span class='ocrx_word' id='word_1_175' title='bbox 573 2139 643 2186; x_wconf 96'>sich</span>\n <span class='ocrx_word' id='word_1_176' title='bbox 656 2139 716 2178; x_wconf 96'>der</span>\n <span class='ocrx_word' id='word_1_177' title='bbox 728 2138 985 2182; x_wconf 96'>Gesamtbetrag</span>\n 
</span>\n <span class='ocr_line' id='line_1_30' title="bbox 322 2184 2113 2231; baseline 0 -13; x_size 37; x_descenders 7; x_ascenders 12">\n <span class='ocrx_word' id='word_1_178' title='bbox 322 2185 498 2230; x_wconf 67'>"Buchungen:</span>\n <span class='ocrx_word' id='word_1_179' title='bbox 648 2184 978 2222; x_wconf 22'>a,</span>\n <span class='ocrx_word' id='word_1_180' title='bbox 1063 2204 1072 2207; x_wconf 24'>2</span>\n <span class='ocrx_word' id='word_1_181' title='bbox 1153 2209 1155 2211; x_wconf 11'>'</span>\n <span class='ocrx_word' id='word_1_182' title='bbox 1647 2186 1677 2231; x_wconf 77'>it</span>\n <span class='ocrx_word' id='word_1_183' title='bbox 2100 2210 2113 2223; x_wconf 49'>;</span>\n </span>\n <span class='ocr_line' id='line_1_31' title="bbox 511 2197 2284 2271; baseline -0.006 -10; x_size 50.623604; x_descenders 8.6236048; x_ascenders 17">\n <span class='ocrx_word' id='word_1_184' title='bbox 511 2236 547 2261; x_wconf 96'>Ihr</span>\n <span class='ocrx_word' id='word_1_185' title='bbox 560 2234 724 2260; x_wconf 94'>Kontostand</span>\n <span class='ocrx_word' id='word_1_186' title='bbox 735 2241 778 2259; x_wconf 96'>am</span>\n <span class='ocrx_word' id='word_1_187' title='bbox 789 2235 952 2260; x_wconf 95'>22.03.2018</span>\n <span class='ocrx_word' id='word_1_188' title='bbox 1679 2203 1793 2224; x_wconf 16'>es</span>\n <span class='ocrx_word' id='word_1_189' title='bbox 1940 2197 2097 2266; x_wconf 0'>ANNERDDNL</span>\n <span class='ocrx_word' id='word_1_190' title='bbox 2155 2204 2284 2271; x_wconf 25'>en</span>\n </span>\n <span class='ocr_line' id='line_1_32' title="bbox 318 2260 2302 2319; baseline -0.007 -7; x_size 46.27866; x_descenders 8.2786608; x_ascenders 14">\n <span class='ocrx_word' id='word_1_191' title='bbox 318 2285 447 2312; x_wconf 93'>04.04.18</span>\n <span class='ocrx_word' id='word_1_192' title='bbox 509 2281 761 2313; x_wconf 82'>Zahlungseingang</span>\n <span class='ocrx_word' id='word_1_193' 
title='bbox 772 2288 832 2307; x_wconf 96'>vom</span>\n <span class='ocrx_word' id='word_1_194' title='bbox 843 2283 1005 2307; x_wconf 96'>03.04.2018</span>\n <span class='ocrx_word' id='word_1_195' title='bbox 2211 2260 2302 2319; x_wconf 31'>E35</span>\n </span>\n <span class='ocr_line' id='line_1_33' title="bbox 318 2316 2303 2367; baseline -0.007 -7; x_size 42; x_descenders 11; x_ascenders 12">\n <span class='ocrx_word' id='word_1_196' title='bbox 318 2333 447 2360; x_wconf 83'>26.06.18</span>\n <span class='ocrx_word' id='word_1_197' title='bbox 511 2329 768 2360; x_wconf 60'>Rundfunkbeiträge</span>\n <span class='ocrx_word' id='word_1_198' title='bbox 779 2329 817 2354; x_wconf 96'>für</span>\n <span class='ocrx_word' id='word_1_199' title='bbox 830 2330 838 2355; x_wconf 95'>1</span>\n <span class='ocrx_word' id='word_1_200' title='bbox 854 2329 992 2362; x_wconf 96'>Wohnung</span>\n <span class='ocrx_word' id='word_1_201' title='bbox 1557 2336 1675 2361; x_wconf 93'>06.2018</span>\n <span class='ocrx_word' id='word_1_202' title='bbox 1688 2350 1696 2353; x_wconf 85'>-</span>\n <span class='ocrx_word' id='word_1_203' title='bbox 1706 2337 1824 2362; x_wconf 85'>08.2018</span>\n <span class='ocrx_word' id='word_1_204' title='bbox 2212 2316 2303 2367; x_wconf 59'>52.50</span>\n </span>\n <span class='ocr_line' id='line_1_34' title="bbox 1580 2364 2301 2422; baseline -0.003 -15; x_size 34; x_descenders 9; x_ascenders 6">\n <span class='ocrx_word' id='word_1_205' title='bbox 1580 2378 1799 2422; x_wconf 94'>Gesamtbetrag</span>\n <span class='ocrx_word' id='word_1_206' title='bbox 2209 2396 2254 2420; x_wconf 68'>52</span>\n <span class='ocrx_word' id='word_1_207' title='bbox 2261 2364 2301 2420; x_wconf 58'>=</span>\n </span>\n <span class='ocr_line' id='line_1_35' title="bbox 499 3466 2459 3486; baseline -0.009 5; x_size 29; x_descenders 8; x_ascenders 5">\n <span class='ocrx_word' id='word_1_208' title='bbox 884 3458 1132 3486; x_wconf 26'>EEE</span>\n <span 
class='ocrx_word' id='word_1_209' title='bbox 1843 3458 1886 3486; x_wconf 46'>nn</span>\n <span class='ocrx_word' id='word_1_210' title='bbox 1913 3466 2430 3486; x_wconf 38'>[nn</span>\n <span class='ocrx_word' id='word_1_211' title='bbox 2457 3475 2459 3477; x_wconf 25'>|</span>\n </span>\n </p>\n </div>\n</div>\n`;
const domParser = new DOMParser();
let hasRecognizedText = false;
if (hocr) {
  // Parse the hOCR markup and collect word spans that have any child nodes.
  const notEmptyWords = Array.from(
    domParser.parseFromString(hocr, "text/html").querySelectorAll("span.ocrx_word:not(:empty)")
  );
  // Stop at the first word whose text is more than whitespace.
  for (const word of notEmptyWords) {
    if (word.textContent?.trim()) {
      hasRecognizedText = true;
      break;
    }
  }
}
console.timeEnd("test");
console.log(hasRecognizedText);
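The DOMParser test above builds a full document only to learn whether any `ocrx_word` span holds non-whitespace text. For comparison, the same question can be asked with a single regular expression. This is a minimal sketch and not the benchmark's own Regex test: the function name `hasRecognizedTextRegex` and the pattern are assumptions made for illustration, and the pattern assumes Tesseract-style hOCR in which word spans contain plain text with no nested tags.

```javascript
// Hypothetical helper, not taken from the benchmark: returns true when at
// least one ocrx_word span contains a non-whitespace character.
function hasRecognizedTextRegex(hocr) {
  if (!hocr) return false;
  // Match an opening <span> tag whose class attribute is exactly ocrx_word
  // (single or double quoted), then require a character that is neither
  // whitespace nor '<' before the next tag begins.
  const wordWithText = /<span\b[^>]*\bclass=(['"])ocrx_word\1[^>]*>\s*[^\s<]/;
  return wordWithText.test(hocr);
}

// Usage with small hOCR fragments:
const withText = "<span class='ocrx_word' id='word_1_19' title='bbox 1713 491 1761 516; x_wconf 96'>Sie</span>";
const blankOnly = "<span class='ocrx_word' id='word_1_1' title='bbox 0 0 1 1; x_wconf 0'> </span>";
console.log(hasRecognizedTextRegex(withText));
console.log(hasRecognizedTextRegex(blankOnly));
```

Unlike `textContent`, this sketch does not decode character references, so a span containing only an entity such as `&nbsp;` would count as text here.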
Regex
console.time("test"); const hocr = `<div class='ocr_page' id='page_1' title='image ""; bbox 0 0 2460 3486; ppageno 0'>\n <div class='ocr_carea' id='block_1_1' title="bbox 3 0 2459 3486">\n <p class='ocr_par' id='par_1_1' lang='deu' title="bbox 0 0 2459 3486">\n <span class='ocr_line' id='line_1_1' title="bbox 11 0 2052 19; baseline -0.004 -1; x_size 25.174164; x_descenders 5.1741633; x_ascenders 5">\n <span class='ocrx_word' id='word_1_1' title='bbox 229 0 261 24; x_wconf 10'>111</span>\n <span class='ocrx_word' id='word_1_2' title='bbox 1351 0 1639 24; x_wconf 34'>EEE</span>\n </span>\n <span class='ocr_line' id='line_1_2' title="bbox 990 100 2355 217; baseline -0.012 -50; x_size 76; x_descenders 19; x_ascenders 25">\n <span class='ocrx_word' id='word_1_3' title='bbox 990 152 1125 217; x_wconf 24'>ER</span>\n <span class='ocrx_word' id='word_1_4' title='bbox 1438 123 1630 181; x_wconf 92'>ARD®</span>\n <span class='ocrx_word' id='word_1_5' title='bbox 1686 102 1880 212; x_wconf 70'>Cr</span>\n <span class='ocrx_word' id='word_1_6' title='bbox 1977 115 2258 163; x_wconf 15'>EEE</span>\n </span>\n <span class='ocr_line' id='line_1_3' title="bbox 448 168 2357 296; baseline -0.007 -50.027; x_size 40.362125; x_descenders 8.210145; x_ascenders 8.3506098">\n <span class='ocrx_word' id='word_1_7' title='bbox 448 168 628 275; x_wconf 35'>USE</span>\n <span class='ocrx_word' id='word_1_8' title='bbox 627 198 764 252; x_wconf 0'>TE</span>\n <span class='ocrx_word' id='word_1_9' title='bbox 778 168 924 275; x_wconf 0'>Ldlhe</span>\n <span class='ocrx_word' id='word_1_10' title='bbox 1045 210 1128 296; x_wconf 39'>At</span>\n <span class='ocrx_word' id='word_1_11' title='bbox 1352 199 1916 249; x_wconf 5'>ee</span>\n <span class='ocrx_word' id='word_1_12' title='bbox 2315 199 2357 249; x_wconf 0'>E</span>\n </span>\n <span class='ocr_line' id='line_1_4' title="bbox 716 253 2067 317; baseline -0.022 1; x_size 67.038216; x_descenders 11.038215; x_ascenders 24">\n <span 
class='ocrx_word' id='word_1_13' title='bbox 716 283 839 317; x_wconf 22'>RS]</span>\n <span class='ocrx_word' id='word_1_14' title='bbox 925 260 959 314; x_wconf 0'>a1</span>\n <span class='ocrx_word' id='word_1_15' title='bbox 992 253 1045 302; x_wconf 7'>5</span>\n <span class='ocrx_word' id='word_1_16' title='bbox 1713 255 2067 293; x_wconf 92'>BEITRAGSSERVICE</span>\n </span>\n <span class='ocr_line' id='line_1_5' title="bbox 452 264 866 360; baseline -0.058 -15; x_size 66; x_descenders 13; x_ascenders 19">\n <span class='ocrx_word' id='word_1_17' title='bbox 452 280 825 360; x_wconf 90'>rundfunkbeitrag</span>\n <span class='ocrx_word' id='word_1_18' title='bbox 829 264 866 339; x_wconf 0'>\\)</span>\n </span>\n <span class='ocr_line' id='line_1_6' title="bbox 1713 489 2083 516; baseline -0.011 0; x_size 32.55394; x_descenders 6.5539398; x_ascenders 7">\n <span class='ocrx_word' id='word_1_19' title='bbox 1713 491 1761 516; x_wconf 96'>Sie</span>\n <span class='ocrx_word' id='word_1_20' title='bbox 1772 489 1918 516; x_wconf 96'>erreichen</span>\n <span class='ocrx_word' id='word_1_21' title='bbox 1932 495 1989 514; x_wconf 96'>uns</span>\n <span class='ocrx_word' id='word_1_22' title='bbox 2000 489 2083 513; x_wconf 96'>unter</span>\n </span>\n <span class='ocr_line' id='line_1_7' title="bbox 485 523 2108 587; baseline -0.013 -19; x_size 40.362125; x_descenders 8.210145; x_ascenders 8.3506098">\n <span class='ocrx_word' id='word_1_23' title='bbox 485 544 521 568; x_wconf 90'>07</span>\n <span class='ocrx_word' id='word_1_24' title='bbox 532 544 610 568; x_wconf 72'>2FC7</span>\n <span class='ocrx_word' id='word_1_25' title='bbox 622 542 747 568; x_wconf 60'>F7B11C</span>\n <span class='ocrx_word' id='word_1_26' title='bbox 759 542 829 567; x_wconf 39'>DO21</span>\n <span class='ocrx_word' id='word_1_27' title='bbox 845 541 925 566; x_wconf 34'>A8BSD</span>\n <span class='ocrx_word' id='word_1_28' title='bbox 1030 568 1055 578; x_wconf 26'>...</span>\n <span 
class='ocrx_word' id='word_1_29' title='bbox 1712 526 1828 587; x_wconf 46'>Teen,</span>\n <span class='ocrx_word' id='word_1_30' title='bbox 1844 526 1870 586; x_wconf 0'>1</span>\n <span class='ocrx_word' id='word_1_31' title='bbox 1873 526 1926 585; x_wconf 24'>Er</span>\n <span class='ocrx_word' id='word_1_32' title='bbox 1938 525 1996 584; x_wconf 48'>=</span>\n <span class='ocrx_word' id='word_1_33' title='bbox 2004 524 2038 583; x_wconf 0'>ae</span>\n <span class='ocrx_word' id='word_1_34' title='bbox 2069 523 2108 582; x_wconf 78'>2</span>\n </span>\n <span class='ocr_line' id='line_1_8' title="bbox 326 563 2121 633; baseline -0.007 -11; x_size 29; x_descenders 4; x_ascenders 11">\n <span class='ocrx_word' id='word_1_35' title='bbox 326 563 378 632; x_wconf 91'>P</span>\n <span class='ocrx_word' id='word_1_36' title='bbox 400 577 454 608; x_wconf 95'>DV</span>\n <span class='ocrx_word' id='word_1_37' title='bbox 482 584 517 608; x_wconf 94'>06</span>\n <span class='ocrx_word' id='word_1_38' title='bbox 545 583 608 612; x_wconf 96'>0,70</span>\n <span class='ocrx_word' id='word_1_39' title='bbox 648 582 783 609; x_wconf 93'>Deutsche</span>\n <span class='ocrx_word' id='word_1_40' title='bbox 791 584 853 609; x_wconf 93'>Post</span>\n <span class='ocrx_word' id='word_1_41' title='bbox 871 564 903 627; x_wconf 6'>oo</span>\n <span class='ocrx_word' id='word_1_42' title='bbox 965 573 1070 633; x_wconf 53'>IE:</span>\n <span class='ocrx_word' id='word_1_43' title='bbox 1713 567 1843 617; x_wconf 27'>Bean</span>\n <span class='ocrx_word' id='word_1_44' title='bbox 1882 598 1921 612; x_wconf 94'>aus</span>\n <span class='ocrx_word' id='word_1_45' title='bbox 1929 592 1976 611; x_wconf 96'>dem</span>\n <span class='ocrx_word' id='word_1_46' title='bbox 1985 592 2008 610; x_wconf 77'>dt.</span>\n <span class='ocrx_word' id='word_1_47' title='bbox 2020 590 2121 620; x_wconf 95'>Festnetz,</span>\n </span>\n <span class='ocr_line' id='line_1_9' title="bbox 325 615 2195 
693; baseline -0.008 -43; x_size 27; x_descenders 5; x_ascenders 8">\n <span class='ocrx_word' id='word_1_48' title='bbox 325 641 337 651; x_wconf 90'>*</span>\n <span class='ocrx_word' id='word_1_49' title='bbox 346 641 418 666; x_wconf 80'>4557</span>\n <span class='ocrx_word' id='word_1_50' title='bbox 429 642 440 651; x_wconf 80'>*</span>\n <span class='ocrx_word' id='word_1_51' title='bbox 452 642 575 666; x_wconf 78'>0137861</span>\n <span class='ocrx_word' id='word_1_52' title='bbox 579 623 588 670; x_wconf 78'>*</span>\n <span class='ocrx_word' id='word_1_53' title='bbox 947 623 1045 693; x_wconf 35'>Se</span>\n <span class='ocrx_word' id='word_1_54' title='bbox 1712 622 1739 640; x_wconf 93'>60</span>\n <span class='ocrx_word' id='word_1_55' title='bbox 1747 619 1865 640; x_wconf 89'>Cent/Anruf</span>\n <span class='ocrx_word' id='word_1_56' title='bbox 1874 625 1911 639; x_wconf 96'>aus</span>\n <span class='ocrx_word' id='word_1_57' title='bbox 1921 619 1960 638; x_wconf 96'>den</span>\n <span class='ocrx_word' id='word_1_58' title='bbox 1969 619 1994 638; x_wconf 96'>dt.</span>\n <span class='ocrx_word' id='word_1_59' title='bbox 2005 615 2195 638; x_wconf 90'>Mobilfunknetzen)</span>\n </span>\n <span class='ocr_line' id='line_1_10' title="bbox 326 668 1922 699; baseline -0.002 0; x_size 36.55394; x_descenders 6.5539398; x_ascenders 11">\n <span class='ocrx_word' id='word_1_60' title='bbox 326 680 418 699; x_wconf 10'>*.nDns</span>\n <span class='ocrx_word' id='word_1_61' title='bbox 429 681 440 692; x_wconf 72'>*</span>\n <span class='ocrx_word' id='word_1_62' title='bbox 996 678 1065 693; x_wconf 23'>ne:</span>\n <span class='ocrx_word' id='word_1_63' title='bbox 1712 670 1922 697; x_wconf 92'>Servicezeiten</span>\n </span>\n <span class='ocr_line' id='line_1_11' title="bbox 1712 703 2114 740; baseline -0.01 -7; x_size 34; x_descenders 8; x_ascenders 7">\n <span class='ocrx_word' id='word_1_64' title='bbox 1712 708 1820 740; x_wconf 
93'>Montag</span>\n <span class='ocrx_word' id='word_1_65' title='bbox 1834 721 1842 724; x_wconf 93'>-</span>\n <span class='ocrx_word' id='word_1_66' title='bbox 1854 706 1953 738; x_wconf 95'>Freitag</span>\n <span class='ocrx_word' id='word_1_67' title='bbox 1966 706 1982 730; x_wconf 92'>7</span>\n <span class='ocrx_word' id='word_1_68' title='bbox 1995 719 2003 722; x_wconf 92'>-</span>\n <span class='ocrx_word' id='word_1_69' title='bbox 2018 706 2050 730; x_wconf 96'>19</span>\n <span class='ocrx_word' id='word_1_70' title='bbox 2063 703 2114 730; x_wconf 96'>Uhr</span>\n </span>\n <span class='ocr_line' id='line_1_12' title="bbox 1713 762 1923 790; baseline -0.01 0; x_size 32.55394; x_descenders 6.5539398; x_ascenders 7">\n <span class='ocrx_word' id='word_1_71' title='bbox 1713 762 1923 790; x_wconf 96'>Postanschrift</span>\n </span>\n <span class='ocr_line' id='line_1_13' title="bbox 3 796 2124 860; baseline -0.013 -11.04; x_size 32.208996; x_descenders 6.2089958; x_ascenders 8">\n <span class='ocrx_word' id='word_1_72' title='bbox 3 819 45 860; x_wconf 29'>er</span>\n <span class='ocrx_word' id='word_1_73' title='bbox 1712 800 1780 826; x_wconf 96'>ARD</span>\n <span class='ocrx_word' id='word_1_74' title='bbox 1791 799 1853 825; x_wconf 96'>ZDF</span>\n <span class='ocrx_word' id='word_1_75' title='bbox 1867 796 2124 824; x_wconf 96'>Deutschlandradio</span>\n </span>\n <span class='ocr_line' id='line_1_14' title="bbox 1713 832 2126 868; baseline -0.01 -6; x_size 33; x_descenders 7; x_ascenders 8">\n <span class='ocrx_word' id='word_1_76' title='bbox 1713 834 1946 868; x_wconf 92'>Beitragsservice,</span>\n <span class='ocrx_word' id='word_1_77' title='bbox 1959 834 2049 859; x_wconf 96'>50656</span>\n <span class='ocrx_word' id='word_1_78' title='bbox 2063 832 2126 858; x_wconf 96'>Köln</span>\n </span>\n <span class='ocr_line' id='line_1_15' title="bbox 1711 890 2067 922; baseline -0.014 -3; x_size 33; x_descenders 7; x_ascenders 8">\n <span 
class='ocrx_word' id='word_1_79' title='bbox 1711 893 1779 919; x_wconf 92'>Web</span>\n <span class='ocrx_word' id='word_1_80' title='bbox 1791 890 2067 922; x_wconf 92'>rundfunkbeitrag.de</span>\n </span>\n <span class='ocr_line' id='line_1_16' title="bbox 1712 948 1998 976; baseline -0.014 0; x_size 32.208996; x_descenders 6.2089958; x_ascenders 8">\n <span class='ocrx_word' id='word_1_81' title='bbox 1712 950 1811 976; x_wconf 96'>Datum</span>\n <span class='ocrx_word' id='word_1_82' title='bbox 1834 948 1998 974; x_wconf 96'>26.06.2018</span>\n </span>\n <span class='ocr_line' id='line_1_17' title="bbox 1712 1005 2039 1038; baseline -0.012 -6; x_size 32; x_descenders 7; x_ascenders 7">\n <span class='ocrx_word' id='word_1_83' title='bbox 1712 1007 1974 1038; x_wconf 92'>Beitragsnummer</span>\n <span class='ocrx_word' id='word_1_84' title='bbox 1985 1005 2039 1029; x_wconf 96'>250</span>\n </span>\n <span class='ocr_line' id='line_1_18' title="bbox 325 1295 1363 1338; baseline 0.005 -12; x_size 39; x_descenders 8; x_ascenders 7">\n <span class='ocrx_word' id='word_1_85' title='bbox 325 1295 484 1335; x_wconf 96'>Zahlung</span>\n <span class='ocrx_word' id='word_1_86' title='bbox 499 1297 563 1328; x_wconf 93'>der</span>\n <span class='ocrx_word' id='word_1_87' title='bbox 577 1298 924 1337; x_wconf 92'>Rundfunkbeiträge</span>\n <span class='ocrx_word' id='word_1_88' title='bbox 937 1315 949 1321; x_wconf 92'>-</span>\n <span class='ocrx_word' id='word_1_89' title='bbox 964 1300 1284 1338; x_wconf 92'>Beitragsnummer</span>\n <span class='ocrx_word' id='word_1_90' title='bbox 1297 1300 1363 1331; x_wconf 96'>250</span>\n </span>\n <span class='ocr_line' id='line_1_19' title="bbox 325 1427 672 1467; baseline 0 -8; x_size 40; x_descenders 8; x_ascenders 8">\n <span class='ocrx_word' id='word_1_91' title='bbox 325 1427 413 1459; x_wconf 96'>Sehr</span>\n <span class='ocrx_word' id='word_1_92' title='bbox 425 1429 579 1467; x_wconf 96'>geehrter</span>\n <span 
class='ocrx_word' id='word_1_93' title='bbox 593 1429 672 1460; x_wconf 96'>Herr</span>\n </span>\n <span class='ocr_line' id='line_1_20' title="bbox 326 1516 1207 1557; baseline 0.001 -10; x_size 40; x_descenders 9; x_ascenders 7">\n <span class='ocrx_word' id='word_1_94' title='bbox 326 1516 395 1548; x_wconf 93'>Ihre</span>\n <span class='ocrx_word' id='word_1_95' title='bbox 411 1516 734 1556; x_wconf 92'>Rundfunkbeiträge</span>\n <span class='ocrx_word' id='word_1_96' title='bbox 748 1518 820 1548; x_wconf 96'>sind</span>\n <span class='ocrx_word' id='word_1_97' title='bbox 834 1524 887 1548; x_wconf 96'>am</span>\n <span class='ocrx_word' id='word_1_98' title='bbox 906 1518 1103 1549; x_wconf 95'>15.07.2018</span>\n <span class='ocrx_word' id='word_1_99' title='bbox 1116 1518 1207 1557; x_wconf 95'>fällig.</span>\n </span>\n <span class='ocr_line' id='line_1_21' title="bbox 311 1598 2167 1658; baseline 0.003 -23.038; x_size 39; x_descenders 8; x_ascenders 7">\n <span class='ocrx_word' id='word_1_100' title='bbox 311 1601 402 1658; x_wconf 96'>Bitte</span>\n <span class='ocrx_word' id='word_1_101' title='bbox 420 1601 545 1658; x_wconf 96'>zahlen</span>\n <span class='ocrx_word' id='word_1_102' title='bbox 556 1604 613 1636; x_wconf 96'>Sie</span>\n <span class='ocrx_word' id='word_1_103' title='bbox 626 1605 692 1636; x_wconf 96'>den</span>\n <span class='ocrx_word' id='word_1_104' title='bbox 707 1605 824 1644; x_wconf 96'>Betrag</span>\n <span class='ocrx_word' id='word_1_105' title='bbox 837 1612 901 1636; x_wconf 96'>von</span>\n <span class='ocrx_word' id='word_1_106' title='bbox 917 1605 1014 1640; x_wconf 96'>52,50</span>\n <span class='ocrx_word' id='word_1_107' title='bbox 1031 1605 1122 1636; x_wconf 96'>EUR.</span>\n <span class='ocrx_word' id='word_1_108' title='bbox 1140 1606 1198 1637; x_wconf 96'>Für</span>\n <span class='ocrx_word' id='word_1_109' title='bbox 1209 1606 1261 1637; x_wconf 96'>die</span>\n <span class='ocrx_word' id='word_1_110' 
title='bbox 1276 1598 1510 1645; x_wconf 96'>Überweisung</span>\n <span class='ocrx_word' id='word_1_111' title='bbox 1526 1607 1636 1638; x_wconf 96'>haben</span>\n <span class='ocrx_word' id='word_1_112' title='bbox 1649 1608 1701 1638; x_wconf 95'>wir</span>\n <span class='ocrx_word' id='word_1_113' title='bbox 1714 1608 1767 1638; x_wconf 95'>ein</span>\n <span class='ocrx_word' id='word_1_114' title='bbox 1780 1608 2106 1647; x_wconf 91'>Zahlungsformular</span>\n <span class='ocrx_word' id='word_1_115' title='bbox 2117 1610 2167 1641; x_wconf 96'>für</span>\n </span>\n <span class='ocr_line' id='line_1_22' title="bbox 324 1648 601 1680; baseline 0 0; x_size 40.27866; x_descenders 8.2786608; x_ascenders 8">\n <span class='ocrx_word' id='word_1_116' title='bbox 324 1648 382 1680; x_wconf 96'>Sie</span>\n <span class='ocrx_word' id='word_1_117' title='bbox 395 1649 601 1680; x_wconf 96'>vorbereitet.</span>\n </span>\n <span class='ocr_line' id='line_1_23' title="bbox 324 1735 2019 1776; baseline 0.002 -8; x_size 39; x_descenders 7; x_ascenders 8">\n <span class='ocrx_word' id='word_1_118' title='bbox 324 1737 481 1768; x_wconf 96'>Möchten</span>\n <span class='ocrx_word' id='word_1_119' title='bbox 497 1735 555 1767; x_wconf 96'>Sie</span>\n <span class='ocrx_word' id='word_1_120' title='bbox 568 1737 634 1767; x_wconf 93'>den</span>\n <span class='ocrx_word' id='word_1_121' title='bbox 651 1736 947 1774; x_wconf 92'>Rundfunkbeitrag</span>\n <span class='ocrx_word' id='word_1_122' title='bbox 960 1736 1089 1767; x_wconf 96'>einfach</span>\n <span class='ocrx_word' id='word_1_123' title='bbox 1103 1737 1168 1767; x_wconf 96'>und</span>\n <span class='ocrx_word' id='word_1_124' title='bbox 1182 1737 1326 1775; x_wconf 96'>bequem</span>\n <span class='ocrx_word' id='word_1_125' title='bbox 1341 1744 1399 1776; x_wconf 96'>per</span>\n <span class='ocrx_word' id='word_1_126' title='bbox 1412 1738 1595 1769; x_wconf 96'>Lastschrift</span>\n <span class='ocrx_word' 
id='word_1_127' title='bbox 1610 1738 1750 1770; x_wconf 95'>zahlen?</span>\n <span class='ocrx_word' id='word_1_128' title='bbox 1767 1740 1818 1771; x_wconf 95'>Mit</span>\n <span class='ocrx_word' id='word_1_129' title='bbox 1832 1741 1891 1771; x_wconf 93'>der</span>\n <span class='ocrx_word' id='word_1_130' title='bbox 1907 1741 2019 1772; x_wconf 91'>Erteilu</span>\n </span>\n <span class='ocr_line' id='line_1_24' title="bbox 483 1742 2285 1785; baseline -0.007 0; x_size 39.27866; x_descenders 8.2786608; x_ascenders 7">\n <span class='ocrx_word' id='word_1_131' title='bbox 483 1781 488 1785; x_wconf 50'>2</span>\n <span class='ocrx_word' id='word_1_132' title='bbox 855 1780 860 1784; x_wconf 76'>:</span>\n <span class='ocrx_word' id='word_1_133' title='bbox 1154 1780 1166 1785; x_wconf 2'>rn</span>\n <span class='ocrx_word' id='word_1_134' title='bbox 1255 1781 1324 1785; x_wconf 37'>ET</span>\n <span class='ocrx_word' id='word_1_135' title='bbox 1739 1765 1744 1769; x_wconf 21'>‘</span>\n <span class='ocrx_word' id='word_1_136' title='bbox 2025 1749 2044 1772; x_wconf 96'>ng</span>\n <span class='ocrx_word' id='word_1_137' title='bbox 2081 1743 2146 1773; x_wconf 93'>des</span>\n <span class='ocrx_word' id='word_1_138' title='bbox 2163 1742 2285 1774; x_wconf 91'>SEPA-</span>\n </span>\n <span class='ocr_line' id='line_1_25' title="bbox 323 1745 2249 1831; baseline 0.003 -19; x_size 40; x_descenders 9; x_ascenders 8">\n <span class='ocrx_word' id='word_1_139' title='bbox 323 1781 670 1829; x_wconf 91'>Lastschriftmandats</span>\n <span class='ocrx_word' id='word_1_140' title='bbox 684 1780 816 1811; x_wconf 96'>werden</span>\n <span class='ocrx_word' id='word_1_141' title='bbox 830 1780 884 1810; x_wconf 93'>die</span>\n <span class='ocrx_word' id='word_1_142' title='bbox 899 1780 1216 1819; x_wconf 91'>Rundfunkbeiträge</span>\n <span class='ocrx_word' id='word_1_143' title='bbox 1230 1781 1346 1819; x_wconf 96'>künftig</span>\n <span class='ocrx_word' 
id='word_1_144' title='bbox 1360 1788 1424 1812; x_wconf 96'>von</span>\n <span class='ocrx_word' id='word_1_145' title='bbox 1441 1782 1540 1812; x_wconf 95'>Ihrem</span>\n <span class='ocrx_word' id='word_1_146' title='bbox 1556 1783 1659 1814; x_wconf 96'>Konto</span>\n <span class='ocrx_word' id='word_1_147' title='bbox 1673 1783 1895 1823; x_wconf 96'>eingezogen.</span>\n <span class='ocrx_word' id='word_1_148' title='bbox 1912 1784 2026 1817; x_wconf 96'>Gerne</span>\n <span class='ocrx_word' id='word_1_149' title='bbox 2040 1745 2173 1831; x_wconf 25'>Können</span>\n <span class='ocrx_word' id='word_1_150' title='bbox 2191 1787 2249 1819; x_wconf 58'>sie</span>\n </span>\n <span class='ocr_line' id='line_1_26' title="bbox 323 1823 2251 1866; baseline 0.004 -11; x_size 40; x_descenders 9; x_ascenders 8">\n <span class='ocrx_word' id='word_1_151' title='bbox 323 1825 441 1856; x_wconf 96'>hierfür</span>\n <span class='ocrx_word' id='word_1_152' title='bbox 452 1824 518 1855; x_wconf 93'>das</span>\n <span class='ocrx_word' id='word_1_153' title='bbox 533 1823 724 1862; x_wconf 92'>beigefügte</span>\n <span class='ocrx_word' id='word_1_154' title='bbox 739 1823 900 1853; x_wconf 96'>Formular</span>\n <span class='ocrx_word' id='word_1_155' title='bbox 912 1823 1109 1854; x_wconf 96'>verwenden</span>\n <span class='ocrx_word' id='word_1_156' title='bbox 1123 1823 1203 1854; x_wconf 94'>oder</span>\n <span class='ocrx_word' id='word_1_157' title='bbox 1218 1823 1284 1854; x_wconf 96'>Ihre</span>\n <span class='ocrx_word' id='word_1_158' title='bbox 1300 1824 1403 1854; x_wconf 96'>Daten</span>\n <span class='ocrx_word' id='word_1_159' title='bbox 1418 1827 1510 1855; x_wconf 93'>unter</span>\n <span class='ocrx_word' id='word_1_160' title='bbox 1524 1826 1898 1866; x_wconf 92'>rundfunkbeitrag.de</span>\n <span class='ocrx_word' id='word_1_161' title='bbox 1912 1829 2022 1860; x_wconf 96'>online</span>\n <span class='ocrx_word' id='word_1_162' title='bbox 2036 
1830 2251 1862; x_wconf 96'>übermitteln.</span>\n </span>\n <span class='ocr_line' id='line_1_27' title="bbox 322 1910 766 1944; baseline -0.007 0; x_size 40.27866; x_descenders 8.2786608; x_ascenders 8">\n <span class='ocrx_word' id='word_1_163' title='bbox 322 1914 375 1944; x_wconf 96'>Mit</span>\n <span class='ocrx_word' id='word_1_164' title='bbox 388 1912 615 1944; x_wconf 95'>freundlichen</span>\n <span class='ocrx_word' id='word_1_165' title='bbox 629 1910 766 1942; x_wconf 96'>Grüßen</span>\n </span>\n <span class='ocr_line' id='line_1_28' title="bbox 323 1998 1351 2039; baseline -0.003 -6; x_size 39; x_descenders 8; x_ascenders 7">\n <span class='ocrx_word' id='word_1_166' title='bbox 323 2002 368 2033; x_wconf 93'>Ihr</span>\n <span class='ocrx_word' id='word_1_167' title='bbox 383 1999 665 2039; x_wconf 91'>Beitragsservice</span>\n <span class='ocrx_word' id='word_1_168' title='bbox 677 2005 742 2030; x_wconf 96'>von</span>\n <span class='ocrx_word' id='word_1_169' title='bbox 755 1998 850 2033; x_wconf 95'>ARD,</span>\n <span class='ocrx_word' id='word_1_170' title='bbox 865 1998 943 2029; x_wconf 96'>ZDF</span>\n <span class='ocrx_word' id='word_1_171' title='bbox 958 1999 1022 2030; x_wconf 96'>und</span>\n <span class='ocrx_word' id='word_1_172' title='bbox 1039 1998 1351 2031; x_wconf 96'>Deutschlandradio</span>\n </span>\n <span class='ocr_line' id='line_1_29' title="bbox 320 2138 985 2186; baseline -0.009 -11; x_size 44; x_descenders 12; x_ascenders 8">\n <span class='ocrx_word' id='word_1_173' title='bbox 320 2142 367 2175; x_wconf 96'>So</span>\n <span class='ocrx_word' id='word_1_174' title='bbox 382 2141 558 2174; x_wconf 96'>errechnet</span>\n <span class='ocrx_word' id='word_1_175' title='bbox 573 2139 643 2186; x_wconf 96'>sich</span>\n <span class='ocrx_word' id='word_1_176' title='bbox 656 2139 716 2178; x_wconf 96'>der</span>\n <span class='ocrx_word' id='word_1_177' title='bbox 728 2138 985 2182; x_wconf 96'>Gesamtbetrag</span>\n 
</span>\n <span class='ocr_line' id='line_1_30' title="bbox 322 2184 2113 2231; baseline 0 -13; x_size 37; x_descenders 7; x_ascenders 12">\n <span class='ocrx_word' id='word_1_178' title='bbox 322 2185 498 2230; x_wconf 67'>"Buchungen:</span>\n <span class='ocrx_word' id='word_1_179' title='bbox 648 2184 978 2222; x_wconf 22'>a,</span>\n <span class='ocrx_word' id='word_1_180' title='bbox 1063 2204 1072 2207; x_wconf 24'>2</span>\n <span class='ocrx_word' id='word_1_181' title='bbox 1153 2209 1155 2211; x_wconf 11'>'</span>\n <span class='ocrx_word' id='word_1_182' title='bbox 1647 2186 1677 2231; x_wconf 77'>it</span>\n <span class='ocrx_word' id='word_1_183' title='bbox 2100 2210 2113 2223; x_wconf 49'>;</span>\n </span>\n <span class='ocr_line' id='line_1_31' title="bbox 511 2197 2284 2271; baseline -0.006 -10; x_size 50.623604; x_descenders 8.6236048; x_ascenders 17">\n <span class='ocrx_word' id='word_1_184' title='bbox 511 2236 547 2261; x_wconf 96'>Ihr</span>\n <span class='ocrx_word' id='word_1_185' title='bbox 560 2234 724 2260; x_wconf 94'>Kontostand</span>\n <span class='ocrx_word' id='word_1_186' title='bbox 735 2241 778 2259; x_wconf 96'>am</span>\n <span class='ocrx_word' id='word_1_187' title='bbox 789 2235 952 2260; x_wconf 95'>22.03.2018</span>\n <span class='ocrx_word' id='word_1_188' title='bbox 1679 2203 1793 2224; x_wconf 16'>es</span>\n <span class='ocrx_word' id='word_1_189' title='bbox 1940 2197 2097 2266; x_wconf 0'>ANNERDDNL</span>\n <span class='ocrx_word' id='word_1_190' title='bbox 2155 2204 2284 2271; x_wconf 25'>en</span>\n </span>\n <span class='ocr_line' id='line_1_32' title="bbox 318 2260 2302 2319; baseline -0.007 -7; x_size 46.27866; x_descenders 8.2786608; x_ascenders 14">\n <span class='ocrx_word' id='word_1_191' title='bbox 318 2285 447 2312; x_wconf 93'>04.04.18</span>\n <span class='ocrx_word' id='word_1_192' title='bbox 509 2281 761 2313; x_wconf 82'>Zahlungseingang</span>\n <span class='ocrx_word' id='word_1_193' 
title='bbox 772 2288 832 2307; x_wconf 96'>vom</span>\n <span class='ocrx_word' id='word_1_194' title='bbox 843 2283 1005 2307; x_wconf 96'>03.04.2018</span>\n <span class='ocrx_word' id='word_1_195' title='bbox 2211 2260 2302 2319; x_wconf 31'>E35</span>\n </span>\n <span class='ocr_line' id='line_1_33' title="bbox 318 2316 2303 2367; baseline -0.007 -7; x_size 42; x_descenders 11; x_ascenders 12">\n <span class='ocrx_word' id='word_1_196' title='bbox 318 2333 447 2360; x_wconf 83'>26.06.18</span>\n <span class='ocrx_word' id='word_1_197' title='bbox 511 2329 768 2360; x_wconf 60'>Rundfunkbeiträge</span>\n <span class='ocrx_word' id='word_1_198' title='bbox 779 2329 817 2354; x_wconf 96'>für</span>\n <span class='ocrx_word' id='word_1_199' title='bbox 830 2330 838 2355; x_wconf 95'>1</span>\n <span class='ocrx_word' id='word_1_200' title='bbox 854 2329 992 2362; x_wconf 96'>Wohnung</span>\n <span class='ocrx_word' id='word_1_201' title='bbox 1557 2336 1675 2361; x_wconf 93'>06.2018</span>\n <span class='ocrx_word' id='word_1_202' title='bbox 1688 2350 1696 2353; x_wconf 85'>-</span>\n <span class='ocrx_word' id='word_1_203' title='bbox 1706 2337 1824 2362; x_wconf 85'>08.2018</span>\n <span class='ocrx_word' id='word_1_204' title='bbox 2212 2316 2303 2367; x_wconf 59'>52.50</span>\n </span>\n <span class='ocr_line' id='line_1_34' title="bbox 1580 2364 2301 2422; baseline -0.003 -15; x_size 34; x_descenders 9; x_ascenders 6">\n <span class='ocrx_word' id='word_1_205' title='bbox 1580 2378 1799 2422; x_wconf 94'>Gesamtbetrag</span>\n <span class='ocrx_word' id='word_1_206' title='bbox 2209 2396 2254 2420; x_wconf 68'>52</span>\n <span class='ocrx_word' id='word_1_207' title='bbox 2261 2364 2301 2420; x_wconf 58'>=</span>\n </span>\n <span class='ocr_line' id='line_1_35' title="bbox 499 3466 2459 3486; baseline -0.009 5; x_size 29; x_descenders 8; x_ascenders 5">\n <span class='ocrx_word' id='word_1_208' title='bbox 884 3458 1132 3486; x_wconf 26'>EEE</span>\n <span 
class='ocrx_word' id='word_1_209' title='bbox 1843 3458 1886 3486; x_wconf 46'>nn</span>\n <span class='ocrx_word' id='word_1_210' title='bbox 1913 3466 2430 3486; x_wconf 38'>[nn</span>\n <span class='ocrx_word' id='word_1_211' title='bbox 2457 3475 2459 3477; x_wconf 25'>|</span>\n </span>\n </p>\n </div>\n</div>\n`;
const HOCR_HAS_WORDS_REGEX = new RegExp(/<span[^>]*?ocrx_word[^>]*?>.*?<\/span>/, "g");
const HOCR_EMPTY_WORD_REGEX = new RegExp(/>\s*?<\/span>/);
let hasRecognizedText = undefined;
if (hocr) {
  hasRecognizedText = false;
  const matches = hocr.matchAll(HOCR_HAS_WORDS_REGEX);
  let match = matches.next();
  while (!match.done) {
    // if NON-EMPTY word found, terminate the iteration
    if (!HOCR_EMPTY_WORD_REGEX.test(match.value[0])) {
      hasRecognizedText = true;
      break;
    }
    match = matches.next();
  }
}
console.timeEnd("test");
console.log(hasRecognizedText);
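The regex check can also be read as a small reusable function. The following is only a minimal sketch, assuming the same two patterns as the test case; the function name `hasRecognizedText`, the constant names, and the sample inputs are hypothetical and not part of the benchmark.

```javascript
// Hypothetical helper (not from the benchmark): same patterns as the test case.
// WORD_SPAN finds every ocrx_word span that sits on a single line.
const WORD_SPAN = /<span[^>]*?ocrx_word[^>]*?>.*?<\/span>/g;
// EMPTY_WORD detects a span whose content is empty or whitespace only.
const EMPTY_WORD = />\s*?<\/span>/;

function hasRecognizedText(hocr) {
  // Mirror the benchmark: no input means "unknown" rather than false.
  if (!hocr) return undefined;
  for (const match of hocr.matchAll(WORD_SPAN)) {
    // Stop at the first word span that carries non-whitespace content.
    if (!EMPTY_WORD.test(match[0])) return true;
  }
  return false;
}

// Illustrative inputs (made up for this sketch).
const withWord =
  "<span class='ocrx_word' id='w1' title='bbox 0 0 1 1; x_wconf 90'>Post</span>";
const blankWord =
  "<span class='ocrx_word' id='w1' title='bbox 0 0 1 1; x_wconf 0'> </span>";

console.log(hasRecognizedText(withWord));
console.log(hasRecognizedText(blankWord));
console.log(hasRecognizedText(""));
```

Using `for...of` over `matchAll` replaces the manual `next()` loop without changing the early-exit behavior. Because `.` does not match line breaks, a word span split across lines would be skipped by this pattern.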
Test cases: DOMParser, Regex
Fastest: N/A
Slowest: N/A
Latest run results: none. This benchmark does not have any results yet.
Related benchmarks:
Split vs regexp
Regex vs split/join checking alphanumeric big number
Regex vs split/join checking
Regex vs Split Time
Regex vs Split and pop node name