Project author: tc39

Project description:
ECMAScript RegExp Match Indices
Primary language: HTML
Project address: git://github.com/tc39/proposal-regexp-match-indices.git
Created: 2018-05-02T20:16:27Z
Project community: https://github.com/tc39/proposal-regexp-match-indices

License: BSD 3-Clause "New" or "Revised" License


RegExp Match Indices for ECMAScript

ECMAScript RegExp Match Indices provide additional information about the start and end
indices of captured substrings relative to the start of the input string.

A polyfill can be found in the regexp-match-indices package on NPM.

NOTE: This proposal was previously named “RegExp Match Array Offsets”, but has been renamed
to more accurately represent the current status of the proposal.

Status

Stage: 4
Champion: Ron Buckton (@rbuckton)

For detailed status of this proposal see TODO, below.

Authors

Motivations

Today, ECMAScript RegExp objects can provide information about a match when calling the exec
method. This result is an Array containing information about the substrings that were matched,
along with additional properties to indicate the input string, the index in the input at which
the match was found, as well as a groups object containing the substrings for any named capture
groups.

However, there are several more advanced scenarios where this information is not
sufficient. For example, an ECMAScript implementation of TextMate Language syntax highlighting
needs not just the index of the match, but also the start and end indices of individual
capture groups.

As such, we propose the adoption of an additional indices property on the array result (the
substrings array) of the RegExpBuiltInExec abstract operation (and thus the result from
RegExp.prototype.exec, String.prototype.match, etc.). This property would itself be an indices array
containing a pair of start and end indices for each captured substring. Any unmatched capture
groups would be undefined, similar to their corresponding element in the substrings array.
In addition, the indices array would itself have a groups property containing the start and end
indices for each named capture group.

NOTE: For performance reasons, indices will only be added to the result if the d flag is specified.
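The effect of the flag can be seen in a small sketch. The variable names here are illustrative only, and the hasIndices accessor is taken from the finalized specification text rather than from the description above, so treat its use as an assumption about the runtime (an ES2022-capable engine is needed):

```javascript
const input = "xaaaz";

// Same pattern, with and without the d flag.
const withoutFlag = /a+(z)?/.exec(input);
const withFlag = /a+(z)?/d.exec(input);

// Without d, the match result carries no indices property at all.
console.log(withoutFlag.indices); // undefined

// With d, each entry is a [start, end] pair relative to the input string.
console.log(withFlag.indices[0]); // [ 1, 5 ]
console.log(withFlag.indices[1]); // [ 4, 5 ]

// The flag is reflected on the RegExp object itself.
const re = /a+/d;
console.log(re.hasIndices); // true
console.log(re.flags);      // "d"
```

Checking hasIndices before reading indices is one way for a library to fail early when a caller passes a pattern that was created without the flag.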

Why Use d For the RegExp Flag

We chose d due to its presence in the word indices, which is the basis for the naming of the feature (i.e.,
lastIndex on a RegExp, index on a match, etc.). The character i is already in use for ignore-case, and n has
precedent in other engines for handling capturing vs. non-capturing groups. This is similar to the “sticky” flag
using the y character, since s was used for dot-all.

Why not use o and offsets instead of d and indices? Our goal is to align the name of the property
with the existing nomenclature on RegExp (i.e., lastIndex and index).

Does d have a different meaning in other engines? Yes and no. For the few engines that do have a d flag
(Onigmo, Perl, and java.util.regex), the meanings differ. Onigmo and Perl both use the d flag for
backwards compatibility (and Perl’s documentation seems strongly worded towards discouraging its use), while
java.util.regex uses d for the treatment of new-line handling. You can find a full list of the flags supported
by 46 different RegExp engines in flags_comparison.md.

Prior Art

Examples

  const re1 = /a+(?<Z>z)?/d;

  // indices are relative to start of the input string:
  const s1 = "xaaaz";
  const m1 = re1.exec(s1);
  m1.indices[0][0] === 1;
  m1.indices[0][1] === 5;
  s1.slice(...m1.indices[0]) === "aaaz";

  m1.indices[1][0] === 4;
  m1.indices[1][1] === 5;
  s1.slice(...m1.indices[1]) === "z";

  m1.indices.groups["Z"][0] === 4;
  m1.indices.groups["Z"][1] === 5;
  s1.slice(...m1.indices.groups["Z"]) === "z";

  // capture groups that are not matched return `undefined`:
  const m2 = re1.exec("xaaay");
  m2.indices[1] === undefined;
  m2.indices.groups["Z"] === undefined;
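As a rough illustration of the syntax-highlighting scenario from the Motivations section, the following sketch collects the range of every named capture group across all matches. The helper name collectGroupRanges, the sample pattern, and the shape of the returned objects are hypothetical and not part of the proposal; the sketch also assumes the pattern carries both the d and g flags, since String.prototype.matchAll requires g:

```javascript
// Hypothetical helper: returns one { name, start, end, text } record for
// every named capture group that participated in a match.
// Assumes `re` was created with both the d and g flags.
function collectGroupRanges(re, input) {
  const ranges = [];
  for (const match of input.matchAll(re)) {
    // indices.groups is undefined when the pattern has no named groups.
    const groups = match.indices.groups ?? {};
    for (const [name, range] of Object.entries(groups)) {
      if (range === undefined) continue; // group did not participate
      const [start, end] = range;
      ranges.push({ name, start, end, text: input.slice(start, end) });
    }
  }
  return ranges;
}

const source = "let x = 1; const y = 2;";
const ranges = collectGroupRanges(/(?<keyword>let|const)\s+(?<name>\w+)/dg, source);

for (const { name, start, end, text } of ranges) {
  console.log(`${name} [${start}, ${end}) ${JSON.stringify(text)}`);
}
```

A highlighter could map each group name to a style and apply it to the reported range, without having to search the matched text for each captured substring.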

TODO

The following is a high-level list of tasks to progress through each stage of the TC39 proposal process:

Stage 1 Entrance Criteria

  • Identified a “champion” who will advance the addition.
  • Prose outlining the problem or need and the general shape of a solution.
  • Illustrative examples of usage.
  • High-level API.

Stage 2 Entrance Criteria

Stage 3 Entrance Criteria

Stage 4 Entrance Criteria

  • Test262 acceptance tests have been written for mainline usage scenarios and merged.
  • Two compatible implementations which pass the acceptance tests:
    • V8 (tracking bug) — Shipping in Chrome Canary 91 (V8 v9.0.259)
    • SpiderMonkey (tracking bug) — Shipping in Firefox Nightly 88
    • JavaScriptCore (tracking bug) — Shipping in Safari Technology Preview 122
    • Engine262 (PR#1, PR#2)
  • A pull request has been sent to tc39/ecma262 with the integrated spec text.
  • The ECMAScript editor has signed off on the pull request.