Add String#byteindex, String#byterindex, and MatchData#byteoffset #5518

Merged
merged 4 commits into from Feb 19, 2022

@shugo shugo commented Feb 2, 2022

See https://bugs.ruby-lang.org/issues/13110 for details.
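
For readers who have not followed the linked issue, here is a minimal sketch of how the three methods named in the title differ from their character-based counterparts. It assumes a Ruby build that already includes this change (Ruby 3.2 or later) and a UTF-8 source file; the sample string is an illustration and does not come from the pull request.

```ruby
# Sample string (an assumption for illustration): 4 characters, 12 bytes in UTF-8.
s = "あいうあ"

# Guard so the sketch is a no-op on Ruby builds without these methods.
if s.respond_to?(:byteindex)
  p s.index("う")        # => 2  (character offset)
  p s.byteindex("う")    # => 6  (byte offset)

  p s.rindex("あ")       # => 3  (character offset of the last match)
  p s.byterindex("あ")   # => 9  (byte offset of the last match)

  m = /う/.match(s)
  p m.offset(0)          # => [2, 3]  (character offsets)
  p m.byteoffset(0)      # => [6, 9]  (byte offsets)
end
```

Each of the characters above occupies three bytes in UTF-8, which is why every byte offset is three times the corresponding character offset.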

@shugo shugo marked this pull request as ready for review Feb 2, 2022
@shugo shugo force-pushed the byteindex branch 3 times, most recently from afccf12 to 5bdc612 Feb 2, 2022
string.c (outdated review thread, resolved)
@shugo shugo force-pushed the byteindex branch 2 times, most recently from 339950f to cf85543 Feb 7, 2022
re.c (outdated review thread, resolved)

@nurse nurse commented Feb 18, 2022

The example implementation for checking byte position is as follows:

diff --git a/string.c b/string.c
index 31c3f11045..4554699861 100644
--- a/string.c
+++ b/string.c
@@ -3979,6 +3979,20 @@ rb_str_index_m(int argc, VALUE *argv, VALUE str)
     return LONG2NUM(pos);
 }

+/* Returns whether the given byte position pos falls on a character boundary.
+ * Note that in this function, "character" means a code point
+ * (Unicode scalar value), not a grapheme cluster.
+ */
+bool
+str_check_byte_pos(VALUE str, long pos)
+{
+    const char *s = RSTRING_PTR(str);
+    const char *e = RSTRING_END(str);
+    const char *p = s + pos;
+    const char *pp = rb_enc_left_char_head(s, p, e, rb_enc_get(str));
+    return p == pp;
+}
+
 /*
  *  call-seq:
  *    byteindex(substring, offset = 0) -> integer or nil
@@ -4040,6 +4054,10 @@ rb_str_byteindex_m(int argc, VALUE *argv, VALUE str)
         }
     }

+    if (!str_check_byte_pos(str, pos)) {
+        rb_raise(rb_eArgError, "invalid pos");
+    }
+
     if (RB_TYPE_P(sub, T_REGEXP)) {
         if (pos > RSTRING_LEN(str))
             return Qnil;
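
With this patch applied, an offset that falls in the middle of a multibyte character raises an error:

./ruby -e'p "あいう".byteindex("う", 4)'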
-e:1:in `byteindex': invalid pos (ArgumentError)
	from -e:1:in `<main>'
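
The patch above delegates the boundary test to rb_enc_left_char_head, using the string's own encoding. For UTF-8 only, the same idea can be sketched in plain Ruby. `utf8_char_boundary?` is a hypothetical helper, not part of Ruby or of this pull request. It assumes valid UTF-8 input, and treating the end of the string as a boundary is an assumption of this sketch.

```ruby
# Hypothetical helper (not part of Ruby): reports whether byte offset `pos`
# is the start of a character in a UTF-8 string. Assumes valid UTF-8 input.
def utf8_char_boundary?(str, pos)
  return false if pos < 0 || pos > str.bytesize
  # Assumption of this sketch: the end of the string counts as a boundary.
  return true if pos == str.bytesize
  # UTF-8 continuation bytes have the bit pattern 10xxxxxx; any other byte
  # starts a new character.
  (str.getbyte(pos) & 0xC0) != 0x80
end

s = "あいう" # each character is 3 bytes in UTF-8
p (0..s.bytesize).select { |pos| utf8_char_boundary?(s, pos) } # => [0, 3, 6, 9]
p utf8_char_boundary?(s, 4)                                     # => false
```

Offset 4 lies inside the second character (bytes 3 to 5), which is the case the command above rejects.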

@shugo shugo commented Feb 19, 2022

> The example implementation for checking byte position is as follows:

Thank you. I've added the boundary check.

nurse approved these changes Feb 19, 2022
@nurse nurse left a comment

LGTM. Could you add NEWS?

@shugo shugo merged commit c8817d6 into ruby:master Feb 19, 2022
0 of 2 checks passed