[Feature Request]: support isAlpha et al. on codepoints #27248

Open
@vasslitvinov

Summary of Feature

Description:

I want to be able to invoke the predicates isAlpha(), isDigit(), and so on, within a loop over a string's codepoints.

Currently these predicates are available only on whole strings, and each one includes an on-statement. To use them within my loop, I would have to wrap each codepoint in a string and pay for an unnecessary on-statement inside isDigit(), which is a prohibitive amount of overhead.
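For illustration, the workaround described above might look roughly like the sketch below. The name allDigitsViaStrings is hypothetical. The sketch assumes the String module's codepointToString() and the existing whole-string isDigit() predicate, and it is meant only to show where the per-codepoint cost comes from.

```chapel
// Hypothetical helper showing today's workaround, not a recommended pattern.
proc allDigitsViaStrings(s: string): bool {
  for cp in s.codepoints() {
    // Each iteration builds a temporary one-character string ...
    const oneChar = codepointToString(cp);
    // ... and then calls the whole-string predicate on it.
    if !oneChar.isDigit() then
      return false;
  }
  // Note: an empty input falls through to here and yields true.
  return true;
}

writeln(allDigitsViaStrings("2024"));
writeln(allDigitsViaStrings("20x4"));
```

A predicate that accepts the codepoint directly would avoid both the temporary string and the on-statement.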

Is this issue currently blocking your progress?

No.

Code Sample

Consider the implementation of Arkouda's Strings.isdecimal().

It checks whether each (Unicode) character of myString either satisfies isDigit() or is a numeric subscript or superscript.

Currently the implementation does expensive computations if myString.isDigit() fails. Instead, I would like it to simply do:

for cp in myString.codepoints() do
  if ! isCodepointDigit(cp) &&
     // whatever other checks I need to do
     ! isNumericSubSuperScript(cp) then
    return false;
return true;
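To make the intent concrete, here is a minimal self-contained sketch of that loop. Both helpers are assumptions written for this example: isCodepointDigit is an ASCII-only stand-in for the requested library predicate (a real one would presumably cover all Unicode digits), and isNumericSubSuperScript checks only the superscript and subscript digits U+00B2, U+00B3, U+00B9, U+2070, U+2074..U+2079, and U+2080..U+2089. The wrapper name isDecimalSketch is also hypothetical.

```chapel
// ASCII-only stand-in for the requested predicate (assumption, not a Chapel API).
proc isCodepointDigit(cp: int(32)): bool {
  return cp >= 0x30 && cp <= 0x39;
}

// Hypothetical check for superscript and subscript digits.
proc isNumericSubSuperScript(cp: int(32)): bool {
  return cp == 0x00B2 || cp == 0x00B3 || cp == 0x00B9 ||  // superscript 2, 3, 1
         cp == 0x2070 ||                                  // superscript 0
         (cp >= 0x2074 && cp <= 0x2079) ||                // superscripts 4..9
         (cp >= 0x2080 && cp <= 0x2089);                  // subscripts 0..9
}

// The loop from the request, with no temporary strings and no on-statement.
proc isDecimalSketch(myString: string): bool {
  for cp in myString.codepoints() do
    if !isCodepointDigit(cp) &&
       !isNumericSubSuperScript(cp) then
      return false;
  return true;
}

writeln(isDecimalSketch("123"));
writeln(isDecimalSketch("12a"));
```

Because each check is a handful of integer comparisons on the codepoint, the loop exits at the first failing character without allocating anything.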

Ideally, if myString is long enough, I would like it to be a forall loop (see #19112) with a eureka exit (#12700).

Also, if the config param useCachedNumCodepoints is true, I would like to check whether myString is an ASCII string and, if so, run much simpler and more efficient code. Currently I am reluctant to use this route because string.isASCII() is O(string size) if this param is false, and I should not be checking this param from Arkouda because it is undocumented / unstable.
