Strings and Cursors

Argentum strings are sequences of Unicode code points (encoded internally as UTF-8).

Argentum strings are immutable and not indexable.
The only way to access a string's characters is to read them one by one, from start to end.

The Argentum String class has several features that set it apart from other Argentum classes:

  • sys_String is strictly immutable. You cannot modify the characters of a string.
  • There is no reason to hold a mutable or pin reference to a sys_String, so all references to strings are shared-immutable. For convenience, str is an alias for *sys_String, a shared reference to an immutable string.
  • To access the characters of a string, you acquire a sys_Cursor object. It holds the current position in the string and extracts code points while advancing that position from start to end, as in an input stream. The cursor is mutable, but the underlying string is not.
  • String class cannot be inherited by other classes.
  • String class cannot be extended with new fields.

s = "Hello";   // s is of type `str` aka `*sys_String` - a shared pointer to characters

// Acquire a cursor
c = s.cursor();  // c is a cursor pointing to the first character of the string

// Read characters one by one
c.getCh()  // Returns a Unicode code-point of character `H`
c.getCh()  // Returns a Unicode code-point of character `e`

// Make a copy of cursor
c1 = @c;   // Creates a separate cursor pointing at `l`

// Skip code-points in a loop
loop c.getCh() == 'o';   // skip all characters up to and including 'o'

// Attempts to read after end of string
c.getCh()  // Returns 0 as the end-of-string indicator
c.getCh()  // This and all subsequent calls will return 0

// Cursor assignments and resets
c.set(s);  // Resets the cursor to the beginning of string "Hello"
c := c1;    // Makes `c` and `c1` reference the same cursor pointing at 'l'
c := @c1;   // Makes `c` a distinct cursor pointing to 'l'
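
The cursor behavior above can also be modeled outside Argentum. The following is a minimal Python sketch, not the Argentum runtime: the class `Cursor` and its methods `get_ch`, `set`, and `copy` are hypothetical stand-ins for `sys_Cursor`, `getCh`, `set`, and the `@` copy operator. It assumes the cursor walks the UTF-8 bytes of an immutable string and returns 0 once the end is reached.

```python
class Cursor:
    """Hypothetical model of a forward-only cursor over an immutable UTF-8 string."""

    def __init__(self, text: str, pos: int = 0):
        self._data = text.encode("utf-8")  # immutable bytes of the underlying string
        self._pos = pos                    # byte offset of the next code point

    def get_ch(self) -> int:
        """Return the next code point and advance, or 0 at the end of the string."""
        if self._pos >= len(self._data):
            return 0
        first = self._data[self._pos]
        # The leading byte determines the length of the UTF-8 sequence.
        if first < 0x80:
            size = 1
        elif first < 0xE0:
            size = 2
        elif first < 0xF0:
            size = 3
        else:
            size = 4
        chunk = self._data[self._pos:self._pos + size]
        self._pos += size
        return ord(chunk.decode("utf-8"))

    def set(self, text: str) -> None:
        """Point the cursor at the beginning of `text`."""
        self._data = text.encode("utf-8")
        self._pos = 0

    def copy(self) -> "Cursor":
        """Return an independent cursor at the same position."""
        other = Cursor("")
        other._data = self._data
        other._pos = self._pos
        return other


c = Cursor("Hello")
h = c.get_ch()          # code point of 'H'
e = c.get_ch()          # code point of 'e'
c1 = c.copy()           # independent cursor, now at the first 'l'
while c.get_ch() not in (ord("o"), 0):
    pass                # skip up to and including 'o', or stop at the end
end = c.get_ch()        # 0: the string is exhausted
```

Only the cursor carries mutable state (the position); the bytes it reads are never changed. Plain assignment such as `alias = c1` shares one cursor between two names, while `c1.copy()` yields a separate position, mirroring the difference between `c := c1` and `c := @c1`.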

Both the String and Cursor classes can be extended with methods that perform parsing. For example, the string.ag module adds the following methods:

String.tokenize(char) // splits the string by char and returns a SharedArray(String)
Cursor.getTill(char)  // extracts a substring up to the given char
Cursor.peekCh()       // returns the next code point without removing it from the stream
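
To illustrate how such helpers can be layered on top of forward-only reads, here is a rough Python sketch. It is not the code of string.ag: the names `Cursor`, `peek_ch`, `get_till`, and `tokenize` are hypothetical, and the details (for instance, that `get_till` consumes the delimiter and that a trailing empty token is dropped) are assumptions made for this example.

```python
class Cursor:
    """Hypothetical stand-in for sys_Cursor: forward-only reads, 0 at the end."""

    def __init__(self, text, pos=0):
        self.text, self.pos = text, pos

    def get_ch(self):
        if self.pos >= len(self.text):
            return 0
        ch = ord(self.text[self.pos])
        self.pos += 1
        return ch


def peek_ch(cursor):
    """Return the next code point without consuming it, by reading from a copy."""
    return Cursor(cursor.text, cursor.pos).get_ch()


def get_till(cursor, delimiter):
    """Collect characters up to `delimiter` (assumed here to be consumed as well)."""
    out = []
    while True:
        ch = cursor.get_ch()
        if ch == 0 or ch == ord(delimiter):
            return "".join(out)
        out.append(chr(ch))


def tokenize(text, delimiter):
    """Split `text` by `delimiter` using only cursor reads."""
    cursor, tokens = Cursor(text), []
    while peek_ch(cursor) != 0:
        tokens.append(get_till(cursor, delimiter))
    return tokens


tokens = tokenize("a,b,,c", ",")
```

Note that `peek_ch` needs no extra state in the cursor: copying the cursor and reading from the copy leaves the original position untouched.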

This set of helper functions covers the needs of the tests, examples, and demos, and can easily be extended as needed.

For completeness: the opposite task - string synthesis - is handled by two other runtime library classes:

  • sys_Blob - a generic variable-size byte array that also allows 8-, 16-, 32-, and 64-bit integer access and supports manipulation of UTF-8 encoded characters (runes). It can produce strings out of byte ranges.
  • sys_StrBuilder - a descendant of sys_Blob that exposes put* methods for different data types. Together with string interpolation, these methods let you build strings and format user-defined data types using lightweight syntax.
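
The relationship between the two classes might be pictured with the following Python sketch. It is only a model under stated assumptions, not the Argentum API: the classes `Blob` and `StrBuilder` and the methods `put_u8`, `put_rune`, `make_str`, `put_str`, `put_int`, and `to_str` are hypothetical names chosen for this example.

```python
class Blob:
    """Hypothetical model of a growable byte array."""

    def __init__(self):
        self.data = bytearray()

    def put_u8(self, value: int) -> None:
        # Append a single byte (8-bit integer access).
        self.data.append(value & 0xFF)

    def put_rune(self, code_point: int) -> None:
        # Append one Unicode code point encoded as UTF-8.
        self.data += chr(code_point).encode("utf-8")

    def make_str(self, start: int, end: int) -> str:
        # Produce an immutable string out of a byte range.
        return bytes(self.data[start:end]).decode("utf-8")


class StrBuilder(Blob):
    """Hypothetical builder: typed put* methods on top of the byte array."""

    def put_str(self, text: str) -> "StrBuilder":
        self.data += text.encode("utf-8")
        return self

    def put_int(self, value: int) -> "StrBuilder":
        return self.put_str(str(value))

    def to_str(self) -> str:
        return self.make_str(0, len(self.data))


b = StrBuilder()
b.put_str("x = ").put_int(42)
result = b.to_str()
```

The mutable buffer exists only while the text is being assembled; the finished string is a separate immutable value, which fits the rule that strings themselves are never modified.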
