How Argentum programming language handles object hierarchies

Practical example, comparison with popular languages, operation semantics, multithreading features, internal structure

Original link: reddit

To avoid excessive theorizing, let's examine the reference model of the language using a practical example of a desktop application's data model. We'll design a card set editor, which will consist of cards containing text blocks, images, auto-routing connector lines, and buttons that allow navigating between the cards.

The appearance of our cards.
Class structure of our test application.
Here we express composition relations by directly including child objects in the owning object: Cards[], Rectangle, Point[]...
For aggregation and association, we use the familiar C++ analogs: shared and weak pointers, respectively.

As we can see, there are composite, associative, and aggregative relationships here:

  • Cards belong to the document.
  • Elements belong to cards.
  • Bitmaps belong to graphical blocks.
  • Connectors link to arbitrary blocks through anchor points.
  • Cards are connected to other cards through Button objects.
  • Text blocks share styles.
An example of the object hierarchy. Document has two cards. The first card contains a text block and a button that links to the second card. The second card contains a text block, an image and a connector between them. Both text blocks use the same style.

The object hierarchy is built from three types of relationships:

  • Association
  • Composition
  • Aggregation

Let's break them down one by one.

Composition: A owns B

Most modern applications are built on data structures with tree-like ownership:

  • HTML/XML/JSON DOM.
  • Structures of relational and document-oriented databases.
  • Intermediate representations in compilers.
  • Scenes in 3D applications.
  • User interface forms.

All of these are what UML calls composition - ownership with a single owner.

Our example is no exception. Our document owns cards, cards own elements, and elements own anchor points. These are all composition relations.

Composition invariants:

  • An object lives as long as its owner exists, and the owner references it.
  • An object can have exactly one owner.
  • An object cannot be a direct or indirect owner of itself.

It would be very useful if modern programming languages allowed explicitly declaring such ownership relationships and checked composition invariants at compile time.

Composition support in C++/Rust/Swift

Why not in Java/JS/Kotlin/Scala? In languages with garbage collection, all references are multiple ownership references. That means they are aggregations, not compositions. So, let's look for built-in composition support in languages without garbage collection.

For example, let's try to design the data structure of the above-described application in C++.

First, let's try including objects directly into the structure where they are used. This won't work for three reasons:

Polymorphism: Cards can contain blocks of different types, so a card cannot embed them by value without knowing their concrete types.

Non-owning associative references: our application model contains them:

  • links from connectors to anchor points,
  • and links from buttons to cards.

If we represent them as weak_ptr, the natural representation of associations in C++, their targets must be standalone objects managed by a shared_ptr rather than fields embedded in other objects, because a weak_ptr can only be obtained from a shared_ptr.

Time-of-life issue: whenever we delete a CardItem object, some function frame on the current thread's stack may still hold a reference to it (as a local variable, parameter, or temporary value). When control returns to that function, it may dereference a pointer into an already deleted data structure.

Let's consider the example:

  • We have Card1, which contains TextBox1.
  • Somewhere in the editor, there is a reference to the current focused object.
  • The user presses the delete button.
  • The handler in the currently focused object asks the scene to delete all selected objects.
  • Among them there is our TextBox1.
  • Deleting it leaves dangling pointers to TextBox1 in stack frames that are still active, which is a form of the time-of-life issue.
Time-of-life issue demonstration

These three considerations mentioned above - polymorphism, incompatibility with weak_ptr, and the time-of-life issue - prevent us from implementing composition by directly including objects into other objects. We need a composite pointer.

C++ provides unique_ptr for single ownership. This pointer solves the polymorphism problem but solves neither the time-of-life issue nor the weak_ptr problem mentioned earlier: an object held by a unique_ptr cannot be a weak_ptr target. There is a logical explanation for this: a weak_ptr can lose its target at any time, so when it is dereferenced, the object needs to be protected from deletion. In C++, dereferencing a weak_ptr (via lock()) produces a temporary shared_ptr. However, shared_ptr is incompatible with unique_ptr for an obvious reason: the former deletes the object when the reference count reaches zero, while the latter deletes it as soon as its single owner lets go, regardless of any other references. As a result, neither weak_ptr nor shared_ptr can be used with unique_ptr on the same object.

This leaves us storing our uniquely owning composite references as shared_ptr, which actually implements aggregation, not composition. We have to acknowledge that composition, as a concept, is not supported in standard C++.
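To make this workaround concrete, here is a minimal C++ sketch; the `Card`, `CardItem` and `handleDelete` names are hypothetical. Items are stored as shared_ptr only so that a weak_ptr can observe them, and locking that weak_ptr creates a temporary shared_ptr on the stack. That temporary is what keeps the item alive when its owner erases it in the middle of a handler:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, heavily simplified model types.
struct CardItem {
    std::string text;
};

struct Card {
    // Meant as composition, but stored as shared_ptr because a weak_ptr
    // (our association) can only observe objects managed by a shared_ptr.
    std::vector<std::shared_ptr<CardItem>> items;
};

// A "delete selected" handler that keeps using the focused item afterwards.
// Returns true if the item survived until the end of the handler.
bool handleDelete(Card& card, const std::weak_ptr<CardItem>& focused) {
    // lock() creates a temporary shared_ptr on the stack...
    std::shared_ptr<CardItem> guard = focused.lock();
    if (!guard) return false;

    // ...so removing the item from its owner does not destroy it yet.
    card.items.clear();
    assert(guard.use_count() == 1);  // only the stack reference is left

    guard->text = "still valid";     // safe: no dangling pointer
    return true;
}
```

Note that nothing in this code marks `items` as the single owner: any other copy of the shared_ptr would be accepted just as silently.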

Interestingly, the situation remains the same in Rust. Replace unique_ptr with Box, shared_ptr with std::rc::Rc, and weak_ptr with std::rc::Weak, and you get exactly the same behavior, plus an extra RefCell wrapper needed to mutate the shared object.

The situation in Swift is comparable: a strong ARC reference is equivalent to shared_ptr, and a weak reference corresponds to weak_ptr.

To reiterate: these languages use shared references, originally intended for multiple-ownership aggregation, for composition as well, solely because they need reference counters to handle temporary references from the stack, which protect against the time-of-life issue. The architectural decisions were driven by the technical similarity between protection against premature deletion and shared ownership. As a result, we lost built-in protection against multiple ownership and reference cycles.
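The loss of protection against cycles is easy to demonstrate. In this hypothetical C++ sketch, `child` is intended as a composite reference, yet the compiler accepts two nodes that own each other. Once the local variables go out of scope, neither reference count reaches zero:

```cpp
#include <cassert>
#include <memory>

// Hypothetical node type: `child` is meant to be a composite reference,
// but shared_ptr can neither enforce a single owner nor forbid cycles.
struct Node {
    std::shared_ptr<Node> child;
};

// Returns true if two mutually "owning" nodes outlive all external references.
bool cycleSurvives() {
    std::weak_ptr<Node> observer;  // non-owning, so it does not affect lifetime
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->child = b;  // a "owns" b...
        b->child = a;  // ...and b "owns" a: accepted without complaint
        observer = a;
    }
    // Both locals are gone, yet each node still keeps the other alive.
    const bool leaked = !observer.expired();

    // Break the cycle by hand so this demo does not really leak.
    if (auto a = observer.lock()) {
        a->child->child.reset();
    }
    assert(observer.expired());
    return leaked;
}
```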

Composition in Argentum

First, a few words about the syntax:

a = expression;  // This defines a new variable.
a := expression; // This modifies an existing variable - assignment.
ClassName        // This creates an instance.

// Type declarations are only required in the parameters and return types
// of functions, delegates, and methods (but not lambdas).
// In all other expressions types are inferred automatically.
fn myCoolFunction(param int) int { param * 5 }

// The type of a variable and class field is determined by the type of the initializer.
x = 42;  // It's Int64

Now let's get back to references.

To prevent the "time-of-life issue," Argentum introduces two separate types of references:

  • Composite reference: Lives in object fields and function results. This type is declared as @T.
  • Temporary stack reference: Lives only on the stack. Declared as T. An object can be referenced by any number of temporary stack references, plus at most one composite reference.

Examples:

class Image {
   // resource is a field of type @Bitmap,
   // initialized with an instance of Bitmap class;
   resource = Bitmap;
}
// A new local variable references a new object of the `Image` class.
a = Image;

// The local variable `b` references the same object as `a`.
b = a;

// The local variable `c` references a copy of the object `a`.
c = @a;

// The whole point of this: you cannot assign a stack-referenced object
// to a composite reference, because this could violate the single-ownership rule.
a.resource := c.resource; // Compilation error
// Why is the expression `c.resource` a stack reference?
// Because any read of a composite reference returns a stack reference.

// Assigning any reference detaches it from the old object and attaches it
// to the new one.
// When detaching, the old object may be deleted if it was the last reference.
a.resource := Bitmap;

The composition invariants described at the beginning of the article are automatically ensured in Argentum:

  • In Argentum, an object can have exactly one owner, and the owner references the object through a composite reference. A composite reference can only be assigned a newly created object or a copy of another object, so two composite references to the same object can never exist.
  • An object lives as long as its owner lives and references it. In addition, Argentum protects against the "time-of-life issue": an object remains alive as long as it is referenced by a composite reference or by at least one stack reference. This is achieved with a reference counter in the object. Since references are built into the language, the compiler can perform escape analysis and eliminate unnecessary retain/release operations on these counters. Moreover, because composite references point to mutable objects, and mutable objects in Argentum always belong to a single thread, these reference counters are not atomic. As a result, an object is destroyed and its resources are released at a predictable time and on the correct thread, which is often important.
  • An object cannot be a direct or indirect owner of itself. When creating an object, the compiler already knows which composite field of which object the newly created object will be assigned to. This guarantees that the owner of an object is always created before that object, and thus, the object will never own itself.

In summary, Argentum has built-in support for composition. It checks the safety of all operations with composites at compile time and optimizes reference counter accesses, which are never atomic in the case of composition. The syntax for operations with composites is concise and intuitive.

All operations on composite pointers are protected from memory leaks: no object stays in memory once it is neither owned nor referenced from a stack frame.

Associations (weak pointers): "A knows about B"

Association is a relationship without ownership. In our example, each button has a target field that references a certain card, and each connector has start and end fields that refer to anchor points. These are all examples of associations. Associations are very common in application models: links between UI forms, foreign keys in databases, ids and refs in virtually any data format, cross-references between controllers, data models, and views in MVC. All of these are associations.

Associations can form arbitrary graphs, including those containing cycles. Objects on both sides of the association must have owners in their hierarchies.

As with composition, the associative relationship has invariants:

  • The referencing object and the target object can have independent lifetimes.
  • The association is broken when either of the two objects is deleted.

Associative references in C++/Rust/Swift

All three languages support associations, using weak_ptr, rc::Weak, and weak respectively. As discussed in the composition section, they all require the target to be a shared object, which limits the compiler's ability to check the integrity of data structures.

In addition, all of them allow accessing the target of an associative reference without checking whether it has been lost.

How association is represented in Argentum

Associative links play a significant role:

  • in the built-in operation of copying object hierarchies (which has a dedicated section later in this post)
  • and in multithreaded operations (which will be covered in a separate post).

In other scenarios, Argentum's associative links are almost identical to weak references in C++/Rust/Swift, with two differences:

  • The target can be any object, even one stored in a composite reference.
  • It is impossible to access the target without first checking that it is still present.

// In Argentum, an associative link to a class T is declared as &T.

// Here is a function that takes an associative link to CardItem as a parameter
// and returns the same associative link to CardItem:
fn myWeirdFunction(parameter &CardItem) &CardItem { parameter }

// In Argentum, there is an `&`-operator that creates an & reference to an object:
a = TextBlock; // `a` is an owning reference
w = &a;        // `w` is an &-reference to `a`

// `x` is an empty &-reference (like null pointer but strictly typed):
x = &TextBlock;

// Now `x` and `w` point to the same object:
x := w;

// An &-reference can exist in fields, variables, parameters, results,
// and temporary values. For example, in a class definition.
class C {
  field = &C; // `field` is a reference to `C`, initially null
}

An &T reference can lose its target at any moment as a result of any operation that deletes objects. Therefore, before using an &T reference, it needs to be locked and checked for the presence of the target. This operation generates a temporary stack reference. However, since there is always a possibility that the target was lost or the reference initially did not point to any object, the result of such conversion is always an optional temporary stack reference.

A brief note about the optional data type in Argentum:

  • For any type (not just references), there can exist an optional wrapper.
  • For a type T, the optional type would be ?T. For example, ?int, ?@Card, ?Connector, etc.
  • A ?T variable can hold either "nothing" or a value of type T.
  • The binary operation "A ? B" works with optionals. It requires operand A to have type ?T, and operand B to be a conversion T->X. The result of the operation is ?X. The binary operation "A ? B" works like the "if" operator:
    • Evaluates operand A.
    • If it is "nothing," then the result of the whole operation will be "nothing" of type ?X.
    • If it contains a value of type T, the value is extracted from the optional and bound to the name "_", operand B is evaluated, and its result is wrapped in ?X.

The information on optional data provided above is the minimum necessary to illustrate the workings of &-references. The rest of the description of the optional type can be found here: optionals-based control structures.
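For readers coming from C++, the "A ? B" operation can be approximated with a small helper over std::optional. This is only a sketch: the helper names `then` and `doubled` are hypothetical, and C++23's std::optional::transform offers similar behavior out of the box.

```cpp
#include <cassert>
#include <optional>
#include <type_traits>

// Hypothetical helper approximating Argentum's `A ? B` for std::optional.
// If `a` holds a value, it is passed to `b` (playing the role of `_`) and the
// result is wrapped in an optional; otherwise an empty optional is returned.
template <typename T, typename F>
auto then(const std::optional<T>& a, F b)
    -> std::optional<std::invoke_result_t<F, const T&>> {
    if (!a) {
        return std::nullopt;  // "nothing" of type ?X
    }
    return b(*a);             // X wrapped into ?X
}

// Example: ?int -> ?int, doubling the value when present.
std::optional<int> doubled(const std::optional<int>& x) {
    return then(x, [](const int& v) { return v * 2; });
}
```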

In Argentum, an &T (weak reference) is automatically converted to ?T (optional stack reference) whenever a value of optional type is expected. Therefore, the "?" operator applied to an &T reference automatically locks the target to prevent its deletion, checks that it has not been lost, and executes the code on success:

// If the variable `weak` (which is an &-reference) is not empty,
// assign the string "Hello" to the field of the object it references.
weak ? _.text := "Hello";

Since the variable "_" exists during the entire execution of the right operand, the result of dereferencing the &-reference will be protected from deletion throughout its execution time:

weak ? {
   _.myMethod();
   handleMyData(_);
   log(_.text);
};

Since the result of the check is only visible inside the right operand of the "?"-operation, Argentum programs are protected at the syntax level from:

  • Accessing the inner content of an "empty" optional.
  • Dereferencing nulls.
  • Dereferencing lost &-references.
  • Skipping checks on type casts, array indexing, and map key lookups.

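Together, these design choices rule out whole classes of errors (null dereferences, use of lost references, unchecked casts and lookups) at the syntax level, which makes Argentum a very safe language.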
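For comparison, this is roughly the closest C++ idiom to `weak ? _.text := "Hello"`; the `setTextIfAlive` helper is hypothetical. Here lock() plays the role of the &T to ?T conversion, and the resulting shared_ptr protects the target for the whole body, much like `_`:

```cpp
#include <cassert>
#include <memory>
#include <string>

struct TextBlock {
    std::string text;
};

// Rough C++ counterpart of `weak ? _.text := "Hello"`.
// The body runs only if the target still exists, and `target`
// keeps it alive until the end of the if-block.
bool setTextIfAlive(const std::weak_ptr<TextBlock>& weak, const std::string& s) {
    if (auto target = weak.lock()) {  // `target` acts like Argentum's `_`
        target->text = s;
        return true;
    }
    return false;                     // target lost: nothing is executed
}
```

The difference is that C++ does not enforce this pattern: `weak.lock()->text = s;` also compiles, and it dereferences a null pointer if the target is gone.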

Internally, an &-reference is implemented as a pointer to a dynamically allocated structure with four fields: a pointer to the target, the thread identifier of the target, counters, and flags. One of the flags records whether the reference has ever been passed across a thread boundary. Such intra-thread references are processed by simpler code that requires no inter-thread synchronization.
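The structure described above can be pictured with a simplified, single-threaded C++ sketch. The field names and types, the `kCrossedThreads` flag, and the choice to let the target object hold one share of the block are assumptions made for illustration, not Argentum's actual runtime code:

```cpp
#include <cassert>
#include <cstdint>
#include <thread>

struct Object;

// Hypothetical layout of the block behind an &-reference.
struct WeakBlock {
    Object* target;          // reset to nullptr when the target is deleted
    std::thread::id thread;  // thread that owns the target
    std::uint32_t counter;   // &-references using this block, plus one for the target
    std::uint32_t flags;     // e.g. kCrossedThreads
};

constexpr std::uint32_t kCrossedThreads = 1u;  // set once passed to another thread

void releaseBlock(WeakBlock* w) {
    if (--w->counter == 0) {
        delete w;
    }
}

struct Object {
    WeakBlock* weak = nullptr;  // created lazily by the first `&` operation

    Object() = default;
    Object(const Object&) = delete;
    Object& operator=(const Object&) = delete;

    ~Object() {
        if (weak) {
            weak->target = nullptr;  // every &-reference now observes "lost"
            releaseBlock(weak);      // drop the target's own share of the block
        }
    }
};

// The `&obj` operation: reuse the object's block or create one.
WeakBlock* makeWeak(Object& obj) {
    if (!obj.weak) {
        obj.weak = new WeakBlock{&obj, std::this_thread::get_id(), 1, 0};
    }
    ++obj.weak->counter;
    return obj.weak;
}

// Intra-thread fast path: a plain load with no synchronization, valid only
// while the reference has never crossed a thread boundary. (A real lock would
// also retain the target as a stack reference; that is omitted here.)
Object* lockLocal(const WeakBlock* w) {
    assert(!(w->flags & kCrossedThreads));
    assert(w->thread == std::this_thread::get_id());
    return w->target;
}
```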

In summary of this section:

  • Argentum has built-in support for associative references.
  • Unlike C++/Rust/Swift, the targets of &-references can remain composites, since an object is protected after dereferencing by a temporary stack reference rather than by a shared_ptr, Rc, or strong reference.
  • Dereferencing an &-reference is combined with a check for the presence of the target, making dereferencing without checking syntactically impossible, which enforces safety.
  • If a reference to an object does not cross thread boundaries, it does not use synchronization primitives.
  • Operations on &-references have a lightweight and straightforward syntax.

No operation on associative references produces an owning reference, so associative references cannot cause memory leaks.

Aggregation: one object, many owners

In our example, a set of TextBox objects can refer to the same Style object. And each Style object remains alive as long as someone refers to it. In real-world applications, there are surprisingly few scenarios that allow objects to be safely shared in this way. The collective wisdom of the programming community has long declared sharing mutable objects an anti-pattern. All examples of safe sharing boil down to the maxim 'shared XOR mutable.'

For instance, it is safe to refer to the same String object in Java or use the same texture resources in different objects in a 3D scene because these objects are immutable.

Aggregation invariants:

  • All references to the shared object are equal and have the same rights.
  • The shared object remains alive as long as there is at least one aggregate reference to it.

If we add the immutability of the target object as an invariant, an important guarantee emerges: the object cannot directly or indirectly refer to itself. An aggregate reference can only be obtained to an immutable object, and the fields of an immutable object cannot be assigned, so a reference to it can never be stored inside its own subgraph.

How existing mainstream languages support aggregation

Mainstream languages fully support aggregation, including sharing of mutable objects. However, this can lead to hard-to-detect problems in an application's business logic, memory leaks, and race conditions in multithreaded environments. The exception is Rust, which makes objects shared through Rc/Arc immutable but still allows breaking that immutability with the Cell/RefCell wrappers. As a systems-level language, Rust is fully entitled to do so.

How aggregation is implemented in Argentum

In Argentum, an aggregate reference can only refer to an immutable object. Moreover, immutability and sharing are the same concept: it's impossible to have a composite reference to an immutable object and impossible to have an aggregate reference to a mutable object.

An aggregate reference to a class T is declared as *T.

In Argentum, there is a freeze operator that takes a stack reference to a mutable object and returns an aggregate (shared) reference to an immutable object. The freeze operator is written as "*expression". If possible, the object is frozen in place; if other references to it still exist, the operator makes a frozen copy instead.
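In C++ terms, the in-place-or-copy rule could be approximated by a hypothetical `freeze` helper: if the caller hands over the only reference, the object is "frozen" in place by converting it to a pointer-to-const; otherwise a frozen copy is made and other holders keep their mutable original.

```cpp
#include <cassert>
#include <memory>

struct Style {
    int size = 12;
};

// Hypothetical C++ analog of Argentum's `*expression`.
std::shared_ptr<const Style> freeze(std::shared_ptr<Style> obj) {
    assert(obj != nullptr);
    if (obj.use_count() == 1) {
        return obj;                          // sole reference: freeze in place
    }
    return std::make_shared<Style>(*obj);    // still shared: frozen copy
}
```

This only mimics the rule: C++ const is shallow and can be cast away, and use_count() is approximate when other threads hold copies, whereas Argentum enforces immutability in its type system.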

The fields of a frozen object cannot be assigned, and its mutating methods cannot be called. In Argentum, methods are divided into three categories:

  • Mutating methods: These can only be called on mutable objects.
  • Non-mutating methods: These can be called on any object.
  • Shared methods: These can only be called on frozen objects, where `this` is guaranteed to be an aggregate reference.

Example of immutability:

a = Style;
a.size := 14;
fa = *a;        // `fa` is of type of `*Style` and points to the frozen copy of `a`.
x = fa.size;    // You can access fields of frozen objects.
fa.size := 16;  // Compilation error. Attempt to modify a frozen object.

Example of sharing:

s = *Style.setSize(18);  // Create a Style object, fill it, and freeze it.
t = TextBox.setStyle(s); // Create a TextBox and add a reference to `s` to it.

// This could have been done more easily:
t = TextBox
   .setStyle(*Style.setSize(18));

// The second TextBox refers to the same Style.
t1 = TextBox.setStyle(t.style);
Two text boxes share the same Style

A frozen object cannot be unfrozen, but it can be copied using the familiar @-operator, and this copy will be mutable.

s = @t1.style;  // `s` - a mutable copy of the existing Style.
s.size := 24;   // change its size,
t1.style := *s; // freeze and save it in t1.
Now each TextBox references its own Style.

The internal implementation of *T references in Argentum uses a counter with an additional concurrency flag. This flag allows shared objects that are used by only one thread to work without atomic operations or synchronization primitives.

The complete prohibition of modifying shared objects in Argentum has three important consequences:

  • The absence of cycles is guaranteed, so simple reference counting is sufficient to manage the lifetime of aggregates. No garbage collection is needed.
  • Any thread that has an aggregate reference has a 100% guarantee that all objects accessible through this reference and all outgoing references remain accessible to this thread, regardless of the actions of other threads. This significantly reduces the need for retain/release operations.
  • All operations on a reference counter from any thread are not required to be synchronized with the behavior of other threads. These operations can be delayed and even reordered as long as any retain is executed before any release. This allows batching of operations with counters, making them much cheaper.

All of the above significantly reduces the overhead of multithreaded reference counting.

Immutability of shared objects is a best practice requirement aimed at improving the reliability of business logic of applications, eliminating races, and eliminating undefined behavior. The resulting significant acceleration in working with counters and the elimination of GC is a pleasant bonus, not the ultimate goal.

In summary, Argentum follows the maxim "shared XOR mutable". And although every programmer can say, "I have broken this unwritten rule several times and I'm still alive," following this maxim pays off in large applications with long life cycles.

Shared-pointer operations cannot create cycles in the ownership graph, because a shared pointer can only point to an immutable object, and the fields of an immutable object cannot be assigned. Thus, no object can refer to itself, directly or indirectly, through shared pointers, so shared pointers cannot cause memory leaks.

Automatic copying of object hierarchies

The deep copy operator is not directly related to the reference model of Argentum. Instead of providing it, we could have decided to:

  • prohibit assigning anything but a new object instance to a composite reference,
  • or force programmers to manually implement the Clone trait,
  • or limit the copy operation to copying the composition subtree, leaving associations and aggregations as plain pointer values, similar to Rust.

Without full-fledged automatic copying operations, the object system of Argentum would still be complete, safe, and guaranteed to be protected from memory leaks. However, it would not be convenient. Several operations would either become impossible or require a significant amount of handwritten code:

  • conversion of a stack reference to a composite reference,
  • freezing an object if someone else is still referring to it, and it cannot be frozen in place,
  • unfreezing an object.

The built-in Argentum copy operation uses the following principles (in descending order of importance):

  1. The invariants of the reference model must not be violated.
  2. Null-safety must be maintained (i.e., if a field is not null in the original object, it cannot become null in the copy).
  3. The result of the copy must be meaningful.
  4. Data must not be lost.
  5. Copying must work in a multi-threaded environment.
  6. The overhead of the copying operation in terms of time and memory should be minimal.

The first and second principles require that when copying the root of the object tree, the entire subtree through all composite references must be copied.

Copying an aggregate reference can be done simply by copying the value of the reference, so that the original and the copy share the same immutable sub-object. After all, this is the essence of aggregation.

For associative references, there are two cases:

Case 1. If the copied reference points outside the copied object hierarchy, the only value it can have is the value of the original. Therefore, the copy of such a reference will point to the original object.

Let's consider an example, trying to copy a card that references another card:

// If the document contains card[0], copy _it_ to the end of the cards list
doc.cards[0] ? doc.cards.append(@_);
The copied subtree is shown in black. The result of the copy is shown in blue, and external references are shown in red.

This is exactly how copying code written by hand would work.

Case 2. If the copied reference points to an object that is involved in the same copy-operation, it means that this reference is related to the internal topology of the object. In this case, we have a choice whether to point it to the original object or to the copy. If it points to the original, we lose information about the internal topology, and we agreed not to lose information. Therefore, internal references are copied in a way that preserves the original topology. Let's consider an example of copying objects with internal cross-references:

doc.cards[1] ? doc.cards.append(@_);
The copied subtree is shown in black. The result of the copy is shown in blue, and internal references that preserve the topology are shown in red.

Such copying of references is also the expected behavior. This is exactly how manually written copying code would work.
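Both cases can be captured by a two-pass algorithm. The following C++ sketch is an assumption about how such a copy could work, not Argentum's implementation: the first pass copies the composition subtree and records a map from each original object to its copy; the second pass redirects every association whose target was copied (case 2) and keeps external targets unchanged (case 1). Aggregate references are simply shared. For brevity, associations are raw non-owning pointers rather than weak references, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical node with all three kinds of references.
struct Node {
    std::vector<std::unique_ptr<Node>> children;  // composition: copied deeply
    Node* link = nullptr;                         // association: non-owning
    std::shared_ptr<const std::string> label;     // aggregation: immutable, shared
};

using CopyMap = std::unordered_map<const Node*, Node*>;

// Pass 1: copy the composition subtree and remember original -> copy.
std::unique_ptr<Node> copyTree(const Node& src, CopyMap& map) {
    auto dst = std::make_unique<Node>();
    map[&src] = dst.get();
    dst->label = src.label;  // aggregates are shared, not copied
    for (const auto& child : src.children) {
        dst->children.push_back(copyTree(*child, map));
    }
    return dst;
}

// Pass 2: internal links point to the copies (case 2),
// external links keep pointing to the original targets (case 1).
void fixLinks(const Node& src, Node& dst, const CopyMap& map) {
    if (src.link != nullptr) {
        auto it = map.find(src.link);
        dst.link = (it != map.end()) ? it->second : src.link;
    }
    for (std::size_t i = 0; i < src.children.size(); ++i) {
        fixLinks(*src.children[i], *dst.children[i], map);
    }
}

std::unique_ptr<Node> deepCopy(const Node& root) {
    CopyMap map;
    auto copy = copyTree(root, map);
    fixLinks(root, *copy, map);
    return copy;
}
```

The two passes are needed because an internal link may point to an object that the first pass has not copied yet.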

This copy operation is universal: the same principles underlie the "prototype" design pattern.

  • For example, when developing a graphical user interface, we can create a list item consisting of icons, text, checkboxes, and cleverly linked references that provide the desired behavior with attached handlers. Then we can copy and insert this item into list controls for each item of the data model. This copying algorithm ensures the expected internal connections and behavior.
  • The same principles are applied in document model processing,
  • 3D engines for creating prefabs,
  • and AST transformations in compilers.

It is difficult to find a scenario where this copying operation would be inapplicable.

There is another reason for automating the copying operation. If the language requires manual implementation of this operation, for example through the implementation of the Clone trait, it would lead to user code being executed in the middle of a large copying operation involving multiple objects. This code would be exposed to the objects in an incomplete and invalid state.

In Argentum, this operation is automated, always correct, efficient, and safe.

If objects manage some system resources, special "afterCopy" and "dispose" functions can be defined for them, which will be called when copying and deleting objects. These functions are called when object hierarchies are already (or are still) in a valid state. They can be used to close files, release resources, copy handles, reset caches, and perform other actions to manage system resources.

In conclusion: Automated copying operation is like an automatic transmission in a car. It may be criticized, but it significantly simplifies life.

Impact of Multithreading on the Reference Model

The Argentum multithreading model is a large topic that requires a separate article. Here, only the parts related to the reference model are described.

The multithreading model of Argentum resembles the web-workers model, where threads are lightweight processes living in a shared address space and have access to all immutable objects of the application through aggregate references. Additionally, each thread has its own hierarchy of mutable objects. Examples of threads with their own state include graphic scene threads, HTTP client threads, and document model threads. There can also be simple stateless worker threads that just process tasks, operating on objects passed as task parameters. Each thread has an incoming queue of asynchronous tasks. A task consists of an associative reference to one of the objects living in the thread, a function to be executed, and a list of parameters to be passed to this function. Thus, tasks are executed in threads but are sent to objects stored in these threads. To send a task, the code does not need to know anything about threads and queues; it just needs an associative reference to the receiving object.
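The idea of sending tasks to objects rather than to threads can be sketched in C++ as a queue of closures, each capturing a weak reference to its receiver. This is a single-threaded, hypothetical simulation (`TaskQueue`, `post` and `drain` are not Argentum APIs), and the task parameters are folded into the closure. The sender only needs the weak reference; in this sketch, a task whose receiver has been deleted is simply dropped.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical incoming task queue of one thread.
class TaskQueue {
public:
    // A task = weak reference to the receiver + function (+ captured parameters).
    template <typename T, typename F>
    void post(std::weak_ptr<T> receiver, F fn) {
        tasks_.push_back([receiver = std::move(receiver), fn = std::move(fn)]() {
            if (auto target = receiver.lock()) {  // receiver may be gone by now
                fn(*target);
            }
        });
    }

    // Runs queued tasks in FIFO order; in a real runtime this would be
    // the receiving thread's event loop.
    void drain() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};
```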

Role and Behavior of References in a Multithreading Environment:

  • Stack references T and composite references @T are local intra-thread references, not visible outside their thread. They always point to objects of their own thread. They can be passed as task parameters to another thread; in this case, they are rebound to the thread of the target object. The task is actually placed in the queue only when the current thread holds no more references to the objects of this task. This guarantees that a mutable object is never accessed by two threads simultaneously.
  • Aggregate references *T can be freely shared between threads, and the object referenced by the link is always accessible to any number of threads.
  • Associative references &T allow storing relationships between objects from different threads. When attempting to synchronously access an object in another thread, the &T reference returns null. However, associative references allow asynchronously sending tasks to their target objects regardless of which thread these objects are in. Thus, an associative reference in a multithreading environment plays the role of a universal object locator.

In summary: All rules of reference behavior are the same in single-threaded and multithreaded cases. All invariants continue to hold. Copying, freezing, and thawing operations do not require inter-thread synchronization.

Conclusion

Argentum's reference model allows it to avoid data races, memory leaks, and provides memory safety and null safety at the syntax level. Built-in reference types enable building data models and object hierarchies that maintain their structure and check ownership invariants at the compilation stage. The syntax of reference operations is concise and mnemonic.

The language is currently in an experimental development state, so many specific cases have not been considered, and almost no optimizations have been made. The language lacks syntactic sugar and a good debugger. Work is ongoing, and constructive criticism is welcome.
