How fast is Argentum (part 2)

The previous "benchmark" tested integer arithmetic, control structures and function/lambda calls.

This benchmark tests data structures, object creation, method calls and pointers. It's based on https://programming-language-benchmarks.vercel.app/problem/binarytrees

First of all, tests from the above site have a number of issues:

  • We can see lots of GC-driven languages in the top rows. But if we look at the consumed memory it becomes easily understandable: these language don't collect memory during the test. Run it a little bit longer, and these programs eventually suffer of sudden stops that make it not only slower than most (all) of the rivals but it even make these languages incapable of doing real-time/multimedia/GUI related tasks.
  • Another set of champions use unsafe code blocks to make the actual computations. But the real applications don't run unsafe.
  • Half of the samples checks only one of left-right pointers against null. Thus application code becomes prone to crashes.
  • Some safe languages (Java, JS etc.) prevent objects in this-pointers from being suddenly deleted, others unsafely don't do it. There are languages advertised as being safe, but in these benchmarks their safety net is deliberately not used.

Let's use the whole idea of this test, but write another code that is free the from above issues:

C++

Code

#include <iostream>
#include <memory>
using std::shared_ptr;

struct Node {
  shared_ptr<Node> left, right;
  Node(int d) {
    if (d > 0) {
      left = std::make_shared<Node>(d - 1);
      right = std::make_shared<Node>(d - 1);
    }
  }
  virtual int check() {
    return 1 +
     (left ? shared_ptr<Node>(left)->check() : 0) +
     (right ? shared_ptr<Node>(right)->check() : 0);
  }
};

const int minDepth = 4;
const int maxDepth = 18;

int main(){
  std::cout
    << "stretch tree of depth " << maxDepth
    << " check: " << std::make_shared<Node>(maxDepth)->check()
    << std::endl;

  auto longLivedTree = std::make_shared<Node>(maxDepth);

  int nResults = (maxDepth - minDepth) / 2 + 1;
  for (int i = 0; i < nResults; ++i) {
    int depth = i * 2 + minDepth;
    int n = 1 << (maxDepth - depth + minDepth);
    int check = 0;
    for (int x = 1; x < n + 1; ++x)
        check += std::make_shared<Node>(depth)->check();
    std::cout << n << " trees of depth " << depth << " check: " << check << std::endl;
  }
  std::cout << "long lived tree of depth " << maxDepth << " check: " << longLivedTree->check() << std::endl;
  return 0;
}

Run -O3 optimized C++

ak@AkGpd:~/proj/tree$ /usr/bin/time ./tree-o3
stretch tree of depth 18 check: 524287
262144 trees of depth 4 check: 8126464
65536 trees of depth 6 check: 8323072
16384 trees of depth 8 check: 8372224
4096 trees of depth 10 check: 8384512
1024 trees of depth 12 check: 8387584
256 trees of depth 14 check: 8388352
64 trees of depth 16 check: 8388544
16 trees of depth 18 check: 8388592
long lived tree of depth 18 check: 524287
2.37user 0.03system 0:02.41elapsed 99%CPU (0avgtext+0avgdata 69092maxresident)k
0inputs+0outputs (0major+16523minor)pagefaults 0swaps

Python

Code:

import sys

class Node:
    def __init__(self, d):
        if d == 0:
            self.l = None
            self.r = None
        else:
            self.l = Node(d-1)
            self.r = Node(d-1)

    def check(self):
        r = 1
        if self.l:
           r += self.l.check()
        if self.r:
           r += self.r.check()
        return r

minDepth = 4
maxDepth = 18

print('stretch tree of depth {0} check: {1}'.format(maxDepth, Node(maxDepth).check()))

longLivedTree = Node(maxDepth)
nResults = (maxDepth - minDepth) // 2 + 1
for i in range(0, nResults):
    depth = i * 2 + minDepth
    check = 0
    n = 1 << (maxDepth - depth + minDepth)
    for _ in range(1, n + 1):
        check += Node(depth).check()
    print('{0} trees of depth {1} check: {2}'.format(n, depth, check))

print('long lived tree of depth {0}\t check: {1}'.format(
          maxDepth, longLivedTree.check()))

Run Python code:

ak@AkGpd:~/proj/tree$ /usr/bin/time python3 tree.py
stretch tree of depth 18 check: 524287
262144 trees of depth 4 check: 8126464
65536 trees of depth 6 check: 8323072
16384 trees of depth 8 check: 8372224
4096 trees of depth 10 check: 8384512
1024 trees of depth 12 check: 8387584
256 trees of depth 14 check: 8388352
64 trees of depth 16 check: 8388544
16 trees of depth 18 check: 8388592
long lived tree of depth 18      check: 524287
103.58user 0.96system 1:44.56elapsed 99%CPU (0avgtext+0avgdata 174080maxresident)k
0inputs+0outputs (0major+703747minor)pagefaults 0swaps

Rust

Code

use std::rc::Rc;

struct TreeNode {
    l: Option<Rc<TreeNode>>,
    r: Option<Rc<TreeNode>>,
}

impl TreeNode {
    fn check(&self) -> i32 {
        let mut ret = 1;
        if let Some(l) = &self.l {
            ret += l.check();
        }
        if let Some(r) = &self.r {
            ret += r.check();
        }
        ret
    }

    fn create<'r>(depth: i32) -> Rc<TreeNode> {
        if depth > 0 {
            Rc::new(TreeNode {
                l: Some(Self::create(depth - 1)),
                r: Some(Self::create(depth - 1)),
            })
        } else {
            Rc::new(TreeNode { l: None, r: None })
        }
    }
}

const MIN_DEPTH: i32 = 4;
const MAX_DEPTH: i32 = 18;

fn main() {
    {
        let tree = TreeNode::create(MAX_DEPTH);
        println!("stretch tree of depth {} check: {}", MAX_DEPTH, tree.check());
    }
    let long_lived_tree = TreeNode::create(MAX_DEPTH);
    let n_res = (MAX_DEPTH - MIN_DEPTH) / 2 + 1;
    for i in 0..n_res {
        let d = i * 2 + MIN_DEPTH;
        let n = 1 << (MAX_DEPTH - d + MIN_DEPTH);
        let mut chk = 0;
        for _i in 0..n {
            let a = TreeNode::create(d);
            chk += a.check();
        }
        println!("{} trees of depth {} check: {}", n, d, chk)
    }
    println!("long lived tree of depth {} check: {}", MAX_DEPTH, long_lived_tree.check());
}

Run Rust

ak@AkGpd:~/rust-tree$ /usr/bin/time ./tree
stretch tree of depth 18 check: 524287
262144 trees of depth 4 check: 8126464
65536 trees of depth 6 check: 8323072
16384 trees of depth 8 check: 8372224
4096 trees of depth 10 check: 8384512
1024 trees of depth 12 check: 8387584
256 trees of depth 14 check: 8388352
64 trees of depth 16 check: 8388544
16 trees of depth 18 check: 8388592
long lived tree of depth 18 check: 524287
3.82user 0.00system 0:03.83elapsed 99%CPU (0avgtext+0avgdata 51240maxresident)k
0inputs+0outputs (0major+12373minor)pagefaults 0swaps

Argentum

Code

using string;
using sys { log }
using utils { forRange }
const LF = utf32_(10);

class Node {
    left = ?Node;
    right = ?Node;

    check() int {
        r = 1;
        left ? r += _.check();
        right ? r += _.check();
        r
    }
    init(n int) this {
        n > 0 ? {
            left := Node.init(n - 1);
            right := Node.init(n - 1);
        }
    }
}
const minDepth = 4;
const maxDepth = 18;

log("stretch tree of depth {maxDepth} check: {Node.init(maxDepth).check()}{LF}");
longLivedTree = Node.init(maxDepth);

nResults = (maxDepth - minDepth) / 2 + 1;
forRange(0, nResults) `i {
    depth = i * 2 + minDepth;
    n = 1 << (maxDepth - depth + minDepth);
    check = 0;
    forRange(0, n) `x
        check += Node.init(depth).check();
    log("{n} trees of depth {depth} check: {check}{LF}");
};
log("long lived tree of depth {maxDepth} check: {longLivedTree.check()}{LF}");

Run Argentum:

ak@AkGpd:~/proj/argentum/build-rel$ /usr/bin/time ./tree
stretch tree of depth 18 check: 524287
262144 trees of depth 4 check: 8126464
65536 trees of depth 6 check: 8323072
16384 trees of depth 8 check: 8372224
4096 trees of depth 10 check: 8384512
1024 trees of depth 12 check: 8387584
256 trees of depth 14 check: 8388352
64 trees of depth 16 check: 8388544
16 trees of depth 18 check: 8388592
long lived tree of depth 18 check: 524287
2.77user 0.01system 0:02.79elapsed 99%CPU (0avgtext+0avgdata 50620maxresident)k
0inputs+0outputs (0major+12357minor)pagefaults 0swaps

Numbers

LanguageTime (ms)Memory (Kb)
C++2.3769092
Argentum2.7750620
Rust3.8251240
Python103.58174080

Conclusions

These numbers mean little-to-nothing. Choosing different environments, compilers, compilation modes could dramatically alter these numbers, shuffling three top rows at random.

Just choosing more appropriate algorithms solves the problem much much faster and in less memory than any compiler optimizations could ever achieve. For example, just by replacing the allocator to a thread-bounded one could eliminate mutexes at every object allocation and disposal, and introducing allocator with pooled/paged modes could eliminate all allocation overheads whatsoever (in the end all Node instances have equal fixed size). This alone could speed up any of rivals by x100.

So the main takeout for me is Argentum still has no optimizations and lowerings in the compiler, and it uses old C allocator from c_stdlib (as a matter of fact, it takes the same time to complete this benchmark with -O0 and -O3). And as such it has long way to progress. But even with these disadvantages it resides in the same league of fast languages as C++ and Rust but providing much cleaner syntax and rock-solid safety and resilience. And such a unique feature as the total absence of memory leaks, of course.

Leave a Reply

Your email address will not be published. Required fields are marked *