My Experiences with Julia
Why I chose Julia
In 2021, I had to do some numerical calculations for my master's thesis in theoretical physics, so I had to choose a language:
First I thought about using Python. It is commonly used for scientific calculations, mainly because of its good ergonomics, i.e. Python code is shorter than e.g. equivalent C code. Dynamic typing can lead to a horrible mess in larger projects, but I knew my code would stay small. The problem with Python for my use case is its poor performance. NumPy is quite fast, definitely fast enough, but if I had to write a hot loop in Python by hand, it would be unacceptably slow.
A faster alternative would be C/C++. While it is definitely fast enough, its poor ergonomics would inflate the time needed to write the code. Also, C++ has its own problems, but that is a story for another post.
Rust is essentially C/C++ without their ugliness. However, Rust is quite a verbose language, which is fine for large projects, but not what you want for short scientific calculations.
Julia was advertised to me as a language for scientific calculations like mine. It looks like Python, has a really nice syntax for mathematical equations, but is nearly as fast as C. This is exactly what I wanted: the code would be short and clean like Python, but I could write hot loops by hand without completely butchering the performance. Julia is a bit slower than C, but I was not trying to squeeze out every bit of performance. So, I chose Julia, thinking it was exactly the right language for my kind of problem… Oh boy, was I wrong.
Tutorials are not References
There are two different types of documentation; I will call them tutorials and references. Tutorials are what you read if you have never used a library before and want to get an overview; references are what you read if you want to e.g. know the exact set of arguments that a specific function accepts. (My separation into two types of documentation is similar to divio’s separation into four types of documentation. My “reference” is their “reference” and my “tutorials” are their “tutorial”, “how-to guides” and “explanation”.)
Julia’s flagship libraries have great tutorials, but fall flat on their faces when it comes to references. Take for example DifferentialEquations.jl. Its tutorial is quite nice, but a reference like SciPy’s reference or the world’s best reference is missing.
Let’s explore Julia’s references and see what they do wrong.
Let’s say you have a variable and the data you are interested in is hidden somewhere in its deeply nested datatype. In Rust, you would first use rls or rust-analyzer to find the name of the datatype. Then you can run `cargo doc --open`, which will generate HTML documentation for this datatype. Each datatype is either a product type (a.k.a. a struct) or a sum type (a.k.a. an enum). The generated documentation contains links to the fields of the structs and the variants of the enums. By recursively looking through this tree, you can find what you are looking for.
So, let’s try the same thing in Julia. To find the name of the datatype, Julia has the `typeof(var)` command. In my numerics code, this once returned
julia> typeof(sol)
SciMLBase.ODESolution{Float64, 2, Vector{Vector{Float64}}, Nothing, Nothing, Vector{Float64}, Vector{Vector{Vector{Float64}}}, SciMLBase.ODEProblem{Vector{Float64}, Tuple{Float64, Float64}, false, brüssel_conf, SciMLBase.ODEFunction{false, typeof(limit_dgl), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Symbol, DiffEqBase.ContinuousCallback{var"#condition#28"{Float64, Vector{Float64}}, var"#affect!#29", var"#affect!#29", typeof(DiffEqBase.INITIALIZE_DEFAULT), typeof(DiffEqBase.FINALIZE_DEFAULT), Float64, Int64, Nothing, Int64}, Tuple{Symbol}, NamedTuple{(:callback,), Tuple{DiffEqBase.ContinuousCallback{var"#condition#28"{Float64, Vector{Float64}}, var"#affect!#29", var"#affect!#29", typeof(DiffEqBase.INITIALIZE_DEFAULT), typeof(DiffEqBase.FINALIZE_DEFAULT), Float64, Int64, Nothing, Int64}}}}, SciMLBase.StandardODEProblem}, OrdinaryDiffEq.Tsit5, OrdinaryDiffEq.InterpolationData{SciMLBase.ODEFunction{false, typeof(limit_dgl), UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Vector{Vector{Float64}}, Vector{Float64}, Vector{Vector{Vector{Float64}}}, OrdinaryDiffEq.Tsit5ConstantCache{Float64, Float64}}, DiffEqBase.DEStats}
As if this weren’t bad enough, the inline help is also rather limited:
help?> ODESolution
search: ODESolution RODESolution
struct ODESolution{T, N, uType, uType2, DType, tType, rateType, P, A, IType, DE} <: SciMLBase.AbstractODESolution{T, N, uType}
help?> SciMLBase.AbstractODESolution
abstract type AbstractODESolution{T, N, S} <: SciMLBase.AbstractTimeseriesSolution{T, N, S}
help?> SciMLBase.AbstractTimeseriesSolution
abstract type AbstractTimeseriesSolution{T, N, A} <: AbstractDiffEqArray{T, N, A}
help?> AbstractDiffEqArray
search: AbstractDiffEqArray
No documentation found.
Summary
≡≡≡≡≡≡≡≡≡
abstract type AbstractDiffEqArray{T, N, A}
Subtypes
≡≡≡≡≡≡≡≡≡≡
DiffEqArray{T, N, A, B, C, D, E, F}
SciMLBase.AbstractNoiseProcess{T, N, A, isinplace}
SciMLBase.AbstractTimeseriesSolution{T, N, A}
Supertype Hierarchy
≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡
AbstractDiffEqArray{T, N, A} <: AbstractVectorOfArray{T, N, A} <: AbstractArray{T, N} <: Any
Searching the online documentation for `ODESolution` also yields no entry.
This means you can end up with a variable that contains exactly the information you want, with virtually no way to figure out how to extract it.
The inline documentation is sadly not available online. Julia made me realize how great `cargo doc` is.
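A workaround, absent a browsable reference, is to walk the value by hand at the REPL using `fieldnames` and `propertynames`. A minimal sketch with made-up stand-in types (`Outer`/`Inner` are hypothetical, not the real SciML types):

```julia
# Hypothetical stand-ins for a deeply nested library type:
struct Inner
    value::Int
end
struct Outer
    inner::Inner
    label::String
end

x = Outer(Inner(42), "solution")

# fieldnames/propertynames reveal the structure one level at a time:
println(fieldnames(typeof(x)))   # (:inner, :label)
println(propertynames(x.inner))  # (:value,)
println(x.inner.value)           # 42
```

This works, but it is exactly the manual tree recursion that `cargo doc` automates for you.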
The problem with free functions
A tutorial alone is not enough; you need a chain of links from a starting point to the function/argument/datatype you want. The authors of Julia argued that member functions are a bad idea, because they tie a function to a single datatype instead of multiple datatypes. But this is not without disadvantages: let’s say you want to know which methods modify a `Dict`. In e.g. Rust, you can find those methods in the documentation of the corresponding datatype. But if you execute `apropos("Dict")` in Julia, you find relevant functions like `Base.mergewith`, but also unrelated entries like `Base.setenv` or `Base.Cmd`.
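There is at least a partial remedy worth mentioning: `methodswith` from the InteractiveUtils standard library lists the methods whose signatures explicitly mention a given type, which comes closer to the "methods of this datatype" page Rust gives you, though it still misses generic functions that merely accept a `Dict`:

```julia
using InteractiveUtils  # loaded by default in the REPL; provides methodswith

# Methods whose signatures explicitly mention Dict:
ms = methodswith(Dict)
@assert !isempty(ms)
println(length(ms))
```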
Hard to debug due to bad error messages
I would consider myself a semi-experienced programmer: less experienced than people who have worked in industry for years or decades, but certainly more experienced than the average scientist-programmer. Julia made me feel like an absolute beginner again. Unlike in other languages, even ones I have spent less time learning, I had trouble writing certain code or debugging certain errors that I would consider very basic. And I don’t mean that my solution merely looked inelegant, like the code of a beginner. No, I was literally unable to write certain basic things myself.
An example can be found here.
Dynamic Typing
Julia is a dynamically typed programming language, and it feels far more dynamically typed than e.g. Python: typical Julia code (ab)uses the features enabled by dynamic typing much more than typical Python code does. I do not like dynamic typing, and Julia only reinforced that opinion: the error messages are worse and debugging is harder because of it. One of the things I find weird is that a scalar can sometimes be treated like a vector with one element, and an N×M matrix can sometimes be treated like a vector with N·M elements. For example,
vector = [1, 2]
scalar = 3
matrix = [4 5; 6 7]
for el in vector
    println(el)
end
for el in scalar
    println(el)
end
for el in matrix
    println(el)
end
prints the numbers 1 through 7 (the matrix elements appear in column-major order: 4, 6, 5, 7).
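The underlying reason is that numbers in Julia behave like zero-dimensional containers, and matrices iterate element-wise in column-major order. A small sketch of the consequences:

```julia
x = 3
@assert length(x) == 1  # a scalar has a length...
@assert x[1] == 3       # ...and can even be indexed
@assert first(x) == 3   # ...and iterated

m = [4 5; 6 7]
@assert length(m) == 4          # the 2x2 matrix acts like 4 elements
@assert vec(m) == [4, 6, 5, 7]  # iteration order is column-major
```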
Performance
Julia is advertised as a fast language. Whether Julia fulfills this promise depends on how one defines “performance”. One way would be to do what the Debian benchmarks game does and implement a simple problem with a hot loop in the language. For example, the following calculates the integral of a function via a left Riemann sum (see https://en.wikipedia.org/wiki/Riemann_sum#Left_Riemann_sum):
function integrate(func, a, b, N)
    sum = 0.0
    step = (b - a) / N
    for i in 1:N
        sum += func(a + (i - 1) * step) * step
    end
    return sum
end
This code not only looks nice, it is also basically as fast as it gets. So, promise fulfilled. On the other hand, if you e.g. run this code
using Plots
using DifferentialEquations
it takes around 350 seconds on my machine, 20 seconds if it is already precompiled. Due to its JIT, Julia sets new records for slowness: this is far slower than similar Python code, slower than similar Rust code with an incremental build, and sometimes even slower than Rust code if you include the time a clean build takes. The “you can run the code without compilation” advantage of an interpreted language is therefore completely gone. This is the kind of slowness that impacts your workflow: I spent a significant amount of my work time waiting for this JIT compilation, also known as compile-time latency or TTFP (time to first plot). The problem is compounded by the fact that updating Julia with `pacaur -Syu` forces a recompilation. This slowness affects other software written in Julia as well. For example, my editor formats on save; for most languages the formatting delay is not noticeable, but for Julia it sometimes took more than 10 seconds.
There are ways around the `using ...`-is-slow issue. Since `using ...` is only slow the first time you run it in a given Julia interpreter process, the solution is to keep that process running. E.g. VSCode has a Julia extension with the `Julia: Execute File in REPL` command, which takes the contents of the currently open file and feeds it into the running interpreter. But this comes with other problems: now you have hidden state, so the output does not depend solely on your code, but also on older versions of your code. If, for example, you change
function func(x::Int)
...
end
to
function func(x)
...
end
then `func(123)` will still execute the old code. This is both tricky to debug and forces you to restart the interpreter, which can take a while since `using ...` is slow. Also, you cannot add/remove/change the fields of a `struct` without restarting the interpreter or running into other problems.
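The reason is Julia's method tables: deleting the `::Int` annotation does not replace the old method in the running session; re-evaluating the definition merely adds a second, less specific method, and dispatch still prefers the more specific one. This can be reproduced without any editor involved (a minimal sketch, using a hypothetical function `g`):

```julia
g(x::Int) = "old"  # the version as originally written
g(x) = "new"       # the "edited" version: adds a method, does not remove g(::Int)

@assert length(methods(g)) == 2  # both methods now coexist
@assert g(123) == "old"          # Int arguments still hit the old, more specific method
@assert g(1.5) == "new"          # only non-Int arguments reach the new code
```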
The idea of “You only need to start the interpreter once a day” failed for me.
Confusion about the “recommended” way.
For any piece of software, it is important that there is no confusion about what the “official”, “recommended”, “proper” way is. If there are multiple ways to do something and you do not know which one you should use, you might choose a solution that seems to work but leads to problems further down the road. Examples of this problem occurring in Julia:
You installed the package called “julia” from the official Arch Linux repos? Might seem like a good idea and will mostly work, but it is unsupported by upstream and can lead to linking errors at runtime.
How should you split code among multiple files? Good question.
Confusion about the recommended workflow exists. For example, using `Julia: Execute File in REPL` in combination with `Revise` is wrong.
Does confusion about the “recommended” way exist in other languages? Absolutely; this is a problem in software engineering in general, not just in Julia. Is it worse in Julia than in other languages? Maybe, though C++ is probably the king here. Also, shout-out to the developers of the Rust language: they do a phenomenal job at preventing this kind of confusion.
Other Stuff
- `Go To Definition` commands in editors work worse for Julia than for other languages.
- The tooling around Julia seems very immature. The Julia formatter is slow (sometimes formatting a small file took over 10 seconds) and buggy.
- `Plots.jl` does not allow zooming when used in VSCode.
- VSCode does not show you what was printed just before a crash.
While the tooling is bad, the scientific ecosystem is enormous. For every numerical method there is a library. This is the main advantage over Rust.
Variable Binding can be confusing:
function f(du, u, p, t)
    du .= 2.0
    @assert du == [2.0]
end
is not the same as
function f(du, u, p, t)
    du = [2.0]
    @assert du == [2.0]
end
because only the first writes into the array in the caller’s scope; the second merely rebinds the local name `du`. This might be confusing.
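The rule underneath: `=` rebinds a name, while `.=` broadcasts into the array the name currently refers to. The same distinction is visible without any function call (a minimal sketch):

```julia
a = [1.0]
b = a
b .= 2.0   # broadcast assignment: writes into the array both names share
@assert a == [2.0]

c = a
c = [3.0]  # plain assignment: rebinds c to a new array; a is untouched
@assert a == [2.0]
@assert c == [3.0]
```

Inside `f(du, u, p, t)`, `du` is just another name bound to the caller's array, so the same rule decides whether the caller sees the change.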
Conclusion
Hot loops in Julia are quite fast, but you have to wait quite long for your program to start up.
The Julia tooling is quite immature and buggy, at least compared to that of Rust or Python. The ecosystem of scientific libraries, however, is great and gigantic.
Julia has really nice, short syntax for mathematical equations; compared to other languages, your numerics code will be much shorter. The problem is that this short code takes longer to write, because dynamic typing and bad reference documentation make writing and debugging Julia code harder.
After a while, I decided to rewrite parts of my code in Rust. Rust code is much more verbose, and I had to implement basic numerical methods myself since Rust has few numerics libraries, so the code was much longer. However, it was still faster to write, because Rust has a better type system, better reference documentation, better error messages and a more mature ecosystem. My Rust code also greatly outperformed the Julia code.
Other Opinions
I posted this article on the Julia Discourse and got quite a lot of comments.
Jakob also wrote about Julia. I agree with his love for Rust and with the sections:
- Compile time latency
- Large memory consumption
- Julia can’t easily integrate into other languages
- Weak static analysis
- Abstract interfaces are unenforced and undiscoverable
I partially disagree with his “The ecosystem is immature” section because Julia has a lot of scientific packages.
His post about union types vs sum types is also great at highlighting a difference between Julia and Rust.
Victor Zverovich also wrote about Julia.
Yuri Vishnevsky thinks Julia has a correctness problem.
Updates
In December 2022, the developer of SciML responded that most of the things I complained about have seen significant improvement.