Rust is on my 2021 watchlist

Engineering May 2, 2021

I'm not the kind of person who forces my team to stick to a specific technology, such as Golang or MongoDB, because I believe we should choose the right tech for each problem. We can't rely on a one-size-fits-all technology and limit our speed of innovation. So I consider myself quite neutral on technical solutions: I choose whatever framework fits our use case.

On the other hand, I tend to go to extremes when betting on new technologies. I explained my reasoning in another post, so I will go straight to Rust as my new bet for 2021. My team became interested in Rust in 2020 and explored it to compare with Golang. Our conclusion is that each language suits different use cases. If we want to write RESTful (HTTP) APIs, we should use Golang, as it is mature, fast, and simpler than Rust for expressing business logic. Rust, on the other hand, turned out to be outstanding at data processing. Many amazing frameworks and tools are built on Rust to solve data problems, such as Vector, Materialize, Apache Arrow, and DataFusion, which promise a safer, lighter, and faster way of processing data, from streaming to batch.

One strong piece of evidence for Rust's strength is the Polars library, which is described as "blazingly fast DataFrames in Rust & Python". I became more convinced after looking at its performance benchmarks, and I believe Polars will be the successor to Pandas. But more than that, this library has two unique features that Pandas doesn't have.

  • One, Polars is natively polyglot: it supports three languages at the same time, with Rust as the native language, a Python binding, and a JavaScript binding (via WebAssembly). I love this capability because my team is very diverse in the languages it uses, so we can use this library from many of them.
  • Two, Polars offers two approaches to handling data at the same time: SQL via DataFusion, and data frames. Both approaches are unified and interchangeable on one great foundation, Apache Arrow, a columnar memory format that allows data to be shared without copying (zero-copy). I think this feature is a productivity booster for data teams, and they will come to love it very soon.
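
To make the zero-copy idea a bit more concrete, here is a minimal sketch in plain Python. It does not use Polars or Arrow at all: a standard-library `array` stands in for one column buffer, and two `memoryview` objects stand in for two consumers (say, a data frame API and a SQL engine) reading the same memory. The names `prices`, `frame_view`, and `query_view` are made up for illustration. Arrow buffers are normally treated as immutable, so the write below is only there to make the sharing visible.

```python
from array import array

# One contiguous column of 64-bit floats, standing in for a column buffer.
prices = array("d", [10.0, 20.0, 30.0, 40.0])

# Two consumers receive views over the same memory, not their own copies.
frame_view = memoryview(prices)
query_view = memoryview(prices)[1:3]  # slicing a memoryview does not copy

# A write through one view is visible through the other,
# which shows that both views point at the same bytes.
frame_view[1] = 25.0

print(query_view.tolist())
print(prices[1])
```

The point of the sketch is only that handing data from one API to another can be a matter of passing a view, not serializing and copying the whole column.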

It is too soon to make bold claims about Rust and its data-tooling ecosystem, because many Rust open-source projects are still in their development and adoption phases. But given the potential I see in Rust, I am willing to bet that Polars will be the successor to Pandas, and that Rust will become a new data engine for the data science community, with very smooth adoption thanks to its Python bindings. Let's validate my guess in 2022.
