#288 Performance benchmarks for Python 3.11 are amazing


By Michael Kennedy and Brian Okken.

Watch the live stream:

Watch on YouTube
About the show

Sponsored by us! Support our work through:

Brian #1: Polars: Lightning-fast DataFrame library for Rust and Python

  • Suggested by several listeners
  • “Polars is a blazingly fast DataFrames library implemented in Rust using Apache Arrow Columnar Format as memory model.
    • Lazy | eager execution
    • Multi-threaded
    • SIMD (Single Instruction/Multiple Data)
    • Query optimization
    • Powerful expression API
    • Rust | Python | ...”
  • The Python API syntax is designed to allow parallel execution while sidestepping GIL issues, for both lazy and eager use cases. From the docs: “Do not kill parallelization”
  • The syntax is very functional and pipeline-esque:

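    import polars as pl

    # Build a lazy query: nothing is read or computed yet
    q = (
        pl.scan_csv("iris.csv")
        .filter(pl.col("sepal_length") > 5)
        .group_by("species")  # spelled .groupby() before Polars 0.19
        .agg(pl.all().sum())
    )

    # collect() optimizes the query plan and then executes it
    df = q.collect()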
  • The Polars User Guide is excellent and appears to be written entirely with Python examples.

  • Includes a 30 min intro video from PyData Global 2021

Michael #2: PSF Survey is out

  • Have a look, their page summarizes it better than my bullet points will.

Brian #3: Gin Config: a lightweight configuration framework for Python

  • Found through Vincent D. Warmerdam’s excellent intro videos on gin on calmcode.io
  • Quickly make parts of your code configurable through a configuration file with the @gin.configurable decorator.
  • It’s an interesting take on config files. (Example from Vincent)

    # simulate.py
    import gin

    @gin.configurable
    def simulate(n_samples):
        ...

    # config.gin (Gin's own binding syntax, not Python)
    simulate.n_samples = 100
  • You can specify:

    • required settings: def simulate(n_samples=gin.REQUIRED)
    • blacklisted settings: @gin.configurable(blacklist=["n_samples"])
    • external configurations (specify values for functions your code is calling)
    • references to other functions: dnn.activation_fn = @tf.nn.tanh
  • Documentation suggests that it is especially useful for machine learning.
  • From motivation section:
    • “Modern ML experiments require configuring a dizzying array of hyperparameters, ranging from small details like learning rates or thresholds all the way to parameters affecting the model architecture.
    • Many choices for representing such configuration (proto buffers, tf.HParams, ParameterContainer, ConfigDict) require that model and experiment parameters are duplicated: at least once in the code where they are defined and used, and again when declaring the set of configurable hyperparameters.
    • Gin provides a lightweight dependency injection driven approach to configuring experiments in a reliable and transparent fashion. It allows functions or classes to be annotated as @gin.configurable, which enables setting their parameters via a simple config file using a clear and powerful syntax. This approach reduces configuration maintenance, while making experiment configuration transparent and easily repeatable.”
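Under the hood, the idea is dependency injection: when a decorated function is called, any argument the caller did not pass is filled in from bindings read out of a config file. The toy sketch below recreates only that core idea in plain Python. `configurable`, `parse_bindings`, `REQUIRED`, and `_BINDINGS` are hypothetical names for illustration, not Gin's API or its implementation.

```python
import ast
import functools
import inspect

# Hypothetical global registry: "function.param" -> value (not Gin's internals)
_BINDINGS: dict[str, object] = {}

# Hypothetical sentinel, loosely mirroring the idea behind gin.REQUIRED
REQUIRED = object()


def parse_bindings(text: str) -> None:
    """Read toy 'func.param = literal' lines into _BINDINGS."""
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, value = (part.strip() for part in line.split("=", 1))
        _BINDINGS[key] = ast.literal_eval(value)


def configurable(fn):
    """Fill in arguments the caller omitted, using _BINDINGS at call time."""
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        passed = sig.bind_partial(*args, **kwargs).arguments
        for name, param in sig.parameters.items():
            if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
                continue
            key = f"{fn.__name__}.{name}"
            if name not in passed and key in _BINDINGS:
                kwargs[name] = _BINDINGS[key]  # injected from the config
        call = sig.bind(*args, **kwargs)
        call.apply_defaults()
        for name, value in call.arguments.items():
            if value is REQUIRED:
                raise TypeError(f"{fn.__name__}.{name} must be set in the config")
        return fn(*call.args, **call.kwargs)

    return wrapper


@configurable
def simulate(n_samples=REQUIRED, seed=0):
    return {"n_samples": n_samples, "seed": seed}


parse_bindings("""
# toy config file contents
simulate.n_samples = 100
""")

print(simulate())             # n_samples comes from the config
print(simulate(n_samples=5))  # an explicit argument wins over the config
```

The parameter is declared once, in the function signature, and the config file only names it. That is the duplication the motivation section above says Gin avoids. Real Gin goes much further, for example with scopes and `@` references to other configurables.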

Michael #4: Performance benchmarks for Python 3.11 are amazing

  • via Eduardo Orochena
  • Performance may be the biggest feature of all
  • New in Python 3.11:
    • task groups in asyncio (asyncio.TaskGroup)
    • fine-grained error locations in tracebacks
    • typing.Self, for annotating methods that return an instance of their own class
  • The "Faster CPython" project aims to speed up the reference implementation.
    • See my interview with Guido and Mark: talkpython.fm/339
    • According to the official figures, Python 3.11 is 10% to 60% faster than Python 3.10.
    • It shows a 1.22x speed-up on the standard benchmark suite.
  • The stable release is due in October.



Joke: Why wouldn't you choose a parrot for your next application?
