The SciPy Ecosystem Should Use Custom Entrypoints More

Summary: Python library developers can declare custom entry_points in their packages. This language feature is a good fit for “plugin discovery”, and it should be more widely used.

Think of a Python library that has some programmatic interface, some protocol, intended to be “pluggable” by other libraries. Sometimes it is useful to search the set of installed Python packages to discover plugins that extend that interface. I recently learned of a nice way to do this that I think should be more widely known and used.

Entrypoints are commonly used for “console scripts”

The most familiar use of the entry_points parameter is defining executables in a Python package. A minimal example of that looks like:

# setup.py
from setuptools import setup

setup(
    name='stuff',
    pymodules=['stuff'],
    entry_points={
        'console_scripts': [
            'my_custom_executable = stuff:main',
            ]
        }
    )

# stuff.py
def main():
    print("Hello world")

$ my_custom_executable
Hello world

Entrypoints can also be used to advertise custom extension points

The Python library intake defines a protocol for “drivers” that can read from some file format or database and return a Python data structure. Intake comes with some drivers included, and external libraries can define their own. Intake compiles a registry of the drivers it can find installed on the system. External libraries can advertise their drivers to intake by including an 'intake.drivers' entrypoint. For example, the library intake_xarray includes a driver for reading zarr files.

setup(
    ...
    entry_points={
        'intake.drivers': [
            'zarr = intake_xarray.xzarr:ZarrSource',
            ...
            ]
        }
    )

To discover this driver, intake uses the small library entrypoints, which provides a simple high-level API for searching all the installed Python packages for a given entrypoint.

>>> import entrypoints
>>> entrypoints.get_group_all('intake.drivers')
[EntryPoint('zarr', 'intake_xarray.xzarr', 'ZarrSource', None), ...]

The same approach is used by nbconvert to discover plugins for exporting Jupyter notebooks to other formats.

setup(
    ...
    entry_points={
        "nbconvert.exporters" : [
            'custom=nbconvert.exporters:TemplateExporter',
            'html=nbconvert.exporters:HTMLExporter',
            'slides=nbconvert.exporters:SlidesExporter',
            'latex=nbconvert.exporters:LatexExporter',
            'pdf=nbconvert.exporters:PDFExporter',
            'markdown=nbconvert.exporters:MarkdownExporter',
            'python=nbconvert.exporters:PythonExporter',
            'rst=nbconvert.exporters:RSTExporter',
            'notebook=nbconvert.exporters:NotebookExporter',
            'asciidoc=nbconvert.exporters:ASCIIDocExporter',
            'script=nbconvert.exporters:ScriptExporter']
        }
    )

Alternatives

Package Naming Convention

In earlier releases, intake used a different approach for driver discovery. It searched the set of all installed packages for ones whose names began with 'intake_'. It imported each one and searched its top-level namespace for classes that inherited from a certain base class. This had several disadvantages:

Drivers had to be packaged in packages named intake_* to be discoverable. This excluded libraries that pre-dated intake from adding discoverable intake drivers.
Drivers had to subclass an object from intake as opposed to duck-typing like one.
The discovery process was slow because it required importing every package named intake_* order to search its contents for subclasses.

Namespace Packages

A data export tool called suitcase uses Python namespace packages. Each participating package is expected to define a namespace package suitcase.X (for some X) and that package is expected to contain callable objects with certain names and signatures. Like the naming convention approach, this excludes pre-existing packages from participating in suitcase’s plugin mechanism. Additionally, namespace packages are fragile: if any package fails to implement namespace packaging correctly, it can break all the other installed suitcase.* packages.

Name Collisions

What if two different protocols happen to use the same entrypoint name? If package authors prefix their entrypoint with their package name, as in intake.drivers and nbconvert.exporters, we’ll avoid this problem.

There is also potential for name collisions within entrypoints. Suppose that in addition to intake_xarray’s zarr reader

setup(
    ...
    entry_points={
        'intake.drivers': [
            'zarr = intake_xarray.xzarr:ZarrSource',
            ...
            ]
        }
    )

I have installed an alternative zarr reader

setup(
    ...
    entry_points={
        'intake.drivers': [
            'zarr = my_alternative_reader:ZarrSource',
            ...
            ]
        }
    )

The function entrypoints.get_group_all('intake.drivers') returns a list with both Entrypoints. It’s up to the library author to decide how what to do from there. Intake resolves this with a configuration file and some command line tools for reviewing the options and specifying priority.

Where else should we use this?

Entrypoints are a good language feature for advertising objects in a library that participate in a plugin mechanism. I have applied it to intake, and I propose applying it to suitcase. There is also discussion about using in more places in the Jupyter ecosystem. Let’s keep an eye out for situations where a custom entrypoint is the best tool for the job.

Thanks to Min RK and Matthias Bussonnier for bringing this feature to my attention in their review of intake#236. Thanks also to Thomas Kluyver for the excellent entrypoints library and to Martin Durant for his work on intake.