Python Analytics Interface

The Python analytics interface executes Python functions as Kestrel analytics.

Use a Python Analytics

Create a profile for each analytics in the Python analytics interface config file (YAML), which is read from one of the following locations:

  • Default path: ~/.config/kestrel/pythonanalytics.yaml.

  • A customized path specified in the environment variable KESTREL_PYTHON_ANALYTICS_CONFIG.

Example of the Python analytics interface config file:

profiles:
    analytics-name-1: # the analytics name to use in the APPLY command
        module: /home/user/kestrel-analytics/analytics/piniponmap/analytics.py
        func: analytics # the analytics function in the module to call
    analytics-name-2:
        module: /home/user/kestrel-analytics/analytics/suspiciousscoring/analytics.py
        func: analytics
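The config lookup and profile layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the interface's actual code: the helper names `resolve_config_path` and `missing_profile_keys` are hypothetical, and the sketch assumes that the environment variable, when set, takes precedence over the default path. The `config` argument stands for the already-parsed YAML content as a plain dict.

```python
import os
from pathlib import Path

# Names taken from the documentation above.
CONFIG_ENV_VAR = "KESTREL_PYTHON_ANALYTICS_CONFIG"
DEFAULT_CONFIG_PATH = Path("~/.config/kestrel/pythonanalytics.yaml")


def resolve_config_path() -> Path:
    """Hypothetical helper: pick the config file location.

    Assumption: a path given in the environment variable overrides the default.
    """
    custom = os.environ.get(CONFIG_ENV_VAR)
    if custom:
        return Path(custom).expanduser()
    return DEFAULT_CONFIG_PATH.expanduser()


def missing_profile_keys(config: dict) -> dict:
    """Hypothetical helper: report profiles that lack "module" or "func".

    Returns a mapping from profile name to the list of missing keys.
    """
    problems = {}
    for name, profile in config.get("profiles", {}).items():
        missing = [key for key in ("module", "func") if key not in profile]
        if missing:
            problems[name] = missing
    return problems
```

An empty result from `missing_profile_keys` means every profile names both the module file and the function to call.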

Develop a Python Analytics

A Python analytics is a Python function that follows these rules:

  1. The function takes in one or more Kestrel variable dumps in Pandas DataFrames.

  2. The return of the function is a tuple containing either or both:

    • Updated variables. The number of variables can be either 0, e.g., visualization analytics, or the same number as input Kestrel variables. The order of the updated variables should follow the same order as input variables.

    • An object to display, which can be any of the following types:

      • Kestrel display object

      • HTML element as a string

      • Matplotlib figure (by default, Pandas DataFrame plots use this)

    The display object can come either before or after the updated variables. In other words, if the input variables are var1, var2, and var3, the return of the analytics can be any of the following:

    # the analytics enriches variables without returning a display object
    return var1_updated, var2_updated, var3_updated
    
    # this is a visualization analytics and no variable updates
    return display_obj
    
    # the analytics does both variable updates and visualization
    return var1_updated, var2_updated, var3_updated, display_obj
    
    # the analytics does both variable updates and visualization (display object first)
    return display_obj, var1_updated, var2_updated, var3_updated
    
  3. Parameters in the APPLY command are passed in as environment variables. The names of the environment variables are the exact parameter keys given in the APPLY command. For example, the following command

    APPLY python://a1 ON var1 WITH XPARAM=src_ref.value, YPARAM=number_observed
    

    creates environment variables $XPARAM with value src_ref.value and $YPARAM with value number_observed to be used by the analytics a1. After the analytics finishes executing, the environment variables are rolled back to their original state.

  4. The Python function can spawn other processes or execute other binaries, in which case the Python function acts only as a wrapper. See the domain name lookup analytics for an example.
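The rules above can be illustrated with a small analytics function. This is a sketch under stated assumptions, not one of the analytics shipped with Kestrel: the parameter key `COLUMN`, the added column name `x_is_frequent`, and the enrichment logic are all hypothetical. It takes one variable as a Pandas DataFrame, reads a parameter from the environment, and returns both an updated variable and an HTML string to display.

```python
import os

import pandas as pd


def analytics(dataframe: pd.DataFrame):
    """Hypothetical analytics: flag rows above the median and summarize them."""
    # Rule 3: APPLY parameters arrive as environment variables.
    # COLUMN is a hypothetical key, e.g. `... WITH COLUMN=number_observed`.
    column = os.environ.get("COLUMN", "number_observed")

    # Rule 2: return an updated copy of the input variable.
    updated = dataframe.copy()
    updated["x_is_frequent"] = updated[column] > updated[column].median()

    # Rule 2: an HTML element as a string is one accepted display object.
    flagged = int(updated["x_is_frequent"].sum())
    display_obj = f"<p>{flagged} of {len(updated)} rows exceed the median {column}.</p>"

    # One updated variable for one input variable, followed by the display object.
    return updated, display_obj
```

A profile would then point `module` at the file containing this function and set `func: analytics`.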

class kestrel_analytics_python.interface.PythonInterface

Bases: AbstractAnalyticsInterface

static schemes()

The Python analytics interface supports only the python:// scheme.

static list_analytics(config)

Load config to list available analytics.

static execute(uri, argument_variables, config, session_id=None, parameters=None)

Execute an analytics.

class kestrel_analytics_python.interface.PythonAnalytics(profile_name, profiles, parameters)

Bases: AbstractContextManager

Handler of a Python analytics.

Use it as a context manager:

with PythonAnalytics(profile_name, profiles, parameters) as func:
    func(input_kestrel_variables)

  1. Validate and retrieve profile data. The data should be a dict with “module” and “func”, plus appropriate values.

  2. Prepare the analytics by loading the module. Also verify the function exists.

  3. Execute the analytics and process its return value, which may contain updated variables, a display object, or both.

  4. Clean the environment.

Parameters
  • profile_name (str) – The name of the profile/analytics.

  • profiles (dict) – Mapping from profile name to profile (dict).

  • parameters (dict) – key-value pairs of parameters.
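The environment handling mentioned above (parameters exported as environment variables, then rolled back once the analytics finishes) can be sketched with a small context manager. This is an illustration of the general technique only, not the implementation of PythonAnalytics; the name `apply_parameters` is hypothetical.

```python
import os
from contextlib import contextmanager


@contextmanager
def apply_parameters(parameters):
    """Hypothetical helper: export parameters as environment variables,
    then restore the previous environment on exit."""
    # Remember the prior value of each key (None means it was not set).
    saved = {key: os.environ.get(key) for key in parameters}
    os.environ.update({key: str(value) for key, value in parameters.items()})
    try:
        yield
    finally:
        # Roll back even if the analytics raised an exception.
        for key, old_value in saved.items():
            if old_value is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old_value
```

Inside the `with` block, an analytics function would see `os.environ["XPARAM"]` and `os.environ["YPARAM"]` as set by the caller; after the block, any earlier values are back in place.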