Source code for kestrel_datasource_stixshifter.interface

"""The STIX-shifter data source package provides access to data sources via
`stix-shifter`_.

The STIX-shifter interface connects to multiple data sources. Users need to
provide one *profile* per data source. The profile name (case insensitive) will
be used in the ``FROM`` clause of the Kestrel ``GET`` command, e.g., ``newvar =
GET entity-type FROM stixshifter://profilename WHERE ...``. Kestrel runtime
will load profiles from 3 places (the later will override the former):

#. STIX-shifter interface config file (only when a Kestrel session starts):

    Create the STIX-shifter interface config file (YAML):

    - Default path: ``~/.config/kestrel/stixshifter.yaml``.

    - A customized path specified in the environment variable ``KESTREL_STIXSHIFTER_CONFIG``.

    Example of STIX-shifter interface config file containing profiles
    (note that the ``options`` section is not required):

    .. code-block:: yaml

        profiles:
            host101:
                connector: elastic_ecs
                connection:
                    host: elastic.securitylog.company.com
                    port: 9200
                    indices: host101
                    options:  # use any of this section when needed
                        verify_cert: false  # allow invalid/expired/self-signed certificate
                        retrieval_batch_size: 10000  # set to 10000 to match default Elasticsearch page size; Kestrel default across connectors: 2000
                        single_batch_timeout: 120  # increase it if hit 60 seconds (Kestrel default) timeout error for each batch of retrieval
                        cool_down_after_transmission: 2  # seconds to cool down between data source API calls, required by some API such as sentinelone; Kestrel default: 0
                        allow_dev_connector: True  # do not check version of a connector to allow custom/testing connector installed with any version; Kestrel default: False
                        dialects:  # more info: https://github.com/opencybersecurityalliance/stix-shifter/tree/develop/stix_shifter_modules/elastic_ecs#dialects
                          - beats  # need it if the index is created by Filebeat/Winlogbeat/*beat
                config:
                    auth:
                        id: VuaCfGcBCdbkQm-e5aOx
                        api_key: ui2lp2axTNmsyakw9tvNnw
            host102:
                connector: qradar
                connection:
                    host: qradar.securitylog.company.com
                    port: 443
                config:
                    auth:
                        SEC: 123e4567-e89b-12d3-a456-426614174000
            host103:
                connector: cbcloud
                connection:
                    host: cbcloud.securitylog.company.com
                    port: 443
                config:
                    auth:
                        org-key: D5DQRHQP
                        token: HT8EMI32DSIMAQ7DJM
        options:  # this section is not required
            fast_translate:  # use firepit-native translation (Dataframe as vessel) instead of stix-shifter result translation (JSON as vessel) for the following connectors
                - qradar
                - elastic_ecs
            translation_workers_count: 8  # default: 2

    Full specifications for data source profile sections/fields:

    - Connector-specific fields: in `stix-shifter`_, go to ``stix_shifter_modules/connector_name/configuration`` like `elastic_ecs config`_.

    - General fields shared across connectors: in `stix-shifter`_, go to `stix_shifter_modules/lang_en.json`_.

    The stix-shifter YAML config supports expansion of environment variables,
    e.g., ``$HOST101_ID`` and ``$HOST101_KEY`` will be replaced by values from
    the environment variables when the following section of the config loads by
    Kestrel:

    .. code-block:: yaml

        profiles:
            host101:
                config:
                    auth:
                        id: $HOST101_ID
                        api_key: $HOST101_KEY

#. environment variables (only when a Kestrel session starts):

    Three environment variables are required for each profile:

    - ``STIXSHIFTER_PROFILENAME_CONNECTOR``: the STIX-shifter connector name,
      e.g., ``elastic_ecs``.

    - ``STIXSHIFTER_PROFILENAME_CONNECTION``: the STIX-shifter `connection
      <https://github.com/opencybersecurityalliance/stix-shifter/blob/master/OVERVIEW.md#connection>`_
      object in JSON string.

    - ``STIXSHIFTER_PROFILENAME_CONFIG``: the STIX-shifter `configuration
      <https://github.com/opencybersecurityalliance/stix-shifter/blob/master/OVERVIEW.md#configuration>`_
      object in JSON string.

    Example of environment variables for a profile:

    .. code-block:: console

        $ export STIXSHIFTER_HOST101_CONNECTOR=elastic_ecs
        $ export STIXSHIFTER_HOST101_CONNECTION='{"host":"elastic.securitylog.company.com", "port":9200, "indices":"host101"}'
        $ export STIXSHIFTER_HOST101_CONFIG='{"auth":{"id":"VuaCfGcBCdbkQm-e5aOx", "api_key":"ui2lp2axTNmsyakw9tvNnw"}}'

#. any in-session edit through the ``CONFIG`` command.

After added data source profiles into ``stixshifter.yaml``, you can test the data source:

.. code-block:: console

    $ stix-shifter-diag data_source_name

where ``data_source_name`` is any profile named in the ``stixshifter.yaml`` config file, usually used in ``FROM stixshifter://data_source_name`` in the ``GET`` command.

The diagnosis utility will check config, test query translation, try connect to the data source to execute a small and a large query, and retrieve data back. Details of all steps will be printed for diagnosis purpose.

If you launch Kestrel in debug mode, STIX-shifter debug mode is still not
enabled by default. To record debug level logs of STIX-shifter, create
environment variable ``KESTREL_STIXSHIFTER_DEBUG`` with any value.

.. _STIX-shifter: https://github.com/opencybersecurityalliance/stix-shifter
.. _elastic_ecs config: https://github.com/opencybersecurityalliance/stix-shifter/blob/develop/stix_shifter_modules/elastic_ecs/configuration/lang_en.json
.. _stix_shifter_modules/lang_en.json: https://github.com/opencybersecurityalliance/stix-shifter/blob/develop/stix_shifter_modules/lang_en.json

"""

import multiprocessing
from kestrel.datasource import AbstractDataSourceInterface
from kestrel_datasource_stixshifter.config import load_profiles
from kestrel_datasource_stixshifter.query import query_datasource


multiprocessing.set_start_method("spawn", force=True)


[docs]class StixShifterInterface(AbstractDataSourceInterface):
[docs] @staticmethod def schemes(): """STIX-shifter data source interface only supports ``stixshifter://`` scheme.""" return ["stixshifter"]
[docs] @staticmethod def list_data_sources(config): """Get configured data sources from environment variable profiles.""" # CONFIG command is not supported # profiles will be updated according to YAML file and env var config["profiles"] = load_profiles() data_sources = list(config["profiles"].keys()) data_sources.sort() return data_sources
[docs] @staticmethod def query(uri, pattern, session_id, config, store, limit=None): """Query a stixshifter data source.""" return query_datasource(uri, pattern, session_id, config, store, limit)