Collectors

Collectors are used to gather data from various sources.

Supported options:

  1. RSS Collector
  2. Simple Web Collector
  3. RT Collector

In the administration view, the Preview feature shows the result of a collector configuration without passing the collected items on for further processing in the Assess view. Preview is available for the RSS, Simple Web, and RT collectors.

RSS Collector

RSS Collector enables Taranis AI to collect data from a user-defined RSS feed (see RSS feed details).
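
To illustrate the data an RSS feed provides, the sketch below parses an RSS 2.0 document with Python's standard library and extracts each item's title, link, and summary. It is not Taranis AI's implementation. The sample feed and the function name `parse_rss_items` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only to show the RSS 2.0 structure.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Security Feed</title>
    <item>
      <title>Advisory A</title>
      <link>https://example.org/a</link>
      <description>First advisory.</description>
    </item>
    <item>
      <title>Advisory B</title>
      <link>https://example.org/b</link>
      <description>Second advisory.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(feed_xml: str) -> list:
    """Return title, link and summary for every <item> of an RSS 2.0 feed."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iterfind("./channel/item"):
        items.append(
            {
                "title": item.findtext("title", default=""),
                "link": item.findtext("link", default=""),
                # RSS 2.0 stores the summary in <description>.
                "summary": item.findtext("description", default=""),
            }
        )
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], entry["link"])
```

Each parsed item roughly corresponds to one candidate News Item; the summary is where Digest Splitting looks for additional URLs.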

  • Required fields:
    • FEED_URL
  • Optional fields:
    • USER_AGENT
    • PROXY_SERVER
    • ADDITIONAL_HEADERS (must be valid JSON; adds extra HTTP headers to requests, although not all headers work as expected)
    • CONTENT_LOCATION
    • XPATH
    • TLP_LEVEL
    • REFRESH_INTERVAL (see Bots - refresh_interval)
    • DIGEST_SPLITTING On/Off (creates News Items from the URLs present in the Summary field of the RSS feed)
    • DIGEST_SPLITTING_LIMIT (default: 30)
    • BROWSER_MODE On/Off (see Browser Mode)
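
The field rules above can be checked before saving a configuration. The following is a minimal sketch, assuming the parameters are held as a plain mapping of field names to strings (a hypothetical representation; the function `check_rss_config` is also hypothetical). It enforces the required FEED_URL, verifies that ADDITIONAL_HEADERS is a JSON object, and applies the documented DIGEST_SPLITTING_LIMIT default of 30.

```python
import json

DEFAULT_DIGEST_SPLITTING_LIMIT = 30  # documented default


def check_rss_config(params: dict) -> dict:
    """Validate a hypothetical RSS collector parameter mapping."""
    if not params.get("FEED_URL"):
        raise ValueError("FEED_URL is required")

    headers = {}
    raw_headers = params.get("ADDITIONAL_HEADERS")
    if raw_headers:
        # json.JSONDecodeError is a subclass of ValueError.
        headers = json.loads(raw_headers)
        if not isinstance(headers, dict):
            raise ValueError("ADDITIONAL_HEADERS must be a JSON object")

    # Fall back to the default when the limit is missing or empty.
    limit = int(params.get("DIGEST_SPLITTING_LIMIT") or DEFAULT_DIGEST_SPLITTING_LIMIT)
    return {
        "feed_url": params["FEED_URL"],
        "headers": headers,
        "digest_splitting_limit": limit,
    }


if __name__ == "__main__":
    print(check_rss_config({
        "FEED_URL": "https://example.org/feed.xml",
        "ADDITIONAL_HEADERS": '{"Accept-Language": "en"}',
    }))
```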

Basic configuration

Advanced configuration

The RSS Collector supports the use of XPath for locating elements (see Simple Web Collector, Advanced configuration).

Simple Web Collector

Simple Web Collector enables Taranis AI to collect data using web URLs and XPaths.

  • Required field:
    • WEB_URL
  • Optional fields:
    • USER_AGENT
    • PROXY_SERVER
    • ADDITIONAL_HEADERS
    • XPATH
    • TLP_LEVEL
    • DIGEST_SPLITTING On/Off
    • DIGEST_SPLITTING_LIMIT (default: 30)
    • BROWSER_MODE On/Off (see Browser Mode)

Basic configuration

The simplest way to use this collector is to fill in only the WEB_URL field. Taranis AI then determines on its own which content to collect. This is usually reliable, but the result is not always accurate.

Advanced configuration

When content cannot be collected reliably with the Basic configuration, adding the XPATH attribute (see the tutorial on how to find it) can help. The XPath must point to the exact element that contains the desired data.
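
To illustrate what locating an element by XPath means, the sketch below applies an XPath expression to a small, well-formed page using Python's `xml.etree.ElementTree`. That module supports only a limited XPath subset and cannot parse arbitrary real-world HTML, so Taranis AI's own parsing may differ. The page and the function `extract_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page: only the "content" div should be collected.
PAGE = """<html>
  <body>
    <div id="menu"><a href="/home">Home</a></div>
    <div id="content">
      <h1>Weekly report</h1>
      <p>Only this part should be collected.</p>
    </div>
  </body>
</html>"""


def extract_text(page: str, xpath: str) -> str:
    """Return the text of the first element matching `xpath`."""
    root = ET.fromstring(page)
    element = root.find(xpath)
    if element is None:
        raise LookupError(f"no element matches {xpath!r}")
    # Join all text nodes below the element, skipping whitespace-only fragments.
    return " ".join(part.strip() for part in element.itertext() if part.strip())


if __name__ == "__main__":
    print(extract_text(PAGE, ".//div[@id='content']"))
```

Targeting the precise element matters: a broader expression such as `.//body` would also pull in the menu text.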

RT Collector

RT Collector enables Taranis AI to collect data from a user-defined Request Tracker instance.

  • Required fields:
    • BASE_URL: Base URL of the RT instance (e.g. localhost).
    • RT_TOKEN: User token for the RT instance.
  • Optional fields:
    • ADDITIONAL_HEADERS
    • TLP_LEVEL

Digest Splitting

Digest Splitting lets the user split all URLs found in the located element into individual News Items. The Digest Splitting Limit is the maximum number of URLs that are turned into individual News Items; once the limit is reached, the remaining URLs are dropped. The limit defaults to 30 News Items and can be adjusted by the administrator. Lowering it is useful when collection times out because too many News Items are being created.
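
The splitting rule can be sketched in a few lines: collect the link targets of the located element in document order and keep only the first ones up to the limit. This is a minimal sketch using Python's `html.parser`, not Taranis AI's implementation; the names `split_digest` and `_LinkCollector` are hypothetical, and relative URLs are returned as-is rather than resolved.

```python
from html.parser import HTMLParser


class _LinkCollector(HTMLParser):
    """Collect href values of <a> tags in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def split_digest(element_html: str, limit: int = 30) -> list:
    """Return at most `limit` URLs from the element; the rest are dropped."""
    parser = _LinkCollector()
    parser.feed(element_html)
    parser.close()
    return parser.links[:limit]


if __name__ == "__main__":
    digest = '<div><a href="https://example.org/1">1</a><a href="https://example.org/2">2</a></div>'
    print(split_digest(digest, limit=1))
```

Each returned URL would then be fetched as a separate News Item, which is why a lower limit shortens collection time.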

Browser Mode

Collectors fail to collect content that is only available when JavaScript is executed. In that case, Browser Mode can be turned on. All requests are then made with JavaScript enabled, so collection is slower and can use more resources.