
Tokenize Source Event Data

Moogsoft Onprem tokenizes incoming data. After dividing the data into tokens, Moogsoft Onprem assembles the tokens into an event. This topic covers the tokenizing options so you can control how tokenizing works.

Start and end characters

The first two parsing options are the start and end characters. The square brackets [] are JSON notation for a list, so you can define multiple start and end characters. The system treats all of the data between any start character and any end character as one event.

start: [],
end: ["\n"],

The above example specifies:

  • Nothing is defined in start, and a newline character (\n) is defined as the end character

In the example above, the LAM expects each event to be written as an entire line followed by a newline, and it processes the entire line as one event.

With carefully chosen start and end characters, you can also accept multi-line events.
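To illustrate how start and end characters frame events, here is a minimal Python sketch. It assumes single-character start and end values, and it assumes that text after the last end character is an incomplete event that is held back rather than emitted. The function split_events is hypothetical and is not part of Moogsoft Onprem.

```python
def split_events(stream: str, starts: list[str], ends: list[str]) -> list[str]:
    """Split raw input into events framed by start and end characters.

    With no start characters, each event begins where the previous one ended.
    Text after the last end character is an incomplete event and is not returned.
    """
    events: list[str] = []
    # None means "waiting for a start character"; a list collects the event text.
    current = None if starts else []
    for ch in stream:
        if current is None:
            if ch in starts:
                current = []
            continue
        if ch in ends:
            events.append("".join(current))
            current = None if starts else []
        else:
            current.append(ch)
    return events


# start: [], end: ["\n"] gives one event per line
print(split_events("line one\nline two\npartial", [], ["\n"]))
# ['line one', 'line two']

# Hypothetical multi-line framing: each event runs from "{" to "}"
print(split_events("{first\nevent}{second}", ["{"], ["}"]))
# ['first\nevent', 'second']
```

The second call shows why multi-line events need both a start and an end character: the newline inside the first event is kept because it is not an end character.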

Regular expressions

You can use regular expressions to extract relevant data from the input. Here's an example definition:

parsing:
{
    type: "regexp",
    regexp:
    {
        pattern : "(?m)^START: (.*?)$",
        capture_group: 1,
        tokeniser_type: "delimiters",
        delimiters:
        {
            ignoreQuotes: true,
            stripQuotes: true,
            ignores:    "",
            delimiter:  ["||","\r"]
        }
    }
}
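A rough Python equivalent of this definition follows, offered as a sketch only. Python's re module stands in for the LAM's own regular expression engine, and the delimiters step is reduced to a plain split on "||" and "\r" (the ignoreQuotes and stripQuotes handling is not modeled). The name extract_tokens is hypothetical.

```python
import re

# Values copied from the example definition above.
PATTERN = re.compile(r"(?m)^START: (.*?)$")  # (?m): ^ and $ match at each line
CAPTURE_GROUP = 1
DELIMITERS = ["||", "\r"]


def extract_tokens(raw: str) -> list[list[str]]:
    """Apply the pattern, keep the capture group, then split it on the delimiters."""
    splitter = re.compile("|".join(re.escape(d) for d in DELIMITERS))
    results = []
    for match in PATTERN.finditer(raw):
        captured = match.group(CAPTURE_GROUP)
        results.append(splitter.split(captured))
    return results


raw = "noise\nSTART: host1||CRITICAL||disk full\nmore noise\n"
print(extract_tokens(raw))  # [['host1', 'CRITICAL', 'disk full']]
```

Lines that do not match the pattern are skipped entirely, which is how the regular expression filters relevant data out of noisy input.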

Delimiters

Delimiters define how strings are split into tokens for processing. To process a comma-separated file, where a comma separates each value, define the comma as a delimiter.

Tokens are referenced by position, counting from one (not zero).

For example, for the input string “the,cat,sat,on,the,mat” where the delimiter is a comma, token 1 is “the”, token 2 is “cat”, and so on.
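The one-based numbering can be shown with a short Python sketch; token_at is a hypothetical helper, not a Moogsoft Onprem function.

```python
def token_at(tokens: list[str], position: int) -> str:
    """Return the token at a 1-based position, as used in LAM configuration."""
    if position < 1:
        raise IndexError("token positions start at 1")
    return tokens[position - 1]  # shift to Python's 0-based list index


tokens = "the,cat,sat,on,the,mat".split(",")
print(token_at(tokens, 1))  # the
print(token_at(tokens, 2))  # cat
```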

Combining tokenization and parsing can be complex. For example, if you use a comma delimiter and a token contains a comma, that token is split in two. To avoid this, you can enclose strings in quotes and then define whether quotes are stripped or ignored.

An example delimiters section in a configuration file is as follows:

delimiters:
{
    ignoreQuotes: true,
    stripQuotes: false,
    ignores: "",
    delimiter: [",","\r"]
}

When ignoreQuotes is set to true, all quotes are ignored and input is tokenized on the delimiters only.

When ignoreQuotes is false, delimiting does not occur until the matching end quote is found. This allows tokens to include delimiters. For example, consider the following input, where the delimiter is a comma:

hello world, "goodbye, cruel world"

Tokens found when ignoreQuotes is true (3): [hello world, goodbye, cruel world]

Tokens found when ignoreQuotes is false (2): [hello world, "goodbye, cruel world"]

Set stripQuotes to true to remove start and end quotes from tokens. For example, "hello world" results in a single token: [hello world].

The ignores property is a list of characters to ignore. Ignored characters are never included in tokens.

The delimiter property is the list of valid delimiters used to split strings into tokens.
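The following Python sketch models how the four delimiters settings interact. It is an approximation under stated assumptions, not the LAM's actual tokenizer: ignores is treated as a set of characters that are dropped, setting ignoreQuotes to true makes quote characters ordinary characters that do not group text, and surrounding whitespace is trimmed from each token (the source does not say whether the LAM does this). The function tokenize is hypothetical.

```python
def _finish(token: str, strip_quotes: bool) -> str:
    """Trim whitespace (an assumption) and optionally drop a leading and a trailing quote."""
    token = token.strip()
    if strip_quotes:
        if token.startswith('"'):
            token = token[1:]
        if token.endswith('"'):
            token = token[:-1]
    return token


def tokenize(text: str, delimiters: list[str], ignore_quotes: bool = True,
             strip_quotes: bool = False, ignores: str = "") -> list[str]:
    """Split text into tokens on any of the (possibly multi-character) delimiters."""
    tokens: list[str] = []
    current: list[str] = []
    in_quotes = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in ignores:
            # Ignored characters never reach a token.
            i += 1
            continue
        if ch == '"' and not ignore_quotes:
            # Quotes group text: toggle state and keep the quote in the token.
            in_quotes = not in_quotes
        elif not in_quotes:
            delim = next((d for d in delimiters if d and text.startswith(d, i)), None)
            if delim is not None:
                tokens.append(_finish("".join(current), strip_quotes))
                current = []
                i += len(delim)
                continue
        current.append(ch)
        i += 1
    tokens.append(_finish("".join(current), strip_quotes))
    return tokens


line = 'hello world, "goodbye, cruel world"'
print(tokenize(line, [","], ignore_quotes=True, strip_quotes=True))
# ['hello world', 'goodbye', 'cruel world']
print(tokenize(line, [","], ignore_quotes=False))
# ['hello world', '"goodbye, cruel world"']
```

Checking delimiters with startswith at the current position lets multi-character delimiters such as "||" work alongside single characters such as "\r".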

Mapping

Each event in the file is a positioned collection of tokens. Moogsoft Onprem lets you name these positions. If a line contains a large number of tokens but you are interested in only five or six of them, you can give token 32 a meaningful name instead of remembering that it is token number 32.

variables:
[
    { name: "Identifier", position: 1 },
    { name: "Node", position: 4 },
    { name: "Serial", position: 3 },
    { name: "Manager", position: 6 },
    { name: "AlertGroup", position: 7 },
    { name: "Class", position: 8 },
    { name: "Agent", position: 9 },
    { name: "Severity", position: 5 },
    { name: "Summary", position: 10 },
    { name: "LastOccurrence", position: 1 }
]

The above example specifies:

  • Position 1 is assigned to Identifier, position 4 is assigned to Node, and so on

  • Positions start at 1 and count upward, rather than starting at 0 as array indexes do
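As a sketch of what naming positions achieves, the Python below maps each configured name to the token at its 1-based position, using the variables from the example above. Skipping names whose position is beyond the end of the line is an assumption; the source does not say how the LAM handles short lines. The function name_tokens is hypothetical.

```python
# (name, 1-based position) pairs copied from the variables example above.
VARIABLES = [
    ("Identifier", 1), ("Node", 4), ("Serial", 3), ("Manager", 6),
    ("AlertGroup", 7), ("Class", 8), ("Agent", 9), ("Severity", 5),
    ("Summary", 10), ("LastOccurrence", 1),
]


def name_tokens(tokens: list[str]) -> dict[str, str]:
    """Map each configured name to the token at its 1-based position."""
    named = {}
    for name, position in VARIABLES:
        if position <= len(tokens):  # assumption: missing positions are skipped
            named[name] = tokens[position - 1]
    return named


line = ["1700000000", "x", "serial42", "host1", "MAJOR",
        "mgr", "grp", "cls", "agt", "disk full"]
named = name_tokens(line)
print(named["Node"], named["Serial"])  # host1 serial42
```

Note that Identifier and LastOccurrence both read position 1 in this configuration, so they receive the same token.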

This is important because at the bottom of the configuration file (socket_lam.conf in this example) there is a mapping object. It configures how Moogsoft Onprem assigns values from the parsed tokens to the attributes of the event that is sent to the Message Bus. For example, the mapping object contains a property called rules, which is a list of assignments.

mapping:
{
    catchAll: "overflow",
    rules:
    [
        { name: "signature", rule: "$Node:$Serial" },
        { name: "source_id", rule: "$Node" },
        { name: "external_id", rule: "$Serial" },
        { name: "manager", rule: "$Manager" },
        { name: "source", rule: "$Node" },
        { name: "class", rule: "$Class" },
        { name: "agent", rule: "$LamInstanceName" },
        { name: "agent_location", rule: "$Node" },
        { name: "type", rule: "$AlertGroup" },
        { name: "severity", rule: "$Severity", conversion: "sevConverter" },
        { name: "description", rule: "$Summary" },
        { name: "first_occurred", rule: "$LastOccurrence", conversion: "stringToInt" },
        { name: "agent_time", rule: "$LastOccurrence", conversion: "stringToInt" }
    ]
}

In the example above, the first assignment, name: "signature", rule: "$Node:$Serial", uses $ syntax to reference named tokens. It takes the tokens called Node and Serial, forms a string from the value of Node followed by a colon and the value of Serial, and assigns that string to the signature attribute of the event sent to Moogsoft Onprem.
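The $ substitution can be sketched in Python as follows. This is an illustration, not the LAM's implementation: it assumes that $Name references a simple name, that ${...} delimits a name containing special characters (as described for braces in mapping definitions), and that an unknown name becomes an empty string. The function apply_rule and the sample values are hypothetical.

```python
import re

# $Name for simple names; ${...} for names containing special characters (assumed semantics).
_REFERENCE = re.compile(r"\$\{([^}]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def apply_rule(rule: str, values: dict[str, str]) -> str:
    """Replace each $ reference in rule with the named value."""
    def substitute(match):
        name = match.group(1) if match.group(1) is not None else match.group(2)
        # Assumption: an unknown name becomes an empty string.
        return str(values.get(name, ""))
    return _REFERENCE.sub(substitute, rule)


tokens = {"Node": "acmeSvr01", "Serial": "4711"}
print(apply_rule("$Node:$Serial", tokens))  # acmeSvr01:4711
```

Everything outside a reference, such as the colon, is copied into the result unchanged.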

Define rules like these to cover the base attributes of an event. For reference, the rules in this example cover the minimum set of attributes that Moogsoft Onprem expects in an event.

Using braces within mapping definitions allows you to include URLs and special characters. For example:

mapping:
{
    rules:
    [
        { name: "type", rule: "${https://url}" },
        { name: "type", rule: "${https://url} customText" },
        { name: "type", rule: "${https://url}${keyA\\b\\c}" }
    ]
}

You must escape backslashes (\\), and you cannot embed variables.

Any token that is never referenced in a rule, for example an “enterprise trap number” that is never mapped to an event attribute, is collected with the other unreferenced tokens into a JSON object. That object is placed in the field named by catchAll (overflow in the example above) and passed as part of the event.
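A minimal Python sketch of the catchAll behavior follows. It assumes a token counts as "referenced" when its name appears in any rule through $Name or ${...} syntax; the function build_overflow and the EnterpriseTrapNumber token are hypothetical.

```python
import re

_REFERENCE = re.compile(r"\$\{([^}]*)\}|\$([A-Za-z_][A-Za-z0-9_]*)")


def build_overflow(values: dict, rules: list[str]) -> dict:
    """Return the named values that no rule references, for the catchAll field."""
    referenced = set()
    for rule in rules:
        # findall yields (braced, plain) pairs; exactly one is non-empty per match.
        for braced, plain in _REFERENCE.findall(rule):
            referenced.add(braced or plain)
    return {name: value for name, value in values.items() if name not in referenced}


values = {"Node": "acmeSvr01", "Serial": "4711", "EnterpriseTrapNumber": "6"}
rules = ["$Node:$Serial", "$Node", "$Serial"]
# "overflow" matches catchAll: "overflow" in the mapping example.
event = {"signature": "acmeSvr01:4711", "overflow": build_overflow(values, rules)}
print(event["overflow"])  # {'EnterpriseTrapNumber': '6'}
```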

Custom info mapping

You can define custom_info mapping in LAM configuration files. This allows you to configure a hierarchical structure. An example mapping configuration is:

mapping:
{
    rules:
    [
        { name: "custom_info.eventDetails.branch", rule: "$branch" },
        { name: "custom_info.eventDetails.location", rule: "$location" },
        { name: "custom_info.ticketing.id", rule: "$incident_id" }
    ]
}

This produces the following custom_info structure:

"custom_info": {
    "eventDetails": 
    {
        "branch":"Kingston",
        "location":"KT1 1LF"
    },
    "ticketing": 
    {
        "id":94111
    }
}
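How dotted rule names become nested objects can be sketched in Python. This is an illustration of the idea, not the LAM's code; the function set_dotted is hypothetical, and the values are those from the example above.

```python
def set_dotted(event: dict, dotted_name: str, value) -> None:
    """Assign value at a dotted path, creating intermediate objects as needed."""
    *parents, leaf = dotted_name.split(".")
    node = event
    for key in parents:
        node = node.setdefault(key, {})  # reuse an existing object or create one
    node[leaf] = value


event: dict = {}
set_dotted(event, "custom_info.eventDetails.branch", "Kingston")
set_dotted(event, "custom_info.eventDetails.location", "KT1 1LF")
set_dotted(event, "custom_info.ticketing.id", 94111)
print(event["custom_info"]["eventDetails"])
# {'branch': 'Kingston', 'location': 'KT1 1LF'}
```

Because setdefault reuses an existing object, the two eventDetails rules write into the same nested object rather than overwriting each other.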


Polling LAMs with multiple target support

See Configure Polling LAMs to Poll More Than One Target Data Source.

Filtering

The filter defines whether a LAM uses a LAMbot. A LAMbot moves overflow properties to custom info and performs any actions that are configured in its LAMbot file. The LAMbot processing is defined in the presend property in the filter section of the LAM configuration file.

For example, the SolarWinds LAM configuration file contains this filter section:

filter:
{
    modules : ["CommonUtils.js"],
    presend : "SolarWindsLam.js"
}

This indicates that SolarWindsLam.js processes the events and then sends them to the Message Bus.

If you don’t want to map overflow properties, you can comment out the presend property to bypass the LAMbot and send events straight to the Message Bus. This speeds up processing if you have a high volume of incoming alerts. Alternatively, you can define a custom stream to receive events. See Alert Builder for details.

See LAMbot Configuration for more information on the presend function.

The optional modules property can be used to provide a list of JavaScript files that are loaded into the context of the LAMbot and executed. It allows LAMs to share modules. For example, you can write a generic Syslog processing module that is used in both the Socket LAM and the Logfile LAM. This reduces the need for duplicated code in each LAMbot.

Conversion rules

Conversion rules are used by Moogsoft Onprem to convert received data into a usable format, including severity levels and timestamps.

Severity

The following example looks up the value of severity and returns the mapped integer.

conversions:
{
    sevConverter:
    {
        lookup: "severity",
        input: "STRING",
        output: "INTEGER"
    },
}, 
constants:
{
    severity:
    {
        "CLEAR": 0,
        "INDETERMINATE": 1,
        "WARNING": 2,
        "MINOR": 3,
        "MAJOR": 4,
        "CRITICAL": 5,
        moog_lookup_default: 3
    }
}

In the above example:

  • The sevConverter conversion receives a text value for severity.

  • Its lookup property, "severity", references the table named severity in the constants section.

  • The integer value matching the text value is returned.

  • moog_lookup_default is used to specify a default value when a received event does not map to a listed value.

For example, the text value "MINOR" is received and the integer value 3 is returned.

If moog_lookup_default is not used and a received event severity does not map to a specifically listed value, the event is not processed.
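The lookup can be sketched in Python as below, using the severity table from the example. The sketch assumes an exact, case-sensitive match (the source does not say otherwise), and it models a missing moog_lookup_default by returning None to mean "event not processed". The function sev_converter here is a hypothetical Python stand-in for the configured converter.

```python
from typing import Optional

# Copied from the constants.severity table above.
SEVERITY = {
    "CLEAR": 0, "INDETERMINATE": 1, "WARNING": 2,
    "MINOR": 3, "MAJOR": 4, "CRITICAL": 5,
}


def sev_converter(text: str, table: dict = SEVERITY,
                  default: Optional[int] = 3) -> Optional[int]:
    """Look up text in the table (case-sensitive, an assumption).

    `default` plays the role of moog_lookup_default; pass None to model a table
    without it, in which case None means the event would not be processed.
    """
    if text in table:
        return table[text]
    return default


print(sev_converter("MINOR"))                  # 3
print(sev_converter("UNKNOWN"))                # 3 (moog_lookup_default)
print(sev_converter("UNKNOWN", default=None))  # None
```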

See Severity Reference for more information about the severity levels in Moogsoft Onprem.

Time

Time conversion in Moogsoft Onprem supports the date and time patterns of the Java platform standard API. See Simple Date Format for more information.

Some Unix time formats are supported indirectly: the LAM converts them automatically, and LAM logging at startup indicates any conversion that occurred.

The only PCRE/Perl modifier that is converted automatically is the lone U (ungreedy) modifier. PCRE's -U is not supported; if a pattern contains -U, remove it manually.

You can specify a time zone configuration so the LAM parses the incoming timestamps with the expected time zone. For example:

conversions:
{
    timeUnitConverter:
    {
        timeUnit: "MILLISECONDS",
        input: "STRING",
        output: "INTEGER"
    },
    timeConverter:
    {
        timeFormat: "%Y-%m-%dT%H:%M:%S",
        timeZone: "UTC",
        input: "STRING",
        output: "INTEGER"
    }
}

You can specify the timezone name or abbreviation. See List of TZ Database Time Zones for the full list.
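The effect of the two converters can be approximated in Python. This is a sketch under assumptions, not the LAM's behavior: it assumes both converters produce whole epoch seconds, that timeConverter reads a timestamp with no offset as wall-clock time in the configured zone, and that timeUnit names the unit of the incoming epoch value. Only MILLISECONDS appears in the source; the SECONDS entry is an assumption. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Assumed divisors for timeUnit values; only MILLISECONDS appears in the source.
_UNIT_DIVISORS = {"SECONDS": 1, "MILLISECONDS": 1_000}


def time_converter(text: str, time_format: str = "%Y-%m-%dT%H:%M:%S",
                   zone: tzinfo = timezone.utc) -> int:
    """Parse a timestamp with no offset as wall-clock time in `zone`; return epoch seconds."""
    parsed = datetime.strptime(text, time_format).replace(tzinfo=zone)
    return int(parsed.timestamp())


def time_unit_converter(text: str, time_unit: str = "MILLISECONDS") -> int:
    """Convert an epoch value expressed in `time_unit` to whole epoch seconds."""
    return int(text) // _UNIT_DIVISORS[time_unit]


print(time_converter("2000-01-01T00:00:00"))                # 946684800
print(time_converter("2000-01-01T01:00:00",
                     zone=timezone(timedelta(hours=1))))    # 946684800
print(time_unit_converter("1700000000123"))                 # 1700000000
```

The second call shows why the time zone matters: the same wall-clock text an hour later, read in UTC+1, refers to the same instant. For named zones such as Europe/London, Python 3.9 and later provide zoneinfo.ZoneInfo("Europe/London"), which can be passed as zone when the system has IANA time zone data.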

JSON events

All LAMs can also consume JSON events natively. Each JSON object must sit on its own line, with a line break before and after it, because the LAM expects a whole JSON object after each line break.

Under parsing you have:

end: ["\n"]

For the delimiter you have:

delimiter: ["\r"]

A JSON object is a sequence of attribute/value pairs, and each attribute is used as a name. Under mapping, you must define the attribute builtInMapper: "CJsonDecoder". Before the rules run, it automatically populates all of the values contained in the JSON object.

For example if the JSON object to be parsed was:

{"Node" : "acmeSvr01","Severity":"Major"...}\n

The values available to the rules in the mapping section are then $Node="acmeSvr01", $Severity="Major", and so on.
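A minimal Python sketch of the decoding step follows. It only considers top-level attributes, because the source does not describe how nested objects are exposed to the rules; the function decode_json_event is hypothetical and is not the CJsonDecoder implementation.

```python
import json


def decode_json_event(line: str) -> dict:
    """Decode one newline-terminated JSON object into name/value pairs.

    Only top-level attributes are considered; nested objects are not
    handled specially by this sketch.
    """
    obj = json.loads(line)  # the trailing "\n" is whitespace, which json.loads accepts
    if not isinstance(obj, dict):
        raise ValueError("each line must contain a JSON object")
    return obj


values = decode_json_event('{"Node" : "acmeSvr01","Severity":"Major"}\n')
print(values["Node"], values["Severity"])  # acmeSvr01 Major
```

Each attribute name becomes the name that a rule such as "$Node" refers to.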