Alert Correlation Engine

The Alert Correlation Engine is a core component of the Threat Detection system that analyzes incoming alerts against historical data to identify patterns, relationships, and threat actor behaviors. By automatically linking related events and calculating historical context, it provides security analysts with enriched intelligence, reducing triage time and highlighting recurring false positives or persistent threats.

This engine consists of two primary parts: the real-time correlation matching logic (written in Go) and the correlation rule definitions managed by the backend (written in Java).

Architecture

The correlation process evaluates new alerts against historical data stored in Elasticsearch, applying strict matching criteria to build a comprehensive threat context.

Correlation Matching Logic

The real-time correlation engine (plugins/soc-ai/correlation/correlation.go) queries historical alerts and determines whether they are related to the current alert.

To prevent performance degradation when processing very common alerts, the engine caps the number of historical alerts returned by the Elasticsearch query against ALERT_INDEX_PATTERN at CORRELATION_MAX_ALERTS.
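A minimal sketch of what such a bounded query could look like is shown here. It is not the engine's actual code: the function name buildHistoryQuery, the field names "name" and "@timestamp", and the value 50 are assumptions made for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildHistoryQuery returns an Elasticsearch search body that fetches at most
// maxAlerts historical alerts with the given name, newest first.
// The field names "name" and "@timestamp" are assumptions, not taken from the engine.
func buildHistoryQuery(alertName string, maxAlerts int) map[string]any {
	return map[string]any{
		"size": maxAlerts, // hard cap on the number of returned hits
		"query": map[string]any{
			"term": map[string]any{"name": alertName},
		},
		"sort": []any{
			map[string]any{"@timestamp": map[string]any{"order": "desc"}},
		},
	}
}

func main() {
	const correlationMaxAlerts = 50 // hypothetical value for CORRELATION_MAX_ALERTS

	body, err := json.MarshalIndent(buildHistoryQuery("Brute force attempt", correlationMaxAlerts), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Capping the result size in the query itself, rather than trimming results afterwards, keeps both the Elasticsearch response and the comparison loop bounded for alerts that fire thousands of times.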

Matching Criteria

For a historical alert to be correlated with the current alert, it must first meet the baseline criteria: it must have exactly the same alert name but a different unique ID.

If the baseline is met, the engine evaluates four specific entity relationships between the two alerts, based on matching IPs or users (for example, Adversary IP and Target IP).
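The baseline check and the entity comparison can be sketched as follows. This is an illustration under stated assumptions, not the engine's isAlertRelated function: the Alert struct, its field names, and the choice of adversary/target IP and user as the four compared entities are all hypothetical.

```go
package main

import "fmt"

// Alert holds only the fields this sketch needs; all field names are hypothetical.
type Alert struct {
	ID            string
	Name          string
	AdversaryIP   string
	TargetIP      string
	AdversaryUser string
	TargetUser    string
}

// sameNonEmpty reports whether both values are set and equal, so that two
// alerts with a missing field are not treated as sharing that entity.
func sameNonEmpty(a, b string) bool {
	return a != "" && a == b
}

// matchTypes returns the labels of the entities shared by the two alerts.
// It returns nil when the baseline fails (different name or same ID) or when
// no entity matches.
func matchTypes(current, historical Alert) []string {
	if historical.Name != current.Name || historical.ID == current.ID {
		return nil
	}
	var matches []string
	if sameNonEmpty(current.AdversaryIP, historical.AdversaryIP) {
		matches = append(matches, "Adversary IP")
	}
	if sameNonEmpty(current.TargetIP, historical.TargetIP) {
		matches = append(matches, "Target IP")
	}
	if sameNonEmpty(current.AdversaryUser, historical.AdversaryUser) {
		matches = append(matches, "Adversary User")
	}
	if sameNonEmpty(current.TargetUser, historical.TargetUser) {
		matches = append(matches, "Target User")
	}
	return matches
}

func main() {
	current := Alert{ID: "a-2", Name: "Brute force attempt", AdversaryIP: "203.0.113.7", TargetIP: "10.0.0.5"}
	past := Alert{ID: "a-1", Name: "Brute force attempt", AdversaryIP: "203.0.113.7", TargetIP: "10.0.0.9"}

	fmt.Println(matchTypes(current, past))
}
```

Requiring non-empty values on both sides is a design choice of this sketch: without it, two alerts that both lack a user would be reported as sharing one.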

Alert Classification

When a match is found, the engine analyzes the tags of the historical alerts to determine their past classification. This helps analysts understand if a recurring alert is typically a false positive or an actual incident.

The engine categorizes historical alerts into four distinct classifications based on their first tag:

Possible incident
False positive
Standard alert
Unclassified alert (Fallback for empty or unrecognized tags)
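Classification by first tag can be sketched as below. The four labels come from the list above; the assumption that a tag's text equals the label (compared case-insensitively, ignoring surrounding whitespace) is hypothetical, as is the function name classify.

```go
package main

import (
	"fmt"
	"strings"
)

// Classification labels, as listed in this section.
const (
	PossibleIncident = "Possible incident"
	FalsePositive    = "False positive"
	StandardAlert    = "Standard alert"
	Unclassified     = "Unclassified alert"
)

// classify inspects only the first tag of a historical alert.
// Empty tag lists and unrecognized first tags fall back to Unclassified.
// Assumption: the tag text matches the label, ignoring case and padding.
func classify(tags []string) string {
	if len(tags) == 0 {
		return Unclassified
	}
	switch strings.ToLower(strings.TrimSpace(tags[0])) {
	case "possible incident":
		return PossibleIncident
	case "false positive":
		return FalsePositive
	case "standard alert":
		return StandardAlert
	default:
		return Unclassified
	}
}

func main() {
	fmt.Println(classify([]string{"False positive", "reviewed"}))
	fmt.Println(classify([]string{"reviewed", "False positive"}))
	fmt.Println(classify(nil))
}
```

Because only the first tag is considered, the second call in main falls back to the unclassified label even though a recognized tag appears later in the list.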

Context Generation Process

The engine synthesizes the matched data into a human-readable context string appended to the alert.

The engine queries Elasticsearch for recent alerts with the same name, up to the configured limit.

Each historical alert is compared against the current alert using the isAlertRelated function to find matching IPs or users.

Matches are grouped by their match types (e.g., "Adversary IP and Target IP") and their historical classifications are tallied.

A formatted string is generated summarizing the findings, such as: "In the past, there are X alerts with the same name... Y match the same Adversary IP and of these Z were classified as False positive."
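The grouping, tallying, and formatting steps above can be sketched as follows. The Match type and buildContext function are hypothetical, and the sentence wording only approximates the example string; the engine's exact template is not reproduced here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Match is one correlated historical alert: the entities it shares with the
// current alert and its past classification. Both fields are hypothetical.
type Match struct {
	Types          []string
	Classification string
}

// buildContext groups matches by their combined match types, tallies the
// classifications inside each group, and renders a summary sentence per group.
func buildContext(total int, matches []Match) string {
	// group label -> classification -> count
	tallies := map[string]map[string]int{}
	for _, m := range matches {
		key := strings.Join(m.Types, " and ")
		if tallies[key] == nil {
			tallies[key] = map[string]int{}
		}
		tallies[key][m.Classification]++
	}

	// Sort group labels so the output is deterministic.
	keys := make([]string, 0, len(tallies))
	for k := range tallies {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	fmt.Fprintf(&b, "In the past, there are %d alerts with the same name.", total)
	for _, key := range keys {
		groupTotal := 0
		classes := make([]string, 0, len(tallies[key]))
		for c, n := range tallies[key] {
			groupTotal += n
			classes = append(classes, c)
		}
		sort.Strings(classes)

		parts := make([]string, 0, len(classes))
		for _, c := range classes {
			parts = append(parts, fmt.Sprintf("%d were classified as %s", tallies[key][c], c))
		}
		fmt.Fprintf(&b, " %d match the same %s and of these %s.", groupTotal, key, strings.Join(parts, ", "))
	}
	return b.String()
}

func main() {
	matches := []Match{
		{Types: []string{"Adversary IP"}, Classification: "False positive"},
		{Types: []string{"Adversary IP"}, Classification: "False positive"},
		{Types: []string{"Adversary IP", "Target IP"}, Classification: "Possible incident"},
	}
	fmt.Println(buildContext(5, matches))
}
```

Go map iteration order is randomized, so the sketch sorts both the group labels and the classification names before rendering; otherwise the same input could produce differently ordered context strings.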

Correlation Rules Data Model

The rules that dictate how alerts are generated and grouped are managed by the backend Java application using the UtmCorrelationRules JPA entity.

Entity Structure

The utm_correlation_rules table stores the definitions, metadata, and risk scoring for each correlation rule.

CIA Triad Scoring

Every correlation rule includes mandatory scoring for Confidentiality (ruleConfidentiality), Integrity (ruleIntegrity), and Availability (ruleAvailability). Each value must be an integer from 0 to 3 inclusive, which the entity enforces with @Min and @Max constraints.

@Entity
@Table(name = "utm_correlation_rules")
public class UtmCorrelationRules implements Serializable {
    // Primary key of the rule
    @Id
    private Long id;

    // Required rule name, up to 250 characters
    @Column(name = "rule_name", length = 250, nullable = false)
    private String ruleName;

    // Required adversary type, persisted as the enum constant's name
    @Enumerated(EnumType.STRING)
    @Column(name = "rule_adversary", length = 25, nullable = false)
    private AdversaryType ruleAdversary;

    // CIA Triad (0-3)
    @Min(value = 0) @Max(value = 3)
    private Integer ruleConfidentiality;
    // ... integrity and availability omitted for brevity
}

JSON Serialization Pattern

Several complex fields within the rule definition (such as ruleReferences, afterEvents, groupBy, and deduplicateBy) are stored as JSON strings in the database but exposed as standard Java List objects to the application.

The entity handles this using a specific serialization pattern with @Transient fields and custom getters/setters:

    @JsonIgnore
    @Column(name = "rule_deduplicate_by_def")
    private String deduplicateByDef; // Stored in DB as JSON string

    @Transient
    @JsonSerialize
    @JsonDeserialize
    @Getter(AccessLevel.NONE)
    @Setter(AccessLevel.NONE)
    private List<String> deduplicateBy; // Used by the application

    public List<String> getDeduplicateBy() throws UtmSerializationException {
        if (StringUtils.hasText(deduplicateByDef))
            deduplicateBy = UtilSerializer.jsonDeserializeList(String.class, deduplicateByDef);
        return deduplicateBy == null ? new ArrayList<>() : deduplicateBy;
    }

    public void setDeduplicateBy(List<String> deduplicateBy) throws UtmSerializationException {
        if (CollectionUtils.isEmpty(deduplicateBy))
            this.deduplicateByDef = null;
        else
            this.deduplicateByDef = UtilSerializer.jsonSerialize(deduplicateBy);
        this.deduplicateBy = deduplicateBy;
    }

When interacting with the UtmCorrelationRules entity programmatically, always use the getter/setter methods for list properties (e.g., getGroupBy()) rather than parsing the raw Def string columns directly. The getters deserialize the stored JSON and return an empty list instead of null. Note that they declare UtmSerializationException rather than catching it, so callers must handle or propagate it.

Data Type Relationships

Correlation rules are linked to specific data types via a Many-to-Many relationship. This ensures rules are only evaluated against relevant log sources. This relationship is managed through the utm_group_rules_data_type join table and eagerly fetched via the dataTypes property.

Custom Filters

If you need to extend the correlation engine's capabilities or create custom filters for your specific environment, refer to the Custom Filters documentation in the official UTMStack Wiki, which explains how to create, configure, and maintain custom filters and correlation rules.