PROMPT: After completing your Chapter 14 reading, consider the “Intelligence Cycle for NSM” section and how IDS and IPS technologies help you gather data about activities in your network. In your initial post, select one step of the intelligence cycle and discuss how IDS/IPS false positives or negatives could impact your selected step.
CHAPTER 14: Friendly and Threat Intelligence
Abstract
The ability to generate intelligence related to friendly and hostile systems can be the defining factor that makes or breaks an investigation. This chapter begins with an introduction to the traditional intelligence cycle and how it relates to NSM analysis intelligence. Following this, we look at methods for generating friendly intelligence by generating asset data from network scan and leveraging PRADS data. Finally, we examine the types of threat intelligence and discuss some basic methods for researching tactical threat intelligence related to hostile hosts.
Keywords
Network Security Monitoring; Analysis; Intelligence; Threat; Hostile; Friendly; PRADS; nmap; Tactical; Strategic; Intel
CHAPTER CONTENTS
The Intelligence Cycle for NSM
Defining Requirements
Planning
Collection
Processing
Analysis
Dissemination
Generating Friendly Intelligence
The Network Asset History and Physical
Defining a Network Asset Model
Passive Real-time Asset Detection System (PRADS)
Making PRADS Data Actionable
Generating Threat Intelligence
Researching Hostile Hosts
Internal Data Sources
Open Source Intelligence
Researching Hostile Files
Open Source Intelligence
Conclusion
Intelligence has many definitions depending on the application. The definition that most closely aligns to NSM and information security is drawn from Department of Defense Joint Publication 1-02, and says that “intelligence is a product resulting from the collection, processing, integration, evaluation, analysis, and interpretation of available information concerning foreign nations, hostile or potentially hostile forces or elements, or areas of actual or potential operations.”¹
While this definition might not fit perfectly for a traditional SOC performing NSM services (particularly the part about information concerning foreign nations), it does provide the all-important framing required to begin thinking about generating intelligence. The key component of this definition is that intelligence is a product. This doesn’t mean that it is bought or sold for profit, but more specifically, that it is produced from collected data, based upon a specific requirement. This means that an IP address, or the registered owner of that address, or the common characteristics of the network traffic generated by that IP address are not intelligence products. When those things are combined with context through the analysis process and delivered to meet a specific requirement, they become an intelligence product.
Most SOC environments are generally concerned with the development of two types of intelligence products: friendly intelligence and threat intelligence. In this chapter, we will take a look at the traditional intelligence cycle and methods that can be used to generate these intelligence products. This includes the creation of friendly intelligence products, as well as threat products associated with tactical threat intelligence. While reading, you should keep in mind that there are many components to intelligence as a whole, and we are only covering a small subset of that here.
The Intelligence Cycle for NSM
The generation of intelligence products in a SOC requires the coordinated effort of multiple stakeholders within the organization. Because there are so many moving parts to the process, it helps to structure intelligence generation as an organized, repeatable framework. The framework that the government and military intelligence community (IC) have relied on for years is called the Intelligence Cycle.
Depending on the source you reference, the intelligence cycle can be broken down into any number of steps. For the purposes of this book, we will look at a model that uses six steps: defining requirements, planning, collection, processing, analysis, and dissemination. These steps form a cycle that can continually feed itself, ultimately allowing its products to shape how newer products are developed (Figure 14.1).
FIGURE 14.1 The Traditional Intelligence Cycle
Let’s go through each of these steps to illustrate how this cycle applies to the development of friendly and hostile intelligence for NSM.
Defining Requirements
An intelligence product is generated based upon a defined requirement. This requirement is what all other phases of the intelligence cycle are derived from. Just like a movie can’t be produced without a script, an intelligence product can’t be produced without a clearly defined intelligence requirement.
In terms of information security and NSM, that requirement is generally focused on a need for information related to assets you are responsible for protecting (friendly intelligence), or focused on information related to hosts that pose a potential threat to friendly assets (hostile intelligence).
These requirements are, essentially, requests for information and context that can help NSM analysts make judgments relevant to their investigations. This phase is ultimately all about asking the right questions, and those questions depend on whether the intelligence requirement is continual or situational. For instance, the development of a friendly intelligence product is a continual process, meaning that questions should be phrased in a broad, repeatable manner.
Some examples of questions designed to create baselines for friendly communication patterns might be:
What are the normal communication patterns occurring between friendly hosts?
What are the normal communication patterns occurring between sensitive friendly hosts and unknown external entities?
What services are normally provided by friendly hosts?
What is the normal ratio of inbound to outbound communication for friendly hosts?
On the other end of the spectrum, the development of a threat intelligence product is a situational process, meaning that questions are often specific, and designed to generate a single intelligence product for a current investigation:
Has the specific hostile host ever communicated with friendly hosts before, and if so, to what extent?
Is the specific hostile host registered to an ISP where previous hostile activity has originated?
How does the content of the traffic generated by the specific hostile host compare to activity that is known to be associated with currently identified hostile entities?
Can the timing of this specific event be tied to the goals of any particular organization?
Once you have asked the right question, the rest of the cards should begin to fall into place. We will delve further into the nature of friendly and threat intelligence requirements later in their respective sections.
Planning
With an intelligence requirement defined, appropriate planning can ensure that the remaining steps of the intelligence cycle can be completed. This involves planning each of these steps and assigning resources to them. In NSM terms, this means different things for different steps. For instance, during the collection phase this may mean assigning level three analysts (thinking back to our Chapter 1 discussion of classifying analysts) and systems administrators to work with sensors and collection tools. In the processing and analysis phase this may mean assigning level one and two analysts to these processes and sectioning off a portion of their time to work on this task.
Of course, the types of resources, both human and technical, that you assign to these tasks will vary depending upon your environment and the makeup of your technical teams. In larger organizations you may have a separate team specifically for generating intelligence products. In smaller organizations, you might be a one-man show responsible for the entirety of intelligence product creation. No matter how large or small your organization, you can participate in the development of friendly and threat intelligence.
Collection
The collection phase of the intelligence cycle deals with the mechanisms used for collecting the data that supports the outlined requirements. This data will eventually be processed, analyzed, and disseminated as the intelligence product.
In a SOC environment, you may find that your collection needs for intelligence purposes will force you to modify your overall collection plan. For the purposes of continual friendly intelligence collection, this can include the collection of useful statistics, like those discussed in Chapter 11, or the collection of passive real-time asset data, like the data generated with a tool we will discuss later, called PRADS.
When it comes to situational threat intelligence collection, data will typically be collected from existing NSM data sources like FPC or session data. This data will generally be focused on what interaction the potentially hostile entity had with trusted network assets. In addition, open source intelligence gathering processes are utilized to ascertain publicly available information related to the potentially hostile entity. This might include items like information about the registrant of an IP address, or known intelligence surrounding a mysterious suspicious file.
In order for intelligence collection to occur in an efficient manner, collection processes for certain types of data (FPC, PSTR, Session, etc.) should be well-documented and easily accessible.
Processing
Once data has been collected, some types of data must be further processed to become useful for analysis. This can mean a lot of different things for a lot of different types of data.
At a higher level, processing can mean just paring down the collected data set into something more immediately useful. This might mean applying filters to a PCAP file to shrink the total working data set, or selecting log files of only a certain type from a larger log file collection.
At a more granular level, this might mean taking the output from a third party or custom tool and using some BASH commands to format the output of those tools into something more easily readable. In cases where an organization is using a custom tool or database for intelligence collection, it might mean writing queries to insert data into this format, or pull it out of that format into something more easily readable.
Ultimately, processing can sometimes be seen as an extension of collection where collected data is pared down, massaged, and tweaked into a form that is ideal for the analyst.
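As a minimal sketch of this kind of paring down, the Python below keeps only the log lines that mention one host of interest. The sample lines and the function name filter_by_ip are hypothetical and are not taken from any particular tool's output format.

```python
import re


def filter_by_ip(lines, ip):
    """Return only the lines that mention the exact IP address given."""
    # Guard against partial matches, e.g. 172.16.16.10 inside 172.16.16.100.
    pattern = re.compile(r"(?<![\d.])" + re.escape(ip) + r"(?!\.?\d)")
    return [line for line in lines if pattern.search(line)]


# Hypothetical log lines, for illustration only.
sample = [
    "2013-06-01 10:00:01 172.16.16.10 -> 10.0.0.5 tcp/80",
    "2013-06-01 10:00:02 172.16.16.100 -> 10.0.0.5 tcp/443",
    "2013-06-01 10:00:03 10.0.0.5 -> 172.16.16.10 tcp/22",
]

for line in filter_by_ip(sample, "172.16.16.10"):
    print(line)
```

The boundary checks in the pattern matter in practice: a plain substring search for one address would also return lines for any longer address that begins with the same characters.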
Analysis
The analysis phase is where multiple collected and processed items are examined, correlated, and given the necessary context to make them useful. This is where intelligence goes from just being loosely related pieces of data to a finished product that is useful for decision-making.
In the analysis and generation of both friendly and threat intelligence products, the analyst will take the output of several tools and data sources and combine those data points on a per host basis, painting a picture of an individual host. A great deal more intelligence will be available for local hosts, and might allow this picture to include details about the tendencies and normal communication partners of the host. The analysis of potentially hostile hosts will be generated from a much smaller data set, and require the incorporation of open source intelligence into the analysis process.
What ultimately results from this process is the intelligence product, ready to be parsed by the analyst.
Dissemination
In most practical cases, an organization won’t have a dedicated intelligence team, meaning the NSM analysts will be generating intelligence products for their own use. This is a unique advantage, because the consumer of the intelligence will usually be the same person who generated it, or will at least be in the same room or under the same command structure. In the final phase of the intelligence cycle, the intelligence product is disseminated to the individual or group who initially identified the intelligence requirement.
In most cases, the intelligence product is constantly being evaluated and improved. The positive and negative aspects of the final product are critiqued, and this critique goes back into defining intelligence requirements and planning the product creation process. This is what makes this an intelligence cycle, rather than just an intelligence chain.
The remainder of this chapter is devoted to the friendly and threat intelligence products, and ways to generate and obtain that data. While the intelligence framework might not be referenced exclusively, the actions described in these sections will most certainly fit into this framework in a manner that can be adapted to nearly any organization.
Generating Friendly Intelligence
You cannot effectively defend your network if you do not know what is on it, and how it communicates. This statement cannot be emphasized enough. No matter how simple or sophisticated an attack may be, if you don’t know the roles of the devices on your network, especially those where critical data exists, then you won’t be able to effectively identify when an incident has occurred, contain that incident, or eradicate the attacker from the network. That’s why the development of friendly intelligence is so important.
In the context of this book, we present friendly intelligence as a continually evolving product that can be referenced to obtain information about hosts an analyst is responsible for protecting. This information should include everything the analyst needs to aid in the event of an investigation, and should be able to be referenced at any given time. Generally, an analyst might be expected to reference friendly intelligence about a single host any time they are investigating alert data associated with that host. This would typically be when the friendly host appears to be the target of an attack. Because of that, it isn’t uncommon for an analyst to reference this data dozens of times per shift for a variety of hosts. Beyond this, you should also consider that the analysis of friendly intelligence could also result in the manual observance of anomalies that can spawn investigations. Let’s look at a few ways to create friendly intelligence from network data.
The Network Asset History and Physical
When a physician assesses a new patient, the first thing they perform is an evaluation of the medical history and physical condition of the patient. This is called a patient history and physical, or an H&P. This concept provides a useful framework that can be applied to the friendly intelligence of network assets.
The patient history assessment includes current and previous medical conditions that could impact the patient’s current or future health. This also usually includes a history of the patient’s family’s health conditions, so that risk factors for those conditions in the patient can be identified and mitigated.
Shifting this concept to a network asset, we can translate a network asset’s medical history to its connection history. This involves assessing previous communication transactions between the friendly host and other hosts on the network, as well as hosts outside of the network. This connection profiling extends beyond the hosts involved in this communication to the services used by the host, both as a client and a server. If we can assess this connection history, we can make educated guesses about the validity of new connections a friendly host makes in the context of an investigation.
The patient physical exam captures the current state of a patient’s physical health, and measures items such as the patient’s demographic information, their height and weight, their blood pressure, and so on. The product of the physical exam is an overall assessment of a patient’s health. Often physical exams will be conducted with a targeted goal, such as assessments that are completed for the purposes of health insurance, or for clearance to play a sport.
When we think about a friendly network asset in terms of the patient physical exam, we can begin to identify criteria that help define the state of the asset on the network, as opposed to a state of health in a patient. These criteria include items such as the IP address and DNS name of the asset, the VLAN it is located in, the role of the device (workstation, web server, etc.), the operating system architecture of the device, or its physical network location. The product of this assessment on the friendly network asset is a state of its operation on the network, which can be used to make determinations about the activity the host is presenting in the context of an investigation.
Now, we will talk about some methods that can be used to create a network asset H&P. This will include using tools like Nmap to define the “physical exam” portion of an H&P through the creation of an asset model, as well as the use of PRADS to help with the “history” portion of the H&P by collecting passive real-time asset data.
Defining a Network Asset Model
A network asset model is, very simply, a list of every host on your network and the critical information associated with it. This includes things like the host’s IP address, DNS name, general role (server, workstation, router, etc), the services it provides (web server, SSH server, proxy server, etc), and the operating system architecture. This is the most basic form of friendly intelligence, and something all SOC environments should strive to generate.
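To make the shape of such a model concrete, here is a minimal Python sketch of one asset record and a model keyed by IP address. The class name Asset, its field names, and the sample host are assumptions chosen for illustration; they simply mirror the attributes listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One entry in a network asset model (field names are illustrative)."""
    ip: str
    dns_name: str = ""
    role: str = "unknown"          # e.g. server, workstation, router
    services: list = field(default_factory=list)
    os_guess: str = "unknown"      # operating system architecture


def add_asset(model, asset):
    """Store the asset in the model, keyed by IP address."""
    model[asset.ip] = asset
    return model


asset_model = {}
# Hypothetical host, for illustration only.
add_asset(asset_model, Asset(
    ip="172.16.16.10",
    dns_name="web01.example.internal",
    role="server",
    services=["web server", "SSH server"],
    os_guess="Linux",
))

print(asset_model["172.16.16.10"])
```

Keying the model by IP address keeps lookups simple during an investigation, although an environment with heavy DHCP churn might be better served by a different key, such as a hostname or MAC address.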
As you might imagine, there are a number of ways to build a network asset model. Most organizations will employ some form of enterprise asset management software, and this software often has the capacity to provide this data. If that is true for your organization, then that is often the easiest way to get this data to your analysts.
If your organization doesn’t have anything like that in place, then you may be left to generate this type of data yourself. In my experience, there is no discrete formula for creating an asset model. If you walk into a dozen organizations, you will likely find a dozen different methods used to generate the asset model and a dozen more ways to access and view that data. The point of this section isn’t to tell you exactly how to generate this data, because that is something that will really have to be adapted from the technologies that exist in your organization. The goal here is simply to provide an idea of what an asset model looks like, and to provide some idea of how you might start generating this data in the short term.
Caution
Realistically, asset inventories are rarely 100% accurate. In larger organizations with millions of devices, it just isn’t feasible to create asset models that are complete and always up to date. That said, you shouldn’t strive to achieve a 100% solution if it just isn’t possible. In this case, sometimes it’s acceptable to shoot for an 80% solution because it is still 80% better than 0%. If anything, do your best to generate asset models of critical devices that are identified while doing collection planning.
One way to actively generate asset data is through internal port scanning. This can be done with commercial software, or with free software like Nmap. For instance, you can run a basic ping scan with this command:
nmap -sn 172.16.16.0/24
This command will perform a basic ICMP (ping) scan against all hosts in the 172.16.16.0/24 network range, and generate output similar to Figure 14.2.
FIGURE 14.2 Ping Scan Output from Nmap
As you can see in the data shown above, any host that is allowed to respond to ICMP echo request packets will respond with an ICMP echo reply. Assuming all of the hosts on your network are configured to respond to ICMP traffic (or they have an exclusion in a host-based firewall), this should allow you to map the active hosts on the network. The information provided to us is a basic list of IP addresses.
We can take this a step farther by utilizing more advanced scans. A SYN scan will attempt to communicate with any host on the network that has an open TCP port. This command can be used to initiate a SYN scan:
nmap -sS 172.16.16.0/24
This command will send a TCP SYN packet to the top 1000 most commonly used ports of every host on the 172.16.16.0/24 network. The output is shown in Figure 14.3.
FIGURE 14.3 SYN Scan Output from Nmap
This SYN scan gives us a bit more information. So now, in addition to IP addresses of live hosts on the network, we also have a listing of open ports on these devices, which can indicate the services they provide.
We can extend this even farther by using the version detection and operating system fingerprinting features of nmap:
nmap -sV -O 172.16.16.0/24
The command will perform a standard SYN port scan, followed by tests that will attempt to assess the services listening on open ports, and a variety of tests that will attempt to guess the operating system architecture of the device. This output is shown in Figure 14.4.
FIGURE 14.4 Version and Operating System Detection Scan Output
This type of scan will generate quite a bit of additional traffic on the network, but it will help round out the asset model by providing the operating system architecture and helping clarify the services running on open ports.
The data shown in the screenshots above is very easily readable when it is output by Nmap in its default format; however, it isn’t the easiest to search through. We can fix this by forcing Nmap to output its results in a single line format. This format is easily searchable with the grep tool, and very practical for analysts to reference. To force Nmap to output its results in this format, simply add -oG <filename> at the end of any of the commands shown above. In Figure 14.5, we use the grep command to search for data associated with a specific IP address (172.16.16.10) in a file that is generated using this format (data.scan).
FIGURE 14.5 Greppable Nmap Output
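Because each host occupies a single line in this format, the output is also straightforward to parse programmatically. The Python below is a rough sketch that pulls the IP address and open ports out of one line. The sample line is hypothetical, written in the tab-separated layout Nmap uses for -oG output, and the function name is an assumption.

```python
def parse_grepable_line(line):
    """Parse one 'Host:' line of Nmap -oG output.

    Returns (ip, [(port, protocol, service), ...]) for open ports,
    or None for comment lines and anything else that is not a host line.
    """
    if not line.startswith("Host:"):
        return None
    fields = line.rstrip("\n").split("\t")   # -oG fields are tab separated
    ip = fields[0].split()[1]
    open_ports = []
    for item in fields[1:]:
        if not item.startswith("Ports:"):
            continue
        # Each entry looks like port/state/protocol/owner/service/...
        # Caveat: version strings containing commas would need extra care.
        for entry in item[len("Ports:"):].split(","):
            parts = entry.strip().split("/")
            if len(parts) >= 5 and parts[1] == "open":
                open_ports.append((int(parts[0]), parts[2], parts[4]))
    return ip, open_ports


# Hypothetical line, for illustration only.
sample = ("Host: 172.16.16.10 ()\t"
          "Ports: 22/open/tcp//ssh///, 80/open/tcp//http///, "
          "443/closed/tcp//https///")

print(parse_grepable_line(sample))
```

A parser like this could feed an asset model directly, so that the results of each scan update the services recorded for a host rather than sitting in a flat file.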
You should keep in mind that using a scanner like nmap isn’t always the most conclusive way to build friendly intelligence. Most organizations schedule noisy scans like these in the evening, and this creates a scenario where devices might be missed in the scan because they are turned off. This also doesn’t account for mobile devices that are only periodically connected to the network, like laptops that employees take home at night, or laptops belonging to traveling staff. Because of this, intelligence built from network scan data should combine the results of multiple scans taken at different time periods. You may also need to use multiple scan types to ensure that all devices are detected. Generating an asset model with scan data is much more difficult than firing off a single scan and storing the results. It requires a concerted effort and may take quite a bit of finessing in order to get the results you are looking for on a consistent basis.
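Combining scans can be as simple as recording which scan runs each host appeared in. The sketch below assumes each scan has already been reduced to a set of live IP addresses; the scan labels, addresses, and function name are hypothetical.

```python
def merge_scans(scans):
    """Combine {scan_label: set_of_ips} into {ip: [labels where it was seen]}."""
    seen = {}
    for label in sorted(scans):          # process scans in label order
        for ip in scans[label]:
            seen.setdefault(ip, []).append(label)
    return seen


# Hypothetical results from an overnight scan and a daytime scan.
scans = {
    "2013-06-01T02:00": {"172.16.16.10", "172.16.16.20"},
    "2013-06-01T14:00": {"172.16.16.10", "172.16.16.145"},
}

merged = merge_scans(scans)
for ip in sorted(merged):
    print(ip, merged[ip])
```

In this made-up data, 172.16.16.145 shows up only in the daytime scan, which is the pattern you would expect from a workstation or laptop that is powered off overnight.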
No matter how reliable your scan data may seem, it should be combined with another data source that can be used to validate the results. This can be something that is already generated on your network, like DNS transaction logs, or something that is part of your NSM data set, like session data. Chapters 4 and 11 describe some useful techniques for generating friendly host data with session data using SiLK. Another option is to use a passive tool, like PRADS, which we will talk about next.
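One simple way to approach this validation is to compare the set of hosts found by scanning against the set seen in a second source. The sketch below assumes both sources have already been reduced to sets of IP addresses; the function name, result keys, and sample addresses are hypothetical.

```python
def compare_sources(scanned, observed):
    """Compare hosts found by active scanning with hosts seen in another source."""
    return {
        "confirmed": scanned & observed,       # present in both sources
        "scan_only": scanned - observed,       # scanned, but never seen elsewhere
        "observed_only": observed - scanned,   # seen elsewhere, missed by the scan
    }


# Hypothetical host sets, for illustration only.
scanned = {"172.16.16.10", "172.16.16.20"}
observed = {"172.16.16.10", "172.16.16.145"}

result = compare_sources(scanned, observed)
for name in ("confirmed", "scan_only", "observed_only"):
    print(name, sorted(result[name]))
```

Hosts that appear in only one source are the ones worth a closer look, since they point either to a gap in scan coverage or to a device that never generates traffic the second source can see.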
Passive Real-time Asset Detection System (PRADS)
PRADS is a tool that is designed to listen to network traffic and gather data about hosts and services that can be used to map your network. It is based upon two other very successful tools, PADS, the Passive Asset Detection System, and P0f, the passive OS fingerprinting tool. PRADS combines the functionality of these tools into a single service that is effective for building friendly intelligence. It does this by generating data that can be loosely compared to session data that might be used by SiLK or Argus.
PRADS is included in Security Onion by default, so we can examine this data by creating a query in Sguil. We will talk more about Sguil in the next chapter, but if you remember our brief mention of Sguil in Chapter 9, then you know that it is an analyst console that can be used for viewing alerts from detection mechanisms and data from other NSM collection and detection tools.
You can access Sguil by launching the Sguil client from the Security Onion desktop, or by launching the client from another device and connecting remotely. Once there, you can sort the visible alerts by the Event Message column to find PRADS entries. You may notice that Sguil still references PADS for these events, but don’t worry, this is certainly PRADS data. Figure 14.6 shows sample PRADS log entries.
FIGURE 14.6 PRADS Data in Sguil
There are a couple of different types of entries shown in this image. New Asset alerts are generated when a host that hasn’t been seen communicating on the network before is observed. Changed Asset alerts are generated when a host that has been seen before exhibits a communication behavior that hasn’t been observed, such as a new HTTP user agent, or a new service.
To better understand how these determinations are made, let’s look at an example of PRADS log data. In a default Security Onion installation, PRADS runs with a command similar to this one:
prads -i eth1 -c /etc/nsm/<sensor-name>/prads.conf -u sguil -g sguil -L /nsm/sensor_data/<sensor-name>/sancp/ -f /nsm/sensor_data/<sensor-name>/pads.fifo -b 'ip or (vlan and ip)'
The arguments shown here, along with a few other useful PRADS command-line arguments, are:
-b <filter>: Listen to network traffic based upon BPFs.
-c <config file>: The PRADS configuration file.
-D: Run as a daemon.
-f <file>: Logs assets to a FIFO (first in, first out) file.
-g <group>: The group that PRADS will run as.
-i <interface>: The interface to listen on. PRADS will default to the lowest numbered interface if this is not specified.
-L <directory>: Logs cxtracker type output to the specified directory.
-l <file>: Logs assets to a flat file.
-r <file>: Read from a PCAP file instead of listening on the wire.
-u <username>: The user that PRADS will run as.
-v: Increase the verbosity of PRADS output.
In the case of SO, PRADS runs as the Sguil user and listens for data on the wire. Collected data is stored in a FIFO file so that it can be sucked into a database that Sguil can access.
Since most of the runtime options for PRADS in SO are configured with command-line arguments, the only real purpose that prads.conf serves is to identify the home_nets IP range variable (Figure 14.7). This variable tells PRADS which networks contain the assets it should monitor. In most situations you will configure this similarly to the $HOME_NET variable used by Snort or Suricata, since it is used in a similar manner.
FIGURE 14.7 Configuring the home_nets Variable in prads.conf
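The effect of a home network definition can be illustrated with a short membership test. This Python sketch uses the standard ipaddress module and a hypothetical list of ranges; it illustrates the concept only and is not how PRADS itself evaluates home_nets.

```python
import ipaddress


def in_home_nets(ip, home_nets):
    """Return True if the IP address falls inside any of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in home_nets)


# Hypothetical ranges, for illustration only.
home_nets = ["172.16.16.0/24", "10.0.0.0/8"]

print(in_home_nets("172.16.16.145", home_nets))   # inside the first range
print(in_home_nets("23.62.111.152", home_nets))   # outside both ranges
```

A check like this is also handy when post-processing asset data, for example to confirm that every address in a report really belongs to a network you are responsible for.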
PRADS data stored in a database format is really convenient for querying asset data or writing tools that leverage this data, but it isn’t the greatest for viewing it in its raw form. Fortunately, asset data is also stored as a flat text file at /var/log/prads-asset.log. A sample of this file is shown in Figure 14.8.
FIGURE 14.8 The PRADS Log File
The first line of this file defines the format for log entries. This is:
asset,vlan,port,proto,service,[service-info],distance,discovered
These fields break down as such:
Asset: The IP address of the detected asset, which falls within the ranges defined in the home_nets variable
VLAN: The VLAN tag of the asset
Port: The port number of the detected service
Proto: The protocol number of the detected service
Service: The service PRADS has identified as being in use. This can involve the asset interacting with the service as a CLIENT or a SERVER.
Service Info: The fingerprint that matches the identified service, along with its output.
Distance: The distance to the asset based upon a guessed initial time-to-live value
Discovered: The Unix timestamp when the data was collected
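The field layout above can be parsed along these lines. This is a sketch that assumes each entry follows the header format exactly, with the service information wrapped in square brackets. The sample entry is hypothetical and was constructed to match that layout rather than copied from real PRADS output.

```python
import re
from datetime import datetime, timezone

# asset,vlan,port,proto,service,[service-info],distance,discovered
LINE_RE = re.compile(
    r"^(?P<asset>[^,]+),(?P<vlan>\d+),(?P<port>\d+),(?P<proto>\d+),"
    r"(?P<service>[^,]+),\[(?P<info>.*)\],(?P<distance>\d+),(?P<discovered>\d+)$"
)


def parse_asset_line(line):
    """Parse one asset log entry into a dict, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None                      # header line or unexpected format
    rec = match.groupdict()
    for key in ("vlan", "port", "proto", "distance", "discovered"):
        rec[key] = int(rec[key])
    # Convert the Unix timestamp into a timezone-aware UTC datetime.
    rec["discovered_utc"] = datetime.fromtimestamp(rec["discovered"], tz=timezone.utc)
    return rec


# Hypothetical entry, for illustration only.
sample = ("172.16.16.145,0,80,6,CLIENT,"
          "[http:Mozilla/4.0 (compatible; UPnP/1.0; Windows NT/5.1)],1,1372039200")

record = parse_asset_line(sample)
print(record["asset"], record["port"], record["service"], record["discovered_utc"])
```

The service information is matched from the outside in because it is free text that may itself contain commas, which would break a naive split on the comma character.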
Based upon this log data, you can see that PRADS itself doesn’t actually make the determination we saw earlier in Sguil of whether or not an asset is new or changed. PRADS simply logs the data it observes and leaves any additional processing to the user or other third party scripts or applications. This means that the New and Changed Asset alerts we were seeing in Sguil are actually generated by Sguil based on PRADS data, and not by PRADS.
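To illustrate the kind of post-processing a consumer of this data might perform, the sketch below labels each observation as new, changed, or known by remembering what has been seen for each asset. This is a simplified illustration under assumed record fields, and it is not a description of how Sguil actually implements its New and Changed Asset alerts.

```python
def classify(records):
    """Label each record 'new', 'changed', or 'known' based on prior observations."""
    seen = {}      # asset IP -> set of (port, proto, service, info) tuples
    results = []
    for rec in records:
        key = (rec["port"], rec["proto"], rec["service"], rec["info"])
        if rec["asset"] not in seen:
            seen[rec["asset"]] = {key}
            results.append(("new", rec))        # first time this asset is seen
        elif key not in seen[rec["asset"]]:
            seen[rec["asset"]].add(key)
            results.append(("changed", rec))    # known asset, new behavior
        else:
            results.append(("known", rec))      # nothing new to report
    return results


# Hypothetical observations, for illustration only.
records = [
    {"asset": "172.16.16.145", "port": 80, "proto": 6, "service": "CLIENT", "info": "http:BrowserA"},
    {"asset": "172.16.16.145", "port": 80, "proto": 6, "service": "CLIENT", "info": "http:BrowserB"},
    {"asset": "172.16.16.145", "port": 80, "proto": 6, "service": "CLIENT", "info": "http:BrowserA"},
]

for label, rec in classify(records):
    print(label, rec["asset"], rec["info"])
```

A real implementation would need to persist what it has seen between runs, otherwise every asset would look new each time the process restarts.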
Making PRADS Data Actionable
There are a couple of ways that we can use PRADS for friendly intelligence. The first method is to actually use Sguil and its notification of New and Changed assets. As an example, consider Figure 14.9.
FIGURE 14.9 Sguil Query for a Single Host
In the figure above, I’ve made a Sguil query for all of the events related to a single host. This can be done pretty easily in Sguil by right-clicking an event associated with a host, hovering over Quick Query, then Query Event Table, and selecting the SrcIP or DstIP option depending on which IP address you want events for. Here, we see a number of events associated with the host at 172.16.16.145. This includes some Snort alerts, visited URLs, and more PRADS alerts.
Of the PRADS alerts shown, there are four New Asset alerts that show the first time this host has ever connected to each of the individual destination IP addresses listed in the alerts:
Alert ID 4.66: HTTP Connection to 23.62.111.152
Alert ID 4.67: HTTPS Connection to 17.149.32.33
Alert ID 4.68: HTTPS Connection to 17.149.34.62
Alert ID 4.69: NTP Connection to 17.151.16.38
When investigating this event, this provides useful context that can help you immediately determine whether a friendly device has ever connected to a specific remote device. In a case where you are seeing suspicious traffic going to an unknown address, the fact that the friendly device has never communicated with this address before might be an indicator that something suspicious is going on, and more investigation is required.
The figure also includes one Changed Asset alert indicating the use of a new HTTP client user agent string:
Alert ID 4.71: Mozilla/4.0 (compatible; UPnP/1.0; Windows NT/5.1)
This type of context demonstrates that a friendly host is doing something that it has never done before. While this can mean something as simple as a user downloading a new browser, this can also be an indicator of malicious activity. You should take extra notice of devices that begin offering new services, especially when those devices are user workstations that shouldn’t be acting as servers.
At this point, we have the ability to discern any new behavior or change in behavior for a friendly host, which is an incredibly powerful form of friendly intelligence. While it may take some time for PRADS to “learn” your network when you first configure it, eventually, it can provide a wealth of information that would otherwise require a fair bit of session data analysis to accomplish.
Another way to make PRADS data actionable is to use it to define a baseline asset model. Since PRADS stores all of the asset information it collects for assets defined in the home_nets variable, this data can be parsed to show all of the data it has gathered on a per host basis. This is accomplished by using the prads-asset-report script, which is a Perl script that is included with PRADS. This script will take the output from a PRADS asset log file, and output a listing of all of the information it knows about each IP address. If you are using PRADS to log data to /var/log/prads-asset.log, then you can simply run the command prads-asset-report to generate this data. Otherwise, you can specify the location of PRADS asset data by using the -r <file> argument. A sample of this data is shown in Figure 14.10.
FIGURE 14.10 PRADS Asset Report Data
Notice in this output that PRADS also makes its best guess at the operating system architecture of each device. In the figure above, it can only identify a single device. PRADS is able to guess more accurately the more it can observe devices communicating on the network.
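If you would rather build the per-host view yourself, the same grouping can be sketched in Python. The record layout used here (asset,vlan,port,proto,service,[service-info],distance,discovered) is an assumption about the asset log format and should be checked against your own log file; the sample lines are made up for illustration, and build_asset_model is a hypothetical helper, not part of PRADS.

```python
import re
from collections import defaultdict

# Assumed record layout (verify against your own prads-asset.log):
#   asset,vlan,port,proto,service,[service-info],distance,discovered
# The service-info field is bracketed and may itself contain commas,
# so it is matched greedily up to the last "]," on the line.
LINE_RE = re.compile(
    r"^(?P<asset>[^,]+),(?P<vlan>\d+),(?P<port>\d+),(?P<proto>\d+),"
    r"(?P<service>[^,]+),\[(?P<info>.*)\],(?P<distance>\d+),(?P<discovered>\d+)$"
)


def build_asset_model(lines):
    """Group asset records by IP address.

    Returns a dict mapping each asset IP to a set of
    (service, port, service_info) tuples. Lines that do not match the
    assumed layout (header, blanks, anything unexpected) are skipped.
    """
    model = defaultdict(set)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue
        model[match["asset"]].add(
            (match["service"], int(match["port"]), match["info"])
        )
    return dict(model)


# Made-up sample records for illustration only.
SAMPLE_LINES = [
    "asset,vlan,port,proto,service,[service-info],distance,discovered",
    "172.16.16.145,0,80,6,CLIENT,[http:Mozilla/4.0 (compatible; UPnP/1.0; Windows NT/5.1)],0,1372769093",
    "172.16.16.145,0,80,6,CLIENT,[http:Mozilla/5.0 (X11; Linux x86_64) KHTML, like Gecko],0,1372769150",
    "172.16.16.10,0,22,6,SERVER,[ssh:OpenSSH 5.9],1,1372769200",
]

model = build_asset_model(SAMPLE_LINES)
for asset, entries in sorted(model.items()):
    print(asset, sorted(entries))
```

To run it against real data, pass an open file handle instead of SAMPLE_LINES, for example build_asset_model(open("/var/log/prads-asset.log")).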
In some cases it might make the most sense to generate this report regularly and provide it in a format where analysts can access and search it easily. You can save the file that this script generates by adding the -w <filename> argument. In other cases, analysts might have direct access to the PRADS log data, which means they can use the prads-asset-report script itself to generate near real-time data. This can be done on the basis of an individual IP address, using the -i switch like this:
prads-asset-report -i 172.16.16.145
The output of this command is shown in Figure 14.11.
FIGURE 14.11 Searching for Individual IP Addresses in PRADS Asset Data
When generating an asset model from PRADS, remember it is a passive tool that can only report on devices it sees communicate across a sensor boundary. This means that devices that only communicate within a particular network segment and never talk upstream through a link that a sensor is monitoring will never be observed by PRADS. Because of this, you should pair PRADS with another technique like active scanning to ensure that you are accurately defining network assets.
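One simple way to see where passive and active techniques disagree is to compare the two host lists as sets. The sketch below assumes you have already extracted a list of IP addresses from PRADS data and another from an active scan; the coverage_gaps function and the sample addresses are hypothetical.

```python
import ipaddress


def coverage_gaps(passive_hosts, scanned_hosts):
    """Compare passively observed hosts against actively scanned hosts.

    Returns a dict with two sorted lists:
      scan_only    - hosts the scan found but the sensor never saw
      passive_only - hosts the sensor saw but the scan did not find
    """
    passive = set(passive_hosts)
    scanned = set(scanned_hosts)
    return {
        # Sort numerically by address rather than as plain strings.
        "scan_only": sorted(scanned - passive, key=ipaddress.ip_address),
        "passive_only": sorted(passive - scanned, key=ipaddress.ip_address),
    }


# Made-up host lists for illustration.
seen_by_prads = ["172.16.16.145", "172.16.16.10"]
seen_by_scan = ["172.16.16.10", "172.16.16.200", "172.16.16.9"]

gaps = coverage_gaps(seen_by_prads, seen_by_scan)
print(gaps)
```

Hosts in scan_only are candidates for devices that never talk across a monitored link, while hosts in passive_only may have been offline or filtered when the scan ran. Either list is a prompt for follow-up rather than a conclusion.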
PRADS is an incredibly powerful but elegantly simple tool that can be used to build friendly intelligence. Because of its minimal requirements and flexibility, it can find its way into most SOC environments. You can read more about PRADS at http://gamelinux.github.io/prads/.
Generating Threat Intelligence
Once you know your network, you are prepared to begin to know your adversary. With this in mind, we begin to dive into threat intelligence. If you work in information security then you are no stranger to this term. With the prevalence of targeted attacks occurring daily, almost every vendor claims to offer a solution that will allow you to “generate threat intelligence to stop the APT.” While this is typically a bunch of vendor sales garbage gone awry, the generation of threat intelligence is a critical component of analysis in NSM, and pivotal for the success of a SOC.
Threat intelligence is a subset of intelligence as we defined it earlier in this chapter. This subset focuses exclusively on the hostile component of that definition, and seeks to gather data to support the creation of an intelligence product that can be used to make determinations about the nature of the threat. This type of intelligence can be broken down into three subcategories: strategic, operational, and tactical threat intelligence (Figure 14.12).
FIGURE 14.12 Types of Threat Intelligence
Strategic Intelligence is information related to the strategy, policy, and plans of an attacker at a high level. Typically, intelligence collection and analysis at this level is performed only by government or military organizations in response to threats from other governments or militaries. With that said, larger organizations are now developing these capabilities, and some of these organizations now sell strategic intelligence as a service. This type of intelligence is focused on the long-term goals of the force supporting the individual attacker or unit. Artifacts of this type of intelligence can include policy documents, war doctrine, position statements, and government, military, or group objectives.
Operational Intelligence is information related to how an attacker or group of attackers plans and supports the operations that support strategic objectives. This is different from strategic intelligence because it focuses on narrower goals, often more timed for short-term objectives that are only a part of the big picture. While this is, once again, usually more within the purview of government or military organizations, it is common that individual organizations will fall victim to attackers who are performing actions aimed at satisfying operational goals. Because of this, some public organizations will have visibility into these attacks, with an ability to generate operational intelligence. Artifacts of this type of intelligence are similar, but often more focused versions of artifacts used for the creation of strategic intelligence.
Tactical Intelligence refers to the information regarding specific actions taken in conducting operations at the mission or task level. This is where we dive into the tools, tactics, and procedures used by an attacker, and where 99% of SOCs performing NSM will focus their efforts. It is here that the individual actions of an attacker or group of attackers are analyzed and collected. This often includes artifacts such as indicators of compromise (IP addresses, file names, text strings) or listings of attacker specific tools. This intelligence is the most transient, and becomes outdated quickly.
From the Trenches
The discussion of threat intelligence often leads to a discussion of attribution, where the actions of an adversary are actually tied back to a physical person or group. It is important to realize that detection and attribution aren’t the same thing, and because of this, detection indicators and attribution indicators aren’t the same thing. Detection involves discovering incidents, whereas attribution involves tying those incidents back to an actual person or group. While attribution is most certainly a positive thing, it cannot be done successfully without the correlation of strategic, operational, and tactical intelligence data. Generally speaking, this type of intelligence collection and analysis capability is not present within most private sector organizations without an incredibly large amount of visibility or data sharing from other organizations. The collection of indicators of compromise from multiple network attacks to generate tactical intelligence is an achievable goal. However, collecting and analyzing data from other traditional sources such as human intelligence (HUMINT), signals intelligence (SIGINT), and geospatial intelligence (GEOINT) isn’t within the practical capability of most businesses. Furthermore, even organizations that might have this practical capability are often limited in their actions by law.
When analyzing tactical intelligence, the threat will typically begin as an IP address that shows up in an IDS alert or some other detection mechanism. Other times, it may manifest as a suspicious file downloaded by a client. Tactical threat intelligence is generated by researching this data and tying it together in an investigation. The remainder of this chapter is devoted to providing strategies for generating tactical threat intelligence about adversarial items that typically manifest in an NSM environment.
Researching Hostile Hosts
When an alert is generated for suspicious communication between a friendly host and a potentially hostile host, one of the steps an analyst should take is to generate tactical threat intelligence related to the potentially hostile host. After all, the most the IDS alert will typically provide you with is the host’s IP address and a sample of the communication that tripped the alert. In this section we will look at information that can be gained from having only the host’s IP address or a domain name.
Internal Data Sources
The quickest way to obtain information about external and potentially hostile hosts is to examine the internal data sources you already have available. If you are concerned about a potentially hostile host, this is likely because it has already communicated with one of your hosts. If that is the case, then you should have collected some of this data. The questions you want to answer with this data are:
1. Has the hostile host ever communicated with this friendly host before?
2. What is the nature of this host’s communication with the friendly host?
3. Has the hostile host ever communicated with other friendly hosts on the network?
The answers to these questions can lie within different data sources.
Question 1 can be answered easily if you have the appropriate friendly intelligence available, such as the PRADS data we examined earlier. With this in place, you should be able to determine if this is the first time these hosts began communicating, or if it occurred at an earlier time. You might even be able to determine the operating system architecture of the host. If this data isn’t available, then session data is probably the quickest way to get this answer.
Question 2 is something that can only be answered by a data source with a higher level of granularity. While session data can tell you some basics of when the communication occurred and the ports that are in use, it doesn’t provide the depth necessary to accurately describe exactly what is occurring. In some cases, the detection tool that generated the initial alert will provide this detail. Snort and Suricata will typically provide the offending packet that tripped one of their signatures, and tools like Bro will provide as much additional data as you’ve configured it to. In other scenarios, you may need to look to FPC data or PSTR data to find answers. In these cases, packet analysis skills will come in handy.
Answering Question 3 will typically begin with session data, as it is the quickest way to get information pertaining to communication records between hosts. With that said, if you find that communication has occurred between the hostile host and other friendly devices then you will probably want to turn to another data source like FPC or PSTR data to determine the exact nature of the communication. If this data isn’t available, then PRADS data is another way to arrive at an answer.
The internal analysis performed at this level is all about connecting the dots and looking for patterns. At a high level, these patterns might include a hostile host communicating with devices using a specific service, at specific time intervals, or in conjunction with other real world or technical events. At a more granular level, you might find patterns that indicate the hostile host is using a custom C2 protocol, or that the communication is responsible for several clients downloading suspicious files from other hosts.
The combined answers to these three questions will help you build threat intelligence surrounding the behaviors of the hostile host on your network. Often, analyzing the behavior of the hostile host in relation to a single event or communication sequence won’t provide the evidence necessary to further an investigation, but that same analysis applied to communication across the network could be the key to determining whether an incident has occurred.
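Questions 1 and 3 lend themselves to a small sketch over session data. The example below assumes session records have already been exported into simple (start, src, dst, dport) tuples; the Session type, both function names, and every address and timestamp are hypothetical. Question 2 is deliberately left out, since answering it requires packet-level data such as FPC or PSTR rather than session summaries.

```python
from collections import namedtuple

# Hypothetical, simplified session record: start time (epoch seconds),
# source IP, destination IP, destination port.
Session = namedtuple("Session", "start src dst dport")


def first_contact(sessions, friendly, hostile):
    """Question 1: when did these two hosts first communicate?

    Returns the earliest start time, or None if they never communicated.
    """
    times = [s.start for s in sessions if {s.src, s.dst} == {friendly, hostile}]
    return min(times) if times else None


def other_friendly_peers(sessions, hostile, friendly_hosts, exclude):
    """Question 3: which other friendly hosts has the hostile host talked to?"""
    peers = set()
    for s in sessions:
        if hostile in (s.src, s.dst):
            other = s.dst if s.src == hostile else s.src
            if other in friendly_hosts and other != exclude:
                peers.add(other)
    return peers


# Made-up session data for illustration.
sessions = [
    Session(1000, "172.16.16.145", "203.0.113.10", 80),
    Session(2000, "172.16.16.145", "203.0.113.10", 443),
    Session(1500, "203.0.113.10", "172.16.16.20", 22),
    Session(1200, "172.16.16.20", "198.51.100.7", 53),
]
friendly_hosts = {"172.16.16.145", "172.16.16.20"}

first_seen = first_contact(sessions, "172.16.16.145", "203.0.113.10")
others = other_friendly_peers(
    sessions, "203.0.113.10", friendly_hosts, exclude="172.16.16.145"
)
print(first_seen, others)
```

In practice the same questions would be asked of whatever session data tool you already run; the point of the sketch is only that both answers fall out of filtering on the pair of addresses.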
Open Source Intelligence
Once you’ve looked inward, it is time to examine other available intelligence sources. Open source intelligence (OSINT) is a classification given to intelligence that is collected from publicly available resources. In NSM, this typically refers to intelligence gathered from open websites. The key distinction with OSINT is that it allows you to gather information about a hostile entity without ever directly sending packets to them.
Now we will look at a few websites that can be used to perform OSINT research related to IP addresses, domain names, and malicious files. This is a broad topic with a variety of different approaches, and the topic of OSINT research could easily have its own book. If you’d like a much more detailed list of websites that can be used to perform OSINT research, then check out http://www.appliednsm.com/osint-resources.
IP and Domain Registration
The Internet Assigned Numbers Authority (IANA) is a department of the Internet Corporation for Assigned Names and Numbers (ICANN) that is responsible for overseeing the allocation of IP addresses, autonomous system number (ASN) allocation, DNS root zone management, and more. IANA delegates the allocation of addresses based upon region, to 5 individual Regional Internet Registries (RIRs). These organizations are responsible for maintaining records that associate each IP address with its registered owner. They are listed in Table 14.1.
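Table 14.1
Regional Internet Registries
AFRINIC: Africa
APNIC: Asia and the Pacific
ARIN: United States, Canada, and parts of the Caribbean
LACNIC: Latin America and parts of the Caribbean
RIPE NCC: Europe, the Middle East, and parts of Central Asia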
Each of these registries allows you to query them for the registration records associated with an IP address. Figure 14.13 shows the results from querying the ARIN database for the registration records associated with an IP address in the 50.128.0.0/9 range. This was done from http://whois.arin.net/ui/advanced.jsp.
FIGURE 14.13 Querying the ARIN RIR
In this case, we can see that this block of IP addresses is allocated to Comcast. We can also click on links that will provide contact information for representatives at this organization, including abuse, technical, and administrative Points of Contact (POCs). This is useful when you detect a hostile device in IP space that is owned by a reputable company attempting to break into your network. In a lot of cases this will indicate that the hostile device has been compromised by another adversary and is being used as a hop point for launching an attack. When this occurs, it’s a common practice to notify the abuse contact for the organization that the attack appears to be coming from.
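Once a registry lookup gives you an allocation, it can be handy to check which other addresses in your logs fall inside the same block. A minimal sketch using Python's standard ipaddress module follows; the 50.128.0.0/9 range comes from the example above, while the in_block helper and the logged addresses are made up for illustration.

```python
import ipaddress

# The allocation returned by the registry query in the example above.
block = ipaddress.ip_network("50.128.0.0/9")


def in_block(addresses, network):
    """Return the addresses (as given) that fall inside the network."""
    return [a for a in addresses if ipaddress.ip_address(a) in network]


# Made-up addresses standing in for remote hosts pulled from your logs.
logged = ["50.130.12.7", "50.64.0.1", "192.0.2.44"]

same_allocation = in_block(logged, block)
print(same_allocation)
```

Sharing an allocation this large says very little on its own, particularly for an ISP block, so treat a match as a reason to look closer rather than as evidence of a relationship.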
From the Trenches
Notifying other organizations that one of their hosts might be compromised can be a bit of a struggle sometimes. In some cases, the organization won’t believe you, and in some more extreme scenarios, the organization might even accuse you of taking some type of adversarial action against them. Because this is a delicate process, there is proper etiquette involved in notifying someone else that their network might be compromised. This article written by Tom Liston for the SANS Internet Storm Center provides a good overview of some lessons learned from this process: http://www.dshield.org/diary.html?storyid=9325.
In a lot of cases, you will find that an IP address is registered to an ISP. In that case, you may have luck contacting the ISP if someone on their IP address space is attempting to attack your network, but in most cases I’ve experienced, this isn’t usually very fruitful. This is especially true when dealing with ISPs outside of the jurisdiction of the US.
Because IP addresses are divided amongst the five RIRs, you won’t necessarily know which one is responsible for a specific IP until you search for it. Fortunately, if you search for an IP address at an RIR’s website and the RIR isn’t responsible for that IP address, it will point you towards the correct RIR so that you can complete your search there. Another solution is to use a service that will make this determination for you, like Robtex, which we will look at in a moment.
Another useful piece of information that the registry record gives us is the Autonomous System Number (ASN) associated with the IP address. An ASN is a number used to identify a single network or group of networks controlled by a common entity. These are commonly assigned to ISPs, large corporations, or universities. While two IP addresses might be registered to two different entities, their sharing the same ASN might allow you to conclude that there is some relationship between the two addresses, though this is something to be evaluated on a case-by-case basis. You can search for ASN information specifically from each registry.
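If you record the ASN for each address you research, grouping by ASN is a quick way to surface addresses that may be worth comparing. The sketch below assumes the ASN lookups have already been done by hand or with a tool of your choice; group_by_asn is a hypothetical helper, and the addresses and AS numbers are placeholders drawn from ranges reserved for documentation.

```python
from collections import defaultdict


def group_by_asn(ip_to_asn):
    """Group IP addresses by ASN, keeping only ASNs seen more than once.

    ip_to_asn maps an IP address string to the ASN recorded for it.
    """
    groups = defaultdict(list)
    for ip, asn in ip_to_asn.items():
        groups[asn].append(ip)
    return {asn: sorted(ips) for asn, ips in groups.items() if len(ips) > 1}


# Placeholder research notes: addresses and ASNs are for illustration only.
researched = {
    "203.0.113.10": 64496,
    "198.51.100.7": 64496,
    "192.0.2.44": 64500,
}

shared = group_by_asn(researched)
print(shared)
```

As noted above, a shared ASN is only a hint. For a large ISP it may mean nothing more than two unrelated customers of the same provider.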
Just like with IP addresses, researching domain names usually begins with finding the registered owner of the domain. However, it is important to remember to distinguish between an actual physical host and a domain name. IP space is finite and exists with certain limitations. In general, if you see an IP address in your logs then you can usually assume that the data you have collected in relation to that host actually did come from that IP address (at least, for session-oriented communication). You can also have a reasonable amount of faith that the IP address does exist under the ownership of the entity that registered it, even though that machine might have been compromised and be controlled by someone else.
A domain name serves as a pointer to a location. When you see a domain name in your logs, usually because one of your friendly hosts is accessing that domain in some way, the truth is that the domain can be configured to point to any address at any given time. This means that the domain name you are researching from yesterday’s logs might point to a different IP address now. For that matter, the domain you research now may not point to anything later. It is common for attackers to compromise a host and then reassign domain names to the IP addresses of those hosts to serve malware or act in another malicious capacity. When the owner of that host discovers that it has been compromised and eradicates the attacker’s presence, the attacker will reassign the domain name to another compromised IP address. Even further to this point, malware now has the ability to use domain generation algorithms (DGAs) to produce pseudo-random domain names, which the attacker can register for use in command and control. Because of all this, a domain isn’t compromised; the IP address the domain points to is compromised. However, a domain name can be used for malicious purposes. This should be considered when researching potentially malicious domain names.
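Because a domain can point somewhere different from one day to the next, it helps to keep track of what it resolved to at the time your hosts actually used it. The following sketch assumes you have already extracted (timestamp, domain, ip) tuples from your own DNS logging; the function names, the example.test domains, and the addresses are all hypothetical.

```python
from collections import defaultdict


def resolution_history(dns_records):
    """Build a per-domain timeline of IP address changes.

    dns_records is an iterable of (timestamp, domain, ip) tuples.
    Returns a dict mapping each domain to a list of (timestamp, ip)
    entries, one for each time the observed answer changed.
    """
    history = defaultdict(list)
    for ts, domain, ip in sorted(dns_records):
        timeline = history[domain]
        # Only record an entry when the answer differs from the last one.
        if not timeline or timeline[-1][1] != ip:
            timeline.append((ts, ip))
    return dict(history)


def moved_domains(history):
    """Return only the domains that pointed to more than one address."""
    return {d: t for d, t in history.items() if len(t) > 1}


# Made-up observations, deliberately out of time order.
records = [
    (300, "example.test", "198.51.100.7"),
    (100, "example.test", "203.0.113.10"),
    (200, "example.test", "203.0.113.10"),
    (150, "static.example.test", "192.0.2.5"),
]

history = resolution_history(records)
moved = moved_domains(history)
print(moved)
```

When you research a domain from an older alert, a timeline like this tells you which address was actually in play at the time, which may not be the address the domain resolves to today.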
With that said, domain name registration is managed by ICANN, who delegates this authority to domain name registries. Whenever someone registers a domain name, they are required to provide contact information that is associated with this domain. Unfortunately, this process usually involves very little verification, so there is nothing to