Favicon Hashing Automates Host Reconnaissance for Penetration Testers

Penetration testers often rely on various techniques to discover all potential targets within a defined scope. While traditional methods like DNS mining are effective, a clever approach leveraging the ubiquitous favicon.ico file can uncover additional hosts that might otherwise be missed. This technique involves hashing the favicon.ico from a known target domain and then using that hash to query Shodan, a search engine for Internet-connected devices, to find other hosts that share the same icon.

The core idea is that many organizations reuse the same favicon.ico across multiple subdomains and servers, so a single favicon hash can reveal a broader attack surface. The process begins with fetching the favicon.ico file from a target website. A short command-line pipeline using curl and Python's mmh3 library then calculates the MurmurHash3 value of the icon. For example, curl -sL https://www.canada.ca/favicon.ico | python -c "import sys, base64, mmh3; print(mmh3.hash(base64.encodebytes(sys.stdin.buffer.read())))" retrieves the icon and prints its hash. Note that the hash is computed over the base64-encoded icon rather than the raw bytes, and that mmh3.hash returns a signed 32-bit integer, so the value may be negative.
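To make the hashing step concrete, here is a minimal sketch of the same calculation in pure Python, for environments where the mmh3 package is not installed. It swaps the mmh3 call for a hand-written MurmurHash3 (x86, 32-bit) routine; the function names murmur3_32 and favicon_hash are illustrative and do not come from the original post.

```python
import base64


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit), returned as a signed integer like mmh3.hash."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF

    def mix(k: int) -> int:
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotate left by 15 bits
        return (k * c2) & mask

    h = seed & mask
    body_len = len(data) - (len(data) % 4)
    # Process the input four bytes at a time, little-endian.
    for i in range(0, body_len, 4):
        h ^= mix(int.from_bytes(data[i:i + 4], "little"))
        h = ((h << 13) | (h >> 19)) & mask  # rotate left by 13 bits
        h = (h * 5 + 0xE6546B64) & mask
    # Fold in the remaining 1 to 3 bytes, if any.
    tail = data[body_len:]
    if tail:
        h ^= mix(int.from_bytes(tail, "little"))
    # Finalization: mix in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    # Convert the unsigned result to a signed 32-bit value.
    return h - 0x100000000 if h & 0x80000000 else h


def favicon_hash(icon_bytes: bytes) -> int:
    """Hash the base64-encoded icon, mirroring the one-liner above."""
    return murmur3_32(base64.encodebytes(icon_bytes))
```

The icon bytes can come from any source, for example a file saved with curl -sL -o favicon.ico; favicon_hash(open("favicon.ico", "rb").read()) should then print the same value as the mmh3-based pipeline.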

Once the hash is obtained, the next step is to ask Shodan for other hosts that serve a favicon with the same hash. In the Shodan web interface, the query is http.favicon.hash:<hash_value>. To automate the process and extract specific fields, Shodan's API is more practical. A curl command can query the API with the obtained hash, like curl -s -k "https://api.shodan.io/shodan/host/search?key=%APIKEY_SHODAN%&query={http.favicon.hash:<hash_value>}", where %APIKEY_SHODAN% (written in Windows environment-variable syntax) stands for a valid Shodan API key. The -k flag disables TLS certificate verification and can normally be omitted.
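The same API call can be sketched in Python using only the standard library. The endpoint and the key and query parameters are taken from the curl command above; the function names build_search_url and search_by_favicon are hypothetical, and the response handling assumes the API returns a JSON document.

```python
import json
import urllib.parse
import urllib.request

SHODAN_SEARCH_URL = "https://api.shodan.io/shodan/host/search"


def build_search_url(api_key: str, favicon_hash: int) -> str:
    """Build the search URL, percent-encoding the query string."""
    params = {"key": api_key, "query": f"http.favicon.hash:{favicon_hash}"}
    return f"{SHODAN_SEARCH_URL}?{urllib.parse.urlencode(params)}"


def search_by_favicon(api_key: str, favicon_hash: int) -> dict:
    """Run the search and return the parsed JSON (needs network access and a valid key)."""
    url = build_search_url(api_key, favicon_hash)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

Building the URL in a separate function keeps the encoding logic easy to check without touching the network or spending API query credits.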

The raw output from the Shodan API can be extensive, with many fields for each matching IP address. For reconnaissance, the primary goal is usually a list of hostnames. To extract these, a command-line JSON processor like jq is invaluable. In a jq filter such as jq -r ".. | arrays[].hostnames?", the .. operator walks every value in the document, arrays[] iterates over the elements of each array it finds, and .hostnames? pulls the hostnames field from each element, silently skipping elements that are not objects. Objects without a hostnames field produce null, which is why null entries show up in the output. This isolates the hostnames from potentially hundreds or thousands of results.
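For readers who prefer to stay in Python, a rough equivalent of the recursive jq search might look like the sketch below. The helper name iter_hostnames and the SAMPLE document are made up for illustration; the sample only imitates the general shape of a search response and uses documentation-range addresses.

```python
from typing import Any, Iterator


def iter_hostnames(node: Any) -> Iterator[str]:
    """Walk a parsed JSON document and yield every string found in a 'hostnames' list."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "hostnames" and isinstance(value, list):
                yield from (h for h in value if isinstance(h, str))
            else:
                yield from iter_hostnames(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_hostnames(item)


# Illustrative data only, not real Shodan output.
SAMPLE = {
    "matches": [
        {"ip_str": "192.0.2.10", "hostnames": ["www.example.com", "example.com"]},
        {"ip_str": "192.0.2.11", "hostnames": []},
        {"ip_str": "192.0.2.12"},
    ],
    "total": 3,
}

if __name__ == "__main__":
    for name in iter_hostnames(SAMPLE):
        print(name)
```

Unlike the jq filter, this version never emits null for records that lack a hostnames field, so no separate null-filtering pass is needed afterwards.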

Further refinement is necessary to create a clean, usable list, because jq prints each hostnames array as JSON. This means filtering out the null entries and stripping the brackets, commas, quotes, and spaces, which grep -v null | tr -d " ,\" []" achieves. Piping the result through sort and uniq then yields a de-duplicated list of hostnames. The author notes that while this process can yield some false positives or variations (e.g., www.example.com vs. example.com), it significantly expands the discovered host inventory.
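The cleanup pipeline can be mirrored in a few lines of Python. This is a sketch of the grep, tr, sort, and uniq steps applied to raw jq output held in a string; the function name clean_hostnames is hypothetical, and unlike the shell version it also drops the blank lines left behind by bare bracket lines.

```python
from typing import List


def clean_hostnames(raw: str) -> List[str]:
    """Turn raw jq output into a sorted, de-duplicated list of hostnames."""
    strip_chars = str.maketrans("", "", ' ,"[]')  # same character set as tr -d
    names = set()
    for line in raw.splitlines():
        if "null" in line:  # same effect as grep -v null
            continue
        name = line.translate(strip_chars)
        if name:  # skip lines that held only brackets
            names.add(name)
    return sorted(names)  # sort plus uniq
```

As with grep -v null, any hostname that happens to contain the substring "null" is discarded as well, which is worth keeping in mind when reviewing the final list.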

With a refined list of hostnames, penetration testers can then employ standard tools like nmap for port scanning and vulnerability assessment. For instance, nmap -sT -p443 --open --resolve-all -iL <hostname_list_file> runs a TCP connect scan against port 443 on every address that each listed hostname resolves to, and reports only the hosts where that port is open. The process can also be extended to collect the IP addresses of the hostnames that resolve successfully, enabling broader scans with tools like masscan to identify all open TCP ports across the discovered infrastructure.
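The step of collecting IP addresses for hostnames that resolve can be sketched with Python's socket module. The function name resolve_all and its lookup parameter are assumptions made for illustration; lookup defaults to socket.getaddrinfo and exists only so the logic can be exercised without live DNS.

```python
import socket
from typing import Callable, Dict, Iterable, Set


def resolve_all(
    hostnames: Iterable[str],
    lookup: Callable = socket.getaddrinfo,
) -> Dict[str, Set[str]]:
    """Map each hostname that resolves to the set of addresses it resolves to."""
    resolved: Dict[str, Set[str]] = {}
    for name in hostnames:
        try:
            infos = lookup(name, None)
        except socket.gaierror:
            continue  # name does not resolve; leave it out
        # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
        addrs = {info[4][0] for info in infos}
        if addrs:
            resolved[name] = addrs
    return resolved
```

The union of the returned address sets can be written one per line to a file and handed to a scanner such as masscan, subject to the engagement's agreed scope.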

This favicon hashing technique, when automated, provides a powerful and efficient method for expanding the scope of penetration tests. It highlights how seemingly minor elements like website icons can be leveraged for significant reconnaissance gains, underscoring the importance of a comprehensive approach to host discovery in cybersecurity assessments. The author plans to detail further DNS-based recon methods in a subsequent post.