Professional OSINT - Germany & EU • blog.network-sec

This is the latest, 2024 Germany & EU version of Open Source Intelligence. All tried and tested.

On a regular basis we conduct OSINT research. It’s incredible, how far you can get, at the same time you need to wonder, how to use this power.

We suggest: Responsibly - at least.

Toolkits / Collections

Bellingcat’s Online Investigation Toolkit

Awesome OSINT

Awesome OSINT
Pretty much the best list we’ve seen, but it became dated lately.

Overview & Desktop Background Poster

Most used Tools

More Tools, Apps & Providers

TheHarvester
Recon-Ng
CTFR
Prepared Smartphone
Social Analyzer
GHunt - now has a Browser Plugin that does the manual cookie extraction process for you
Maltego
BelingCat
Start.me
Sublister
Shodan
Google Dorks
Google (Reverse) Image Search
Yandex (Reverse) Image Search
Tineye Reverse image search.
Cr3dOv3r
Reconspider
https://hunter.io/ - Find company email addresses of people
https://github.com/m8sec/CrossLinked - LinkedIn enum tool
https://github.com/ANG13T/SatIntel - OSINT tool for satellite reconnaissance
https://www.4kdownload.com/ - Windows software (payed) for Youtube Audio and Video download
https://devsnotdeus.medium.com/a-beginners-write-up-to-trace-labs-2020-osint-ctf-for-missing-persons-624077c3c9cb - TraceLabs OSINT CTF Writeup
https://www.offensiveosint.io/

People Data APIs

API	Free Tier?	Free Tier Details	Paid Plans
Clearbit API	Yes (Limited)	50 requests/month for free; people and company lookups	Starts at $99/month for more requests
Pipl API	No	Paid only, but offers a demo to test the API	Contact for pricing
FullContact API	Yes (Limited)	1000 matches/month for free	Pricing based on usage
People Data Labs	Yes (Limited)	1,000 API credits/month	Starts at $99/month
Apollo.io API	Yes (Limited)	Free tier with limited access to the database (up to 50 credits/month)	Paid plans start at $99/month
ZoomInfo API	No	Paid access only, but you can request a demo	Contact for pricing
Crunchbase API	Yes (Limited)	Free for non-commercial use, 50 calls per day	$29/month for higher limits
Hunter.io API	Yes (Limited)	25 searches/month for free	Starts at $49/month
UpLead API	Yes (Limited)	5 credits/month for free	Starts at $99/month
ScrapingBee API	Yes (Limited)	1,000 API calls per month for free	Starts at $29/month
Proxycurl	Yes (Limited)	About 10 free searches to LinkedIn	Starts at $49/month

APIs

https://rapidapi.com/

6 million scrapers, SMS, HLR, Instagram, LinkedIn, 5000 free request per month easily… you can thank us now.

VPN Region checker

For professional purposses, of course.

Our own tool - pretty practical.

By itself already an excellent OSINT tool, by the time of writing this paragraph (2024), probably the most valueable tool left in SocInt. It seems, people still falsely assume, Telegram would be anonymous by default. That’s not the case.

Telemetrio

https://telemetr.io

Telegram channels rating
Channel content search

Great Telegram OSINT tool.

Precautions

We would try to / assume by default:

You’re not the only researcher
Don’t share your details
Possibly use burner number, or don’t join certain channels, instead use screenshots

Most of this stuff is real - draw a clear line between OSINT (passive investigation) and HUMINT (active investigation, human interaction), or be prepared for precautions in the worst case. If you want to be perceived as harmless - do no harm. We like the Gorilla researcher in the jungle, as mental image. Observe, behold, respect the Prime Directive*. People will remember positive and negative behaviour of yours, the word might carry on for miles and centuries.

Techniques

Keyword Search: To start a research in Telegram, all you need is a keyword.
Lets use a media example (seen in a documentary), that’s rather entry level: #secretrave
Try all variants:
- hashtag
- @ for account names
- clear text search
Web search: t.me links are often indexed

We instantly find:

Public chat channels related to the topic
Links to accounts, other channels and external links (websites, WhatsApp groups, etc.)
Media: Photos, Videos, Audio messages, … No other social media plattform does this anymore (unless you’re really careless)

We can use the results to facilitate our investigation.

How to find initial keywords?

Once upon a time, an important person told me:
“These days, all you really need, is a link”.

He wasn’t refering to HTML links in general, but rather a keyword, a term, the magic word to open the cave. When you never heard, what a CTF is, an advert for TryHackMe or HTB might stare in your face for weeks, you’ll remain blind to it.

Once you acquired that tiny piece of information, the rest usually is a piece of cake - because other people, regular people, not Hackers, need to find the way in the first place.

It’s still the hardest part. Here’s a list of possibilities:

Documentaries: Often journalists did the legwork for you. If you got zero starting points and you aren’t the type of person that hits the street first, you could start there. Led us to two instant, positive findings on the example topic
Geo Detective: Documentaries and personal channels blur out what they think would be incriminating. With some practice you can identify most places from the background in a single screenshot, also usernames, popups of a Discord or Skype message on a screen recording, with a username of the friend of your target.
Bonus 1: User provided Street View images & Street View 360° for indoor id’ing, floor, desk, person… works of course only for public places
Bonus 2: Google Earth in VR - try it, become addicted!
Guess Work: Just make a list of terms around the topic, in the local language of your search area and one in english. Acquiring cultural background info can be as important. Try to pick up on slang terms that are recently being used
Street Work: Go out, ask around. Simply join a group for a while, walk with them, have a chat. Sometimes it’s enough to ask the next person on a bus stop, be prepared that they immediately throw the link in your face and don’t say anything else
Telephone: Dial a few numbers in the general area of interest, ask “who could know”, follow the pointers. A day on the phone can do increadible things and it’s the primary source for journalists
Ask as generic as possible: Inexperienced people may be puzzled how far you can get with a what’s up? or where…erm? (literally). If you don’t say “it”, people assume you’re talking about the secret

Telepathy

https://github.com/proseltd/Telepathy-Community/
https://telepathydb.com/
https://www.prose.ltd/

Basic Telegram Enumeration tool. Pro / Company version currently in Beta, 14-Day free trial via Email registration.

bash

    ______     __                 __  __
    /_  __/__  / /__  ____  ____ _/ /_/ /_  __  __
    / / / _ \/ / _ \/ __ \/ __ `/ __/ __ \/ / / /
    / / /  __/ /  __/ /_/ / /_/ / /_/ / / / /_/ /
  /_/  \___/_/\___/ .___/\__,_/\__/_/ /_/\__, /
                  /_/                    /____/
  -- An OSINT toolkit for investigating Telegram chats.
  -- Developed by @jordanwildon | Version 2.3.2.

Basic Channel Enum

bash

$ telepathy -t channelname

Grab all Channel Messages

bash

$ telepathy -t channelname -c

 [!] Performing comprehensive scan
 [-] Fetching details for channelname...
  ┬  Chat details
  ├  Title: channelname
  ├  Description: "The channel was created for cybersecurity
                  specialists.• Offensive Security• RedTeam• Malware
                  Research• BugBounty• OSINT•
  ├  Translated Description: N/A
  ├  Total participants: 20049
  ├  Username: channelname
  ├  URL: http://t.me/channelname
  ├  Chat type: Channel
  ├  Chat id: 1w435634551
  ├  Access hash: -34645856834524
  ├  Scam: False
  ├  First post date: 2020-12-29T12:04:13+00:00
  └  Restrictions: None
 [!] Calculating number of messages...
 [-] Fetching message archive...
 [-] Progress:  |██████              | ▅▃▁ 2711/8999 [30%] in 22:58 (2.0/s, eta: 53:15)

Search for User Details

The params can be a bit confusing.

bash

$ telepathy -u -t @username

TeleRecon

More username focussed - sadly you need to have already a good chunk of information, which groups a user visits, to make full use of this tool.

bash

 __________________________________________________________________
   _______ ______ _      ______ _____  ______ _____ ____  _   _
  |__   __|  ____| |    |  ____|  __ \|  ____/ ____/ __ \| \ | |
     | |  | |__  | |    | |__  | |__) | |__ | |   | |  | |  \| |
     | |  |  __| | |    |  __| |  _  /|  __|| |   | |  | | . ` |
     | |  | |____| |____| |____| | \ \| |___| |___| |__| | |\  |
     |_|  |______|______|______|_|  \_\______\_____\____/|_| \_| v2.1
___________________________________________________________________


Welcome to Telerecon, a scraper and reconnaissance framework for Telegram

Please select an option:

1. Get user information
2. Check user activity across a list of channels
3. Collect user messages from a target channel
4. Collect user messages from a list of target channels
5. Scrape all messages within a channel
6. Scrape all t.me URL’s from within a channel
7. Scrape forwarding relationships into target channel
8. Scrape forwarding relationships into a list of target channel
9. Identify possible user associates via interaction network map
10. Parse user messages to extract selectors/intel
11. Extract GPS data from collected user media
12. Create visualization report from collected user messages
13. Extract named entities from collected user messages
14. Conduct a subscriber census across a list of target channels
15. Parse user messages to extract ideological indicators
16. Parse user messages to extract indicators of capability and violent intent

More Telegram Tools

https://github.com/cqcore/Telegram-OSINT - List of Telegram OSINT TTP in detail
https://github.com/cipher387/osint_stuff_tool_collection#telegram - List of TTP and search engines
https://github.com/thomasjjj/Telegram-Snowball-Sampling
https://www.telegramdb.org/search / @tgdb_bot
https://xtea.io/ts_en.html
https://tdirectory.me/ - Good browser-based channel and group search
https://tgram.io/ - Supports post content search
https://tgstat.com/channels/search
https://github.com/sockysec/Telerecon - Impressive cli OSINT tool
https://lyzem.com/
https://cse.google.com/cse?cx=006368593537057042503%3Aefxu7xprihg#gsc.tab=0
https://github.com/tdlib/telegram-bot-api
https://github.com/estebanpdl/telegram-tracker

Lots of the search engine bots are either in Russian Language, limit search results to a hand-full, or require payment, sadly. Python tools work well, if you manage to grab an API Key - we had success with this combo:

IP address matching phone number country code (with or without VPN)
Blank Firefox (Incognito Window, no add-ons)
Some patience and luck

If you get the non-descriptive ERROR trying to register a new app - it seems mostly those factors, plus a bit of luck / timing. Keep trying.

GeoMint

https://github.com/Alb-310/Geogramint

You need to set a profile picture and disable profile pic privacy, to make it work.
Then you can search via map or command line.

Geomint Telegram OSINT Geolocation

Google Dorks

Exploit-DB Google Hacking by OffSec
Issue with Google however is that it has grown less usable each year, throwing pages of “top results” at you, usually SEO overoptimized articles, while the niche stuff that used to sit at page 984 no longer exists.

Google Dork Generator

https://pentest-tools.com/information-gathering/google-hacking

Amazing, especially for less technical researchers.

How to find names in documents

"[name]" filetype:[filetype] OR filetype:[filetype] OR filetype:[filetype]

How to find comments from username

site:[socialmedia site] intext:[username]

Tip: Use other search engines

Youtube being one of them. Yahoo, Bing, Yandex, will give you a lot more than just Google. However, you need to Google, too, especially in CTFs the creators rely heavily on what Google puts on page 1 for a specific search term (like a blog article about a vulnerability).

CTFR

Finds subdomains based on SSL-Certificates. Supposed to be very silent, doesn’t touch target network

bash

$ python3 ctfr.py -d network-sec.de -o network-sec_subdomains.txt

Create Account images

https://imagine.meta.com (Needs USA IP, try New York)
https://this-person-does-not-exist.com/
https://www.playform.io/facemix
https://futuresight.org/webprojects/peoplemixer

CrossLinked

https://github.com/m8sec/CrossLinked
Basically a username / email / AD login generator. I’m not sure what it can do beyond that.

It worked reasonably well for one real test case, which we can’t show due to data protection.

bash

$ python3 crosslinked.py test_company -f '{first}.{last}@test_company.com'
$ cat names.txt
Max.Mustermann@test_company.com
Mille.Mustard@test_company.com
[...]

Without needing an API key or linkedIn account login, we managed to grab ~300 names. The company should have around 10k employees, though. We were able to verify, that at least a handful of the names were correct.

To conclude, the tool is awesome for a usual Pentesting & Redteaming gig, when time is a factor and you’re satisfied with a few hundred names to try bruteforce something. However, if you aim for precission, search for i.e. technical administrators, or need a tool that can do more, like enumerating relationships, showing the connections behind the curtain, this is not it.

Instagram Downloader

https://fastdl.app/

Download Instagram

Custom code approach.

This is not 100% guaranteed to work, if you’re not a tech guy it may be easier for you to just go on a click marathon.

Use F12 to open the console, switch to the actual console tab (second from the left in Firefox), make sure everything is off, except for Requests (Anfragen). Keep the console open while you browse the gallery, slowly scroll down and watch the GET requests with links appear in the console. Maybe scroll up again. In the end, just select all and hit Ctrl+c copy, paste them into a file.

You need to be fast between link collection and download attemt. Instagram may still block you, 403 - especially for high-res links. Keeping the url trash after the ”?” will increase chances for succesfull download but you won’t get the original filename and a lower resolution. Pic backgrounds for videos won’t download.

If you got an OSINT instagram account, you can add your cookie to wget —header=“Cookie: …” - that should work in all cases. Burp Suite is another safe bet.

bash

# This will clean the links, select the highest resolution and keep only unique items
$ cat pic_unclean.txt | grep -oPe "https:[^\" \?]*" | sort -u > pic_urls.txt

# This will download the previously created links
$ for line in $(cat pic_urls.txt); do wget $line; done

# If your download is a mess (strange filenames, no jpg extension) try this instead
$ i=0; for line in $(cat pic_urls.txt); do wget -O "$i.jpg" $line; i=$(($i + 1)); done

# Another attempt, if you get 403 with the previous attempt 
$ i=1001; for line in $(cat pic_unclean.txt | grep -oPe "https:[^\" ]*" | sort -u); do wget $line -O"$i.jpg"; i=$(($i+1));  done

SocInt Search Engine

https://www.flaru.com/

SatInt

https://systemweakness.com/satellite-osint-space-based-intelligence-in-cybersecurity-e87f9dca4d81

Haven’t tried yet, but we know for a fact that lots of satellites transmit unencrypted data. The only thing that stops you from, i.e. receiving images from a weather satellite, is the appropriate hardware. That can be impossible to get for a private person or small company, beyond a proof of concept where you receive a single image with a self-made antenna made from a Pringles box. On top, the value of such data is questionable, as the web is full of free APIs and high quality data.

But it’s a fun topic and you can easily build a correctly calculated antenna, connect it to your SDR and enjoy another miracle in our eternal universe. https://www.youtube.com/watch?v=yzLUsi8MsRQ

Prepared Smartphone

Youtube - Using Mobile Apps to Leverage OSINT Investigations

Phone Anti-Spam Apps - Reverse Phone Number Lookup

Hiya
Truecaller
RoboKiller
roboshield

Warning: Some of the following sites show NSFW ads.
There are more, some you can use by adding the number to the url, e.g.

https://www.numlookup.com/
https://spydialer.com/
https://www.emobiletracker.com/
https://getcontact.com/
https://www.nomorobo.com/lookup/1234-567-890
https://www.thisnumber.com/1234-567-890
https://www.fastpeoplesearch.com/1234-567-890
https://callapp.com/search
https://freecarrierlookup.com/
https://www.reversephonecheck.com/1-234/567/89/?number=01
https://sync.me/search/?number=123456789 (Requires Login)
https://www.ipqualityscore.com/free-phone-number-lookup (registration is worth it)
http://thefeedfoundation.org/srci/732/mid/680/p-8880 (great for automation, check how the number is put into the url)

Pro Tipp: Many of these apps have a Website with a search bar - for some you need the actual Android app.

Twilio API

Not the best in terms of Reverse Phone Lookup but can get some extra infos. Twilio is an interesting plattform anyways, so…

bash

$ curl -X GET 'https://lookups.twilio.com/v2/PhoneNumbers/{Number}' -u '<SID>:<AUTH_TOKEN>'

# More fields
$ curl -X GET 'https://lookups.twilio.com/v2/PhoneNumbers/{Number}?Fields=sim_swap%2Ccall_forwarding' -u '<SID>:<AUTH_TOKEN>'

# Carrier - Didn't work for our test number, while SpyDialer could identify the carrier
$ curl -X GET 'https://lookups.twilio.com/v2/PhoneNumbers/{Number}?Type=carrier' -u '<SID>:<AUTH_TOKEN>'

APILayer

Someone left this API key in his OSINT tool on GitHub, so whatever…

bash

$ curl http://apilayer.net/api/validate?access_key=fcafcb7ea4a426d0988c8427f3798db1&number=7326808880&country_code=us&format=1

beenverified

https://www.beenverified.com/

People, Vehicle, Property and Contact Info (mostly US, but has some international data)

Username Search

https://checkusernames.com/
https://github.com/thewhiteh4t/nexfil

DNS

DNSRecon

https://github.com/darkoperator/dnsrecon

bash

$ python3 dnsrecon.py -a -s -k -w -z -d $search_domain

Beste Tool.

Nameserver hopping

Here’s a method to find a domain, given you’d know another domain of the "potential threat actor" and are able to extract that one’s nameserver.

bash

$ for name in l{a..z}{a..z}{a..z};do res=$(dig +short A @..insert-real-nameserver..awsdns. "$name.in"); [[ ! -z "$res" ]] && echo "Found: $name.in" && break; done

Historic DNS

https://completedns.com/dns-history (no .de TLD)
https://securitytrails.com/dns-trails

DNS-Dumpster

https://dnsdumpster.com/

Simply amazing project from Hacker Target - see below for more tools from them.

Whois Extended

Since privacy rolled over WHOIS records, it became less and less of a reliable source.

Try instead:

WhoisFreaks

https://whoisfreaks.com
API with lots of free credits, registration required. Historical data.

bash

$ curl --location --request GET 'https://api.whoisfreaks.com/v1.0/whois?whois=historical&domainName=network-sec.de&apiKey=...'

Whois IP

Even though today we rarely get infos via whois, especially on .de domains, here’s one neat trick:

bash

$ dig network-sec.de
;; ANSWER SECTION:
network-sec.de.         3600    IN      A       217.160.188.86
$ whois -B -d -a 217.160.188.86
[...]

whois works also for IP addresses, although a domain may have privacy protection, often times the associated IP does not.

You can extend this by using historic DNS information from a DNS dumpster site, or find other IPs directly or indirectly connected to the domain you’re researching, like subdomains, other branches of the business, nameservers, traceroute info, mailservers, etc. A sidechannel attack. On top, the switches “-B -d -a” will return more infos.

We’ve published a whois-traceroute mutation we once created for fun, you can find it on our GitHub. This is probably the most interesting, when the target doesn’t use a classic vserver hosting but instead a VPC (virtual private cloud, like AWS).

Whois Privacy

To be clear, a dns privacy enabled hoster probably has enough experience to not fall into the IP trap.

For us, or rather our providers, the technique shown doesn’t return my personal infos, cause our providers did it right - but it truly does work pretty often and not many people are aware of this fact, when they secure and check their own domain or rely on the German Denic alone for their Domain privacy.

Traceroute

“Can we use traceroute to find an IP behind CloudFlare”?
Most likely not, as the trace usualy stops at the supplied end - which will be the domain name or CloudFlare IP, you supply to the traceroute tool. As you don’t know the next hop, you can’t even supply it to traceroute, no matter the method (ICMP, TCP, UDP).

Traceroute Tool	Example ICMP Usage	Example UDP Usage	Example TCP Usage	Method Information
traceroute	traceroute -I google.com	traceroute -U google.com (default UDP port is 53)	traceroute -T google.com	Uses ICMP by default, can use UDP with -U or TCP with -T option.
tcptraceroute	tcptraceroute google.com 80	N/A	N/A	Traces the route using TCP packets or UDP packets with -u option.
hping3	N/A	hping3 -2 -p 53 -c 3 google.com	N/A	Can be used for ICMP or UDP tracerouting with custom ports.
mtr	mtr google.com	N/A	N/A	Combines ping and traceroute functionalities.
pathping	pathping google.com	N/A	N/A	Provides more detailed path information on Windows.
lft	lft -n google.com	N/A	N/A	Traces using both ICMP and TCP, provides hop information.

IP Address OSINT

Connected Websites

bash

$ curl https://spyonweb.com/<IP or Domain>

General Info & Geolocation

https://www.ipaddress.com

bash

$ jwhois <ip>

Offline IP / Range Scanners (Windows)

https://www.dnsstuff.com/scan-network-for-device-ip-address#ip-address-manager

IP VPN & ASN Check

bash

$ curl https://vpnapi.io/api/<ip to check>?key=<your free api key>

CriminalIP

https://www.criminalip.io/

Built With

https://builtwith.com

Wapalyzer online alternative

Find Torrents of private IP

https://iknowwhatyoudownload.com/en/peer/

Google Dorks Generator

https://pentest-tools.com/information-gathering/google-hacking

Leak Lookup

https://leak-lookup.com/

Azure Tenant

https://tenantresolution.pingcastle.com

Search for Azure Tenant using its domain name or its ID

Hacker Target Toolkit

https://hackertarget.com/ip-tools/

Amazing kit that provides nmap scans and other standard tools, but originating from their server. On their membership page, they also use our famous server-room image. :D

More API Keys & Endpoints

Get these keys for free and add them to your toolkit!

YandexAPI
OpenAI
AWSCli
SecurityTrails
VPNAPI.io

SecurityTrails Default

bash

$ domain="network-sec.de"
$ curl "https://api.securitytrails.com/v1/domain/$domain" -H "APIKEY: ..."

SecurityTrails “DNS History”

bash

$ domain="network-sec.de"
$ curl "https://api.securitytrails.com/v1/history/$domain/dns/a" -H "APIKEY: ..."

SecurityTrails “Subdomains”

bash

$ curl "https://api.securitytrails.com/v1/domain/$domain/subdomains?children_only=false&include_inactive=true" -H "APIKEY: ..."

SecurityTrails “IPs Nearby”

Editor’s Pick!

bash

$ $ipaddr="8.8.8.8"
$ curl "https://api.securitytrails.com/v1/ips/nearby/$ipaddr" -H "APIKEY: ..."

More (some are premium): https://securitytrails.com/corp/api

VPNAPI.io

bash

$ curl "https://vpnapi.io/api/$ipaddr?key=...API KEY..."

The “is-VPN” determination isn’t very reliable. We usually check if it’s a VPN via Google, Bing and other tools.

HackerTarget

 request_urls = [
  "https://api.hackertarget.com/mtr/?q=",
  "https://api.hackertarget.com/nping/?q=",
  "https://api.hackertarget.com/dnslookup/?q=",
  "https://api.hackertarget.com/reversedns/?q=",
  "https://api.hackertarget.com/hostsearch/?q=",
  "https://api.hackertarget.com/findshareddns/?q=",
  "https://api.hackertarget.com/zonetransfer/?q=",
  "https://api.hackertarget.com/whois/?q=",
  "https://api.hackertarget.com/geoip/?q=",
  "https://api.hackertarget.com/reverseiplookup/?q=",
  "https://api.hackertarget.com/nmap/?q=",
  "https://api.hackertarget.com/subnetcalc/?q=",
  "https://api.hackertarget.com/httpheaders/?q=",
  "https://api.hackertarget.com/pagelinks/?q=",
 ]

Reverse IP

https://api.hackertarget.com/reverseiplookup/?q=88.99.111…

Provides quite more, also historic, entries.

More services

https://ipchaxun.com/88.99.111../
~~https://chapangzhan.com/88.99.111.0/24 - CIDR notation support~~ seems downn now
https://site.ip138.com/88.99.111../
https://rapiddns.io/s/88.99.111..#result

rapiddns also supports CIDR.

IP Footprinting

Enumeration a larger IP Range will give you very comprehensive insight into a company network, mostly passive and in almost no time. That is, if you have the right tools.

We built them for us, and for you. Using our 3 tools, we’re able to footprint a /16 CIDR range, or even larger ranges, in under 1 minute to get a 95% overview, in about 2h (script runtime) for a fully-detailed, 99% coverage.

Webhistory.info

https://webhistory.info

### Sorry, currently offline ###

Webhistory was one of our bigger projects, a frontend for Common Crawl, where we downloaded, parsed, indexed over 500 Million entries in almsot one year of work. Since we’re unemployed for almost 2 years now, we could no longer afford the electrical power bill for it’s server and only kept the frontend online, in hopes that we may be able to revive the project, once we got more money again. We left the info below for the same reason - but currently it’s unavailable.

Since we crossed the 500 Million entries mark, we now cannot only search for ages old server versions, but use the capabilities of the large Database for DNS History (DNS dumpster diving, albeit only A-records) and to resolve CIDR ranges.

How to search for CIDR ranges

We still lack a tutorial for this project of ours. Overall you should, due to the large database, always start with simple queries and few results (max-range-multiplier slider set to 1x) - then add switches like Individual, to only output one IP <-> Domain mapping and not multiple URL results for the same domain. For CIDR you need to turn on Regex and change the input type to IP, then enter only the part of the IP, like shown in the image.

As we got ten- to hundred-thousands of records, sometimes for a single domain, which are filtered down in a second step, you need to increase the max-range-multiplier to get results from other domains as well. This will happen in the background.

TL;DR: Do it like the image below is showing.

Webhistory.info CIDR Range example search

https://webhistory.info right now is the fastest way we know to footprint the DNS A-Record component (meaning: websites) of an entire IP Range, albeit the results may lack some details. For absolute precision, use instead our reverse_IP_OSINT.sh script (see below or on our GitHub) along with a bash-loop - however that will take a lot more time, probably 15 minutes for a /24 range, compared to 10 seconds on webhistory.

Note: For the following reasons

Project still alpha
Improvised, cheap infrastructure
Database still growing steadily, accumulating millions of new records each day

the project may experience short downtimes or malfunction. Usually operation is resumed in less than a day.

Webhistory is passive, except when searching for a single domain (non-regex) - in that case, also a realtime-crawler will make a single HEAD request to the target, output the entire HTTP Headers and DNS data, and add it to our database.

Reverse IP OSINT Script

https://github.com/Network-Sec/bin-tools-pub/blob/main/reverse_IP_OSINT.sh

We made a handy-dandy script that combines several services into a script, you can add to your bin-folder, then loop over IPs for fast Threat Intelligence.

Note: This script only accepts a single IP as input. You can utilize bash-loops or extend the script yourself. We also recommend to add a few more services, like the APIs listed above.

bash

$ for i in $( cat /tmp/smartphone.txt | grep -Po "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}" | sort -u ); do reverse_IP_OSINT.sh $i; done

Example Output

reverse_IP_OSINT.sh output excerpt

Note: This script makes active requests, albeit to the DNS and 3rd party websites, never touching the target.

Subdomain Scanner

https://subdomainfinder.c99.nl/
https://urlscan.io/ip/-enter IP- - also provides screenshots and some URL snippets

IP Ranges and ASN

https://github.com/Network-Sec/bin-tools-pub/blob/main/IP_Range_Infos.py

Our command-line tool provides an easy way to query CIDR ranges and receive company, provider and location information.

Example: Resolving Germany-based ranges and receiving company infos (excerpt). IP_Range_Infos.py output excerpt

IP_Range_Infos.py is great for enumerating larger ranges. You could easily build a webservice upon this script.

Tip: Have a look into the CSV file - it will allow you to extract entire country ranges which you can save into a txt file and feed it into the script. We’re sure this, along with our webhistory site, is one of the fastest ways to footprint large ranges passively.

Download necessary databases and CSV for the script

CSV : https://datahub.io/core/geoip2-ipv4
MMDB: https://github.com/P3TERX/GeoLite.mmdb

You then need to adjust the path to the 4 files in the script.

Usage Examples - Footprinting

bash

# Single IP, -s(ummary) mode
$ IP_Range_Infos.py 192.168.177.1 -s

# Using GNU parallel to enum full range and search for company only in -a(sn)
$ parallel -a IP_Range_Germany.txt IP_Range_Infos.py -a -s | grep -i "Volkswagen\|Uniper" >> company_infos.txt

Example Output

IP_Range_Infos.py output

Command-line Options

bash

$ IP_Range_Infos.py -h
usage: IP_Range_Infos.py [-h] [--language LANGUAGE] [-s] [-l] [-c] [-a] [-r] ip

Query IP address or range against local MMDB databases and CSV data.

positional arguments:
  ip                   IP address or CIDR range to query

options:
  -h, --help           show this help message and exit
  --language LANGUAGE  Preferred language for names, default is English
  -s, --summarize      Summarize consecutive IPs with identical data
  -l, --location       Output only city table info (including lat and long location). When specifying one or more tables, only those will be searched. When ommiting any table, all will be searched.
  -c, --country        Output only country table info.
  -a, --asn            Output only asn table info. ASN is best when looking for companies
  -r, --ranges         Output only ranges (CSV) table info. Ranges will provide fastest results but only broad infos, like country

More Reverse IP Data

https://github.com/ipverse/rir-ip/tree/master

And other repos from ipverse you can feed into our scripts and gather information.

Google Analytics Reverse Lookup

https://hackertarget.com/reverse-analytics-search/
https://dnslytics.com/reverse-analytics

crt.sh

https://crt.sh

There are alternatives, but as many things today, the access to this information is getting more sparse.

Two interesting projects in that matter:
https://github.com/sergiogarciadev/ctmon
https://github.com/sergiogarciadev/libz509pq

The rate limit situation however is similar to Common Crawl:

bash

2023-09-29T21:35:53.289+0200  INFO  logclient/log.go:91   nessie2023   : 
Error   : (Too Many Requests: https://nessie2023.ct.digicert.com/log/ct/v1/get-entries?start=5376&end=5631)
[...]

This was a single request. We’re still waiting…

TLSX

https://github.com/projectdiscovery/tlsx

A tool that I now list in 3 categories, because it has so many different applications.

bash

$ echo 173.0.84.0/24 | tlsx -san -cn -silent

173.0.84.104:443 [uptycspay.paypal.com]
173.0.84.104:443 [api-3t.paypal.com]
173.0.84.104:443 [api-m.paypal.com]
173.0.84.104:443 [payflowpro.paypal.com]
173.0.84.104:443 [pointofsale-s.paypal.com]
173.0.84.104:443 [svcs.paypal.com]
173.0.84.104:443 [uptycsven.paypal.com]
173.0.84.104:443 [api-aa.paypal.com]
173.0.84.104:443 [pilot-payflowpro.paypal.com]
173.0.84.104:443 [pointofsale.paypal.com]
173.0.84.104:443 [uptycshon.paypal.com]
173.0.84.104:443 [api.paypal.com]
173.0.84.104:443 [adjvendor.paypal.com]
173.0.84.104:443 [zootapi.paypal.com]
173.0.84.104:443 [api-aa-3t.paypal.com]
173.0.84.104:443 [uptycsize.paypal.com]

Alternatives

https://developers.facebook.com/tools/ct/search/ - requires regular Facebook account
https://ui.ctsearch.entrust.com/ui/ctsearchui
https://sslmate.com/ct_search_api/
https://archive.org/
https://docs.securitytrails.com/docs - requires API key - untested
https://urlscan.io/docs/api/ - requires API key - not really worth registering
https://github.com/Findomain/Findomain - Freemium - untested
https://dnslytics.com/reverse-analytics - untested

More APIs

Certspotter

bash

$ search_domain="example.net"; 
$ curl "https://api.certspotter.com/v1/issuances?domain=$search_domain&expand=dns_names&expand=issuer&expand=revocation&expand=problem_reporting&expand=cert_der"

Results will be only a subset of all available data, sadly.

ThreatMiner

https://www.threatminer.org/api.php

bash

# You can do domain or IP
$ search_domain="example.net"; for i in {1..6}; do curl "https://api.threatminer.org/v2/domain.php?q=$search_domain&rt=$i"; echo ""; done

Results can be quite extensive, but only for well-known domains. Lesser known domains return nothing.

urlscan.io

https://urlscan.io/

osint.sh

https://osint.sh/#

Nice, similar to HackerTarget’s toolbox. The Reverse Analytics requires manual extraction of target ID though.

Shodan

Very limited options with free accounts.

bash

$ shodan init ...API KEY...
$ shodan info 

Query credits available: 0
Scan credits available: 0

Sublist3r

https://github.com/aboul3la/Sublist3r

TBH not really interesting anymore.

Anubis

https://github.com/jonluca/anubis

Nice and fast, but not much more than CT search.

bash

Testing for zone transfers
Searching Sublist3r
Searching HackerTarget
Searching for Subject Alt Names
Searching NetCraft.com
Searching crt.sh
Searching DNSDumpster
Searching Anubis-DB
Searching Shodan.io for additional information
Exception when searching sublist3r

Find IP behind CloudFlare (WordPress only)

WP Pingback <-> Real IP

Works well, also on modern WordPress installations, but only when PingBack is enanbled - it is by default, but e.g. Plesk will offer security options, that disable it. Most scanners also report xmlrpc enabled as potential vector.

reqbin.com

Open reqbin.com and webhook.site
Change address below to your webhook address, also add a WordPress article that exists
Add XML to reqbin and send POST to xmlrpc.php

Request

POST https://wordpress.network-sec.de/xmlrpc.php

<?xml version="1.0" encoding="iso-8859-1"?>
<methodCall>
<methodName>pingback.ping</methodName>
<params>
 <param>
  <value>
   <string>https://webhook.site/....(your-id)....</string>
  </value>
 </param>
 <param>
  <value>
   <string>https://wordpress.network-sec.de/any-valid-post-or-article/</string>
  </value>
 </param>
</params>
</methodCall>

Response (reqbin)

Check response output and headers.

webhook.site

Just open and paste site url in XML above.

Response

Requests  
GET 88.99.111.0000

Request Details
GET 	https://webhook.site/....(your-id)....
Host 	88.99.111.0000   Whois  Shodan  Netify  Censys
Size 	0 bytes
Files
Headers
connection 	close
x-pingback-forwarded-for 	66.77.88.999
user-agent 	WordPress; https://wordpress.network-sec.de; verifying pingback from 66.77.88.999
host webhook.site

No content

IP behind CloudFlare (non WordPress)

You basically need an outgoing request you can control, any type of SSRF, outgoing URL, etc. Hooking in Burp and carefully analysing the traffic may help, but in many cases it won’t.

Some articles say to look for the X-Forwarded-For and similar headers. Also a game of chance. Why a service would use CloudFlare but the forward it’s real IP… We don’t know. Mistakes are made.

Other than that, you can try historic data, like CommonCrawl, DNSDumpster, etc. Usually in the beginning of the service, the first weeks of appearance, you might get lucky and the service doesn’t yet use CloudFlare.

So, TL;DR:

SSRF or other, user-controlable, outgoing request from the server
X-Forwarded-For and other response headers
Historic data
Educated guessing the hoster & real server IP range
Subdomain enum
Forms that allow email response triggering, like Password Reset or confirmation for comment (if the same server sends email or the data is shown in the Email) - it’s good to have control over the receiving mail server to be able to see log data
In Email context: mailinator.com
Services like mxtoolbox, which offer to solve Email, issues may give insight
Upload features of the website, that allow URL input, not just file upload (also user account, profile, comment section, etc.)
Bruteforce IP range, if you happen to know the hoster - just add the correct Host: header and run through it.
Robots.txt and Sitemap.xml
APIs
Server Status pages, like 404s
Classic XXE - SOAP apps, any type of XML input: <!ENTITY xxe SYSTEM "http://yourip/">] and DTD

Example

Brutforcing a suspected server for a Host

Setup Link website

https://bio.link/

without paying for it.

Have You Been Pwnd?

https://haveibeenpwned.com/
https://www.dehashed.com/
https://pentester.com/
https://leak-lookup.com/

Look up, if your credentials are already in a leak - if so, change them immediately, make sure, you don’t re-use the same password on multiple accounts.

Recon-ng

https://github.com/lanmaster53/recon-ng

bash

$ python3 -m venv .   
$ source bin/activate
$ bin/pip install -r REQUIREMENTS
$ ./recon-ng 

[recon-ng][default] > marketplace install all
# All API plugins require keys and fail otherwise

urlquery.net

https://urlquery.net/

“urlquery.net is the ultimate URL whisperer, coaxing out the truth from URLs that think they can hide.”

May provide additional info and connections for stuff like:

IPs & domains
GitHub repos
Usernames

Modules

bash

[recon-ng][default] > modules search discovery
[*] Searching installed modules for 'discovery'...

  Discovery
  ---------
    discovery/info_disclosure/cache_snoop
    discovery/info_disclosure/interesting_files

[recon-ng][default] > modules load discovery/info_disclosure/interesting_files
[recon-ng][default][interesting_files] > options list

  Name      Current Value                                      Required  Description
  --------  -------------                                      --------  -----------
  CSV_FILE  .recon-ng/data/interesting_files.csv  yes   custom filename map
  DOWNLOAD  True                                               yes       download discovered files
  PORT      443                                                yes       request port
  PROTOCOL  http                                               yes       request protocol
  SOURCE    network-sec.de                                     yes       source of input (see 'info' for details)

Metacrawler and module dependencies

Some modules have dependencies and won’t work despite being installed.

bash

[recon-ng][default] > marketplace search meta
[*] Searching module index for 'meta'...

  +-------------------------------------------------------------------------------+
  |                Path                | Version |   Status  |  Updated   | D | K |
  +-------------------------------------------------------------------------------+
  | recon/domains-contacts/metacrawler | 1.1     | disabled  | 2019-06-24 | * |   |
  | recon/hosts-hosts/ssltools         | 1.0     | installed | 2019-06-24 |   |   |
  +-------------------------------------------------------------------------------+

  D = Has dependencies. See info for details.
  K = Requires keys. See info for details.

[recon-ng][default] > marketplace info metacrawler
[...check output to understand next step...]

Install dependencies

bash

$ bin/pip install olefile pypdf3 lxml
$ ./recon-ng

Don’t worry if the module still shows up with the Dependencies symbol in the marketplace overview. Do a modules search … instead, if it shows up there, it should be good to go, just try and load it now.

Modules

bash

[recon-ng][default] > modules search meta
[*] Searching installed modules for 'meta'...

  Recon
  -----
    recon/domains-contacts/metacrawler

[recon-ng][default] > modules load metacrawler

If you run into Captcha trouble, you can do it manually:

--------------
NETWORK-SEC.DE
--------------
[*] Searching Google for: site:network-sec.de filetype:doc OR filetype:xls OR filetype:ppt OR filetype:docx OR filetype:xlsx OR filetype:pptx OR filetype:pdf

Workspaces

Recon-NG supports workspaces, similar to Metasploit, and you’re highly encouraged to use them to keep your pentesting gigs in order.

Fast dirbuster alternative

Be sure to set https protocol and port!

bash

[recon-ng][default] > modules load discovery/info_disclosure/interesting_files

[recon-ng][default][interesting_files] > options set SOURCE network-sec.de
SOURCE => network-sec.de
[recon-ng][default][interesting_files] > options set protocol https
PROTOCOL => https
[recon-ng][default][interesting_files] > options set PORT 443
PORT => 443

[recon-ng][default][interesting_files] > run
[*] https://network-sec.de:443/robots.txt => 200. 'robots.txt' found!
[*] https://network-sec.de:443/sitemap.xml => 200. 'sitemap.xml' found but unverified.
[*] https://network-sec.de:443/sitemap.xml.gz => 200. 'sitemap.xml.gz' found but unverified.
[*] https://network-sec.de:443/crossdomain.xml => 200. 'crossdomain.xml' found but unverified.
[*] https://network-sec.de:443/phpinfo.php => 200. 'phpinfo.php' found but unverified.
[*] https://network-sec.de:443/test.php => 200. 'test.php' found but unverified.
[*] https://network-sec.de:443/elmah.axd => 403
[*] https://network-sec.de:443/server-status => 403
[*] https://network-sec.de:443/jmx-console/ => 200. 'jmx-console/' found but unverified.
[*] https://network-sec.de:443/admin-console/ => 200. 'admin-console/' found but unverified.
[*] https://network-sec.de:443/web-console/ => 200. 'web-console/' found but unverified.
[*] 1 interesting files found.
[*] Files downloaded to '.recon-ng/workspaces/default/'

Web Scraping

https://www.youtube.com/watch?v=7kbQnLN2y_I

https://r.jina.ai/full_url_you_want_to_scrape
https://www.firecrawl.dev/
https://spider.cloud
AgentQL
MultiOn

Common Crawl - sources for DNS Dumpsters

CommonCrawl is a copy of the entire internet from the frontend view, taken multiple times per year.

Since any LLM manufacturer on the planet hits the CommonCrawl index pretty hard, it requires a lot of patience to operate it. We made a script - but tbh, CC is a bit harder to work with. You really need to… 504 - rest of the sentence blocked by rate limit

The script as-is will crawl all indices and search for historic data, we were interested in Domain <-> IP mappings (historic A records). There’s also a Python lib called cdx, that allows for a few more (or different kind of) ops. It’s as slow, right now it’s unusable.

bash

$ pip3 install cdx-toolkit
$ cdxt --cc  iter '<domain you wanna search>/*'

There are a few alternative projects, all we tested had the same issue:
https://github.com/ikreymer/cdx-index-client

We were searching for data sources of historic DNS services like DNS-Dumpster or HackerTarget - but comparing the data it seems HackerTarget’s data doesn’t stem from CC (while they claim so on a subpage) - rather it’s pretty much identical to DNSRecon, when activating all switches, mostly the Cert Search.

We searched several domains in both CC, the mentioned services as well as DNSRecon / Cert Search and the results where always identical. CC hardly contains subdomains in the crawls (only those actively connected to an IP and hosting a public project for a certain amount of time).

Also, when you enter an IP to perform a reverse search, you don’t get anything on HackerTarget but you get the data from CC. Don’t get us wrong, we highly appreciate those services, but always fear they might vanish some day, so we want to know, how they work and be able to retain the ability long-term.

The only truly unique data we found while playing with all these tools and sources extensively, was the Hacker Target Reverse Analytics lookup - really an amazing tool. Will have to look into how this works under the hood.

Our plans to build our own “DNS Dumpster” as a Python Script failed at the point where CommonCrawl responds to singular requests that slowly (>1 min per request) - anything meaningful you probably can do better offline.

When you take the time, it’s still nice how much data you can find about a target in the CommonCrawl datasets:

Historic IPs
Full URL Path indices
Full frontpage HTML

You can pretty much build your own Wayback Machine. Again, if you have the patience. We did and started to create:

CC-Search

https://github.com/Network-Sec/CC-Search

Another one of our own tools, we made as already available solutions simply didn’t work.

Note: Work as much as you can with aws cli and directly go for the s3 buckets hosting CC. Sadly, the algorithm, how the URL Index and the offsets are calculated, so you only need to download that specific piece you search for, remains a mystery. AFAWK you cannot replicate this mechanism, which is the slowest part in all of this, via aws cli.

Tipp: Run the script multiple times, cause rate-limit will block quite often, and often times, at random.

bash

# Search all indices of 2020-2022 - always supply in backwards order!
$ cc_domain_search.py 79.22.21.65 --year 2022 2021 2020

# Search one index per year, from 2015-now
$ cc_domain_search.py network-sec.de --only

# Search all indices - may hit rate limit
$ cc_domain_search.py network-sec.de --only

Breakdown of the pseudo-json returned by the URL index search

Field	Value
urlkey	net,example)/
timestamp	20191210212835
url	https://example.net
mime	text/html
mime-detected	text/html
status	503
digest	3I42H3S6NNFQ2MSVX7XZKYAYSCX5QBYJ
length	513
offset	69591352
filename	crawl-data/CC-MAIN-2019-51/segments/1575540529006.88/crawldiagnostics/CC-MAIN-20191210205200-20191210233200-00043.warc.gz

The deconstructed components of the `filename`:

Component	Value
Common Crawl Dataset	CC-MAIN-2019-51
Segment Information	segments/1575540529006.88
Segment ID	1575540529006.88
Subdirectory	crawldiagnostics
WARC File Information	CC-MAIN-20191210205200-20191210233200-00043.warc.gz
WARC File Name	CC-MAIN-20191210205200-20191210233200-00043.warc.gz

CC aws cli

Sadly, nothing you can do with aws cli, which is a lot faster than their website, will give you meaningful information for a specific domain. The closest you can get is:

Download each cluster.idx (which will contain less than 1% of crawl data)
As cluster.idx is at least alphabetical, you can educated-guess your way to the corresponding index file (like 00000-002999 maps to A-Z for each CC Crawl year / number)
Downloading the entire index files (~ 1GB) via aws cli is faster than using their index-offset URL calculator - you’ll still be hit with eventual blocks & throttles accusing you of exceeding the rate limit with your single request

If you really want to build your own historic data service, make a script that downloads it all (several Petabyte total), but only one file at the time, extract the pieces you need (DNS data isn’t much data, far less than entire website HTML), index it into a DB, let it run and wait a year before your Historic DNS project can go life.

Some random example commands

Just ls and cp your way through it.

bash

$ aws s3 cp s3://commoncrawl/cc-index/collections/CC-MAIN-2017-30/indexes/cdx-00000.gz .
$ aws s3 cp --recursive s3://commoncrawl/crawl-analysis/CC-MAIN-2017-30/count/ .
$ aws s3 cp s3://commoncrawl/cc-index/collections/CC-MAIN-2017-30/indexes/cluster.idx ./CC-MAIN-2017-30_cluster.idx

Your best bet

Download all cluster.idx and perform a loose A-Z mapping

Note that we put the crawl date-numbers (“CC-MAIN-2021-21”) into a newline separated list.

powershell

$ foreach($i in (Get-Content -Path .\Indices.txt)) { mkdir $i; aws s3 cp "s3://commoncrawl/cc-index/collections/$i/indexes/cluster.idx" "D:\CommonCrawl\$i\" }

We don’t show the mapping, but if you look into the file, you’ll get the point.

If you still want to try and download “everything” relevant, you got two choices:

Download all indices for a crawl (maybe limit it to 1-2 crawls per years)

These are those:

s3://commoncrawl/cc-index/collections/CC-MAIN-2008-2009/indexes/cdx-00001.gz

This will give you all the offsets in about 30GB per crawl, gzip compressed. You can use zgrep to search for a domain SURL format, like

bash

$ zgrep "aab,careers" cdx-0*.gz

If you’re after more data, like the HTML content of a page, this is your method!

In total you could get away with about 3 Terrabyte for a good 30% of all crawl data available - but after extraction via zgrep, you still need to download the offset-chunks to get the actual data.

Example result from zgrep:

json

cdx-00000.gz:abb,careers)/bulgaria/bg/job/83665415/global-product-manager-power-cabinets 20220927120023 {"url": "https://careers.abb/bulgaria/bg/job/83665415/Global-Product-Manager-Power-Cabinets", "mime": "text/html", "mime-detected": "text/html", "status": "200", "digest": "6V4QPKWKW4225UZ3XVURRI3K7246THBN", "length": "136568", "offset": "209602196", "filename": "crawl-data/CC-MAIN-2022-40/segments/1664030335004.95/warc/CC-MAIN-20220927100008-20220927130008-00789.warc.gz", "charset": "UTF-8", "languages": "eng,bul", "truncated": "length"}

Has the upside of less data but the downside of the postprocessing steps (zgrep, succesive download) taking probably as long as the request to the CommonCrawl index-url page, if not even longer. You could of course do additional steps of pre-indexing, like the previously described mapping of index-number <-> A-Z (first letter of the domain name) and thus gain lots of speed.

Download all crawldiagnostics

Note: The crawldiagnostics folder appears first in 2017 - and it took us over 60h to download one complete crawl from a single year. Note that these are only the indices, not the crawl data!

It takes so long, because they’re all split into roughly 10.000 files and directories, amounting to about 100GB - that also makes it pretty unfun for everyday internet usage, your router can no longer prioritize media traffic correctly in parallel.

powershell

$ foreach($i in (Get-Content -Path .\Indices_2_per_year.txt)) { foreach($a in (aws s3 ls s3://commoncrawl/crawl-data/$i/segments/ | Select-String -Pattern "[\d\.]*"  -AllMatches).Matches) { if ($a.Length -gt 10) { $j=$a.Value; if ((aws s3 ls s3://commoncrawl/crawl-data/$i/segments/$j/ | Select-String -Pattern "crawldiagnostics" ).Matches.Success) { aws s3 cp --recursive "s3://commoncrawl/crawl-data/$i/segments/$j/crawldiagnostics/" "D:\CommonCrawl\$i\" } else { echo "$i $j doesnt contain crawldiagnostics"; }} } }

This is likely more data than the indices and has the downside you don’t get the warc, wat, wet, robotstxt and other directories (crawldiagnostics contains also warc files, not to be confused) - which you would get with the first method. For DNS-like purposes, those aren’t interesting in our oppinion.

Has the upside, you already own all the data and can extract Domain:IP locally, enter it into a DB and be happy.

Checking Reverse Analytics from a single crawl via API

This will exceed free requests after a while - upgrade, add another API service endpoint or switch source IP.

bash

$ for i in $(grep -Poe "UA-\d*-\d" CC-MAIN-20171210235516-20171211015516-00021.warc 2>/dev/null | sort -u); do curl https://api.hackertarget.com/analyticslookup/?q=$i; echo -e "\n"; sleep 5; done

Collecting DNS info from entire crawl

bash

$ find . -iname "*.gz" -exec zgrep "WARC-IP" {} -A 1 \; 2>/dev/null

Grep out data

bash

#!/bin/bash

for filename in ./*.warc.gz; do
    # We tried with zgrep for hours but the files are windows (?), at least broken it seems
    # (sometimes strange chars, terminators, grep sees binary etc...)
    entries=$(gunzip -c -q -k $filename | dos2unix -f | grep "WARC-IP"  -A 1 -B 10 2>/dev/null)

    # Data is separated by random number of newlines and two dashes
    # Most bash tools only support a single separator, so I replace 
    # the double-dash -- where it's a seperator  with a unique 
    # char that hopefully is nowhere else in the data
    prep_ent=$(echo -n "$entries" | awk 'BEGIN {RS="--\n"} {print $0"‡"}')
    mapfile -d '‡' parts <<< "$prep_ent"

    # Now we can loop through the individual parts and know
    # for sure, the data belongs to the same block
    for entry in "${parts[@]}"; do
        # You could also try grep -w for simpler, whole-word matches
        # We'm ok with Posix Regex, char classes and lookbehind
        IP=$(echo "$entry" | grep -is "WARC-IP.*" | grep -Pose "\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")
        FULL_URL=$(echo "$entry" | grep -is "WARC-.*URI.*" | grep -Pose "(?<=: ).*")
        DOMAIN=$(echo "$entry" | grep -is "WARC-.*URI.*" | grep -Pose "(?<=://)[^/]*")
        DATE=$(echo "$entry" | grep -Pose "(?<=WARC-Date: ).*")

        res="$DATE $IP $DOMAIN $FULL_URL"
        echo -e "$res"
    done
done

Other grep patterns

You can find cookies in CC, but I’m not interested in account takeover - We only look for OSINT data. We stumbled over some Azure passwords, along with username, Google Adwords IDs and pretty much anything you can find.

Simple Password

bash

$ grep -i "passw\|pwd"

German Licence Plate

bash

$ grep -Pse "([A-Z]{1,2}-){2}\d{1,5}"

Automatically Collect OSINT

while you browse and do research, including LinkedIn and other plattforms. Works via mitm proxy.

Amazing project created by Bellingcat Hackathon 2023:
https://github.com/elehcimd/stratosphere

Needs a local Docker / webserver. Sadly, the project is not fully functional and development seems to have ceased. Since the architecture is rather uncommon, using Jupyther-Notebooks as “microservices”, it’s pretty hard to understand, how it was originally intended. If someone would fix the LinkedIn part, if would be phantastic.

Company and people research

Reverse image search

https://tineye.com/

(serches just for exact duplicates and sources)

Companies

https://www.northdata.com/
https://www.dnb.com/de-de/firmenverzeichnis
https://firmeneintrag.creditreform.de/
https://www.bundesanzeiger.de

People (insolvent)

https://neu.insolvenzbekanntmachungen.de/ap/suche.jsf - Allows wildcard search

People (Mostly US Fraudsters)

https://socialcatfish.com/
https://www.instantcheckmate.com/

Doxes

https://doxbin.com

Lawyers

https://bravsearch.bea-brak.de/bravsearch/index.brak
https://www.notar.de/

More Justice online services

https://justiz.de/onlinedienste/index.php

Find leads for your story

https://aleph.occrp.org/

The Aleph data platform brings together a vast archive of current and historic databases, documents, leaks and investigations.

Panama Papers

https://offshoreleaks.icij.org/
https://github.com/ICIJ/offshoreleaks-data-packages

Also contains newer publications, can be downloaded as CSV. Contains GitHub repo with instructions, how to download, import and setup your own, local neo4j version of the papers. Very nice as evening project, to learn Graph-Databases.

Bypass Paywall

https://sci-hub.st/

Also more data & publications - but recently in legal proceedings and public media critique.

Spidermaps

https://littlesis.org/oligrapher

Police Investigations and Missing Persons Germany / Europe

https://fahndung.polizei-bw.de

You’ll find dedicated domains / subdomains for most federal states of Germany.

Public Research Service Overview

Resource	Description	URL
BKA Fahndungsliste	Federal Criminal Police Office’s list of wanted and missing persons.	bka.de
Polizeiliche Kriminalstatistik	Police crime statistics for crime trends and patterns.	bka.de/DE/AktuelleInformationen/StatistikenLagebilder
Bundesanzeiger	Federal Gazette for company information, financial statements, and insolvency notices.	bundesanzeiger.de
Handelsregister	Commercial Register with information about German companies.	handelsregister.de
Bundeszentralregister	Federal Central Criminal Register, accessible only to authorized entities.	N/A
Landesrechtsprechungsdatenbanken	Regional court decision databases (varies by state).	Varies by state
Destatis	Federal Statistical Office providing a range of statistics.	destatis.de
Insolvenzbekanntmachungen	Database of insolvency announcements.	insolvenzbekanntmachungen.de
OpenStreetMap	Mapping service for geographical and locational analysis.	openstreetmap.org
Fahrzeugregister des Kraftfahrt-Bundesamtes (KBA)	Federal Motor Transport Authority’s vehicle register.	kba.de
Einwohnermeldeamt (Residents’ Registration Office)	Registration office for resident information (access restricted to authorities).	N/A
Zollfahndung	Customs Investigation Bureau’s website for smuggling and customs fraud.	zoll.de
Bundesnetzagentur	Federal Network Agency for telecommunications investigations.	bundesnetzagentur.de

Anonymous Whistleblower Platforms

State Security

https://www.bkms-system.net/lkabw-staatsschutz

Corruption und Business crime

https://www.bkms-system.net/bw-korruption

AI Image Sharpen & upscale

https://www.artguru.ai/
https://deep-image.ai/ https://github.com/xinntao/Real-ESRGAN
https://github.com/jarrellmark/ai_upscaler_for_blender

Person & Face search

https://facecheck.id/
https://pimeyes.com
https://www.instantcheckmate.com/ (USA only)

Finding Discord User Content

Using Archive.org we can find images from Discord CDN, to do this we type in the search-bar:

https://cdn.discordapp.com/attachments/

Adding number combinations like this:

https://cdn.discordapp.com/attachments/1010011814140067841

to the link is like a filter for specific users on discord.

To find these user IDs you just need an image, the user sent, then you click on:

Open in browser

Use the first number string after attachments in the link

Theres your user ID - to find every file, the user sent on discord.

BitCoin and Crypto Currency

Bitcoin Historical Data

https://www.kaggle.com/datasets/mczielinski/bitcoin-historical-data

Bitcoin data at 1-min intervals from select exchanges, Jan 2012 to Present.

Kaggle of course offers a multitude of other datasets. Awesome.

Geo & Location Tools

Youtube Geolocation Search

https://mattw.io/youtube-geofind/location

Highres Maps

not Google Maps…

Israel / Palestina

https://www.govmap.gov.il/?c=171515.25,640812.95&z=9&b=2

3D Buildings, Daytime Shadows

https://app.shadowmap.org/?lat=47.98846&lng=7.78979

Global map of 3D buildlings and the shadows they cast at a specific time a day

Geographic landscape photos

https://geo-en.hlipp.de/

Germany, Britan, Ireland, Island.
“The Geograph project aims to collect geographically representative photographs and information for every square kilometre of Germany.”

AI location detection

https://labs.tib.eu/geoestimation/

Bing reverse image search

https://www.bing.com/images/feed

seems to be good at identifying animals

Cleanup Window Frames

https://cleanup.pictures

using AI

Other methods

Identifying Birds

https://merlin.allaboutbirds.org/photo-id/

(alternateive methode for locating places on images)

Identifying Flags

https://flagid.org/

good for locating countrys, states and citys

OCR foreign languages

Google Translate can read foreign langs quite well, also on images.

Find photo location

when person is on the photo in foreground: Pixelate the person so reverse image search tools won’t focus on the person but on the location instead.

Increase resolution

Mirror, upscale, rotate, perspective correction (either Affinity Photo / Photoshop or Blender 3D). You can use the Clone brush to fill missing parts, color filters can help as well.

Drones

From entry-level, smartphone controlled cheap drones to 6k FPV racers with 5 km reach, 4k Video recording, 15-20min air time per battery and up to 100 km/h speed… things became easy to access and a bit more scary, than in those science fiction stories (we’re used to that by now).

Having real-time aerial Spies in the sky capabilities in your EDC backback isn’t the end of the line here: Pro’s have long stepped up another level to bird-like, silent drones that you can’t tell appart from real birds with the naked eye.

Final Words

We’re aware, that was a lot, and far away from our new bite sized content, we introduced with this blog. However, there are topics you can’t reasonably split as small.

That doesn’t mean, we won’t go into further details on one or the other subtopics in OSINT or Investigation.

Outlook

It’s fixed on our roadmap showing GHunt in action, step by step. We also found a way, to enumerate Facebook Friend Profiles - without being logged in. Check back soon.

However - after the Facebook post we’ll take a much-needed break for Easter Holidays.

Toolkits / Collections

Awesome OSINT

Overview & Desktop Background Poster

Most used Tools

More Tools, Apps & Providers

People Data APIs

APIs

VPN Region checker

Youtube region block

Netflix availabilty / Global search

Random User Generator API

Business Name & Logo Generator

Yandex API Reverse Image Search

Telegram

Telemetrio

Precautions

Techniques

How to find initial keywords?

Telepathy

Basic Channel Enum

Grab all Channel Messages

Search for User Details

TeleRecon

More Telegram Tools

GeoMint

Google Dorks

Google Dork Generator

How to find names in documents

How to find comments from username

Tip: Use other search engines

CTFR

Create Account images

CrossLinked

Instagram Downloader

Download Instagram

SocInt Search Engine

SatInt

Prepared Smartphone

Phone Anti-Spam Apps - Reverse Phone Number Lookup

Twilio API

APILayer

beenverified

Username Search

DNS

DNSRecon

Nameserver hopping

Historic DNS

DNS-Dumpster

Whois Extended

WhoisFreaks

Whois IP

Whois Privacy

Traceroute

IP Address OSINT

Connected Websites

General Info & Geolocation

Offline IP / Range Scanners (Windows)

IP VPN & ASN Check

CriminalIP

Built With

Find Torrents of private IP

Google Dorks Generator

Leak Lookup

Azure Tenant

Hacker Target Toolkit

More API Keys & Endpoints

SecurityTrails Default

SecurityTrails “DNS History”

SecurityTrails “Subdomains”

SecurityTrails “IPs Nearby”

VPNAPI.io

HackerTarget

Reverse IP

More services

IP Footprinting

Webhistory.info

### Sorry, currently offline ###

How to search for CIDR ranges

Reverse IP OSINT Script

Example Output

The deconstructed components of the `filename`: