System activities and network traffic to train a machine learning model


As part of reading for my MSc degree, I am working on understanding the applicability and measuring the usefulness of machine learning in cybersecurity. While almost every cybersecurity product uses machine learning to improve its detection capabilities, there are no products that apply machine learning to "active" (counter) measures. This may be due to a lack of confidence in machine learning models in an enterprise environment.

What if machine learning could detect anomalies in network traffic and apply active measures, ranging from containing an attack to predicting an impending attack and blocking IPs automatically? As part of one of my dissertations, I am trying to measure that usefulness and find applications at the network layer (firewall / IPS / layer 4-7). I am hoping for real-world datasets. I will try to combine this with system activities to train the model; for now, however, I will limit the scope to the network layer.
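To make "detect, then act" concrete, here is a minimal sketch of the idea. It is not the model I am training: it assumes traffic has already been reduced to (source IP, destination port) pairs, uses a simple median-based outlier score instead of a learned model, and the function name, threshold and addresses are all illustrative placeholders.

```python
from statistics import median


def flag_anomalous_sources(events, threshold=5.0):
    """Return source IPs that touched unusually many distinct ports.

    events: iterable of (src_ip, dst_port) tuples (an assumed input format).
    threshold: how many median absolute deviations above the median
               counts as anomalous (an arbitrary illustrative value).
    """
    ports_by_src = {}
    for src_ip, dst_port in events:
        ports_by_src.setdefault(src_ip, set()).add(dst_port)

    counts = {src: len(ports) for src, ports in ports_by_src.items()}
    if not counts:
        return set()

    med = median(counts.values())
    # Median absolute deviation; fall back to 1.0 when all sources look alike.
    mad = median(abs(c - med) for c in counts.values()) or 1.0

    return {src for src, c in counts.items() if (c - med) / mad > threshold}


# Ten ordinary clients hit one port each; one source sweeps fifty ports.
events = [(f"192.0.2.{i}", 443) for i in range(1, 11)]
events += [("203.0.113.9", port) for port in range(1, 51)]

block_candidates = flag_anomalous_sources(events)
print(block_candidates)
```

In an active setup, the returned set would be handed to whatever enforces the block (a firewall rule, for example); that step is deliberately left out here.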

How can you assist?

I am inviting those in cybersecurity or otherwise (go red teamers, skiddies and crackers) to try out your cracking abilities on my IP: It does not matter whether you use default tools or customised versions of your cracking arsenal. What matters is that I get traffic which I can feed to my machine learning model.

Many online scanners carry out periodic scans, such as - I want to combine traffic from such scanners with real-world attempts to mature the model so that it can distinguish between an actual attack and a harmless scan, which may later be used for an attack.
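One way to get a first set of training labels for this is to tag traffic by where it comes from. The sketch below assumes I have a list of network ranges used by known scanners; the range shown is a documentation placeholder, not a real scanner, and the label names are hypothetical.

```python
import ipaddress

# Placeholder range (RFC 5737 documentation block), standing in for
# whatever ranges the known scanners actually publish.
KNOWN_SCANNER_NETS = [ipaddress.ip_network("198.51.100.0/24")]


def label_source(src_ip, scanner_nets=KNOWN_SCANNER_NETS):
    """Label a source IP as 'scanner' if it falls in a known scanner range."""
    addr = ipaddress.ip_address(src_ip)
    if any(addr in net for net in scanner_nets):
        return "scanner"
    # Everything else still needs a label from another source,
    # for example from participants who tell me their IP address.
    return "unlabelled"


print(label_source("198.51.100.7"))
print(label_source("203.0.113.5"))
```

Source address alone cannot tell a harmless scan from reconnaissance for a later attack, so this would only seed the labels the model starts from.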


You may choose my IP to play around with your cracking abilities. There are no penalties if you end up breaking something :) - so go for it! Refer to ROE rule 1 though.

ROE (Rules Of Engagement):

  1. No DoS or DDoS. I am testing this on my home internet connection. I do not have the budget to host this in the cloud (as I need the credits to compute the model). This is the only internet connection I have. If D/DoSed, my wife will get furious because of the lack of Netflix and Instagram. If that doesn't convince you, be advised that I use the same connection to play StarCraft II, CS:GO and Dota 2. Don't be that person who causes a D/DoS to prove a point. It's lame. Finally, my ISP may give up on me, and that would cause severe issues for my research and education. So no D/DoS, please.

  2. To have a comprehensive dataset, I have put a file inside the systemroot directory of the host machine. If you can get its content, message me and ask for the bounty. Filename: hackme.txt. I have set a scheduled task that checks the last access time of the file, just in case :).

  3. There are no planted vulnerabilities in the system. This isn't a designed "hackthebox." I am looking for real-world traffic. There is an OS which hasn't been patched for about four months, with three web-facing applications, all running as default installs without any hardening. What I am interested in is what an attacker would do and how I can use that to train a model. From access to this very webpage to scanning, the system is collecting data from all possible locations.

  4. The system is interactive: once you break into any of the web-facing applications, you will get interactive access. At this point, the commands you type will be recorded and responded to. This is the crucial part, as I will be recording system output alongside network traffic output. The fun starts when combining these two layers. Don't be disheartened if you don't get to systemroot in one go. Remember, there is a real possibility that others will be trying at the same time, and the system can only handle five concurrent connections.

  5. Use any generic tool you prefer and, if possible, let me know your IP address. In a few cases, I may contact you to identify the tool used along with its parameters.
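Rule 4 mentions combining recorded commands with network traffic. A minimal sketch of what that join could look like follows; it assumes each recorded command can be attributed to a source IP and a timestamp, and that flows carry start and end times. The `Flow` and `Command` record types and their field names are hypothetical, not the actual capture format.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    src_ip: str
    start: float   # seconds since epoch (assumed)
    end: float
    dst_port: int


@dataclass
class Command:
    src_ip: str
    timestamp: float
    text: str


def join_commands_to_flows(flows, commands):
    """Pair each flow with the commands typed from the same source
    IP while that flow was open."""
    joined = []
    for flow in flows:
        matched = [
            c for c in commands
            if c.src_ip == flow.src_ip and flow.start <= c.timestamp <= flow.end
        ]
        joined.append((flow, matched))
    return joined


flows = [
    Flow("203.0.113.5", 100.0, 200.0, 80),
    Flow("198.51.100.7", 150.0, 160.0, 22),
]
commands = [
    Command("203.0.113.5", 120.0, "whoami"),
    Command("203.0.113.5", 250.0, "dir"),      # outside the flow's window
    Command("198.51.100.7", 155.0, "id"),
]

for flow, matched in join_commands_to_flows(flows, commands):
    print(flow.src_ip, flow.dst_port, [c.text for c in matched])
```

Each (flow, commands) pair could then become one training example that spans both layers. With up to five concurrent connections, matching on source IP alone may be ambiguous if participants share an address, which is one reason rule 5 asks for your IP.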

What is in it for you?

As a self-funded student, I don't have a plentiful bounty to give out. I can, however, offer you a brand new Raspberry Pi 4 Model B (RPi) in the configuration of your choosing (albeit the one with 4 GB RAM would be the most prudent choice). Should you decide not to take the RPi, I may suggest something more rudimentary, like alcohol for the spirited hacker inside you (yup, a pun). Or an Oxford University hoodie. I am willing to offer anything equal in monetary value to an RPi (approximately 54 GBP).


I will include your name in the citation of my paper.

Thank you very much for your assistance and may the force be with you.