Employee Spotlight: My time as a Netacea data science intern

As part of our Employee Spotlight series, I sat down with Sam, a recent PhD intern at Netacea, to find out what drew him to a bot management company and how the projects he worked on in our Data Science team helped shape his thesis and his career beyond it.

Tell me a bit about your academic background and how you ended up on your PhD course.

I did an integrated master’s in mathematics at the University of Liverpool, with dissertations that focused on integer linear programming and topological data analysis. When I’d finished, I was approached by the uni’s life sciences department – in fact, they’d contacted me by mistake but thought I’d be a good fit for an electrophysiology deep learning PhD. I wanted to apply what I’d learned during my master’s to something directly useful, and I’d always had an interest in the field. I went on to join the department in 2019 and I’m now in the final year of my PhD.

Did the pandemic affect your PhD progress?

I’ve faced delays and extensions because of Covid, but it didn’t directly affect my own research, luckily, despite setting back students in my cohort. I did end up learning lab techniques over WhatsApp for a while, however, which was interesting!

Did you have an interest in cybersecurity at this point?

Outside of my degree I’d been teaching Python to anyone who wanted to learn, and I was part of the cybersecurity society all through uni.

Were you seeking out cybersecurity companies like Netacea for your internship?

The uni offered pre-boxed internships, but these were mostly engagement-based and I was keen to do something in AI and data science. I began emailing businesses doing data science and came across a Twitter post from the previous head of the department at Netacea, who was looking for data scientists. It was an easy choice to join for a three-month internship.

What did you work on while you were here?

I worked on two main projects in my time at Netacea. The first focused on widely distributed bot attacks and separating benign from malicious web requests. Doing this with botnets is more difficult; typically, if attackers are connecting their botnets from residential data centers, the traffic becomes well disguised as normal traffic. We needed a way of detecting malicious users not by category but by the pattern of their requests over time. In the results, 90% of malicious requests were blocked and 90% of the requests we blocked were malicious, giving us low false positive and false negative rates. We could tweak this balance depending on client needs.
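One way to read the two 90% figures is as recall (the share of malicious requests that were blocked) and precision (the share of blocked requests that were malicious). The minimal sketch below shows how those two numbers come out of raw counts. The function name and the counts are hypothetical, chosen only to reproduce 90% and 90%; they are not Netacea's data or tooling.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a blocking decision.

    true_positives:  malicious requests that were blocked
    false_positives: benign requests that were blocked
    false_negatives: malicious requests that were let through
    """
    blocked = true_positives + false_positives
    malicious = true_positives + false_negatives
    # Guard against division by zero when nothing was blocked or nothing was malicious.
    precision = true_positives / blocked if blocked else 0.0
    recall = true_positives / malicious if malicious else 0.0
    return precision, recall


# Hypothetical counts, picked only so that both figures come out at 90%.
precision, recall = precision_recall(
    true_positives=900, false_positives=100, false_negatives=100
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Moving the blocking threshold typically trades one figure against the other, which is one plausible reading of tweaking the result depending on client needs.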

The second project came off the back of a PhD paper I published on anonymized synthetic and labeled data. Generative adversarial networks (GANs) produce fake data by learning patterns of real data and reproducing them. We didn’t quite get there with the deployment of the model, unfortunately, but I was able to directly apply the problems and solutions I’d addressed in my paper to the project.

Aside from Data Science, did you get a chance to work with other teams?

The Threat Research team was really valuable for sharing ideas on how the attacks we were looking for work on a fundamental level, and suggested ways of using the model for multiple contexts, both internally and externally. The Platform team was also fantastic – before the internship, I didn’t have experience in AWS and platform tools, but the team got me up and running.

What are your key learnings from the internship?

During a theoretical degree, you don’t learn how to put your research into practice. At Netacea I received mentorship and used technologies that aren’t as widely used in academia. I’d argue that because of the nature of research you can get away with low quality code – but you can’t get away with that in industry. I learned so much about how sustainable development should be done.

Has the experience changed your PhD and thesis?

While the writing is well underway for my thesis, it’s changed my attitude towards my PhD. Like I said, what was validated at Netacea was the importance of code quality. I’d say my code was 5/10 on a quality scale in the lab – but I’ve now started to go back and change some of the code so it’s better quality for my research.

Netacea’s approach to the bot management problem is state of the art, really at the forefront of the industry. The teams try to solve old problems in new ways, which is a great philosophy for a company to have. Approaching a problem in a fundamentally different way to others is always a good thing.

Do you think you’ll enter a business similar to Netacea after your PhD?

I’d love to. There’s a real internal drive for better code quality at Netacea and an attitude of doing everything properly so it’s maintainable in years to come – which was amazing to see.

What are you planning to do now?

So, I’m currently writing my thesis which will end up being a few hundred pages. Before the internship I didn’t take any time off, but now I’m going to take some time to go on a couple of trips.

Biggest challenge?

A two-week period where I was wrestling with AWS, trying to get my code deployed. Day after day there was a problem with a package, a problem with GitHub, a problem with permissions – all problems that weren’t data science. The barriers were frustrating though they enforced good habits.

Biggest highlight?

When I was doing the distributed attack work, my highlight was the first moment I saw my model was possible – when the graph showed characteristics of client IPs’ activity over time, and we could easily see that the clients we knew were bad were clustered together, separate from everything else. Clean separation between malicious and benign activity. Seeing an idea work in the flesh.
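As a rough illustration of what "activity over time" per client IP might look like as data, the sketch below buckets request timestamps into fixed windows and builds one count vector per client. This is an assumption about the general idea, not the model or features used at Netacea; the function name, window size and sample requests are all hypothetical, and the clustering step itself is not shown.

```python
from collections import Counter, defaultdict


def activity_profiles(requests, window_seconds=60, n_windows=5):
    """Build a per-client vector of request counts per time window.

    requests: iterable of (client_ip, timestamp_seconds) pairs.
    Requests falling outside the first n_windows windows are ignored.
    """
    counts = defaultdict(Counter)
    for client_ip, timestamp in requests:
        window = int(timestamp // window_seconds)
        if 0 <= window < n_windows:
            counts[client_ip][window] += 1
    # Expand each sparse counter into a fixed-length list so clients are comparable.
    return {
        client_ip: [per_window[i] for i in range(n_windows)]
        for client_ip, per_window in counts.items()
    }


# Hypothetical requests using documentation-only IP addresses.
sample_requests = [
    ("203.0.113.5", 3), ("203.0.113.5", 65), ("203.0.113.5", 125),
    ("198.51.100.7", 10), ("198.51.100.7", 12), ("198.51.100.7", 14),
]
profiles = activity_profiles(sample_requests)
for client_ip, profile in profiles.items():
    print(client_ip, profile)
```

Vectors like these could then be plotted or passed to a clustering method to see whether known-bad clients group together, as described above.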

Interested in data science or joining another team at Netacea? Check out our current vacancies.