
A Brief Introduction to Semgrep

Introduction

Semgrep is an amazing static analysis tool that we are excited about. This is part 1 of a two-part series. Josiah’s part 2 dives much deeper into the details of the rules and how to hunt for specific vulnerabilities. This post provides high-level information and shows you how to run Semgrep.

Semgrep grew out of pfff, an open-source Facebook project. All of the current functionality is free and open source, although ReturnToCorp (r2c), the maker of Semgrep, plans to commercialize enterprise use cases such as CI/CD integrations. A similar tool called lgtm integrates with GitHub/BitBucket. It is made by Semmle, which was bought by Microsoft/GitHub. It is worth looking into if Semgrep sounds interesting to you.

Semgrep vs Commercial Static Analysis Tools

I’ve yet to use a commercial static analysis tool that I would recommend for penetration testers. I would be hesitant to recommend many commercial solutions for developers as well. The problem is not that they don’t find anything. It’s that they are generally difficult to use and flag every 10th line of code as a potential vulnerability. Some portion of the findings requires security expertise to review (e.g. is using MD5 really a vulnerability in this case?). Running most tools accurately takes someone who understands the code, the tool, and application security. That is not insurmountable, but for fairly mediocre output, it is generally not worth it. Additionally, some commercial tools excel in a few languages but fall flat on others. It is very important to test any commercial tool against your own code when evaluating it.

For a pentester’s purposes, commercial tools just take too much time to get configured and working, especially considering the pretty weak results. It would be like spending a day getting Word’s spellcheck to work, just to find some errors in a 30-page document. It is much better to spend time manually bug hunting than dealing with the errors and requirements of tools. Many commercial static analysis tools also require the project to build successfully, which can take several days depending on the project and its complexity. In contrast, the first time I used Semgrep, I had results to work from after just a few minutes of effort.

Semgrep vs Grep

Given the trouble with commercial tools, penetration testers and others often use grep to search a codebase for keywords and areas of concern. This is great for finding potentially vulnerable function calls, although it can generate a lot of results and still miss some cases. In contrast, Semgrep understands the languages it is searching and can detect vulnerabilities that span multiple lines.
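To make the difference concrete, here is a small sketch that contrasts a line-based regex search with a syntax-aware search. It uses Python's built-in `ast` module as a stand-in and is not how Semgrep itself is implemented. The sample source and the helper names `grep_lines` and `find_calls` are made up for illustration.

```python
import ast
import re

# Hypothetical sample: one real call split across lines, one mention in a string.
SOURCE = '''
import hashlib

digest = hashlib.md5(
    data
).hexdigest()

note = "hashlib.md5(data) is weak"  # mention inside a string, not a call
'''


def grep_lines(source, pattern):
    """Line-based search: return 1-based numbers of lines matching the regex."""
    return [
        number
        for number, line in enumerate(source.splitlines(), 1)
        if re.search(pattern, line)
    ]


def find_calls(source, module, func):
    """Syntax-aware search: return line numbers of real module.func(...) calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == func
            and isinstance(node.func.value, ast.Name)
            and node.func.value.id == module
        ):
            hits.append(node.lineno)
    return hits


# The regex only sees the string literal and misses the multi-line call.
print(grep_lines(SOURCE, r"hashlib\.md5\(data\)"))
# The AST search reports the real call and ignores the string.
print(find_calls(SOURCE, "hashlib", "md5"))
```

The regex produces a false positive (the string) and a false negative (the call split across lines), while the syntax-aware search reports only the actual call. Semgrep applies the same idea across many languages with its own pattern syntax.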

Example: 

Let’s start out by running the java ruleset on Android-InsecureBankv2:
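For readers who want to script a run like this, here is a minimal sketch in Python. It is not the exact command used for this post. It assumes the `semgrep` CLI is installed and on the PATH, that the registry ruleset is addressed as `p/java`, and that the JSON report has a top-level `results` list whose entries carry a `check_id`. The helper names and the local checkout path are hypothetical.

```python
import json
import subprocess
from collections import Counter


def build_semgrep_command(ruleset, target):
    """Build the argument list for one Semgrep scan with JSON output."""
    return ["semgrep", "--config", ruleset, "--json", target]


def count_by_rule(semgrep_json):
    """Count findings per rule id in a Semgrep JSON report.

    Assumes the report has a top-level "results" list whose entries
    each include a "check_id" field.
    """
    report = json.loads(semgrep_json)
    return Counter(result["check_id"] for result in report.get("results", []))


def run_semgrep(ruleset, target):
    """Run Semgrep and summarize findings per rule (requires semgrep installed)."""
    proc = subprocess.run(
        build_semgrep_command(ruleset, target),
        capture_output=True,
        text=True,
        check=False,  # exit-status handling is left to the caller
    )
    return count_by_rule(proc.stdout)


# Example (hypothetical local checkout path):
# print(run_semgrep("p/java", "./Android-InsecureBankv2"))
```

Swapping the `ruleset` argument is enough to compare rulesets against the same codebase.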

The results seem okay, but not great. Some of this may be because the rules focus on Java in general and do not all apply to Android. r2c’s security audit ruleset, a good ruleset that covers a lot of languages, provides the same results as the java ruleset. It is safe to assume that they share many, or possibly all, of the same Java rules.

Let’s try another ruleset, findsecbugs, to see how the results differ:

The results seem to be a subset of the previous attempts. None of these three rulesets generated too many findings.

Let’s try another codebase, the Damn Vulnerable Java (EE) Application, which has all of the OWASP Top 10 vulnerabilities. The https://semgrep.dev/c/p/java and https://semgrep.dev/p/r2c-security-audit rulesets both found the same two SQL Injection findings (one of which is not officially listed in the solutions):

Finding two vulnerabilities is good for a couple of minutes of work, but finding only one type of vulnerability out of at least twelve (there are two types of injection and two types of XSS) is not so great. By trying the XSS ruleset, we are able to find the reflected XSS, but not the stored XSS:

We were only able to find three types of vulnerabilities out of at least twelve. A couple of these are going to be nearly impossible to identify with automated tools, and several more are difficult. I think the coverage is very reasonable given these constraints and the time I had to put into it. It is also important to understand that Semgrep is not going to find everything, but it will provide a layer of coverage. grep and heavyweight commercial tools have their place, but Semgrep occupies the sweet spot for many common use cases.

Further Learning

https://tldrsec.com/blog/tldr-sec-035/ & https://tldrsec.com/blog/tldr-sec-037/ These tl;dr sec newsletters are where I first learned about Semgrep. tl;dr sec is a great resource for modern application security, security automation, and DevSecOps.

We45 has a good video on Semgrep. We45 also has many other great application security videos on their channel.

Alex Lauerman

Alex is a penetration tester based in Overland Park, Kansas, which is a suburb of Kansas City. Alex is thankful for being able to spend over 15 years of his life building and breaking applications.