A Brief Introduction to Semgrep (Part 1) — TrustFoundry Blog

Introduction

Semgrep is an amazing static analysis tool that we are excited about. This is part 1 of a two-part series. Josiah’s part 2 dives much deeper into the details of the rules and how to hunt for specific vulnerabilities. This post provides high-level information and shows you how to run Semgrep.

Semgrep grew out of pfff, an open-source Facebook project. All of the current functionality is free and open source, although r2c (ReturnToCorp), the maker of Semgrep, has plans to commercialize enterprise use cases such as CI/CD integrations. A similar tool called LGTM integrates with GitHub/Bitbucket. It is made by Semmle, which was bought by GitHub (a Microsoft subsidiary). It is worth looking into if Semgrep sounds interesting to you.

Semgrep vs Commercial Static Analysis Tools

I’ve yet to use a commercial static analysis tool that I would recommend for penetration testers, and I would be hesitant to recommend many commercial solutions for developers as well. The problem is not that they don’t find anything; it’s generally that they are difficult to use and flag every 10th line of code as a potential vulnerability. Some portion of the findings requires security expertise to review (e.g. is using MD5 really a vulnerability in this case?). Running most tools accurately takes someone who understands the code, the tool, and application security. That is not insurmountable, but for fairly mediocre output it generally is not worth it. Additionally, some commercial tools excel in a few languages but fall flat on others, so it is very important to test any commercial tool against your own code when evaluating it.

For a pentester’s purposes, commercial tools just take too much time to get configured and working, especially considering the pretty weak results. It would be like spending a day getting Word’s spellcheck to work, just to find some errors in a 30-page document. It is much better to spend time manually bug hunting than dealing with the errors and requirements of tools. Many commercial static analysis tools also require the project to build successfully, which can take several days depending on the project and its complexity. Conversely, the first time I used Semgrep, I had results to work from with just a few minutes of effort.

Semgrep vs Grep

Given the trouble with commercial tools, penetration testers and others often use grep to search for keywords and areas of concern across the codebase. This is great for finding potentially vulnerable function calls, although it can generate a lot of results and miss some cases. In contrast, Semgrep understands the syntax of the languages it searches and can match vulnerable patterns that span multiple lines.
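To make the multi-line point concrete, here is a minimal sketch, assuming a POSIX shell. The file Demo.java and both queries in it are hypothetical, written in the style of the findings later in this post. A line-based grep for a string-concatenated createQuery call matches the one-line form but misses the same kind of call once it is wrapped across lines.

```shell
#!/bin/sh
# Hypothetical sample: two string-concatenated queries, one written on a
# single line and one wrapped across two lines.
tmpdir=$(mktemp -d)
cat > "$tmpdir/Demo.java" <<'EOF'
Query a = em.createQuery("SELECT u FROM User u WHERE u.login = '" + login + "'");
Query b = em.createQuery(
    "SELECT p FROM Product p WHERE p.name LIKE '%" + name + "%'");
EOF

# grep works line by line, so it only sees the call that fits on one line.
one_line_hits=$(grep -c 'createQuery(.*" *+' "$tmpdir/Demo.java")
echo "concatenated createQuery calls found by grep: $one_line_hits of 2"

rm -rf "$tmpdir"
```

A syntax-aware matcher works on the parsed call expression rather than on raw lines, so the line wrapping in the second query would not hide it.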

Example:

Let’s start out by running the Java ruleset on Android-InsecureBankv2:

docker run --rm -v "C:\source\Android-InsecureBankv2:/src" returntocorp/semgrep --config "https://semgrep.dev/c/p/java"
using config from https://semgrep.dev/p/java. Visit https://semgrep.dev/registry to see all public rules.
downloading config...
running 28 rules...
InsecureBankv2/app/src/main/java/com/android/insecurebankv2/ChangePassword.java
severity:warning rule:java.lang.security.audit.crypto.ssl.defaulthttpclient-is-deprecated.defaulthttpclient-is-deprecated: DefaultHttpClient is deprecated. Further, it does not support connections
using TLS1.2, which makes using DefaultHttpClient a security hazard.
Use SystemDefaultHttpClient instead, which supports TLS1.2.
128: HttpClient httpclient = new DefaultHttpClient();
autofix: s/DefaultHttpClient/SystemDefaultHttpClient/g
InsecureBankv2/app/src/main/java/com/android/insecurebankv2/CryptoClass.java
severity:warning rule:java.lang.security.audit.cbc-padding-oracle.cbc-padding-oracle: Using CBC with PKCS5Padding is susceptible to padding orcale attacks. A malicious actor
could discern the difference between plaintext with valid or invalid padding. Further,
CBC mode does not include any integrity checks. See https://find-sec-bugs.github.io/bugs.htm#CIPHER_INTEGRITY.
Use 'AES/GCM/NoPadding' instead.
55: cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
autofix: javax crypto Cipher.getInstance("AES/GCM/NoPadding");
77: Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
autofix: javax crypto Cipher.getInstance("AES/GCM/NoPadding");
InsecureBankv2/app/src/main/java/com/android/insecurebankv2/DoLogin.java
severity:warning rule:java.lang.security.audit.crypto.ssl.defaulthttpclient-is-deprecated.defaulthttpclient-is-deprecated: DefaultHttpClient is deprecated. Further, it does not support connections
using TLS1.2, which makes using DefaultHttpClient a security hazard.
Use SystemDefaultHttpClient instead, which supports TLS1.2.
116: HttpClient httpclient = new DefaultHttpClient();
autofix: s/DefaultHttpClient/SystemDefaultHttpClient/g
InsecureBankv2/app/src/main/java/com/android/insecurebankv2/DoTransfer.java
severity:warning rule:java.lang.security.audit.crypto.ssl.defaulthttpclient-is-deprecated.defaulthttpclient-is-deprecated: DefaultHttpClient is deprecated. Further, it does not support connections
using TLS1.2, which makes using DefaultHttpClient a security hazard.
Use SystemDefaultHttpClient instead, which supports TLS1.2.
131: HttpClient httpclient = new DefaultHttpClient();
autofix: s/DefaultHttpClient/SystemDefaultHttpClient/g
262: HttpClient httpclient = new DefaultHttpClient();
autofix: s/DefaultHttpClient/SystemDefaultHttpClient/g
wip-attackercode/ExploitAES/app/src/main/java/com/android/dns/exploitaes/MainActivity.java
severity:warning rule:java.lang.security.audit.cbc-padding-oracle.cbc-padding-oracle: Using CBC with PKCS5Padding is susceptible to padding orcale attacks. A malicious actor
could discern the difference between plaintext with valid or invalid padding. Further,
CBC mode does not include any integrity checks. See https://find-sec-bugs.github.io/bugs.htm#CIPHER_INTEGRITY.
Use 'AES/GCM/NoPadding' instead.
115: Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
autofix: javax crypto Cipher.getInstance("AES/GCM/NoPadding");

The results seem okay, but not great. Some of this may be because the rules focus on Java in general rather than Android specifically. r2c’s security audit ruleset, a good ruleset that covers a lot of languages, produces the same results as the java ruleset. It is safe to assume that the two share many, or possibly all, of the same Java rules.

Let’s try another ruleset, findsecbugs, to see how the results differ:

docker run --rm -v "C:\source\Android-InsecureBankv2:/src" returntocorp/semgrep --config "https://semgrep.dev/p/findsecbugs"
using config from https://semgrep.dev/p/findsecbugs. Visit https://semgrep.dev/registry to see all public rules.
downloading config...
running 43 rules...
InsecureBankv2/app/src/main/java/com/android/insecurebankv2/ChangePassword.java
severity:warning rule:java.lang.security.audit.crypto.ssl.defaulthttpclient-is-deprecated.defaulthttpclient-is-deprecated: DefaultHttpClient is deprecated. Further, it does not support connections
using TLS1.2, which makes using DefaultHttpClient a security hazard.
Use SystemDefaultHttpClient instead, which supports TLS1.2.
128: HttpClient httpclient = new DefaultHttpClient();
autofix: s/DefaultHttpClient/SystemDefaultHttpClient/g
InsecureBankv2/app/src/main/java/com/android/insecurebankv2/DoLogin.java
severity:warning rule:java.lang.security.audit.crypto.ssl.defaulthttpclient-is-deprecated.defaulthttpclient-is-deprecated: DefaultHttpClient is deprecated. Further, it does not support connections
using TLS1.2, which makes using DefaultHttpClient a security hazard.
Use SystemDefaultHttpClient instead, which supports TLS1.2.
116: HttpClient httpclient = new DefaultHttpClient();
autofix: s/DefaultHttpClient/SystemDefaultHttpClient/g
InsecureBankv2/app/src/main/java/com/android/insecurebankv2/DoTransfer.java
severity:warning rule:java.lang.security.audit.crypto.ssl.defaulthttpclient-is-deprecated.defaulthttpclient-is-deprecated: DefaultHttpClient is deprecated. Further, it does not support connections
using TLS1.2, which makes using DefaultHttpClient a security hazard.
Use SystemDefaultHttpClient instead, which supports TLS1.2.
131: HttpClient httpclient = new DefaultHttpClient();
autofix: s/DefaultHttpClient/SystemDefaultHttpClient/g
262: HttpClient httpclient = new DefaultHttpClient();
autofix: s/DefaultHttpClient/SystemDefaultHttpClient/g

The results seem to be a subset of those from the previous runs: the same DefaultHttpClient findings, without the CBC padding findings. None of these three rulesets generated too many findings.
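One way to check the subset observation is to compare the rule IDs each run reported. This is a minimal sketch, assuming a POSIX shell and assuming the two runs were saved to java.txt and findsecbugs.txt (hypothetical file names). Here the files are stubbed with severity lines from the output above, trimmed to their first sentence.

```shell
#!/bin/sh
# Stub the saved outputs (hypothetical file names) with severity lines
# trimmed from the runs shown above.
tmpdir=$(mktemp -d)
cat > "$tmpdir/java.txt" <<'EOF'
severity:warning rule:java.lang.security.audit.crypto.ssl.defaulthttpclient-is-deprecated.defaulthttpclient-is-deprecated: DefaultHttpClient is deprecated.
severity:warning rule:java.lang.security.audit.cbc-padding-oracle.cbc-padding-oracle: Using CBC with PKCS5Padding
EOF
cat > "$tmpdir/findsecbugs.txt" <<'EOF'
severity:warning rule:java.lang.security.audit.crypto.ssl.defaulthttpclient-is-deprecated.defaulthttpclient-is-deprecated: DefaultHttpClient is deprecated.
EOF

# Print the sorted, de-duplicated rule IDs found in one saved output.
rule_ids() {
    sed -n 's/^severity:[a-z]* rule:\([^:]*\):.*/\1/p' "$1" | sort -u
}

rule_ids "$tmpdir/java.txt" > "$tmpdir/java.ids"
rule_ids "$tmpdir/findsecbugs.txt" > "$tmpdir/findsecbugs.ids"

# comm -13 keeps lines unique to the second file; no output means subset.
only_in_findsecbugs=$(comm -13 "$tmpdir/java.ids" "$tmpdir/findsecbugs.ids")
if [ -z "$only_in_findsecbugs" ]; then
    echo "findsecbugs rule IDs are a subset of the java ruleset's"
else
    echo "rules reported only by findsecbugs:"
    echo "$only_in_findsecbugs"
fi

rm -rf "$tmpdir"
```

This compares rule IDs only, not individual file and line findings, so it is a quick sanity check and not a full diff of the two runs.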

Let’s try another codebase, the Damn Vulnerable Java (EE) Application, which has all of the OWASP Top 10 vulnerabilities. The https://semgrep.dev/c/p/java and https://semgrep.dev/p/r2c-security-audit rulesets both found the same two SQL injection findings (one of which is not officially listed in the solutions):

docker run --rm -v "C:\source\dvja:/src" returntocorp/semgrep --config=https://semgrep.dev/p/r2c-security-audit
using config from https://semgrep.dev/p/r2c-security-audit. Visit https://semgrep.dev/registry to see all public rules.
downloading config...
running 202 rules...
ran 202 rules on 23 files: 2 findings
src/main/java/com/appsecco/dvja/services/ProductService.java
severity:warning rule:java.lang.security.audit.formatted-sql-string.formatted-sql-string: Detected a formatted string in a SQL statement. This could lead to SQL
injection if variables in the SQL statement are not properly sanitized.
Use a prepared statements (java.sql.PreparedStatement) instead. You
can obtain a PreparedStatement using 'connection.prepareStatement'.
48: Query query = entityManager.createQuery("SELECT p FROM Product p WHERE p.name LIKE '%" + name + "%'");
src/main/java/com/appsecco/dvja/services/UserService.java
severity:warning rule:java.lang.security.audit.formatted-sql-string.formatted-sql-string: Detected a formatted string in a SQL statement. This could lead to SQL
injection if variables in the SQL statement are not properly sanitized.
Use a prepared statements (java.sql.PreparedStatement) instead. You
can obtain a PreparedStatement using 'connection.prepareStatement'.
75: Query query = entityManager.createQuery("SELECT u FROM User u WHERE u.login = '" + login + "'");

Finding two vulnerabilities is good for a couple of minutes of work, but covering only one type of vulnerability out of at least twelve (there are two types of injection and two types of XSS) is not so great. By trying the XSS ruleset, we are able to find the reflected XSS, but not the stored XSS:

docker run --rm -v "C:\source\dvja:/src" returntocorp/semgrep --config=https://semgrep.dev/p/xss
using config from https://semgrep.dev/p/xss. Visit https://semgrep.dev/registry to see all public rules.
downloading config...
running 60 rules...
ran 60 rules on 172 files: 4 findings
src/main/webapp/WEB-INF/dvja/ProductList.jsp
severity:warning rule:java.lang.security.audit.xss.jsp.no-scriptlets.no-scriptlets: JSP scriptlet detected. Scriptlets are difficult to use securely and
are considered bad practice. See https://stackoverflow.com/a/3180202.
Instead, consider migrating to JSF or using the Expression Language
'${...}' with the escapeXml function in your JSP files.
23:<%= request.getParameter("searchQuery") %>
src/main/webapp/WEB-INF/dvja/common/Footer.jsp
severity:warning rule:java.lang.security.audit.xss.jsp.use-escapexml.use-escapexml: Detected an Expression Language segment that does not escape
output. This is dangerous because if any data in this expression
can be controlled externally, it is a cross-site scripting
vulnerability. Instead, use the 'escapeXml' function from
the JSTL taglib. See https://www.tutorialspoint.com/jsp/jstl_function_escapexml.htm
for more information.
1:${request.contextPath}
src/main/webapp/WEB-INF/dvja/common/Head.jsp
severity:warning rule:java.lang.security.audit.xss.jsp.use-escapexml.use-escapexml: Detected an Expression Language segment that does not escape
output. This is dangerous because if any data in this expression
can be controlled externally, it is a cross-site scripting
vulnerability. Instead, use the 'escapeXml' function from
the JSTL taglib. See https://www.tutorialspoint.com/jsp/jstl_function_escapexml.htm
for more information.
10:${request.contextPath}
13:${request.contextPath}

We were only able to find three types of vulnerabilities out of at least twelve. A couple of these are going to be nearly impossible to identify with automated tools, and several more are difficult. I think the coverage is very reasonable given these constraints and the time I had to put into it. It is also important to understand that Semgrep is not going to find everything, but it will provide a layer of coverage. grep and heavyweight commercial tools have their place, but Semgrep occupies the sweet spot for many common use cases.
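Each run in this post repeats the same docker command with a different ruleset, so the commands can be generated in a loop. This is a minimal sketch, assuming a POSIX shell. SRC_DIR and the semgrep_cmd helper are hypothetical names; the image name and registry URLs are the ones used above. The script only prints the commands, leaving it to you to run them.

```shell
#!/bin/sh
# SRC_DIR is a hypothetical variable: point it at the code you want to scan.
SRC_DIR="${SRC_DIR:-$PWD}"

# Print the docker command for one registry ruleset, e.g. "xss".
semgrep_cmd() {
    printf 'docker run --rm -v "%s:/src" returntocorp/semgrep --config "https://semgrep.dev/p/%s"\n' "$SRC_DIR" "$1"
}

# The four rulesets tried in this post.
for ruleset in java r2c-security-audit findsecbugs xss; do
    semgrep_cmd "$ruleset"
done
```

Printing the commands first keeps the sketch safe to try; piping the output to sh would run the scans one after another.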

Further Learning

https://tldrsec.com/blog/tldr-sec-035/ & https://tldrsec.com/blog/tldr-sec-037/ These tl;dr sec newsletters are where I first learned about Semgrep. The newsletter is a great resource for modern application security, security automation, and DevSecOps.

We45 has a good video on Semgrep. We45 also has many other great application security videos on their channel.