Log4j Log4Shell 0-day: find, fix, and track affected code
The steps to identify and fix/mitigate the log4j 0-day (CVE-2021-44228) ("Log4Shell") in your code have been widely reported (1 2 3). But the steps are manual and tedious, and it's hard to track the progress of fixes/mitigations across all your code. To help, we're publishing queries, scripts, and instructions for using code search to:
- Find everywhere log4j is used across all your code
- Automate PRs to fix/mitigate the log4j 0-day across all your code
- Track progress of applying fixes/mitigations for the log4j 0-day
We've documented how to do these things in Sourcegraph below and will be adding instructions (where possible) soon for other code search tools.
Find everywhere log4j is used across all your code
Run these queries on Sourcegraph to quickly determine which projects directly depend on vulnerable versions of log4j. The following links show results on Sourcegraph Cloud across 2M public repositories.
- Direct dependencies on vulnerable log4j versions specified in common build systems:
- Broader queries (with more false positives):
  - Any file containing org.apache.logging.log4j followed by a vulnerable version number
  - All files or all repositories that contain org.apache.logging.log4j
  - All files or all repositories that contain log4j
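To spot-check a single checkout without a search instance, you can approximate the broader query locally. The Python below is a hypothetical helper, not part of Sourcegraph. It flags lines where org.apache.logging.log4j is followed, on the same line, by a 2.x version whose minor number is 0 through 16. That coarse rule mirrors the "Vulnerable log4j versions" Code Insights query later in this post, and like that query it ignores backported security releases on older 2.x lines. Because it scans line by line, it catches one-line Gradle notation but not Maven's multi-line <version> elements.

```python
import re

# Hypothetical pattern: the log4j group ID, then (lazily, on the same line)
# a version starting with "2.<minor>". The \b keeps a "2" that directly
# follows a word character (as in "log4j2") from being read as a version.
LOG4J_COORD = re.compile(r"org\.apache\.logging\.log4j[^\n]*?\b(2\.(\d+)(?:\.\d+)*)")

# Assumption: treat 2.0 through 2.16 as vulnerable, matching this post's
# Code Insights query. This is not an authoritative CVE range check.
MAX_FLAGGED_MINOR = 16


def flag_vulnerable_lines(text: str) -> list[tuple[int, str]]:
    """Return (line number, version) pairs for suspicious log4j references."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = LOG4J_COORD.search(line)
        if match and int(match.group(2)) <= MAX_FLAGGED_MINOR:
            hits.append((lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    sample = """dependencies {
    implementation 'org.apache.logging.log4j:log4j-core:2.14.1'
    implementation 'org.apache.logging.log4j:log4j-api:2.17.0'
}"""
    print(flag_vulnerable_lines(sample))  # [(2, '2.14.1')]
```

A code search instance runs the same kind of match across every repository at once. This sketch is only useful for checking one working tree.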
To search across your organization's private code:
- On Sourcegraph Cloud, run the queries linked above in your user search context (context:@username), after you've synced all of the org repositories you want to search (in Settings > Account > Your repositories).
- On a self-hosted Sourcegraph instance, copy and paste the queries above into the search box on your instance. After pasting, ensure the .* (regexp search) button is on for queries that contain regular expressions.
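If you plan to share search links with your team, a small script can build them consistently. The sketch below is a hypothetical helper. It assumes the /search?q=...&patternType=regexp URL layout that Sourcegraph's web UI uses, so compare its output against a URL copied from your own instance. The context argument adds the context:@username filter described above, and regexp=True plays the role of the .* toggle.

```python
from typing import Optional
from urllib.parse import urlencode


def search_url(instance: str, query: str, context: Optional[str] = None,
               regexp: bool = True) -> str:
    """Build a shareable search URL (hypothetical helper, assumed URL layout)."""
    if context:
        # Scope the search, e.g. "@username" for a Cloud user search context.
        query = f"context:{context} {query}"
    params = {"q": query}
    if regexp:
        # Assumed equivalent of turning on the .* (regexp search) toggle.
        params["patternType"] = "regexp"
    return f"{instance.rstrip('/')}/search?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("https://sourcegraph.com",
                     r"org\.apache\.logging\.log4j", context="@username"))
```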
Once you've found where vulnerable log4j dependency versions are used, you can:
- Automate the creation of pull requests to fix/mitigate the issues (see the next section).
- Share the search URLs with your team to work on eliminating all unsafe deps (getting to "0 results"). With Code Insights, you also get line charts of the progress (see below).
- Get the raw dataset of all results: export the results to CSV or a spreadsheet with the sourcegraph/search-export extension, or use the Sourcegraph GraphQL API or src CLI.
- Use a search notebook to compile all of the queries your team is using to identify potentially vulnerable code.
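For the GraphQL API route, the sketch below prepares a search request and turns the file matches into CSV rows. It uses only the standard library. The query shape (search, results, FileMatch, repository.name, file.path), the /.api/graphql endpoint path, and the "token" authorization scheme reflect our understanding of Sourcegraph's API, so treat them as assumptions and confirm them in your instance's API console. The function names are hypothetical.

```python
import csv
import json
import sys
import urllib.request

# Assumed query shape for Sourcegraph's GraphQL search API; confirm the field
# names in your instance's API console before relying on this.
SEARCH_QUERY = """
query ($query: String!) {
  search(query: $query) {
    results {
      results {
        __typename
        ... on FileMatch {
          repository { name }
          file { path }
        }
      }
    }
  }
}
"""


def build_request(endpoint: str, token: str, query: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the GraphQL endpoint."""
    body = json.dumps({"query": SEARCH_QUERY, "variables": {"query": query}})
    return urllib.request.Request(
        endpoint,
        data=body.encode("utf-8"),
        headers={"Authorization": f"token {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def file_matches(payload: dict) -> list[tuple[str, str]]:
    """Pull (repository, path) pairs out of a decoded JSON response."""
    results = payload["data"]["search"]["results"]["results"]
    return [(r["repository"]["name"], r["file"]["path"])
            for r in results if r.get("__typename") == "FileMatch"]


def write_csv(rows, out=None) -> None:
    """Write rows as CSV with a header line (defaults to stdout)."""
    writer = csv.writer(out if out is not None else sys.stdout)
    writer.writerow(["repository", "path"])
    writer.writerows(rows)
```

To use it, pass your instance's /.api/graphql URL and an access token to build_request. Send the request with urllib.request.urlopen, then feed file_matches(json.load(response)) into write_csv. The API has no regexp toggle, so include patterntype:regexp in the query string itself when the query is a regular expression.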
Although code search is a fast and versatile tool for assessing the impact of a novel vulnerability, it's not perfect. Here are some limitations:
- Common Java build systems such as Maven and Gradle have no convention for committed dependency lockfiles. As a result, the above queries won't find projects where log4j is a transitive (indirect) dependency, because no file committed to Git lists the fully resolved dependencies and versions. See the next section for how Sourcegraph can invoke your build tool to get a precise set of transitive dependencies (and then automate PRs to fix/mitigate the issue).
- The queries above won't find other indirect usage of log4j, such as a test script that downloads and runs other programs that use log4j. There's no general way to find and fix that type of issue. However, if you know what to look for (such as specific old versions of Elasticsearch that use vulnerable log4j versions), then code search is quite helpful.
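Closing the transitive-dependency gap means asking the build tool for its resolved dependency tree. This is typically the output of mvn dependency:tree or gradle dependencies, and you then scan that output. The Python sketch below is a hypothetical parser for that text. It assumes Maven prints coordinates as group:artifact:jar:version:scope and that Gradle prints "requested -> resolved" when conflict resolution changed a version, so check both against your own build output. The is_flagged rule is the same coarse 2.0 to 2.16 rule as this post's Code Insights query. Log4j 1.x uses a different group ID and is out of scope here.

```python
import re

# Assumed dependency-tree layouts (verify against your own build output):
#   Maven:  org.apache.logging.log4j:log4j-core:jar:2.14.1:compile
#   Gradle: org.apache.logging.log4j:log4j-core:2.14.1 -> 2.17.0
LOG4J_CORE = re.compile(
    r"org\.apache\.logging\.log4j:log4j-core:(?:jar:)?([0-9][\w.\-]*)"
    r"(?:\s*->\s*([0-9][\w.\-]*))?"
)


def resolved_log4j_core_versions(tree_text: str) -> list[str]:
    """Return the version actually resolved for each log4j-core entry."""
    versions = []
    for match in LOG4J_CORE.finditer(tree_text):
        requested, resolved = match.groups()
        # When Gradle shows "requested -> resolved", the resolved version
        # is the one that ends up on the classpath.
        versions.append(resolved or requested)
    return versions


def is_flagged(version: str) -> bool:
    """Coarse rule from this post's insight query: 2.0 through 2.16."""
    m = re.match(r"2\.(\d+)", version)
    return bool(m) and int(m.group(1)) <= 16
```

Working from the resolved tree catches log4j-core even when no build file in the repository names it directly.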
Automate PRs to fix/mitigate the log4j 0-day across all your code
Use the following batch change specs to programmatically create GitHub pull requests (or GitLab merge requests) to apply the following fixes/mitigations across all of your code:
- upgrade-log4j-gradle: Force usage of safe log4j dependency versions (including for transitive dependencies) in all Gradle projects that use affected log4j dependency versions.
- detect-log4j-gradle: Detect Gradle projects (using build.gradle files) that use affected log4j dependency versions and open a pull request with a fixme file.
- detect-log4j-maven: Detect Maven projects (using pom.xml files) that use affected log4j dependency versions and open a pull request with a fixme file.
- We'll be adding more (let us know if you have specific requests). You can also customize the existing specs for your needs, or write your own batch change spec.
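If you write your own spec, a step needs some check to run on each file. The Python below roughly illustrates a Maven-side check: it is a sketch, not the implementation of detect-log4j-maven. It reads a pom.xml with the standard library's ElementTree and lists org.apache.logging.log4j dependencies. It resolves simple ${property} versions from the same file's <properties>. Parent POMs, BOMs, and inherited dependencyManagement are not handled, and it assumes the pom declares the standard Maven 4.0.0 namespace. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the pom uses the standard Maven POM namespace.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}
LOG4J_GROUP = "org.apache.logging.log4j"


def log4j_dependencies(pom_xml: str) -> list[tuple[str, str]]:
    """Return (artifactId, version) pairs for log4j dependencies in one pom."""
    root = ET.fromstring(pom_xml)

    # Collect <properties> so "${log4j.version}" style versions can be resolved.
    props = {}
    for prop in root.findall("m:properties/*", POM_NS):
        # Tags come back as "{namespace}name"; keep only the local name.
        props[prop.tag.split("}", 1)[1]] = (prop.text or "").strip()

    found = []
    # iter() walks the whole tree, so this also sees dependencyManagement.
    for dep in root.iter(f"{{{POM_NS['m']}}}dependency"):
        group = dep.findtext("m:groupId", default="", namespaces=POM_NS).strip()
        if group != LOG4J_GROUP:
            continue
        artifact = dep.findtext("m:artifactId", default="", namespaces=POM_NS).strip()
        version = dep.findtext("m:version", default="", namespaces=POM_NS).strip()
        if version.startswith("${") and version.endswith("}"):
            version = props.get(version[2:-1], version)
        found.append((artifact, version))
    return found
```

Parsing the XML rather than matching text handles the common Maven layout, where the version sits on a different line from the group ID.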
After you preview and create a batch change, you can see all of the pull requests and track their progress.
Here's a video walkthrough of how to use Batch Changes to fix/mitigate log4j vulnerabilities in your code:
To use Batch Changes on your organization's private code:
This feature requires a self-hosted Sourcegraph instance and is usually part of an enterprise plan. We’re giving out temporary license keys to use Batch Changes for log4j-related fixes. Email log4j-incident-response-help@sourcegraph.com and we'll reply quickly with a temporary key.
Track progress of applying fixes/mitigations for the log4j 0-day across all your code
If you have a lot of projects that need to be patched, and a lot of people working in parallel to apply patches, it's important to know:
- Which projects are still vulnerable?
- How many applications have been patched so far?
- Who's applying the patches, and what are the actual diffs?
Given any search query (such as the ones linked at the top of the post), you can use Code Insights to automatically track progress and changes over time. You can also drill into any data point to see the commits that were responsible for the changes. Unlike manually tracking progress in an issue tracker or spreadsheet, this takes no time to maintain and is always up to date.
To get this code insight on your organization's private code:
- This feature requires a self-hosted Sourcegraph instance and is usually part of an enterprise plan. We’re giving out temporary license keys to use Code Insights for log4j-related fixes. Email log4j-incident-response-help@sourcegraph.com and we'll reply quickly with a temporary key.
- Go to Insights > Create new insight > Create search insight.
- Select the specific repositories in which to measure progress (or all repositories).
- Add the 3 data series shown in the screenshot above. The queries used above are defined as follows, but you can customize them as needed (using the query links at the start of this post for inspiration):
  - Vulnerable log4j versions = lang:gradle org\.apache\.logging\.log4j['"] 2\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16)(\.[0-9]+) patterntype:regexp
  - Upgraded log4j versions = lang:gradle org\.apache\.logging\.log4j['"] 2\.(17)(\.[0-9]+) patterntype:regexp
  - formatMsgNoLookups = -Dlog4j2.formatMsgNoLookups=true
- Give the insight a name and save it.
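Before relying on the chart, you can check which version strings the two series will count. The Python below copies the version portion of the two queries above verbatim and runs it with Python's re module. It does not model how Sourcegraph combines that portion with the lang:gradle filter and the org\.apache\.logging\.log4j['"] prefix. The classify name is hypothetical.

```python
import re

# Version portions of the two Code Insights queries, copied verbatim.
VULNERABLE = re.compile(r"2\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15|16)(\.[0-9]+)")
UPGRADED = re.compile(r"2\.(17)(\.[0-9]+)")


def classify(version: str) -> str:
    """Report which data series (if any) a version string would count toward."""
    if UPGRADED.search(version):
        return "upgraded"
    if VULNERABLE.search(version):
        return "vulnerable"
    return "unmatched"


if __name__ == "__main__":
    for v in ["2.14.1", "2.10.0", "2.17.0", "2.16", "2.17"]:
        print(v, classify(v))
```

One thing this check shows is that both patterns require a patch component. A version written without one, such as 2.16, matches neither series. You may want to loosen the (\.[0-9]+) group if your build files use that style.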
Getting started with Sourcegraph
Free accounts on Sourcegraph Cloud let you search your organization's private code on GitHub.com and GitLab.com. You can choose which orgs and repositories to sync when signing up, or later by visiting Settings > Account > Your repositories.
To use Batch Changes and Code Insights to apply mass fixes and track progress, or if you want to run it on your own laptop or infrastructure, set up a self-hosted Sourcegraph instance. These features are usually part of an enterprise plan, but we're giving out temporary license keys to use these features for log4j-related fixes. Email log4j-incident-response-help@sourcegraph.com and we'll reply quickly with a temporary key.
Thanks to the following people for helping with this post: Olaf Geirsson, Rebecca Dodd, Thorsten Ball, Erica Lindberg, Malo Marrec, Victoria Yunger, Beyang Liu. We welcome edits to this post.
About the author
Quinn Slack is the CEO and co-founder of Sourcegraph, the code intelligence platform that helps dev teams and makes coding more accessible to more people. Prior to Sourcegraph, Quinn co-founded Blend Labs, an enterprise technology company dedicated to improving home lending, and was an engineer at Palantir, where he created a technology platform to help two of the top five U.S. banks recover from the housing crisis. Quinn has a BS in Computer Science from Stanford. You can chat with him on Twitter @sqs.