[Code Quality] Check Code Using Semgrep with Regular Expressions

2025-01-11

Let's see how to check code quality using Semgrep with regular expressions.

Outline

When developing, we often use static analysis tools such as linters to improve code quality. However, these tools only ship general-purpose rules and cannot cover every project-specific convention. So, when you need a rule that the tool does not provide, you usually add a custom rule to the linter.

Such custom rules work on the code the linter parses, which makes free-form text such as comments awkward to check. In such cases, you can create custom rules with regular expressions in Semgrep.

Semgrep

Semgrep is a static analysis tool that automatically checks and improves the safety, quality, and style of code. It is mainly used to analyze source code and find security vulnerabilities, bugs, and code quality issues.

In addition to patterns that look like code, Semgrep lets you write custom rules as regular expressions (the pattern-regex operator). This lets you check things that are hard to check with a linter.

In this blog post, I will introduce how to check the code using Semgrep with regular expressions.

Install Semgrep

Semgrep is distributed as a Python package, so Python must be installed. I will skip how to install Python on each OS.

Semgrep can be installed as a Python package. To install Semgrep, run the following command.

pip install semgrep

After installing, save the installed packages to requirements.txt so you can install the same version elsewhere.

pip freeze > requirements.txt

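When you open requirements.txt, you will find a line like the following. (pip freeze writes every package installed in the environment, so Semgrep's dependencies are listed too, and your version may differ.)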

semgrep==1.97.0

You can use this requirements.txt to install Semgrep elsewhere with the following command.

pip install -r requirements.txt
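Keep in mind that pip freeze pins everything in the active environment, not only Semgrep. If you would rather keep just the Semgrep line, here is a minimal Python sketch that filters freeze output. The helper names normalize and keep_pins are hypothetical, and comparing names case-insensitively with -, _ and . treated as equal (PEP 503 style) is my own assumption.

```python
import re


def normalize(name: str) -> str:
    """Normalize a package name PEP 503 style: lowercase, runs of -, _ and . become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def keep_pins(freeze_output: str, wanted: set[str]) -> list[str]:
    """Return only the 'name==version' lines of pip freeze output whose name is in wanted."""
    targets = {normalize(n) for n in wanted}
    kept = []
    for line in freeze_output.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and entries that are not plain 'name==version' pins
        # (for example editable installs or 'name @ URL' references).
        if not line or line.startswith("#") or "==" not in line:
            continue
        name = line.split("==", 1)[0]
        if normalize(name) in targets:
            kept.append(line)
    return kept


# Made-up sample freeze output for illustration.
sample = "click==8.1.7\nsemgrep==1.97.0\n"
print(keep_pins(sample, {"semgrep"}))  # ['semgrep==1.97.0']
```

For example, you could pass it the text from subprocess.run([sys.executable, "-m", "pip", "freeze"], capture_output=True, text=True).stdout and write the returned lines to requirements.txt.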

Configure Semgrep Rules

Now, let’s create a rule that checks the code with a regular expression in Semgrep. First, create a code-rules.yaml file with the following content.

rules:
  - id: missing-param-description
    severity: ERROR
    message: 【Comment error】 Add correct comment.
    languages:
      - javascript
      - typescript
    patterns:
      - pattern-regex: \*\s*@.*-\s*(\n|\r\n|$)
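To read the pattern-regex piece by piece, here is the same expression rewritten with Python's re.VERBOSE flag and a comment on each part. This is an approximation: Semgrep evaluates pattern-regex as a PCRE-style regular expression in multiline mode, and I am assuming Python's re with re.MULTILINE behaves the same for a pattern this simple. The sample lines are made up for illustration.

```python
import re

# Compact form, copied from the pattern-regex in code-rules.yaml.
COMPACT = re.compile(r"\*\s*@.*-\s*(\n|\r\n|$)", re.MULTILINE)

# The same pattern in verbose mode: unescaped whitespace is ignored and '#' starts a comment.
VERBOSE = re.compile(
    r"""
    \*           # the '*' at the start of a JSDoc comment line
    \s*          # optional whitespace before the tag
    @            # the start of a tag such as @param or @returns
    .*           # the rest of the tag (type, name, ...); '.' does not match a newline
    -            # the '-' that usually separates the name from the description
    \s*          # only optional whitespace may follow the '-'
    (\n|\r\n|$)  # and then the line (or the whole input) has to end
    """,
    re.MULTILINE | re.VERBOSE,
)

# Made-up sample lines: (text, expected to be flagged?)
SAMPLES = [
    (" * @param {number} b -\n", True),                      # dangling '-'
    (" * @param {number} b -   \r\n", True),                 # trailing spaces, CRLF ending
    (" * @param {number} b -", True),                        # last line of the input
    (" * @param {number} b - the second operand\n", False),  # has a description
    (" * @param {number} b\n", False),                       # no '-' at all
]

for text, expected in SAMPLES:
    flagged = COMPACT.search(text) is not None
    # The verbose form is only a re-spelling, so both must agree with the expectation.
    assert flagged == expected
    assert (VERBOSE.search(text) is not None) == expected
    label = "flagged" if flagged else "ok"
    print(f"{label:8} {text!r}")
```

The last sample shows a limit of the rule: a tag with no - and no description at all is not reported, because the pattern only looks for a dangling -.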

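This rule reports any JSDoc tag line, such as @param, that ends with the - separator followed only by whitespace, that is, a tag whose description is missing. For example: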

❌ Incorrect

If a parameter has the - separator but no description after it, as below, Semgrep reports an error.

  /**
   * Add two numbers
   * @param {number} a - number for addition
   * @param {number} b -
   * @returns {number}
   */

✅ Correct

When every parameter has a description after the -, as below, no error is reported.

  /**
   * Add two numbers
   * @param {number} a - number for addition
   * @param {number} b - number for addition
   * @returns {number}
   */
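You can also try the pattern on the two comments above without running Semgrep. The Python sketch below applies the same regex (again assuming re.MULTILINE approximates Semgrep's multiline mode) and reports the 1-based line number of each match, similar to the line numbers in Semgrep's output. The function name find_missing_descriptions is hypothetical, not part of Semgrep.

```python
import re

# Same pattern as the pattern-regex in code-rules.yaml.
MISSING_PARAM_DESCRIPTION = re.compile(r"\*\s*@.*-\s*(\n|\r\n|$)", re.MULTILINE)

INCORRECT = """\
  /**
   * Add two numbers
   * @param {number} a - number for addition
   * @param {number} b -
   * @returns {number}
   */
"""

CORRECT = """\
  /**
   * Add two numbers
   * @param {number} a - number for addition
   * @param {number} b - number for addition
   * @returns {number}
   */
"""


def find_missing_descriptions(source: str) -> list[tuple[int, str]]:
    """Return (line number, matched text) for every tag line ending in a bare '-'."""
    findings = []
    for match in MISSING_PARAM_DESCRIPTION.finditer(source):
        # Count the newlines before the match start to get a 1-based line number.
        line_no = source.count("\n", 0, match.start()) + 1
        findings.append((line_no, match.group(0).strip()))
    return findings


print(find_missing_descriptions(INCORRECT))  # [(4, '* @param {number} b -')]
print(find_missing_descriptions(CORRECT))    # []
```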

Run Semgrep

Next, let’s check the code with Semgrep. Run the following command. The --config option points Semgrep at the rule file, and --error makes Semgrep exit with status 1 when there are findings, which is useful in CI.

semgrep --config=code-rules.yaml --error

When you run this command, you can see the following results.


┌──── ○○○ ────┐
│ Semgrep CLI │
└─────────────┘


Scanning 7035 files (only git-tracked) with 1 Code rule:

  CODE RULES
  Scanning 5089 files.

  SUPPLY CHAIN RULES

  No rules to run.


  PROGRESS

  ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00


┌──────────────────┐
│ 93 Code Findings │
└──────────────────┘

    apps/sample/view/index.ts
   ❯❯❱ missing-param-description
          【Comment error】 Add correct comment.

           12┆ * @param {Params} params -
           13┆ * @return {number}
...
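Because of the --error option, a run with findings like this one exits with status 1, which is what makes a CI job fail. To illustrate that behavior (not to replace Semgrep), here is a hypothetical Python stand-in: scan_text and run are made-up names, the extension list is my assumption, and unlike Semgrep it scans every file under the given directory without any git-tracking or ignore rules.

```python
import re
from pathlib import Path

# Same pattern as the pattern-regex in code-rules.yaml.
MISSING_PARAM_DESCRIPTION = re.compile(r"\*\s*@.*-\s*(\n|\r\n|$)", re.MULTILINE)

# Assumption: file extensions to treat as JavaScript/TypeScript for this sketch.
EXTENSIONS = {".js", ".jsx", ".mjs", ".cjs", ".ts", ".tsx"}


def scan_text(path: str, source: str) -> list[str]:
    """Return 'path:line: text' strings for every match in one file's contents."""
    findings = []
    for match in MISSING_PARAM_DESCRIPTION.finditer(source):
        line_no = source.count("\n", 0, match.start()) + 1
        findings.append(f"{path}:{line_no}: {match.group(0).strip()}")
    return findings


def run(root: str) -> int:
    """Scan matching files under root and return 1 if anything was found, else 0."""
    findings = []
    # No ignore handling here: every file under root with a listed extension is read.
    for file in sorted(Path(root).rglob("*")):
        if file.is_file() and file.suffix in EXTENSIONS:
            text = file.read_text(encoding="utf-8", errors="replace")
            findings.extend(scan_text(str(file), text))
    for finding in findings:
        print(finding)
    # Mirror semgrep --error: a non-zero exit status when there are findings.
    return 1 if findings else 0
```

A script could end with sys.exit(run("apps")) so that its exit status follows the findings, the same way semgrep --error does.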

GitHub Actions

You can also run the same check in GitHub Actions. Create a .github/workflows/semgrep.yml file with the following content.

name: Check code by Semgrep

on:
  pull_request:

jobs:
  semgrep:
    name: Check code by Semgrep
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
          cache: 'pip'
      - run: pip install -r requirements.txt
      - name: Run Semgrep
        run: |
          semgrep --config=code-rules.yaml --error

Now, when you open a pull request, the workflow runs Semgrep and fails with output like the following.

Run semgrep --config=code-rules.yaml --error
METRICS: Using configs from the Registry (like --config=p/ci) reports pseudonymous rule metrics to semgrep.dev.
To disable Registry rule metrics, use "--metrics=off".
Using configs only from local files (like --config=xyz.yml) does not enable metrics.

More information: https://semgrep.dev/docs/metrics



┌─────────────┐
│ Scan Status │
└─────────────┘
  Scanning 7034 files tracked by git with 1 Code rule:
  Scanning 5087 files.


┌──────────────────┐
│ 93 Code Findings │
└──────────────────┘

    apps/agencyTool/src/feature/AdminStaffDetailDialog/controller/actions/initialize/index.ts
   ❯❯❱ missing-param-description
          【Comment error】 Add correct comment.

           12┆ * @param {Params} params -
           13┆ * @return {number}
...

Completed

Done! We’ve seen how to check code quality using Semgrep with regular expressions. With Semgrep, you can check things that are hard to check with a linter, such as the text inside comments.

If the same problem keeps coming up in code reviews, first check whether a static analysis tool such as ESLint already has a rule for it. If it does, enable that rule.

If there is no such rule and the problem follows a fixed textual pattern, try checking it with a Semgrep regular expression rule.

I hope this blog post helps you improve code quality using Semgrep.

Was my blog helpful? Please leave a comment at the bottom. It will be a great help to me!
