A Copy/Paste Detector CLI for PHP 8.5+



phpcpd-next scans your PHP code and reports blocks that have been copy/pasted from one place to another — the kind of duplication that's easy to miss in review and painful to keep in sync later.

It's maintained by Luciano Federico Pereira as a successor to Sebastian Bergmann's phpcpd (archived), and stays a drop-in replacement with the same phpcpd command.

What's new is that it catches more than word-for-word copies. It also flags duplicates where lines were reordered, or where a statement was added or removed between two otherwise identical blocks. The highlights:

  • Three detection engines — Rabin-Karp (exact), TokenBag (reordered), and an opt-in suffix tree (gapped Type-3), with rename-insensitive --fuzzy matching on top
  • Four output formats — console text, PMD-CPD XML, JSON, and SARIF 2.1.0 for GitHub Code Scanning
  • A headless API for calling detection in-process, plus a PHPUnit trait that turns duplication into a test assertion
  • CI features — meaningful exit codes, full result caching, and per-file incremental indexing
  • Framework presets including Laravel, with CLI flags that override preset defaults
  • PHP 8.5+ with zero Composer runtime dependencies and deterministic results
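An order-insensitive comparison is what catches the reordered case. The sketch below is not phpcpd-next's TokenBag implementation, whose scoring details the article does not cover. It only illustrates the general idea using PHP's built-in `PhpToken::tokenize()`: each fragment becomes a bag of token counts, and the two bags are compared with a weighted Jaccard score. The helper names `tokenBag()` and `bagSimilarity()` are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reduce a PHP fragment to a bag (multiset) of token
 * texts. Whitespace and comments are ignored, and order is discarded.
 *
 * @return array<string|int, int> token text => number of occurrences
 */
function tokenBag(string $code): array
{
    $bag = [];
    foreach (PhpToken::tokenize('<?php ' . $code) as $token) {
        if ($token->isIgnorable()) {
            continue; // whitespace, comments, and the opening tag
        }
        $bag[$token->text] = ($bag[$token->text] ?? 0) + 1;
    }
    return $bag;
}

/**
 * Weighted Jaccard similarity of two bags: shared counts over combined counts.
 * 1.0 means the same tokens in the same quantities, in any order.
 */
function bagSimilarity(array $a, array $b): float
{
    $shared = 0;
    $combined = 0;
    foreach (array_keys($a + $b) as $key) {
        $shared += min($a[$key] ?? 0, $b[$key] ?? 0);
        $combined += max($a[$key] ?? 0, $b[$key] ?? 0);
    }
    return $combined === 0 ? 1.0 : $shared / $combined;
}

$before = '$name = trim($input); $age = (int) $raw; $active = true;';
$after  = '$active = true; $name = trim($input); $age = (int) $raw;';

printf("%.2f\n", bagSimilarity(tokenBag($before), tokenBag($after))); // 1.00
```

A contiguous matcher sees two different token sequences here, while the bag comparison scores them as identical. A real detector also has to decide which regions to compare and what score counts as a clone, which this sketch leaves out.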

Three Engines, Run Together by Default

Most copy/paste detectors only find exact duplication. phpcpd-next runs Rabin-Karp (exact contiguous matches) and TokenBag (order-invariant overlap, so shuffled statements still register) together on every default run. A suffix-tree engine for gapped clones — where a statement was inserted or removed between otherwise identical blocks — is opt-in:

# Default: exact + reordered detection
phpcpd src/
 
# Rabin-Karp only (faster, no reorder detection)
phpcpd --rk src/
 
# Gapped Type-3 clones via suffix tree
phpcpd --algorithm=suffixtree src/
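For intuition about the exact engine, here is a minimal Rabin-Karp sketch in PHP. It hashes every window of `$n` normalized tokens with a rolling hash, indexes one file's windows by hash, and confirms hash hits in the other file by comparing the tokens. The optional `$fuzzy` switch rewrites every variable to `$_`, which is one simple way to get rename-insensitive matching and only a guess at what `--fuzzy` does. The function names, the base, and the modulus are illustrative choices, not phpcpd-next internals.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical normalizer: PHP source -> list of token texts, without
 * whitespace or comments. With $fuzzy, every variable name becomes "$_".
 *
 * @return list<string>
 */
function normalizedTokens(string $code, bool $fuzzy = false): array
{
    $tokens = [];
    foreach (PhpToken::tokenize($code) as $token) {
        if ($token->isIgnorable()) {
            continue;
        }
        $tokens[] = ($fuzzy && $token->is(T_VARIABLE)) ? '$_' : $token->text;
    }
    return $tokens;
}

/**
 * Rolling polynomial hash of every window of $n tokens.
 * $hashes[$k] covers tokens $k .. $k + $n - 1.
 *
 * @param list<string> $tokens
 * @return list<int>
 */
function windowHashes(array $tokens, int $n): array
{
    $base = 257;
    $mod = 1_000_000_007;
    $count = count($tokens);
    if ($n < 1 || $count < $n) {
        return [];
    }
    $values = array_map(fn (string $t): int => crc32($t) % $mod, $tokens);

    $pow = 1; // base^(n-1) mod $mod, the weight of the outgoing token
    for ($i = 1; $i < $n; $i++) {
        $pow = ($pow * $base) % $mod;
    }

    $hash = 0;
    for ($i = 0; $i < $n; $i++) {
        $hash = ($hash * $base + $values[$i]) % $mod;
    }
    $hashes = [$hash];
    for ($i = $n; $i < $count; $i++) {
        $hash = ($hash - $values[$i - $n] * $pow % $mod + $mod) % $mod; // drop outgoing token
        $hash = ($hash * $base + $values[$i]) % $mod;                 // append incoming token
        $hashes[] = $hash;
    }
    return $hashes;
}

/**
 * Report [startInA, startInB] for every identical window of $n tokens.
 * Equal hashes are confirmed by comparing tokens, so collisions are harmless.
 *
 * @return list<array{int, int}>
 */
function rabinKarpMatches(array $a, array $b, int $n): array
{
    $byHash = [];
    foreach (windowHashes($a, $n) as $start => $hash) {
        $byHash[$hash][] = $start;
    }
    $matches = [];
    foreach (windowHashes($b, $n) as $startB => $hash) {
        foreach ($byHash[$hash] ?? [] as $startA) {
            if (array_slice($a, $startA, $n) === array_slice($b, $startB, $n)) {
                $matches[] = [$startA, $startB];
            }
        }
    }
    return $matches;
}

$original = '<?php $total = $price * $qty; return $total;';
$renamed  = '<?php $sum = $cost * $amount; return $sum;';
$n = 8; // window size, playing a role similar to a minimum-token threshold

$exact = rabinKarpMatches(normalizedTokens($original), normalizedTokens($renamed), $n);
$fuzzy = rabinKarpMatches(normalizedTokens($original, true), normalizedTokens($renamed, true), $n);
printf("exact: %d, fuzzy: %d\n", count($exact), count($fuzzy)); // exact: 0, fuzzy: 2
```

Only the fuzzy pass matches, because the two snippets differ only in variable names. The rolling update is what keeps the approach cheap: sliding the window by one token costs constant work instead of rehashing all `$n` tokens.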

The console output points at the duplicated ranges and suggests a refactor rather than just listing line numbers:

Found 2 code clones with 21 duplicated lines in 2 files:
 
- app/Services/Billing.php:12-33 (21 lines)
app/Services/Invoicing.php:40-61
→ Consider extracting the shared lines into a reusable method or constant.
 
37.50% duplicated lines out of 56 total lines of code.
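As a quick check on the summary line, the percentage is the number of duplicated lines divided by the total lines of code:

```latex
\frac{21 \text{ duplicated lines}}{56 \text{ total lines}} \times 100 = 37.50\%
```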

SARIF Output for GitHub Code Scanning

Alongside PMD-CPD XML and JSON, phpcpd-next writes SARIF 2.1.0, so clones show up in the GitHub Security tab. Inconsistent (diverged) clones map to warning severity and exact clones to note:

- name: Detect duplicated code
  run: vendor/bin/phpcpd --log-sarif=phpcpd.sarif src/ || true

- name: Upload results
  uses: github/codeql-action/upload-sarif@v3
  with:
    sarif_file: phpcpd.sarif
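Because the detection step ends in `|| true`, it succeeds even when phpcpd exits non-zero, so the workflow will not fail on clones by itself. If you want a gate, a small script can read the log and fail on warning-level results. This sketch relies only on the SARIF 2.1.0 layout (`runs[].results[]`, `level`, `locations[].physicalLocation`). The script name `sarif-gate.php` is hypothetical, and the exact fields phpcpd-next fills in may differ.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: count SARIF 2.1.0 results per level and list the first
 * location of each "warning" result (inconsistent clones, per the mapping above).
 *
 * @return array{counts: array<string, int>, warnings: list<string>}
 */
function summarizeSarif(string $json): array
{
    $log = json_decode($json, true, flags: JSON_THROW_ON_ERROR);
    $counts = [];
    $warnings = [];
    foreach ($log['runs'] ?? [] as $run) {
        foreach ($run['results'] ?? [] as $result) {
            // SARIF falls back to "warning" when level is absent; this sketch
            // ignores rule-level defaults that could override that.
            $level = $result['level'] ?? 'warning';
            $counts[$level] = ($counts[$level] ?? 0) + 1;
            if ($level === 'warning') {
                $location = $result['locations'][0]['physicalLocation'] ?? [];
                $uri = $location['artifactLocation']['uri'] ?? '?';
                $line = $location['region']['startLine'] ?? '?';
                $warnings[] = "{$uri}:{$line}";
            }
        }
    }
    return ['counts' => $counts, 'warnings' => $warnings];
}

// Usage (hypothetical script name): php sarif-gate.php phpcpd.sarif
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $summary = summarizeSarif((string) file_get_contents($argv[1]));
    foreach ($summary['warnings'] as $where) {
        fwrite(STDERR, "Inconsistent clone at {$where}\n");
    }
    exit($summary['warnings'] === [] ? 0 : 1);
}
```

It could run as an extra step after the upload, for example `run: php sarif-gate.php phpcpd.sarif`, so results still reach the Security tab before the job fails.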

Headless API and PHPUnit Assertions

Beyond the CLI, detection runs in-process through a static detect() call — no shelling out, no report files:

use LucianoPereira\PhpcpdNext\Phpcpd;

$clones = Phpcpd::detect(
    paths: 'app',
    minTokens: 60,
    algorithm: null, // null = Rabin-Karp + TokenBag
    preset: 'laravel',
);

foreach ($clones as $clone) {
    echo $clone->numberOfLines(), " lines\n";
}
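Because detect() hands back clone objects rather than a report, you can aggregate them yourself. The helper below assumes only what the example above shows, namely that each clone has a `numberOfLines()` method; `summarizeClones()` and its return keys are hypothetical. Stand-in objects let you try it without the package installed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: aggregate clone sizes from anything iterable whose
 * items expose numberOfLines(), such as the result of Phpcpd::detect().
 *
 * @return array{count: int, largest: int, totalLines: int}
 */
function summarizeClones(iterable $clones): array
{
    $count = 0;
    $largest = 0;
    $totalLines = 0;
    foreach ($clones as $clone) {
        $lines = $clone->numberOfLines();
        $count++;
        $largest = max($largest, $lines);
        $totalLines += $lines;
    }
    return ['count' => $count, 'largest' => $largest, 'totalLines' => $totalLines];
}

// Stand-in clone objects for trying the helper without phpcpd-next installed.
$fake = static fn (int $n): object => new class ($n) {
    public function __construct(private int $n) {}
    public function numberOfLines(): int { return $this->n; }
};

print_r(summarizeClones([$fake(21), $fake(8)])); // count 2, largest 21, totalLines 29
```

With phpcpd-next installed, you would pass the result of `Phpcpd::detect(...)` straight into `summarizeClones()`.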

A bundled trait turns that into a test, so duplication becomes a regression check that fails with clone locations:

use LucianoPereira\PhpcpdNext\PHPUnit\AssertNoDuplication;
use PHPUnit\Framework\TestCase;

final class DuplicationTest extends TestCase
{
    use AssertNoDuplication;

    public function test_app_is_dry(): void
    {
        $this->assertNoDuplication(__DIR__ . '/../app', minTokens: 70);
    }
}

Incremental Caching for CI

For larger codebases, --cache stores results keyed by a configuration fingerprint and a file-manifest hash, and replays the cached result when nothing has changed. --incremental goes further: it re-tokenizes only the files that changed and reuses the rest from a per-file index. This mode works with the Rabin-Karp engine only and prints a summary such as (incremental index: 412 reused, 3 scanned). In GitHub Actions, persist the cache directory between runs:

- uses: actions/cache@v4
  with:
    path: .phpcpd-cache
    key: phpcpd-${{ hashFiles('**/*.php') }}
    restore-keys: phpcpd-
- run: vendor/bin/phpcpd --incremental --cache-dir .phpcpd-cache src/
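To picture how a per-file index decides what to reuse, here is a sketch that hashes each file's contents and compares the hashes with the previous run: unchanged files are reused, and new or edited ones are re-scanned. A second helper folds the options and the file manifest into a single key, in the spirit of the `--cache` description. Everything here (function names, SHA-256, the array-based index) is an assumption; phpcpd-next's actual cache layout is not described in this article.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical whole-run cache key: configuration fingerprint + file manifest.
 * Changing any option or any file's content produces a different key.
 *
 * @param array<string, mixed>  $config
 * @param array<string, string> $fileHashes path => content hash
 */
function runCacheKey(array $config, array $fileHashes): string
{
    ksort($config);
    ksort($fileHashes); // make the manifest independent of scan order
    return hash('sha256', json_encode([$config, $fileHashes], JSON_THROW_ON_ERROR));
}

/**
 * Hypothetical per-file index step: reuse files whose content hash matches the
 * previous run and re-scan the rest. Files missing from $currentFiles simply
 * drop out of the new index.
 *
 * @param array<string, string> $previousIndex path => content hash from the last run
 * @param array<string, string> $currentFiles  path => current file contents
 * @return array{reused: list<string>, scanned: list<string>, index: array<string, string>}
 */
function planIncremental(array $previousIndex, array $currentFiles): array
{
    $plan = ['reused' => [], 'scanned' => [], 'index' => []];
    foreach ($currentFiles as $path => $contents) {
        $hash = hash('sha256', $contents);
        $plan['index'][$path] = $hash;
        if (($previousIndex[$path] ?? null) === $hash) {
            $plan['reused'][] = $path;  // tokens can come from the stored index
        } else {
            $plan['scanned'][] = $path; // new or modified: re-tokenize this file
        }
    }
    return $plan;
}

$previous = [
    'src/A.php' => hash('sha256', '<?php echo 1;'),
    'src/B.php' => hash('sha256', '<?php echo 2;'),
];
$plan = planIncremental($previous, [
    'src/A.php' => '<?php echo 1;',  // unchanged
    'src/B.php' => '<?php echo 20;', // edited
    'src/C.php' => '<?php echo 3;',  // new
]);
printf("(incremental index: %d reused, %d scanned)\n", count($plan['reused']), count($plan['scanned']));
// (incremental index: 1 reused, 2 scanned)
```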

Installation

The tool requires PHP 8.5+, ext-dom, and ext-mbstring, and installs as a dev dependency:

composer require --dev phpcpd-next/phpcpd
vendor/bin/phpcpd src/

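The Laravel preset scans app, routes, database, and config while excluding vendor code, Blade views, migrations, and IDE-helper files. Flags such as --min-tokens override the preset's defaults, and you can point it at a narrower path: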

vendor/bin/phpcpd --preset=laravel app/Services --min-tokens=60
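The preset behaviour, default paths and exclusions with CLI flags taking precedence, can be modelled as a defaults array merged with overrides, plus a glob filter. This is a hypothetical sketch: the array keys, the `fnmatch()` patterns, and the `minTokens` placeholder are illustrative rather than the contents of phpcpd-next's Laravel preset, and it assumes an explicit path replaces the preset's path list.

```php
<?php
declare(strict_types=1);

// Hypothetical preset, modelled on the Laravel preset description above.
const LARAVEL_PRESET = [
    'paths' => ['app', 'routes', 'database', 'config'],
    'exclude' => ['vendor/*', '*.blade.php', 'database/migrations/*', '*_ide_helper*.php'],
    'minTokens' => 70, // placeholder value, not phpcpd-next's documented default
];

/** CLI options replace preset values; options left as null keep the preset value. */
function resolveOptions(array $preset, array $cli): array
{
    return array_replace($preset, array_filter($cli, fn ($value): bool => $value !== null));
}

/** Drop every file matched by at least one exclusion glob. */
function filterPaths(array $files, array $exclude): array
{
    return array_values(array_filter($files, function (string $file) use ($exclude): bool {
        foreach ($exclude as $pattern) {
            if (fnmatch($pattern, $file)) {
                return false;
            }
        }
        return true;
    }));
}

// Roughly: phpcpd --preset=laravel app/Services --min-tokens=60
$options = resolveOptions(LARAVEL_PRESET, [
    'paths' => ['app/Services'],
    'minTokens' => 60,
    'exclude' => null, // not given on the command line
]);

// Exclusion check on a few sample paths.
$kept = filterPaths([
    'app/Services/Billing.php',
    'resources/views/invoice.blade.php',
    'database/migrations/2024_01_01_000000_create_invoices_table.php',
    '_ide_helper.php',
], $options['exclude']);

print_r($kept); // only app/Services/Billing.php survives
```

`array_replace()` is shallow, so a CLI path list replaces the preset's list wholesale rather than being appended to it.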

You can find the source and full documentation on GitHub.

Paul Redmond

Staff writer at Laravel News. Full stack web developer and author.
