Hypertext

Hypertext stats: 9 downloads, 31 stars, 0 open issues, 3 forks

The best HTML to text transformer

A PHP HTML-to-plain-text transformer that gracefully handles varied and malformed HTML.


Hypertext extracts the text content from any HTML-based document and automatically:

  • Removes CSS
  • Removes scripts
  • Removes headers
  • Removes non-HTML based content
  • Preserves spacing
  • Preserves links (optional)
  • Preserves new lines (optional)

Its output is intended for LLM-related tasks, such as prompts and embeddings.
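
As a rough illustration of what the list above means in practice, the sketch below approximates a few of those steps (skipping the head, scripts, and CSS, then normalizing spacing) using only PHP's built-in DOM extension. It is not Hypertext's implementation, and roughHtmlToText is a hypothetical name used only for this example; it does not cover the optional link and new line handling.

```php
<?php

/**
 * Hypothetical helper: a rough HTML-to-text approximation built on ext-dom.
 * This is an illustration only, not how Hypertext works internally.
 */
function roughHtmlToText(string $html): string
{
    // DOMDocument::loadHTML() rejects an empty string, so return early.
    if (trim($html) === '') {
        return '';
    }

    $dom = new DOMDocument();

    // Malformed markup makes libxml emit warnings; collect them silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select text nodes that are not inside <head>, <script>, or <style>.
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query(
        '//text()[not(ancestor::head or ancestor::script or ancestor::style)]'
    );

    $parts = [];
    foreach ($nodes as $node) {
        // Collapse whitespace runs and skip whitespace-only nodes.
        $text = trim(preg_replace('/\s+/', ' ', $node->nodeValue));
        if ($text !== '') {
            $parts[] = $text;
        }
    }

    // Join the pieces with single spaces.
    return implode(' ', $parts);
}

$html = '<html><head><title>My Blog</title><style>h1 { color: red; }</style></head>'
    . '<body><h1>Welcome</h1><script>alert("hi");</script><p>Hello   world.</p></body></html>';

echo roughHtmlToText($html), PHP_EOL;
// Welcome Hello world.
```

Joining every text node with a space keeps the sketch short, but it also inserts spaces around inline elements, which is one reason to use a dedicated transformer for real documents.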

Installation

composer require stevebauman/hypertext

Usage

use Stevebauman\Hypertext\Transformer;

$transformer = new Transformer();

// (Optional) Retain new line characters.
$transformer->keepNewLines();

// (Optional) Retain anchor tags and their href attribute.
$transformer->keepLinks();

// $html is the HTML string to convert.
$text = $transformer->toText($html);

Example

For larger examples, please view the tests/Fixtures directory.

Input:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>My Blog</title>
</head>
<body>
<h1>Welcome to My Blog</h1>
<p>This is a paragraph of text on my webpage.</p>
<a href="https://blog.com/posts">Click here</a> to view my posts.
</body>
</html>

Output (Pure Text):

echo (new Transformer)->toText($input);
Welcome to My Blog This is a paragraph of text on my webpage. Click here to view my posts.

Output (Keep New Lines):

echo (new Transformer)->keepNewLines()->toText($input);
Welcome to My Blog
This is a paragraph of text on my webpage.
Click here to view my posts.

Output (Keep Links):

echo (new Transformer)->keepLinks()->toText($input);
Welcome to My Blog This is a paragraph of text on my webpage. <a href="https://blog.com/posts">Click here</a> to view my posts.

Output (Keep Both):

echo (new Transformer)
->keepLinks()
->keepNewLines()
->toText($input);
Welcome to My Blog
This is a paragraph of text on my webpage.
<a href="https://blog.com/posts">Click here</a> to view my posts.