Homebrew Tap for justhtml

This tap provides the justhtml CLI via Homebrew.

justhtml is an HTML5 parser CLI with CSS selectors and full html5lib compliance.

Install

brew install diffen/justhtml/justhtml

Verify

justhtml --version

CLI Documentation

The section below is synced from diffen/justhtml-php/CLI.md. Commands are rewritten to use justhtml for Homebrew.

CLI

The justhtml CLI parses HTML, optionally selects nodes with a CSS selector, and outputs HTML, text, or Markdown. It accepts either a file path or - for stdin.

Run it:

With the Homebrew install: justhtml

Sample input used below

Create a small input file:

cat > sample.html <<'HTML'
<!doctype html>
<html>
  <body>
    <article id="post">
      <h1>Title</h1>
      <p class="lead">Hello <em>world</em>!</p>
      <p>Second <span>para</span>.</p>
    </article>
  </body>
</html>
HTML

Create a whitespace-focused file:

cat > whitespace.html <<'HTML'
<!doctype html>
<html><body>
  <p class="sep">Alpha<span>Beta</span>Gamma</p>
  <p class="ws">  Hello <span> world </span> ! </p>
</body></html>
HTML

--selector

Select matching nodes (single selector):

justhtml sample.html --selector "p.lead" --format text

Output:

Hello world!

Select multiple selectors with a comma-separated list:

justhtml sample.html --selector "h1, p.lead" --format text

Output:

Title
Hello world!

--format

Choose output format: html, text, or markdown.

HTML output:

justhtml sample.html --selector "p.lead" --format html

Output:

<p class="lead">
  Hello
  <em>world</em>
  !
</p>

Text output:

justhtml sample.html --selector "p.lead" --format text

Output:

Hello world!

Markdown output:

justhtml sample.html --selector "p.lead" --format markdown

Output:

Hello *world*!

--outer / --inner

HTML output uses outer HTML by default. Use --inner to print only the matched node's children (inner HTML). --outer is a no-op that makes the default explicit. These flags only affect --format html.

justhtml sample.html --selector "p.lead" --format html --inner

Output:

Hello
<em>world</em>
!

--attr / --missing

Extract attribute values from matched nodes. Repeat --attr to output multiple attributes per node (tab-separated by default). Missing attributes are replaced with __MISSING__ by default; override with --missing.

justhtml sample.html --selector "p" --attr class --attr id

Output (tab-separated):

lead	__MISSING__
__MISSING__	__MISSING__

Use --separator to change the field separator:

justhtml sample.html --selector "p" --attr class --attr id --separator ","

--attr cannot be combined with --format, --inner, --outer, or --count.
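
Because --attr emits one tab-separated row per node, standard tools can post-process it. The sketch below defines a hypothetical helper, drop_missing (not part of justhtml), that removes any row containing the missing-attribute placeholder. It assumes the output shape shown above: one row per line, fields separated by tabs.

```shell
# drop_missing [PLACEHOLDER]: read tab-separated rows on stdin and print
# only the rows in which no field equals PLACEHOLDER.
# Hypothetical helper; the default matches the documented __MISSING__ value.
drop_missing() {
  awk -F '\t' -v m="${1:-__MISSING__}" '
    {
      # Skip the whole row as soon as one field is the placeholder
      for (i = 1; i <= NF; i++) if ($i == m) next
      print
    }'
}

# Usage (requires justhtml on PATH):
#   justhtml sample.html --selector "p" --attr class --attr id | drop_missing
```

On sample.html both rows contain the placeholder, so the pipeline prints nothing. If you override the placeholder with --missing, pass the same value to drop_missing.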

--first

Limit to the first match:

justhtml sample.html --selector "p" --format text

Output:

Hello world!
Second para.

justhtml sample.html --selector "p" --format text --first

Output:

Hello world!

--first is equivalent to --limit 1 and cannot be combined with --limit.

--limit

Limit to the first N matches. This is equivalent to --first when N is 1.

justhtml sample.html --selector "p" --format text --limit 2

Output:

Hello world!
Second para.

--count

Print the number of matching nodes:

justhtml sample.html --selector "p" --count

Output:

2

--count cannot be combined with --first, --limit, --format, or --attr.
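
A count is handy in scripts that need to branch on whether anything matched. The sketch below wraps it in a hypothetical helper, has_match. It assumes --count prints a bare integer on stdout and that justhtml exits non-zero on failure; neither is confirmed above.

```shell
# has_match FILE SELECTOR: succeed (exit 0) when at least one node matches.
# Hypothetical helper; assumes --count prints a bare integer.
has_match() {
  local count
  # Propagate a failure of the CLI itself as exit status 2
  count=$(justhtml "$1" --selector "$2" --count) || return 2
  [ "$count" -gt 0 ]
}

# Usage (requires justhtml on PATH):
#   if has_match sample.html "p.lead"; then echo "lead paragraph found"; fi
```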

--separator

Join text nodes with a custom separator (text output only). In --attr mode, this controls the field separator (default: tab).

justhtml whitespace.html --selector ".sep" --format text

Output:

Alpha Beta Gamma

justhtml whitespace.html --selector ".sep" --format text --separator ""

Output:

AlphaBetaGamma

--strip / --no-strip

By default, each text node is trimmed and empty nodes are dropped (--strip). Use --no-strip to preserve the original whitespace within text nodes.

Default (strip on):

justhtml whitespace.html --selector ".ws" --format text

Output:

Hello world !

Preserve whitespace:

justhtml whitespace.html --selector ".ws" --format text --no-strip

Output (spaces shown between | markers):

|  Hello   world   ! |

Stdin

Read from stdin by passing - as the path:

cat sample.html | justhtml - --selector "p.lead" --format text

Output:

Hello world!
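
Since - reads stdin, any command or redirect that supplies HTML can feed the CLI, not only cat. A minimal sketch using a hypothetical helper, first_text, built only from the flags documented above:

```shell
# first_text SELECTOR: read HTML on stdin and print the text of the
# first node matching SELECTOR. Hypothetical wrapper around justhtml.
first_text() {
  justhtml - --selector "$1" --format text --first
}

# Usage (requires justhtml on PATH):
#   first_text "p.lead" < sample.html
#   printf '<p class="lead">Hi</p>' | first_text "p.lead"
```

The redirect form avoids the extra cat process and should print the same text as the example above.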

Piping examples (real pages)

These examples use a live page and pipe HTML into justhtml.

# Extract the first non-empty paragraph as text
curl -s https://en.wikipedia.org/wiki/Earth | \
  justhtml - --selector "#mw-content-text p:not(:empty)" --format text --first

# Extract links from the lead section (first 10 hrefs)
curl -s https://en.wikipedia.org/wiki/Earth | \
  justhtml - --selector "#mw-content-text p a" --attr href --limit 10 --separator "\n"

# Get the article body as Markdown
curl -s https://en.wikipedia.org/wiki/Earth | \
  justhtml - --selector "#mw-content-text" --format markdown --first

# Count images on the page
curl -s https://en.wikipedia.org/wiki/Earth | \
  justhtml - --selector "img" --count

# Output the infobox as HTML (outer HTML)
curl -s https://en.wikipedia.org/wiki/Earth | \
  justhtml - --selector "table.infobox" --format html --outer --first

# Preserve whitespace and separate paragraphs
curl -s https://en.wikipedia.org/wiki/Earth | \
  justhtml - --selector "#mw-content-text p" --format text --no-strip --separator "\n\n" --limit 3

# Build a quick table of contents from headings
curl -s https://en.wikipedia.org/wiki/Earth | \
  justhtml - --selector "#mw-content-text h2, #mw-content-text h3" --format text --separator "\n"
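
Each example above downloads the page again. When running several queries against one page, it can be kinder to the server to download once and query the saved copy. The sketch below shows one way to do that. The function name page_summary and its output labels are hypothetical; the selectors and flags are the ones used above.

```shell
# page_summary URL: download the page once, then run two queries on the
# saved copy. Hypothetical helper; requires curl and justhtml on PATH.
page_summary() {
  local url tmp
  url=$1
  tmp=$(mktemp) || return 1
  # -f fails on HTTP errors; -sS is quiet but still reports errors
  if ! curl -fsS "$url" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  printf 'images: %s\n' \
    "$(justhtml "$tmp" --selector "img" --count)"
  printf 'first paragraph: %s\n' \
    "$(justhtml "$tmp" --selector "#mw-content-text p:not(:empty)" --format text --first)"
  rm -f "$tmp"
}

# Usage:
#   page_summary https://en.wikipedia.org/wiki/Earth
```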

--version and --help

justhtml --version

Output:

justhtml dev

justhtml --help

Output: prints the full usage/help text.

Upgrading

brew upgrade justhtml

Uninstall

brew uninstall justhtml

If you installed via the tap and want to remove it:

brew untap diffen/justhtml

Troubleshooting

“justhtml: command not found”

Check where Homebrew installs its binaries:

brew --prefix

Then ensure $(brew --prefix)/bin is on your PATH.
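
One way to check this from a shell is sketched below. The helper name on_path is hypothetical. It compares entries literally, so a PATH entry written with a trailing slash will not match.

```shell
# on_path DIR: succeed when DIR is one of the colon-separated PATH entries.
# Hypothetical helper; comparison is literal (no symlink or slash handling).
on_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (requires Homebrew); add the line to your shell profile to persist it:
#   on_path "$(brew --prefix)/bin" || export PATH="$(brew --prefix)/bin:$PATH"
```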

Xdebug warning on `justhtml --version`

If you see an Xdebug warning from your PHP configuration, you can disable it for a single run:

XDEBUG_MODE=off justhtml --version

Formula

The formula lives at:

Formula/justhtml.rb

License

MIT
