Skip to content

Latest commit

 

History

History
34 lines (25 loc) · 666 Bytes

File metadata and controls

34 lines (25 loc) · 666 Bytes

HTML Ingestion

IceFrame allows you to scrape tables from HTML pages.

Usage

from iceframe import IceFrame
from iceframe.utils import load_catalog_config_from_env

config = load_catalog_config_from_env()
ice = IceFrame(config)

# Read tables from a URL
ice.create_table_from_html(
    "my_namespace.wikipedia_data",
    "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
)

# Match a specific table
ice.create_table_from_html(
    "my_namespace.specific_table",
    "https://example.com/data",
    match="Population"
)

Dependencies

Requires lxml, html5lib, and beautifulsoup4.

pip install "iceframe[html]"