HTML table data extractor for Go


htmltable enables structured data extraction from HTML tables and URLs and requires almost no external dependencies.


You can retrieve a slice of header-annotated types using the NewSlice* constructors:

import ""

type Ticker struct {
    Symbol   string `header:"Symbol"`
    Security string `header:"Security"`
    CIK      string `header:"CIK"`
}

url := ""
out, _ := htmltable.NewSliceFromURL[Ticker](url)
fmt.Println(out[0].Symbol)
fmt.Println(out[0].Security)

// Output: 
// MMM
// 3M
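
To make the role of the header annotations more concrete, here is a minimal, standard-library-only sketch of how a `header` struct tag could be matched against column names. It is an illustration under assumptions, not the library's implementation: fillFromRow is a hypothetical helper, and the sample row (including the placeholder CIK value) is made up for the example.

```go
package main

import (
	"fmt"
	"reflect"
)

// Ticker mirrors the struct from the example above.
type Ticker struct {
	Symbol   string `header:"Symbol"`
	Security string `header:"Security"`
	CIK      string `header:"CIK"`
}

// fillFromRow is a hypothetical helper (not part of htmltable). It copies
// cells from row into the string fields of dst whose `header` tag matches
// a column name in header.
func fillFromRow(dst any, header, row []string) error {
	v := reflect.ValueOf(dst)
	if v.Kind() != reflect.Pointer || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("dst must be a pointer to a struct, got %T", dst)
	}
	v = v.Elem()

	// Remember the position of every column name.
	index := make(map[string]int, len(header))
	for i, name := range header {
		index[name] = i
	}

	typ := v.Type()
	for i := 0; i < typ.NumField(); i++ {
		name, ok := typ.Field(i).Tag.Lookup("header")
		if !ok {
			continue // field is not annotated
		}
		col, found := index[name]
		if !found {
			return fmt.Errorf("missing column: %s", name)
		}
		if col >= len(row) {
			return fmt.Errorf("row has no cell for column: %s", name)
		}
		field := v.Field(i)
		if !field.CanSet() || field.Kind() != reflect.String {
			continue // this sketch only fills exported string fields
		}
		field.SetString(row[col])
	}
	return nil
}

func main() {
	header := []string{"Symbol", "Security", "CIK"}
	row := []string{"MMM", "3M", "placeholder"} // sample data for illustration
	var ticker Ticker
	if err := fillFromRow(&ticker, header, row); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(ticker.Symbol)
	fmt.Println(ticker.Security)
}
```

The column order in the table does not have to match the field order in the struct, because fields are looked up by tag value rather than by position.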

An error is returned if the page has no table with the specified columns:

page, _ := htmltable.NewFromURL("")
_, err := page.FindWithColumns("invalid", "column", "names")
fmt.Println(err)

// Output: 
// cannot find table with columns: invalid, column, names

You can also use the lower-level API to work with the extracted data:

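page, _ := htmltable.NewFromString(`<body>
    <table>
        <tr><td>a</td><td>b</td></tr>
        <tr><td> 1 </td><td>2</td></tr>
        <tr><td>3  </td><td>4   </td></tr>
    </table>
    <table>
        <tr><th>b</th><th>c</th><th>d</th></tr>
        <tr><td>1</td><td>2</td><td>5</td></tr>
        <tr><td>3</td><td>4</td><td>6</td></tr>
    </table>
</body>`)

fmt.Printf("found %d tables\n", page.Len())
_ = page.Each2("c", "d", func(c, d string) error {
    fmt.Printf("c:%s d:%s\n", c, d)
    return nil
})
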
// Output: 
// found 2 tables
// c:2 d:5
// c:4 d:6
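
The column lookup and per-row callback shown above can be sketched with the standard library alone. This is a simplified illustration, not htmltable's actual code: the table type and the findWithColumns, indexOf and each2 functions are hypothetical names, and trimming cell whitespace is an assumption made here for the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// table is a hypothetical in-memory stand-in for one extracted HTML table.
type table struct {
	header []string
	rows   [][]string
}

// indexOf returns the position of name in header, or -1 if it is absent.
func indexOf(header []string, name string) int {
	for i, h := range header {
		if h == name {
			return i
		}
	}
	return -1
}

// findWithColumns returns the first table whose header contains every
// requested column, or an error naming the requested columns.
func findWithColumns(tables []table, columns ...string) (*table, error) {
	for i := range tables {
		matched := true
		for _, c := range columns {
			if indexOf(tables[i].header, c) < 0 {
				matched = false
				break
			}
		}
		if matched {
			return &tables[i], nil
		}
	}
	return nil, fmt.Errorf("cannot find table with columns: %s", strings.Join(columns, ", "))
}

// each2 calls fn with the trimmed cells of columns a and b for every row
// of the first matching table, stopping at the first error fn returns.
func each2(tables []table, a, b string, fn func(a, b string) error) error {
	t, err := findWithColumns(tables, a, b)
	if err != nil {
		return err
	}
	ia, ib := indexOf(t.header, a), indexOf(t.header, b)
	for _, row := range t.rows {
		if ia >= len(row) || ib >= len(row) {
			continue // skip rows that are too short to hold both cells
		}
		if err := fn(strings.TrimSpace(row[ia]), strings.TrimSpace(row[ib])); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	tables := []table{
		{header: []string{"a", "b"}, rows: [][]string{{" 1 ", "2"}, {"3  ", "4   "}}},
		{header: []string{"b", "c", "d"}, rows: [][]string{{"1", "2", "5"}, {"3", "4", "6"}}},
	}
	fmt.Printf("found %d tables\n", len(tables))
	_ = each2(tables, "c", "d", func(c, d string) error {
		fmt.Printf("c:%s d:%s\n", c, d)
		return nil
	})
	_, err := findWithColumns(tables, "invalid", "column", "names")
	fmt.Println(err)
}
```

Selecting a table by the columns it contains, rather than by its position on the page, keeps the calling code stable when unrelated tables are added or removed.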

Finally, you are encouraged to plug in your own structured logger:

htmltable.Logger = func(_ context.Context, msg string, fields ...any) {
    fmt.Printf("[INFO] %s %v\n", msg, fields)
}
_, _ = htmltable.NewFromURL("")

// Output:
// [INFO] found table [columns [Symbol Security SEC filings GICS Sector GICS Sub-Industry Headquarters Location Date first added CIK Founded] count 504]
// [INFO] found table [columns [Date Added Ticker Added Security Removed Ticker Removed Security Reason] count 308]


This library aims to be something like pandas.read_html or the table_extract Rust crate, but more idiomatic for Go.

