Rex

Version Go Report Card Coverage Status PkgGoDev

rex-gopher

This is a regular expressions builder for gophers!

Why?

It makes readability better and helps to construct regular expressions using human-friendly constructions. Also, it allows commenting and reusing blocks, which improves the quality of code.

It is just a builder, so it returns standart *regexp.Regexp.

The library supports groups, composits, classes, flags, repetitions and if you want you can even use raw regular expressions in any place. Also it contains a set of predefined helpers for matching phones, emails, etc…

Let’s see an example of validating (some_id[#]):

// Using this builder.
re := rex.New(
    rex.Chars.Begin(), // `^`
    // ID should begin with lowercased character.
    rex.Chars.Lower().Repeat().OneOrMore(), // `[a-z]+`
    // ID should contain number inside brackets [#].
    rex.Chars.Single('['),                   // `[`
    rex.Chars.Digits().Repeat().OneOrMore(), // `[0-9]+`
    rex.Chars.Single(']'),                   // `]`
    rex.Chars.End(),                         // `$`
).MustCompile()

Yes, it requires more code, but it has its advantages.

More, but simpler code, fewer bugs.

Meme

Drake Hotline Bling meme

The picture contains two frame fragments from the video.

Documentation

import "github.com/hedhyw/rex/pkg/rex"

func main() {
    rex.New(/* tokens */).MustCompile() // The same as `regexp.MustCompile`.
    rex.New(/* tokens */).Compile() // The same as `regexp.Compile`.
    rex.New(/* tokens */).String() // Get constructed regular expression as a string.
}

Common

Common operators for core operations.

rex.Common.Raw(raw string) // Raw regular expression.
rex.Common.Text(text string) // Escaped text.
rex.Common.Class(tokens ...dialect.ClassToken) // Include specified characters.
rex.Common.NotClass(tokens ...dialect.ClassToken) // Exclude specified characters.

Character classes

Single characters and classes, that can be used as-is, as well as childs to rex.CommonClass or rex.CommonNotClass.

rex.Chars.Begin()                // `^`
rex.Chars.End()                  // `$`
rex.Chars.Any()                  // `.`
rex.Chars.Range('a', 'z')        // `[a-z]`
rex.Chars.Runes("abc")           // `[abc]`
rex.Chars.Single('r')            // `r`
rex.Chars.Unicode(unicode.Greek) // `\p{Greek}`
rex.Chars.UnicodeByName("Greek") // `\p{Greek}`

rex.Chars.Digits()               // `[0-9]`
rex.Chars.Alphanumeric()         // `[0-9A-Za-z]`
rex.Chars.Alphabetic()           // `[A-Za-z]`
rex.Chars.ASCII()                // `[\x00-\x7F]`
rex.Chars.Whitespace()           // `[\t\n\v\f\r ]`
rex.Chars.WordCharacter()        // `[0-9A-Za-z_]`
rex.Chars.Blank()                // `[\t ]`
rex.Chars.Control()              // `[\x00-\x1F\x7F]`
rex.Chars.Graphical()            // `[[:graph:]]`
rex.Chars.Lower()                // `[a-z]`
rex.Chars.Printable()            // `[ [:graph:]]`
rex.Chars.Punctuation()          // `[!-/:[email protected][-`{-~]`
rex.Chars.Upper()                // `[A-Z]`
rex.Chars.HexDigits()            // `[0-9A-Fa-f]`

If you want to combine mutiple character classes, use rex.Common.Class:

// Only specific characters:
rex.Common.Class(rex.Chars.Digits(), rex.Chars.Single('a'))
// It will produce `[0-9a]`.

// All characters except:
rex.Common.NotClass(rex.Chars.Digits(), rex.Chars.Single('a'))
// It will produce `[^0-9a]`.

Groups

Helpers for grouping expressions.

// Define a captured group. That can help to select part of the text.
rex.Group.Define(rex.Chars.Single('a'), rex.Chars.Single('b')) // (ab)
// A group that defines "OR" condition for given expressions.
// Example: "a" or "rex", ...
rex.Group.Composite(rex.Chars.Single('a'), rex.Common.Text("rex")) // (?:a|rex)

// Define non-captured group. The result will not be captured.
rex.Group.Define(rex.Chars.Single('a')).NonCaptured() // (?:a)
// Define a group with a name.
rex.Group.Define(rex.Chars.Single('a')).WithName("my_name") // (?P<my_name>a)

Flags

// TODO: https://github.com/hedhyw/rex/issues/31

Repetitions

Helpers that specify how to repeat characters. They can be called on character class tokens.

RepetableClassToken.Repeat().OneOrMore() // `+`
RepetableClassToken.ZeroOrMore() // `*`
RepetableClassToken.ZeroOrOne() // `?`
RepetableClassToken.EqualOrMoreThan(n int) // `{n,}`
RepetableClassToken.Between(n, m int) // `{n,m}`

// Example:
rex.Chars.Digits().Repeat().OneOrMore() // [0-9]+
rex.Group.Define(rex.Chars.Single('a')).Repeat().OneOrMore() // (a)+

Helper

Common regular expression patters that are ready to use.

⚠️ These patterns are likely to be changed in new versions.

rex.Helper.Phone() // Combines PhoneE164 and PhoneE123.
rex.Helper.PhoneE164() // +155555555
rex.Helper.PhoneE123() // Combines PhoneNationalE123 and PhoneInternationalE123.
rex.Helper.PhoneNationalE123() // (607) 123 4567
rex.Helper.PhoneInternationalE123() // +22 607 123 4567
rex.Helper.HostnameRFC952() // Hostname by RFC-952 (stricter).
rex.Helper.HostnameRFC1123() // Hostname by RFC-1123.
rex.Helper.Email() // Unquoted email pattern, it doesn't check RFC 5322 completely, due to high complexity.
rex.Helper.IP()   // IPv4 or IPv6.
rex.Helper.IPv4() // 127.0.0.1
rex.Helper.IPv6() // 2001:0db8:85a3:0000:0000:8a2e:0370:7334
rex.Helper.MD5Hex() // d41d8cd98f00b204e9800998ecf8427e
rex.Helper.SHA1Hex() // da39a3ee5e6b4b0d3255bfef95601890afd80709
rex.Helper.SHA256Hex() // e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Examples

Simple email validator

Let’s describe a simple email regular expression in order to show the basic functionality (there is a more advanced helper rex.Helper.Email()):

// We can define a set of characters and reuse the block.
customCharacters := rex.Common.Class(
    rex.Chars.Range('a', 'z'), // `[a-z]`
    rex.Chars.Upper(),         // `[A-Z]`
    rex.Chars.Single('-'),     // `\x2D`
    rex.Chars.Digits(),        // `[0-9]`
) // `[a-zA-Z-0-9]`

re := rex.New(
    rex.Chars.Begin(), // `^`
    customCharacters.Repeat().OneOrMore(),

    // Email delimeter.
    rex.Chars.Single('@'), // `@`

    // Allow dot after delimter.
    rex.Common.Class(
        rex.Chars.Single('.'), // \.
        customCharacters,
    ).Repeat().OneOrMore(),

    // Email should contain at least one dot.
    rex.Chars.Single('.'), // `\.`
    rex.Chars.Alphanumeric().Repeat().Between(2, 3),

    rex.Chars.End(), // `$`
).MustCompile()

Raw regular expression

rex.New(
    rex.Chars.Begin(), // `^`
    rex.Common.Raw("[a-zA-Z0-9]+"), // `[a-zA-Z0-9]+`
    rex.Chars.Single('@'), // `@`
    rex.Common.Raw("[a-zA-Z0-9]+"), // `[a-zA-Z0-9]+`
    rex.Chars.End(), // `$`
).MustCompile()

// Or even!

rex.New(
    rex.Common.Raw(`^[a-zA-Z\d][email protected][a-zA-Z\d]+\.[a-zA-Z\d]{2,3}$`),
).MustCompile()

Simple composite

re := rex.New(
    rex.Chars.Begin(),
    rex.Group.Composite(
        // Text matches exact text (symbols will be escaped).
        rex.Common.Text("hello."),
        // OR one or more numbers.
        rex.Chars.Digits().Repeat().OneOrMore(),
    ),
    rex.Chars.End(),
).MustCompile()

re.MatchString("hello.")    // true
re.MatchString("hello")     // false
re.MatchString("123")       // true
re.MatchString("hello.123") // false

Example groups usage

re := rex.New(
    // Define a named group.
    rex.Group.Define(
        rex.Helper.Phone(),
    ).WithName("phone"),
).MustCompile()

const text = `
E.164:      +15555555
E.123.Intl: (607) 123 4567
E.123.Natl: +22 607 123 4567
`

submatches := re.FindAllStringSubmatch(text, -1)
// submatches[0]: +15555555
// submatches[1]: (607) 123 4567
// submatches[2]: +22 607 123 4567

More examples

More examples can be found here: examples_test.go.

GitHub

View Github