github.com/amcrypto-jp/codesearch

Indexed source search with RE2 regular expressions.

This fork keeps the original Code Search command-line tools usable on current Go releases and adds practical indexing, Windows, and search workflow fixes.

Quick Start

git clone https://github.com/amcrypto-jp/codesearch
cd codesearch
go install ./cmd/...

cindex ~/src/project
csearch 'func main'

What This Fork Adds

The module path stays compatible with the original codebase, while the repository URL and documentation now point at the maintained fork.

How It Works

Build once, search repeatedly.

`cindex` records roots and trigram postings. `csearch` uses the index to identify candidate files, then opens those files to verify the RE2 match. `csweb` reads the same index for browser-based exploration.

Figure: indexing and search flow used by the command-line and web tools. `cindex` builds a trigram index that `csearch` and `csweb` query.
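The filter-then-verify flow can be illustrated with a small in-memory sketch. This is not the fork's implementation: the `Index` type, its methods, and the sample files are hypothetical, file contents are held in memory instead of being read from disk, and the required literal is supplied by hand, whereas the real tools derive the trigram query from the regular expression. Go's standard `regexp` package, which uses RE2 syntax, stands in for the verification step.

```go
package main

import (
	"fmt"
	"regexp"
)

// trigrams returns the distinct 3-byte substrings of s.
func trigrams(s string) map[string]bool {
	set := make(map[string]bool)
	for i := 0; i+3 <= len(s); i++ {
		set[s[i:i+3]] = true
	}
	return set
}

// Index is a hypothetical in-memory stand-in for the on-disk index.
type Index struct {
	names    []string
	contents []string         // stands in for opening files from disk
	postings map[string][]int // trigram -> ascending file IDs
}

// Add records the trigram postings for one file.
func (ix *Index) Add(name, content string) {
	if ix.postings == nil {
		ix.postings = make(map[string][]int)
	}
	id := len(ix.names)
	ix.names = append(ix.names, name)
	ix.contents = append(ix.contents, content)
	for t := range trigrams(content) {
		ix.postings[t] = append(ix.postings[t], id)
	}
}

// intersect merges two ascending ID lists, keeping common IDs.
func intersect(a, b []int) []int {
	var out []int
	for i, j := 0, 0; i < len(a) && j < len(b); {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// Candidates returns the IDs of files containing every trigram of literal.
func (ix *Index) Candidates(literal string) []int {
	var result []int
	first := true
	for t := range trigrams(literal) {
		if first {
			result = append([]int(nil), ix.postings[t]...)
			first = false
			continue
		}
		result = intersect(result, ix.postings[t])
	}
	if first { // literal shorter than three bytes: nothing to filter on
		for id := range ix.names {
			result = append(result, id)
		}
	}
	return result
}

// Search narrows by trigram, then verifies each candidate with re.
func (ix *Index) Search(literal string, re *regexp.Regexp) []string {
	var hits []string
	for _, id := range ix.Candidates(literal) {
		if re.MatchString(ix.contents[id]) {
			hits = append(hits, ix.names[id])
		}
	}
	return hits
}

// demo indexes four made-up files and runs one query.
func demo() (candidates []int, hits []string) {
	var ix Index
	ix.Add("a.go", "package main\n\nfunc main() {}\n")
	ix.Add("b.go", "package lib\n\nfunc Helper() {}\n")
	ix.Add("c.txt", "the main function is elsewhere\n")
	ix.Add("d.go", "package main\n\nfunc mainLoop() {}\n")
	re := regexp.MustCompile(`func main\(`)
	return ix.Candidates("func main"), ix.Search("func main", re)
}

func main() {
	candidates, hits := demo()
	fmt.Println("candidate IDs:", candidates)
	fmt.Println("verified hits:", hits)
}
```

In this sketch `d.go` survives the trigram filter because it contains every trigram of `func main`, but it is rejected when the regular expression is checked against its contents. That is why the verification pass is needed.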

Command Reference

Four tools; three of them share one index.

The default index file is `$CSEARCHINDEX`, or `$HOME/.csearchindex` when `$CSEARCHINDEX` is unset. `cindex`, `csearch`, and `csweb` also accept `-indexpath FILE`.
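As a rough sketch, the lookup order could be expressed as below. The function name `resolveIndexPath` is hypothetical, and the sketch assumes that an explicit `-indexpath` value takes precedence over `$CSEARCHINDEX`; the text above does not state that precedence.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveIndexPath picks the index file location.
// Assumed order: explicit flag value, then $CSEARCHINDEX, then
// $HOME/.csearchindex. The flag-first precedence is an assumption.
func resolveIndexPath(flagValue, envValue, home string) string {
	if flagValue != "" {
		return flagValue
	}
	if envValue != "" {
		return envValue
	}
	return filepath.Join(home, ".csearchindex")
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		home = "." // fall back to the working directory in this sketch
	}
	fmt.Println(resolveIndexPath("", os.Getenv("CSEARCHINDEX"), home))
}
```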

cindex

Creates or updates the trigram index.

cindex [options] [path...]
  • -reset discards the existing index.
  • -list prints indexed roots.
  • -check validates the index format.
  • -exclude FILE reads exclusion patterns.
  • -filelist FILE reads paths to index, one per line.
  • -includehidden indexes dot-files and dot-directories except VCS directories.
  • -follow-symlinks follows symlinked files and directories, indexing them under their symlink paths.
  • -zip indexes content inside ZIP files.
  • -logskip logs why files are skipped.
  • -stats prints index size statistics.

csearch

Searches indexed files and verifies matches against file contents.

csearch [options] regexp
  • -f REGEXP searches only matching file names.
  • -i performs case-insensitive search.
  • -n prints line numbers.
  • -l prints only the names of matching files; with -0, the names are separated by NUL bytes.
  • -B N, -A N, and -C N print context.
  • -m N stops after N total matches.
  • -M N stops after N matches per file.
  • -brute searches every indexed file.
  • -all also walks indexed roots to search unindexed regular files.
  • -includehidden includes hidden files during -all searches.
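NUL-separated output is useful when file names may contain spaces or newlines. A program consuming `csearch -l -0` output might split it as in the following sketch; the helper name `splitNUL` and the sample bytes are made up for illustration and are not captured tool output.

```go
package main

import (
	"bytes"
	"fmt"
)

// splitNUL splits NUL-separated file names and drops empty fields,
// so a trailing NUL does not produce an empty name.
func splitNUL(out []byte) []string {
	var names []string
	for _, field := range bytes.Split(out, []byte{0}) {
		if len(field) > 0 {
			names = append(names, string(field))
		}
	}
	return names
}

func main() {
	// Hypothetical bytes standing in for `csearch -l -0` output.
	sample := []byte("src/main.go\x00docs/read me.txt\x00")
	for _, name := range splitNUL(sample) {
		fmt.Printf("%q\n", name)
	}
}
```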

cgrep

Greps explicit files or standard input with the same regexp engine.

cgrep [options] regexp [file...]
  • -i performs case-insensitive search.
  • -n prints line numbers.
  • -h suppresses file name prefixes.
  • -l prints only the names of matching files; with -0, the names are separated by NUL bytes.
  • -c prints match counts.
  • -v prints non-matching lines.
  • -B N, -A N, and -C N print context.

csweb

Starts a local web UI backed by the same index file.

csweb -indexpath /tmp/project.index

Open http://localhost:2473 after the server starts.

Text Detection

By default, `cindex` skips hidden paths, backup names, VCS directories, symlinks, binary files, files with invalid UTF-8, very large files, files with very long lines, and files with too many distinct trigrams. These limits are adjustable:

  • -maxfilelen N skips files larger than N bytes.
  • -maxlinelen N skips files with a line longer than N bytes.
  • -maxtrigrams N skips files with more than N distinct trigrams.
  • -maxinvalidutf8ratio R permits a limited invalid UTF-8 byte-pair ratio.
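The content-based checks can be approximated as in the sketch below. It is not the `cindex` implementation: the `Limits` type, the `skipReason` function, and the threshold values are hypothetical, the NUL-byte test is an assumed stand-in for binary detection, and invalid UTF-8 is rejected outright rather than modelling `-maxinvalidutf8ratio`.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// Limits holds hypothetical thresholds; the values used in main are
// illustrative and are not the cindex defaults.
type Limits struct {
	MaxFileLen  int // bytes
	MaxLineLen  int // bytes per line
	MaxTrigrams int // distinct trigrams per file
}

// skipReason returns why data would be skipped, or "" if it looks indexable.
func skipReason(data []byte, lim Limits) string {
	if len(data) > lim.MaxFileLen {
		return "file too long"
	}
	if bytes.IndexByte(data, 0) >= 0 { // assumed binary heuristic
		return "binary file"
	}
	if !utf8.Valid(data) { // strict: no invalid-ratio allowance here
		return "invalid UTF-8"
	}
	for _, line := range bytes.Split(data, []byte{'\n'}) {
		if len(line) > lim.MaxLineLen {
			return "line too long"
		}
	}
	seen := make(map[[3]byte]struct{})
	for i := 0; i+3 <= len(data); i++ {
		seen[[3]byte{data[i], data[i+1], data[i+2]}] = struct{}{}
		if len(seen) > lim.MaxTrigrams {
			return "too many trigrams"
		}
	}
	return ""
}

func main() {
	lim := Limits{MaxFileLen: 1 << 20, MaxLineLen: 2000, MaxTrigrams: 20000}
	fmt.Printf("%q\n", skipReason([]byte("package main\n"), lim))
	fmt.Printf("%q\n", skipReason([]byte("ELF\x00\x01\x02"), lim))
}
```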

Pattern Files

Share exclusions across indexing and search.

Pattern files used by `-exclude` contain one filepath pattern per line. Blank lines and lines beginning with `#` are ignored.

vendor
*.min.js
generated/*
third_party/*

Patterns without path separators match a file or directory base name. Patterns containing path separators match the slash-separated path.
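The two matching rules can be sketched as follows. The helper names `parsePatterns` and `excluded` are hypothetical, whitespace trimming is an assumption, and the sketch uses Go's `path.Match` glob semantics, in which `*` does not cross `/` and a pattern must match the whole target. The fork's actual matcher may differ on these points.

```go
package main

import (
	"bufio"
	"fmt"
	"path"
	"strings"
)

// parsePatterns reads one pattern per line, ignoring blank lines and
// lines beginning with '#'. Trimming whitespace is an assumption.
func parsePatterns(text string) []string {
	var patterns []string
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		patterns = append(patterns, line)
	}
	return patterns
}

// excluded reports whether slashPath matches any pattern. Patterns
// without '/' are tested against the base name; others against the
// whole slash-separated path.
func excluded(patterns []string, slashPath string) bool {
	for _, p := range patterns {
		target := slashPath
		if !strings.Contains(p, "/") {
			target = path.Base(slashPath)
		}
		if ok, err := path.Match(p, target); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := parsePatterns("# build output\n\nvendor\n*.min.js\ngenerated/*\n")
	for _, p := range []string{"src/vendor", "web/app.min.js", "generated/api.go", "src/main.go"} {
		fmt.Println(p, excluded(patterns, p))
	}
}
```

A directory walker built on this sketch would test each directory as it descends, so a match on `vendor` would prune the whole subtree rather than each file inside it.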

History

Original design and community fixes.

The original Code Search was written by Russ Cox. For historical context and the original implementation notes, read his article "Regular Expression Matching with a Trigram Index".

This fork includes fixes and command-line features derived from long-running community forks, including work by Manpreet Singh, Patrick Mezard, Benoit Mortgat, and Macoy Madson.