clang-tools  14.0.0git
clang::clangd::dex Namespace Reference

Namespaces

 detail
 

Classes

struct  Chunk
 NOTE: This is an implementation detail.
 
class  Corpus
 
class  Dex
 In-memory Dex trigram-based index implementation.
 
class  Iterator
 Iterator is the interface for a Query Tree node.
 
class  PostingList
 PostingList is the storage of DocIDs, which can be inserted into the Query Tree as a leaf by constructing an Iterator over the PostingList object.
 
class  Token
 A Token represents an attribute of a symbol, such as a particular trigram present in the name (used for fuzzy search).
 
class  Trigram
 

Typedefs

using DocID = uint32_t
 Symbol position in the list of all index symbols sorted by a pre-computed symbol quality.
 

Functions

std::vector< std::string > generateProximityURIs (llvm::StringRef URIPath)
 Returns search tokens for a number of parent directories of the given path.
 
std::vector< std::pair< DocID, float > > consume (Iterator &It)
 Advances the iterator until it is exhausted.
 
template<typename Func >
static void identifierTrigrams (llvm::StringRef Identifier, Func Out)
 
void generateIdentifierTrigrams (llvm::StringRef Identifier, std::vector< Trigram > &Out)
 Produces a list of unique fuzzy-search trigrams from an unqualified symbol name.
 
std::vector< Token > generateQueryTrigrams (llvm::StringRef Query)
 Returns a list of unique fuzzy-search trigrams given a query.
 

Typedef Documentation

◆ DocID

using clang::clangd::dex::DocID = uint32_t

Symbol position in the list of all index symbols sorted by a pre-computed symbol quality.

Definition at line 46 of file Iterator.h.

Function Documentation

◆ consume()

std::vector< std::pair< DocID, float > > clang::clangd::dex::consume ( Iterator &  It)

Advances the iterator until it is exhausted.

Returns pairs of document IDs with the corresponding boosting score.

Boosting can be seen as a compromise between two extremes: retrieving too many items and calculating the final score for each of them (which might be very expensive), and retrieving too few items, so that items with a very high final score would never be processed. The boosting score is a computationally efficient way to obtain preliminary scores for the requested items.

Definition at line 357 of file Iterator.cpp.
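
The loop that consume() describes can be sketched as follows. This is a minimal, self-contained illustration and not the clangd implementation: SketchIterator, its boost() method, and sketchConsume are hypothetical stand-ins for the real Iterator interface, assumed here to expose "at end", "advance", "current ID", and "current boost" operations.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using DocID = std::uint32_t;

// Hypothetical stand-in for a Query Tree node: walks a fixed list of
// (DocID, boost) pairs. The real Iterator interface may differ.
class SketchIterator {
public:
  explicit SketchIterator(std::vector<std::pair<DocID, float>> Items)
      : Items(std::move(Items)) {}

  bool reachedEnd() const { return Pos == Items.size(); }
  void advance() { ++Pos; }
  DocID peek() const { return Items[Pos].first; }
  float boost() const { return Items[Pos].second; }

private:
  std::vector<std::pair<DocID, float>> Items;
  std::size_t Pos = 0;
};

// Advances the iterator until it is exhausted, collecting each document ID
// together with its boosting score.
std::vector<std::pair<DocID, float>> sketchConsume(SketchIterator &It) {
  std::vector<std::pair<DocID, float>> Result;
  for (; !It.reachedEnd(); It.advance())
    Result.emplace_back(It.peek(), It.boost());
  return Result;
}
```

The iterator is taken by reference because draining it is a side effect: after the call, reachedEnd() is true on the caller's object.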

◆ generateIdentifierTrigrams()

void clang::clangd::dex::generateIdentifierTrigrams ( llvm::StringRef  Identifier,
std::vector< Trigram > &  Out 
)

Produces a list of unique fuzzy-search trigrams from an unqualified symbol name.

The trigrams give the 3-character query substrings this symbol can match.

The symbol's name is broken into segments, e.g. "FooBar" has two segments. Trigrams can start at any character in the input. Then we can choose to move to the next character or to the start of the next segment.

Short trigrams (length 1-2) are used for short queries. These are:

  • prefixes of the identifier, of length 1 and 2
  • the first character + next head character

For "FooBar" we get the following trigrams: {f, fo, fb, foo, fob, fba, oob, oba, bar}.

Trigrams are lowercase, as trigram matching is case-insensitive. Trigrams in the list are deduplicated.

Definition at line 81 of file Trigram.cpp.
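
The rules above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the clangd implementation: sketchIdentifierTrigrams is a hypothetical name, it returns strings rather than Trigram objects, and it treats only the first character and an uppercase letter following a lowercase letter as segment heads (the real segmentation comes from the fuzzy-matching code and handles more cases).

```cpp
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: generates lowercase, deduplicated trigrams for an
// identifier using a simplified camelCase segmentation.
std::set<std::string> sketchIdentifierTrigrams(const std::string &Identifier) {
  const std::size_t N = Identifier.size();
  std::set<std::string> Out;
  if (N == 0)
    return Out;

  std::string Lower;
  std::vector<bool> IsHead(N, false);
  for (std::size_t I = 0; I < N; ++I) {
    unsigned char C = static_cast<unsigned char>(Identifier[I]);
    Lower.push_back(static_cast<char>(std::tolower(C)));
    // Assumption: a head is the first character, or an uppercase letter
    // that directly follows a lowercase letter.
    IsHead[I] = I == 0 ||
                (std::isupper(C) &&
                 std::islower(static_cast<unsigned char>(Identifier[I - 1])));
  }

  // NextHead[I] is the index of the first head after I, or N if none.
  std::vector<std::size_t> NextHead(N, N);
  std::size_t Head = N;
  for (std::size_t I = N; I-- > 0;) {
    NextHead[I] = Head;
    if (IsHead[I])
      Head = I;
  }

  // Short trigrams: prefixes of length 1 and 2, and first char + next head.
  Out.insert(Lower.substr(0, 1));
  if (N >= 2)
    Out.insert(Lower.substr(0, 2));
  if (NextHead[0] < N) {
    std::string T;
    T += Lower[0];
    T += Lower[NextHead[0]];
    Out.insert(T);
  }

  // Full trigrams: start anywhere, then twice move either to the next
  // character or to the start of the next segment.
  for (std::size_t I = 0; I < N; ++I)
    for (std::size_t J : {I + 1, NextHead[I]}) {
      if (J >= N)
        continue;
      for (std::size_t K : {J + 1, NextHead[J]}) {
        if (K >= N)
          continue;
        std::string T;
        T += Lower[I];
        T += Lower[J];
        T += Lower[K];
        Out.insert(T);
      }
    }
  return Out;
}
```

With this sketch, sketchIdentifierTrigrams("FooBar") yields the nine trigrams listed above: f, fo, fb, foo, fob, fba, oob, oba, bar. Using a std::set gives the deduplication for free.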

◆ generateProximityURIs()

std::vector< std::string > clang::clangd::dex::generateProximityURIs ( llvm::StringRef  URIPath)

Returns search tokens for a number of parent directories of the given path.

Should be used within the index build process.

This function is exposed for testing only.

Definition at line 335 of file Dex.cpp.

◆ generateQueryTrigrams()

std::vector< Token > clang::clangd::dex::generateQueryTrigrams ( llvm::StringRef  Query)

Returns a list of unique fuzzy-search trigrams given a query.

The query is segmented using the FuzzyMatch API and converted to lowercase. Then the simplest trigrams (sequences of three consecutive letters and digits) are extracted and returned after deduplication.

For short queries (fewer than 3 characters with Head or Tail roles in the fuzzy-matching segmentation), this returns a single trigram containing the first characters (up to 3) to perform a prefix match.

Definition at line 101 of file Trigram.cpp.
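
The query-side extraction can be sketched as follows. This is a simplified illustration and not the clangd implementation: sketchQueryTrigrams is a hypothetical name, it returns plain strings rather than Token objects, and it approximates "characters with Head or Tail roles" by keeping only letters and digits. Returning nothing for a query with no such characters is also an assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: lowercases the query, keeps letters and digits, and
// returns the unique runs of three consecutive characters in first-seen order.
std::vector<std::string> sketchQueryTrigrams(const std::string &Query) {
  std::string Chars;
  for (unsigned char C : Query)
    if (std::isalnum(C))
      Chars.push_back(static_cast<char>(std::tolower(C)));

  if (Chars.empty())
    return {};
  // Short query: a single trigram holding the first characters, which is
  // then used as a prefix match.
  if (Chars.size() < 3)
    return {Chars};

  std::vector<std::string> Out;
  std::set<std::string> Seen;
  for (std::size_t I = 0; I + 3 <= Chars.size(); ++I) {
    std::string T = Chars.substr(I, 3);
    if (Seen.insert(T).second) // Skip trigrams that were already emitted.
      Out.push_back(T);
  }
  return Out;
}
```

For the query "FooBar" this sketch returns foo, oob, oba, bar, all of which appear among the identifier trigrams generated for the symbol "FooBar", so the symbol would match.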

◆ identifierTrigrams()

template<typename Func >
static void clang::clangd::dex::identifierTrigrams ( llvm::StringRef  Identifier,
Func  Out 
)