clang-tools 19.0.0git
clang::clangd::dex Namespace Reference

Namespaces

namespace  detail
 

Classes

struct  Chunk
 NOTE: This is an implementation detail.
 
class  Corpus
 
class  Dex
 In-memory Dex trigram-based index implementation.
 
class  Iterator
 Iterator is the interface for a Query Tree node.
 
class  PostingList
 PostingList is the storage of DocIDs, which can be inserted into the Query Tree as a leaf by constructing an Iterator over the PostingList object.
 
class  Token
 A Token represents an attribute of a symbol, such as a particular trigram present in the name (used for fuzzy search).
 
class  Trigram
 

Typedefs

using DocID = uint32_t
 Symbol position in the list of all index symbols sorted by a pre-computed symbol quality.
 

Functions

llvm::StringRef findPathInURI (llvm::StringRef S)
 
llvm::SmallVector< llvm::StringRef, ProximityURILimit > generateProximityURIs (llvm::StringRef)
 Returns search tokens for a number of parent directories of the given path.
 
std::vector< std::pair< DocID, float > > consume (Iterator &It)
 Advances the iterator until it is exhausted.
 
template<typename Func >
static void identifierTrigrams (llvm::StringRef Identifier, Func Out)
 
void generateIdentifierTrigrams (llvm::StringRef Identifier, std::vector< Trigram > &Out)
 Produces a list of unique fuzzy-search trigrams from an unqualified symbol name.
 
std::vector< Token > generateQueryTrigrams (llvm::StringRef Query)
 Returns a list of unique fuzzy-search trigrams for a given query.
 

Variables

constexpr unsigned ProximityURILimit = 5
 

Typedef Documentation

◆ DocID

using clang::clangd::dex::DocID = uint32_t

Symbol position in the list of all index symbols sorted by a pre-computed symbol quality.
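
As a rough illustration of the idea, a DocID can be thought of as an index into a quality-sorted symbol list. The sketch below is not taken from clangd: `SketchSymbol`, `sketchSortByQuality`, and `sketchDocIDOf` are hypothetical names, and the descending sort order (best symbols get the smallest IDs) is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using DocID = std::uint32_t;

// Hypothetical stand-in for an indexed symbol with a pre-computed quality.
struct SketchSymbol {
  std::string Name;
  float Quality;
};

// Sorts symbols by quality. Descending order is an assumption here;
// after sorting, a symbol's position in the vector serves as its DocID.
void sketchSortByQuality(std::vector<SketchSymbol> &Symbols) {
  std::stable_sort(Symbols.begin(), Symbols.end(),
                   [](const SketchSymbol &A, const SketchSymbol &B) {
                     return A.Quality > B.Quality;
                   });
}

// Returns the DocID (position) of the named symbol, or Symbols.size()
// if no such symbol exists.
DocID sketchDocIDOf(const std::vector<SketchSymbol> &Symbols,
                    const std::string &Name) {
  for (DocID ID = 0; ID < Symbols.size(); ++ID)
    if (Symbols[ID].Name == Name)
      return ID;
  return static_cast<DocID>(Symbols.size());
}
```

Under that ordering assumption, iterating DocIDs in increasing order visits symbols from highest to lowest quality.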

Definition at line 45 of file Iterator.h.

Function Documentation

◆ consume()

std::vector< std::pair< DocID, float > > clang::clangd::dex::consume ( Iterator &  It)

Advances the iterator until it is exhausted.

Returns pairs of document IDs with the corresponding boosting score.

Boosting can be seen as a compromise between two extremes: retrieving too many items and calculating the final score for each of them (which might be very expensive), and not retrieving enough items, so that items with a very high final score would never be processed. A boosting score is a computationally efficient way to acquire preliminary scores for the requested items.
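
The draining loop described above can be sketched as follows. This is a minimal sketch, not clangd's code: `SketchIterator` is a hypothetical stand-in for Iterator, and the signatures of `peek()`, `consume()`, `advance()`, and `reachedEnd()` are assumptions based only on the member names listed in the references for this function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using DocID = std::uint32_t;

// Hypothetical iterator over a fixed list of (DocID, boost) pairs.
class SketchIterator {
public:
  explicit SketchIterator(std::vector<std::pair<DocID, float>> Items)
      : Items(std::move(Items)) {}

  bool reachedEnd() const { return Pos == Items.size(); }
  void advance() {
    assert(!reachedEnd());
    ++Pos;
  }
  // Document ID of the current item.
  DocID peek() const {
    assert(!reachedEnd());
    return Items[Pos].first;
  }
  // Boosting score of the current item (assumed meaning of consume()).
  float consume() const {
    assert(!reachedEnd());
    return Items[Pos].second;
  }

private:
  std::vector<std::pair<DocID, float>> Items;
  std::size_t Pos = 0;
};

// Advances the iterator until it is exhausted, collecting each
// document ID together with its boosting score.
std::vector<std::pair<DocID, float>> sketchConsume(SketchIterator &It) {
  std::vector<std::pair<DocID, float>> Result;
  for (; !It.reachedEnd(); It.advance())
    Result.emplace_back(It.peek(), It.consume());
  return Result;
}
```

A caller would then compute the expensive final score only for the returned items, using the boost as a cheap preliminary ranking signal.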

Definition at line 357 of file Iterator.cpp.

References clang::clangd::dex::Iterator::advance(), clang::clangd::dex::Iterator::consume(), clang::clangd::dex::Iterator::peek(), and clang::clangd::dex::Iterator::reachedEnd().

Referenced by clang::clangd::dex::Dex::fuzzyFind().

◆ findPathInURI()

llvm::StringRef clang::clangd::dex::findPathInURI ( llvm::StringRef  S)

Definition at line 359 of file Dex.cpp.

References C, and findPathInURI().

Referenced by findPathInURI(), and generateProximityURIs().

◆ generateIdentifierTrigrams()

void clang::clangd::dex::generateIdentifierTrigrams ( llvm::StringRef  Identifier,
std::vector< Trigram > &  Out 
)

Produces a list of unique fuzzy-search trigrams from an unqualified symbol name.

The trigrams give the 3-character query substrings this symbol can match.

The symbol's name is broken into segments, e.g. "FooBar" has two segments. Trigrams can start at any character in the input. Then we can choose to move to the next character or to the start of the next segment.

Short trigrams (length 1-2) are used for short queries. These are:

  • prefixes of the identifier, of length 1 and 2
  • the first character + next head character

For "FooBar" we get the following trigrams: {f, fo, fb, foo, fob, fba, oob, oba, bar}.

Trigrams are lowercase, as trigram matching is case-insensitive. Trigrams in the list are deduplicated.
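
The rules above can be illustrated with a simplified sketch. It is not the clangd implementation: `sketchIdentifierTrigrams` is a hypothetical name, and the segmentation is an assumption that only handles ASCII camelCase letters (a segment starts at the first character or at an uppercase letter that follows a lowercase one). Delimiters, digits, and other cases handled by the real segmentation are ignored.

```cpp
#include <array>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

std::set<std::string> sketchIdentifierTrigrams(const std::string &Identifier) {
  std::set<std::string> Out; // std::set deduplicates the trigrams.
  const std::size_t N = Identifier.size();
  if (N == 0)
    return Out;

  // Lowercase the identifier and mark segment heads (simplified rule).
  std::string Lower;
  std::vector<bool> IsHead(N, false);
  for (std::size_t I = 0; I < N; ++I) {
    unsigned char C = static_cast<unsigned char>(Identifier[I]);
    bool PrevLower =
        I > 0 && std::islower(static_cast<unsigned char>(Identifier[I - 1]));
    IsHead[I] = I == 0 || (std::isupper(C) && PrevLower);
    Lower.push_back(static_cast<char>(std::tolower(C)));
  }

  // Next[I] = {next character in the same segment, head of the next segment}.
  // Index 0 can never be a successor, so 0 doubles as "not available".
  std::vector<std::array<std::size_t, 2>> Next(N);
  std::size_t NextTail = 0, NextHead = 0;
  for (std::size_t I = N; I-- > 0;) {
    Next[I] = {NextTail, NextHead};
    NextTail = IsHead[I] ? 0 : I;
    if (IsHead[I])
      NextHead = I;
  }

  // Full trigrams: start anywhere, then twice move either to the next
  // character or to the start of the next segment.
  for (std::size_t I = 0; I < N; ++I)
    for (std::size_t J : Next[I]) {
      if (J == 0)
        continue;
      for (std::size_t K : Next[J])
        if (K != 0)
          Out.insert(std::string{Lower[I], Lower[J], Lower[K]});
    }

  // Short trigrams: prefixes of length 1 and 2, and first character
  // followed by the next head character.
  Out.insert(Lower.substr(0, 1));
  if (N >= 2)
    Out.insert(Lower.substr(0, 2));
  if (Next[0][1] != 0)
    Out.insert(std::string{Lower[0], Lower[Next[0][1]]});
  return Out;
}
```

With this sketch, `sketchIdentifierTrigrams("FooBar")` yields the same nine trigrams listed above for "FooBar".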

Definition at line 100 of file Trigram.cpp.

References clang::clangd::Identifier, and identifierTrigrams().

◆ generateProximityURIs()

llvm::SmallVector< llvm::StringRef, 5 > clang::clangd::dex::generateProximityURIs ( llvm::StringRef  )

Returns search tokens for a number of parent directories of the given path.

Should be used within the index build process.

This function is exposed for testing only.
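
One way to picture the parent-directory expansion is the sketch below. It is a guess at the general shape, not the code in Dex.cpp: `sketchProximityURIs` and `sketchPathOffset` are hypothetical names, and several details are assumptions, namely that the input URI itself is the first result, that parent URIs keep their trailing slash, and that the path begins at the first `/` after `scheme://`. Only the limit of 5 comes from ProximityURILimit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

constexpr unsigned SketchProximityURILimit = 5; // Mirrors ProximityURILimit.

// Hypothetical helper: offset of the path part of a URI, i.e. the first
// '/' after "scheme://", or npos if the URI does not have that shape.
static std::size_t sketchPathOffset(const std::string &URI) {
  std::size_t Scheme = URI.find("://");
  if (Scheme == std::string::npos)
    return std::string::npos;
  return URI.find('/', Scheme + 3);
}

// Returns the URI followed by successively shorter parent-directory URIs,
// with at most SketchProximityURILimit entries in total (assumed behavior).
std::vector<std::string> sketchProximityURIs(const std::string &URI) {
  std::vector<std::string> Result = {URI};
  std::size_t PathBegin = sketchPathOffset(URI);
  if (PathBegin == std::string::npos)
    return Result;

  std::size_t End = URI.size();
  while (Result.size() < SketchProximityURILimit) {
    // Drop trailing slashes, then the last path component.
    while (End > PathBegin && URI[End - 1] == '/')
      --End;
    while (End > PathBegin && URI[End - 1] != '/')
      --End;
    if (End <= PathBegin)
      break;
    Result.push_back(URI.substr(0, End));
  }
  return Result;
}
```

Capping the number of parents keeps the number of proximity tokens per symbol bounded regardless of how deeply a file is nested.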

Definition at line 374 of file Dex.cpp.

References findPathInURI(), generateProximityURIs(), and ProximityURILimit.

Referenced by generateProximityURIs().

◆ generateQueryTrigrams()

std::vector< Token > clang::clangd::dex::generateQueryTrigrams ( llvm::StringRef  Query)

Returns a list of unique fuzzy-search trigrams for a given query.

The query is segmented using the FuzzyMatch API and converted to lowercase. Then the simplest trigrams (sequences of three consecutive letters and digits) are extracted and returned after deduplication.

For short queries (fewer than 3 characters with Head or Tail roles in the Fuzzy Matching segmentation), this returns a single trigram with the first characters (up to 3) to perform a prefix match.
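
The extraction described above can be sketched with a sliding window. This is a simplified illustration rather than the clangd implementation: `sketchQueryTrigrams` is a hypothetical name, plain strings stand in for Token, and "characters with Head or Tail roles" is approximated by ASCII letters and digits, which is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <vector>

std::vector<std::string> sketchQueryTrigrams(const std::string &Query) {
  std::vector<std::string> Result; // Trigrams in first-occurrence order.
  std::set<std::string> Seen;      // Used for deduplication.
  std::string Letters;             // Lowercased letters and digits so far.

  for (char C : Query) {
    unsigned char U = static_cast<unsigned char>(C);
    if (!std::isalnum(U))
      continue; // Skip delimiters such as '_' or ':'.
    Letters.push_back(static_cast<char>(std::tolower(U)));
    if (Letters.size() >= 3) {
      // The last three letters/digits form the next trigram.
      std::string Trigram = Letters.substr(Letters.size() - 3);
      if (Seen.insert(Trigram).second)
        Result.push_back(Trigram);
    }
  }

  // Short query: fewer than three letters/digits, so emit them as a single
  // short trigram for prefix matching. A query with no letters or digits
  // yields an empty result in this sketch.
  if (Result.empty() && !Letters.empty())
    Result.push_back(Letters);
  return Result;
}
```

Unlike the identifier side, the query side only uses consecutive characters, so a query such as "FooBar" produces far fewer trigrams than the identifier "FooBar" does.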

Definition at line 123 of file Trigram.cpp.

References clang::clangd::calculateRoles(), clang::clangd::Head, clang::clangd::Tail, and clang::clangd::dex::Token::Trigram.

Referenced by clang::clangd::dex::Dex::fuzzyFind().

◆ identifierTrigrams()

template<typename Func >
static void clang::clangd::dex::identifierTrigrams ( llvm::StringRef  Identifier,
Func  Out 
)

Variable Documentation

◆ ProximityURILimit

constexpr unsigned clang::clangd::dex::ProximityURILimit = 5

Definition at line 371 of file Dex.cpp.

Referenced by generateProximityURIs().