text_splitter
SemanticTextSplitter
Bases: TextSplitterInterface
A class that splits text into smaller pieces based on semantic similarity and sentence count.
Attributes:
Name | Type | Description |
---|---|---|
max_sentences | int | The maximum number of sentences per chunk. |
semantic | bool | A flag indicating whether to use semantic splitting. |
semantic_threshold | float | The threshold for cosine similarity to determine splitting points. |
embedding_model | TextEmbedding | The model used for generating text embeddings. |
Source code in src/agere/addons/text_splitter.py
__init__(max_sentences, semantic=True, semantic_threshold=0.8)
Initialize the SemanticTextSplitter with the given parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
max_sentences | int | The maximum number of sentences per chunk. | required |
semantic | bool | Whether to use semantic splitting. | True |
semantic_threshold | float | Threshold for cosine similarity to determine splitting points. | 0.8 |
Raises:
Type | Description |
---|---|
ValueError | If 'semantic_threshold' is not between 0 and 1. |
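The range check described under Raises can be sketched as follows. This is a minimal illustration, not the library's code: the helper name `check_semantic_threshold` is hypothetical, and treating both 0 and 1 as valid (inclusive bounds) is an assumption, since the reference only says "between 0 and 1".

```python
def check_semantic_threshold(semantic_threshold: float) -> float:
    """Hypothetical helper: reject thresholds outside [0, 1] (inclusive bounds assumed)."""
    if not 0.0 <= semantic_threshold <= 1.0:
        raise ValueError(
            f"semantic_threshold must be between 0 and 1, got {semantic_threshold!r}"
        )
    return semantic_threshold


# A valid threshold passes through unchanged; an invalid one raises ValueError.
valid = check_semantic_threshold(0.8)
try:
    check_semantic_threshold(1.5)
    rejected = False
except ValueError:
    rejected = True
```

Validating in the constructor means a bad threshold fails immediately rather than producing odd chunk boundaries later.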
split(text)
Split the text into chunks based on semantic similarity or sentence count.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to split. | required |
Returns:
Type | Description |
---|---|
Iterable[str] | The chunks of text. |
split_by_semantic(text)
Split the text into chunks based on semantic similarity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to split. | required |
Returns:
Type | Description |
---|---|
Iterable[str] | The chunks of text. |
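One plausible way to split on semantic similarity is to compare each sentence with the previous one and start a new chunk when their cosine similarity drops below the threshold, or when the chunk already holds `max_sentences` sentences. The sketch below illustrates that idea only; it is not taken from the library's source. The names `toy_embed`, `cosine_similarity`, and `chunk_by_similarity` are hypothetical, and the bag-of-words vector is a stand-in for a real embedding model.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional

Vector = Dict[str, float]


def toy_embed(sentence: str) -> Vector:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return dict(Counter(re.findall(r"[a-z']+", sentence.lower())))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def chunk_by_similarity(
    sentences: List[str],
    max_sentences: int,
    threshold: float,
    embed: Callable[[str], Vector] = toy_embed,
) -> List[str]:
    """Group consecutive sentences; cut when similarity to the previous sentence
    falls below `threshold` or the current chunk is full."""
    chunks: List[str] = []
    current: List[str] = []
    previous: Optional[Vector] = None
    for sentence in sentences:
        vector = embed(sentence)
        if current and (
            len(current) >= max_sentences
            or cosine_similarity(previous, vector) < threshold
        ):
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
        previous = vector
    if current:
        chunks.append(" ".join(current))
    return chunks


sentences = ["Cats chase mice.", "Cats chase birds.", "Stocks fell sharply."]
chunks = chunk_by_similarity(sentences, max_sentences=5, threshold=0.5)
```

The example uses a threshold of 0.5 because word-count vectors score lower than dense embeddings typically do; a suitable threshold depends on the embedding model in use.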
split_by_sentence(text, max_sentences)
Split the text into chunks based on the number of sentences.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to split. | required |
max_sentences | int | The maximum number of sentences per chunk. | required |
Returns:
Type | Description |
---|---|
Iterable[str] | The chunks of text. |
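Splitting by sentence count can be sketched as below. This is an illustration under stated assumptions rather than the library's implementation: `split_sentences` and `chunk_by_sentence_count` are hypothetical names, and the regex-based sentence boundary rule (split after `.`, `!`, or `?` followed by whitespace) is a simplification that a real sentence tokenizer would improve on.

```python
import re
from typing import Iterable, List


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def chunk_by_sentence_count(text: str, max_sentences: int) -> Iterable[str]:
    """Yield chunks that each contain at most `max_sentences` sentences."""
    if max_sentences < 1:
        raise ValueError("max_sentences must be at least 1")
    sentences = split_sentences(text)
    for start in range(0, len(sentences), max_sentences):
        yield " ".join(sentences[start:start + max_sentences])


chunks = list(chunk_by_sentence_count("One. Two! Three? Four.", max_sentences=3))
```

Because the function is a generator, chunks are produced lazily, which matches the `Iterable[str]` return type documented above.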