mirror of https://github.com/cblgh/lieu
				
				
				
			Added a bit of documentation for new features
parent 212f5c5655
commit 7c6a63ce2c
					
				|  | @ -102,6 +102,8 @@ bannedSuffixes = "data/banned-suffixes.txt" | |||
| boringWords = "data/boring-words.txt" | ||||
| # domains that won't be output as outgoing links | ||||
| boringDomains = "data/boring-domains.txt" | ||||
| # queries to search for finding preview text | ||||
| previewQueryList = "data/preview-query-list.txt" | ||||
| ``` | ||||
| 
 | ||||
| For your own use, the following config fields should be customized: | ||||
|  | @ -119,6 +121,7 @@ The following config-defined files can stay as-is unless you have specific requi | |||
| * `heuristics` | ||||
| * `wordlist` | ||||
| * `bannedSuffixes` | ||||
| * `previewQueryList` | ||||
| 
 | ||||
| For a full rundown of the files and their various jobs, see the [files | ||||
| description](docs/files.md). | ||||
|  |  | |||
|  | @ -37,6 +37,8 @@ bannedSuffixes = "data/banned-suffixes.txt" | |||
| boringWords = "data/boring-words.txt" | ||||
| # domains that won't be output as outgoing links | ||||
| boringDomains = "data/boring-domains.txt" | ||||
| # queries to search for finding preview text | ||||
| previewQueryList = "data/preview-query-list.txt" | ||||
| ``` | ||||
| 
 | ||||
| ## HTML | ||||
|  | @ -120,6 +122,23 @@ are stopped from entering the search index. The default wordlist consists of the | |||
| 1000 or so most common English words, albeit curated slightly to still allow for | ||||
| interesting concepts and verbs—such as `reading` and `books`, for example. | ||||
| 
 | ||||
| #### `previewQueryList` | ||||
| A list of CSS selectors (one per line) used to fetch preview paragraphs. | ||||
| The first paragraph found that passes a check against the `heuristics` file makes | ||||
| it into the search index. For each selector, lieu tries the first four matching | ||||
| paragraphs before moving on to the next selector. | ||||
| 
 | ||||
| To get good results, one usually wants to tune this to pick up the first "real" paragraph | ||||
| after the header, or a summary paragraph if provided. It is also worth trying to avoid | ||||
| irrelevant paragraphs, as they clutter up your index and results; when no selector yields | ||||
| a usable paragraph, lieu will fall back to other preview sources. | ||||
| 
 | ||||
| The default has been (at the time of writing) tuned for use with the Fediring. | ||||
| 
 | ||||
| Depending on how well the websites you are indexing use semantic HTML, this will | ||||
| get you a 70 to 90% solution. For the rest, use heuristics and contact the creators of the | ||||
| websites you are trying to index; they (usually) appreciate the feedback. | ||||
| 
 | ||||
| #### OpenSearch metadata | ||||
| If you are running your own instance of Lieu, you might want to look into changing the URL | ||||
| defined in the file `opensearch.xml`, which specifies [OpenSearch | ||||
|  |  | |||
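
The selection rule described under `previewQueryList` in the diff above can be sketched roughly as follows. This is a minimal illustration in Go, not lieu's actual code: `pickPreview`, `find`, and `failsHeuristics` are hypothetical names, the page is modelled as a plain map instead of a parsed HTML document, and the heuristics check is assumed to be a simple reject-if-matched test.

```go
package main

import (
	"fmt"
	"strings"
)

// maxParagraphsPerSelector mirrors the "first four paragraphs" rule from the docs.
const maxParagraphsPerSelector = 4

// pickPreview walks the selectors in order. For each selector it looks at no
// more than the first four matching paragraphs and returns the first one that
// passes the heuristics check. find and failsHeuristics are hypothetical
// stand-ins for a real DOM query and a real heuristics check.
func pickPreview(
	selectors []string,
	find func(selector string) []string,
	failsHeuristics func(paragraph string) bool,
) (string, bool) {
	for _, sel := range selectors {
		paragraphs := find(sel)
		if len(paragraphs) > maxParagraphsPerSelector {
			paragraphs = paragraphs[:maxParagraphsPerSelector]
		}
		for _, p := range paragraphs {
			if !failsHeuristics(p) {
				return p, true
			}
		}
	}
	// Nothing usable: the caller falls back to another preview source.
	return "", false
}

func main() {
	// Hypothetical page content, keyed by CSS selector.
	page := map[string][]string{
		"header p": {"Skip to content"},
		"main p":   {"Skip to content", "Notes on growing tomatoes on a balcony."},
	}
	find := func(sel string) []string { return page[sel] }
	failsHeuristics := func(p string) bool {
		return strings.Contains(strings.ToLower(p), "skip to content")
	}

	preview, ok := pickPreview([]string{"header p", "main p"}, find, failsHeuristics)
	fmt.Println(preview, ok) // prints "Notes on growing tomatoes on a balcony. true"
}
```

Because selectors are tried in order, putting the most specific ones (for example a summary element, if the indexed sites provide one) at the top of the list gives them priority over broader ones.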
	 Slatian