yomichan-import

Author	SHA1	Message	Date
Alexei Yatskov	f4da17e228	Merge pull request #41 from stephenmk/master New version of JMnedict (the proper name dictionary)	2023-02-05 09:57:17 -08:00
stephenmk	ecf22da5a3	Improve readability of publication date functions	2023-02-04 01:42:08 -06:00
stephenmk	a9d85dc720	Simplify string -> runes conversion	2023-02-03 22:07:41 -06:00
stephenmk	70611a51c4	Fix typo	2023-02-03 15:51:52 -06:00
stephenmk	dffbec6337	Designate more JMnedict category tags	2023-02-02 20:15:28 -06:00
stephenmk	5755b79341	Use cached part-of-speech values	2023-02-02 15:50:57 -06:00
stephenmk	7bff70b71c	JMdict: Ensure part-of-speech info is added in non-English versions Only English-language senses in JMdict contain part-of-speech tags. This info is displayed to users in definition tags and also used for deinflecting verbs and adjectives during term lookups. The old version of Yomichan-Import took the PoS tags from the final sense in the English version of an entry and applied them to every sense of every other language. For example, 川・かわ has two senses in English JMdict: a noun sense and a suffix sense. Therefore every sense of 川・かわ in every other language was tagged as a suffix. Instead, I suggest gathering all distinct PoS tags from each English entry and applying them all to each non-English sense. Every non-English sense of 川・かわ will therefore be tagged as both a noun and suffix.	2023-02-02 10:44:16 -06:00
stephenmk	19d6d0bb43	Rename some jmdict functions	2023-02-01 19:14:37 -06:00
stephenmk	3b420f8b6c	Use library implementation of Contains function	2023-02-01 18:57:35 -06:00
stephenmk	8281301869	New JMnedict version	2023-02-01 18:55:03 -06:00
stephenmk	b826dbf264	Add verification logic for date entry in JMdict Very old versions of JMdict and unofficial versions are unlikely to have the publication date entry at the end of the file.	2023-01-30 13:26:26 -06:00
Alexei Yatskov	74de4ce9e5	Merge pull request #40 from stephenmk/master New version of JMdict for Yomichan	2023-01-29 22:30:04 -08:00
stephenmk	0b328e1e07	Add support for undocumented frequency and information tags Custom dictionary files using the JMdict XML format may contain nonstandard frequency and information tags.	2023-01-29 22:34:13 -06:00
stephenmk	aab031972c	Simplify declaration of constants	2023-01-29 20:06:46 -06:00
stephenmk	8b4b899959	Hide new JMdict structured content features behind "extra" option Require `-language=english_extra` to produce the complete version of the new JMdict dictionary file. If and when we determine that all the new features are ready to be included in the dictionary by default, we can remove this logic.	2023-01-29 14:06:50 -06:00
stephenmk	abbe183145	Simplify logic for `index.json` struct	2023-01-28 18:39:08 -06:00
stephenmk	184dd45dbc	Use snake_case in filenames	2023-01-28 18:17:06 -06:00
stephenmk	517ef3d052	Fix bug in term score assignments This commit ensures that terms are grouped among their entries of origin and displayed in correct sequential order in Yomichan's default result grouping mode, "Group term-reading pairs."	2023-01-27 19:09:12 -06:00
stephenmk	7bd967915c	Add "forms" term in special circumstances If a headword appears in multiple entries, then each entry needs a corresponding "forms" term in the output dictionary. For example, 軽卒 is the only headword in entry 2275730, but 軽卒 also appears as an irregular form in entry 1252910. If a "forms" term is not included for the former entry, then it will appear that 軽卒 is irregular for all senses in the output dictionary.	2023-01-25 18:26:47 -06:00
stephenmk	406067eedd	Include entity tags in standalone forms dictionary	2023-01-24 13:02:50 -06:00
stephenmk	96358e3eb5	Fix function parameter Sense numbers start at 1, not 0	2023-01-24 08:55:24 -06:00
stephenmk	ef1e74447d	Include term tags and scores in standalone forms dictionary	2023-01-23 23:52:42 -06:00
stephenmk	d606f729cf	Use secondary frequency tags in term score calculation If a term has a frequency tag, it should return higher in search results than a match which does not have a tag. For example, a search for 素性 should return すじょう rather than そせい, because the former has a "news" frequency tag.	2023-01-23 14:13:22 -06:00
stephenmk	6726c5245b	Rename variables for consistency	2023-01-23 14:09:50 -06:00
stephenmk	d8a3b420ee	Exclude "search" and "forms" terms from non-English dictionaries This allows a user to install the English version and another version without cluttering their setup with duplicated information. If a user doesn't want to use the English version, they can get the "search" and "forms" terms by installing the separate jmdict_forms file.	2023-01-22 17:55:27 -06:00
stephenmk	8451803bfd	Update copyright	2023-01-22 15:00:13 -06:00
stephenmk	972dc6c4e9	Update dictionary build script	2023-01-22 14:40:39 -06:00
stephenmk	abc28bb19d	Add new JMdict version	2023-01-22 14:37:18 -06:00
stephenmk	73fb992865	Add intersection and union functions for string arrays	2023-01-22 14:32:45 -06:00
stephenmk	56f9895967	Add struct for handling index.json data	2023-01-22 14:27:02 -06:00
stephenmk	853d0b33dc	Use empty interface type for dictionary glossaries Necessary for structured content support	2023-01-22 14:14:33 -06:00
Alexei Yatskov	9222417bfd	Merge pull request #37 from toasted-nutbread/update-vs-rules Update how suru verb rules are detected	2022-08-20 11:52:32 -07:00
toasted-nutbread	77d5d2debd	Update how suru verb rules are detected	2022-08-14 15:35:20 -04:00
Alex Yatskov	2168659243	Fix import path	2022-08-07 09:38:50 -07:00
Alexei Yatskov	b5d6095c06	Merge pull request #36 from 0x766F6964/update_daijisen Update daijisen	2022-08-01 19:34:03 -07:00
Randy Palamar	5b8481e5bf	remove duplicate newlines in definitions this prevents entries from having empty lines which are particularly annoying when using the popup dictionary in yomichan	2022-07-28 20:38:02 -06:00
Randy Palamar	94326126d3	update the daijisen regexps this also fixes #5 the method used is a bit hacky but it works	2022-07-28 20:27:29 -06:00
Randy Palamar	8bc7ffdb36	add newlines to characters indicating sub-definitions this will cause some things to be displayed incorrectly but overall makes daijisen much more readable.	2022-07-28 20:25:35 -06:00
Randy Palamar	65df67b085	map most of daijisen the remaining glyphs don't exist in unicode usually because they are normally displayed using HTML or MathJax type things	2022-07-28 20:20:48 -06:00
Alexei Yatskov	57280ea5fd	Merge pull request #35 from univerio/shougakukan2 Add support for 小学館中日・日中統合辞書第2版 EPWING	2022-07-14 21:18:53 -07:00
Alex Yatskov	75207654d9	Update README	2022-07-14 14:24:32 -07:00
Alex Yatskov	1fdf4f2998	Switch to foosoft.net for packages	2022-07-03 20:59:33 -07:00
Jack Zhou	c918a6bb5d	Implement shougakukan2	2022-05-16 21:39:11 -07:00
Alex Yatskov	a4af996222	Merge pull request #31 from 0x766F6964/add_font_mappings finish mapping most of daijirin	2022-02-05 18:23:22 -08:00
Alex Yatskov	d61c1e0df6	Readme consistency	2022-02-05 18:22:07 -08:00
Alex Yatskov	6b3aaf3886	Update readme	2022-02-05 18:20:31 -08:00
Alex Yatskov	e16da37017	Update README	2021-12-15 18:06:35 -08:00
Alex Yatskov	e9849380ea	Add links	2021-12-14 20:32:29 -08:00
Alex Yatskov	fc7fd48748	Add site metadata	2021-12-14 20:27:16 -08:00
Randy Palamar	6224b4c21f	finish mapping most of daijirin Now you can search for totally useful everyday words like 瘟㾮日 and 多羅吒干𤚥 :^). The characters that remain either don't exist in unicode or are very difficult to find. Also a couple terms seem unsearchable in qolibri so I couldn't check what the characters are supposed to be. Any questionable choice was marked with FIXME. This will make it easy in the future to replace some characters with their images if it's something that we want to support. * The FIXMEs with the missing font symbol should all be the correct character (not commonly covered by fonts) * The くの字点 choices are to try and imitate the daijirin experience(TM). Probably the worst use of image fonts I've seen. Those characters should never appear in horizontal text. They should have just been replaced with the text that was supposed to be repeated. * The 漢文訓読 characters in '{}' are technically the unicode specified characters for those glyphs however they just look like their full size variants. I surrounded them with '{}' so the examples that use them are still readable. * The other FIXMEs should be self explanatory. Search the term in qolibri and look at what they used to see why they are questionable.	2021-06-17 07:56:14 -06:00

301 Commits