304 Commits

Author SHA1 Message Date
d4ec828e24 Update paths 2023-12-30 20:43:50 -08:00
840fb7ac4f Update README 2023-12-30 14:31:18 -08:00
00dc44386e Update maintenance info 2023-02-25 12:43:03 -08:00
Alexei Yatskov
f4da17e228
Merge pull request #41 from stephenmk/master
New version of JMnedict (the proper name dictionary)
2023-02-05 09:57:17 -08:00
stephenmk
ecf22da5a3
Improve readability of publication date functions 2023-02-04 01:42:08 -06:00
stephenmk
a9d85dc720
Simplify string -> runes conversion 2023-02-03 22:07:41 -06:00
stephenmk
70611a51c4
Fix typo 2023-02-03 15:51:52 -06:00
stephenmk
dffbec6337
Designate more JMnedict category tags 2023-02-02 20:15:28 -06:00
stephenmk
5755b79341
Use cached part-of-speech values 2023-02-02 15:50:57 -06:00
stephenmk
7bff70b71c
JMdict: Ensure part-of-speech info is added in non-English versions
Only English-language senses in JMdict contain part-of-speech tags.
This info is displayed to users in definition tags and also used
for deinflecting verbs and adjectives during term lookups.

The old version of Yomichan-Import took the PoS tags from the final
sense in the English version of an entry and applied them to every
sense of every other language. For example, 川・かわ has two senses in
English JMdict: a noun sense and a suffix sense. Therefore every sense
of 川・かわ in every other language was tagged as a suffix.

Instead, I suggest gathering all distinct PoS tags from each English
entry and applying them all to each non-English sense. Every
non-English sense of 川・かわ will therefore be tagged as both a noun
and suffix.
2023-02-02 10:44:16 -06:00
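The approach described in the commit above can be illustrated with a minimal Go sketch. The `sense` type, its fields, and both function names are hypothetical and are not taken from the Yomichan-Import source; the tags "n" and "suf" stand in for the noun and suffix tags of 川・かわ.

```go
package main

import "fmt"

// sense is a hypothetical, simplified stand-in for a JMdict sense.
type sense struct {
	Language      string   // e.g. "eng" or "ger"
	PartsOfSpeech []string // assumed to be populated only for English senses
}

// distinctPartsOfSpeech gathers every PoS tag found on the English senses,
// keeping first-seen order and dropping duplicates.
func distinctPartsOfSpeech(senses []sense) []string {
	seen := make(map[string]bool)
	var tags []string
	for _, s := range senses {
		if s.Language != "eng" {
			continue
		}
		for _, pos := range s.PartsOfSpeech {
			if !seen[pos] {
				seen[pos] = true
				tags = append(tags, pos)
			}
		}
	}
	return tags
}

// applyPartsOfSpeech returns a copy of senses in which every non-English
// sense carries all distinct English PoS tags of the entry.
func applyPartsOfSpeech(senses []sense) []sense {
	tags := distinctPartsOfSpeech(senses)
	out := make([]sense, len(senses))
	for i, s := range senses {
		if s.Language != "eng" {
			s.PartsOfSpeech = append([]string(nil), tags...)
		}
		out[i] = s
	}
	return out
}

func main() {
	entry := []sense{
		{Language: "eng", PartsOfSpeech: []string{"n"}},
		{Language: "eng", PartsOfSpeech: []string{"suf"}},
		{Language: "ger"},
	}
	for _, s := range applyPartsOfSpeech(entry) {
		fmt.Println(s.Language, s.PartsOfSpeech)
	}
}
```

With this sketch the non-English sense ends up tagged as both noun and suffix, rather than inheriting only the tags of the final English sense.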
stephenmk
19d6d0bb43
Rename some jmdict functions 2023-02-01 19:14:37 -06:00
stephenmk
3b420f8b6c
Use library implementation of Contains function 2023-02-01 18:57:35 -06:00
stephenmk
8281301869
New JMnedict version 2023-02-01 18:55:03 -06:00
stephenmk
b826dbf264
Add verification logic for date entry in JMdict
Very old versions of JMdict and unofficial versions are unlikely to
have the publication date entry at the end of the file.
2023-01-30 13:26:26 -06:00
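A check of this kind might look like the following Go sketch. It assumes the date appears as a `YYYY-MM-DD` string somewhere in the text of the final entry; the function name `publicationDate` and the sample strings are hypothetical, and the actual verification logic in the commit may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// datePattern matches an ISO-style date such as 2023-01-29.
var datePattern = regexp.MustCompile(`\d{4}-\d{2}-\d{2}`)

// publicationDate tries to extract a valid date from the given text.
// It reports false when no date is present or the match is not a real
// calendar date, so the caller can fall back to another value.
func publicationDate(text string) (time.Time, bool) {
	match := datePattern.FindString(text)
	if match == "" {
		return time.Time{}, false
	}
	date, err := time.Parse("2006-01-02", match)
	if err != nil {
		return time.Time{}, false
	}
	return date, true
}

func main() {
	// Sample inputs are illustrative only.
	for _, text := range []string{"Creation Date: 2023-01-29", "river"} {
		if date, ok := publicationDate(text); ok {
			fmt.Println("publication date:", date.Format("2006-01-02"))
		} else {
			fmt.Println("no publication date found; a fallback is needed")
		}
	}
}
```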
Alexei Yatskov
74de4ce9e5
Merge pull request #40 from stephenmk/master
New version of JMdict for Yomichan
2023-01-29 22:30:04 -08:00
stephenmk
0b328e1e07
Add support for undocumented frequency and information tags
Custom dictionary files using the JMdict XML format may contain
nonstandard frequency and information tags.
2023-01-29 22:34:13 -06:00
stephenmk
aab031972c
Simplify declaration of constants 2023-01-29 20:06:46 -06:00
stephenmk
8b4b899959
Hide new JMdict structured content features behind "extra" option
Require `-language=english_extra` to produce the complete version of
the new JMdict dictionary file.

If and when we determine that all the new features are ready to be
included in the dictionary by default, we can remove this logic.
2023-01-29 14:06:50 -06:00
stephenmk
abbe183145
Simplify logic for index.json struct 2023-01-28 18:39:08 -06:00
stephenmk
184dd45dbc
Use snake_case in filenames 2023-01-28 18:17:06 -06:00
stephenmk
517ef3d052
Fix bug in term score assignments
This commit ensures that terms are grouped among their entries of
origin and displayed in correct sequential order in Yomichan's default
result grouping mode, "Group term-reading pairs."
2023-01-27 19:09:12 -06:00
stephenmk
7bd967915c
Add "forms" term in special circumstances
If a headword appears in multiple entries, then each entry needs a
corresponding "forms" term in the output dictionary.

For example, 軽卒 is the only headword in entry 2275730, but 軽卒 also
appears as an irregular form in entry 1252910. If a "forms" term is
not included for the former entry, then it will appear that 軽卒 is
irregular for all senses in the output dictionary.
2023-01-25 18:26:47 -06:00
stephenmk
406067eedd
Include entity tags in standalone forms dictionary 2023-01-24 13:02:50 -06:00
stephenmk
96358e3eb5
Fix function parameter
Sense numbers start at 1, not 0
2023-01-24 08:55:24 -06:00
stephenmk
ef1e74447d
Include term tags and scores in standalone forms dictionary 2023-01-23 23:52:42 -06:00
stephenmk
d606f729cf
Use secondary frequency tags in term score calculation
If a term has a frequency tag, it should return higher in search
results than a match which does not have a tag.

For example, a search for 素性 should return すじょう rather than
そせい, because the former has a "news" frequency tag.
2023-01-23 14:13:22 -06:00
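The ranking rule stated in the commit above can be sketched in Go as follows. This is only an illustration of the principle that tagged terms sort ahead of untagged ones; the `term` type, the `score` and `rank` functions, and the score values are assumptions, not the actual score calculation.

```go
package main

import (
	"fmt"
	"sort"
)

// term is a hypothetical, simplified dictionary term.
type term struct {
	Expression    string
	Reading       string
	FrequencyTags []string
}

// score gives a term with any frequency tag a higher score than a term
// with none. The real calculation is presumably more fine-grained.
func score(t term) int {
	if len(t.FrequencyTags) > 0 {
		return 1
	}
	return 0
}

// rank returns a copy of terms sorted by descending score. The stable
// sort keeps the original order among terms with equal scores.
func rank(terms []term) []term {
	ranked := append([]term(nil), terms...)
	sort.SliceStable(ranked, func(i, j int) bool {
		return score(ranked[i]) > score(ranked[j])
	})
	return ranked
}

func main() {
	results := rank([]term{
		{Expression: "素性", Reading: "そせい"},
		{Expression: "素性", Reading: "すじょう", FrequencyTags: []string{"news"}},
	})
	for _, t := range results {
		fmt.Println(t.Expression, t.Reading, score(t))
	}
}
```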
stephenmk
6726c5245b
Rename variables for consistency 2023-01-23 14:09:50 -06:00
stephenmk
d8a3b420ee
Exclude "search" and "forms" terms from non-English dictionaries
This allows a user to install the English version and another version
without cluttering their setup with duplicated information.

If a user doesn't want to use the English version, they can get the
"search" and "forms" terms by installing the separate jmdict_forms
file.
2023-01-22 17:55:27 -06:00
stephenmk
8451803bfd
Update copyright 2023-01-22 15:00:13 -06:00
stephenmk
972dc6c4e9
Update dictionary build script 2023-01-22 14:40:39 -06:00
stephenmk
abc28bb19d
Add new JMdict version 2023-01-22 14:37:18 -06:00
stephenmk
73fb992865
Add intersection and union functions for string arrays 2023-01-22 14:32:45 -06:00
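For reference, set-style helpers for string slices could be written roughly as below. The names `union` and `intersection` and the order-preserving, duplicate-free behavior are assumptions made for this sketch; the functions added in the commit may be defined differently.

```go
package main

import "fmt"

// union returns the distinct strings that appear in a or b,
// in first-seen order.
func union(a, b []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, list := range [][]string{a, b} {
		for _, s := range list {
			if !seen[s] {
				seen[s] = true
				out = append(out, s)
			}
		}
	}
	return out
}

// intersection returns the distinct strings that appear in both a and b,
// in the order they occur in a.
func intersection(a, b []string) []string {
	inB := make(map[string]bool, len(b))
	for _, s := range b {
		inB[s] = true
	}
	added := make(map[string]bool)
	var out []string
	for _, s := range a {
		if inB[s] && !added[s] {
			added[s] = true
			out = append(out, s)
		}
	}
	return out
}

func main() {
	a := []string{"n", "suf", "n"}
	b := []string{"suf", "adj"}
	fmt.Println(union(a, b))
	fmt.Println(intersection(a, b))
}
```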
stephenmk
56f9895967
Add struct for handling index.json data 2023-01-22 14:27:02 -06:00
stephenmk
853d0b33dc
Use empty interface type for dictionary glossaries
Necessary for structured content support
2023-01-22 14:14:33 -06:00
Alexei Yatskov
9222417bfd
Merge pull request #37 from toasted-nutbread/update-vs-rules
Update how suru verb rules are detected
2022-08-20 11:52:32 -07:00
toasted-nutbread
77d5d2debd Update how suru verb rules are detected 2022-08-14 15:35:20 -04:00
2168659243 Fix import path 2022-08-07 09:38:50 -07:00
Alexei Yatskov
b5d6095c06
Merge pull request #36 from 0x766F6964/update_daijisen
Update daijisen
2022-08-01 19:34:03 -07:00
Randy Palamar
5b8481e5bf remove duplicate newlines in definitions
this prevents entries from having empty lines, which are particularly
annoying when using the popup dictionary in yomichan
2022-07-28 20:38:02 -06:00
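One plausible way to remove duplicate newlines in Go is a regular expression that collapses any run of newlines into one. The function name `collapseNewlines` is hypothetical, and the commit itself may use a different technique.

```go
package main

import (
	"fmt"
	"regexp"
)

// repeatedNewlines matches two or more consecutive newline characters.
var repeatedNewlines = regexp.MustCompile(`\n{2,}`)

// collapseNewlines replaces every run of consecutive newlines with a
// single newline, so a definition contains no empty lines.
func collapseNewlines(definition string) string {
	return repeatedNewlines.ReplaceAllString(definition, "\n")
}

func main() {
	fmt.Printf("%q\n", collapseNewlines("first line\n\n\nsecond line\nthird line"))
}
```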
Randy Palamar
94326126d3 update the daijisen regexps
this also fixes #5

the method used is a bit hacky but it works
2022-07-28 20:27:29 -06:00
Randy Palamar
8bc7ffdb36 add newlines to characters indicating sub-definitions
this will cause some things to be displayed incorrectly but overall
makes daijisen much more readable.
2022-07-28 20:25:35 -06:00
Randy Palamar
65df67b085 map most of daijisen
the remaining glyphs don't exist in unicode usually because they are
normally displayed using HTML or MathJax type things
2022-07-28 20:20:48 -06:00
Alexei Yatskov
57280ea5fd
Merge pull request #35 from univerio/shougakukan2
Add support for 小学館 中日・日中 統合辞書 第2版 EPWING
2022-07-14 21:18:53 -07:00
75207654d9 Update README 2022-07-14 14:24:32 -07:00
1fdf4f2998 Switch to foosoft.net for packages 2022-07-03 20:59:33 -07:00
Jack Zhou
c918a6bb5d Implement shougakukan2 2022-05-16 21:39:11 -07:00
Alex Yatskov
a4af996222
Merge pull request #31 from 0x766F6964/add_font_mappings
finish mapping most of daijirin
2022-02-05 18:23:22 -08:00
d61c1e0df6 Readme consistency 2022-02-05 18:22:07 -08:00
6b3aaf3886 Update readme 2022-02-05 18:20:31 -08:00
e16da37017 Update README 2021-12-15 18:06:35 -08:00