Utility¶
Modules
Create Word Database¶
-
util.create_DB.
createTable
()¶ Convert word information stored as text, such as meanings, into an sqlite3 database.
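A minimal sketch of such a conversion, assuming a text source with one tab-separated word and meaning per line. The line layout, the table name `words`, and the function name `create_table_sketch` are illustrative assumptions, not the project's actual schema.

```python
import sqlite3


def create_table_sketch(lines, db_path=":memory:"):
    """Load 'word<TAB>meaning' lines into an sqlite3 table (assumed layout)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY, meaning TEXT)"
    )
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if "\t" not in line:
            continue  # skip lines that do not match the assumed layout
        word, meaning = line.split("\t", 1)
        rows.append((word.strip().lower(), meaning.strip()))
    # INSERT OR REPLACE keeps the last meaning when a word appears twice
    conn.executemany("INSERT OR REPLACE INTO words VALUES (?, ?)", rows)
    conn.commit()
    return conn
```

Passing an open file object as `lines` works as well, since the function only iterates over it.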
-
util.create_DB.
makeDir
()¶ Check whether the target directory exists. If it does not, create it.
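The check-then-create behaviour can be sketched with the standard library. The function name `make_dir_sketch`, its `path` argument, and its return value are assumptions for illustration; the documented `makeDir` takes no arguments.

```python
import os


def make_dir_sketch(path):
    """Create path if it does not exist; return True if it was created."""
    if os.path.isdir(path):
        return False
    # exist_ok=True avoids an error if the directory appears between
    # the check above and this call
    os.makedirs(path, exist_ok=True)
    return True
```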
Dictionary¶
-
util.dictionary.
is_ascii
(keyword)¶ Check whether the word contains special characters (such as !@#$) or numbers.
- Parameters
keyword – Word to search
- Returns
True (the word contains only ASCII characters) / False (the word contains a special character or number)
-
util.dictionary.
simple_word_dict
(keyword)¶ Look up the meaning of a word in the database.
- Parameters
keyword – Word to search
- Returns
Meaning of the word. If the word is not in the database, returns null
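A lookup of this kind might look like the sketch below. The `words` table, its columns, and the name `lookup_sketch` are assumptions, and Python's `None` stands in for the null result mentioned above.

```python
import sqlite3


def lookup_sketch(conn, keyword):
    """Return the stored meaning of keyword, or None when it is absent."""
    row = conn.execute(
        "SELECT meaning FROM words WHERE word = ?", (keyword.lower(),)
    ).fetchone()
    return row[0] if row else None


# Throwaway in-memory database for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, meaning TEXT)")
conn.execute("INSERT INTO words VALUES (?, ?)", ("gaze", "a steady look"))
```

The query uses a bound parameter rather than string formatting, so a keyword containing quotes cannot alter the SQL statement.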
-
util.dictionary.
wikipedia_dict
(keyword)¶ Look up a word whose meaning cannot be found in the database.
- Parameters
keyword – Word to search
- Returns
Information related to the word
Gaze Analyze¶
-
util.gaze_analyze.
analyze
(word, duration, start, end, word_idx, sentences)¶ Take the gaze information collected with an eye tracker and process it for the call to calculate_impaction. If the impaction is larger than the threshold, the word, duration, start, and end lists are cleared; this means the user is having difficulty reading this picture.
- Parameters
word – Recently read word
duration – Fixation time of each word in word list
start – Starting word index in gaze information (e.g. Saccade)
end – Ending word index in gaze information (e.g. Saccade)
word_idx – Index of word
sentences – Index of sentence
- Returns
-
util.gaze_analyze.
calculate_impaction
(avg_fix, avg_sac, avg_reg)¶ Calculate the impaction from gaze information. There are three parameters (the weights of fixation, saccade, and regression). They are optimized heuristically, and the result is used to decide the difficulty of reading.
- Parameters
avg_fix – Average fixation time
avg_sac – Average saccade time
avg_reg – Average regression time
- Returns
Calculated gaze impaction
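One plausible reading of the three weights is a weighted sum of the averages, compared against a threshold by the caller. The sketch below assumes that form; the weight values, the threshold, and both function names are placeholders, not the project's tuned values.

```python
def impaction_sketch(avg_fix, avg_sac, avg_reg, weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three gaze averages (weights are placeholders)."""
    w_fix, w_sac, w_reg = weights
    return w_fix * avg_fix + w_sac * avg_sac + w_reg * avg_reg


def is_difficult(impaction, threshold=1.0):
    """Compare the impaction against a placeholder threshold."""
    return impaction > threshold
```

Keeping the weights as an argument makes heuristic tuning a matter of passing a different tuple rather than editing the function.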
Read Configuration¶
-
util.read_configuration.
configuration
()¶ Read configuration from a text file.
- Returns
Configurations
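The file format is not specified here. As an illustration only, the sketch below assumes `key = value` lines with `#` comments; the function name and the example keys are hypothetical.

```python
def parse_configuration_sketch(text):
    """Parse 'key = value' lines; blank lines and '#' comments are skipped."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip()
    return config
```

All values are returned as strings, so a caller would convert numeric settings explicitly.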
pdf2html¶
-
util.read_text.
is_ascii
(word)¶ Check whether the word contains any non-ASCII character.
- Parameters
word – Word to check for non-ASCII characters
- Returns
True (the word contains only ASCII characters) / False (the word contains a non-ASCII character)
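A check with this contract can be sketched by testing code points; `is_ascii_sketch` is an illustrative name, not the module's implementation.

```python
def is_ascii_sketch(word):
    """Return True when every character has a code point below 128."""
    return all(ord(ch) < 128 for ch in word)
```

On Python 3.7 and later, the built-in `str.isascii()` gives the same result.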
-
util.read_text.
mapping
(text)¶ Replace special characters in text.
- Parameters
text – String to process
- Returns
Processed string
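Which characters are replaced is not listed here. The sketch below assumes a small table of ligatures and typographic quotes that often appear in text extracted from PDFs; the table contents and the name `mapping_sketch` are hypothetical.

```python
# Hypothetical replacement table: ligatures and typographic quotes
REPLACEMENTS = str.maketrans({
    "\ufb01": "fi",  # ligature fi
    "\ufb02": "fl",  # ligature fl
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
})


def mapping_sketch(text):
    """Replace each character found in the table; leave the rest unchanged."""
    return text.translate(REPLACEMENTS)
```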
-
util.read_text.
pdf2html
(filename)¶ Convert a PDF file to an HTML file using the pdf2htmlEX program. pdf2htmlEX runs in Docker, and Python invokes it with a shell command.
- Parameters
filename – Filename of pdf to convert
- Returns
HTML document content
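The Docker invocation could be sketched as follows. The image name `example/pdf2htmlex` is a placeholder, and the sketch assumes the image exposes a `pdf2htmlEX` executable and that the output is written next to the input with an `.html` extension; none of these details are confirmed by this reference.

```python
import os
import subprocess


def build_pdf2html_command(filename, image="example/pdf2htmlex"):
    """Build the docker command line (image name is a placeholder)."""
    directory, name = os.path.split(os.path.abspath(filename))
    return [
        "docker", "run", "--rm",
        "-v", f"{directory}:/pdf",  # mount the PDF's directory in the container
        "-w", "/pdf",               # run inside the mounted directory
        image,
        "pdf2htmlEX", name,
    ]


def pdf2html_sketch(filename):
    """Run the conversion and return the generated HTML as a string."""
    subprocess.run(build_pdf2html_command(filename), check=True)
    html_path = os.path.splitext(os.path.abspath(filename))[0] + ".html"
    with open(html_path, encoding="utf-8") as handle:
        return handle.read()
```

Passing the command as a list, rather than a single shell string, avoids quoting problems with filenames that contain spaces.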
-
util.read_text.
spaning
(html, word_idx, sentence)¶ Span each word in the HTML: wrap the word in a “<span>” tag and mark its index in the span class. Sentences are spanned the same way: each sentence is wrapped in a “<span>” tag with its index in the span class. The word and sentence indices are used to identify hard words and sentences. This is done recursively until all words and sentences are spanned.
- Parameters
html – HTML data converted from PDF
word_idx – Counter of word index
sentence – Current sentence to tag
- Returns
None
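The tagging idea can be illustrated on a single plain-text sentence. This is a simplified, non-recursive sketch that returns the markup rather than modifying an HTML tree; the class names `word-N` and `sentence-N` and the function name are assumptions.

```python
import html
import re


def span_words_sketch(sentence, word_idx=0, sentence_idx=0):
    """Wrap each word and the whole sentence in <span> tags.

    Returns the markup and the next unused word index.
    """
    parts = []
    # The capturing group keeps the whitespace runs in the split result
    for token in re.split(r"(\s+)", sentence):
        if token and not token.isspace():
            parts.append(
                f'<span class="word-{word_idx}">{html.escape(token)}</span>'
            )
            word_idx += 1
        else:
            parts.append(token)
    body = "".join(parts)
    return f'<span class="sentence-{sentence_idx}">{body}</span>', word_idx
```

Returning the next word index lets a caller continue numbering across consecutive sentences.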
-
util.read_text.
words2sentence
(sentence)¶ Deprecated
- Parameters
sentence – Sentence to convert
- Returns
List of words