Utility¶

Modules

Create Word Database¶

util.create_DB.createTable()¶: Convert word information stored as text, such as meanings, into an sqlite3 database.
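As a rough illustration of the text-to-sqlite3 conversion, here is a minimal sketch. The table name `words`, its two columns, the tab-separated input format, and the helper name `create_table_sketch` are all assumptions made for this example; the actual schema and input format used by createTable() are not documented here.

```python
import sqlite3


def create_table_sketch(entries, db_path=":memory:"):
    """Store (word, meaning) pairs in a SQLite table and return the connection.

    Hypothetical helper: the table and column names are illustrative only.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY, meaning TEXT)"
    )
    # Parameterized inserts avoid quoting problems in meanings.
    conn.executemany(
        "INSERT OR REPLACE INTO words (word, meaning) VALUES (?, ?)", entries
    )
    conn.commit()
    return conn


# Assumed input format: one "word<TAB>meaning" pair per line.
lines = ["apple\ta round fruit", "gaze\tto look steadily"]
entries = [tuple(line.split("\t", 1)) for line in lines]

conn = create_table_sketch(entries)
row = conn.execute(
    "SELECT meaning FROM words WHERE word = ?", ("gaze",)
).fetchone()
count = conn.execute("SELECT COUNT(*) FROM words").fetchone()[0]
```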

util.create_DB.makeDir()¶: Check whether the target directory exists. If it does not, create it.
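The check-then-create behavior can be sketched as follows. The helper name `make_dir_sketch`, its `path` argument, and its boolean return value are assumptions for illustration; makeDir() itself takes no arguments according to the signature above.

```python
import os
import tempfile


def make_dir_sketch(path):
    """Create `path` if it does not exist; return True if it was created."""
    if os.path.isdir(path):
        return False
    os.makedirs(path)
    return True


# Try it inside a throwaway temporary directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "database")

first_call = make_dir_sketch(target)   # directory is created
second_call = make_dir_sketch(target)  # directory already exists
```

In new code, `os.makedirs(path, exist_ok=True)` does the same in one call when the return value is not needed.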

Dictionary¶

util.dictionary.is_ascii(keyword)¶

Check whether the word contains special characters such as !@#$ or numbers.

Parameters: keyword – Word to search
Returns: True (the word contains only ASCII characters) / False (the word contains a special character or number)

util.dictionary.simple_word_dict(keyword)¶

Search for the meaning of a word in the database.

Parameters: keyword – Word to search
Returns: Meaning of the word. If the word is not in the database, returns null.
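A database lookup of this kind might look like the sketch below. It assumes a hypothetical `words` table with `word` and `meaning` columns and passes the connection explicitly; the real simple_word_dict() takes only the keyword, and its schema is not documented here.

```python
import sqlite3


def lookup_sketch(conn, keyword):
    """Return the stored meaning of `keyword`, or None if it is absent."""
    row = conn.execute(
        "SELECT meaning FROM words WHERE word = ?", (keyword,)
    ).fetchone()
    return row[0] if row else None


# Build a tiny in-memory database to query (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, meaning TEXT)")
conn.execute("INSERT INTO words VALUES (?, ?)", ("saccade", "a rapid eye movement"))

found = lookup_sketch(conn, "saccade")
missing = lookup_sketch(conn, "unknownword")
```

A caller could fall back to wikipedia_dict(keyword) when the lookup comes back empty.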

util.dictionary.wikipedia_dict(keyword)¶

Search for the meaning of a word whose meaning cannot be found in the database.

Parameters: keyword – Word to search
Returns: Information related to the word

Gaze Analyze¶

util.gaze_analyze.analyze(word, duration, start, end, word_idx, sentences)¶

Receive the gaze information collected by the eye tracker and process it in order to call the calculate_impaction function. If the impaction is larger than the threshold, clear the word, duration, start, and end lists. This means the user finds this picture difficult to read.

Parameters

word – Recently read word
duration – Fixation time of each word in word list
start – Starting word index in gaze information (e.g. Saccade)
end – Ending word index in gaze information (e.g. Saccade)
word_idx – Index of word
sentences – Index of sentence

Returns

util.gaze_analyze.calculate_impaction(avg_fix, avg_sac, avg_reg)¶

Calculate the impaction from gaze information. There are three parameters (the weights of fixation, saccade, and regression). They are tuned heuristically, and the result is used to decide reading difficulty.

Parameters

avg_fix – Average fixation time
avg_sac – Average saccade time
avg_reg – Average regression time

Returns

Calculated gaze impaction
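The exact formula is not given here. One plausible reading of "three weights" is a weighted sum of the three averages, sketched below. The weighted-sum form, the weight values, the threshold, and the name `impaction_sketch` are all assumptions for illustration, not the library's actual values.

```python
# Hypothetical weights for fixation, saccade, and regression.
W_FIX, W_SAC, W_REG = 0.5, 0.2, 0.3

# Hypothetical threshold above which reading is treated as difficult.
THRESHOLD = 200.0


def impaction_sketch(avg_fix, avg_sac, avg_reg,
                     weights=(W_FIX, W_SAC, W_REG)):
    """Combine the three gaze averages into one score via a weighted sum."""
    w_fix, w_sac, w_reg = weights
    return w_fix * avg_fix + w_sac * avg_sac + w_reg * avg_reg


score = impaction_sketch(avg_fix=300.0, avg_sac=100.0, avg_reg=200.0)
is_difficult = score > THRESHOLD
```

With a linear form like this, tuning heuristically amounts to adjusting the three weights and the threshold until the score separates easy from difficult passages.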

Read Configuration¶

util.read_configuration.configuration()¶

Read the configuration from a text file.

Returns: Configurations

pdf2html¶

util.read_text.is_ascii(word)¶

Check whether the word contains any non-ASCII character.

Parameters: word – Word to check for non-ASCII characters
Returns: True (only ASCII characters are included) / False (a non-ASCII character is included)
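A check with this contract can be sketched by testing every code point against the ASCII range. The name `is_ascii_sketch` is hypothetical, and this is not necessarily how util.read_text.is_ascii is implemented.

```python
def is_ascii_sketch(word):
    """Return True if every character of `word` is in the ASCII range."""
    return all(ord(ch) < 128 for ch in word)


plain = is_ascii_sketch("reading")
accented = is_ascii_sketch("café")
```

On Python 3.7 and later, the built-in `str.isascii()` gives the same answer.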

util.read_text.mapping(text)¶

Replace special characters in the text.

Parameters: text – String to process
Returns: Processed string
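Which characters mapping() replaces is not listed here. As an illustration only, the sketch below replaces a few characters that commonly appear in text extracted from PDFs (curly quotes and the "fi" ligature); the table contents and the name `mapping_sketch` are assumptions.

```python
# Hypothetical replacement table; the real one is not documented here.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\ufb01": "fi",  # "fi" ligature
})


def mapping_sketch(text):
    """Replace each character listed in REPLACEMENTS; leave the rest as-is."""
    return text.translate(REPLACEMENTS)


cleaned = mapping_sketch("\u201cde\ufb01ne\u201d")
```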

util.read_text.pdf2html(filename)¶

Convert a PDF file to an HTML file using the pdf2htmlEX program. pdf2htmlEX runs in Docker, and Python invokes it through a shell command.

Parameters: filename – Filename of pdf to convert
Returns: HTML document content

util.read_text.spaning(html, word_idx, sentence)¶

Wrap each word in the HTML in a “<span>” tag and mark its index in the span class. Likewise, wrap each sentence in a “<span>” tag and mark its index in the span class. The word and sentence indexes are used to identify difficult words and sentences. The function recurses until every word and sentence is wrapped.

Parameters

html – HTML data converted from PDF
word_idx – Counter of word index
sentence – Current sentence to tag

Returns

None
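The word-wrapping step can be illustrated on a plain string. This sketch does not walk an HTML tree or recurse the way spaning() is described as doing; it only shows how a running index can be written into each span's class. The class format `word-N` and the name `span_words_sketch` are assumptions.

```python
import re


def span_words_sketch(text, start_idx=0):
    """Wrap each whitespace-separated word in a <span> tagged with its index.

    Returns the tagged text and the next unused word index.
    """
    idx = start_idx

    def wrap(match):
        nonlocal idx
        tagged = f'<span class="word-{idx}">{match.group(0)}</span>'
        idx += 1
        return tagged

    return re.sub(r"\S+", wrap, text), idx


tagged_text, next_idx = span_words_sketch("read this")
```

Returning the next index lets a caller continue numbering across several text nodes, which matches the role of the word_idx counter described above.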

util.read_text.words2sentence(sentence)¶

Deprecated

Parameters: sentence – Sentence to convert
Returns: List of words