Utility

Modules


Create Word Database

util.create_DB.createTable()

Convert word information stored as text, such as meanings, into a sqlite3 database
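
The conversion could look roughly like the following. This is a minimal sketch, not the module's actual code: the function name create_table_sketch, the tab-separated "word, meaning" input layout, and the table and column names are all assumptions made for illustration.

```python
import sqlite3

def create_table_sketch(lines, db_path=":memory:"):
    """Hypothetical stand-in for createTable(): load 'word<TAB>meaning' lines into SQLite."""
    conn = sqlite3.connect(db_path)
    # Assumed schema: one row per word, keyed by the word itself.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY, meaning TEXT)"
    )
    rows = []
    for line in lines:
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        word, meaning = line.split("\t", 1)
        rows.append((word.strip().lower(), meaning.strip()))
    conn.executemany(
        "INSERT OR REPLACE INTO words (word, meaning) VALUES (?, ?)", rows
    )
    conn.commit()
    return conn

# Usage with in-memory data instead of a real text file.
conn = create_table_sketch(["apple\ta round fruit", "", "gaze\ta steady look"])
stored = conn.execute("SELECT COUNT(*) FROM words").fetchone()[0]
```

Passing an in-memory database keeps the sketch free of side effects; a real run would pass a file path instead.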

util.create_DB.makeDir()

Check whether the target directory exists. If it does not exist, create it
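
A check-then-create step like this is commonly written with os.makedirs. The sketch below assumes a hypothetical target_dir argument (the documented makeDir() takes none) and returns whether the directory was newly created, which is an addition for illustration only.

```python
import os
import tempfile

def make_dir_sketch(target_dir):
    """Create target_dir if it is missing; return True only if this call created it."""
    if os.path.isdir(target_dir):
        return False
    # exist_ok=True avoids an error if another process creates it in between.
    os.makedirs(target_dir, exist_ok=True)
    return True

# Usage inside a throwaway temporary directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "database")
first_call = make_dir_sketch(target)
second_call = make_dir_sketch(target)
```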

Dictionary

util.dictionary.is_ascii(keyword)

Check whether the word contains special characters (such as !@#$) or numbers

Parameters

keyword – Word to search

Returns

True (word contains only ASCII characters) / False (word contains a special character or number)
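
Since characters such as !@#$ and digits are themselves ASCII, the description suggests that only plain ASCII letters are accepted. The sketch below follows that reading, which is an assumption about the actual behavior; the name is_plain_word is hypothetical.

```python
import string

ASCII_LETTERS = set(string.ascii_letters)

def is_plain_word(keyword):
    """Return True only if keyword is non-empty and made of ASCII letters a-z / A-Z."""
    return bool(keyword) and all(ch in ASCII_LETTERS for ch in keyword)

# Usage: ordinary words pass; digits, punctuation, and accented letters do not.
results = {w: is_plain_word(w) for w in ["reading", "word2", "e-mail!", "café"]}
```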

util.dictionary.simple_word_dict(keyword)

Search the database for the meaning of a word

Parameters

keyword – Word to search

Returns

Meaning of the word. If the word is not in the database, return null
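
A lookup of this kind can be sketched with a parameterized sqlite3 query. The table name words, its columns, the explicit conn argument, and the use of Python's None for the "null" result are assumptions for illustration; the real function takes only keyword.

```python
import sqlite3

def lookup_meaning(conn, keyword):
    """Return the stored meaning for keyword, or None if the word is absent."""
    row = conn.execute(
        "SELECT meaning FROM words WHERE word = ?", (keyword.lower(),)
    ).fetchone()
    return row[0] if row else None

# Usage against a small in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (word TEXT PRIMARY KEY, meaning TEXT)")
conn.execute("INSERT INTO words VALUES (?, ?)", ("saccade", "a rapid eye movement"))
found = lookup_meaning(conn, "Saccade")
missing = lookup_meaning(conn, "fixation")
```

The placeholder (?) keeps the keyword out of the SQL string, so unusual input cannot alter the query.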

util.dictionary.wikipedia_dict(keyword)

Search for the meaning of a word whose meaning cannot be found in the database

Parameters

keyword – Word to search

Returns

Information related to the word

Gaze Analyze

util.gaze_analyze.analyze(word, duration, start, end, word_idx, sentences)

Get the gaze information collected with an eye tracker and process it in order to call the calculate_impaction function. If the impaction is larger than the threshold, clear the word, duration, start, and end lists. This means the user finds this picture difficult to read.

Parameters
  • word – Recently read word

  • duration – Fixation time of each word in word list

  • start – Starting word index in gaze information (e.g. Saccade)

  • end – Ending word index in gaze information (e.g. Saccade)

  • word_idx – Index of word

  • sentences – Index of sentence

Returns

util.gaze_analyze.calculate_impaction(avg_fix, avg_sac, avg_reg)

Calculate the impaction from gaze information. There are three parameters (the weights of fixation, saccade, and regression). They are optimized heuristically, and the impaction is used to decide the difficulty of reading

Parameters
  • avg_fix – Average fixation time

  • avg_sac – Average saccade time

  • avg_reg – Average regression time

Returns

Calculated gaze impaction
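
One plausible form for such a score is a weighted sum of the three averages, compared against a threshold as analyze does. The sketch below assumes that form; the weight values, the threshold, and every name ending in _sketch are hypothetical and not taken from the module.

```python
# Hypothetical weights for fixation, saccade, and regression.
WEIGHTS = (0.5, 0.2, 0.3)
THRESHOLD = 120.0  # hypothetical difficulty threshold

def calculate_impaction_sketch(avg_fix, avg_sac, avg_reg, weights=WEIGHTS):
    """Weighted sum of average fixation, saccade, and regression times."""
    w_fix, w_sac, w_reg = weights
    return w_fix * avg_fix + w_sac * avg_sac + w_reg * avg_reg

def is_difficult(avg_fix, avg_sac, avg_reg, threshold=THRESHOLD):
    """True when the impaction exceeds the threshold."""
    return calculate_impaction_sketch(avg_fix, avg_sac, avg_reg) > threshold

# Usage: when the passage is judged difficult, clear the collected lists.
words, durations = ["gaze", "saccade"], [310, 95]
score = calculate_impaction_sketch(200.0, 50.0, 100.0)
if is_difficult(200.0, 50.0, 100.0):
    words.clear()
    durations.clear()
```

Keeping the weights in one tuple makes it easy to adjust them by trial, which matches the heuristic tuning described above.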

Read Configuration

util.read_configuration.configuration()

Read the configuration from a text file

Returns

Configurations

pdf2html

util.read_text.is_ascii(word)

Check whether the word contains any non-ASCII character

Parameters

word – Word to check for non-ASCII characters

Returns

True (word contains only ASCII characters) / False (word contains a non-ASCII character)
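
A check like this can be written by testing each character's code point against the ASCII range (0 to 127). This is a sketch of the idea under the hypothetical name is_ascii_sketch, not necessarily how the module implements it.

```python
def is_ascii_sketch(word):
    """Return True if every character in word has a code point below 128."""
    return all(ord(ch) < 128 for ch in word)

# Usage: punctuation and digits are ASCII; accented letters and ligatures are not.
checks = [is_ascii_sketch(w) for w in ["reading", "word-2!", "naïve", "de\ufb01ne"]]
```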

util.read_text.mapping(text)

Replace special characters in the text.

Parameters

text – String to process

Returns

Processed string
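
Which characters are replaced is not stated here. As an illustration only, the sketch below assumes typical PDF-extraction characters (curly quotes, an en dash, the "fi" ligature) and maps them to plain ASCII with str.translate; the table contents and the name mapping_sketch are hypothetical.

```python
# Hypothetical replacement table: typographic characters to ASCII equivalents.
REPLACEMENTS = str.maketrans({
    "\u2018": "'",   # left single quote
    "\u2019": "'",   # right single quote
    "\u201c": '"',   # left double quote
    "\u201d": '"',   # right double quote
    "\u2013": "-",   # en dash
    "\ufb01": "fi",  # "fi" ligature
})

def mapping_sketch(text):
    """Return text with the characters in REPLACEMENTS substituted."""
    return text.translate(REPLACEMENTS)

# Usage on a string containing curly quotes and a ligature.
cleaned = mapping_sketch("\u201cde\ufb01ne\u201d it\u2019s 1\u20132")
```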

util.read_text.pdf2html(filename)

Convert a PDF file to an HTML file using the pdf2htmlEX program. pdf2htmlEX runs in Docker, and Python invokes it with a shell command

Parameters

filename – Filename of the PDF to convert

Returns

HTML document content
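
The Docker invocation might be assembled as in the sketch below. The image name is a placeholder, and the mount point /pdf, the assumption that the image exposes a pdf2htmlEX executable, and the assumption that the output lands next to the input as <name>.html are all unverified here. Only the command construction is exercised; pdf2html_sketch needs Docker and a suitable image to actually run.

```python
import os
import subprocess

IMAGE = "example/pdf2htmlex"  # hypothetical image name; substitute the one you use

def build_command(filename, image=IMAGE):
    """Build a 'docker run' argument list that converts filename inside a container."""
    pdf_path = os.path.abspath(filename)
    workdir, name = os.path.split(pdf_path)
    return [
        "docker", "run", "--rm",
        "-v", f"{workdir}:/pdf",  # mount the PDF's directory into the container
        "-w", "/pdf",             # run the converter from that directory
        image, "pdf2htmlEX", name,
    ]

def pdf2html_sketch(filename):
    """Run the conversion and return the HTML text (requires Docker)."""
    subprocess.run(build_command(filename), check=True)
    html_path = os.path.splitext(os.path.abspath(filename))[0] + ".html"
    with open(html_path, encoding="utf-8") as fh:
        return fh.read()

# Usage: inspect the command without running Docker.
cmd = build_command("paper.pdf")
```

Passing the command as a list rather than a single shell string avoids quoting problems with filenames that contain spaces.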

util.read_text.spaning(html, word_idx, sentence)

Span the words in the HTML: wrap each word in a “<span>” tag and mark its index in the span class. Also span the sentences in the HTML: wrap each sentence in a “<span>” tag and mark its index in the span class. The word index and sentence index are used to identify hard words and sentences. This is done recursively until every word and sentence is spanned

Parameters
  • html – HTML data converted from PDF

  • word_idx – Counter of word index

  • sentence – Current sentence to tag

Returns

None
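
The tagging itself can be illustrated on a single plain-text sentence. This is a simplified sketch: it does not walk an HTML tree or recurse as the real function does, it returns its result instead of None, and the class naming scheme (w<index>, s<index>) and the name span_sentence are assumptions.

```python
import html

def span_sentence(sentence, sentence_idx, word_idx):
    """Wrap each word, then the whole sentence, in <span> tags carrying an index.

    Returns the marked-up string and the next unused word index.
    """
    spans = []
    for word in sentence.split():
        # Escape the word so characters like '<' cannot break the markup.
        spans.append(f'<span class="word w{word_idx}">{html.escape(word)}</span>')
        word_idx += 1
    body = " ".join(spans)
    return f'<span class="sentence s{sentence_idx}">{body}</span>', word_idx

# Usage: tag one sentence, continuing the word counter from 10.
marked, next_idx = span_sentence("Eyes move fast", 0, 10)
```

Returning the updated counter lets the caller keep word indices unique across sentences.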

util.read_text.words2sentence(sentence)

Deprecated

Parameters

sentence – Sentence to convert

Returns

List of words