Day098 — Extract data by pattern
Sep 3, 2019
Here are some things I learned while working on a Python project that extracts data from a Word file.
load the Word file and loop over its paragraphs:
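from docx import Document  # provided by the python-docx package

doc = Document('./resources/the_word_file.docx')
for p in doc.paragraphs:
    a_line = p.text  # plain text of the current paragraph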
split a string on any run of whitespace into a list:
# with no argument, .split() treats any run of whitespace as one separator
my_string.split()
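To see why no parameter is needed, here is a small comparison on a made-up string: calling `.split()` with no argument versus passing a single space `' '` as the separator.

```python
my_string = "  alpha   beta\tgamma\n delta  "

# No argument: any run of spaces, tabs or newlines counts as one separator,
# and leading/trailing whitespace yields no empty strings.
words = my_string.split()
print(words)   # ['alpha', 'beta', 'gamma', 'delta']

# An explicit ' ' separator splits on every single space instead, so
# repeated spaces give empty strings and tabs/newlines stay in the pieces.
pieces = my_string.split(' ')
print(pieces)  # ['', '', 'alpha', '', '', 'beta\tgamma\n', 'delta', '', '']
```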
extract the text between a prefix pattern and a suffix pattern:
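import re
# findall returns every captured group as a list
re.findall(r'pattern_prefix([^(]*)pattern_suffix', my_string)
# or: search returns the first match object (None if there is no match)
m = re.search(r'pattern_prefix(.+?)pattern_suffix', my_string)
if m:
    found = m.group(1)
# or: lookarounds leave the prefix and suffix out of the match (AttributeError if nothing matches)
re.search(r"(?<=pattern_prefix).*?(?=pattern_suffix)", my_string).group(0)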
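Putting these together, here is a small sketch built on `re.findall`. `extract_between` is a hypothetical helper, not part of any library. It runs the prefix and suffix through `re.escape` so characters like `[` are matched literally, and it uses the non-greedy `(.*?)` so each match stops at the first suffix. The list of strings stands in for the `p.text` values from the paragraph loop.

```python
import re


def extract_between(text, prefix, suffix):
    """Return every substring of `text` found between literal `prefix` and `suffix`."""
    pattern = re.escape(prefix) + r"(.*?)" + re.escape(suffix)
    return re.findall(pattern, text)


line = "Order [A-17] shipped; order [B-42] pending"
print(extract_between(line, "[", "]"))  # ['A-17', 'B-42']

# A greedy (.*) would instead run from the first '[' to the last ']'.
greedy = re.findall(r"\[(.*)\]", line)
print(greedy)  # ['A-17] shipped; order [B-42']

# Applying the helper to several paragraph texts and flattening the results.
paragraphs = ["Name: <Alice>", "no tags here", "Pets: <cat>, <dog>"]
found = [m for p in paragraphs for m in extract_between(p, "<", ">")]
print(found)  # ['Alice', 'cat', 'dog']
```

By default `.` does not match a newline, so pass `re.DOTALL` as the `flags` argument if a prefix and its suffix can sit on different lines.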
a good place to test regular expressions: