Day098 — Extract data by pattern

Jacky Tsang
1 min read · Sep 3, 2019


Here are the things I learned when I did a Python project extracting data from a Word file.

load the Word file:

from docx import Document  # provided by the python-docx package

doc = Document('./resources/the_word_file.docx')
for p in doc.paragraphs:
    a_line = p.text  # plain text of one paragraph

split the string on any amount of whitespace into a list:

# just use .split(), no parameter is needed
my_string.split()
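
A small sketch of why the no-argument form is handy. The sample strings below are made up for illustration; they are not from the original project.

```python
# Hypothetical line with leading spaces, a tab, and a trailing newline.
sample_line = "  alice   30\tengineer \n"

# No argument: split on any run of whitespace and drop empty strings.
fields = sample_line.split()

# Explicit single-space separator: consecutive spaces yield empty strings.
naive_fields = "a  b".split(" ")

print(fields)        # ['alice', '30', 'engineer']
print(naive_fields)  # ['a', '', 'b']
```

With an explicit separator you usually have to filter out the empty strings yourself, which is why plain .split() tends to be the simpler choice for text pulled out of a document.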

extract text between patterns:

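import re

# all matches; the capture group stops at the next '(' character
re.findall(r'pattern_prefix([^(]*)pattern_suffix', my_string)
# or, first match only (non-greedy)
m = re.search(r'pattern_prefix(.+?)pattern_suffix', my_string)
if m:
    found = m.group(1)
# or, with lookbehind/lookahead (raises AttributeError if there is no match)
re.search(r"(?<=pattern_prefix).*?(?=pattern_suffix)", my_string).group(0)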
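
To make the three variants concrete, here is a minimal sketch on a made-up string. The text and the id=[...] / name=[...] markers are hypothetical stand-ins for pattern_prefix and pattern_suffix.

```python
import re

# Hypothetical sample text; the markers are invented for illustration.
text = "id=[A12] name=[Bob] id=[C34]"

# findall with one capture group returns every captured substring.
all_ids = re.findall(r"id=\[(.+?)\]", text)

# search returns only the first match, or None when nothing matches,
# so check the result before calling .group().
match = re.search(r"name=\[(.+?)\]", text)
name = match.group(1) if match else None

# Lookbehind/lookahead keep the markers out of the match, so group(0) is the payload.
lookaround = re.search(r"(?<=name=\[).*?(?=\])", text)
name_again = lookaround.group(0) if lookaround else None

# A pattern that is absent from the text gives None rather than an exception.
missing = re.search(r"age=\[(.+?)\]", text)

print(all_ids)     # ['A12', 'C34']
print(name)        # Bob
print(name_again)  # Bob
print(missing)     # None
```

Guarding on None is the main practical difference between the variants: findall simply returns an empty list when nothing matches, while an unguarded .group() call on a failed search raises AttributeError.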

good place to evaluate the regular expression:
