


Note that the term "section" here is used as in common publishing parlance to refer to a (section) heading and its subordinate paragraphs, and does not refer to a Word document section object (like document.sections).

You'll need an implementation of is_heading() and create_document_from_paragraphs().
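One possible shape for these two helpers, offered as a minimal sketch rather than a definitive implementation. It assumes the headings use Word's built-in "Heading N" style names, and it copies only each paragraph's text and style name, so run-level formatting (bold, italics), images, and tables are not carried over. The heading_level() helper and the level parameter are additions for illustration and do not appear in the original text.

```python
import re

# Matches the built-in heading styles "Heading 1" .. "Heading 9" only.
_HEADING_STYLE = re.compile(r"Heading (\d)$")


def heading_level(paragraph):
    """Return 1-9 for a paragraph styled 'Heading N', otherwise None."""
    match = _HEADING_STYLE.match(paragraph.style.name or "")
    return int(match.group(1)) if match else None


def is_heading(paragraph, level=1):
    """True when the paragraph uses the built-in 'Heading <level>' style."""
    return heading_level(paragraph) == level


def create_document_from_paragraphs(paragraphs):
    """Build a new python-docx Document holding the text of `paragraphs`."""
    # Imported here so the style helpers above work without python-docx installed.
    from docx import Document

    new_doc = Document()
    for paragraph in paragraphs:
        level = heading_level(paragraph)
        if level is not None:
            new_doc.add_heading(paragraph.text, level=level)
            continue
        new_paragraph = new_doc.add_paragraph(paragraph.text)
        try:
            # Reuse the source style name when the default template has it.
            new_paragraph.style = paragraph.style.name
        except KeyError:
            # Assumption: an unknown style name raises KeyError; keep the default style.
            pass
    return new_doc
```

The returned document still has to be saved by the caller, for example create_document_from_paragraphs(paragraphs).save("article_1.docx"), where the file name is only a placeholder.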
I think the approach of using iterators is a sound one, but I'd be inclined to parcel them differently. At the top level you could have:

    for paragraphs in iterate_document_sections(document):
        create_document_from_paragraphs(paragraphs)

Then iterate_document_sections() would look something like:

    def iterate_document_sections(document):
        """Generate a sequence of paragraphs for each headed section in document.

        Each generated sequence has a heading paragraph in its first position,
        followed by the body paragraphs of that section.
        """
        paragraphs = []
        for paragraph in document.paragraphs:
            # A heading closes the section collected so far and opens a new one.
            if is_heading(paragraph) and paragraphs:
                yield paragraphs
                paragraphs = []
            paragraphs.append(paragraph)
        if paragraphs:
            yield paragraphs

Something like this combined with portions of your other code should give you something workable to start with.
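The grouping idea can be tried without any .docx file. The sketch below is a variant, not the code from the answer: it takes any iterable of paragraphs plus a heading test, and it skips text that appears before the first heading. The names group_by_heading() and fake_paragraph() are made up for this example, and the stand-in objects mimic only the style.name and text attributes that python-docx paragraphs expose.

```python
from types import SimpleNamespace


def group_by_heading(paragraphs, is_heading):
    """Yield one list per section: a heading followed by its body paragraphs.

    Paragraphs that appear before the first heading are skipped.
    """
    section = None
    for paragraph in paragraphs:
        if is_heading(paragraph):
            if section is not None:
                yield section
            section = [paragraph]
        elif section is not None:
            section.append(paragraph)
    if section is not None:
        yield section


def fake_paragraph(style_name, text):
    """Stand-in exposing only the attributes this sketch reads."""
    return SimpleNamespace(style=SimpleNamespace(name=style_name), text=text)


sample = [
    fake_paragraph("Normal", "Cover note"),
    fake_paragraph("Heading 1", "Article A"),
    fake_paragraph("Normal", "Text of A"),
    fake_paragraph("Heading 1", "Article B"),
    fake_paragraph("Normal", "Text of B, part 1"),
    fake_paragraph("Normal", "Text of B, part 2"),
]

sections = list(group_by_heading(sample, lambda p: p.style.name == "Heading 1"))
for section in sections:
    print([p.text for p in section])
# ['Article A', 'Text of A']
# ['Article B', 'Text of B, part 1', 'Text of B, part 2']
```

Passing the heading test as an argument keeps the grouping independent of how headings are detected, so the same generator works for "Heading 1", "Heading 2", or a custom style.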
I want to write a program that grabs my docx files, iterates through them, and splits each file into multiple separate files based on headings. Inside each docx there are a couple of articles, each with a 'Heading 1' and text underneath it. So if my original file1.docx has 4 articles, I want it to be split into 4 separate files, each with its heading and text.

I got to the part where it iterates through all of the files in a path where I hold the .docx files, and I can read the headings and text separately: one loop runs over iter_headings(document.paragraphs) and another over iter_text(document.paragraphs), which picks out the 'Normal' paragraphs. But I can't figure out how to merge it all and split it into separate files, each with the heading and the text.

This is the XML reading python-docx gives me. The red braces mark what I want to extract from each file.

How do I extract the text and heading for each article? I am open to any alternative suggestions on how to achieve what I want with different methods, or if there is an easier way to do it with PDF files.
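For the "file1.docx with 4 articles becomes 4 files" part, one piece that can be settled independently of python-docx is how to find the input files and name the outputs. The following is only a sketch under assumptions: the function names, the naming scheme (source name, running number, shortened heading), and the 40-character limit are all made up for illustration.

```python
import re
from pathlib import Path


def find_docx_files(folder):
    """Return the .docx files in `folder`, skipping Word's '~$' lock files."""
    return sorted(
        p for p in Path(folder).glob("*.docx") if not p.name.startswith("~$")
    )


def slugify(text, max_length=40):
    """Reduce a heading to a short, filesystem-safe fragment."""
    slug = re.sub(r"[^\w]+", "_", text).strip("_")
    return slug[:max_length] or "untitled"


def output_paths(source, headings, out_dir):
    """Plan one output path per article heading found in `source`."""
    source, out_dir = Path(source), Path(out_dir)
    return [
        out_dir / f"{source.stem}_{index:02d}_{slugify(heading)}.docx"
        for index, heading in enumerate(headings, start=1)
    ]


planned = output_paths("in/file1.docx", ["First Article", "Second: Part 2!"], "out")
print([p.name for p in planned])
# ['file1_01_First_Article.docx', 'file1_02_Second_Part_2.docx']
```

Including the running number keeps the articles in their original order and avoids collisions when two articles share a heading.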
