105 changes: 105 additions & 0 deletions docs/word/how-to-replace-text-in-a-word-document-with-sax.md
@@ -0,0 +1,105 @@
---

api_name:
- Microsoft.Office.DocumentFormat.OpenXML.Packaging
api_type:
- schema
ms.assetid: 2f6f0f89-0ac0-4d40-9f1a-222caf074cf1
title: 'How to: Replace Text in a Word Document Using SAX (Simple API for XML)'
description: 'Learn how to replace text in a Word document using SAX (Simple API for XML)'
ms.suite: office

ms.author: o365devx
author: o365devx
ms.topic: conceptual
ms.date: 04/03/2025
ms.localizationpriority: high
---
# Replace Text in a Word Document Using SAX (Simple API for XML)

This topic shows how to use the Open XML SDK to search and replace text in a Word document
using the Simple API for XML (SAX) approach. For more information about the basic structure
of a `WordprocessingML` document, see [Structure of a WordprocessingML document](./structure-of-a-wordprocessingml-document.md).

## Why Use the SAX Approach?

The Open XML SDK provides two ways to parse Office Open XML files: the Document Object Model (DOM) and the Simple API for XML (SAX). The DOM approach is designed to make it easy to query and parse Open XML files by using strongly typed classes. However, the DOM approach requires loading entire Open XML parts into memory, which can lead to slower processing and out-of-memory exceptions when working with very large parts. The SAX approach reads the XML in an Open XML part one element at a time, without loading the entire part into memory. This gives noncached, forward-only access to the XML data, which makes SAX a better choice when reading very large parts.
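
For comparison, the DOM approach to the same task can be sketched as follows. This is a minimal illustration, not part of the sample for this topic: the `ReplaceTextWithDom` name is hypothetical, and the sketch assumes the document has a main document part with a root `Document` element. Accessing the `Document` property is what loads the whole part into memory.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, shown only to contrast the DOM approach with SAX
static void ReplaceTextWithDom(string path, string textToReplace, string replacementText)
{
    // Open the WordprocessingDocument for editing
    using (WordprocessingDocument wordprocessingDocument = WordprocessingDocument.Open(path, true))
    {
        // Accessing Document loads the entire main document part into memory as a tree
        Document? document = wordprocessingDocument.MainDocumentPart?.Document;

        if (document is null)
        {
            return;
        }

        // Walk every Text element in the in-memory tree and update its text
        foreach (Text text in document.Descendants<Text>())
        {
            text.Text = text.Text.Replace(textToReplace, replacementText);
        }

        // Write the modified tree back to the part
        document.Save();
    }
}
```

The code is shorter than the SAX version, but its memory use grows with the size of the part, which is the trade-off described above.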

## Accessing the MainDocumentPart

The text of a Word document is stored in the <xref:DocumentFormat.OpenXml.Packaging.MainDocumentPart>, so the first step in
finding and replacing text is to access the Word document's `MainDocumentPart`. To do that, we first call the `WordprocessingDocument.Open`
method, passing in the path to the document as the first parameter and `true` as the second parameter to indicate that we
are opening the file for editing. Then we make sure that the `MainDocumentPart` is not null.

### [C#](#tab/cs-1)
[!code-csharp[](../../samples/word/replace_text_with_sax/cs/Program.cs#snippet1)]

### [Visual Basic](#tab/vb-1)
[!code-vb[](../../samples/word/replace_text_with_sax/vb/Program.vb#snippet1)]
***

## Create Memory Stream, OpenXmlReader, and OpenXmlWriter

With the DOM approach to editing documents, the entire part is read into memory, so we can use the Open XML SDK's
strongly typed classes, such as <xref:DocumentFormat.OpenXml.Wordprocessing.Text>, to access the
document's text and edit it. The SAX approach, however, uses the <xref:DocumentFormat.OpenXml.OpenXmlPartReader>
and <xref:DocumentFormat.OpenXml.OpenXmlPartWriter> classes, which access a part's stream with forward-only
access. The advantage is that the entire part does not need to be loaded into memory, which is faster
and uses less memory. However, because the same part cannot be opened in multiple streams at the same time, we cannot create an
<xref:DocumentFormat.OpenXml.OpenXmlReader> to read a part and an <xref:DocumentFormat.OpenXml.OpenXmlWriter> to edit
the same part at the same time. The solution is to create an additional memory stream, write the
updated part to it, and then use that stream to update the part after the `OpenXmlReader` and `OpenXmlWriter`
have been disposed. In the code below, we create the `MemoryStream` to store the updated part, an
`OpenXmlReader` for the `MainDocumentPart`, and an `OpenXmlWriter` to write to the `MemoryStream`.

### [C#](#tab/cs-2)
[!code-csharp[](../../samples/word/replace_text_with_sax/cs/Program.cs#snippet2)]

### [Visual Basic](#tab/vb-2)
[!code-vb[](../../samples/word/replace_text_with_sax/vb/Program.vb#snippet2)]
***

## Reading the Part and Writing to the New Stream

Now that we have an `OpenXmlReader` to read the part and an `OpenXmlWriter` to write to the new `MemoryStream`,
we use the <xref:DocumentFormat.OpenXml.OpenXmlReader.Read*> method to read each element in the part. As
each element is read in, we check whether it is of type `Text`. If it is, we use the <xref:DocumentFormat.OpenXml.OpenXmlReader.GetText*>
method to access the text and <xref:System.String.Replace*> to update the text. If it is not a
`Text` element, then we write it to the stream unchanged.

> [!Note]
> In a Word document, text can be split across multiple `Text` elements, so a phrase may not appear
> whole in any single element. If you are replacing a phrase and not a single word, it's best to
> replace one word at a time.

### [C#](#tab/cs-3)
[!code-csharp[](../../samples/word/replace_text_with_sax/cs/Program.cs#snippet3)]

### [Visual Basic](#tab/vb-3)
[!code-vb[](../../samples/word/replace_text_with_sax/vb/Program.vb#snippet3)]
***
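
The note above about text being split across `Text` elements can be illustrated with a small in-memory sketch. The paragraph below is a made-up example, built with the SDK's strongly typed classes rather than read from a file, in which the phrase "Hello World" is split across two runs.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Wordprocessing;

// A made-up paragraph whose visible text "Hello World" is stored in two Text elements
Paragraph paragraph = new Paragraph(
    new Run(new Text("Hello ")),
    new Run(new Text("World")));

// No single Text element contains the whole phrase...
bool anyTextElementHasPhrase = paragraph.Descendants<Text>().Any(t => t.Text.Contains("Hello World"));

// ...even though the paragraph as a whole does
bool paragraphHasPhrase = paragraph.InnerText.Contains("Hello World");

Console.WriteLine(anyTextElementHasPhrase); // False
Console.WriteLine(paragraphHasPhrase);      // True
```

Because the SAX loop inspects one `Text` element at a time, a per-element `Replace` call for "Hello World" would find nothing in a paragraph like this one, while replacing "Hello" and "World" separately would succeed.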

## Writing the New Stream to the MainDocumentPart

With the updated part written to the memory stream, the last step is to set the `MemoryStream`'s
position to 0 and use the <xref:DocumentFormat.OpenXml.Packaging.OpenXmlPart.FeedData*> method
to replace the contents of the `MainDocumentPart` with the updated stream.

### [C#](#tab/cs-4)
[!code-csharp[](../../samples/word/replace_text_with_sax/cs/Program.cs#snippet4)]

### [Visual Basic](#tab/vb-4)
[!code-vb[](../../samples/word/replace_text_with_sax/vb/Program.vb#snippet4)]
***
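
One way to check the result afterward is to read the part back with the same forward-only reader, this time opening the document read-only. The following is a minimal sketch rather than part of the sample: the `CountTextElementsContaining` name is hypothetical, and it counts `Text` elements that contain the search text, not individual occurrences.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: counts the Text elements whose text contains searchText
static int CountTextElementsContaining(string path, string searchText)
{
    int count = 0;

    // Open the document read-only; nothing is written back
    using (WordprocessingDocument wordprocessingDocument = WordprocessingDocument.Open(path, false))
    {
        MainDocumentPart? mainDocumentPart = wordprocessingDocument.MainDocumentPart;

        if (mainDocumentPart is null)
        {
            return 0;
        }

        using (OpenXmlReader reader = OpenXmlReader.Create(mainDocumentPart))
        {
            // Read the part one element at a time
            while (reader.Read())
            {
                if (reader.ElementType == typeof(Text)
                    && reader.IsStartElement
                    && reader.GetText().Contains(searchText))
                {
                    count++;
                }
            }
        }
    }

    return count;
}
```

After a successful replacement, calling this helper with the original text should return 0, assuming the replacement text does not itself contain the original text.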

## Sample Code

Below is the complete sample code to replace text in a Word document using the SAX (Simple API for XML)
approach.

### [C#](#tab/cs-0)
[!code-csharp[](../../samples/word/replace_text_with_sax/cs/Program.cs#snippet0)]

### [Visual Basic](#tab/vb-0)
[!code-vb[](../../samples/word/replace_text_with_sax/vb/Program.vb#snippet0)]
***
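
To try the sample, you need a document to run it against. The following sketch creates a minimal one-paragraph document with the SDK. The `CreateTestDocument` name is hypothetical and not part of the sample, and the path and text are whatever you choose to pass in.

```csharp
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: creates a minimal Word document containing one paragraph of text
static void CreateTestDocument(string path, string paragraphText)
{
    using (WordprocessingDocument wordprocessingDocument =
        WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
    {
        // Add the main document part and give it a body with a single paragraph
        MainDocumentPart mainDocumentPart = wordprocessingDocument.AddMainDocumentPart();
        Document document = new Document(
            new Body(new Paragraph(new Run(new Text(paragraphText)))));

        mainDocumentPart.Document = document;

        // Write the root element to the part
        document.Save();
    }
}
```

The path of the resulting file can then be passed as the first argument to the sample, followed by the text to replace and the replacement text.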
14 changes: 14 additions & 0 deletions samples/samples.sln
@@ -320,6 +320,10 @@ Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "working_with_tables_vb", "w
EndProject
Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "insert_a_picture_vb", "word\insert_a_picture\vb\insert_a_picture_vb.vbproj", "{6170C4E1-A109-435A-BF59-026C85B3BD9C}"
EndProject
Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "replace_text_with_sax_cs", "word\replace_text_with_sax\cs\replace_text_with_sax_cs.csproj", "{4C514047-64B5-1383-4564-B827B846A6A7}"
EndProject
Project("{F184B08F-C81C-45F6-A57F-5ABD9991F28F}") = "replace_text_with_sax_vb", "word\replace_text_with_sax\vb\replace_text_with_sax_vb.vbproj", "{6EB91F44-EC13-5354-0450-9A2687C3B169}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
@@ -938,6 +942,14 @@ Global
{6170C4E1-A109-435A-BF59-026C85B3BD9C}.Debug|Any CPU.Build.0 = Debug|Any CPU
{6170C4E1-A109-435A-BF59-026C85B3BD9C}.Release|Any CPU.ActiveCfg = Release|Any CPU
{6170C4E1-A109-435A-BF59-026C85B3BD9C}.Release|Any CPU.Build.0 = Release|Any CPU
{4C514047-64B5-1383-4564-B827B846A6A7}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{4C514047-64B5-1383-4564-B827B846A6A7}.Debug|Any CPU.Build.0 = Debug|Any CPU
{4C514047-64B5-1383-4564-B827B846A6A7}.Release|Any CPU.ActiveCfg = Release|Any CPU
{4C514047-64B5-1383-4564-B827B846A6A7}.Release|Any CPU.Build.0 = Release|Any CPU
{6EB91F44-EC13-5354-0450-9A2687C3B169}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{6EB91F44-EC13-5354-0450-9A2687C3B169}.Debug|Any CPU.Build.0 = Debug|Any CPU
{6EB91F44-EC13-5354-0450-9A2687C3B169}.Release|Any CPU.ActiveCfg = Release|Any CPU
{6EB91F44-EC13-5354-0450-9A2687C3B169}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
@@ -1095,6 +1107,8 @@ Global
{A43A75AB-D6B6-4D31-99F7-6951AFEF502D} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
{4EB1FCC9-E1E2-4D2A-ACF9-A3A31AA947A5} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
{6170C4E1-A109-435A-BF59-026C85B3BD9C} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
{4C514047-64B5-1383-4564-B827B846A6A7} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
{6EB91F44-EC13-5354-0450-9A2687C3B169} = {D207D3D7-FD4D-4FD4-A7D0-79A82086FB6F}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {721B3030-08D7-4412-9087-D1CFBB3F5046}
81 changes: 81 additions & 0 deletions samples/word/replace_text_with_sax/cs/Program.cs
@@ -0,0 +1,81 @@
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using System.IO;
using DocumentFormat.OpenXml.Wordprocessing;

ReplaceTextWithSAX(args[0], args[1], args[2]);

// <Snippet0>
void ReplaceTextWithSAX(string path, string textToReplace, string replacementText)
{
    // <Snippet1>
    // Open the WordprocessingDocument for editing
    using (WordprocessingDocument wordprocessingDocument = WordprocessingDocument.Open(path, true))
    {
        // Access the MainDocumentPart and make sure it is not null
        MainDocumentPart? mainDocumentPart = wordprocessingDocument.MainDocumentPart;

        if (mainDocumentPart is not null)
        // </Snippet1>
        {
            // <Snippet2>
            // Create a MemoryStream to store the updated MainDocumentPart
            using (MemoryStream memoryStream = new MemoryStream())
            {
                // Create an OpenXmlReader to read the main document part
                // and an OpenXmlWriter to write to the MemoryStream
                using (OpenXmlReader reader = OpenXmlPartReader.Create(mainDocumentPart))
                using (OpenXmlWriter writer = OpenXmlPartWriter.Create(memoryStream))
                // </Snippet2>
                {
                    // <Snippet3>
                    // Write the XML declaration with the version "1.0".
                    writer.WriteStartDocument();

                    // Read the elements from the MainDocumentPart
                    while (reader.Read())
                    {
                        // Check if the element is of type Text
                        if (reader.ElementType == typeof(Text))
                        {
                            // If it is the start of an element write the start element and the updated text
                            if (reader.IsStartElement)
                            {
                                writer.WriteStartElement(reader);

                                string text = reader.GetText().Replace(textToReplace, replacementText);

                                writer.WriteString(text);
                            }
                            else
                            {
                                // Close the element
                                writer.WriteEndElement();
                            }
                        }
                        else
                        // Write the other XML elements without editing
                        {
                            if (reader.IsStartElement)
                            {
                                writer.WriteStartElement(reader);
                            }
                            else if (reader.IsEndElement)
                            {
                                writer.WriteEndElement();
                            }
                        }
                    }
                    // </Snippet3>
                }
                // <Snippet4>
                // Set the MemoryStream's position to 0 and replace the MainDocumentPart
                memoryStream.Position = 0;
                mainDocumentPart.FeedData(memoryStream);
                // </Snippet4>
            }
        }
    }
}
// </Snippet0>
@@ -0,0 +1 @@
<Project Sdk="Microsoft.NET.Sdk"/>
70 changes: 70 additions & 0 deletions samples/word/replace_text_with_sax/vb/Program.vb
@@ -0,0 +1,70 @@
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports System.IO
Imports DocumentFormat.OpenXml.Wordprocessing

Module Program
    Sub Main(args As String())
        ReplaceTextWithSAX(args(0), args(1), args(2))
    End Sub

    ' <Snippet0>
    Sub ReplaceTextWithSAX(path As String, textToReplace As String, replacementText As String)
        ' <Snippet1>
        ' Open the WordprocessingDocument for editing
        Using wordprocessingDocument As WordprocessingDocument = WordprocessingDocument.Open(path, True)
            ' Access the MainDocumentPart and make sure it is not null
            Dim mainDocumentPart As MainDocumentPart = wordprocessingDocument.MainDocumentPart

            If mainDocumentPart IsNot Nothing Then
                ' </Snippet1>
                ' <Snippet2>
                ' Create a MemoryStream to store the updated MainDocumentPart
                Using memoryStream As New MemoryStream()
                    ' Create an OpenXmlReader to read the main document part
                    ' and an OpenXmlWriter to write to the MemoryStream
                    Using reader As OpenXmlReader = OpenXmlPartReader.Create(mainDocumentPart)
                        Using writer As OpenXmlWriter = OpenXmlPartWriter.Create(memoryStream)
                            ' </Snippet2>
                            ' <Snippet3>
                            ' Write the XML declaration with the version "1.0".
                            writer.WriteStartDocument()

                            ' Read the elements from the MainDocumentPart
                            While reader.Read()
                                ' Check if the element is of type Text
                                If reader.ElementType Is GetType(Text) Then
                                    ' If it is the start of an element write the start element and the updated text
                                    If reader.IsStartElement Then
                                        writer.WriteStartElement(reader)

                                        Dim text As String = reader.GetText().Replace(textToReplace, replacementText)

                                        writer.WriteString(text)
                                    Else
                                        ' Close the element
                                        writer.WriteEndElement()
                                    End If
                                Else
                                    ' Write the other XML elements without editing
                                    If reader.IsStartElement Then
                                        writer.WriteStartElement(reader)
                                    ElseIf reader.IsEndElement Then
                                        writer.WriteEndElement()
                                    End If
                                End If
                            End While
                            ' </Snippet3>
                        End Using
                    End Using
                    ' <Snippet4>
                    ' Set the MemoryStream's position to 0 and replace the MainDocumentPart
                    memoryStream.Position = 0
                    mainDocumentPart.FeedData(memoryStream)
                    ' </Snippet4>
                End Using
            End If
        End Using
    End Sub
    ' </Snippet0>
End Module
@@ -0,0 +1 @@
<Project Sdk="Microsoft.NET.Sdk"/>