This repository was archived by the owner on Jun 18, 2020. It is now read-only.

Commit ba3c751

Alexander Patrikalakis authored and committed
Update SDK. Allow creating destination if it does not exist.
1 parent d7100fd commit ba3c751

File tree: 12 files changed, +448 −236 lines


.gitignore (+5)

```diff
@@ -0,0 +1,5 @@
+/.idea
+/target
+/*.iml
+.*~
+*~
```

README.md (+141 −82)

````diff
@@ -1,79 +1,127 @@
 # DynamoDB Import Export Tool
-The DynamoDB Import Export Tool is designed to perform parallel scans on the source table, store scan results in a queue, then consume the queue by writing the items asynchronously to a destination table.
+The DynamoDB Import Export Tool is designed to perform parallel scans on the source table,
+store scan results in a queue, then consume the queue by writing the items asynchronously to a destination table.
 
 ## Requirements ##
 * Maven
 * JRE 1.7+
-* Pre-existing source and destination DynamoDB tables
+* Pre-existing source DynamoDB tables. The destination table is optional in the CLI; you can choose to create the
+destination table if it does not exist.
 
 ## Running as an executable
-
-1. Build the library:
-
-```
-mvn install
+1. Build the library with `mvn install`. This produces the target jar in the target/ directory.
+The CLI's usage follows with required parameters marked by asterisks.
+
+```bash
+--consistentScan
+    Use this flag to use strongly consistent scan. If the flag is not used
+    it will default to eventually consistent scan
+    Default: false
+--createDestination
+    Create destination table if it does not exist
+    Default: false
+--copyStreamSpecificationWhenCreating
+    Use the source table stream specification for the destination table
+    during its creation.
+    Default: false
+--destinationEndpoint
+    Endpoint of the destination table
+* --destinationRegion
+    Signing region for the destination endpoint
+* --destinationTable
+    Name of the destination table
+--help
+    Display usage information
+--maxWriteThreads
+    Number of max threads to write to destination table
+    Default: 1024
+* --readThroughputRatio
+    Percentage of total read throughput to scan the source table
+    Default: 0.0
+--section
+    Section number to scan when running multiple programs concurrently [0,
+    1... totalSections-1]
+    Default: 0
+--sourceEndpoint
+    Endpoint of the source table
+* --sourceRegion
+    Signing region for the source endpoint
+* --sourceTable
+    Name of the source table
+--totalSections
+    Total number of sections to divide the scan into
+    Default: 1
+* --writeThroughputRatio
+    Percentage of total write throughput to write the destination table
+    Default: 0.0
 ```
 
-2. This produces the target jar in the target/ directory, to start the replication process:
-
-java -jar dynamodb-import-export-tool.jar
-
---destinationEndpoint <destination_endpoint> // the DynamoDB endpoint where the destination table is located.
-
---destinationTable <destination_table> // the destination table to write to.
-
---sourceEndpoint <source_endpoint> // the endpoint where the source table is located.
-
---sourceTable <source_table> // the source table to read from.
-
---readThroughputRatio <ratio_in_decimal> // the ratio of read throughput to consume from the source table.
-
---writeThroughputRatio <ratio_in_decimal> // the ratio of write throughput to consume from the destination table.
-
---maxWriteThreads <numWriteThreads> // (Optional, default=128 * Available_Processors) Maximum number of write threads to create.
-
---totalSections <numSections> // (Optional, default=1) Total number of sections to split the bootstrap into. Each application will only scan and write one section.
-
---section <sectionSequence> // (Optional, default=0) section to read and write. Only will scan this one section of all sections, [0...totalSections-1].
-
---consistentScan <boolean> // (Optional, default=false) indicates whether consistent scan should be used when reading from the source table.
+2. An example command you can use on one EC2 host to copy from one table `foo` in `us-east-1` to a new table
+called `bar` in `us-east-2` follows.
+
+```bash
+java -jar target/dynamodb-import-export-tool-1.1.0.jar \
+    --sourceRegion us-east-1 \
+    --sourceTable foo \
+    --destinationRegion us-east-2 \
+    --destinationTable bar \
+    --readThroughputRatio 1 \
+    --writeThroughputRatio 1
+```
 
-> **NOTE**: To split the replication process across multiple machines, simply use the totalSections & section command line arguments, where each machine will run one section out of [0 ... totalSections-1].
+> **NOTE**: To split the replication process across multiple machines, simply use the totalSections & section
+command line arguments, where each machine will run one section out of [0 ... totalSections-1].
 
 ## Using the API
+Find some examples of how to use the Import-Export tool's API below.
+The first demonstrates how to use the API to copy data from one DynamoDB table to another.
+The second demonstrates how to enqueue the data in a DynamoDB table in a
+`BlockingQueueConsumer` in memory.
 
 ### 1. Transfer Data from One DynamoDB Table to Another DynamoDB Table
 
-The below example will read from "mySourceTable" at 100 reads per second, using 4 threads. And it will write to "myDestinationTable" at 50 writes per second, using 8 threads.
-Both tables are located at "dynamodb.us-west-1.amazonaws.com". (to transfer to a different region, create 2 AmazonDynamoDBClients
+The below example will read from "mySourceTable" at 100 reads per second, using four threads.
+And it will write to "myDestinationTable" at 50 writes per second, using eight threads.
+Both tables are located at "dynamodb.us-west-1.amazonaws.com".
+To transfer to a different region, create two AmazonDynamoDBClients
 with different endpoints to pass into the DynamoDBBootstrapWorker and the DynamoDBConsumer.
 
 ```java
-AmazonDynamoDBClient client = new AmazonDynamoDBClient(new ProfileCredentialsProvider());
-client.setEndpoint("dynamodb.us-west-1.amazonaws.com");
-
-DynamoDBBootstrapWorker worker = null;
-
-try {
-    // 100.0 read operations per second. 4 threads to scan the table.
-    worker = new DynamoDBBootstrapWorker(client,
-            100.0, "mySourceTable", 4);
-} catch (NullReadCapacityException e) {
-    LOGGER.error("The DynamoDB source table returned a null read capacity.", e);
-    System.exit(1);
-}
-
-// 50.0 write operations per second. 8 threads to scan the table.
-DynamoDBConsumer consumer = new DynamoDBConsumer(client, "myDestinationTable", 50.0, Executors.newFixedThreadPool(8));
-
-try {
-    worker.pipe(consumer);
-} catch (ExecutionException e) {
-    LOGGER.error("Encountered exception when executing transfer.", e);
-    System.exit(1);
-} catch (InterruptedException e){
-    LOGGER.error("Interrupted when executing transfer.", e);
-    System.exit(1);
+import com.amazonaws.dynamodb.bootstrap.DynamoDBBootstrapWorker;
+import com.amazonaws.dynamodb.bootstrap.DynamoDBConsumer;
+import com.amazonaws.dynamodb.bootstrap.exception.NullReadCapacityException;
+import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
+import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
+
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executors;
+
+class TransferDataFromOneTableToAnother {
+    public static void main(String[] args) {
+        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
+                .withRegion(com.amazonaws.regions.Regions.US_WEST_1).build();
+        DynamoDBBootstrapWorker worker = null;
+        try {
+            // 100.0 read operations per second. 4 threads to scan the table.
+            worker = new DynamoDBBootstrapWorker(client,
+                    100.0, "mySourceTable", 4);
+        } catch (NullReadCapacityException e) {
+            System.err.println("The DynamoDB source table returned a null read capacity.");
+            System.exit(1);
+        }
+        // 50.0 write operations per second. 8 threads to scan the table.
+        DynamoDBConsumer consumer = new DynamoDBConsumer(client, "myDestinationTable", 50.0,
+                Executors.newFixedThreadPool(8));
+        try {
+            worker.pipe(consumer);
+        } catch (ExecutionException e) {
+            System.err.println("Encountered exception when executing transfer: " + e.getMessage());
+            System.exit(1);
+        } catch (InterruptedException e) {
+            System.err.println("Interrupted when executing transfer: " + e.getMessage());
+            System.exit(1);
+        }
+    }
 }
 ```
 
@@ -85,29 +133,40 @@ the DynamoDB entries but does not have a setup application for it. They can just
 to then process the new entries.
 
 ```java
-AmazonDynamoDBClient client = new AmazonDynamoDBClient(new ProfileCredentialsProvider());
-client.setEndpoint("dynamodb.us-west-1.amazonaws.com");
-
-DynamoDBBootstrapWorker worker = null;
-
-try {
-    // 100.0 read operations per second. 4 threads to scan the table.
-    worker = new DynamoDBBootstrapWorker(client,
-            100.0, "mySourceTable", 4);
-} catch (NullReadCapacityException e) {
-    LOGGER.error("The DynamoDB source table returned a null read capacity.", e);
-    System.exit(1);
-}
-
-BlockingQueueConsumer consumer = new BlockingQueueConsumer(8);
-
-try {
-    worker.pipe(consumer);
-} catch (ExecutionException e) {
-    LOGGER.error("Encountered exception when executing transfer.", e);
-    System.exit(1);
-} catch (InterruptedException e){
-    LOGGER.error("Interrupted when executing transfer.", e);
-    System.exit(1);
+import com.amazonaws.dynamodb.bootstrap.BlockingQueueConsumer;
+import com.amazonaws.dynamodb.bootstrap.DynamoDBBootstrapWorker;
+import com.amazonaws.dynamodb.bootstrap.exception.NullReadCapacityException;
+import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
+import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
+
+import java.util.concurrent.ExecutionException;
+
+class TransferDataFromOneTableToBlockingQueue {
+    public static void main(String[] args) {
+        AmazonDynamoDB client = AmazonDynamoDBClientBuilder.standard()
+                .withRegion(com.amazonaws.regions.Regions.US_WEST_1).build();
+
+        DynamoDBBootstrapWorker worker = null;
+
+        try {
+            // 100.0 read operations per second. 4 threads to scan the table.
+            worker = new DynamoDBBootstrapWorker(client, 100.0, "mySourceTable", 4);
+        } catch (NullReadCapacityException e) {
+            System.err.println("The DynamoDB source table returned a null read capacity.");
+            System.exit(1);
+        }
+
+        BlockingQueueConsumer consumer = new BlockingQueueConsumer(8);
+
+        try {
+            worker.pipe(consumer);
+        } catch (ExecutionException e) {
+            System.err.println("Encountered exception when executing transfer: " + e.getMessage());
+            System.exit(1);
+        } catch (InterruptedException e) {
+            System.err.println("Interrupted when executing transfer: " + e.getMessage());
+            System.exit(1);
+        }
+    }
 }
 ```
````
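The multi-machine split the README describes (`--totalSections`/`--section`), combined with the `--createDestination` flag this commit adds, can be sketched as below. This is only an illustration, not part of the commit: the table names (`foo`, `bar`), regions, and throughput ratios are assumed values, and the script echoes one per-host command per section rather than running the tool.

```shell
#!/bin/bash
# Generate one CLI invocation per worker host; each host runs exactly one
# section out of [0 ... TOTAL_SECTIONS-1]. Splitting the throughput ratio
# evenly across sections (0.25 each for 4 sections) is an assumption.
TOTAL_SECTIONS=4
CMDS=()
for SECTION in $(seq 0 $((TOTAL_SECTIONS - 1))); do
  CMD="java -jar target/dynamodb-import-export-tool-1.1.0.jar"
  CMD="$CMD --sourceRegion us-east-1 --sourceTable foo"
  CMD="$CMD --destinationRegion us-east-2 --destinationTable bar"
  CMD="$CMD --createDestination"  # create the destination table if missing
  CMD="$CMD --readThroughputRatio 0.25 --writeThroughputRatio 0.25"
  CMD="$CMD --totalSections $TOTAL_SECTIONS --section $SECTION"
  CMDS+=("$CMD")
done
# Print the command each host would run (copy one line per machine).
printf '%s\n' "${CMDS[@]}"
```

Every generated command shares the same `--totalSections` value and differs only in `--section`, which is what lets the scans cover disjoint key ranges.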

pom.xml (+19 −10)

```diff
@@ -1,7 +1,7 @@
 <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
     <modelVersion>4.0.0</modelVersion>
     <groupId>com.amazonaws</groupId>
-    <version>1.0.1</version>
+    <version>1.1.0</version>
     <artifactId>dynamodb-import-export-tool</artifactId>
     <packaging>jar</packaging>
     <name>DynamoDB Import Export Tool</name>
@@ -11,14 +11,17 @@
         <url>https://github.com/awslabs/dynamodb-import-export-tool.git</url>
     </scm>
     <properties>
-        <aws.java.sdk.version>1.10.10</aws.java.sdk.version>
-        <powermock.version>1.6.2</powermock.version>
-        <jcommander.version>1.48</jcommander.version>
-        <guava.version>15.0</guava.version>
+        <jdk.version>1.7</jdk.version>
+        <aws.java.sdk.version>1.11.123</aws.java.sdk.version>
+        <powermock.version>1.6.6</powermock.version>
+        <jcommander.version>1.69</jcommander.version>
+        <guava.version>21.0</guava.version>
         <log4j.core.version>1.2.17</log4j.core.version>
-        <easymock.version>3.2</easymock.version>
+        <easymock.version>3.4</easymock.version>
         <commons.logging.version>1.2</commons.logging.version>
-        <maven.shade.version>2.4.1</maven.shade.version>
+        <maven.shade.version>3.0.0</maven.shade.version>
+        <maven.compiler.version>3.0</maven.compiler.version>
+        <maven.gpg.version>1.6</maven.gpg.version>
         <gpg.skip>true</gpg.skip>
     </properties>
     <developers>
@@ -84,6 +87,11 @@
             <artifactId>log4j</artifactId>
             <version>${log4j.core.version}</version>
         </dependency>
+        <dependency>
+            <groupId>org.projectlombok</groupId>
+            <artifactId>lombok</artifactId>
+            <version>1.16.14</version>
+        </dependency>
         <dependency>
             <groupId>org.powermock</groupId>
             <artifactId>powermock-module-junit4</artifactId>
@@ -109,14 +117,15 @@
             <plugin>
                 <artifactId>maven-compiler-plugin</artifactId>
                 <configuration>
-                    <source>1.7</source>
-                    <target>1.7</target>
+                    <source>${jdk.version}</source>
+                    <target>${jdk.version}</target>
                 </configuration>
-                <version>3.0</version>
+                <version>${maven.compiler.version}</version>
             </plugin>
             <plugin>
                 <groupId>org.apache.maven.plugins</groupId>
                 <artifactId>maven-gpg-plugin</artifactId>
+                <version>${maven.gpg.version}</version>
                 <executions>
                     <execution>
                         <id>sign-artifacts</id>
```
