[DataSync] Synchronization method to S3

Question 

A DataSync task that copies data from EFS to S3 (EFS -> DataSync -> S3) is scheduled to run every hour for data ingestion (synchronization).

Q1. Synchronization behavior for newly added files

For example, if 10 files are synchronized today and 2 more files are added tomorrow, will only the 2 new files be read and synchronized? Or will synchronization be performed on all 12 files?

Q2. Validation of changes during synchronization

In the scenario above, when synchronization occurs, is there any validation or checking for updates to the existing files? This is important to confirm, as S3 usage fees are charged based on the number of list, copy, put, and post requests.

Answer 

A1.
If there are no changes to the existing 10 files, only the 2 newly created files will be transferred. However, if any of the existing 10 files have been modified, those files will also be transferred.

Note that modified files are not transferred incrementally; the entire file is re-uploaded. This is because S3 is object storage: an existing object cannot be partially updated, so any change requires uploading the whole object again.
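The selection rule described in A1 can be illustrated with a small sketch. This is a simplified model and not DataSync's actual implementation: the FileMeta record, the files_to_transfer function, and the file names are hypothetical, and the comparison is assumed to use only size and modification time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical metadata record used for the comparison."""
    size: int      # size in bytes
    mtime: float   # last-modified time, seconds since the epoch


def files_to_transfer(source, destination):
    """Return the names of files that are new or changed at the source."""
    selected = []
    for name, meta in sorted(source.items()):
        # A file is copied when it is missing at the destination
        # or when its metadata differs from the destination copy.
        if destination.get(name) != meta:
            selected.append(name)
    return selected


# Day 1: 10 files were synchronized, so the destination mirrors the source.
destination = {f"file{i:02d}.csv": FileMeta(size=100, mtime=1.0) for i in range(1, 11)}
source = dict(destination)

# Day 2: two new files are added at the source.
source["file11.csv"] = FileMeta(size=100, mtime=2.0)
source["file12.csv"] = FileMeta(size=100, mtime=2.0)
new_only = files_to_transfer(source, destination)

# One of the existing files is also modified before the next run.
source["file03.csv"] = FileMeta(size=150, mtime=2.5)
with_modified = files_to_transfer(source, destination)

print(new_only)
print(with_modified)
```

In this model, the first result contains only the 2 new files, and the second also includes the modified file, which is selected as a whole file and not as a partial change.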

A2.
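Yes. On every task execution, DataSync compares the source and destination storage to identify differences, and this comparison covers the existing files as well as the new ones. Listing files on EFS incurs no request charge. On the S3 side, however, the LIST requests made for the comparison are billed according to S3 pricing.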
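As a rough way to reason about the LIST charges mentioned in A2, the sketch below estimates how many LIST requests an hourly schedule could generate in a month. It rests on several assumptions: one full listing of the destination per run, at most 1,000 keys returned per LIST request, and a placeholder price that should be replaced with the current value from the S3 pricing page for your region. The function names are hypothetical, and the requests DataSync actually issues may differ from this model.

```python
import math

KEYS_PER_LIST_REQUEST = 1000  # S3 returns at most 1,000 keys per LIST call


def list_requests_per_run(object_count):
    """LIST requests needed to enumerate object_count keys once."""
    return max(1, math.ceil(object_count / KEYS_PER_LIST_REQUEST))


def monthly_list_cost(object_count, runs_per_day, price_per_1000_requests, days=30):
    """Return (total LIST requests, estimated cost) for one month."""
    requests = list_requests_per_run(object_count) * runs_per_day * days
    return requests, requests / 1000 * price_per_1000_requests


# 12 objects, hourly schedule, placeholder price per 1,000 requests (assumption).
requests, cost = monthly_list_cost(
    object_count=12,
    runs_per_day=24,
    price_per_1000_requests=0.005,
)
print(requests, cost)
```

With only a handful of objects, the estimate is dominated by the number of runs and not by the number of files, so lowering the schedule frequency is the main lever in this model.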
