Loader speed optimization #186

Open
@alok87

Redshift Cluster Spec

  • Cluster CPU Utilisation: ~50%
  • Cluster resources: ra3.xlplus, 2 nodes

Load Speed

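| maxSizePerBatch (GB) | Load time (minutes) | Throughput (GB/hour) |
|---|---|---|
| 0.5 | 3 | 10 |
| 0.5 | 5 | 6 |
| 0.5 | 8 | 3.75 |
| 1 | 10 | 6 |
| 1 | 9 | 6.67 |
| 1 | 7 | 8.57 |
| 1 | 7 | 8.57 |
| 1 | 6 | 10 |
| 4 | 21 | 11.43 |
| 4 | 21 | 11.43 |
| 0.5 | 4 | 7.5 |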

Load speed drops when multiple loads run concurrently. The maximum observed speed is about 11.4 GB per hour (the 4 GB batches).
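The GB/hour column follows directly from the other two columns. A minimal sketch of that arithmetic, assuming maxSizePerBatch is expressed in GB and the whole batch is loaded in the reported minutes (the function name `throughput_gb_per_hour` is made up for illustration):

```python
def throughput_gb_per_hour(batch_gb: float, load_minutes: float) -> float:
    """Convert one load (batch size in GB, duration in minutes) to GB/hour."""
    if load_minutes <= 0:
        raise ValueError("load_minutes must be positive")
    return batch_gb * 60.0 / load_minutes


# (maxSizePerBatch in GB, load minutes) pairs copied from the table above.
runs = [
    (0.5, 3), (0.5, 5), (0.5, 8),
    (1, 10), (1, 9), (1, 7), (1, 7), (1, 6),
    (4, 21), (4, 21),
    (0.5, 4),
]

for batch_gb, minutes in runs:
    rate = throughput_gb_per_hour(batch_gb, minutes)
    print(f"{batch_gb:>4} GB in {minutes:>2} min -> {rate:.2f} GB/hour")

# Best observed throughput across all runs.
best = max(throughput_gb_per_hour(b, m) for b, m in runs)
print(f"best: {best:.2f} GB/hour")
```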

Division of time taken in the load task

The example below is for a maxSizePerBatch of 8 GB.

I0401 08:05:12.574673       1 load_processor.go:739] ts.inventory.customers, batchId:1, size:16389: processing...
I0401 08:05:12.574702       1 load_processor.go:646] ts.inventory.customers, batchId:1, startOffset:57150
I0401 08:05:13.119588       1 load_processor.go:701] ts.inventory.customers, load staging
I0401 08:05:21.538138       1 redshift.go:868] Running: COPY from s3 to: customers_ts_adx_reload_staged
I0401 08:34:51.824170       1 load_processor.go:212] ts.inventory.customers, copied staging
I0401 08:36:45.631030       1 load_processor.go:235] ts.inventory.customers, deduped
I0401 08:40:02.744066       1 load_processor.go:254] ts.inventory.customers, deleted common
I0401 08:40:04.206752       1 load_processor.go:273] ts.inventory.customers, deleted delete-op
I0401 08:40:04.216792       1 redshift.go:817] Running: UNLOAD from customers_ts_adx_reload_staged to s3
I0401 08:43:33.241421       1 load_processor.go:323] ts.inventory.customers, unloaded
I0401 08:43:33.241453       1 redshift.go:868] Running: COPY from s3 to: customers_ts_adx_reload
I0401 08:49:11.932393       1 load_processor.go:339] ts.inventory.customers, copied
I0401 08:49:19.985916       1 load_processor.go:151] ts.inventory.customers, offset: 73539, marking
I0401 08:49:19.985935       1 load_processor.go:158] ts.inventory.customers, offset: 73539, marked
I0401 08:49:19.985939       1 load_processor.go:161] ts.inventory.customers, committing (autoCommit=false)
I0401 08:49:19.987312       1 load_processor.go:163] ts.inventory.customers, committed (autoCommit=false)
I0401 08:49:19.987344       1 load_processor.go:768] ts.inventory.customers, batchId:1, size:16389, end:73538:, processed in 44m

| Task | Time taken | % of total |
|---|---|---|
| load staging | 29 min | 65.9% |
| merge/dedupe | 2 min | 4.4% |
| merge/deleteCommon | 4 min | 8.8% |
| merge/deleteOp | 2 s | 0.07% |
| unload | 3 min | 6.8% |
| load target | 6 min | 13.6% |
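The per-phase times above can be recomputed from the log timestamps. The sketch below is one way to do it, under these assumptions: each phase ends at its "copied staging", "deduped", "deleted common", "deleted delete-op", "unloaded" or "copied" line, the first phase starts at the "load staging" line, and the mapping from log messages to task names is inferred from the table rather than taken from the loader's code. Since the table uses rounded minutes, the exact shares may differ slightly.

```python
import re
from datetime import datetime

# Milestone lines copied verbatim from the log above.
LOG = """\
I0401 08:05:13.119588       1 load_processor.go:701] ts.inventory.customers, load staging
I0401 08:34:51.824170       1 load_processor.go:212] ts.inventory.customers, copied staging
I0401 08:36:45.631030       1 load_processor.go:235] ts.inventory.customers, deduped
I0401 08:40:02.744066       1 load_processor.go:254] ts.inventory.customers, deleted common
I0401 08:40:04.206752       1 load_processor.go:273] ts.inventory.customers, deleted delete-op
I0401 08:43:33.241421       1 load_processor.go:323] ts.inventory.customers, unloaded
I0401 08:49:11.932393       1 load_processor.go:339] ts.inventory.customers, copied
"""

# Log header: severity letter + MMDD, time with microseconds, thread id, file:line].
LINE_RE = re.compile(r"^[IWEF](\d{4} \d{2}:\d{2}:\d{2}\.\d{6})\s+\d+ \S+\] (.*)$")

START_MARKER = "load staging"
# (event that ENDS a phase, task name used in the table); this mapping is an assumption.
MILESTONES = [
    ("copied staging", "load staging"),
    ("deduped", "merge/dedupe"),
    ("deleted common", "merge/deleteCommon"),
    ("deleted delete-op", "merge/deleteOp"),
    ("unloaded", "unload"),
    ("copied", "load target"),
]


def phase_durations(log_text: str) -> dict:
    """Return seconds spent in each phase, keyed by task name."""
    events = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        # The log omits the year; a placeholder year is fine because only
        # differences between timestamps are used.
        ts = datetime.strptime("2000" + m.group(1), "%Y%m%d %H:%M:%S.%f")
        # Messages look like "<topic>, <event>"; keep only the event part.
        event = m.group(2).split(", ", 1)[-1]
        events[event] = ts

    durations = {}
    prev = events[START_MARKER]
    for marker, label in MILESTONES:
        durations[label] = (events[marker] - prev).total_seconds()
        prev = events[marker]
    return durations


durations = phase_durations(LOG)
total = sum(durations.values())
for label, secs in durations.items():
    print(f"{label:<20} {secs:8.1f} s  {100 * secs / total:5.2f}%")
```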

We need to identify which phases to optimize and then work on speeding them up.
Can we load at 100 GB/hour?
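A rough way to frame the 100 GB/hour question is Amdahl's law applied to the breakdown above. The sketch below assumes the phases run strictly one after another, that the 65.9% share of "load staging" holds for other batches, and that throughput is simply batch size divided by total time. These are simplifications for illustration, not measurements.

```python
def overall_speedup(fraction: float, phase_speedup: float) -> float:
    """Amdahl's law: overall speedup when a phase that takes `fraction` of the
    total time is made `phase_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / phase_speedup)


batch_gb = 8.0            # maxSizePerBatch of the example batch
batch_minutes = 44.0      # "processed in 44m" from the log
staging_fraction = 0.659  # share of "load staging" from the table

current = batch_gb * 60.0 / batch_minutes  # current throughput in GB/hour
target = 100.0                             # GB/hour asked for in this issue
required = target / current                # overall speedup needed

print(f"current: {current:.1f} GB/hour, required speedup: {required:.1f}x")

# Effect of speeding up only the staging COPY by various factors.
for s in (2, 5, 10, float("inf")):
    rate = current * overall_speedup(staging_fraction, s)
    print(f"staging {s}x faster -> {rate:.1f} GB/hour")

# Upper bound if the staging COPY took no time at all.
ceiling = current * overall_speedup(staging_fraction, float("inf"))
```

Under these assumptions, speeding up the staging COPY alone cannot reach the target, so the merge, unload and target load phases (or the number of batches processed in parallel) would also need attention.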

Labels: p1 (priority 1, do it ASAP), performance (Monitoring, Metrics, Logs, Benchmarks)
