How can I use Dataset to shuffle a large dataset in its entirety?
Time: 2023-09-11 14:15:27
The Dataset.shuffle() implementation is designed for data that can be shuffled in memory. We're considering adding support for external-memory shuffles, but this is still in the early stages. In case it works for you, here is the approach we usually take when the data are too large to fit in memory:
Randomly shuffle the entire dataset once, using a MapReduce/Spark/Beam/etc. job that writes it out as a set of roughly equal-sized files ("shards").
In each epoch:
- Randomly shuffle the list of shard filenames using Dataset.list_files(...).shuffle(num_shards).
- Use dataset.interleave(lambda filename: tf.data.TextLineDataset(filename), cycle_length=N) to mix together records from N different shards.
- Use dataset.shuffle(B) to shuffle the resulting dataset. Choosing B may take some experimentation, but you will probably want it to be larger than the number of records in a single shard.
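The recipe can be sketched end to end with tf.data as follows. This is a minimal sketch, assuming TensorFlow 2.x and newline-delimited text records. The helpers write_shards and make_epoch_dataset, the shard-NNNNN.txt file names, and all parameter values are hypothetical. write_shards is only a single-machine stand-in for the MapReduce/Spark/Beam job: it sends each record to a random shard and then shuffles each shard in memory, so it assumes that one shard (not the whole dataset) fits in RAM.

```python
import os
import random
import tempfile

import tensorflow as tf


def write_shards(records, out_dir, num_shards, seed=0):
    """Hypothetical stand-in for the offline MapReduce/Spark/Beam shuffle.

    Assumes each record is a string without newlines.
    """
    rng = random.Random(seed)
    paths = [os.path.join(out_dir, f"shard-{i:05d}.txt") for i in range(num_shards)]
    # Pass 1: stream the records, sending each one to a uniformly random shard.
    outs = [open(p, "w") for p in paths]
    try:
        for rec in records:
            outs[rng.randrange(num_shards)].write(rec + "\n")
    finally:
        for f in outs:
            f.close()
    # Pass 2: shuffle each shard in memory (only one shard is loaded at a time).
    for p in paths:
        with open(p) as f:
            lines = f.readlines()
        rng.shuffle(lines)
        with open(p, "w") as f:
            f.writelines(lines)
    return paths


def make_epoch_dataset(file_pattern, num_shards, cycle_length, buffer_size):
    # Shuffle the shard filenames; a new order is drawn on every pass (epoch).
    files = tf.data.Dataset.list_files(file_pattern, shuffle=False)
    files = files.shuffle(num_shards, reshuffle_each_iteration=True)
    # Read cycle_length shards concurrently, taking one line from each in turn.
    records = files.interleave(
        lambda filename: tf.data.TextLineDataset(filename),
        cycle_length=cycle_length,
        block_length=1,
    )
    # Buffer-based shuffle that mixes records coming from the open shards.
    return records.shuffle(buffer_size, reshuffle_each_iteration=True)


if __name__ == "__main__":
    data_dir = tempfile.mkdtemp()
    write_shards([f"record-{i}" for i in range(1000)], data_dir, num_shards=10)
    ds = make_epoch_dataset(
        os.path.join(data_dir, "shard-*.txt"),
        num_shards=10,
        cycle_length=4,
        buffer_size=200,  # larger than one shard (~100 records here)
    )
    for epoch in range(2):
        print(epoch, [x.numpy().decode() for x in ds.take(5)])
```

Because both shuffles set reshuffle_each_iteration=True, each pass over the returned dataset should see a new shard order and a new buffer order, so the dataset can be built once and iterated once per epoch (or combined with .repeat()).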
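To see why B matters, here is a pure-Python model of a fixed-size shuffle buffer. It assumes the semantics described in the tf.data.Dataset.shuffle documentation (fill a buffer with buffer_size elements, then repeatedly emit a random element from the buffer and replace it with the next input element); the function buffer_shuffle is hypothetical and is not TensorFlow's implementation. Since only buffer_size elements are candidates at any moment, an element can never be emitted more than buffer_size - 1 positions earlier than its input position, which is why a truly uniform shuffle would need a buffer as large as the whole dataset.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffer_shuffle(items: Iterable[T], buffer_size: int, seed: int = 0) -> Iterator[T]:
    """Hypothetical model of a fixed-size shuffle buffer (illustration only)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buf: List[T] = []
    for item in items:
        if len(buf) < buffer_size:
            buf.append(item)  # still filling the buffer
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        i = rng.randrange(buffer_size)
        yield buf[i]
        buf[i] = item
    # Input exhausted: drain whatever is left in random order.
    rng.shuffle(buf)
    yield from buf


if __name__ == "__main__":
    out = list(buffer_shuffle(range(1000), buffer_size=10))
    # Element 500 can appear no earlier than output position 500 - 10 + 1 = 491.
    print(out.index(500))
```

In this model, a buffer smaller than the distance between related records cannot separate them much, which matches the advice to make B larger than a single shard.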