For testing transforms with S3, we use MinIO, which can be installed on Linux, macOS, and Windows. The steps here assume macOS; refer to the documentation above for other platforms.
The simplest way to install MinIO on macOS is with Homebrew. Use the following command:
brew install minio/stable/minio
In addition to the MinIO server, install the latest stable MinIO CLI (mc) using:
brew install minio/stable/mc
Now you can start the MinIO server using the following command:
minio server start
In this command, start is the directory in which MinIO keeps its data. Once the server is running, you can connect to the server UI at the following address: http://localhost:9000
The default user name and password are minioadmin and minioadmin.
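Before running the mc commands, it can help to confirm that the server is actually answering. The following is a minimal sketch, assuming curl is installed and the server listens on the address used in this guide; it polls MinIO's liveness endpoint (/minio/health/live). The function name wait_for_minio and the retry count are illustrative choices, not part of this project.

```shell
# Sketch: poll the MinIO liveness endpoint until the server answers.
# MINIO_URL matches the address used in this guide; adjust it if yours differs.
MINIO_URL="http://127.0.0.1:9000"

# wait_for_minio [retries]  (hypothetical helper; defaults to 30 attempts)
wait_for_minio() {
  retries="${1:-30}"
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    # curl -f turns HTTP error statuses into a non-zero exit code
    if curl -sf -o /dev/null "$MINIO_URL/minio/health/live"; then
      echo "MinIO is up at $MINIO_URL"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  echo "MinIO did not respond at $MINIO_URL" >&2
  return 1
}
```

After sourcing the snippet, calling wait_for_minio 10 waits up to roughly ten seconds and returns a non-zero status if the server never responds.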
You can populate the MinIO server with test data using mc. First, configure mc to work with the local
MinIO server:
mc alias set local http://127.0.0.1:9000 minioadmin minioadmin
This sets an mc alias named local that points to the local MinIO server instance. You can now use mc commands to populate the server.
First, test the connection to the newly added MinIO deployment using the mc admin info command:
mc admin info local
To copy the data to MinIO, you first need to create a bucket:
mc mb local/test
Once the bucket is created, you can copy the files (assuming you are in the transforms directory) using:
mc cp --recursive tools/ingest2parquet/test-data/input/ local/test/ingest2parquet/input
mc cp --recursive code/code_quality/test-data/input/ local/test/code_quality/input
mc cp --recursive code/proglang_select/test-data/input/ local/test/proglang_select/input
mc cp --recursive code/proglang_select/test-data/languages/ local/test/proglang_select/languages
mc cp --recursive code/malware/test-data/input/ local/test/malware/input
mc cp --recursive language/doc_quality/test-data/input/ local/test/doc_quality/input
mc cp --recursive language/lang_id/ray/test-data/input/ local/test/lang_id/input
mc cp --recursive universal/blocklist/test-data/input/ local/test/blocklist/input
mc cp --recursive universal/blocklist/test-data/domains/ local/test/blocklist/domains
mc cp --recursive universal/doc_id/test-data/input/ local/test/doc_id/input
mc cp --recursive universal/ededup/test-data/input/ local/test/ededup/input
mc cp --recursive universal/fdedup/test-data/input/ local/test/fdedup/input
mc cp --recursive universal/filter/test-data/input/ local/test/filter/input
mc cp --recursive universal/noop/test-data/input/ local/test/noop/input
mc cp --recursive universal/resize/test-data/input/ local/test/resize/input
mc cp --recursive universal/tokenization/test-data/ds01/input/ local/test/tokenization/ds01/input
mc cp --recursive universal/tokenization/test-data/ds02/input/ local/test/tokenization/ds02/input
Note that once the data is copied, MinIO stores it on the local file system, so you do not need to copy it again after a restart.
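The copy commands above all follow the same shape, so they can be driven from a list. The following is a rough sketch of that idea rather than a script shipped with the project: the function copy_test_data, the PAIRS list format, and the DRY_RUN switch are all hypothetical names introduced here. Only four of the source/destination pairs are shown; the remaining ones would be added in the same format.

```shell
# Sketch: run the mc cp commands from a list of source/destination pairs.
# TARGET_ALIAS and TARGET_BUCKET match the alias and bucket created earlier.
TARGET_ALIAS="local"
TARGET_BUCKET="test"

# One pair per line: "<source directory>|<destination path inside the bucket>"
PAIRS="
universal/noop/test-data/input/|noop/input
universal/filter/test-data/input/|filter/input
universal/blocklist/test-data/input/|blocklist/input
universal/blocklist/test-data/domains/|blocklist/domains
"

# copy_test_data: set DRY_RUN=1 to print the commands instead of running them
copy_test_data() {
  while IFS='|' read -r src dest; do
    [ -n "$src" ] || continue   # skip blank lines in PAIRS
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "mc cp --recursive $src $TARGET_ALIAS/$TARGET_BUCKET/$dest"
    else
      # stdin is redirected so mc cannot consume the remaining list entries
      mc cp --recursive "$src" "$TARGET_ALIAS/$TARGET_BUCKET/$dest" </dev/null || return 1
    fi
  done <<EOF
$PAIRS
EOF
}
```

Running it once with DRY_RUN=1 from the transforms directory prints the commands so they can be checked against the list above before any data is copied.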
The last step is to add MinIO access and secret keys for applications to use. The following command creates both keys:
mc admin user svcacct add --access-key "localminioaccesskey" --secret-key "localminiosecretkey" local minioadmin
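One way to check that the new keys work is to use them as the credentials of a second mc alias and list the test bucket with it. This is a minimal sketch under the assumption that the server, bucket, and keys are exactly as set up in this guide; the alias name localsvc and the function check_service_account are made up for this example.

```shell
# Sketch: verify the service account keys by authenticating with them.
# "localsvc" is a hypothetical alias name; the keys are the ones created above.
SVC_ALIAS="localsvc"
SVC_URL="http://127.0.0.1:9000"
SVC_ACCESS_KEY="localminioaccesskey"
SVC_SECRET_KEY="localminiosecretkey"

check_service_account() {
  # Register an alias that signs requests with the service account keys
  mc alias set "$SVC_ALIAS" "$SVC_URL" "$SVC_ACCESS_KEY" "$SVC_SECRET_KEY" || return 1
  # List the bucket through that alias; a failure here points to rejected keys
  mc ls "$SVC_ALIAS/test"
}
```

If check_service_account lists the directories copied earlier, applications configured with the same access key and secret key should be able to reach the bucket as well.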