Amazon Web Services (AWS)¶
First steps on AWS¶
Do you need to start working on AWS, but you don’t know where to begin? Well, this is the right page for you! Here we describe the first steps you need to follow to configure your AWS account according to our security standards, and you’ll also learn how to set up your local machine to connect to Nextbit AWS resources.
Create a new AWS user¶
As you might expect, you cannot create it yourself. Please ask FPagani to create it for you.
First log-in¶
To log in to the AWS Console, go to the AWS homepage, then click “Sign in to the console” at the top right of the page.
Fill in the form with the information FPagani gave you, then change the password to a personal one. Password policy:
Must be at least 8 characters long
Must include at least one uppercase letter
Must include at least one lowercase letter
Must include at least one number
Must include at least one non-alphanumeric character
Now you should be able to log into the AWS Console.
Enable Virtual MFA¶
To keep Nextbit’s data and resources inside AWS secure, you should immediately activate Multi-Factor Authentication (MFA) on your user. You’ll need:
your AWS credentials
your Nextbit smartphone with the Microsoft Authenticator app installed (any other authenticator app would work, but to keep all Nextbit MFA devices organized please use this one)
AWS provides this guide to easily set up a virtual MFA device. A short summary:
log in to the AWS Management Console
click on your username at the top of the page, then “My Security Credentials”
scroll to reach “Multi-factor authentication (MFA)”
click on “Assign MFA device” -> “Virtual MFA device”
open Microsoft Authenticator app on your Nextbit smartphone:
click on the three dots at the top right, then “Add account” -> “Other account”
scan the QR code you see on laptop screen
enter two consecutive MFA codes
click on the blue button “Activate Virtual MFA”
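For reference, the same setup can also be performed from the AWS CLI once it is configured (see below). A minimal sketch, with hypothetical user name, account ID and codes:
aws iam create-virtual-mfa-device --virtual-mfa-device-name your_user_name --outfile qr.png --bootstrap-method QRCodePNG
aws iam enable-mfa-device --user-name your_user_name --serial-number arn:aws:iam::123456789012:mfa/your_user_name --authentication-code1 123456 --authentication-code2 789012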
Generate Access Keys¶
Access keys are pairs of strings that enable programmatic access to AWS resources. The AWS documentation provides a page explaining how to create and manage access keys.
A short summary:
log in to the AWS Management Console
click on your username at the top of the page, then “My Security Credentials”
scroll to reach “Access keys for CLI, SDK, & API access”
click on “Create access key”, then “Download .csv file”
Store your access keys in a safe place.
Configure AWS CLI¶
To connect to AWS resources using a terminal, AWS provides a Command Line Interface (CLI). You can follow the official guide to install it.
Now you need to configure it, entering your Access Key ID and Secret Access Key. The official documentation explains how to do that in detail. Just note that, while access keys are required to properly configure the AWS CLI, the default region and output format are optional: you can skip them by simply pressing Enter.
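For reference, a typical aws configure session looks like this (the keys shown are the placeholder values from the AWS documentation, not real credentials):
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFtEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]:
Default output format [None]:
You can then verify that everything works with aws sts get-caller-identity, which prints the identity your requests are authenticated as.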
In case you need to configure multiple profiles on the same local machine, please check this page.
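A minimal sketch, assuming a hypothetical profile named second_profile:
aws configure --profile second_profile
Then pass the same flag to any command to use that profile, e.g.:
aws s3 ls --profile second_profile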
Upload files on AWS Simple Storage Service (S3)¶
AWS Simple Storage Service (S3) is an object storage service. For a first glance, see the official presentation page.
To take your first steps with AWS S3, we suggest you read the official guide, where you can find step-by-step instructions to (the CLI equivalents are sketched right after this list):
create a new bucket
upload objects
manage objects: download, move, delete
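If you prefer the terminal, the same operations can be performed with the AWS CLI. A minimal sketch, with hypothetical bucket and file names:
aws s3 mb s3://my-nextbit-bucket                  # create a new bucket
aws s3 cp my_file.txt s3://my-nextbit-bucket/     # upload an object
aws s3 ls s3://my-nextbit-bucket/                 # list objects
aws s3 cp s3://my-nextbit-bucket/my_file.txt .    # download an object
aws s3 mv s3://my-nextbit-bucket/my_file.txt s3://my-nextbit-bucket/old/my_file.txt  # move an object
aws s3 rm s3://my-nextbit-bucket/old/my_file.txt  # delete an object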
Storage classes¶
If you leave the defaults when uploading objects, AWS S3 stores your data in the STANDARD storage class. You may decide to change it to save some money, trading off availability or redundancy. Look at the comparison of the main storage classes for an official reference. We usually choose between STANDARD, STANDARD-IA and ONEZONE-IA. Never use GLACIER nor DEEP-ARCHIVE: they are special classes for long (truly long) retention periods and are not suitable for quick retrieval.
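The storage class can also be chosen at upload time from the CLI, for example (hypothetical bucket and file names):
aws s3 cp my_file.txt s3://my-nextbit-bucket/ --storage-class STANDARD_IA
Note that the CLI spells the classes with underscores (STANDARD_IA, ONEZONE_IA).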
Other advanced settings¶
The default settings are suitable for most cases. Be careful when modifying any of them: many (e.g. encryption, versioning, …) are charged separately. Enable them if and only if you really need them and you are aware of the pricing model.
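If in doubt, you can inspect a bucket’s current settings from the CLI, for example (hypothetical bucket name):
aws s3api get-bucket-versioning --bucket my-nextbit-bucket
The command returns an empty result if versioning has never been enabled on the bucket.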
Pricing¶
S3 storage is priced based on:
region: some regions are cheaper than others. Region itself is not charged, but it affects the following fees;
storage amount and duration: storage is charged per GB per month. Fractions are billed as well;
storage classes: each storage class has a different fee per GB;
requests: requests to your data are charged per thousand (LIST, PUT, COPY, …);
data transfer out of S3: fee per GB transferred to other AWS regions or to the Internet (e.g. to your local machine);
optional additional services: e.g. versioning, object tagging.
To learn more about how S3 pricing works, we suggest you visit the official S3 pricing page.
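As a back-of-the-envelope example (illustrative figures only, always check the pricing page for current rates): keeping 100 GB in the STANDARD class for one month costs roughly 100 × $0.023 ≈ $2.30, plus around $0.005 per thousand PUT requests and any data transfer out.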
Launch a Virtual Machine using AWS Elastic Compute Cloud (EC2)¶
AWS Elastic Compute Cloud (EC2) is the main compute service of AWS. It allows you to rent Virtual Machines on demand, paying only for the time they are running. Look at the AWS official presentation page.
To take your first steps with AWS EC2, we suggest you read the official guide, where you can find step-by-step instructions to:
launch a Linux-based instance
connect to your instance using an SSH client
If you downloaded a new key pair, remember to keep it in a safe place. It will be your only way to connect to your EC2 instance.
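If you prefer the CLI over the web console, launching an instance looks roughly like this sketch (the AMI ID is a hypothetical placeholder; the key pair and security group must already exist):
aws ec2 run-instances --image-id ami-0123456789abcdef0 --instance-type t3.micro --key-name your_key_name --security-groups VPN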
Connect to your instance using SSH¶
Once you have launched a Linux-based EC2 instance, you have to connect to it using SSH. From the EC2 Management console, select your brand new instance, click on the “Connect” gray button at the top of the page and follow the guide.
A short summary:
open a new terminal window
go to the key pair folder
change the permissions of the key pair (needed only the first time you use it):
chmod 400 your_key_name.pem
connect to EC2:
ssh -i your_key_name.pem user_name@public_dns_name
where user_name may be ec2-user (Amazon Linux AMI) or ubuntu (Nvidia Deep Learning AMI).
Your terminal is now connected to your instance’s terminal.
Note: to locate specific information about your EC2 instance, follow this link.
Security groups¶
To secure connections to our EC2 instances, we usually connect to them through the VPN. To this end, we have to set up a security group that allows inbound traffic only from our VPN. In some regions (e.g. North Virginia) this security group has already been created, while in others it is still missing.
If you need to create our “VPN” security group, add these inbound rules (a CLI sketch follows below):
Type: SSH; Protocol: TCP; Port: 22; Source: 35.204.112.170/32
Type: Custom TCP rule; Protocol: TCP; Port: 6006; Source: 35.204.112.170/32
The first rule is compulsory, since it allows SSH connections; the second one is optional: it opens the TensorBoard port, used to monitor deep learning models during training.
You can leave outbound rules as default.
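A minimal sketch of the same setup with the AWS CLI (the description is a hypothetical placeholder; run it in a region where the group is missing):
aws ec2 create-security-group --group-name VPN --description "Inbound traffic from the Nextbit VPN only"
aws ec2 authorize-security-group-ingress --group-name VPN --protocol tcp --port 22 --cidr 35.204.112.170/32
aws ec2 authorize-security-group-ingress --group-name VPN --protocol tcp --port 6006 --cidr 35.204.112.170/32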
Pricing¶
EC2 instances pricing is based on:
region: some regions are cheaper than others. Region itself is not charged, but it affects the following fees;
instance type: the more powerful the instance, the more it costs per unit of running time;
running time: fee per second (rarely per hour) while the instance is running. This fee doesn’t apply when the instance is stopped;
provisioned storage on EBS disks: root disks are charged based on the size you provisioned, even if they are empty. This fee also applies to stopped instances;
data transfer: fee per GB transferred to other AWS services (e.g. S3) or to the Internet (e.g. to your local machine);
any license required by the AMI: it increases the fee per running second (e.g. in this AMI).
Summing up, a few guidelines (CLI commands for stopping and terminating instances are sketched after this list):
choose a cheap region (like North Virginia), unless you have constraints on data residency or latency;
do not use powerful EC2 types simply to transfer data inside AWS: a t3.micro should be enough;
stop your instance once your training has finished and you have downloaded the results;
terminate your instance when you won’t need it for a long time (by default AWS will delete the root EBS volume on termination);
delete additional EBS disks once you don’t need them anymore.
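A sketch of the corresponding CLI commands (the instance ID is a hypothetical placeholder):
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
aws ec2 terminate-instances --instance-ids i-0123456789abcdef0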
Details on pricing can be found on the EC2 pricing page or the EBS pricing page.
Training a model on AWS¶
All deep learning projects can be trained and deployed on an NVIDIA GPU Cloud (NGC) optimized instance. Here we list the necessary steps to configure it on an AWS EC2 g4dn.xlarge instance with the NVIDIA Deep Learning AMI environment.
Log in via SSH following the instructions on the EC2 Management Dashboard.
Clone the repo my_gitlab_repo via HTTPS into the home directory, using a GitLab token as password.
Download the most recent PyTorch container running
docker pull nvcr.io/nvidia/pytorch:YY.MM-py3
(substitute “YY” and “MM” checking for monthly updates here).
Create a container running
docker run --gpus all --name my_amazing_container -e HOME=$HOME -e USER=$USER -v $HOME:$HOME -p 6006:6006 --shm-size 60G -it nvcr.io/nvidia/pytorch:YY.MM-py3
(substitute “YY” and “MM” consistently with the docker pull command above).
At the end of the procedure you will gain access to a terminal on a Docker container configured to work on the GPU. Check whether you are in the container $HOME directory or in a subdirectory: sometimes the Docker creation will redirect you to a directory named workspace. Now you can simply train your model leveraging the speed of parallel computing.
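As a quick sanity check inside the container (a sketch, assuming the NGC PyTorch image above), you can verify that the GPU is visible:
nvidia-smi
python -c "import torch; print(torch.cuda.is_available())"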
The $HOME directory of the Docker container is linked to the $HOME directory of the host machine, so the repository can be found in $HOME; similarly, port 6006 used by TensorBoard is remapped from the container to port 6006 of the host machine. If you need to mount a disk on your EC2 instance, you’ll also have to map that directory into the Docker container, adding -v mounted_dir:mounted_dir to the docker run command. For any reference about docker run options, please read this page.
Useful commands to interact with the Docker container are:
docker start my_amazing_container: start the container;
docker exec -it my_amazing_container bash: open a terminal on the container;
docker stop my_amazing_container: stop the container;
docker rm my_amazing_container: remove the container.
In order to monitor training you can run the following commands from the container console:
watch -n 1 nvidia-smi: monitor GPU usage;
tensorboard --logdir my_gitlab_repo/runs/<run_id> --bind_all: start TensorBoard.
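Since port 6006 is remapped to the host and open in the “VPN” security group, you should then be able to reach TensorBoard from your local machine (connected to the VPN) by browsing to http://public_dns_name:6006 (substitute the public DNS name of your instance).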