This is a personal collection of repetitive commands and snippets for ML projects.
Specify key manually in boto3
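A sketch, assuming placeholder credentials (boto3 normally reads them from ~/.aws/credentials or environment variables):

```python
import boto3

# Pass credentials explicitly instead of relying on the default chain.
# Both key values below are placeholders.
s3 = boto3.client(
    's3',
    aws_access_key_id='YOUR_ACCESS_KEY_ID',
    aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
)
```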
S3 operations on boto3
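A sketch of common calls; bucket and key names are placeholders:

```python
import boto3

s3 = boto3.client('s3')
# Upload a local file, download it back, and list objects in the bucket.
s3.upload_file('model.pkl', 'somebucket', 'models/model.pkl')
s3.download_file('somebucket', 'models/model.pkl', 'model.pkl')
response = s3.list_objects_v2(Bucket='somebucket')
```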
SNS operations on boto3
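A sketch of publishing to a topic; the region and topic ARN are placeholders:

```python
import boto3

sns = boto3.client('sns', region_name='us-east-1')
sns.publish(
    TopicArn='arn:aws:sns:us-east-1:123456789012:mytopic',
    Subject='Training finished',
    Message='Model training completed successfully.',
)
```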
AWS ML services
Enable static website hosting on S3
Enable hosting
aws s3 website s3://somebucket --index-document index.html
Go to Permissions > Public Access Settings > Edit and set these three options to false: Block new public bucket policies; Block public and cross-account access if bucket has public policies; Block new public ACLs and uploading public objects.
Navigate to Permissions > Bucket Policy and paste this policy.
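The policy itself isn't reproduced here; a standard public-read policy for static hosting looks like this (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadGetObject",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::somebucket/*"
    }
  ]
}
```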
Install OpenCV in conda
conda install -c conda-forge opencv
Update conda
conda update -n base -c defaults conda
Make binaries work on Mac
sudo xcode-select --install
conda install clang_osx-64 clangxx_osx-64 gfortran_osx-64
Create/Update conda environment from file
conda env create -f environment.yml
conda env update -f environment.yml
Install CUDA toolkit in conda
conda install cudatoolkit=9.2 -c pytorch
conda install cudatoolkit=10.0 -c pytorch
Disable auto-activation of conda environment
conda config --set auto_activate_base false
Disable multithreading in numpy
export OMP_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export OPENMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
Faster alternatives to Conda
| Docker Image | Remarks |
|---|---|
| micromamba-docker | Small binary version of mamba |
| condaforge/mambaforge | Docker image with conda-forge and mamba |
| condaforge/miniforge | Docker image with conda-forge as default channel |
Run celery workers
File tasks.py contains the Celery app object. With -P solo, concurrency is 1 and no extra threads or processes are used.
celery -A tasks.celery worker --loglevel=info -P solo
Start flower server to monitor celery
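The command isn't shown here; assuming the same tasks.py module as above, a typical invocation is:

```
pip install flower
celery -A tasks.celery flower --port=5555
```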
Use flower from docker-compose
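A minimal docker-compose sketch, assuming a broker service named redis and the community mher/flower image:

```yaml
services:
  flower:
    image: mher/flower
    environment:
      - CELERY_BROKER_URL=redis://redis:6379/0
    ports:
      - "5555:5555"
```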
Start docker-compose as daemon
docker-compose up --build -d
Use journald as logging driver
Edit /etc/docker/daemon.json
, add this json and restart.
{
"log-driver": "journald"
}
Send logs to CloudWatch
sudo nano /etc/docker/daemon.json
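The JSON to add isn't shown; a sketch using the awslogs driver, with the region and log group as placeholders:

```json
{
  "log-driver": "awslogs",
  "log-opts": {
    "awslogs-region": "us-east-1",
    "awslogs-group": "docker-logs",
    "awslogs-create-group": "true"
  }
}
```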
sudo systemctl daemon-reload
sudo service docker restart
Set environment variable globally in daemon
mkdir -p /etc/systemd/system/docker.service.d/
sudo nano /etc/systemd/system/docker.service.d/aws-credentials.conf
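The file content isn't shown; a sketch of the systemd drop-in with placeholder values:

```ini
[Service]
Environment="AWS_ACCESS_KEY_ID=your_access_key"
Environment="AWS_SECRET_ACCESS_KEY=your_secret_key"
```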
sudo systemctl daemon-reload
sudo service docker restart
Disable pip cache and version check
ENV PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1
Dockerfile for FastAPI
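The Dockerfile isn't reproduced here; a minimal sketch, assuming the app object lives in main.py:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Serve the FastAPI app object defined in main.py
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "80"]
```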
Return exit code in docker-compose
docker-compose up --abort-on-container-exit --exit-code-from worker
Change entrypoint of Dockerfile in compose
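A sketch; the service name and command are placeholders:

```yaml
services:
  worker:
    build: .
    # Overrides the ENTRYPOINT baked into the image
    entrypoint: ["python", "worker.py"]
```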
Use debugging mode
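The snippet isn't shown; one common pattern (used in the FastAPI debugging docs) is running uvicorn from __main__ so the app can be launched under a debugger:

```python
import uvicorn
from fastapi import FastAPI

app = FastAPI()

if __name__ == "__main__":
    # Run this file directly (e.g. under a debugger) instead of via the CLI.
    uvicorn.run(app, host="0.0.0.0", port=8000)
```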
Enable CORS
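A sketch using FastAPI's CORSMiddleware; tighten allow_origins in production:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)
```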
Raise HTTP Exception
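A sketch following the FastAPI docs; the items dict is placeholder data:

```python
from fastapi import FastAPI, HTTPException

app = FastAPI()
items = {"foo": "The Foo Wrestlers"}

@app.get("/items/{item_id}")
def read_item(item_id: str):
    if item_id not in items:
        raise HTTPException(status_code=404, detail="Item not found")
    return {"item": items[item_id]}
```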
Run FastAPI in Jupyter Notebook
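A sketch using nest_asyncio so uvicorn can run inside the notebook's already-running event loop:

```python
import nest_asyncio
import uvicorn
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
def home():
    return {"message": "hello"}

nest_asyncio.apply()
uvicorn.run(app, port=8000)
```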
Mock ML model in test case
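The original test isn't shown; a sketch with unittest.mock, stubbing out an expensive predict call:

```python
from unittest.mock import MagicMock

def test_prediction():
    # Stand-in for a heavyweight ML model; predict is stubbed.
    model = MagicMock()
    model.predict.return_value = [1]
    assert model.predict(['some text']) == [1]
```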
Test API in flask
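A sketch using Flask's built-in test client; app is a hypothetical module exposing the Flask application:

```python
from app import app

def test_home():
    client = app.test_client()
    response = client.get('/')
    assert response.status_code == 200
```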
Load model only once before first request
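A sketch with Flask's before_first_request hook (deprecated in Flask 2.3+); joblib and model.pkl are assumptions:

```python
import joblib
from flask import Flask

app = Flask(__name__)
model = None

@app.before_first_request
def load_model():
    # Runs once, before the first request is handled.
    global model
    model = joblib.load('model.pkl')
```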
Load binary format in Word2Vec
from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format('model.bin',
binary=True)
model.most_similar('apple')
Prevent git from asking for password
git config credential.helper 'cache --timeout=1800'
Whitelist in .gitignore
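A sketch of the whitelist pattern: ignore everything, then re-include what should stay tracked:

```
# Ignore everything...
*
# ...but descend into directories
!*/
# ...and keep these files
!.gitignore
!*.py
```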
Clone private repo using personal token
Create token from settings and run:
git clone https://<token>@github.com/amitness/example.git
Create alias to run command
# git test
git config --global alias.test "!python -m doctest"
Install Git LFS
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
Triggers for GitHub Action
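The examples aren't shown; a sketch of common on: triggers in a workflow file:

```yaml
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 0 * * *'  # daily at midnight UTC
  workflow_dispatch:      # manual trigger from the Actions tab
```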
Useful GitHub Actions
| Action | Remarks |
|---|---|
| scrape.yml | Scrape a webpage and save it to the repo |
Increase timeout
gunicorn --bind 0.0.0.0:5000 main:app --timeout 6000
Check error logs
tail -f /var/log/gunicorn/error_
Run two workers
gunicorn main:app --preload -w 2 -b 0.0.0.0:5000
Use pseudo-threads
If CPU cores = 1, then suggested concurrency = 2*1+1 = 3.
gunicorn main:app --worker-class=gevent --worker-connections=1000 --workers=3
Use multiple threads
If CPU cores = 4, then suggested concurrency = 2*4+1 = 9.
gunicorn main:app --workers=3 --threads=3
Use in-memory file system for heartbeat file
gunicorn --worker-tmp-dir /dev/shm
Add background task to add 2 numbers
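The snippet is missing here; a minimal sketch using Celery (an assumption, matching the Celery section above), with a Redis broker as placeholder:

```python
from celery import Celery

app = Celery('tasks', broker='redis://localhost:6379/0')

@app.task
def add(x, y):
    return x + y

# Enqueue from application code: add.delay(2, 3)
```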
Auto-import common libraries
Create a file start.py inside the startup folder in ~/.ipython/profile_default.
# start.py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Make auto-reload of modules by default
Add a file to the startup folder in ~/.ipython/profile_default with the content below.
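A sketch; the filename is an assumption, but the .ipy extension is needed for magics:

```python
# ~/.ipython/profile_default/startup/00-autoreload.ipy
%load_ext autoreload
%autoreload 2
```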
Auto print all expressions
Edit ~/.ipython/profile_default/ipython_config.py and add the line below.
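```python
c.InteractiveShell.ast_node_interactivity = 'all'
```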
Add conda kernel to jupyter
Activate conda environment and run below command.
pip install --user ipykernel
python -m ipykernel install --user --name=condaenv
Add R kernel to jupyter
conda install -c r r-irkernel
# Link to fix issue with readline
cd /lib/x86_64-linux-gnu/
sudo ln -s libreadline.so.7.0 libreadline.so.6
Start notebook on remote server
jupyter notebook --ip=0.0.0.0 --no-browser
Serve as voila app
voila --port=$PORT --no-browser app.ipynb
Enable widgets in jupyter lab
pip install jupyterlab
pip install ipywidgets
jupyter nbextension enable --py widgetsnbextension
jupyter labextension install @jupyter-widgets/jupyterlab-manager
Switch to language server in jupyter lab
pip install --pre jupyter-lsp
jupyter labextension install @krassowski/jupyterlab-lsp
pip install python-language-server[all]
Add kaggle credentials
pip install --upgrade kaggle kaggle-cli
mkdir -p ~/.kaggle
mv kaggle.json ~/.kaggle
chmod 600 ~/.kaggle/kaggle.json
Zip a folder
zip -r folder.zip folder
Use remote server as VPN
ssh -D 8888 -f -C -q -N [email protected]
SSH Tunneling for multiple ports (5555, 5556)
ssh -N -f -L localhost:5555:127.0.0.1:5555 -L localhost:5556:127.0.0.1:5556 [email protected]
Reverse SSH tunneling
Enable GatewayPorts=yes in /etc/ssh/sshd_config on the server.
ssh -NT -R example.com:5000:localhost:5000 [email protected] -i ~/.ssh/xyz.pem -o GatewayPorts=yes
Copy remote files to local
scp [email protected]:/mnt/file.zip .
Set correct permission for PEM file
chmod 400 credentials.pem
Clear DNS cache
sudo service network-manager restart
sudo service dns-clean
sudo systemctl restart dnsmasq
sudo iptables -F
Unzip .xz file
sudo apt-get install xz-utils
unxz ne.txt.xz
Disable password-based login on server
Edit /etc/ssh/sshd_config and set PasswordAuthentication to no.
sudo nano /etc/ssh/sshd_config
Auto-generate help for make files
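The recipe isn't shown; the widely used pattern parses ## comments next to each target (recipe lines must start with a tab):

```makefile
.PHONY: help
help:  ## Show this help message
	@grep -E '^[a-zA-Z_-]+:.*?## ' $(MAKEFILE_LIST) | \
		awk 'BEGIN {FS = ":.*?## "}; {printf "%-20s %s\n", $$1, $$2}'
```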
Rebind prefix for tmux
Edit ~/.tmux.conf with the content below and reload it by running tmux source-file ~/.tmux.conf.
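A sketch of the standard rebind from C-b to C-a:

```
# Use Ctrl-a instead of the default Ctrl-b prefix
unbind C-b
set-option -g prefix C-a
bind-key C-a send-prefix
```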
Clear DNS cache
sudo systemd-resolve --flush-caches
Reset GPU
sudo rmmod nvidia_uvm
sudo modprobe nvidia_uvm
Add comparison of code blocks side by side
Solution
Assign path to port
location /demo/ {
proxy_pass http://localhost:5000/;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
}
Increase timeout for nginx
The default timeout is 60s. Edit the proxy params and raise the timeouts as sketched below.
sudo nano /etc/nginx/proxy_params
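The directives aren't shown; a sketch raising the common proxy timeouts to 600s:

```nginx
proxy_connect_timeout 600;
proxy_send_timeout 600;
proxy_read_timeout 600;
send_timeout 600;
```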
Setup nginx for prodigy
Get list of all POS tags
import nltk
nltk.download('tagsets')
nltk.help.upenn_tagset()
Upgrade to latest node version
npm cache clean -f
npm install -g n
n stable
Save with quoted strings
import csv

df.to_csv('data.csv',
          index=False,
          quotechar='"',
          quoting=csv.QUOTE_NONNUMERIC)
Import database dump
If the database name is test and the user is postgres:
pg_restore -U postgres -d test < example.dump
Add keyboard shortcut for custom command
Link
Enable pytest as default test runner
Allow camel case field name from frontend
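The snippet isn't shown; a sketch with a pydantic v1-style alias generator so camelCase JSON maps onto snake_case fields:

```python
from pydantic import BaseModel

def to_camel(string: str) -> str:
    first, *rest = string.split('_')
    return first + ''.join(word.capitalize() for word in rest)

class User(BaseModel):
    first_name: str

    class Config:
        alias_generator = to_camel

# Accepts {"firstName": "Amit"} from the frontend
user = User(**{"firstName": "Amit"})
```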
Validate fields
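A sketch using a pydantic v1 validator:

```python
from pydantic import BaseModel, validator

class User(BaseModel):
    age: int

    @validator('age')
    def age_must_be_positive(cls, v):
        if v <= 0:
            raise ValueError('age must be positive')
        return v
```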
Install build utilities
sudo apt update
sudo apt install build-essential python3-dev
sudo apt install python-pip virtualenv
Install mysqlclient
sudo apt install libmysqlclient-dev mysql-server
pip install mysqlclient
Get memory usage of python script
import os
import psutil
process = psutil.Process(os.getpid())
print(process.memory_info().rss)  # resident memory in bytes
Convert python package to command line tool
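A sketch of the entry point in setup.py; the package, module, and command names are placeholders:

```python
from setuptools import setup, find_packages

setup(
    name='example-package',
    packages=find_packages(),
    entry_points={
        # Installs an `example` command that calls example_package.cli:main
        'console_scripts': ['example=example_package.cli:main'],
    },
)
```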
Install package from TestPyPi
pip install --index-url https://test.pypi.org/simple \
    --extra-index-url https://pypi.org/simple \
    example-package
Test multiple python versions using tox
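A minimal tox.ini sketch, assuming pytest-based tests:

```ini
[tox]
envlist = py37,py38,py39

[testenv]
deps = pytest
commands = pytest
```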
Flake8: Exclude certain checks
Place setup.cfg alongside setup.py with the content below.
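A sketch that skips a couple of common checks (the codes are examples):

```ini
[flake8]
# E501: line too long, W503: line break before binary operator
ignore = E501, W503
exclude = .git,__pycache__,build
```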
Send email with SMTP
Enable less secure app access in Gmail settings, then send as sketched below.
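The snippet isn't reproduced; a standard-library sketch with placeholder addresses and an app password:

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg['Subject'] = 'Training finished'
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'
msg.set_content('The model finished training.')

# Gmail SMTP over SSL; credentials are placeholders.
with smtplib.SMTP_SSL('smtp.gmail.com', 465) as server:
    server.login('[email protected]', 'app-password')
    server.send_message(msg)
```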
Run selenium on chromium
sudo apt update
sudo apt install chromium-chromedriver
cp /usr/lib/chromium-browser/chromedriver /usr/bin
pip install selenium
Generate fake user agent in selenium
Run pip install fake_useragent, then set a random user agent as sketched below.
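```python
from fake_useragent import UserAgent
from selenium import webdriver

ua = UserAgent()
options = webdriver.ChromeOptions()
# Pick a random real-world user agent string.
options.add_argument(f'user-agent={ua.random}')
driver = webdriver.Chrome(options=options)
```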
Install CPU-only version of PyTorch
conda install pytorch torchvision cpuonly -c pytorch
Auto-select proper pytorch version based on GPU
pip install light-the-torch
ltt install torch torchvision
Set random seed
import random

import numpy as np
import torch

seed = 42
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
Create custom transformation
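The example isn't shown; a sketch of a callable transform that plugs into torchvision.transforms.Compose (the noise transform itself is an illustration):

```python
import torch

class AddGaussianNoise:
    """Add Gaussian noise to a tensor image."""

    def __init__(self, mean=0.0, std=0.1):
        self.mean = mean
        self.std = std

    def __call__(self, tensor):
        return tensor + torch.randn_like(tensor) * self.std + self.mean
```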
Disable warnings in pytest
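The config isn't shown; a pytest.ini sketch:

```ini
[pytest]
filterwarnings = ignore
# or disable the warnings plugin entirely:
# addopts = -p no:warnings
```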
Use model checkpoint callback
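The snippet isn't shown; assuming PyTorch Lightning, a sketch:

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

# Keep the single best checkpoint by validation loss.
checkpoint_cb = ModelCheckpoint(monitor='val_loss', mode='min', save_top_k=1)
trainer = pl.Trainer(callbacks=[checkpoint_cb])
```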
Connect to redis from commandline
redis-cli -h 1.1.1.1 -p 6380 -a password
Connect to local redis
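With no arguments, redis-cli connects to 127.0.0.1:6379:

```
redis-cli
```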
Use URL for redis
If password is present:
redis://:{password}@{hostname}:{port}/{db_number}
else
redis://{hostname}:{port}/{db_number}
Add password to redis server
Edit the /etc/redis/redis.conf file.
sudo nano /etc/redis/redis.conf
Uncomment this line and set password.
# requirepass yourpassword
Restart redis server.
sudo service redis-server restart
Enable cluster mode locally
Edit the /etc/redis/redis.conf file.
sudo nano /etc/redis/redis.conf
Uncomment this line and save the file.
# cluster-enabled yes
Restart the redis server.
sudo service redis-server restart
Post JSON data to endpoint
import requests
headers = {'Content-Type': 'application/json'}
data = {}
response = requests.post('http://example.com',
json=data,
headers=headers)
Use random user agent in requests
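A sketch combining requests with fake_useragent:

```python
import requests
from fake_useragent import UserAgent

headers = {'User-Agent': UserAgent().random}
response = requests.get('http://example.com', headers=headers)
```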
Use rate limit and backoff in API
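The snippet isn't shown; a sketch using urllib3's Retry with exponential backoff mounted on a requests session:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 5 times on rate limits / server errors, backing off exponentially.
retry = Retry(total=5, backoff_factor=1,
              status_forcelist=[429, 500, 502, 503, 504])
session = requests.Session()
session.mount('https://', HTTPAdapter(max_retries=retry))
response = session.get('https://example.com')
```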
Add server alias to SSH config
Add to ~/.ssh/config
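The entry isn't shown; a sketch with placeholder host details:

```
Host myserver
    HostName 1.2.3.4
    User ubuntu
    IdentityFile ~/.ssh/xyz.pem
```

Then connect with ssh myserver.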
Disable CORS
Create ~/.streamlit/config.toml
[server]
enableCORS = false
File Uploader
file = st.file_uploader("Upload file",
type=['csv', 'xlsx'],
encoding='latin-1')
df = pd.read_csv(file)
Create download link for CSV file
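The snippet isn't reproduced; the common base64 pattern for older Streamlit versions (newer ones ship st.download_button):

```python
import base64

import pandas as pd
import streamlit as st

def csv_download_link(df: pd.DataFrame) -> str:
    csv = df.to_csv(index=False)
    b64 = base64.b64encode(csv.encode()).decode()
    return f'<a href="data:file/csv;base64,{b64}" download="data.csv">Download CSV</a>'

df = pd.DataFrame({'a': [1, 2]})
st.markdown(csv_download_link(df), unsafe_allow_html=True)
```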
Run on docker
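A minimal Dockerfile sketch, assuming the app is app.py:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port", "8501"]
```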
Docker compose for streamlit
Add Dockerfile to the app folder.
Add project.conf to the nginx folder.
Add Dockerfile to the nginx folder.
Add docker-compose.yml at the root.
Run on heroku
Add requirements.txt, then create Procfile and setup.sh as sketched below.
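The file contents aren't shown; a typical Procfile sketch (setup.sh usually writes ~/.streamlit/config.toml pointing the server at $PORT):

```
web: sh setup.sh && streamlit run app.py
```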
Deploy streamlit on google cloud
Create Dockerfile, app.yaml and run:
gcloud config set project your_projectname
gcloud app deploy
Render SVG
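The helper isn't shown; the usual trick renders the SVG through a base64 img tag:

```python
import base64

import streamlit as st

def render_svg(svg: str) -> None:
    b64 = base64.b64encode(svg.encode()).decode()
    st.markdown(f'<img src="data:image/svg+xml;base64,{b64}"/>', unsafe_allow_html=True)

render_svg('<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100"><circle cx="50" cy="50" r="40"/></svg>')
```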
Install CPU-only version of Tensorflow
conda install tensorflow-mkl
or
pip install tensorflow-cpu==2.1.0
Install custom builds for CPU
Find link from https://github.com/lakshayg/tensorflow-build
pip install --ignore-installed --upgrade "url"
Install with GPU support
conda create --name tensorflow-22 \
tensorflow-gpu=2.2 \
cudatoolkit=10.1 \
cudnn=7.6 \
python=3.8 \
pip=20.0
Use only single GPU
export CUDA_VISIBLE_DEVICES=0
Allocate memory as needed
export TF_FORCE_GPU_ALLOW_GROWTH='true'
Enable XLA
import tensorflow as tf
tf.config.optimizer.set_jit(True)
Load saved model with custom layer
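A sketch; CustomLayer stands in for whatever layer the model was trained with:

```python
import tensorflow as tf

class CustomLayer(tf.keras.layers.Layer):
    """Placeholder for the custom layer used at training time."""
    def call(self, inputs):
        return inputs

model = tf.keras.models.load_model(
    'model.h5',
    custom_objects={'CustomLayer': CustomLayer},
)
```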
Ensure Conda doesn’t cause tensorflow issue
Upload tensorboard data to cloud
tensorboard dev upload --logdir ./logs \
--name "XYZ" \
--description "some model"
Use TPU in Keras
TPU survival guide on Google Colaboratory
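A TF 2.3+ sketch for Colab TPUs; create_model is a placeholder for your model-building function:

```python
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='')
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    model = create_model()
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
```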
Use universal sentence encoder
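A sketch with tensorflow_hub:

```python
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
embeddings = embed(["The quick brown fox.", "Hello world."])
print(embeddings.shape)  # (2, 512)
```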
Backtranslate a text
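The snippet isn't shown; one way (an assumption about the approach) is round-tripping through MarianMT models from Hugging Face:

```python
from transformers import MarianMTModel, MarianTokenizer

def translate(texts, model_name):
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(texts, return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)

# English -> French -> English
french = translate(["How are you?"], "Helsinki-NLP/opus-mt-en-fr")
back = translate(french, "Helsinki-NLP/opus-mt-fr-en")
```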