This page contains useful libraries I’ve found when working on Data Science projects.
The libraries are organized below by phases of a typical Data Science project.
Phase: Data
Data Annotation
| Category | Tool | Remarks |
|---|---|---|
| Audio | audio-annotator, audiono | |
| General | superintendent, pigeon | Annotate in notebooks |
| | labelstudio | Open Source Data Labeling Tool |
| | awesome-data-labeling | |
| Image | makesense.ai, labelimg, via, cvat | |
| Text | doccano, brat | |
| | chatito | Generate text datasets using DSL |
| | prodigy | Paid |
| Inter-rater agreement | disagree | |
| | simpledorff | Krippendorff’s Alpha |
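Krippendorff's alpha, the statistic computed by simpledorff, compares observed disagreement between raters with the disagreement expected by chance (alpha = 1 - D_o / D_e). Below is a minimal pure-Python sketch of the nominal-label variant. It is not simpledorff's API: the function name and the input layout (one list of labels per item, with missing ratings simply left out) are assumptions for illustration.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Nominal Krippendorff's alpha (hypothetical helper).

    `units` is a list of items; each item is the list of labels assigned by
    the raters who coded it (missing ratings are simply omitted).
    """
    # Coincidence matrix: every ordered pair of labels from different raters
    # on the same item contributes 1 / (m - 1), where m = ratings on that item.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # an item with a single rating cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c per label and grand total n of pairable values.
    totals = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        return float("nan")  # only one label ever used: alpha is undefined
    return 1.0 - (n - 1) * observed / expected


# Three items rated by two raters; they disagree on the first item only.
print(krippendorff_alpha_nominal([["a", "b"], ["a", "a"], ["b", "b"]]))
```

For this small example alpha works out to 4/9 (about 0.44); perfect agreement gives 1.0, and systematic disagreement drives it below 0.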
Data Collection
Importing Data
| Category | Tool | Remarks |
|---|---|---|
| Prebuilt | openml, lineflow | |
| | rs_datasets | Recommendation Datasets |
| | nlp | Python interface to NLP datasets |
| | tensorflow_datasets | Access datasets in TensorFlow |
| | hub | Prebuilt datasets for PyTorch and TensorFlow |
| | pydataset | |
| | ir_datasets | Information Retrieval Datasets |
| App Store | google-play-scraper | |
| Arxiv | pyarxiv | Programmatic access to arxiv.org |
| Audio | pydub | |
| Crawling | MechanicalSoup, libextract | |
| | pyppeteer | Chrome Automation |
| | trafilatura | Extract text sections from HTML |
| | justext | Remove boilerplate from scraped HTML |
| | hext | DSL for extracting data from HTML |
| | ratelimit | API rate limit decorator |
| | backoff | Exponential backoff and jitter |
| | asks | Async version of requests |
| | requests-cache | Cached version of requests |
| | html2text | Convert HTML to markdown-formatted plain text |
| Database | blaze | Pandas and Numpy interface to databases |
| | talon | |
| Excel | openpyxl | |
| Google Drive | gdown, pydrive | |
| Google Maps | geo-heatmap | |
| Google Search | googlesearch | Parse Google search results |
| Google Sheets | gspread | |
| Google Ngrams | google-ngram-downloader | |
| HTML | python-readability, html-text | HTML to Text |
| Image | py-image-dataset-generator, idt, jmd-imagescraper | Automatically fetch images from the web for a search query |
| Video | moviepy | Edit Videos |
| | pytube | Download YouTube videos |
| Lyrics | lyricsgenius | |
| Machine Translation Corpus | mtdata | |
| News | news-please, news-catcher | Scrape News |
| | pygooglenews | Google News |
| Network Packet | dpkt, scapy | |
| PDF | camelot, tabula-py, parsr, pdftotext, pdfplumber, pymupdf | |
| | grobid | Parse PDF into structured XML |
| | PyPDF2 | Read and write PDF in Python |
| | pdf2image | Convert PDF to image |
| Remote file | smart_open | |
| Text to Speech | gtts | |
| Twitter | twint, tweepy, twarc | Scrape Twitter |
| Wikipedia | wikipedia, wikitextparser | Access data from Wikipedia |
| | wikitables | Import tables from Wikipedia articles |
| Wikidata | wikidata | Python API to Wikidata |
| XML | xmltodict | Parse XML as Python dictionary |
| YouTube | scrapetube | Scrape video metadata from channel |
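The ratelimit and backoff entries above package a common crawling pattern: retry a failing call, waiting exponentially longer between attempts, and add random jitter so many clients do not retry in lockstep. Here is a minimal stdlib-only sketch of that pattern, assuming "full jitter" (a uniform random wait between 0 and the capped exponential delay). `retry_with_backoff` and its parameters are hypothetical names, not the backoff library's API.

```python
import random
import time
from functools import wraps


def retry_with_backoff(max_tries=5, base=0.5, cap=8.0, exceptions=(Exception,), sleep=time.sleep):
    """Hypothetical decorator: retry with capped exponential backoff and full jitter."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries - 1:
                        raise  # out of attempts: re-raise the last error
                    # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)].
                    sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
        return wrapper
    return decorator


# Usage: a call that fails twice with a transient error, then succeeds.
attempts = []


@retry_with_backoff(max_tries=4, base=0.1, exceptions=(ConnectionError,))
def flaky_fetch():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"


print(flaky_fetch(), len(attempts))  # ok 3
```

The `sleep` parameter is injectable so tests can record waits instead of actually sleeping. The backoff library itself exposes this pattern as decorators such as `backoff.on_exception(backoff.expo, ...)`.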
Data Augmentation
| Category | Tool | Remarks |
|---|---|---|
| Audio | audiomentations, muda | |
| Image | imgaug, albumentations, augmentor, solt | |
| | deepaugment | Automatic augmentation |
| | TextRecognitionDataGenerator, genalog | OCR |
| Tabular data | deltapy | |
| | mockaroo | Generate synthetic user details |
| Text | nlpaug, noisemix, textattack, textaugment, niacin, SeaQuBe, DataAug4NLP, NL-Augmenter | |
| | fastent | Expand NER entity list |
Phase: Exploration
Data Preparation
| Category | Tool | Remarks |
|---|---|---|
| Class Imbalance | imblearn | |
| Categorical encoding | category_encoders | |
| | dirty_cat | Encode categories with typos |
| Dataframe | cudf | Pandas on GPU |
| Data Validation | pandera, pandas-profiling | Pandas |
| Data Cleaning | pyjanitor | Janitor ported to Python |
| Graph Sampling | little ball of fur | |
| Missing values | missingno | |
| Parallelize | pandarallel, swifter, modin | Parallelize pandas |
| | vaex | Pandas on huge data |
| | numba | Parallelize numpy |
| Parsing | pyparsing, parse | |
| Split images into train/validation/test | split-folders | |
| Submodular Optimization | twinning, apricot | |
| Weak Supervision | snorkel | |
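The simplest remedy for class imbalance is random oversampling: duplicate randomly chosen rows of the smaller classes until every class is as large as the biggest one. imblearn provides this as `RandomOverSampler` (used via `fit_resample`); the sketch below is a hypothetical pure-Python helper illustrating the idea, not that class.

```python
import random
from collections import Counter, defaultdict


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)  # seeded for reproducible resampling
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    target = max(len(rows) for rows in rows_by_class.values())
    X_out, y_out = [], []
    for label, rows in rows_by_class.items():
        # Sample with replacement to make up the shortfall for this class.
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


X = [[0.1], [0.4], [0.35], [0.8], [0.9]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # Counter({0: 4, 1: 4})
```

Oversampling should be applied only to the training split; duplicating rows before splitting leaks copies of the same sample into the validation set.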
Data Exploration
| Category | Tool | Remarks |
|---|---|---|
| Explore Data | sweetviz, dataprep, quickda, vizidata | Generate quick visualizations of data |
| | ipyplot | Plot images |
| Notebook Tools | nbdime | Diff and merge Jupyter notebooks |
| | papermill | Parametrize notebooks |
| | nbformat | Access notebooks programmatically |
| | nbconvert | Convert notebooks to other formats |
| | ipyleaflet | Maps in notebooks |
| | ipycanvas | Draw diagrams in notebooks |
| | fastdoc | Convert notebook to PDF book |
| Relationship | ppscore | Predictive Power Score |
| | pdpbox | Partial Dependence Plot |
Feature Generation
| Category | Tool | Remarks |
|---|---|---|
| Automatic feature engineering | featuretools, autopandas | |
| | tsfresh | Automatic feature engineering for time series |
| DAG based dataset generation | DFFML | |
| Dimensionality reduction | fbpca, fitsne, trimap | |
| Metric learning | metric-learn, pytorch-metric-learning | |
| Time series | python-holidays | List of holidays |
| | skits | Transformations for time-series data |
| | catch22 | Pre-built features for time-series data |
Phase: Modeling
Model Selection
| Category | Tool | Remarks |
|---|---|---|
| Project Structure | cookiecutter-data-science | |
| Find SOTA models | sotawhat, papers-with-code, codalab, nlpprogress, evalai, collectiveknowledge, sotabench | Benchmarks |
| | bert-related-papers | BERT Papers |
| | survey-papers | Collection of survey papers |
| Pretrained models | modeldepot, pytorch-hub | General |
| | pretrained-models.pytorch, pytorchcv | Pre-trained ConvNets |
| | pytorch-image-models | 200+ pretrained ConvNet backbones |
| | huggingface-models, huggingface-pretrained | Transformer Models |
| | awesome-models | Pretrained CoreML models |
| | huggingface-languages | Multi-lingual Models |
| | model-forge, The Super Duper NLP Repo | Pre-trained NLP models by use case |
| AutoML | auto-sklearn, mljar-supervised, automl-gs, pycaret, evalml | |
| | lazypredict | Run all sklearn models at once |
| | tpot | Genetic AutoML |
| | autocat | Auto-generate text classification models in spaCy |
| | mindsdb, ludwig | Autogenerate ML code |
| Active Learning | modal | |
| Anomaly detection | adtk | |
| Contrastive Learning | contrastive-learner | |
| Deep Clustering | deep-clustering-toolbox | |
| Few Shot Learning | keras-fewshotlearning | |
| Fuzzy Learning | fylearn, scikit-fuzzy | |
| Genetic Programming | gplearn | |
| Gradient Boosting | catboost, xgboost, ngboost | |
| | lightgbm, thundergbm | GPU Capable |
| Graph Neural Networks | spektral | GNN for Keras |
| Graph Embedding and Community Detection | karateclub, python-louvain, communities | |
| Hidden Markov Models | hmmlearn | |
| Interpretable Models | imodels | Models that show rules |
| Multi-view Learning | mvlearn | |
| Noisy Label Learning | cleanlab | |
| Optimization | nevergrad | Gradient Free Optimization |
| | cvxpy | Convex Optimization |
| Optimal Transport | pot, geomloss | |
| Probabilistic modeling | pomegranate, pymc3 | |
| Rule based classifier | sklearn-expertsys | |
| Self-Supervised Learning | lightly, vissl, solo-learn | Implementations of SSL models |
| | self_supervised | Self-supervised models in Fast.AI |
| Spiking Neural Network | norse | |
| Support Vector Machines | thundersvm | Run SVM on GPU |
| Survival Analysis | lifelines | |
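Survival analysis libraries such as lifelines (via `KaplanMeierFitter`) start from the Kaplan-Meier estimator: at each time an event occurs, survival is multiplied by the fraction of at-risk subjects who did not experience it, while censored subjects simply leave the risk set. The sketch below is a hypothetical pure-Python version of that estimator; it omits confidence intervals and is not lifelines' API.

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate (hypothetical helper).

    durations: time until the event or until censoring, per subject.
    observed:  1/True if the event happened, 0/False if the subject was censored.
    Returns [(time, survival_probability)] at each distinct event time.
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        events = leaving = 0
        # Group every subject whose duration equals t.
        while i < len(records) and records[i][0] == t:
            events += int(records[i][1])
            leaving += 1
            i += 1
        if events:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # both events and censored subjects leave the risk set
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In this example the estimated survival drops to roughly 0.8, 0.6 and 0.3 at times 1, 2 and 3; the subject censored at time 4 never produces a step.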
Frameworks
| Category | Tool | Remarks |
|---|---|---|
| Addons | mlxtend | Extra utilities not present in frameworks |
| | tensor-sensor | Visualize tensors |
| PyTorch | pytorch-summary | Keras-like summary |
| | torchtyping, tsalib | Type annotation for tensors |
| | einops | Einstein Notation |
| | kornia | Computer Vision Methods |
| | nonechucks | Drop corrupt data automatically in DataLoader |
| | pytorch-optimizer | Collection of optimizers |
| | pytorch-block-sparse | Sparse matrix replacement for nn.Linear |
| | pytorch-forecasting | Time series forecasting in PyTorch Lightning |
| | pytorch-lightning | Lightweight wrapper for PyTorch |
| | skorch | Wrap PyTorch in scikit-learn compatible API |
| | torchcontrib | SOTA Building Blocks in PyTorch |
| | bitsandbytes | 8-bit optimizers for PyTorch |
| Scikit-learn | scikit-lego, iterative-stratification | |
| | iterstrat | Cross-validation for multi-label data |
| | scikit-multilearn | Multi-label classification |
| | tscv | Time-series cross-validation |
| Sparsification | sparseml | Apply sparsification to any framework |
| TensorFlow | tensorflow-addons | |
| | keras-radam | RAdam optimizer |
| | ktrain | fastai-like interface for Keras |
| | larq | Binarized neural networks |
| | scikeras | Scikit-learn Wrapper for Keras |
| | tavolo | Kaggle Tricks as Keras Layers |
| | tensorflow-text | Addons for NLP |
| | tensorflow-wheels | Optimized wheels for TensorFlow |
| | tf-sha-rnn | |
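Time-series cross-validation, as provided by tscv (and by scikit-learn's `TimeSeriesSplit`), never trains on data that comes after the test fold. A minimal sketch of an expanding-window splitter, assuming equally sized test folds placed at the end of the series and an optional `gap` that drops samples just before each test fold to limit leakage; `time_series_splits` is a hypothetical name, not either library's API.

```python
def time_series_splits(n_samples, n_splits, gap=0):
    """Yield (train_indices, test_indices) pairs for expanding-window CV.

    Each test fold lies strictly after its training window; `gap` drops that
    many samples just before the test fold (hypothetical helper).
    """
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    test_size = n_samples // (n_splits + 1)
    if test_size < 1:
        raise ValueError("not enough samples for this many splits")
    first_test_start = n_samples - n_splits * test_size
    for test_start in range(first_test_start, n_samples, test_size):
        train_end = test_start - gap
        if train_end <= 0:
            raise ValueError("gap leaves no training data")
        yield list(range(train_end)), list(range(test_start, test_start + test_size))


for train, test in time_series_splits(10, n_splits=3):
    print(train[-1], test)
```

With 10 samples and 3 splits this prints `3 [4, 5]`, `5 [6, 7]` and `7 [8, 9]`: the training window grows while each test fold stays in the future.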
Natural Language Processing
| Category | Tool | Remarks |
|---|---|---|
| Libraries | spacy, nltk, corenlp, deeppavlov, kashgari, transformers, ernie, stanza, nlp-architect, spark-nlp, pytext, FARM | |
| | headliner, txt2txt | Sequence to sequence models |
| | Nvidia NeMo | Toolkit for ASR, NLP and TTS |
| | nlu | 1-line models for NLP |
| | pyconverse | Conversational Text Analysis |
| | booknlp | NLP for Books |
| | fast-bert, simpletransformers | Wrappers |
| | finetune | Scikit-learn like API for transformers |
| | compromise | JavaScript NLP |
| CPU-optimizations | turbo_transformers, onnx_transformers | |
| | fastT5 | Generate optimized T5 model |
| Preprocessing | textacy, texthero, textpipe, nlpretext | |
| | JamSpell, pyhunspell, pyspellchecker, cython_hunspell, hunspell-dictionaries, autocorrect (can add more languages), symspellpy, spello (train your own spelling correction), contextualSpellCheck, neuspell, nlprule, spylls | Spelling Correction |
| | gramformer | Grammar Checker |
| | language-tool-python, gingerit | Grammatical Error Correction |
| | ekphrasis | Pre-processing for social media texts |
| | editop | Compute edit-operations for text normalization |
| | contractions, pycontractions | Contraction Mapping |
| | truecase | Fix casing |
| | nnsplit, deepsegment, sentence-doctor, pysbd, sentence-splitter | Sentence Segmentation |
| | wordninja | Probabilistic Word Segmentation |
| | punctuator2 | Punctuation Restoration |
| | stopwords-iso | Stopwords for all languages |
| | language-check, langdetect, polyglot, pycld2, cld2, cld3, langid, lumi_language_id | Language Identification |
| | langcodes | Get language from language code |
| | neuralcoref | Coreference Resolution |
| | inflect, lemminflect, pyinflect | Inflections |
| | scrubadub | PII removal |
| | ftfy, clean-text, text-unidecode | Fix Unicode Issues |
| | fastpunct | Punctuation Restoration |
| | pyphen | Hyphenate words into syllables |
| | pypostal, mordecai, usaddress, libpostal | Parse Street Addresses |
| | geopy, geocoder, nominatim, pelias, photon, lieu | Geocoding |
| | probablepeople, python-nameparser | Parse person names |
| | python-phonenumbers | Parse phone numbers |
| | numerizer, word2number | Parse natural-language numbers |
| | dateparser | Parse natural dates |
| | ctparse | Parse natural-language time |
| | daterangeparser | Parse date ranges in natural language |
| | emoji | Handle emoji |
| | pyarabic | multilingual |
| Tokenization | sentencepiece, youtokentome, subword-nmt | |
| | sacremoses | Rule-based |
| | jieba, pkuseg | Chinese Word Segmentation |
| | kytea | Japanese word segmentation |
| Clustering | kmodes, star-clustering, genieclust | |
| | spherecluster | K-means with cosine distance |
| | sib | Sequential Information Bottleneck |
| | kneed | Automatically find number of clusters from elbow curve |
| | OptimalCluster | Automatically find optimal number of clusters |
| | gsdmm | Short-text clustering |
| Code Switching | codeswitch | |
| Constituency Parsing | benepar, allennlp, chunk-english-fast | |
| Compact Models | mobilebert, distilbert, tinybert, BERT-of-Theseus-MNLI, MiniLM | |
| Cross-lingual Embeddings | muse, laserembeddings, xlm, LaBSE | |
| | transvec, vecmap | Train mapping between monolingual embeddings |
| | MuRIL | Embeddings for 17 Indic languages with transliteration |
| | BPEmb | Subword Embeddings in 275 Languages |
| | piecelearn | Train own sub-word embeddings |
| Dictionary | vocabulary | |
| Domain-specific | codebert | Code |
| | clinicalbert-mimicnotes, clinicalbert-discharge-summary | Clinical Domain |
| | twitter-roberta-base | |
| | scispacy | Biomedical data |
| | blackstone | Legal text |
| Entity Linking | dbpedia-spotlight, GENRE | |
| Entity Matching | py_entitymatching, deepmatcher | |
| Embeddings | InferSent, embedding-as-service, bert-as-service, sent2vec, sense2vec, glove-python, fse | |
| | counterix | Train custom Count-based DSM |
| | embeddix | Convert word vectors format |
| | wiki2vec | Word2Vec trained on DBPedia Entities |
| | chars2vec | Character embeddings for handling typos and slang |
| | rank_bm25, BM25Transformer | BM25 |
| | sentence-transformers, DeCLUTR | BERT sentence embeddings |
| | conceptnet-numberbatch | Word embeddings trained with common-sense knowledge graph |
| | word2vec-twitter | Word2vec trained on Twitter |
| | pymagnitude | Access word-embeddings programmatically |
| | chakin | Download pre-trained word vectors |
| | zeugma | Pretrained word embeddings as scikit-learn transformers |
| | starspace | Learn embeddings for anything |
| | svd2vec | Learn embeddings from co-occurrence |
| | all-but-the-top | Post-processing for word vectors |
| | entity-embed | Train custom embeddings for named entities |
| Emotion Classification | goemotion-pytorch, text2emotion | |
| | emosent-py | Sentiment scores for Emojis |
| Feature Generation | homer, textstat | Readability scores |
| | LexicalRichness | Lexical Richness Measure |
| Fill mask | fitbert | |
| Finite State Transducer | OpenFST | |
| Gibberish Detection | nostril, gibberish-detector | |
| Grammar Induction | gitta, grasp | Generate CFG from sentences |
| Information Extraction | claucy | |
| | GiveMe5W1H | Extract 5W1H (who, what, when, where, why, how) phrases from news |
| | spikex | spaCy pipeline for knowledge extraction |
| Keyword extraction | rake, multi-rake, pke, phrasemachine, keybert, word2phrase | |
| | pyate | Automated Term Extraction |
| Knowledge | conceptnet-lite | |
| | stanford-openie | Knowledge Graphs |
| | verbnet-parser | VerbNet parser |
| Knowledge Distillation | textbrewer, aquvitae | |
| Language Model Scoring | lm-scorer, bertscore, kenlm, spacy_kenlm, mlm-scoring | |
| Lexical Simplification | easee | Evaluation metric |
| Metrics | seqeval | NER, POS tagging |
| | ranking-metrics, cute_ranking | Metrics for Information Retrieval |
| | mir_eval | Music Information Retrieval Metrics |
| Morphology | unimorph | Morphology data for many languages |
| Multilingual support | polyglot, trankit | |
| | inltk, indic_nlp | Indic Languages |
| | cltk | NLP for Latin and classical languages |
| | langrank | Auto-select optimal transfer language |
| Named Entity Recognition (NER) | spaCy, Stanford NER, sklearn-crfsuite | |
| | med7 | spaCy NER for medical records |
| Nearest neighbor | faiss, sparse_dot_topn, n2, autofaiss | |
| NLU | snips-nlu | |
| | ParlAI | Dialogue System |
| Paraphrasing | parrot | |
| | pegasus | Question Paraphrasing |
| | paraphrase_diversity_ranker | Rank paraphrases of a sentence |
| | sentaugment | Paraphrase mining |
| Phonetics | epitran | Transliterate text into IPA |
| | allosaurus | Recognize phones for 2000 languages |
| Phonology | panphon | Generate phonological feature representations |
| | phoible | Database of segment inventories for 2186 languages |
| Probabilistic parsing | parserator | Create domain-specific parser for address, name etc. |
| Profanity detection | profanity-check | |
| Pronunciation | pronouncing | |
| Question Answering | haystack | Build end-to-end QA system |
| | mcQA | Multiple Choice Question Answering |
| | TAPAS | Table Question Answering |
| Question Generation | question-generation, questiongen.ai | Question Generation Pipeline for Transformers |
| Ranking | transformer-rankers | |
| Relation Extraction | OpenNRE | |
| Search | elasticsearch-dsl | Wrapper for Elasticsearch |
| | jina | Production-level neural semantic search |
| | meilisearch-python | |
| Semantic parsing | quepy | |
| Sentiment | vaderSentiment, afinn | Rule based |
| | absa | Aspect Based Sentiment Analysis |
| | xlm-t | Models |
| spaCy Extensions | spacy-pattern-builder | Generate dependency matcher patterns automatically |
| | spacy_grammar | Rule-based grammar error detection |
| | role-pattern-builder | Pattern based SRL |
| | textpipeliner | Extract RDF triples |
| | tenseflow | Convert tense of sentence |
| | camphr | Wrapper to transformers, elmo, udify |
| | spleno | Domain-specific lemmatization |
| | spacy-udpipe | Use UDPipe from spaCy |
| | spacymoji | Add emoji metadata to spaCy docs |
| String match | phrase-seeker, textsearch | |
| | jellyfish, fuzzy, doublemetaphone | Perform string and phonetic comparison |
| | clavier | Edit distance based on keyboard layout |
| | flashtext | Super-fast extract and replace keywords |
| | pythonverbalexpressions | Verbally describe regex |
| | commonregex | Ready-made regex for email/phone etc. |
| | textdistance, editdistance, word-mover-distance, edlib | Text distances |
| | wmd-relax | Word mover distance for spaCy |
| | fuzzywuzzy, spaczz, PolyFuzz, rapidfuzz, fuzzymatcher | Fuzzy Search |
| | deduplipy, dedupe | Active-Learning based fuzzy matching |
| | recordlinkage | Record Linkage |
| Summarization | textrank, pytldr, bert-extractive-summarizer, sumy, fast-pagerank, sumeval | |
| | doc2query | Summarize document with queries |
| | summarizers | Controllable summarization |
| | insight_extractor | Extract insightful sentences from docs |
| Text Extraction | textract (Image, Audio, PDF) | |
| Text Generation | gp2client, textgenrnn, gpt-2-simple, aitextgen | GPT-2 |
| | markovify | Markov chains |
| | accelerated-text | Template-based generation |
| | keytotext | Keyword to Sentence Generation |
| Transliteration | wiktra | |
| Machine Translation | MarianMT, Opus-MT, joeynmt, OpenNMT, EasyNMT, argos-translate, dl-translate | |
| | googletrans, word2word, translate-python, deep_translator | Translation libraries |
| | mosesdecoder | Statistical MT |
| | apertium | Rule-based MT (RBMT) |
| | translators | Free calls to multiple translation APIs |
| | giza++, fastalign, simalign, eflomal, awesome-align | Word Alignment |
| Thesaurus | python-datamuse | |
| Toxicity Detection | detoxify | |
| Topic Modeling | gensim, guidedlda, enstop, top2vec, contextualized-topic-models, corex_topic, lda2vec, bertopic, tomotopy, ToModAPI | |
| | zeroshot_topics | Zero-shot topic modeling |
| | octis | Evaluate topic models |
| Typology | lang2vec | Compare typological features of languages |
| Visualization | stylecloud | Word Clouds |
| | scattertext | Compare word usage across segments |
| | picture-text | Interactive tree-maps for hierarchical clustering |
| | ipymarkup | Visualize NER and syntax |
| Verb Conjugation | nodebox_linguistics_extended, mlconj3 | |
| Word Sense Disambiguation | pywsd, ewiser, supwsd | |
| | frame-english-fast | Verb Disambiguation |
| Zero Shot Learning | setfit | |
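BM25, listed under Embeddings (rank_bm25, BM25Transformer), ranks documents by summing, over query terms, an IDF weight times a saturating term-frequency factor normalized by document length. The sketch below is a hypothetical pure-Python Okapi BM25 scorer assuming whitespace tokenization, the common defaults k1 = 1.5 and b = 0.75, and the smoothed IDF log(1 + (N - df + 0.5) / (df + 0.5)); rank_bm25's `BM25Okapi` handles IDF somewhat differently, so scores will not match it exactly.

```python
import math
from collections import Counter


def bm25_scores(corpus, query, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with Okapi BM25.

    Uses whitespace tokenization and the smoothed IDF
    log(1 + (N - df + 0.5) / (df + 0.5)), which stays positive.
    """
    docs = [doc.lower().split() for doc in corpus]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    doc_freq = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        term_freq = Counter(d)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avg_len))
        scores.append(score)
    return scores


corpus = ["the cat sat on the mat", "dogs chase cats", "the dog sat"]
print(bm25_scores(corpus, "dog sat"))
```

The short third document matches both query terms and scores highest; the second scores 0 because exact-match tokenization treats "dogs" and "dog" as different terms, which is why BM25 is usually paired with stemming or lemmatization.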
Computer Vision
| Category | Tool | Remarks |
|---|---|---|
| Face recognition | face_recognition, mtcnn, insightface, face-detection | |
| | face-alignment | Find facial landmarks |
| | Facial-Expression-Recognition.Pytorch | Face Emotion |
| Face swapping | faceit, faceit-live, avatarify | |
| GANs | mimicry, imaginaire, pytorch-lightning-gans | |
| High-level libraries | terran | Face detection, recognition, pose estimation |
| Image Hashing | ImageHash, imagededup | |
| Image Inpainting | GAN Image Inpainting | |
| Image Processing | scikit-image, imutils, opencv-wrapper, opencv-python | |
| | torchio | Medical Images |
| Object detection | luminoth, detectron2, mmdetection, icevision | |
| OCR | keras-ocr, pytesseract, keras-craft, ocropy, doc2text | |
| | easyocr, kraken, PaddleOCR | Multilingual OCR |
| | layout-parser, pdftabextract | OCR tables from document |
| Segmentation | segmentation_models | Keras |
| | segmentation_models.pytorch | Segmentation models in PyTorch |
| Semantic Search | scoper | Video |
| Video summarization | videodigest | |
Speech
| Category | Tool | Remarks |
|---|---|---|
| Diarization | resemblyzer | |
| Feature Engineering | python_speech_features | Convert raw audio to features |
| Libraries | speechbrain, pyannotate, librosa, espnet | |
| | silero-models | Pre-trained models |
| Source Separation | spleeter, nussl, open-unmix-pytorch, asteroid | |
| Speech Recognition | kaldi, speech_recognition, delta, pocketsphinx-python, deepspeech, stt, vosk | |
| Speech Synthesis | festvox, cmuflite, tts | |
Recommendation System
| Category | Tool | Remarks |
|---|---|---|
| Apriori algorithm | apyori | |
| Collaborative Filtering | implicit | |
| Libraries | xlearn, DeepCTR, RankFM | Factorization machines (FM) and field-aware factorization machines (FFM) |
| | libmf-python | Matrix Factorization |
| | lightfm, spotlight | Popular Recsys algos |
| | tensorflow_recommenders | Recommendation System in TensorFlow |
| Metrics | rs_metrics | |
| Recommendation System in Pytorch | CaseRecommender | |
| Scikit-learn like API | surprise | |
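The Apriori algorithm behind apyori finds frequent itemsets level by level: count the support of candidate itemsets, keep those above a threshold, join survivors into candidates one item larger, and prune any candidate with an infrequent subset (if an itemset is rare, every superset is too). A minimal pure-Python sketch of the frequent-itemset step only; association-rule generation (confidence, lift) is left out, and `apriori` here is a hypothetical helper, not apyori's API.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least `min_support`."""
    baskets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= basket for basket in baskets) / len(baskets)

    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    size = 1
    while candidates:
        # Count support and keep only the frequent itemsets of this size.
        level = {s: support(s) for s in candidates}
        level = {s: v for s, v in level.items() if v >= min_support}
        frequent.update(level)
        size += 1
        # Join: merge pairs of frequent itemsets into candidates one item larger.
        joined = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: a candidate survives only if all of its subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(sub) in level for sub in combinations(c, size - 1))
        }
    return frequent


baskets = [["bread", "milk"], ["bread", "butter"], ["bread", "milk", "butter"], ["milk"]]
for itemset, sup in sorted(apriori(baskets, 0.5).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a 50% threshold, {milk, butter} appears in only one of four baskets and is dropped, so the three-item candidate is pruned without ever being counted.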
Timeseries
| Category | Tool | Remarks |
|---|---|---|
| Libraries | prophet, tslearn, pyts, seglearn, cesium, stumpy, darts, gluon-ts, stldecompose | |
| | sktime | Scikit-learn like API |
| | atspy | Automated time-series models |
| Anomaly Detection | orion, luminaire | Unsupervised time-series anomaly detection |
| ARIMA models | pmdarima | |
Hyperparameter Optimization
| Category | Tool | Remarks |
|---|---|---|
| General | hyperopt, optuna, evol, talos | |
| Keras | keras-tuner | |
| Parameter optimization | ParameterImportance | |
| Scikit-learn | hyperopt-sklearn, scikit-optimize | Bayesian Optimization |
| | sklearn-deap, sklearn-generic-opt | Evolutionary algorithm |
Phase: Validation
Experiment Monitoring
| Category | Tool | Remarks |
|---|---|---|
| Experiment tracking | tensorboard, mlflow | |
| | lrcurve, livelossplot | Plot real-time learning curves in Keras |
| GPU Usage | gpumonitor, nvtop | |
| | jupyterlab-nvdashboard | See GPU usage in JupyterLab |
| MLOps | clearml, wandb, neptune.ai, replicate.ai | |
| Notification | knockknock | Get notified by Slack/email |
| | jupyter-notify | Notify when task is completed in Jupyter |
| | apprise | Notify to any platform |
| | pynotifier | Generate desktop notification |
Visualization
| Category | Tool | Remarks |
|---|---|---|
| Diagrams | dl-visuals, ml-visuals | |
| | chalk | Declarative drawing API |
| Libraries | matplotlib, seaborn, pygal, plotly, plotnine | |
| | yellowbrick, scikit-plot | Visualization for scikit-learn |
| | pyldavis | Visualize topic models |
| | dtreeviz | Visualize decision trees |
| | txtmarker | Highlight text in PDF |
| | metriculous | Visualize model performance |
| Animated charts | bar_chart_race | Bar chart race animation |
| | pandas_alive | Animated charts in pandas |
| High dimensional visualization | umap | |
| | ivis | Ivis Algorithm |
| Interactive charts | bokeh | |
| | flourish-studio | Create interactive charts online |
| | mpld3 | Matplotlib to D3 Converter |
| Model Visualization | netron, nn-svg | Architecture |
| | keract | Activation maps for Keras |
| | keras-vis | Visualize Keras models |
| | PlotNeuralNet | LaTeX code for drawing neural networks |
| | loss-landscape-anim | Generate loss landscape of optimizer |
| Styling | open-color | Color Schemes |
| | mplcyberpunk | Cyberpunk style for matplotlib |
| | chart.xkcd | XKCD like charts |
| | adjustText | Prevent overlap when plotting point text labels |
| Generate graphs using markdown | mermaid | |
| Tree-map chart | squarify | |
| 3D charts | babyplots | |
Phase: Production
Model Export
| Category | Tool | Remarks |
|---|---|---|
| Benchmarking | torchprof | Profile PyTorch layers |
| | scalene, pyinstrument | Profile Python code |
| | k6 | Load test API |
| | ai-benchmark | Benchmark VM on 19 different models |
| Cloud Storage | Zenodo, Github Releases, OneDrive, Google Drive, Dropbox, S3, mega, DAGsHub, huggingface-hub | |
| Data Pipeline | pypeln | |
| Dependencies | pip-chill | pip freeze without dependencies |
| | pipreqs | Generate requirements.txt based on imports |
| | conda-pack | Export conda environments for offline use |
| Distributed training | horovod | |
| Model Store | modelstore | |
| Optimization | nn_pruning | Movement Pruning |
| | aimet, tensorflow-lite | Quantization |
| Serialization | sklearn-porter, m2cgen | Transpile sklearn model to C, Java, JavaScript and others |
| | onnxmltools | Classic ML models to ONNX format |
| | hummingbird | Convert ML models to PyTorch |
| | cloudpickle, jsonpickle | Pickle extensions |
Inference
| Category | Tool | Remarks |
|---|---|---|
| Authentication | pyjwt (JWT), auth0, okta, cognito | |
| Batch Jobs | airflow, luigi, dagster, oozie, prefect, kubernetes-cron-jobs, argo | |
| | rq, schedule, huey | Task Queue |
| | mlq | Queue ML Tasks in Flask |
| Caching | cachetools, cachew (cache to local sqlite) | |
| | redis-py, pymemcache | |
| Cloud Monitoring | datadog | |
| Configuration Management | config, python-decouple, python-dotenv, dynaconf | |
| CORS | flask-cors | CORS in Flask |
| Database | flask-sqlalchemy, tinydb, flask-pymongo, odmantic | |
| | tortoise-orm | Asyncio ORM similar to Django |
| Monitoring | whylogs | Data Logging |
| | grafana, prometheus | Metrics |
| | sentry, honeybadger | Error Reporting |
| Data Validation | schema, jsonschema, cerberus, pydantic, marshmallow, validators | |
| Dashboard | streamlit | Generate frontend with Python |
| | gradio | Fast UI generation for prototyping |
| | dash | React Dashboard using Python |
| | voila | Convert Jupyter notebooks into dashboard |
| | streamlit-drawable-canvas | Drawable Canvas for Streamlit |
| | streamlit-terran-timeline | Show timeline of faces in videos |
| | streamlit components | Collection of streamlit components |
| Deployment Checklist | ml-checklist | |
| Documentation | mkdocs, pdoc | |
| Drift Detection | alibi-detect, torchdrift, boxkite | Outlier and drift detection |
| Edge Deployment | TensorFlow Lite, coreml, TensorFlow.js | |
| Logging | loguru | |
| Model Serving | cortex, torchserve, ray-serve, bentoml, seldon-core | Serving Framework |
| | flask, fastapi | API Frameworks |
| Processing | pyspark, hive | |
| Serverless | mangum | Use FastAPI in Lambda |
| Server-Sent Events | sse-starlette | Server-Sent Events for FastAPI |
| Stream Processing | flink, kafka, apache beam | |
| Testing | schemathesis | Automatic test generation from Swagger |
| | pytest-benchmark | Profile time in pytest |
| | exdown | Extract code from markdown files |
| | mktestdocs | Test code present in markdown files |
Python libraries
| Category | Tool | Remarks |
|---|---|---|
| Async | tomorrow | |
| Audio | simpleaudio | Play audio using Python |
| Automation | pyuserinput, pyautogui, pynput | Control mouse and keyboard |
| Bloom filter | python-bloomfilter | |
| CLI Formatting | rich | |
| Concurrent database | pickleshare | |
| Code to Maths | latexify-py, handcalcs | |
| Create interactive prompts | prompt-toolkit | |
| Collections | bidict | Bidirectional dictionary |
| | sortedcontainers | Sorted list, set and dict |
| | munch | Dictionary with dot access |
| Correlation Metric | xicor | |
| Date and Time | pendulum | |
| Decorators | retrying (retry some function) | |
| Debugging | PySnooper | |
| Improved doctest | xdoctest | |
| Linting | pylint, pycodestyle | Code Formatting |
| | pydocstyle | Check docstrings |
| | safety, bandit, shellcheck | Check vulnerabilities |
| | mypy | Check types |
| | black | Automated Formatting |
| Leaflet maps from Python | folium | |
| Multiprocessing | filelock | Lock files during access from multiple processes |
| Path-like interface to remote files | pathy | |
| Pretty print tables in CLI | tabulate | |
| Progress bar | fastprogress, tqdm | |
| Run Python libraries in sandbox | pipx | |
| Shell commands as functions | sh | |
| Standard Library Extension | ubelt | |
| Subprocess | delegator.py | |
| Testing | crosshair (find failure cases for functions) | |
| Virtual webcam | pyfakewebcam | |
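A Bloom filter (python-bloomfilter above) answers "have I seen this item?" in fixed memory: it can return false positives but never false negatives. The sketch below is a hypothetical stdlib-only version, not python-bloomfilter's API. It sizes the bit array with the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2, and derives the k bit positions from one SHA-256 digest via double hashing (the Kirsch-Mitzenmacher technique).

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: may report false positives, never false negatives."""

    def __init__(self, capacity, error_rate=0.01):
        # Standard sizing for n items and false-positive rate p:
        # m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.
        self.size = math.ceil(-capacity * math.log(error_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, item):
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        # Double hashing: derive k bit positions from two base hashes.
        return [(h1 + i * h2) % self.size for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


seen = BloomFilter(capacity=1000, error_rate=0.01)
for word in ["alpha", "beta", "gamma"]:
    seen.add(word)
print("beta" in seen)  # True
```

For 1,000 items at a 1% target error rate this allocates about 9,600 bits (roughly 1.2 KB) and 7 hash positions per item, far less than storing the items themselves.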
Utilities
| Category | Tool | Remarks |
|---|---|---|
| Colab | colab-cli | Manage Colab notebooks from the command line |
| Drive | drive-cli | Use Google Drive similar to git |
| Database | mlab | Free 500 MB MongoDB |
| Data Visualization | flourish-studio | |
| Git | gitjk | Undo what you just did in git |
| Linux | ripgrep | |
| Trade-off tools | egograph | Find alternatives to anything |