30 open-source projects similar to alfred1984/interesting-python, ranked by how many features they have in common. Compare stars, activity and what each one does to find the best Interesting Python alternative.
CrawlerTutorial is a comprehensive Python web scraping tutorial and framework designed for extracting data from static and dynamic websites. It functions as a web data extraction pipeline and an HTTP request orchestrator, covering the full lifecycle of scraping applications from initial fetching to final data storage. The project provides specialized guidance on anti-bot bypass techniques and web API reverse engineering. It includes methods for evading browser detection through identity masking and proxy rotation, as well as techniques for identifying hidden API endpoints by analyzing network
This project is a Python machine learning library and data science toolkit designed for building predictive models and analyzing complex datasets. It provides a collection of implementations for common supervised and unsupervised algorithms using the Scikit-Learn framework. The toolkit includes a predictive modeling suite for generating predictions from historical data and a statistical analysis framework for applying Bayesian modeling and causality tests. It also features a data visualization suite based on Matplotlib for rendering static charts and graphs to interpret classifier boundaries
This project is a comprehensive collection of Python programming education materials, including tutorials, exercises, and curated code samples. It serves as a learning curriculum and software engineering toolkit, utilizing Jupyter Notebooks to combine executable code with descriptive educational text. The repository provides practical implementation guides for building large language model applications, such as retrieval-augmented generation systems, stateful AI agents, and machine learning workflows. It distinguishes itself by offering a structured approach to agentic coding workflows, cover
VADER (Valence Aware Dictionary and sEntiment Reasoner) is a rule-based, lexicon-driven sentiment analyzer that assigns polarity scores to text by matching words against a curated sentiment dictionary and applying linguistic heuristics. It processes text at the sentence level, returning a compound score normalized between -1 (negative) and +1 (positive) along with separate positive, neutral, and negative intensity breakdowns. What distinguishes VADER from simpler lexicon models is its built-in grammatical rule engine. It adjusts scores for negation (e.g., “not good” reduces positivity), contr
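The lexicon-plus-rules scoring described for VADER can be illustrated with a toy sketch. The lexicon entries, the negation word list, the damping factor and the normalization constant below are all illustrative assumptions, not VADER's actual dictionary or a copy of its rule engine.

```python
import math

# Toy lexicon: words and valence values are illustrative, not VADER's dictionary.
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALAR = -0.74  # assumed factor applied to a word that follows a negation
ALPHA = 15               # assumed constant controlling how fast the score saturates


def compound_score(sentence: str) -> float:
    """Sum word valences, flip negated ones, and squash into the open range (-1, 1)."""
    tokens = [t.strip(".,!?") for t in sentence.lower().split()]
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token, 0.0)
        # Heuristic: a negation word directly before a sentiment word reverses it.
        if valence and i > 0 and tokens[i - 1] in NEGATIONS:
            valence *= NEGATION_SCALAR
        total += valence
    # x / sqrt(x^2 + alpha) maps any real-valued sum into (-1, 1).
    return total / math.sqrt(total * total + ALPHA)
```

Calling `compound_score("not good")` yields a lower value than `compound_score("good")`, which mirrors the negation adjustment mentioned above; text with no lexicon words scores 0.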
AndroidHttpCapture is a mobile application for intercepting and analyzing HTTP and HTTPS network traffic directly on an Android device. It functions as a local proxy server and traffic interceptor to capture and monitor requests and responses from other installed mobile applications. The tool provides capabilities for MITM HTTPS decryption through root certificate installation and supports exporting captured network sessions as HAR files for external analysis. It allows for real-time response body injection and the modification of request headers via user agent spoofing. The project includes
CocoaDebug is a debugging framework for iOS that provides a toolkit for inspecting application logs, network traffic, and sandbox files directly on a device. It functions as a suite of specialized tools for auditing device hardware, monitoring performance, and inspecting network activity. The framework includes an on-device network inspector for capturing and filtering HTTP requests and payloads, as well as a sandbox file manager to list and modify files and folders within the application container. It further provides a device auditor to display system build versions and hardware details for
ThinkStats2 is a computational statistics course and educational library designed to teach probability and statistics through a programmatic approach. It provides a framework for studying statistical concepts by writing Python code and running simulations on real-world datasets. The project uses interactive notebooks and a collection of Python modules to deliver guided lessons. It emphasizes the verification of theoretical statistical laws through iterative computational experiments and simulation-driven testing. The resource covers broad capabilities in data analysis and data science traini
trump2cash is a sentiment-based stock trading bot and social media market monitor. It uses a natural language processing sentiment analysis tool to scan real-time social media feeds for mentions of publicly traded companies and translates the emotional tone of that text into automated buy or short stock market orders. The system utilizes a ticker mapping utility to resolve company names, subsidiaries, and brands into valid public stock market ticker symbols. To verify the efficacy of these sentiment-driven signals, it includes an algorithmic trading backtester that evaluates trading strategie
Sonar is a mobile app debugging platform and extensible toolkit that allows developers to inspect the internal state, network traffic, and system logs of mobile devices via a desktop interface. It functions as a centralized system for monitoring application behavior and troubleshooting logic and performance issues. The platform is distinguished by a plugin-based extension system that enables the development of custom debugging tools. These plugins can visualize specific application data and facilitate event exchange between a mobile device and a computer. The toolkit covers several core obse
This project is a Python data analysis library and exploratory data analysis framework designed for processing raw datasets. It provides a suite of tools for examining data, identifying anomalies, and applying statistical methods to uncover patterns. The repository functions as a machine learning modeling toolkit and a statistical data modeling suite. It includes predictive algorithms and mathematical models used to analyze relationships between data variables and derive insights from complex datasets. The project covers a broad range of capabilities including data science, machine learning
This is an interactive notebook-based course that teaches machine learning from Python fundamentals through deep learning and natural language processing. It uses real datasets and multiple frameworks within a structured, hands-on curriculum that combines concise explanations with executable code cells, built-in datasets, and embedded exercise checkpoints. Learning progresses through data preparation and exploration, classical machine learning workflows, computer vision with convolutional neural networks, and natural language processing with deep learning, all delivered as a cohesive progressi
nlp.js is a JavaScript natural language processing library and development framework used to build natural language understanding engines. It provides a toolkit for creating local machine learning models for intent classification and acts as a multilingual text processor that detects languages and normalizes text across various dialects. The framework distinguishes itself by supporting local execution on both servers and mobile devices, enabling chatbot functionality without an internet connection. It features a specialized system for conversational slot filling to collect mandatory informati
ECommerceCrawlers is an educational collection of Python-based crawler scripts designed to extract data from a variety of public websites, including e-commerce platforms, social media sites, news outlets, and multimedia sources. The project serves as a learning resource for web scraping techniques, offering ready-to-run examples that demonstrate practical data extraction methods. The toolkit covers a broad range of data types, including product listings and prices from online retail platforms, public posts and profiles from social networking sites, articles from news and blogging platforms, p
LinkFinder is a security reconnaissance and static analysis tool designed for JavaScript endpoint discovery. It extracts absolute and relative URLs and parameters from JavaScript files to map the attack surface of web applications and identify hidden API routes. The tool operates through static code analysis and regular expression pattern matching to find endpoints without executing the source code. It includes a data processor for importing exported files from Burp Suite, enabling the batch analysis of multiple JavaScript assets in a single execution. The system provides capabilities for do
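The static, regex-based endpoint discovery described for LinkFinder can be sketched as follows. The pattern is a deliberately small assumption for illustration (quoted absolute URLs and quoted paths beginning with `/`, `./` or `../`); it is not LinkFinder's actual regular expression, and `find_endpoints` is a hypothetical helper name.

```python
import re

# Simplified, assumed pattern: a quoted string that is either an absolute
# http(s) URL or a path starting with "/", "./" or "../".
ENDPOINT_RE = re.compile(r"""["']((?:https?://|\.{0,2}/)[^"'\s]+)["']""")


def find_endpoints(js_source: str) -> list[str]:
    """Return the unique endpoint-like strings found in JavaScript source text.

    The source is only scanned as text; nothing is executed.
    """
    return sorted(set(ENDPOINT_RE.findall(js_source)))


if __name__ == "__main__":
    sample = 'fetch("/api/v1/users"); var u = \'https://example.com/x.json\'; var s = "hello";'
    for endpoint in find_endpoints(sample):
        print(endpoint)
```

A pattern this small will miss endpoints built by string concatenation and will also pick up file paths that are not API routes, so its output is a list of candidates to review rather than a confirmed route map.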
Wireshark is a network protocol analyzer used for capturing and inspecting network traffic. It functions as a packet capture tool that intercepts live data from network interfaces and a TCP/IP dissector that decodes network protocol layers to translate raw binary packets into human-readable fields. It reconstructs protocol streams by grouping related packets into conversations between endpoints. It also operates as a packet file converter, allowing for the reading, modification, and conversion of network capture files across vari
so-novel is a web novel downloader and scraping engine designed to extract structured text from websites and convert it into electronic book formats. It functions as a multi-interface content extractor, providing a shared backend accessible via a web-based management dashboard, a terminal user interface, and a command line interface. The system utilizes a rule-driven approach for data extraction, using CSS selectors and XPath rules defined in external configuration files to map web elements to specific data fields. To maintain access to content, it includes a proxy-routed request pipeline to
img2dataset is a high-performance image dataset pipeline and preprocessing tool designed to download and process millions of images from URLs for machine learning training. It functions as a distributed image downloader and cloud storage data exporter, moving large visual datasets from web sources directly into structured formats. The system prioritizes high-throughput data acquisition by distributing workloads across multiple CPU cores and machines. It integrates directly with remote cloud storage buckets and employs a manifest-based tracking system to resume interrupted downloads without re
This project is a Model Context Protocol server that connects large language models to web scraping and crawling tools. It functions as a bridge, allowing LLM clients to utilize a web crawling engine and scraping utilities to extract and process web data. The server integrates a markdown web converter that transforms dynamic web pages and PDF documents into clean markdown to optimize consumption by AI models. It also provides a browser automation interface for controlling headless sessions and bypassing access restrictions. The system covers broad capabilities including large-scale website d
Hetty is an HTTP intercepting proxy and web security research toolkit used to capture, inspect, and modify traffic between a browser and a server. It functions as an HTTP request editor for creating and replaying manual requests to test server behavior and as a project-based traffic logger that isolates network logs across different security research engagements. The tool provides a request-response interception loop that pauses outgoing requests and incoming responses in transit, allowing for manual editing or cancellation. It includes a manual request replay engine to construct and transmit
Zen Desktop is a cross-platform proxy client and network request filter for Windows, macOS, and Linux. It functions as a system-wide ad blocker and privacy protection tool that intercepts network traffic across all operating system applications to block advertisements, trackers, and malware. The software employs a network request filtering system that modifies HTTP headers and prunes JSON data using custom rules and regular expressions. It specifically removes tracking parameters and sanitizes network headers to prevent activity monitoring. The project provides capabilities for blocklist man
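The removal of tracking parameters mentioned for Zen Desktop can be illustrated with a short sketch. The parameter names and the `utm_` prefix rule are assumptions chosen for the example, and `strip_tracking` is a hypothetical helper; none of it is taken from Zen's own rule set or code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed example rules; a real filter would load these from blocklists.
TRACKING_PREFIXES = ("utm_",)
TRACKING_NAMES = {"fbclid", "gclid"}


def strip_tracking(url: str) -> str:
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_NAMES and not name.startswith(TRACKING_PREFIXES)
    ]
    # Rebuild the URL with only the parameters that survived the filter.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

For example, `strip_tracking("https://example.com/a?id=7&utm_source=x&fbclid=abc")` keeps only the `id` parameter.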
DotnetSpider is a .NET web crawling framework and C# data extraction tool designed for automated web page discovery and the retrieval of structured data from the internet at scale. It functions as a high-level web scraping library for collecting information from various websites. Extracted content can be used to build local databases or to analyze online information programmatically within the .NET ecosystem. The system utilizes a pipeline-based data
This project is a collection of Python scripts and source code examples designed for learning programming fundamentals through practical application. It serves as a toolkit for web scraping and browser automation, alongside a library of utilities for data processing. The repository includes scripts for simulating human interactions to automate repetitive web tasks and online booking processes. It also provides a structured database of administrative divisions, including provinces, cities, and districts, for geographic data management and address validation. The collection covers capabilities
Kubeshark is a network observability platform designed for Kubernetes environments, functioning as an eBPF-powered engine for cluster-wide traffic analysis. It captures, indexes, and visualizes network activity and API calls directly from the kernel, providing deep visibility into service-to-service communication without requiring sidecar proxies or manual code instrumentation. The platform distinguishes itself through its ability to perform protocol-aware traffic dissection and user-space cryptographic hooking, which allows for the inspection of encrypted traffic and the reconstruction of ap
This application is a desktop network traffic analyzer that provides real-time monitoring and forensic inspection of data packets. By interfacing directly with low-level system drivers, it captures raw network traffic from physical or virtual adapters to identify communication patterns, track bandwidth usage, and diagnose connectivity issues. The system distinguishes itself through an immediate-mode graphical interface that rebuilds the display state every frame, ensuring high responsiveness during live data updates. It maintains performance by using asynchronous message passing to decouple t
Playwright for Python is a browser automation framework designed for end-to-end testing, web scraping, and user interaction simulation. It functions as a headless browser controller that enables programmatic navigation, data extraction, and the execution of complex workflows across multiple rendering engines. The framework distinguishes itself through an actionability-aware interaction engine that automatically verifies element readiness before performing actions, significantly reducing test flakiness. It utilizes isolated browser contexts to maintain separate storage and cookies for parallel
Robin is an AI-powered open source intelligence framework and dark web investigation tool. It functions as a multi-model AI orchestrator that integrates search engines and web scrapers with language models to automate information gathering and data synthesis. The system utilizes a crawl-and-filter architecture to isolate high-value data from raw web content and employs a query-refinement pipeline to optimize search terms. It specifically supports dark web investigations by routing requests through proxies to access hidden services and using language models to analyze and summarize findings fr
w3af is a web penetration testing suite and security audit framework designed to identify and exploit vulnerabilities in web applications. It functions as a vulnerability scanner that crawls targets to find injection points and a fuzzer used to discover hidden endpoints and test input validation. The project distinguishes itself by providing an intercepting HTTP proxy for capturing and modifying traffic, combined with a knowledge-base driven exploitation system. It enables the execution of security exploits to gain remote shell access and supports post-exploitation activities, such as routing
This project is a specialized browser debugging interface designed to monitor DOM elements, network traffic, and JavaScript execution. It provides a client-side user interface for inspecting and debugging web applications, allowing for the real-time modification of CSS styles and the investigation of the JavaScript runtime. The toolkit includes dedicated analysis tools for WebAssembly, featuring disassembly highlighting, scope inspection, and binary execution profiling. It also provides a network traffic inspector for analyzing HTTP requests and a CSS style editor for testing properties and a
GoReplay is a network traffic recording and replay tool used to capture live HTTP and binary protocol requests. It functions as a traffic shadowing proxy that duplicates incoming network requests to test environments and a utility for recording traffic to local or cloud storage for later analysis and playback. The system is capable of processing non-textual data formats, such as Thrift and Protocol Buffers, allowing for the capture and replay of specialized application-to-application communication. The tool supports live traffic capture and asynchronous duplication to validate infrastructure