What are the best Data Engineering and Infrastructure GitHub repositories?

Foundational tools for large-scale data collection, ingestion, storage management, and reliability. Explore 1,321 awesome GitHub repositories matching data & databases · Data Engineering and Infrastructure. Refine with filters or upvote what's useful. Top picks: openclaw/openclaw, kamranahmedse/developer-roadmap, donnemartin/system-design-primer, vinta/awesome-python, torvalds/linux, trimstray/the-book-of-secret-knowledge, affaan-m/ecc, significant-gravitas/autogpt, jackfrued/python-100-days,…

Why is openclaw/openclaw a recommended Data Engineering and Infrastructure GitHub repository?

Exports portable backups of workspace data, authentication credentials, and gateway configurations.

Why is kamranahmedse/developer-roadmap a recommended Data Engineering and Infrastructure GitHub repository?

Configures expiration policies for cached data to balance performance and data freshness.

Why is donnemartin/system-design-primer a recommended Data Engineering and Infrastructure GitHub repository?

Details mechanisms for storing frequently accessed data in memory to reduce latency and backend processing requirements.

Why is vinta/awesome-python a recommended Data Engineering and Infrastructure GitHub repository?

Boost system performance by memoizing frequently accessed data within memory-efficient storage structures.

Why is torvalds/linux a recommended Data Engineering and Infrastructure GitHub repository?

Manages filesystem operations to provide consistent data access and storage organization across physical media.

Why is trimstray/the-book-of-secret-knowledge a recommended Data Engineering and Infrastructure GitHub repository?

Navigate and manage file systems through terminal-based interfaces that simplify directory operations.

Why is affaan-m/ECC a recommended Data Engineering and Infrastructure GitHub repository?

Manages the persistent storage of session summaries and learned skills under configurable root directories.

Why is Significant-Gravitas/AutoGPT a recommended Data Engineering and Infrastructure GitHub repository?

Coordinates the full lifecycle of CSV data imports through dedicated creation, listing, and retrieval methods.

Why is jackfrued/Python-100-Days a recommended Data Engineering and Infrastructure GitHub repository?

Understand the fundamentals of web scraping, including ethical considerations and essential toolsets for data extraction.

Why is microsoft/markitdown a recommended Data Engineering and Infrastructure GitHub repository?

Interprets diverse file formats and generates structured, context-aware Markdown output using advanced language models.

1.3K repositories

Awesome GitHub Repositories: Data Engineering and Infrastructure

openclaw/openclaw (380,031 stars)
Openclaw is a platform for managing agent execution environments, providing infrastructure to control agent lifecycle, session state, and workspace persistence. It features a centralized gateway that handles model loops, tool invocation, and streaming events, and it supports multi-agent routing and persistent memory management. The system is designed to normalize tool execution signatures and to provide a standardized interface for cross-provider compatibility. The platform includes extensive developer tooling, such as a command-line interface for workspace management, diagnostic logging, and a plugin architecture that allows custom tools and capabilities to be registered. It supports automated workflows through event-driven hooks, task scheduling, and integration with external services. Security is managed through execution policies, credential portability, and approval workflows for agent actions. Deployment is supported through automated infrastructure installers and containerized gateway helpers, with built-in utilities for backup and configuration management. The system provides a structured format for orchestrating multi-step workflows and includes specialized tools for browser automation and structured code patching.
Exports portable backups of workspace data, authentication credentials, and gateway configurations.
TypeScript · ai · assistant · crustacean
kamranahmedse/developer-roadmap (357,434 stars)
Developer Roadmap is a community-driven platform that provides structured, graph-based learning paths for software engineering. It serves as a comprehensive knowledge repository where technical domains are organized into visual sequences to guide professional skill acquisition and career development. The project distinguishes itself through a collaborative ecosystem that lets users contribute to roadmaps, curate industry best practices, and maintain professional profiles. It integrates diagnostic assessment frameworks for evaluating technical proficiency, helping developers identify knowledge gaps and prepare for professional interviews through targeted learning sequences. Beyond its core mapping capabilities, the platform offers practical project ideas and interactive tutoring to reinforce engineering concepts. It gives the community a centralized space to share resources, track progressive skill development, and navigate complex technical landscapes.
Configures expiration policies for cached data to balance performance and data freshness.
TypeScript · angular-roadmap · backend-roadmap · blockchain-roadmap
donnemartin/system-design-primer (353,387 stars)
This project is a comprehensive educational resource and study guide focused on distributed systems architecture and backend infrastructure design. It provides a structured curriculum for mastering the scalability, reliability, and performance needed to design complex software systems. The repository distinguishes itself by offering a systematic approach to technical interview preparation, including design patterns, architectural trade-offs, and spaced repetition tools that help users retain complex concepts. It emphasizes constraint-driven analysis, teaching users how to weigh competing requirements such as latency, consistency, and availability when formulating architectural designs. The content covers a broad spectrum of system design capabilities, including strategies for database scaling, traffic management, and infrastructure optimization. It details techniques for horizontal scaling, multi-layered caching, asynchronous communication, and service discovery, and it provides frameworks for resource estimation and capacity planning. The documentation is organized as a study guide that offers a systematic path through the fundamentals of backend engineering and large-scale system design.
Details mechanisms for storing frequently accessed data in memory to reduce latency and backend processing requirements.
Python · design · design-patterns · design-system
vinta/awesome-python (303,207 stars)
This project is a comprehensive, community-curated directory that organizes the vast landscape of Python software libraries, frameworks, and tools. It is a centralized knowledge base designed to ease navigation of the ecosystem and speed up developer discovery across the entire software development lifecycle. The directory distinguishes itself by providing a structured index of resources categorized by technical domain, ranging from foundational development utilities to specialized engineering fields. It covers high-level capabilities including artificial intelligence, data science, web development, and infrastructure management, so developers can identify vetted solutions for specific technical challenges. The project spans a broad range of capabilities, including tools for dependency management, static code analysis, and automated testing. It also catalogs resources for persistent data storage, cloud infrastructure orchestration, and interface development, providing a unified reference for building and maintaining complex software systems.
Boost system performance by memoizing frequently accessed data within memory-efficient storage structures.
Python · awesome · collections · python
torvalds/linux (237,355 stars)
The Linux kernel is a monolithic operating system core that manages hardware resources, memory, and process scheduling across diverse computing architectures. It provides a standardized, POSIX-compliant environment for application execution while maintaining a modular driver framework that allows hardware interfaces to be loaded and removed dynamically. The project is distinguished by its high-performance concurrency toolkit, which uses lockless synchronization primitives and read-copy-update mechanisms to manage shared data access in multi-core environments. It includes a comprehensive kernel tracing and instrumentation suite that enables non-intrusive monitoring of system events, function execution, and latency metrics. In addition, the kernel enforces strict interface stability guarantees and lifecycle tracking to ensure backward compatibility for dependent applications. Beyond its core identity, the system includes extensive capabilities for hardware abstraction, network protocol implementation, and security policy enforcement. It supports specialized engineering requirements through power state management, embedded system optimization, and firmware-based boot processes. The architecture also includes robust diagnostic frameworks for memory analysis, system execution verification, and validation of concurrent programming models. The source repository provides a complete build system for turning code into executable binary images, including tools for kernel feature selection and configuration optimization that tailor the output to specific hardware requirements.
Manages filesystem operations to provide consistent data access and storage organization across physical media.
C
trimstray/the-book-of-secret-knowledge (228,641 stars)
This project serves as a centralized, community-driven repository of technical knowledge and administrative resources. It provides a structured taxonomy that gathers scattered information into a searchable framework, supporting continuous learning and rapid problem-solving for system administrators and cybersecurity practitioners. By mapping resources across offensive security, infrastructure management, and software development, it offers a unified path for skill acquisition and professional reference. The project is defined by a command-line-first design philosophy that prioritizes terminal-based utilities and scriptable interfaces to support efficient system administration and repeatable security workflows. It distinguishes itself through a platform-agnostic approach, maintaining documentation and operational guides that remain applicable across diverse Unix-like and cloud-based environments. This modular toolchain integration lets users build custom environments tailored to specific administrative or security tasks. The repository covers a broad range of capabilities, including comprehensive toolkits for system auditing, network management, and infrastructure hardening. It provides structured learning paths for cybersecurity skill development, ranging from ethical hacking labs and penetration testing standards to vulnerability assessment and system configuration best practices. The collection also includes a wide array of productivity tools, diagnostic utilities, and educational materials designed to streamline routine maintenance and improve overall security posture.
Navigate and manage file systems through terminal-based interfaces that simplify directory operations.
awesome · awesome-list · bsd
affaan-m/ECC (221,981 stars)
ECC is an LLM agent orchestration framework and cross-platform AI tooling suite designed to coordinate multi-model workflows. It provides a system for managing specialized agent roles, reusable skills, and structured planning in order to execute complex software development tasks across various AI-powered code editors. The project distinguishes itself as a Model Context Protocol manager, providing a configuration layer for integrating external servers and auditing tool execution. It also implements an agentic security sandbox that restricts access to sensitive files and scans for secret leakage to secure autonomous workflows. The framework covers broad capability areas including AI coding workflow automation, test-driven development guardrails, model cost optimization through intelligent routing, and state-isolated memory management. It also includes tools for enforcing language-specific coding standards and managing agent behaviors across different integrated development environments. The system is managed through a command-line interface that handles tool installation, configuration repair, and deployment of tooling presets.
Manages the persistent storage of session summaries and learned skills under configurable root directories.
JavaScript
Significant-Gravitas/AutoGPT (184,973 stars)
AutoGPT is an orchestration platform designed for building, managing, and deploying autonomous agents. It provides a visual canvas-based environment where users can assemble agents by connecting modular blocks that represent actions, data flows, and conditional logic. The platform supports the entire agent lifecycle, including task scheduling, execution monitoring, and configuration management, while offering a marketplace for discovering and sharing community-built workflows. The project includes a legacy framework for command-line agent execution and an extensible component system for devel
Coordinates the full lifecycle of CSV data imports through dedicated creation, listing, and retrieval methods.
Python · ai · artificial-intelligence · autonomous-agents
jackfrued/Python-100-Days (183,425 stars)
This project is a comprehensive, day-by-day curriculum designed to guide learners through the Python programming language and its professional applications. The content spans from fundamental syntax and object-oriented design to advanced topics including database management, web development, data analysis, and machine learning. The curriculum is structured into distinct modules that cover practical software engineering practices, such as version control, containerization, and system architecture. It also provides resources for technical interview preparation and an analysis of career paths wi
Understand the fundamentals of web scraping, including ethical considerations and essential toolsets for data extraction.
Jupyter Notebook
microsoft/markitdown (154,485 stars)
This project is an AI-powered document processing engine designed to transform diverse file formats into structured Markdown. By leveraging multimodal language models, it performs complex layout analysis and semantic text extraction, allowing for the conversion of both unstructured files and scanned images into machine-readable content. The toolkit distinguishes itself through a modular, plugin-based architecture that orchestrates multi-stage extraction pipelines. Users can steer the parsing behavior by injecting custom instructions, enabling the system to adapt to domain-specific document st
Interprets diverse file formats and generates structured, context-aware Markdown output using advanced language models.
Python · autogen · autogen-extension · langchain
langchain-ai/langchain (139,458 stars)
LangChain is an orchestration framework designed for building, managing, and deploying applications powered by large language models. It provides a unified integration layer that normalizes disparate model provider APIs into a consistent set of primitives, enabling developers to build complex, multi-step AI workflows that manage state, memory, and tool execution. The project distinguishes itself through a durable execution runtime that maintains persistent state across long-running processes by checkpointing progress to external storage. It models agent workflows as directed graphs, allowing
Organize directory hierarchies to manage machine-specific state and persistent application data effectively.
Python · agents · ai · ai-agents
mendableai/firecrawl (139,399 stars)
Firecrawl is a headless browser automation tool and web crawling engine designed to extract structured data from the web. It functions as an API that transforms raw website content and documents into clean markdown and JSON formats to serve as context for large language models. The project distinguishes itself by using natural language prompts to translate human instructions into targeted data extraction tasks and browser actions. It can execute interactive page navigation, such as clicking and scrolling, and perform automated web research to retrieve structured data without manual interventi
Navigates through entire websites to convert unstructured content into formats optimized for language models.
TypeScript
firecrawl/firecrawl (133,479 stars)
Firecrawl is a web data extraction platform designed to convert unstructured web content into clean, LLM-ready formats like markdown or JSON. It functions as an autonomous web crawler and scraper, capable of mapping entire domains, performing recursive navigation, and executing complex data gathering tasks. By leveraging headless browser orchestration, the system handles dynamic, JavaScript-heavy pages to ensure comprehensive data capture. The platform distinguishes itself through its focus on agentic workflows, providing a programmatic interface that allows autonomous agents to perform live
Transforms unstructured web pages into clean, structured formats specifically optimized for language model ingestion.
TypeScript · ai · ai-agents · ai-crawler
Chalarangelo/30-seconds-of-code (128,121 stars)
30-seconds-of-code is a comprehensive knowledge base and programming snippet library designed to support software engineering education and professional development. It provides a curated collection of reusable code units and technical guides that help developers master core language mechanics, design patterns, and architectural philosophies. The project distinguishes itself by offering a wide-ranging library of algorithmic solutions and web development patterns that are organized into modular, independently testable units. It emphasizes functional programming paradigms and declarative logic,
Provides tools for serializing and persisting data to the local file system.
JavaScript · astro · awesome-list · css
excalidraw/excalidraw (125,451 stars)
This project is a virtual whiteboard component and vector graphics editor designed for creating diagrams with a hand-drawn aesthetic. It provides a canvas-based drawing engine that can be embedded directly into web applications, allowing users to manipulate shapes, upload images, and export visual data into standard formats like PNG, SVG, or JSON. The platform distinguishes itself through a real-time synchronization layer that supports multi-user collaboration across distributed environments. This engine utilizes end-to-end encryption to secure shared sessions and employs a local-first data p
Leverages browser-based storage to maintain application state locally, ensuring data availability and persistence even during offline operation.
TypeScript · canvas · collaboration · diagrams
kubernetes/kubernetes (123,197 stars)
Kubernetes is a distributed container orchestration platform that automates the deployment, scaling, and management of containerized applications across clusters of computing nodes. It functions as a declarative infrastructure controller, utilizing a control loop architecture that continuously monitors the current system state against user-defined configurations to ensure desired operational outcomes. The system relies on a centralized API-driven interface and a replicated key-value store to maintain a consistent source of truth for all cluster objects. The platform distinguishes itself throu
Maintains a consistent, replicated data store that serves as the reliable source of truth for distributed system states.
Go · cncf · containers · go
comfyanonymous/ComfyUI (117,322 stars)
ComfyUI is a modular generative AI workflow orchestrator and node-based GUI for designing and executing complex diffusion model pipelines. It functions as both a visual interface for building generative logic graphs and a programmable backend API that exposes diffusion model operations for external integration. The system distinguishes itself through a graph-based execution model that supports differential workflow execution, re-running only modified nodes to reduce computation. It features dynamic model offloading to manage memory between system RAM and GPU VRAM and utilizes metadata-embedde
Enables saving and loading generation graphs as JSON files or extracting metadata from image and audio files.
Python
papers-we-love/papers-we-love (107,093 stars)
Papers We Love is a community-driven repository and learning network dedicated to the study and discussion of foundational computer science literature. It functions as a centralized educational archive, providing a structured environment where software professionals can engage with academic research to bridge the gap between theoretical concepts and practical application. The project distinguishes itself through a decentralized model of crowdsourced curation, where community members collectively maintain and categorize a vast index of technical resources. Beyond the repository itself, the ini
Parses documentation for external links to facilitate the retrieval of research documents for offline reading.
Shell · awesome · computer-science · meetup
immich-app/immich (104,236 stars)
Immich is a self-hosted media management platform designed to provide a centralized, private repository for photos and videos. It functions as a comprehensive system for organizing, backing up, and viewing personal media collections across mobile devices, web browsers, and external storage locations. By maintaining full control over data ownership and storage infrastructure, the platform ensures that users retain sovereignty over their digital assets. The system distinguishes itself through a distributed architecture that coordinates background media synchronization, real-time filesystem moni
Manages automated scheduling, retention policies, and manual triggers to protect essential system metadata and database snapshots.
TypeScript · backup-tool · flutter · google-photos
pytorch/pytorch (100,814 stars)
PyTorch is a machine learning framework centered on a GPU-ready tensor library that supports multi-dimensional array operations across both CPU and accelerator hardware. It provides a foundational infrastructure for mathematical computation and dynamic neural network construction, utilizing a tape-based automatic differentiation system that allows for flexible, non-static graph execution. The framework is designed for deep integration with Python, enabling natural usage alongside standard scientific computing ecosystems. It distinguishes itself through a comprehensive distributed training sui
Persists tensors and complex data structures to disk through native loading and saving mechanisms.
Python · autograd · deep-learning · gpu
