Rezwan Corpus
An Ocean of Hadith in Your Hands

The most comprehensive intelligent database of Shia and Sunni hadith, enriched with the power of artificial intelligence.

Over 1.2 million hadith from authentic Shia and Sunni sources.
Request Corpus

Hadith Research
An Old Challenge, A Modern Solution

Rezwan Corpus transforms traditional research barriers into opportunities for deeper knowledge discovery.

Traditional Challenges

  • Vast volume and dispersion of sources
  • Difficulty in examining chains of transmission (sanad)
  • Time-consuming research that is prone to human error
  • Lack of unified and searchable access

The Rezwan Solution

  • Fast and unified access to millions of narrations
  • Automated analysis and comparison
  • Knowledge generation with unprecedented speed and accuracy
  • Discovering hidden semantic relationships in texts
Comparison of traditional and modern research methods.

Unprecedented Scale
An Ocean of Hadith Data

The power of any analysis depends on the comprehensiveness of its data. Rezwan provides the most complete hadith dataset.

Comprehensive Coverage

Access to hundreds of authentic Shia and Sunni books and sources on a unified platform.

Comparative Research

Enables the study and comparison of narrations between different Islamic schools of thought, which was previously very difficult.

Macro-Level Analysis

Identify recurring patterns, study the evolution of a concept throughout history, and discover intertextual relationships.

Continuous Updates

This corpus is updated every three months with new sources and analyses.

Data Source and Corpus Generation Process

Transparency about data sources and production methods is the foundation of scientific trust.

Primary Data Source

The raw text was taken from the Ahl al-Bayt Library software, and the corpus was produced from it using a range of intelligent and computational methods.

Because the goal was to build a large hadith corpus based on the Ahl al-Bayt Library software, all books in its hadith sources category were selected. Books from every other subject category were included if they date to the end of the 5th century AH or earlier, except for lexicography, where the cutoff is the end of the 3rd century AH. From these texts, hadiths were extracted wherever they occur, using various AI and machine learning methods, to form a large collection of hadiths from both schools (Shia and Sunni).
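The selection rule described above can be sketched as a small filter. This is only an illustration: the category labels, the `Book` record, and the `century_ah` field are assumed names, not the actual metadata schema of the Ahl al-Bayt Library software.

```python
from dataclasses import dataclass

# Hypothetical category labels; the real library metadata may differ.
HADITH = "hadith"
LEXICOGRAPHY = "lexicography"


@dataclass(frozen=True)
class Book:
    title: str
    category: str
    century_ah: int  # century of authorship in the Hijri calendar (assumed field)


def is_selected(book: Book) -> bool:
    """Return True if the book qualifies as raw text under the stated rule."""
    if book.category == HADITH:
        return True  # every book in the hadith sources category
    if book.category == LEXICOGRAPHY:
        return book.century_ah <= 3  # lexicography: up to the end of the 3rd century AH
    return book.century_ah <= 5  # other categories: up to the end of the 5th century AH


# Placeholder titles, not real entries from the library.
books = [
    Book("Example hadith collection", HADITH, 11),
    Book("Example lexicon", LEXICOGRAPHY, 4),
    Book("Example history", "history", 5),
]
selected = [b.title for b in books if is_selected(b)]
```

Here the 4th-century lexicon is excluded, while the late hadith collection and the 5th-century history book are kept.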

Important Note

Although this process was designed and implemented with the participation of hadith studies specialists, its outputs have not been fully reviewed or confirmed by experts from seminaries and universities. Therefore, some results may contain errors or incorrect content.

Corpus Statistics

The statistics of the hadiths extracted from the sources are as follows:

  • Hadith corpus based on Ahl al-Bayt Library: 1,289 books, 1,394,080 hadiths
    • Hadith Books: 601 books (984,004 hadiths)
    • Historical Books: 264 books (188,836 hadiths)
    • Quranic Studies: 94 books (110,752 hadiths)
    • Fiqh Books: 61 books (59,204 hadiths)
    • Ethical Books: 74 books (47,387 hadiths)
    • Theological Books: 88 books (25,292 hadiths)
    • Arabic Literature: 67 books (11,870 hadiths)
    • Supplications & Visitations: 15 books (5,590 hadiths)
    • Principles of Fiqh: 9 books (964 hadiths)
    • Logic & Philosophy: 6 books (740 hadiths)

Beyond Words
Discovering Semantic and Thematic Relationships

The unique ability of Rezwan Corpus lies in its deep understanding of hadith content.

Semantic Vectors (Embeddings)

In Rezwan, you can examine the semantic and thematic connections of a narration, even if different words are used in it.
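As a rough illustration of how embeddings allow narrations with different wording to be compared, the sketch below measures cosine similarity between vectors. Cosine similarity is a common choice for this task, but the page does not state which measure or embedding model Rezwan uses, so treat both the measure and the toy vectors as assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors standing in for real embeddings (made-up values).
narration_a = [0.9, 0.1, 0.3]
narration_b = [0.8, 0.2, 0.4]  # imagined: similar theme, different wording
narration_c = [0.0, 1.0, 0.0]  # imagined: unrelated theme

sim_ab = cosine_similarity(narration_a, narration_b)
sim_ac = cosine_similarity(narration_a, narration_c)
```

With these values, `sim_ab` is much closer to 1 than `sim_ac`, which is the property a semantic search would rank on.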

Summarization and Content Analysis

Our algorithms extract the essence of a long narration and produce a precise summary, scoring 9.34 out of 10 for accuracy.

Intelligent Data Enrichment

Each narration in this corpus is enriched with valuable layers of information to transform raw data into usable knowledge.

Separating Sanad and Matn

Intelligent separation of the chain of narrators from the main content of the narration.

Translation of Sanad and Matn

Providing fluent and accurate translations into various languages.

Diacritization

Accurate diacritization of Arabic words for correct pronunciation.

Word-by-Word Translation

Displaying the equivalent meaning of each word for lexical analysis.

Narration Summary

Generating a short and clear summary of the hadith's main message.

Narration Goals

Extracting the main objective that the hadith seeks to explain.

Narration Topics

Assigning key topics for categorization and search.

Commentary and Explanation

Providing detailed explanations to clarify complex concepts.

Quranic References

Identifying and linking related verses from the Holy Quran.

Biblical References

Discovering content connections with Old and New Testament texts.

Lexically Similar Hadiths

Finding narrations that are similar in terms of word structure and phrases.
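One simple way to picture lexical similarity is to compare the word pairs (bigrams) that two texts share. The sketch below uses Jaccard overlap of word bigrams. It is an assumed stand-in rather than the method actually used to build the corpus, and the sentences are neutral placeholders, not corpus text.

```python
def word_ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Return the set of consecutive n-word sequences in the text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Placeholder sentences used only to exercise the functions.
text_1 = "the traveller reached the city before sunset"
text_2 = "the traveller reached the village before sunset"
text_3 = "rain fell on the mountains all night"

score_12 = jaccard(word_ngrams(text_1), word_ngrams(text_2))
score_13 = jaccard(word_ngrams(text_1), word_ngrams(text_3))
```

The first pair differs by one word and shares 4 of 8 distinct bigrams, while the second pair shares none. For Arabic text, a real pipeline would likely normalize diacritics and spelling variants before comparing.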

Semantically Similar Hadiths

Identifying narrations that convey the same message and concept.
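The enrichment layers listed in this section can be pictured as one record per narration. The sketch below is a hypothetical data structure: every field name is chosen for illustration, and the released corpus may organize these layers differently.

```python
from dataclasses import dataclass, field


@dataclass
class EnrichedNarration:
    """One narration with the enrichment layers described in this section.

    All field names are illustrative assumptions, not the corpus schema.
    """
    sanad: str  # chain of narrators
    matn: str   # main content of the narration
    diacritized_text: str = ""
    translations: dict[str, str] = field(default_factory=dict)  # language -> text
    word_by_word: list[tuple[str, str]] = field(default_factory=list)  # (word, gloss)
    summary: str = ""
    goal: str = ""
    topics: list[str] = field(default_factory=list)
    commentary: str = ""
    quranic_references: list[str] = field(default_factory=list)
    biblical_references: list[str] = field(default_factory=list)
    lexically_similar_ids: list[str] = field(default_factory=list)
    semantically_similar_ids: list[str] = field(default_factory=list)


# Placeholder values only; no real narration is quoted here.
record = EnrichedNarration(sanad="<chain of narrators>", matn="<narration text>")
record.topics.append("<topic>")
```

Using `default_factory` gives each record its own empty lists and dictionaries, so enrichment layers can be filled in incrementally without records sharing state.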

Proven Quality and Practical Applications

8.59/10
Overall Quality Score

Rezwan Corpus has been evaluated by a team of prominent hadith science specialists.

Researchers

Conducting innovative research in a short time

Academics

A powerful tool for teaching and research

Research Centers

Developing large-scale data-driven projects

Seminaries

Transforming traditional research and teaching methods

Easy Access for Everyone

We have designed two main pathways to leverage the power of Rezwan Corpus.

For Researchers and General Users

Rezwan Corpus is accessible through the Thiqat software, a system that offers biographical data, advanced search, various filters, semantic network displays, and more.

Enter Thiqat Software

For Developers and Organizations (API)

Secure and documented APIs for fetching texts, searching, and utilizing other information available in Rezwan for integration into existing systems.

View API Docs (Coming Soon)

License and Data Usage Terms

We believe that knowledge should be freely available while its intellectual rights are preserved.

Creative Commons Attribution–ShareAlike (CC BY-SA 4.0)

Attribution-ShareAlike

This license gives you a lot of freedom, but it has one important condition: if you use data from Rezwan Corpus and create an improved or derivative version, you must also release that new product under the same license (CC BY-SA 4.0).

  • Freedom of Use: The data is free to use for any purpose (commercial and non-commercial).
  • Attribution: When using the data, you must credit "Rezwan Corpus" as the source.
  • Share-Alike: Any new dataset created based on our data must be released under the same license to continue the cycle of free knowledge.

For scholarly citation of this corpus, you can use the following paper:

Asgari-Bidhendi, M., Ghaseminia, M. A., Shahbazi, A., Hossayni, S. A., Torabian, N., & Minaei-Bidgoli, B. (2025). Rezwan: Leveraging Large Language Models for Comprehensive Hadith Text Processing: A 1.2M Corpus Development.

Request Corpus Access

There are two ways to get to know and obtain the corpus: you can first explore the data samples, or you can request the full corpus directly.

1 Explore Sample Datasets

To familiarize yourself with the data structure and quality, you can download small samples of each dataset in JSON and CSV formats.
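A minimal sketch of inspecting such samples with the Python standard library follows. The inline strings stand in for downloaded files, and the field names (`id`, `sanad`, `matn`) are assumptions; check the actual sample files for the real structure.

```python
import csv
import io
import json

# Stand-ins for downloaded sample files; real file names and fields may differ.
sample_json = '[{"id": "1", "sanad": "<chain>", "matn": "<text>"}]'
sample_csv = "id,sanad,matn\n1,<chain>,<text>\n"

# Parse the JSON sample into a list of dictionaries.
json_rows = json.loads(sample_json)

# Parse the CSV sample; DictReader maps each row to the header names.
csv_rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Both loaders yield a list of dicts, so fields can be inspected the same way.
json_fields = sorted(json_rows[0].keys())
csv_fields = sorted(csv_rows[0].keys())
```

With real files, `open(path, encoding="utf-8", newline="")` would replace the in-memory strings, which matters for Arabic text and for correct CSV newline handling.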

2 Request Full Corpus

To get full access to the corpus, please complete the multi-step form below.

Fill Form & Request Full Corpus