# Robots.txt - Rasit Dinc Digital Health & AI Research Platform
# Last Updated: December 20, 2025 (v3.0 - Ultra SEO Optimized)
# https://rasitdinc.com

# ============================================
# AI SEARCH ENGINES & LLM CRAWLERS (2025)
# ============================================

# OpenAI ChatGPT / SearchGPT
User-agent: GPTBot
Allow: /
Crawl-delay: 1

User-agent: ChatGPT-User
Allow: /

# Google Gemini (Bard AI)
User-agent: Google-Extended
Allow: /

User-agent: GoogleOther
Allow: /

# Perplexity AI
User-agent: PerplexityBot
Allow: /
Crawl-delay: 1

# Anthropic Claude
User-agent: ClaudeBot
Allow: /
Crawl-delay: 1

User-agent: Claude-Web
Allow: /

User-agent: anthropic-ai
Allow: /

# Cohere AI
User-agent: cohere-ai
Allow: /

# Meta AI (Facebook/Llama)
User-agent: Meta-ExternalAgent
Allow: /

User-agent: FacebookBot
Allow: /

User-agent: meta-externalagent
Allow: /

# Microsoft Copilot / Bing AI
User-agent: Bingbot
Allow: /
Crawl-delay: 1

User-agent: bingbot
Allow: /

# You.com AI Search
User-agent: YouBot
Allow: /

# Brave Search AI
User-agent: Brave-Indexer
Allow: /

# Kagi Search
User-agent: KagiBot
Allow: /

# DeepSeek AI
User-agent: DeepSeekBot
Allow: /

# Mistral AI
User-agent: MistralBot
Allow: /

# Phind (Developer AI)
User-agent: PhindBot
Allow: /

# Grok (X.com AI)
User-agent: GrokBot
Allow: /

# Academic AI Crawlers
User-agent: ConsensusBot
Allow: /

User-agent: ElicitBot
Allow: /

User-agent: SemanticScholarBot
Allow: /

User-agent: ScholarBot
Allow: /

# ============================================
# TRADITIONAL SEARCH ENGINES
# ============================================

# Google (All variants)
User-agent: Googlebot
Allow: /

User-agent: Googlebot-Image
Allow: /

User-agent: Googlebot-News
Allow: /

User-agent: Googlebot-Video
Allow: /

User-agent: Storebot-Google
Allow: /

User-agent: Google-InspectionTool
Allow: /

# Bing (All variants)
User-agent: msnbot
Allow: /

User-agent: BingPreview
Allow: /

User-agent: MicrosoftPreview
Allow: /

# Yandex
User-agent: YandexBot
Allow: /

User-agent: YandexImages
Allow: /

User-agent: YandexNews
Allow: /

User-agent: YandexMedia
Allow: /

# Baidu (China)
User-agent: Baiduspider
Allow: /

User-agent: Baiduspider-image
Allow: /

User-agent: Baiduspider-news
Allow: /

# DuckDuckGo
User-agent: DuckDuckBot
Allow: /

# Yahoo
User-agent: Slurp
Allow: /

# Naver (Korea)
User-agent: Yeti
Allow: /

# Sogou (China)
User-agent: Sogou
Allow: /

# Seznam (Czech)
User-agent: SeznamBot
Allow: /

# Qwant (Europe)
User-agent: Qwantify
Allow: /

# Ecosia
User-agent: Ecosia
Allow: /

# ============================================
# ACADEMIC & RESEARCH PLATFORMS
# ============================================

# Google Scholar
User-agent: Googlebot-Scholar
Allow: /

# ResearchGate
User-agent: ResearchGateBot
Allow: /

# Academia.edu
User-agent: AcademiaBot
Allow: /

# Semantic Scholar
User-agent: SemanticScholarBot
Allow: /

# PubMed Central
User-agent: PMCBot
Allow: /

# CrossRef / DOI
User-agent: CrossRefBot
Allow: /

# ORCID
User-agent: ORCIDBot
Allow: /

# BASE (Bielefeld Academic)
User-agent: BASEBot
Allow: /

# CORE
User-agent: COREBot
Allow: /

# Microsoft Academic
User-agent: MicrosoftAcademicBot
Allow: /

# Elsevier / Scopus
User-agent: ElsevierBot
Allow: /

# Web of Science / Clarivate
User-agent: ClarivateBot
Allow: /

# ============================================
# SPECIALIZED CRAWLERS
# ============================================

# Apple Siri & Spotlight
User-agent: Applebot
Allow: /

# Amazon Alexa
User-agent: ia_archiver
Allow: /

# Twitter/X
User-agent: Twitterbot
Allow: /

# LinkedIn
User-agent: LinkedInBot
Allow: /

# Pinterest
User-agent: Pinterest
Allow: /

# WhatsApp
User-agent: WhatsApp
Allow: /

# Telegram
User-agent: TelegramBot
Allow: /

# Slack
User-agent: Slackbot
Allow: /

# Discord
User-agent: Discordbot
Allow: /

# ============================================
# NEWS & CONTENT AGGREGATORS
# ============================================

# Google News
User-agent: Googlebot-News
Allow: /

# Apple News
User-agent: AppleNewsBot
Allow: /

# Feedly
User-agent: Feedlybot
Allow: /

# NewsNow
User-agent: NewsNowBot
Allow: /

# Flipboard
User-agent: Flipboard
Allow: /

# SmartNews
User-agent: SmartNews
Allow: /

# ============================================
# MONITORING & SEO TOOLS
# ============================================

# Google Verification
User-agent: Google-Site-Verification
Allow: /

# Bing Webmaster
User-agent: BingWebmaster
Allow: /

# Semrush
User-agent: SemrushBot
Allow: /
Crawl-delay: 2

# Ahrefs
User-agent: AhrefsBot
Allow: /
Crawl-delay: 3

# Moz
User-agent: rogerbot
Allow: /

User-agent: dotbot
Allow: /

# Majestic
User-agent: MJ12bot
Allow: /
Crawl-delay: 3

# Screaming Frog
User-agent: Screaming Frog SEO Spider
Allow: /

# GTmetrix
User-agent: GTmetrix
Allow: /

# ============================================
# BLOCKED CRAWLERS (Security & Quality)
# ============================================

# Aggressive scrapers
User-agent: CCBot
Disallow: /

User-agent: GPTBot-Training
Disallow: /private/

# Known bad bots
User-agent: BLEXBot
Disallow: /

User-agent: DataForSeoBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: PetalBot
Disallow: /

# ============================================
# SITEMAP DECLARATIONS
# ============================================

# Main Sitemap (Dynamic - includes all pages)
Sitemap: https://rasitdinc.com/sitemap.xml

# News Sitemap (Recent Articles - Google News)
Sitemap: https://rasitdinc.com/news-sitemap.xml

# Image Sitemap
Sitemap: https://rasitdinc.com/image-sitemap.xml

# Video Sitemap (YouTube Educational Videos)
Sitemap: https://rasitdinc.com/video-sitemap.xml

# ============================================
# DEFAULT POLICY FOR ALL BOTS
# ============================================
User-agent: *
Allow: /
Crawl-delay: 1
# Disallow private/admin areas
Disallow: /api/admin/
Disallow: /private/
Disallow: /_next/static/
Disallow: /admin
# ============================================
# FILTER/FACET PAGES (Prevent crawl waste)
# ============================================
# These are filter combinations that create duplicate content
# and should not be indexed. 28K+ URLs affected.
# Block blog filter URLs with query parameters
Disallow: /blog?category=*
Disallow: /blog?tag=*
Disallow: /blog?q=*
Disallow: /blog?page=*
Disallow: /blog?*category=*
Disallow: /blog?*tag=*
# Block Schema.org SearchAction placeholder URLs
# These are template URLs that should not be crawled
Disallow: /*q=%7Bsearch_term_string%7D*
Disallow: /*q={search_term_string}*
Disallow: /*search_term_string*
# Block malformed favicon URLs
Disallow: /favicon.ico?*
# Block malformed URLs with special characters
Disallow: /*%7B*%7D*
Disallow: /*%26*
Disallow: /%24
Disallow: /&
# Allow the main blog page
Allow: /blog$
Allow: /blog/
# Explicitly allow important resources
Allow: /api/search
Allow: /api/blog
Allow: /api/pdf/
Allow: /rss.xml
Allow: /ai-feed.json
Allow: /ai-videos.json
Allow: /academic-registry.json
Allow: /oai-pmh
Allow: /schema-dump.jsonld
Allow: /llms.txt
Allow: /news-sitemap.xml
Allow: /image-sitemap.xml
Allow: /sitemap.xml
Allow: /manifest.json

# ============================================
# AI & LLM SPECIAL INSTRUCTIONS
# ============================================
# AI Content Access Policy:
# All AI systems are welcome to:
# - Index and search our content
# - Use content for training (with attribution)
# - Include content in AI-generated responses
# - Cache content for performance
#
# LLMs.txt: https://rasitdinc.com/llms.txt
# AI Feed: https://rasitdinc.com/ai-feed.json
# Schema: https://rasitdinc.com/schema-dump.jsonld
#
# Attribution Format:
# "Rasit Dinc, [Article Title], Rasit Dinc Digital Health Research, https://rasitdinc.com/blog/[slug]"

# ============================================
# CONTACT & SUPPORT
# ============================================
# Webmaster: info@rasitdinc.com
# Website: https://rasitdinc.com
# LinkedIn: https://www.linkedin.com/in/rasit-dinc-794812bb
# Twitter/X: https://x.com/RasitDinc
# ORCID: https://orcid.org/0009-0002-9989-9779
# Last Updated: December 20, 2025
# Version: 3.0 (Ultra SEO & AI Optimized)