HuggingFace API error: 404 Client Error: Not Found for url: https://router.huggingface.co/hf-inference/models/google/gemma-2-2b-it/v1/chat/completions

I'm trying out AI agents in LangChain and built a very basic agent with a free Hugging Face model. But when I run the code, I get the following error. Could you help me figure out why this happens? I've used free Hugging Face models before, but now I get this error.

CODE:
from langchain_huggingface import (
    ChatHuggingFace,
    HuggingFaceEndpoint,
)
from langchain_core.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import (
    create_react_agent,
    AgentExecutor,
)
from langchain import hub
from dotenv import load_dotenv

load_dotenv()

llm_model = HuggingFaceEndpoint(
    repo_id="google/gemma-2-2b-it",
    task="text-generation",
)

model = ChatHuggingFace(llm=llm_model)

search_tool = DuckDuckGoSearchRun()

prompt = hub.pull("hwchase17/react")

agent = create_react_agent(
    llm=model,
    tools=[search_tool],
    prompt=prompt,
)

agent_executor = AgentExecutor(
    agent=agent,
    tools=[search_tool],
    verbose=True,
)

response = agent_executor.invoke({
    "input": "3 ways to reach Goa from Delhi"
})
print(response)

ERROR:
File "/home/meharaz/Documents/Projects/LangChain-Practice/LangChain/lib/python3.13/site-packages/langchain_core/language_models/chat_models.py", line 925, in _generate_with_cache
result = self._generate(
messages, stop=stop, run_manager=run_manager, **kwargs
)
File "/home/meharaz/Documents/Projects/LangChain-Practice/LangChain/lib/python3.13/site-packages/langchain_huggingface/chat_models/huggingface.py", line 370, in _generate
answer = self.llm.client.chat_completion(messages=message_dicts, **kwargs)
File "/home/meharaz/Documents/Projects/LangChain-Practice/LangChain/lib/python3.13/site-packages/huggingface_hub/inference/_client.py", line 992, in chat_completion
data = self._inner_post(request_parameters, stream=stream)
File "/home/meharaz/Documents/Projects/LangChain-Practice/LangChain/lib/python3.13/site-packages/huggingface_hub/inference/_client.py", line 357, in _inner_post
hf_raise_for_status(response)
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
File "/home/meharaz/Documents/Projects/LangChain-Practice/LangChain/lib/python3.13/site-packages/huggingface_hub/utils/_http.py", line 482, in hf_raise_for_status
raise _format(HfHubHTTPError, str(e), response) from e
huggingface_hub.errors.HfHubHTTPError: 404 Client Error: Not Found for url: https://router.huggingface.co/hf-inference/models/google/gemma-2-2b-it/v1/chat/completions (Request ID: Root=1-696eeb4f-3b421b5235b5e633725c642a;bfa2288a-0f4c-4f3d-8364-72987c241759)

hi @meharaz733

could you try out this approach:

from langchain_huggingface import HuggingFaceEndpoint
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

llm = HuggingFaceEndpoint(
    repo_id="google/gemma-2-2b-it",
    task="text-generation",
    max_new_tokens=512,
    # optional: provider="auto"
)

tools = [DuckDuckGoSearchRun()]
prompt = hub.pull("hwchase17/react")

agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

print(executor.invoke({"input": "3 ways to reach Goa from Delhi"}))

As far as I can tell from the source code, ChatHuggingFace does not use the "text-generation" route: when it wraps a HuggingFaceEndpoint, it calls Hugging Face's chat-completions API (client.chat_completion(...)), which hits /v1/chat/completions on the router. That's visible in LangChain's Hugging Face integration code:

        if _is_huggingface_endpoint(self.llm):
            ...
            answer = self.llm.client.chat_completion(messages=message_dicts, **params)
            return self._create_chat_result(answer)

For google/gemma-2-2b-it on the hf-inference provider, that chat-completions route isn't available, so the router returns 404 Not Found. (Hugging Face documents chat completion as its own task/endpoint, and not every model/provider supports it: see the Chat Completion docs.)
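To make the 404 concrete: the failing URL in your traceback follows a fixed router pattern of provider plus model plus the chat-completions suffix. Here's a small sketch that reconstructs it (this is just an illustration of the URL shape visible in the error message, not an official huggingface_hub API):

```python
# Reconstruct the router chat-completions URL shape seen in the 404.
# Pattern inferred from the error message; illustrative only.

def chat_completions_url(provider: str, model_id: str) -> str:
    """Build the router chat-completions URL for a provider/model pair."""
    return (
        f"https://router.huggingface.co/{provider}"
        f"/models/{model_id}/v1/chat/completions"
    )

# This matches the URL from the traceback exactly:
url = chat_completions_url("hf-inference", "google/gemma-2-2b-it")
print(url)
# -> https://router.huggingface.co/hf-inference/models/google/gemma-2-2b-it/v1/chat/completions
```

So the 404 isn't about your code being malformed: the router simply has no chat-completions endpoint registered for that provider/model combination.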

@pawel-twardziak Thank you very much for your response.

I tried the suggested approach, but unfortunately it didn’t work. I tested multiple models and encountered the following errors:


  • openai/gpt-oss-120b
    Bad Request: The endpoint is paused, ask a maintainer to restart it.
  • HuggingFaceTB/SmolLM3-3B
    404 Client Error: Not Found
  • deepseek-ai/DeepSeek-R1
    404 Client Error: Not Found

From what I understand, these models currently do not provide free serverless inference endpoints.

Could you please point me to a resource or page where I can see which models are available for free inference on Hugging Face?

Thank you!

Hi @meharaz733

gpt-oss-120b is available locally with Ollama. SmolLM3-3B and DeepSeek-R1 are also available there. For free. Locally.

I don't know which models are free on Hugging Face (see Hugging Face – Pricing), but you always have to provide an HF_TOKEN for the connection.
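If you want to try the local route, a minimal Ollama setup looks roughly like this (assuming Ollama is installed and the daemon is running; the exact model tags below are assumptions — check the Ollama library page for the current ones):

```shell
# Pull and run models locally with Ollama (no HF_TOKEN needed).
ollama pull gpt-oss:120b        # very large; needs substantial RAM/VRAM
ollama run gpt-oss:120b "Say hello"

# Smaller alternatives mentioned in this thread:
ollama pull deepseek-r1         # distilled DeepSeek-R1 variants
ollama pull smollm2             # tag is an assumption; verify on the library page
```

Once a model is pulled, LangChain can talk to it through the langchain-ollama integration instead of HuggingFaceEndpoint.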

@meharaz733 Hi, you can go to the following link: Models – Hugging Face

  1. Use the “Inference Providers” Filter: On the left sidebar, look for the “Inference Providers” section.

  2. Select “HF Inference”: Check the box for “HF Inference” (or the name of other third-party serverless providers like Together, Replicate, etc.) to filter the list of models. The models displayed will be those available through the serverless API.

Models that support Serverless inference are also usually testable directly on their model pages via an interactive widget. If the widget is present and functional, it’s a good indicator that serverless inference is available for that model.

Check this out as well: Serverless Inference API - Hugging Face Open-Source AI Cookbook
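The same filtering that the website sidebar does can be sketched in code. The records below are made up for illustration — real data would come from the Hugging Face Hub API (for example, huggingface_hub.list_models with a provider filter, if your huggingface_hub version supports it) — but the selection logic is the same idea: keep models that are chat/text-generation capable and actually served by the provider you care about.

```python
# Sketch: filter model listings for text-generation models served by a
# given inference provider. Sample records are invented for illustration;
# field names mimic (but are not guaranteed to match) the Hub API response.

def chat_capable(models: list[dict], provider: str = "hf-inference") -> list[str]:
    """Return IDs of text-generation models served by `provider`."""
    return [
        m["id"]
        for m in models
        if m.get("pipeline_tag") == "text-generation"
        and provider in m.get("providers", [])
    ]

sample = [
    {"id": "some-org/chat-model", "pipeline_tag": "text-generation",
     "providers": ["hf-inference", "together"]},
    {"id": "some-org/vision-model", "pipeline_tag": "image-classification",
     "providers": ["hf-inference"]},
    {"id": "some-org/external-only", "pipeline_tag": "text-generation",
     "providers": ["replicate"]},
]

print(chat_capable(sample))  # -> ['some-org/chat-model']
```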


@keenborder786 @pawel-twardziak thank you very much, dude! I got it. The problem was with the LangChain framework version. I was using an old version, and Hugging Face updated their API policy, which the old version couldn't handle. After upgrading the library and related packages, it now works. Thanks again!
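For anyone landing here with the same error: checking which versions you actually have installed is a good first step before upgrading (e.g. with `pip install -U langchain langchain-huggingface langchain-community huggingface_hub`). A small stdlib-only sketch (the package names are the ones used in the imports above):

```python
# Print installed versions of the packages involved in this thread,
# so you can compare against the latest releases before upgrading.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"

for pkg in ("langchain", "langchain-huggingface",
            "langchain-community", "huggingface_hub"):
    print(f"{pkg}: {installed_version(pkg)}")
```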
