HuggingFace API error: 404 Client Error: Not Found for url: https://router.huggingface.co/hf-inference/models/google/gemma-2-2b-it/v1/chat/completions

I'm trying out AI agents in LangChain and built a very basic agent with a free Hugging Face model. But when I run the code, I get the following error. Could you help me figure out why this happens? I've used free Hugging Face models before, but now I get this error.

CODE:
from langchain_huggingface import (
    ChatHuggingFace,
    HuggingFaceEndpoint,
)
from langchain_core.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import (
    create_react_agent,
    AgentExecutor,
)
from langchain import hub
from dotenv import load_dotenv

load_dotenv()

llm_model = HuggingFaceEndpoint(
    repo_id="google/gemma-2-2b-it",
    task="text-generation",
)

model = ChatHuggingFace(llm=llm_model)

search_tool = DuckDuckGoSearchRun()

prompt = hub.pull("hwchase17/react")

agent = create_react_agent(
    llm=model,
    tools=[search_tool],
    prompt=prompt,
)

agent_executor = AgentExecutor(
    agent=agent,
    tools=[search_tool],
    verbose=True,
)

response = agent_executor.invoke({
    "input": "3 ways to reach Goa from Delhi"
})
print(response)

ERROR:
File "/home/meharaz/Documents/Projects/LangChain-Practice/LangChain/lib/python3.13/site-packages/langchain_core/language_models/chat_models.py", line 925, in _generate_with_cache
result = self._generate(
messages, stop=stop, run_manager=run_manager, **kwargs
)
File "/home/meharaz/Documents/Projects/LangChain-Practice/LangChain/lib/python3.13/site-packages/langchain_huggingface/chat_models/huggingface.py", line 370, in _generate
answer = self.llm.client.chat_completion(messages=message_dicts, **kwargs)
File "/home/meharaz/Documents/Projects/LangChain-Practice/LangChain/lib/python3.13/site-packages/huggingface_hub/inference/_client.py", line 992, in chat_completion
data = self._inner_post(request_parameters, stream=stream)
File "/home/meharaz/Documents/Projects/LangChain-Practice/LangChain/lib/python3.13/site-packages/huggingface_hub/inference/_client.py", line 357, in _inner_post
hf_raise_for_status(response)
~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
File "/home/meharaz/Documents/Projects/LangChain-Practice/LangChain/lib/python3.13/site-packages/huggingface_hub/utils/_http.py", line 482, in hf_raise_for_status
raise _format(HfHubHTTPError, str(e), response) from e
huggingface_hub.errors.HfHubHTTPError: 404 Client Error: Not Found for url: https://router.huggingface.co/hf-inference/models/google/gemma-2-2b-it/v1/chat/completions (Request ID: Root=1-696eeb4f-3b421b5235b5e633725c642a;bfa2288a-0f4c-4f3d-8364-72987c241759)

hi @meharaz733

could you try out this approach:

from langchain_huggingface import HuggingFaceEndpoint
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.agents import create_react_agent, AgentExecutor
from langchain import hub

llm = HuggingFaceEndpoint(
    repo_id="google/gemma-2-2b-it",
    task="text-generation",
    max_new_tokens=512,
    # optional: provider="auto"
)

tools = [DuckDuckGoSearchRun()]
prompt = hub.pull("hwchase17/react")

agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

print(executor.invoke({"input": "3 ways to reach Goa from Delhi"}))

As far as I can tell from the source code, ChatHuggingFace does not use the "text-generation" route: when it wraps a HuggingFaceEndpoint, it calls Hugging Face's chat-completions API (client.chat_completion(...)), which hits /v1/chat/completions on the router. That's visible in LangChain's Hugging Face integration code:

        if _is_huggingface_endpoint(self.llm):
            ...
            answer = self.llm.client.chat_completion(messages=message_dicts, **params)
            return self._create_chat_result(answer)

For google/gemma-2-2b-it on the hf-inference provider, that chat-completions route isn't available, so the router returns 404 Not Found. (Hugging Face documents chat completion as its own task/endpoint, and not every model/provider supports it: see the Chat Completion docs.)
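To make the 404 concrete: the failing URL in your traceback follows a fixed router pattern of provider plus model plus the chat-completions suffix. Here's a small sketch that reconstructs it (this is just an illustration of the URL shape visible in the error message, not an official huggingface_hub API):

```python
# Reconstruct the router chat-completions URL shape seen in the 404.
# Pattern inferred from the error message; illustrative only.

def chat_completions_url(provider: str, model_id: str) -> str:
    """Build the router chat-completions URL for a provider/model pair."""
    return (
        f"https://router.huggingface.co/{provider}"
        f"/models/{model_id}/v1/chat/completions"
    )

# This matches the URL from the traceback exactly:
url = chat_completions_url("hf-inference", "google/gemma-2-2b-it")
print(url)
# -> https://router.huggingface.co/hf-inference/models/google/gemma-2-2b-it/v1/chat/completions
```

So the 404 isn't about your code being malformed: the router simply has no chat-completions endpoint registered for that provider/model combination.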

@pawel-twardziak Thank you very much for your response.

I tried the suggested approach, but unfortunately it didn’t work. I tested multiple models and encountered the following errors:


  • openai/gpt-oss-120b
    Bad Request: The endpoint is paused, ask a maintainer to restart it.
  • HuggingFaceTB/SmolLM3-3B
    404 Client Error: Not Found
  • deepseek-ai/DeepSeek-R1
    404 Client Error: Not Found

From what I understand, these models currently do not provide free serverless inference endpoints.

Could you please point me to a resource or page where I can see which models are available for free inference on Hugging Face?

Thank you!

Hi @meharaz733

gpt-oss-120b is available locally with Ollama. SmolLM3-3B and DeepSeek-R1 are also available there. For free. Locally.

I don't know which models are free on Hugging Face (see Hugging Face – Pricing), but you always have to provide an HF_TOKEN for the connection.
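If you want to try the local route, a minimal Ollama setup looks roughly like this (assuming Ollama is installed and the daemon is running; the exact model tags below are assumptions — check the Ollama library page for the current ones):

```shell
# Pull and run models locally with Ollama (no HF_TOKEN needed).
ollama pull gpt-oss:120b        # very large; needs substantial RAM/VRAM
ollama run gpt-oss:120b "Say hello"

# Smaller alternatives mentioned in this thread:
ollama pull deepseek-r1         # distilled DeepSeek-R1 variants
ollama pull smollm2             # tag is an assumption; verify on the library page
```

Once a model is pulled, LangChain can talk to it through the langchain-ollama integration instead of HuggingFaceEndpoint.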

@meharaz733 Hi, you can go to the following link: Models – Hugging Face

  1. Use the “Inference Providers” Filter: On the left sidebar, look for the “Inference Providers” section.

  2. Select “HF Inference”: Check the box for “HF Inference” (or the name of other third-party serverless providers like Together, Replicate, etc.) to filter the list of models. The models displayed will be those available through the serverless API.

Models that support Serverless inference are also usually testable directly on their model pages via an interactive widget. If the widget is present and functional, it’s a good indicator that serverless inference is available for that model.

Check this out as well: Serverless Inference API - Hugging Face Open-Source AI Cookbook
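The same filtering that the website sidebar does can be sketched in code. The records below are made up for illustration — real data would come from the Hugging Face Hub API (for example, huggingface_hub.list_models with a provider filter, if your huggingface_hub version supports it) — but the selection logic is the same idea: keep models that are chat/text-generation capable and actually served by the provider you care about.

```python
# Sketch: filter model listings for text-generation models served by a
# given inference provider. Sample records are invented for illustration;
# field names mimic (but are not guaranteed to match) the Hub API response.

def chat_capable(models: list[dict], provider: str = "hf-inference") -> list[str]:
    """Return IDs of text-generation models served by `provider`."""
    return [
        m["id"]
        for m in models
        if m.get("pipeline_tag") == "text-generation"
        and provider in m.get("providers", [])
    ]

sample = [
    {"id": "some-org/chat-model", "pipeline_tag": "text-generation",
     "providers": ["hf-inference", "together"]},
    {"id": "some-org/vision-model", "pipeline_tag": "image-classification",
     "providers": ["hf-inference"]},
    {"id": "some-org/external-only", "pipeline_tag": "text-generation",
     "providers": ["replicate"]},
]

print(chat_capable(sample))  # -> ['some-org/chat-model']
```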


@keenborder786 @pawel-twardziak thank you very much, dude! I got it. The problem was with the LangChain framework version. I was using an old version, and Hugging Face updated their API policy, which the old version couldn't handle. After upgrading the library and related packages, it now works. Thanks again!
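For anyone landing here with the same error: checking which versions you actually have installed is a good first step before upgrading (e.g. with `pip install -U langchain langchain-huggingface langchain-community huggingface_hub`). A small stdlib-only sketch (the package names are the ones used in the imports above):

```python
# Print installed versions of the packages involved in this thread,
# so you can compare against the latest releases before upgrading.
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str) -> str:
    """Return the installed version of a package, or 'not installed'."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"

for pkg in ("langchain", "langchain-huggingface",
            "langchain-community", "huggingface_hub"):
    print(f"{pkg}: {installed_version(pkg)}")
```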
