Data and AI on Power

IBM Power systems provide a robust and scalable platform for a wide range of data and AI workloads, offering benefits in performance, security, and ease of use.

Run vLLM on ppc64le Architecture

By Manjunath Kumatagi, posted Thu June 27, 2024 06:39 AM

Introduction

Large Language Models (LLMs) are revolutionizing various fields, and vLLM has emerged as a powerful library for LLM inference and serving. Great news for users with ppc64le hardware: with the merge of https://github.com/vllm-project/vllm/pull/5652, vLLM now supports this architecture. This blog outlines the steps to get started with vLLM on ppc64le.
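Before building anything, it is worth confirming that the host really is running a little-endian Power kernel:

$ uname -m
ppc64le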

Build the container image:

$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm/
$ podman build --security-opt label=disable --format docker -t vllm:ppc64le -f Dockerfile.ppc64le .
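The commands above build from the tip of the main branch. For a more reproducible build, you can check out a released tag first; any release that includes the ppc64le support PR should work, and the tag below is only an example:

$ git tag --list          # see which releases are available
$ git checkout v0.5.1     # example tag that postdates the ppc64le PR
$ podman build --security-opt label=disable --format docker -t vllm:ppc64le -f Dockerfile.ppc64le .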

Run the built image:

# list the built images
$ podman images
REPOSITORY                     TAG         IMAGE ID      CREATED         SIZE
localhost/vllm                 ppc64le     9a44a2021b41  38 minutes ago  4.32 GB
<none>                         <none>      a80bc1de136c  43 minutes ago  2.51 GB
docker.io/mambaorg/micromamba  latest      358d7e727885  9 days ago      137 MB
$
# create a directory for caching models downloaded from Hugging Face
$ mkdir -p ~/.cache/huggingface
$ podman run -ti -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host localhost/vllm:ppc64le

The above command starts the server with the default model (facebook/opt-125m) pulled from Hugging Face. The server can then be queried in the same format as the OpenAI API. For example:
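Arguments placed after the image name are passed along to the API server (assuming the image's entrypoint is the OpenAI-compatible server, as in the upstream Dockerfile), so you can serve a different model with the standard --model flag. For example, with the slightly larger facebook/opt-350m:

$ podman run -ti -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host \
    localhost/vllm:ppc64le --model facebook/opt-350m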

List the models:

$ curl http://localhost:8000/v1/models | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   484  100   484    0     0   236k      0 --:--:-- --:--:-- --:--:--  236k
{
  "object": "list",
  "data": [
    {
      "id": "facebook/opt-125m",
      "object": "model",
      "created": 1719484379,
      "owned_by": "vllm",
      "root": "facebook/opt-125m",
      "parent": null,
      "max_model_len": 2048,
      "permission": [
        {
          "id": "modelperm-8ba9ddf949764d359f2db7eb1fa92090",
          "object": "model_permission",
          "created": 1719484379,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}
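If you only need the model ID, for example in a script, jq can filter the response shown above:

$ curl -s http://localhost:8000/v1/models | jq -r '.data[].id'
facebook/opt-125m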

Text completion:

$ curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "facebook/opt-125m",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0
    }' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   443  100   308  100   135    306    134  0:00:01  0:00:01 --:--:--   440
{
  "id": "cmpl-f77c7f6e64df4221836d85d64d28ae04",
  "object": "text_completion",
  "created": 1719484486,
  "model": "facebook/opt-125m",
  "choices": [
    {
      "index": 0,
      "text": " great place to live.  I",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 12,
    "completion_tokens": 7
  }
}
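The server also exposes the OpenAI-compatible /v1/chat/completions endpoint. A minimal sketch, assuming the served model has a chat template (the default facebook/opt-125m is a plain completion model, so substitute a chat-tuned model of your choice):

$ curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "<your-chat-model>",
        "messages": [{"role": "user", "content": "What is IBM Power?"}],
        "max_tokens": 32
    }' | jq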

For more information on usage, refer to the vLLM documentation: https://docs.vllm.ai/en/stable/index.html



Comments

Tue January 21, 2025 03:26 PM

I have the same problem now as this user on GitHub:

https://github.com/vllm-project/vllm/issues/11837

ERROR 01-21 20:18:03 engine.py:136] return forward_call(*args, **kwargs)
ERROR 01-21 20:18:03 engine.py:136]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/attention/layer.py", line 161, in forward
ERROR 01-21 20:18:03 engine.py:136]     return torch.ops.vllm.unified_attention(
ERROR 01-21 20:18:03 engine.py:136]   File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 921, in __getattr__
ERROR 01-21 20:18:03 engine.py:136]     raise AttributeError(
ERROR 01-21 20:18:03 engine.py:136] AttributeError: '_OpNamespace' 'vllm' object has no attribute 'unified_attention'
INFO: Shutdown
INFO: Waiting for the application to shut down.
INFO: Application shutdown completed.

(base) [root@andre-infra ~]# curl http://localhost:8000/v1/models | jq
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100   484  100   484    0     0   472k      0 --:--:-- --:--:-- --:--:--  472k
{
 "object": "list",
 "data": [
   {
     "id": "facebook/opt-125m",
     "object": "model",
     "created": 1737490653,
     "owned_by": "vllm",
     "root": "facebook/opt-125m",
     "parent": null,
     "max_model_len": 2048,
     "permission": [
       {
         "id": "modelperm-429ffd8065de4c40acbafac6631800f3",
         "object": "model_permission",
         "created": 1737490653,
         "allow_create_engine": false,
         "allow_sampling": true,
         "allow_logprobs": true,
         "allow_search_indices": false,
         "allow_view": true,
         "allow_fine_tuning": false,
         "organization": "*",
         "group": null,
         "is_blocking": false
       }
     ]
   }
 ]
}
(base) [root@andre-infra ~]# curl http://localhost:8000/v1/completions \
   -H "Content-Type: application/json" \
   -d '{
       "model": "facebook/opt-125m",
       "prompt": "San Francisco is a",
       "max_tokens": 7,
       "temperature": 0
   }' | jq
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100   135    0     0  100   135      0   2177 --:--:-- --:--:-- --:--:--  2177

Sun January 19, 2025 06:55 AM

Similar behaviour was seen while running the default model on a machine with 16 GB of memory. When I checked dmesg, I saw the following messages:

[  803.573297] __vm_enough_memory: pid: 3672, comm: pt_main_thread, not enough memory for the allocation
[  809.613661] __vm_enough_memory: pid: 3678, comm: pt_main_thread, not enough memory for the allocation

I tried the following workaround and was able to run the model on the same machine without any issues.

sysctl -w vm.overcommit_memory=1

Note: this may unblock you for testing if you hit the same error, but I am not really sure about its side effects.
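If the workaround helps and you want it to survive a reboot, the usual approach is to persist the setting under /etc/sysctl.d (standard location on most distributions; adjust the file name to taste):

echo 'vm.overcommit_memory = 1' > /etc/sysctl.d/90-vm-overcommit.conf
sysctl --system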

Fri January 17, 2025 02:16 PM

How much memory did your LPAR have?

Fri January 17, 2025 12:43 PM

OSError: [Errno 12] Cannot allocate memory

I see the above error in the trace; wondering if you can bump up the memory for the LPAR and retry?

Fri January 17, 2025 11:40 AM

Hello,

I get the errors below. Can you help me with this?

[root@andre-infra vllm]# podman run -ti -v ~/.ollama/vllm/ -p 8000:8000 --ipc=host localhost/vllm:ppc64le
/opt/conda/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/opt/conda/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1
017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libj
peg` or `libpng` installed before building `torchvision` from source?
 warn(
INFO 01-17 16:36:29 __init__.py:179] Automatically detected platform cpu.
INFO 01-17 16:36:31 api_server.py:768] vLLM API server version 0.6.6.post2.dev256+g87a0c076
INFO 01-17 16:36:31 api_server.py:769] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=No
ne, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_
path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='
', model='facebook/opt-125m', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local
_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_back
end='xgrammar', logits_processor_pattern=None, distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_ns
ight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, nu
m_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=F
alse, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, dis
able_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, f
ully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, ena
ble_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculat
ive_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_pos
terior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name
=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config
=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=Fals
e)
INFO 01-17 16:36:31 api_server.py:195] Started engine process with PID 5
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 651/651 [00:00<00:00, 6.39MB/s]
INFO 01-17 16:36:32 config.py:2273] For POWERPC, we cast models to bfloat16 instead of using float16 by default. Float16 is not currently supported for POWERPC.
WARNING 01-17 16:36:32 config.py:2317] Casting torch.float16 to torch.bfloat16.
/opt/conda/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/opt/conda/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1
017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libj
peg` or `libpng` installed before building `torchvision` from source?
 warn(
INFO 01-17 16:36:37 __init__.py:179] Automatically detected platform cpu.
INFO 01-17 16:36:40 config.py:2273] For POWERPC, we cast models to bfloat16 instead of using float16 by default. Float16 is not currently supported for POWERPC.
WARNING 01-17 16:36:40 config.py:2317] Casting torch.float16 to torch.bfloat16.
ERROR 01-17 16:36:40 registry.py:296] Error in inspecting model architecture 'OPTForCausalLM'
ERROR 01-17 16:36:40 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 491, in _run
_in_subprocess
ERROR 01-17 16:36:40 registry.py:296]     returned.check_returncode()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/subprocess.py", line 457, in check_returncode
ERROR 01-17 16:36:40 registry.py:296]     raise CalledProcessError(self.returncode, self.args, self.stdout,
ERROR 01-17 16:36:40 registry.py:296] subprocess.CalledProcessError: Command '['/opt/conda/bin/python3', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
ERROR 01-17 16:36:40 registry.py:296]  
ERROR 01-17 16:36:40 registry.py:296] The above exception was the direct cause of the following exception:
ERROR 01-17 16:36:40 registry.py:296]  
ERROR 01-17 16:36:40 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 294, in _try
_inspect_model_cls
ERROR 01-17 16:36:40 registry.py:296]     return model.inspect_model_cls()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 265, in insp
ect_model_cls
ERROR 01-17 16:36:40 registry.py:296]     return _run_in_subprocess(
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 494, in _run
_in_subprocess
ERROR 01-17 16:36:40 registry.py:296]     raise RuntimeError(f"Error raised in subprocess:\n"
ERROR 01-17 16:36:40 registry.py:296] RuntimeError: Error raised in subprocess:
ERROR 01-17 16:36:40 registry.py:296] /opt/conda/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/opt/conda/lib/python3.10/site-packages/torchvis
ion/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong wit
h your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
ERROR 01-17 16:36:40 registry.py:296]   warn(
ERROR 01-17 16:36:40 registry.py:296] /opt/conda/lib/python3.10/runpy.py:126: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models',
but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
ERROR 01-17 16:36:40 registry.py:296]   warn(RuntimeWarning(msg))
ERROR 01-17 16:36:40 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
ERROR 01-17 16:36:40 registry.py:296]     return _run_code(code, main_globals, None,
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
ERROR 01-17 16:36:40 registry.py:296]     exec(code, run_globals)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 515, in <mod
ule>
ERROR 01-17 16:36:40 registry.py:296]     _run()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 508, in _run
ERROR 01-17 16:36:40 registry.py:296]     result = fn()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 266, in <lam
bda>
ERROR 01-17 16:36:40 registry.py:296]     lambda: _ModelInfo.from_model_cls(self.load_model_cls()))
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 269, in load
_model_cls
ERROR 01-17 16:36:40 registry.py:296]     mod = importlib.import_module(self.module_name)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
ERROR 01-17 16:36:40 registry.py:296]     return _bootstrap._gcd_import(name[level:], package, level)
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/opt.py", line 34, in <module>
ERROR 01-17 16:36:40 registry.py:296]     from vllm.model_executor.layers.logits_processor import LogitsProcessor
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/logits_processor.py", line 12,
in <module>
ERROR 01-17 16:36:40 registry.py:296]     from vllm.model_executor.layers.vocab_parallel_embedding import (
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/vocab_parallel_embedding.py", l
ine 137, in <module>
ERROR 01-17 16:36:40 registry.py:296]     def get_masked_input_and_mask(
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1875, in fn
ERROR 01-17 16:36:40 registry.py:296]     return compile(model,
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1893, in compile
ERROR 01-17 16:36:40 registry.py:296]     return torch._dynamo.optimize(backend=backend, nopython=fullgraph, dynamic=dynamic, disable=disable)(model)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 681, in optimize
ERROR 01-17 16:36:40 registry.py:296]     compiler_config=backend.get_compiler_config()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1734, in get_compiler_config
ERROR 01-17 16:36:40 registry.py:296]     from torch._inductor.compile_fx import get_patched_config_dict
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 40, in <module>
ERROR 01-17 16:36:40 registry.py:296]     from torch._inductor.codecache import code_hash, CompiledFxGraph, FxGraphCache
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2727, in <module>
ERROR 01-17 16:36:40 registry.py:296]     AsyncCompile.warm_pool()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2632, in warm_pool
ERROR 01-17 16:36:40 registry.py:296]     pool._adjust_process_count()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/concurrent/futures/process.py", line 697, in _adjust_process_count
ERROR 01-17 16:36:40 registry.py:296]     self._spawn_process()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/concurrent/futures/process.py", line 714, in _spawn_process
ERROR 01-17 16:36:40 registry.py:296]     p.start()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 121, in start
ERROR 01-17 16:36:40 registry.py:296]     self._popen = self._Popen(self)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
ERROR 01-17 16:36:40 registry.py:296]     return Popen(process_obj)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
ERROR 01-17 16:36:40 registry.py:296]     self._launch(process_obj)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/popen_fork.py", line 66, in _launch
ERROR 01-17 16:36:40 registry.py:296]     self.pid = os.fork()
ERROR 01-17 16:36:40 registry.py:296] OSError: [Errno 12] Cannot allocate memory
ERROR 01-17 16:36:40 registry.py:296]  
Traceback (most recent call last):
 File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
   return _run_code(code, main_globals, None,
 File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
   exec(code, run_globals)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/entrypoints/openai/api_server.py", line 832, in <module>
   uvloop.run(run_server(args))
 File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
   return loop.run_until_complete(wrapper())
 File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
 File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
   return await main
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/entrypoints/openai/api_server.py", line 796, in run_server
   async with build_async_engine_client(args) as engine_client:
 File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
   return await anext(self.gen)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/entrypoints/openai/api_server.py", line 125, in build_async_engine_client
   async with build_async_engine_client_from_engine_args(
 File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
   return await anext(self.gen)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/entrypoints/openai/api_server.py", line 206, in build_async_engine_client_from_engine_args
   engine_config = engine_args.create_engine_config()
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 1043, in create_engine_config
   model_config = self.create_model_config()
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 969, in create_model_config
   return ModelConfig(
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 342, in __init__
   self.multimodal_config = self._init_multimodal_config(
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 402, in _init_multimodal_config
   if ModelRegistry.is_multimodal_model(architectures):
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 430, in is_multimodal_model
   model_cls, _ = self.inspect_model_cls(architectures)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 390, in inspect_model_cls
   return self._raise_for_unsupported(architectures)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 347, in _raise_for_unsupported
   raise ValueError(
ValueError: Model architectures ['OPTForCausalLM'] failed to be inspected. Please check the logs for more details.
ERROR 01-17 16:36:48 registry.py:296] Error in inspecting model architecture 'OPTForCausalLM'
ERROR 01-17 16:36:48 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 491, in _run
_in_subprocess
ERROR 01-17 16:36:48 registry.py:296]     returned.check_returncode()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/subprocess.py", line 457, in check_returncode
ERROR 01-17 16:36:48 registry.py:296]     raise CalledProcessError(self.returncode, self.args, self.stdout,
ERROR 01-17 16:36:48 registry.py:296] subprocess.CalledProcessError: Command '['/opt/conda/bin/python3', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
ERROR 01-17 16:36:48 registry.py:296]  
ERROR 01-17 16:36:48 registry.py:296] The above exception was the direct cause of the following exception:
ERROR 01-17 16:36:48 registry.py:296]  
ERROR 01-17 16:36:48 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 294, in _try
_inspect_model_cls
ERROR 01-17 16:36:48 registry.py:296]     return model.inspect_model_cls()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 265, in insp
ect_model_cls
ERROR 01-17 16:36:48 registry.py:296]     return _run_in_subprocess(
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 494, in _run
_in_subprocess
ERROR 01-17 16:36:48 registry.py:296]     raise RuntimeError(f"Error raised in subprocess:\n"
ERROR 01-17 16:36:48 registry.py:296] RuntimeError: Error raised in subprocess:
ERROR 01-17 16:36:48 registry.py:296] /opt/conda/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/opt/conda/lib/python3.10/site-packages/torchvis
ion/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong wit
h your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
ERROR 01-17 16:36:48 registry.py:296]   warn(
ERROR 01-17 16:36:48 registry.py:296] /opt/conda/lib/python3.10/runpy.py:126: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models',
but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
ERROR 01-17 16:36:48 registry.py:296]   warn(RuntimeWarning(msg))
ERROR 01-17 16:36:48 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
ERROR 01-17 16:36:48 registry.py:296]     return _run_code(code, main_globals, None,
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
ERROR 01-17 16:36:48 registry.py:296]     exec(code, run_globals)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 515, in <mod
ule>
ERROR 01-17 16:36:48 registry.py:296]     _run()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 508, in _run
ERROR 01-17 16:36:48 registry.py:296]     result = fn()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 266, in <lam
bda>
ERROR 01-17 16:36:48 registry.py:296]     lambda: _ModelInfo.from_model_cls(self.load_model_cls()))
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 269, in load
_model_cls
ERROR 01-17 16:36:48 registry.py:296]     mod = importlib.import_module(self.module_name)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
ERROR 01-17 16:36:48 registry.py:296]     return _bootstrap._gcd_import(name[level:], package, level)
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/opt.py", line 34, in <module>
ERROR 01-17 16:36:48 registry.py:296]     from vllm.model_executor.layers.logits_processor import LogitsProcessor
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/logits_processor.py", line 12,
in <module>
ERROR 01-17 16:36:48 registry.py:296]     from vllm.model_executor.layers.vocab_parallel_embedding import (
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/vocab_parallel_embedding.py", l
ine 137, in <module>
ERROR 01-17 16:36:48 registry.py:296]     def get_masked_input_and_mask(
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1875, in fn
ERROR 01-17 16:36:48 registry.py:296]     return compile(model,
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1893, in compile
ERROR 01-17 16:36:48 registry.py:296]     return torch._dynamo.optimize(backend=backend, nopython=fullgraph, dynamic=dynamic, disable=disable)(model)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 681, in optimize
ERROR 01-17 16:36:48 registry.py:296]     compiler_config=backend.get_compiler_config()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1734, in get_compiler_config
ERROR 01-17 16:36:48 registry.py:296]     from torch._inductor.compile_fx import get_patched_config_dict
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 40, in <module>
ERROR 01-17 16:36:48 registry.py:296]     from torch._inductor.codecache import code_hash, CompiledFxGraph, FxGraphCache
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2727, in <module>
ERROR 01-17 16:36:48 registry.py:296]     AsyncCompile.warm_pool()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2632, in warm_pool
ERROR 01-17 16:36:48 registry.py:296]     pool._adjust_process_count()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/concurrent/futures/process.py", line 697, in _adjust_process_count
ERROR 01-17 16:36:48 registry.py:296]     self._spawn_process()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/concurrent/futures/process.py", line 714, in _spawn_process
ERROR 01-17 16:36:48 registry.py:296]     p.start()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 121, in start
ERROR 01-17 16:36:48 registry.py:296]     self._popen = self._Popen(self)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
ERROR 01-17 16:36:48 registry.py:296]     return Popen(process_obj)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
ERROR 01-17 16:36:48 registry.py:296]     self._launch(process_obj)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/popen_fork.py", line 66, in _launch
ERROR 01-17 16:36:48 registry.py:296]     self.pid = os.fork()
ERROR 01-17 16:36:48 registry.py:296] OSError: [Errno 12] Cannot allocate memory
ERROR 01-17 16:36:48 registry.py:296]  
ERROR 01-17 16:36:48 engine.py:381] Model architectures ['OPTForCausalLM'] failed to be inspected. Please check the logs for more details.
ERROR 01-17 16:36:48 engine.py:381] Traceback (most recent call last):
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/multiprocessing/engine.py", line 372, in run_mp_
engine
ERROR 01-17 16:36:48 engine.py:381]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/multiprocessing/engine.py", line 115, in from_en
gine_args
ERROR 01-17 16:36:48 engine.py:381]     engine_config = engine_args.create_engine_config(usage_context)
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 1043, in create_engine_confi
g
ERROR 01-17 16:36:48 engine.py:381]     model_config = self.create_model_config()
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 969, in create_model_config
ERROR 01-17 16:36:48 engine.py:381]     return ModelConfig(
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 342, in __init__
ERROR 01-17 16:36:48 engine.py:381]     self.multimodal_config = self._init_multimodal_config(
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 402, in _init_multimodal_config
ERROR 01-17 16:36:48 engine.py:381]     if ModelRegistry.is_multimodal_model(architectures):
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 430, in is_mul
timodal_model
ERROR 01-17 16:36:48 engine.py:381]     model_cls, _ = self.inspect_model_cls(architectures)
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 390, in inspec
t_model_cls
ERROR 01-17 16:36:48 engine.py:381]     return self._raise_for_unsupported(architectures)
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 347, in _raise
_for_unsupported
ERROR 01-17 16:36:48 engine.py:381]     raise ValueError(
ERROR 01-17 16:36:48 engine.py:381] ValueError: Model architectures ['OPTForCausalLM'] failed to be inspected. Please check the logs for more details.
Process SpawnProcess-1:
Traceback (most recent call last):
 File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
   self.run()
 File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
   self._target(*self._args, **self._kwargs)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/multiprocessing/engine.py", line 383, in run_mp_engine
   raise e
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/multiprocessing/engine.py", line 372, in run_mp_engine
   engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/multiprocessing/engine.py", line 115, in from_engine_args
   engine_config = engine_args.create_engine_config(usage_context)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 1043, in create_engine_config
   model_config = self.create_model_config()
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 969, in create_model_config
   return ModelConfig(
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 342, in __init__
   self.multimodal_config = self._init_multimodal_config(
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 402, in _init_multimodal_config
   if ModelRegistry.is_multimodal_model(architectures):
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 430, in is_multimodal_model
   model_cls, _ = self.inspect_model_cls(architectures)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 390, in inspect_model_cls
   return self._raise_for_unsupported(architectures)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 347, in _raise_for_unsupported
   raise ValueError(
ValueError: Model architectures ['OPTForCausalLM'] failed to be inspected. Please check the logs for more details.