Run vLLM on ppc64le Architecture

By Manjunath Kumatagi posted Thu June 27, 2024 06:39 AM

Introduction

Large Language Models (LLMs) are revolutionizing various fields, and vLLM has emerged as a powerful library for LLM inference and serving. Great news for users with ppc64le hardware: vLLM has recently added support for this architecture (https://github.com/vllm-project/vllm/pull/5652). This blog outlines the steps to get started with vLLM on ppc64le.

Build the container image:

$ git clone https://github.com/vllm-project/vllm.git
$ cd vllm/
$ podman build --security-opt label=disable --format docker -t vllm:ppc64le -f Dockerfile.ppc64le .
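
Optionally, sanity-check that the resulting image targets the expected architecture (on a Power system the output should be ppc64le):

# confirm the architecture the image was built for
$ podman image inspect --format '{{.Architecture}}' localhost/vllm:ppc64le
ppc64le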

Run the built image:

# list the built images
$ podman images
REPOSITORY                     TAG         IMAGE ID      CREATED         SIZE
localhost/vllm                 ppc64le     9a44a2021b41  38 minutes ago  4.32 GB
<none>                         <none>      a80bc1de136c  43 minutes ago  2.51 GB
docker.io/mambaorg/micromamba  latest      358d7e727885  9 days ago      137 MB
$
# create a directory for caching models downloaded from Hugging Face
$ mkdir -p ~/.cache/huggingface
$ podman run -ti -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host localhost/vllm:ppc64le
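
Since the image's entrypoint starts the API server, any trailing arguments on the run command are passed through to it as vLLM flags. For example, to serve a different model, a sketch (facebook/opt-350m is just an illustrative choice):

# pass vLLM flags through the entrypoint to serve another model
$ podman run -ti -v ~/.cache/huggingface:/root/.cache/huggingface \
    -p 8000:8000 --ipc=host localhost/vllm:ppc64le \
    --model facebook/opt-350m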

Run with the default settings, the server loads the default model (facebook/opt-125m) from Hugging Face and can be queried in the same format as the OpenAI API. For example,

List the models:

$ curl http://localhost:8000/v1/models | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   484  100   484    0     0   236k      0 --:--:-- --:--:-- --:--:--  236k
{
  "object": "list",
  "data": [
    {
      "id": "facebook/opt-125m",
      "object": "model",
      "created": 1719484379,
      "owned_by": "vllm",
      "root": "facebook/opt-125m",
      "parent": null,
      "max_model_len": 2048,
      "permission": [
        {
          "id": "modelperm-8ba9ddf949764d359f2db7eb1fa92090",
          "object": "model_permission",
          "created": 1719484379,
          "allow_create_engine": false,
          "allow_sampling": true,
          "allow_logprobs": true,
          "allow_search_indices": false,
          "allow_view": true,
          "allow_fine_tuning": false,
          "organization": "*",
          "group": null,
          "is_blocking": false
        }
      ]
    }
  ]
}

Text completion:

$ curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "facebook/opt-125m",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0
    }' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   443  100   308  100   135    306    134  0:00:01  0:00:01 --:--:--   440
{
  "id": "cmpl-f77c7f6e64df4221836d85d64d28ae04",
  "object": "text_completion",
  "created": 1719484486,
  "model": "facebook/opt-125m",
  "choices": [
    {
      "index": 0,
      "text": " great place to live.  I",
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 12,
    "completion_tokens": 7
  }
}
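
The server also supports streaming responses in the OpenAI format. A minimal sketch against the same endpoint, adding "stream": true (the reply then arrives as a series of data: chunks rather than a single JSON object):

$ curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "facebook/opt-125m",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0,
        "stream": true
    }'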

For more information on usage, please refer to the vLLM documentation: https://docs.vllm.ai/en/stable/index.html



Comments

Tue January 21, 2025 03:26 PM

I have the same problem now as this user on GitHub:

https://github.com/vllm-project/vllm/issues/11837

ERROR 01-21 20:18:03 engine.py:136]     return forward_call(*args, **kwargs)
ERROR 01-21 20:18:03 engine.py:136]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/attention/layer.py", line 161, in forward
ERROR 01-21 20:18:03 engine.py:136]     return torch.ops.vllm.unified_attention(
ERROR 01-21 20:18:03 engine.py:136]   File "/opt/conda/lib/python3.10/site-packages/torch/_ops.py", line 921, in __getattr__
ERROR 01-21 20:18:03 engine.py:136]     raise AttributeError(
ERROR 01-21 20:18:03 engine.py:136] AttributeError: '_OpNamespace' 'vllm' object has no attribute 'unified_attention'
INFO: Shutdown
INFO: Waiting for the application to shut down.
INFO: Application shutdown completed.

(base) [root@andre-infra ~]# curl http://localhost:8000/v1/models | jq
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100   484  100   484    0     0   472k      0 --:--:-- --:--:-- --:--:--  472k
{
 "object": "list",
 "data": [
   {
     "id": "facebook/opt-125m",
     "object": "model",
     "created": 1737490653,
     "owned_by": "vllm",
     "root": "facebook/opt-125m",
     "parent": null,
     "max_model_len": 2048,
     "permission": [
       {
         "id": "modelperm-429ffd8065de4c40acbafac6631800f3",
         "object": "model_permission",
         "created": 1737490653,
         "allow_create_engine": false,
         "allow_sampling": true,
         "allow_logprobs": true,
         "allow_search_indices": false,
         "allow_view": true,
         "allow_fine_tuning": false,
         "organization": "*",
         "group": null,
         "is_blocking": false
       }
     ]
   }
 ]
}
(base) [root@andre-infra ~]# curl http://localhost:8000/v1/completions \
   -H "Content-Type: application/json" \
   -d '{
       "model": "facebook/opt-125m",
       "prompt": "San Francisco is a",
       "max_tokens": 7,
       "temperature": 0
   }' | jq
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100   135    0     0  100   135      0   2177 --:--:-- --:--:-- --:--:--  2177

Sun January 19, 2025 06:55 AM

Similar behaviour was seen while running the default model on a machine with 16 GB of memory. When I checked dmesg, I saw the following messages:

[  803.573297] __vm_enough_memory: pid: 3672, comm: pt_main_thread, not enough memory for the allocation
[  809.613661] __vm_enough_memory: pid: 3678, comm: pt_main_thread, not enough memory for the allocation

I tried the following workaround and was able to run the model on the same machine without any issues.

sysctl -w vm.overcommit_memory=1

Note: this may unblock you for testing if you hit the same error, but I am not sure about its side effects.
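
If the workaround helps, the setting can be made persistent across reboots with a standard sysctl drop-in (nothing vLLM-specific):

# persist the overcommit setting across reboots
$ echo 'vm.overcommit_memory = 1' > /etc/sysctl.d/90-overcommit.conf
$ sysctl -p /etc/sysctl.d/90-overcommit.conf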

Fri January 17, 2025 02:16 PM

How much memory did your LPAR have?

Fri January 17, 2025 12:43 PM

OSError: [Errno 12] Cannot allocate memory

I see the above error in the trace. Could you bump up the memory for the LPAR and retry?

Fri January 17, 2025 11:40 AM

Hello,

I get these errors.
Can you help me with this?

[root@andre-infra vllm]# podman run -ti -v ~/.ollama/vllm/ -p 8000:8000 --ipc=host localhost/vllm:ppc64le
/opt/conda/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/opt/conda/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
INFO 01-17 16:36:29 __init__.py:179] Automatically detected platform cpu.
INFO 01-17 16:36:31 api_server.py:768] vLLM API server version 0.6.6.post2.dev256+g87a0c076
INFO 01-17 16:36:31 api_server.py:769] args: Namespace(host=None, port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='facebook/opt-125m', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, distributed_executor_backend=None, worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=1, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', generation_config=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False)
INFO 01-17 16:36:31 api_server.py:195] Started engine process with PID 5
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 651/651 [00:00<00:00, 6.39MB/s]
INFO 01-17 16:36:32 config.py:2273] For POWERPC, we cast models to bfloat16 instead of using float16 by default. Float16 is not currently supported for POWERPC.
WARNING 01-17 16:36:32 config.py:2317] Casting torch.float16 to torch.bfloat16.
/opt/conda/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/opt/conda/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
INFO 01-17 16:36:37 __init__.py:179] Automatically detected platform cpu.
INFO 01-17 16:36:40 config.py:2273] For POWERPC, we cast models to bfloat16 instead of using float16 by default. Float16 is not currently supported for POWERPC.
WARNING 01-17 16:36:40 config.py:2317] Casting torch.float16 to torch.bfloat16.
ERROR 01-17 16:36:40 registry.py:296] Error in inspecting model architecture 'OPTForCausalLM'
ERROR 01-17 16:36:40 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 491, in _run_in_subprocess
ERROR 01-17 16:36:40 registry.py:296]     returned.check_returncode()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/subprocess.py", line 457, in check_returncode
ERROR 01-17 16:36:40 registry.py:296]     raise CalledProcessError(self.returncode, self.args, self.stdout,
ERROR 01-17 16:36:40 registry.py:296] subprocess.CalledProcessError: Command '['/opt/conda/bin/python3', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
ERROR 01-17 16:36:40 registry.py:296]
ERROR 01-17 16:36:40 registry.py:296] The above exception was the direct cause of the following exception:
ERROR 01-17 16:36:40 registry.py:296]
ERROR 01-17 16:36:40 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 294, in _try_inspect_model_cls
ERROR 01-17 16:36:40 registry.py:296]     return model.inspect_model_cls()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 265, in inspect_model_cls
ERROR 01-17 16:36:40 registry.py:296]     return _run_in_subprocess(
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 494, in _run_in_subprocess
ERROR 01-17 16:36:40 registry.py:296]     raise RuntimeError(f"Error raised in subprocess:\n"
ERROR 01-17 16:36:40 registry.py:296] RuntimeError: Error raised in subprocess:
ERROR 01-17 16:36:40 registry.py:296] /opt/conda/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/opt/conda/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
ERROR 01-17 16:36:40 registry.py:296]   warn(
ERROR 01-17 16:36:40 registry.py:296] /opt/conda/lib/python3.10/runpy.py:126: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
ERROR 01-17 16:36:40 registry.py:296]   warn(RuntimeWarning(msg))
ERROR 01-17 16:36:40 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
ERROR 01-17 16:36:40 registry.py:296]     return _run_code(code, main_globals, None,
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
ERROR 01-17 16:36:40 registry.py:296]     exec(code, run_globals)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 515, in <module>
ERROR 01-17 16:36:40 registry.py:296]     _run()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 508, in _run
ERROR 01-17 16:36:40 registry.py:296]     result = fn()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 266, in <lambda>
ERROR 01-17 16:36:40 registry.py:296]     lambda: _ModelInfo.from_model_cls(self.load_model_cls()))
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 269, in load_model_cls
ERROR 01-17 16:36:40 registry.py:296]     mod = importlib.import_module(self.module_name)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
ERROR 01-17 16:36:40 registry.py:296]     return _bootstrap._gcd_import(name[level:], package, level)
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
ERROR 01-17 16:36:40 registry.py:296]   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/opt.py", line 34, in <module>
ERROR 01-17 16:36:40 registry.py:296]     from vllm.model_executor.layers.logits_processor import LogitsProcessor
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/logits_processor.py", line 12, in <module>
ERROR 01-17 16:36:40 registry.py:296]     from vllm.model_executor.layers.vocab_parallel_embedding import (
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/vocab_parallel_embedding.py", line 137, in <module>
ERROR 01-17 16:36:40 registry.py:296]     def get_masked_input_and_mask(
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1875, in fn
ERROR 01-17 16:36:40 registry.py:296]     return compile(model,
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1893, in compile
ERROR 01-17 16:36:40 registry.py:296]     return torch._dynamo.optimize(backend=backend, nopython=fullgraph, dynamic=dynamic, disable=disable)(model)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 681, in optimize
ERROR 01-17 16:36:40 registry.py:296]     compiler_config=backend.get_compiler_config()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1734, in get_compiler_config
ERROR 01-17 16:36:40 registry.py:296]     from torch._inductor.compile_fx import get_patched_config_dict
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 40, in <module>
ERROR 01-17 16:36:40 registry.py:296]     from torch._inductor.codecache import code_hash, CompiledFxGraph, FxGraphCache
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2727, in <module>
ERROR 01-17 16:36:40 registry.py:296]     AsyncCompile.warm_pool()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2632, in warm_pool
ERROR 01-17 16:36:40 registry.py:296]     pool._adjust_process_count()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/concurrent/futures/process.py", line 697, in _adjust_process_count
ERROR 01-17 16:36:40 registry.py:296]     self._spawn_process()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/concurrent/futures/process.py", line 714, in _spawn_process
ERROR 01-17 16:36:40 registry.py:296]     p.start()
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 121, in start
ERROR 01-17 16:36:40 registry.py:296]     self._popen = self._Popen(self)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
ERROR 01-17 16:36:40 registry.py:296]     return Popen(process_obj)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
ERROR 01-17 16:36:40 registry.py:296]     self._launch(process_obj)
ERROR 01-17 16:36:40 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/popen_fork.py", line 66, in _launch
ERROR 01-17 16:36:40 registry.py:296]     self.pid = os.fork()
ERROR 01-17 16:36:40 registry.py:296] OSError: [Errno 12] Cannot allocate memory
ERROR 01-17 16:36:40 registry.py:296]
Traceback (most recent call last):
 File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
   return _run_code(code, main_globals, None,
 File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
   exec(code, run_globals)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/entrypoints/openai/api_server.py", line 832, in <module>
   uvloop.run(run_server(args))
 File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
   return loop.run_until_complete(wrapper())
 File "uvloop/loop.pyx", line 1517, in uvloop.loop.Loop.run_until_complete
 File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
   return await main
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/entrypoints/openai/api_server.py", line 796, in run_server
   async with build_async_engine_client(args) as engine_client:
 File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
   return await anext(self.gen)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/entrypoints/openai/api_server.py", line 125, in build_async_engine_client
   async with build_async_engine_client_from_engine_args(
 File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
   return await anext(self.gen)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/entrypoints/openai/api_server.py", line 206, in build_async_engine_client_from_engine_args
   engine_config = engine_args.create_engine_config()
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 1043, in create_engine_config
   model_config = self.create_model_config()
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 969, in create_model_config
   return ModelConfig(
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 342, in __init__
   self.multimodal_config = self._init_multimodal_config(
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 402, in _init_multimodal_config
   if ModelRegistry.is_multimodal_model(architectures):
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 430, in is_multimodal_model
   model_cls, _ = self.inspect_model_cls(architectures)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 390, in inspect_model_cls
   return self._raise_for_unsupported(architectures)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 347, in _raise_for_unsupported
   raise ValueError(
ValueError: Model architectures ['OPTForCausalLM'] failed to be inspected. Please check the logs for more details.
ERROR 01-17 16:36:48 registry.py:296] Error in inspecting model architecture 'OPTForCausalLM'
ERROR 01-17 16:36:48 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 491, in _run_in_subprocess
ERROR 01-17 16:36:48 registry.py:296]     returned.check_returncode()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/subprocess.py", line 457, in check_returncode
ERROR 01-17 16:36:48 registry.py:296]     raise CalledProcessError(self.returncode, self.args, self.stdout,
ERROR 01-17 16:36:48 registry.py:296] subprocess.CalledProcessError: Command '['/opt/conda/bin/python3', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
ERROR 01-17 16:36:48 registry.py:296]
ERROR 01-17 16:36:48 registry.py:296] The above exception was the direct cause of the following exception:
ERROR 01-17 16:36:48 registry.py:296]
ERROR 01-17 16:36:48 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 294, in _try_inspect_model_cls
ERROR 01-17 16:36:48 registry.py:296]     return model.inspect_model_cls()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 265, in inspect_model_cls
ERROR 01-17 16:36:48 registry.py:296]     return _run_in_subprocess(
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 494, in _run_in_subprocess
ERROR 01-17 16:36:48 registry.py:296]     raise RuntimeError(f"Error raised in subprocess:\n"
ERROR 01-17 16:36:48 registry.py:296] RuntimeError: Error raised in subprocess:
ERROR 01-17 16:36:48 registry.py:296] /opt/conda/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/opt/conda/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
ERROR 01-17 16:36:48 registry.py:296]   warn(
ERROR 01-17 16:36:48 registry.py:296] /opt/conda/lib/python3.10/runpy.py:126: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
ERROR 01-17 16:36:48 registry.py:296]   warn(RuntimeWarning(msg))
ERROR 01-17 16:36:48 registry.py:296] Traceback (most recent call last):
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
ERROR 01-17 16:36:48 registry.py:296]     return _run_code(code, main_globals, None,
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
ERROR 01-17 16:36:48 registry.py:296]     exec(code, run_globals)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 515, in <module>
ERROR 01-17 16:36:48 registry.py:296]     _run()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 508, in _run
ERROR 01-17 16:36:48 registry.py:296]     result = fn()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 266, in <lambda>
ERROR 01-17 16:36:48 registry.py:296]     lambda: _ModelInfo.from_model_cls(self.load_model_cls()))
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 269, in load_model_cls
ERROR 01-17 16:36:48 registry.py:296]     mod = importlib.import_module(self.module_name)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
ERROR 01-17 16:36:48 registry.py:296]     return _bootstrap._gcd_import(name[level:], package, level)
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap_external>", line 883, in exec_module
ERROR 01-17 16:36:48 registry.py:296]   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/opt.py", line 34, in <module>
ERROR 01-17 16:36:48 registry.py:296]     from vllm.model_executor.layers.logits_processor import LogitsProcessor
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/logits_processor.py", line 12, in <module>
ERROR 01-17 16:36:48 registry.py:296]     from vllm.model_executor.layers.vocab_parallel_embedding import (
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/layers/vocab_parallel_embedding.py", line 137, in <module>
ERROR 01-17 16:36:48 registry.py:296]     def get_masked_input_and_mask(
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1875, in fn
ERROR 01-17 16:36:48 registry.py:296]     return compile(model,
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1893, in compile
ERROR 01-17 16:36:48 registry.py:296]     return torch._dynamo.optimize(backend=backend, nopython=fullgraph, dynamic=dynamic, disable=disable)(model)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 681, in optimize
ERROR 01-17 16:36:48 registry.py:296]     compiler_config=backend.get_compiler_config()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/__init__.py", line 1734, in get_compiler_config
ERROR 01-17 16:36:48 registry.py:296]     from torch._inductor.compile_fx import get_patched_config_dict
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 40, in <module>
ERROR 01-17 16:36:48 registry.py:296]     from torch._inductor.codecache import code_hash, CompiledFxGraph, FxGraphCache
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2727, in <module>
ERROR 01-17 16:36:48 registry.py:296]     AsyncCompile.warm_pool()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/site-packages/torch/_inductor/codecache.py", line 2632, in warm_pool
ERROR 01-17 16:36:48 registry.py:296]     pool._adjust_process_count()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/concurrent/futures/process.py", line 697, in _adjust_process_count
ERROR 01-17 16:36:48 registry.py:296]     self._spawn_process()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/concurrent/futures/process.py", line 714, in _spawn_process
ERROR 01-17 16:36:48 registry.py:296]     p.start()
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 121, in start
ERROR 01-17 16:36:48 registry.py:296]     self._popen = self._Popen(self)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/context.py", line 281, in _Popen
ERROR 01-17 16:36:48 registry.py:296]     return Popen(process_obj)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
ERROR 01-17 16:36:48 registry.py:296]     self._launch(process_obj)
ERROR 01-17 16:36:48 registry.py:296]   File "/opt/conda/lib/python3.10/multiprocessing/popen_fork.py", line 66, in _launch
ERROR 01-17 16:36:48 registry.py:296]     self.pid = os.fork()
ERROR 01-17 16:36:48 registry.py:296] OSError: [Errno 12] Cannot allocate memory
ERROR 01-17 16:36:48 registry.py:296]
ERROR 01-17 16:36:48 engine.py:381] Model architectures ['OPTForCausalLM'] failed to be inspected. Please check the logs for more details.
ERROR 01-17 16:36:48 engine.py:381] Traceback (most recent call last):
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/multiprocessing/engine.py", line 372, in run_mp_engine
ERROR 01-17 16:36:48 engine.py:381]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/multiprocessing/engine.py", line 115, in from_engine_args
ERROR 01-17 16:36:48 engine.py:381]     engine_config = engine_args.create_engine_config(usage_context)
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 1043, in create_engine_config
ERROR 01-17 16:36:48 engine.py:381]     model_config = self.create_model_config()
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 969, in create_model_config
ERROR 01-17 16:36:48 engine.py:381]     return ModelConfig(
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 342, in __init__
ERROR 01-17 16:36:48 engine.py:381]     self.multimodal_config = self._init_multimodal_config(
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 402, in _init_multimodal_config
ERROR 01-17 16:36:48 engine.py:381]     if ModelRegistry.is_multimodal_model(architectures):
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 430, in is_multimodal_model
ERROR 01-17 16:36:48 engine.py:381]     model_cls, _ = self.inspect_model_cls(architectures)
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 390, in inspect_model_cls
ERROR 01-17 16:36:48 engine.py:381]     return self._raise_for_unsupported(architectures)
ERROR 01-17 16:36:48 engine.py:381]   File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 347, in _raise_for_unsupported
ERROR 01-17 16:36:48 engine.py:381]     raise ValueError(
ERROR 01-17 16:36:48 engine.py:381] ValueError: Model architectures ['OPTForCausalLM'] failed to be inspected. Please check the logs for more details.
Process SpawnProcess-1:
Traceback (most recent call last):
 File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
   self.run()
 File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
   self._target(*self._args, **self._kwargs)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/multiprocessing/engine.py", line 383, in run_mp_engine
   raise e
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/multiprocessing/engine.py", line 372, in run_mp_engine
   engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/multiprocessing/engine.py", line 115, in from_engine_args
   engine_config = engine_args.create_engine_config(usage_context)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 1043, in create_engine_config
   model_config = self.create_model_config()
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/engine/arg_utils.py", line 969, in create_model_config
   return ModelConfig(
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 342, in __init__
   self.multimodal_config = self._init_multimodal_config(
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/config.py", line 402, in _init_multimodal_config
   if ModelRegistry.is_multimodal_model(architectures):
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 430, in is_multimodal_model
   model_cls, _ = self.inspect_model_cls(architectures)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 390, in inspect_model_cls
   return self._raise_for_unsupported(architectures)
 File "/opt/conda/lib/python3.10/site-packages/vllm-0.6.6.post2.dev256+g87a0c076.cpu-py3.10-linux-ppc64le.egg/vllm/model_executor/models/registry.py", line 347, in _raise_for_unsupported
   raise ValueError(
ValueError: Model architectures ['OPTForCausalLM'] failed to be inspected. Please check the logs for more details.