Discord

Contact Us

What is Apple's "Ferret AI"?

Apple's AI:

Ferret: Refer and Ground Anything Anywhere at Any Granularity

An End-to-End MLLM that Accepts Any-Form Referring and Grounds Anything in Response.

"We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions. To unify referring and grounding in the LLM paradigm, Ferret employs a novel and powerful hybrid region representation that integrates discrete coordinates and continuous features jointly to represent a region in the image. To extract the continuous features of versatile regions, we propose a spatial-aware visual sampler, adept at handling varying sparsity across different shapes. Consequently, Ferret can accept diverse region inputs, such as points, bounding boxes, and free-form shapes. To bolster the desired capability of Ferret, we curate GRIT, a comprehensive refer-and-ground instruction tuning dataset including 1.1M samples that contain rich hierarchical spatial knowledge, with 95K hard negative data to promote model robustness. The resulting model not only achieves superior performance in classical referring and grounding tasks, but also greatly outperforms existing MLLMs in region-based and localization-demanded multimodal chatting. Our evaluations also reveal a significantly improved capability of describing image details and a remarkable alleviation in object hallucination."

Overview:

Key Contributions:

  • Ferret Model - Hybrid Region Representation + Spatial-aware Visual Sampler enable fine-grained and open-vocabulary referring and grounding in MLLM.
  • GRIT Dataset (~1.1M) - A Large-scale, Hierarchical, Robust ground-and-refer instruction tuning dataset.
  • Ferret-Bench - A multimodal evaluation benchmark that jointly requires Referring/Grounding, Semantics, Knowledge, and Reasoning.


Learn More on GitHub: https://github.com/apple/ml-ferret

Flooz - Buy Ferret With Apple Pay!

Exchanges

Poloniex

Bitmart


Twitter

Telegram

Dex

Notes
Ferret AI
"Apple researchers quietly revealed ‘Ferret’. Ferret is a new open-source multimodal LLM that can use regions of images for queries.With this, and the recent rumors of an upgraded version of Siri at WWDC, I wouldn’t count Apple out of the AI race."

- Brett Adcock (@adcock_brett)

Contract Address:
0xBcBDA13Bd60bC0e91745186E274D1445078D6b33