For example, in the above diagram, to return the feature map from the first stage of the Swin backbone, you can set out_indices=(1,):
from transformers import AutoImageProcessor, AutoBackbone
import torch
from PIL import Image
import requests

# Load a sample image from the COCO dataset
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Load the image processor and the Swin backbone, requesting only the first stage's feature map
processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
model = AutoBackbone.from_pretrained("microsoft/swin-tiny-patch4-window7-224", out_indices=(1,))

inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)
feature_maps = outputs.feature_maps
Now you can access the feature_maps object from the first stage of the backbone:
list(feature_maps[0].shape)
[1, 96, 56, 56]
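If you need features from several stages at once (for example, to feed a detection or segmentation head), you can pass multiple indices to out_indices; feature_maps then contains one tensor per requested stage. The sketch below is illustrative, with the stage indices (1, 2, 3) chosen as an assumption for this Swin checkpoint:

# Illustrative sketch: request three stages instead of one
model = AutoBackbone.from_pretrained("microsoft/swin-tiny-patch4-window7-224", out_indices=(1, 2, 3))
outputs = model(**inputs)
# One feature map per requested stage, in the same order as out_indices
for idx, fmap in zip((1, 2, 3), outputs.feature_maps):
    print(idx, list(fmap.shape))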
AutoFeatureExtractor
For audio tasks, a feature extractor processes the audio signal into the correct input format.
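As a minimal sketch, AutoFeatureExtractor.from_pretrained loads the feature extractor associated with a checkpoint and converts a raw waveform into model-ready tensors. The checkpoint name and the synthetic 16 kHz waveform below are assumptions for illustration:

from transformers import AutoFeatureExtractor
import numpy as np

# Example checkpoint; any audio model with a feature extractor works the same way
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")

# One second of silence at 16 kHz, standing in for a real audio signal
raw_speech = np.zeros(16000, dtype=np.float32)
inputs = feature_extractor(raw_speech, sampling_rate=16000, return_tensors="pt")
print(inputs.input_values.shape)  # a batch of one 16000-sample waveform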