For example, in the above diagram, to return the feature map from the first stage of the Swin backbone, you can set `out_indices=(1,)`:
```python
from transformers import AutoImageProcessor, AutoBackbone
import torch
from PIL import Image
import requests

# Load an example image from the COCO dataset
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# out_indices=(1,) returns the feature map from the first stage only
processor = AutoImageProcessor.from_pretrained("microsoft/swin-tiny-patch4-window7-224")
model = AutoBackbone.from_pretrained("microsoft/swin-tiny-patch4-window7-224", out_indices=(1,))

inputs = processor(image, return_tensors="pt")
outputs = model(**inputs)
feature_maps = outputs.feature_maps
```
Now you can access the `feature_maps` object from the first stage of the backbone:
```python
list(feature_maps[0].shape)
[1, 96, 56, 56]
```
## AutoFeatureExtractor
For audio tasks, a feature extractor processes the audio signal into the correct input format.
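As a minimal sketch of that flow, the example below loads a feature extractor with `AutoFeatureExtractor.from_pretrained` and runs it on a dummy waveform; the `facebook/wav2vec2-base` checkpoint and the one-second silent signal are illustrative choices, not requirements — any audio checkpoint with a feature extractor works the same way:

```python
from transformers import AutoFeatureExtractor
import numpy as np

# Example checkpoint; any audio model with a feature extractor can be used
feature_extractor = AutoFeatureExtractor.from_pretrained("facebook/wav2vec2-base")

# Dummy input: one second of silence at 16 kHz, the rate this model expects
raw_audio = np.zeros(16000, dtype=np.float32)

# The feature extractor pads/normalizes the signal and returns model-ready tensors
inputs = feature_extractor(raw_audio, sampling_rate=16000, return_tensors="pt")
print(inputs.input_values.shape)
```

Passing `sampling_rate` lets the feature extractor verify that the audio matches the rate the model was trained on.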