---
tags:
- bertopic
library_name: bertopic
pipeline_tag: text-classification
---

# cnn_dailymail_108_3000_1500_train

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

## Usage

To use this model, please install BERTopic:

```
pip install -U bertopic
```

You can use the model as follows:

```python
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_108_3000_1500_train")

topic_model.get_topic_info()
```

## Topic overview

* Number of topics: 51
* Number of training documents: 3000

Overview of all topics:

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | said - one - people - year - would | 10 | -1_said_one_people_year |
| 0 | league - player - cup - club - game | 954 | 0_league_player_cup_club |
| 1 | police - said - court - told - murder | 308 | 1_police_said_court_told |
| 2 | dog - animal - cat - elephant - zoo | 290 | 2_dog_animal_cat_elephant |
| 3 | mr - minister - labour - cameron - prime | 113 | 3_mr_minister_labour_cameron |
| 4 | obama - clinton - president - republican - campaign | 104 | 4_obama_clinton_president_republican |
| 5 | school - teacher - student - nfl - said | 84 | 5_school_teacher_student_nfl |
| 6 | food - milk - drink - wine - bottle | 72 | 6_food_milk_drink_wine |
| 7 | flight - plane - passenger - pilot - aircraft | 49 | 7_flight_plane_passenger_pilot |
| 8 | user - facebook - google - ipad - device | 48 | 8_user_facebook_google_ipad |
| 9 | olympic - gold - race - games - medal | 46 | 9_olympic_gold_race_games |
| 10 | doll - dress - fashion - look - style | 44 | 10_doll_dress_fashion_look |
| 11 | afghan - afghanistan - taliban - military - pakistan | 43 | 11_afghan_afghanistan_taliban_military |
| 12 | transplant - patient - heart - hospital - cancer | 42 | 12_transplant_patient_heart_hospital |
| 13 | iran - syrian - said - president - egypt | 42 | 13_iran_syrian_said_president |
| 14 | show - film - million - like - movie | 39 | 14_show_film_million_like |
| 15 | property - house - price - home - apartment | 38 | 15_property_house_price_home |
| 16 | earth - asteroid - moon - volcano - planet | 34 | 16_earth_asteroid_moon_volcano |
| 17 | federer - djokovic - match - murray - seed | 33 | 17_federer_djokovic_match_murray |
| 18 | jackson - jacksons - album - song - music | 31 | 18_jackson_jacksons_album_song |
| 19 | ship - boat - coast - said - vessel | 30 | 19_ship_boat_coast_said |
| 20 | russia - russian - putin - ukraine - moscow | 30 | 20_russia_russian_putin_ukraine |
| 21 | snow - weather - temperature - climate - water | 29 | 21_snow_weather_temperature_climate |
| 22 | police - station - mr - man - gang | 28 | 22_police_station_mr_man |
| 23 | ebola - disease - vaccine - virus - health | 28 | 23_ebola_disease_vaccine_virus |
| 24 | weight - fat - diet - burn - exercise | 28 | 24_weight_fat_diet_burn |
| 25 | syria - isis - islamic - muslims - alqudsi | 23 | 25_syria_isis_islamic_muslims |
| 26 | boko - haram - nigeria - nigerian - turkana | 23 | 26_boko_haram_nigeria_nigerian |
| 27 | korea - north - korean - kim - pyongyang | 22 | 27_korea_north_korean_kim |
| 28 | driver - driving - road - car - speed | 22 | 28_driver_driving_road_car |
| 29 | school - child - education - internet - english | 21 | 29_school_child_education_internet |
| 30 | mcilroy - woods - pga - tournament - round | 20 | 30_mcilroy_woods_pga_tournament |
| 31 | race - car - driver - team - f1 | 19 | 31_race_car_driver_team |
| 32 | princess - prince - diana - royal - palace | 18 | 32_princess_prince_diana_royal |
| 33 | climbing - climb - mountain - everest - ang | 18 | 33_climbing_climb_mountain_everest |
| 34 | wedding - bieber - couple - together - love | 18 | 34_wedding_bieber_couple_together |
| 35 | nhs - care - patient - hospital - health | 17 | 35_nhs_care_patient_hospital |
| 36 | iraq - iraqi - isis - baghdad - kurdish | 16 | 36_iraq_iraqi_isis_baghdad |
| 37 | cartel - drug - mexican - mexico - crack | 15 | 37_cartel_drug_mexican_mexico |
| 38 | painting - picasso - art - artist - gogh | 15 | 38_painting_picasso_art_artist |
| 39 | castro - zelaya - fidel - micheletti - president | 14 | 39_castro_zelaya_fidel_micheletti |
| 40 | french - ford - traveller - southampton - taxi | 14 | 40_french_ford_traveller_southampton |
| 41 | fire - florissant - bell - firefighter - burned | 14 | 41_fire_florissant_bell_firefighter |
| 42 | fight - ali - heavyweight - pacquiao - title | 13 | 42_fight_ali_heavyweight_pacquiao |
| 43 | fish - sea - jellyfish - manta - swell | 13 | 43_fish_sea_jellyfish_manta |
| 44 | pope - francis - vatican - falkland - islands | 12 | 44_pope_francis_vatican_falkland |
| 45 | gay - samesex - lgbt - marriage - state | 12 | 45_gay_samesex_lgbt_marriage |
| 46 | castle - tower - building - brent - lego | 12 | 46_castle_tower_building_brent |
| 47 | chinese - china - xinhua - chinas - communist | 12 | 47_chinese_china_xinhua_chinas |
| 48 | delivery - customer - market - vacuum - coin | 10 | 48_delivery_customer_market_vacuum |
| 49 | water - rain - storm - flooding - methane | 10 | 49_water_rain_storm_flooding |

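
The labels in the table appear to follow the pattern `<topic id>_<keyword>_<keyword>_...`, with `-1` being the ID BERTopic uses for outlier documents. The sketch below splits such a label back into its ID and keywords. `parse_label` is a hypothetical helper written for this illustration; it is not part of the BERTopic API, and it assumes keywords never contain an underscore.

```python
def parse_label(label: str) -> tuple[int, list[str]]:
    """Split a label such as '0_league_player_cup_club' into (topic_id, keywords).

    Hypothetical helper for illustration only; assumes the label starts with an
    integer topic ID followed by underscore-separated keywords.
    """
    # Split once on the first underscore so a leading '-1' stays intact.
    topic_id, _, words = label.partition("_")
    return int(topic_id), words.split("_")


# Labels copied from the table above.
outlier_id, outlier_words = parse_label("-1_said_one_people_year")
sport_id, sport_words = parse_label("0_league_player_cup_club")
```

With the model loaded as shown under Usage, `topic_model.get_topic(0)` returns the keywords of topic 0 together with their scores, which is usually more informative than the label alone.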
## Training hyperparameters

* calculate_probabilities: True
* language: english
* low_memory: False
* min_topic_size: 10
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: False

## Framework versions

* Numpy: 1.22.4
* HDBSCAN: 0.8.33
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.31.0
* Numba: 0.56.4
* Plotly: 5.13.1
* Python: 3.10.6
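
The values listed under Training hyperparameters correspond to keyword arguments of the `BERTopic` constructor. As a minimal sketch, they can be collected in a dictionary and passed to a fresh, untrained model. The names `HYPERPARAMETERS` and `build_untrained_model` are hypothetical and exist only for this example. This card does not state the embedding model or the UMAP and HDBSCAN settings, so fitting such a model is not expected to reproduce the topics above exactly.

```python
# Hyperparameters copied from the "Training hyperparameters" list in this card.
HYPERPARAMETERS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_untrained_model():
    """Return an unfitted BERTopic instance configured with the values above.

    Hypothetical helper; bertopic is imported lazily so the dictionary can be
    inspected without the library installed.
    """
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMETERS)
```

Calling `build_untrained_model().fit_transform(docs)` on a list of documents would then train a new model with these settings; `docs` is a placeholder for your own corpus.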