{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from typing import Iterable, Iterator\n",
    "from langchain.docstore.document import Document\n",
    "from langchain.embeddings import HuggingFaceEmbeddings\n",
    "\n",
    "model_name = \"sentence-transformers/all-mpnet-base-v2\"\n",
    "model_kwargs = {'device': 'cpu'}\n",
    "encode_kwargs = {'normalize_embeddings': False}\n",
    "model = HuggingFaceEmbeddings(\n",
    "    model_name=model_name,\n",
    "    model_kwargs=model_kwargs,\n",
    "    encode_kwargs=encode_kwargs\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain.vectorstores import FAISS\n",
    "path = \"/data/tommaso/llm4scilit/data/vector_store\"\n",
    "db = FAISS.load_local(path, model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='These serum proteins have strong potential to serve as diagnostic and prognostic biomarkers of RA and can also be evaluated to fill the gaps in the current knowledge of pathogenesis of RA.These\\n\\nfindings can be validated in larger cohorts from different populations to identify diagnostic and prognostic biomarkers of RA.', metadata={'text': 'RA is a complex disease that is influenced by an intricate interactome of various environmental, genetic and microbial factors that influence the immune homeostasis.Owing to the complex genetic architecture accompanied by a plethora of microbial and environmental triggers that an organism is exposed to this has made the identification of diagnostic and prognostic markers challenging.Our study has explored the serum proteomics of this complex autoimmune disorder in a relatively understudied Pakistani population to identify disease biomarkers that are DE among various serotypes of RA patients and healthy controls.We identified that PZP, SELENOP, C4BP beta chain, ApoM, NAMLAA, CPN catalytic chain, OIT3, CPN subunit 2, ApoC1 and ApoCIII were DE between the RA patients and healthy controls.These serum proteins have strong potential to serve as diagnostic and prognostic biomarkers of RA and can also be evaluated to fill the gaps in the current knowledge of pathogenesis of RA.These findings can be validated in larger cohorts from different populations to identify diagnostic and prognostic biomarkers of RA.', 'para': '5', 'bboxes': \"[[{'page': '15', 'x': '187.65', 'y': '173.66', 'h': '371.62', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '186.22', 'h': '394.62', 'w': '9.58'}], [{'page': '15', 'x': '166.39', 'y': '198.77', 'h': '392.88', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '211.32', 'h': '392.88', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '223.88', 'h': '229.10', 'w': '9.58'}], [{'page': '15', 'x': '401.31', 'y': '223.88', 'h': '157.97', 'w': '9.58'}, {'page': '15', 'x': 
'166.10', 'y': '236.43', 'h': '393.18', 'w': '9.58'}, {'page': '15', 'x': '166.10', 'y': '248.98', 'h': '393.57', 'w': '9.58'}, {'page': '15', 'x': '166.10', 'y': '261.54', 'h': '130.46', 'w': '9.58'}], [{'page': '15', 'x': '299.65', 'y': '261.54', 'h': '260.87', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '274.09', 'h': '392.88', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '286.64', 'h': '201.22', 'w': '9.58'}], [{'page': '15', 'x': '370.71', 'y': '286.64', 'h': '188.57', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '299.19', 'h': '392.88', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '311.75', 'h': '238.67', 'w': '9.58'}], [{'page': '15', 'x': '407.54', 'y': '311.75', 'h': '151.74', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '324.30', 'h': '392.88', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '336.85', 'h': '28.14', 'w': '9.58'}]]\", 'pages': \"('15', '15')\", 'section_title': 'Conclusions', 'section_number': '5.', 'paper_title': 'LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients', 'file_path': '/data/tommaso/llm4scilit/data/papers/1.pdf'}),\n",
       " Document(page_content='Rheumatoid factor (RF) and anti-citrullinated peptide antibodies (ACPA) are considered as the main serological markers for RA that have been included in the 2010 American College of Rheumatology (ACR)/European League against Rheumatism (EULAR) classification criteria for RA [7][8][9].Based on 2010 ACR/EULAR classification criteria for RA, clinically diagnosed RA patients can be categorized into four serotypes: (i) positive for both RF and ACPA, (ii) positive for RF and negative for ACPA, (iii) negative for RF and positive for ACPA and (iv) negative for both RF and ACPA.However\\n\\n, the levels of RF are also perturbed in connective tissue diseases [10] and some chronic infectious diseases such as hepatitis B and hepatitis C virus infections [11].RF\\n\\nis thus not a specific diagnostic marker for', metadata={'text': 'Rheumatoid factor (RF) and anti-citrullinated peptide antibodies (ACPA) are considered as the main serological markers for RA that have been included in the 2010 American College of Rheumatology (ACR)/European League against Rheumatism (EULAR) classification criteria for RA [7][8][9].Based on 2010 ACR/EULAR classification criteria for RA, clinically diagnosed RA patients can be categorized into four serotypes: (i) positive for both RF and ACPA, (ii) positive for RF and negative for ACPA, (iii) negative for RF and positive for ACPA and (iv) negative for both RF and ACPA.However, the levels of RF are also perturbed in connective tissue diseases [10] and some chronic infectious diseases such as hepatitis B and hepatitis C virus infections [11].RF is thus not a specific diagnostic marker for RA.ACPA is comparatively a more specific biomarker and two-thirds of the individuals ultimately diagnosed with RA were tested positive for ACPAs 6-10 years before diagnosis [12,13].A total of 1-3% of the healthy population may also test positive for ACPAs suggesting the decreased specificity of this biomarker 
[14][15][16][17].Therefore, it is important to discover the biomarkers for the diagnosis of RA with both increased sensitivity and specificity.', 'para': '6', 'bboxes': \"[[{'page': '2', 'x': '187.65', 'y': '223.58', 'h': '373.27', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '236.13', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '248.68', 'h': '394.53', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '261.24', 'h': '133.81', 'w': '9.58'}], [{'page': '2', 'x': '303.29', 'y': '261.24', 'h': '257.23', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '273.79', 'h': '393.08', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '286.34', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.10', 'y': '298.90', 'h': '272.66', 'w': '9.58'}], [{'page': '2', 'x': '441.85', 'y': '298.90', 'h': '117.43', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '311.45', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '324.00', 'h': '240.16', 'w': '9.58'}], [{'page': '2', 'x': '409.64', 'y': '324.00', 'h': '149.63', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '336.55', 'h': '67.99', 'w': '9.58'}], [{'page': '2', 'x': '236.99', 'y': '336.55', 'h': '322.28', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '349.11', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '361.66', 'h': '107.38', 'w': '9.58'}], [{'page': '2', 'x': '276.86', 'y': '361.66', 'h': '282.42', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '374.21', 'h': '325.69', 'w': '9.58'}], [{'page': '2', 'x': '495.20', 'y': '374.21', 'h': '64.08', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '386.77', 'h': '393.27', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '399.32', 'h': '65.18', 'w': '9.58'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Introduction', 'section_number': '1.', 'paper_title': 'LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients', 'file_path': 
'/data/tommaso/llm4scilit/data/papers/1.pdf'}),\n",
       " Document(page_content='is thus not a specific diagnostic marker for\\n\\nRA.ACPA is comparatively a more specific biomarker and two-thirds of the individuals ultimately diagnosed with RA were tested positive for ACPAs 6-10 years before diagnosis [12,13].A total of 1-3% of the healthy population may also test positive for ACPAs suggesting the decreased specificity of this biomarker [14][15][16][17].Therefore\\n\\n, it is important to discover the biomarkers for the diagnosis of RA with both increased sensitivity and specificity.', metadata={'text': 'Rheumatoid factor (RF) and anti-citrullinated peptide antibodies (ACPA) are considered as the main serological markers for RA that have been included in the 2010 American College of Rheumatology (ACR)/European League against Rheumatism (EULAR) classification criteria for RA [7][8][9].Based on 2010 ACR/EULAR classification criteria for RA, clinically diagnosed RA patients can be categorized into four serotypes: (i) positive for both RF and ACPA, (ii) positive for RF and negative for ACPA, (iii) negative for RF and positive for ACPA and (iv) negative for both RF and ACPA.However, the levels of RF are also perturbed in connective tissue diseases [10] and some chronic infectious diseases such as hepatitis B and hepatitis C virus infections [11].RF is thus not a specific diagnostic marker for RA.ACPA is comparatively a more specific biomarker and two-thirds of the individuals ultimately diagnosed with RA were tested positive for ACPAs 6-10 years before diagnosis [12,13].A total of 1-3% of the healthy population may also test positive for ACPAs suggesting the decreased specificity of this biomarker [14][15][16][17].Therefore, it is important to discover the biomarkers for the diagnosis of RA with both increased sensitivity and specificity.', 'para': '6', 'bboxes': \"[[{'page': '2', 'x': '187.65', 'y': '223.58', 'h': '373.27', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '236.13', 'h': '392.88', 'w': '9.58'}, 
{'page': '2', 'x': '166.39', 'y': '248.68', 'h': '394.53', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '261.24', 'h': '133.81', 'w': '9.58'}], [{'page': '2', 'x': '303.29', 'y': '261.24', 'h': '257.23', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '273.79', 'h': '393.08', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '286.34', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.10', 'y': '298.90', 'h': '272.66', 'w': '9.58'}], [{'page': '2', 'x': '441.85', 'y': '298.90', 'h': '117.43', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '311.45', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '324.00', 'h': '240.16', 'w': '9.58'}], [{'page': '2', 'x': '409.64', 'y': '324.00', 'h': '149.63', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '336.55', 'h': '67.99', 'w': '9.58'}], [{'page': '2', 'x': '236.99', 'y': '336.55', 'h': '322.28', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '349.11', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '361.66', 'h': '107.38', 'w': '9.58'}], [{'page': '2', 'x': '276.86', 'y': '361.66', 'h': '282.42', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '374.21', 'h': '325.69', 'w': '9.58'}], [{'page': '2', 'x': '495.20', 'y': '374.21', 'h': '64.08', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '386.77', 'h': '393.27', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '399.32', 'h': '65.18', 'w': '9.58'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Introduction', 'section_number': '1.', 'paper_title': 'LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients', 'file_path': '/data/tommaso/llm4scilit/data/papers/1.pdf'}),\n",
       " Document(page_content='For validation, serum samples were collected and processed from RA patients (n = 60) (mean age ± SD = 41.495 ± 12.8275) and healthy controls (n = 20) (mean age ± SD = 45.4 ± 11.31) from the same population.\\n\\nThe demographics and clinical characteristics of the experimental and validation cohort are shown in Table 1.', metadata={'text': 'For validation, serum samples were collected and processed from RA patients (n = 60) (mean age ± SD = 41.495 ± 12.8275) and healthy controls (n = 20) (mean age ± SD = 45.4 ± 11.31) from the same population.The demographics and clinical characteristics of the experimental and validation cohort are shown in Table 1.', 'para': '1', 'bboxes': \"[[{'page': '3', 'x': '187.65', 'y': '160.81', 'h': '372.02', 'w': '9.58'}, {'page': '3', 'x': '166.10', 'y': '173.05', 'h': '394.17', 'w': '9.90'}, {'page': '3', 'x': '166.07', 'y': '185.60', 'h': '256.73', 'w': '9.90'}], [{'page': '3', 'x': '425.92', 'y': '185.92', 'h': '133.36', 'w': '9.58'}, {'page': '3', 'x': '166.39', 'y': '198.47', 'h': '343.00', 'w': '9.58'}]]\", 'pages': \"('3', '3')\", 'section_title': 'Study Subjects and Serum Collection', 'section_number': '2.1.', 'paper_title': 'LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients', 'file_path': '/data/tommaso/llm4scilit/data/papers/1.pdf'})]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.as_retriever().get_relevant_documents(\"What are the main serological markers for RA?\", metadata={\"paper_title\": \"LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients\"})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "60"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.index.ntotal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "DATA_PATH = Path(\"/data/tommaso/llm4scilit/data\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['/data/tommaso/llm4scilit/data/papers/3.pdf',\n",
       " '/data/tommaso/llm4scilit/data/papers/2.pdf',\n",
       " '/data/tommaso/llm4scilit/data/papers/7.pdf',\n",
       " '/data/tommaso/llm4scilit/data/papers/1.pdf',\n",
       " '/data/tommaso/llm4scilit/data/papers/6.pdf',\n",
       " '/data/tommaso/llm4scilit/data/papers/10.pdf',\n",
       " '/data/tommaso/llm4scilit/data/papers/5.pdf',\n",
       " '/data/tommaso/llm4scilit/data/papers/4.pdf',\n",
       " '/data/tommaso/llm4scilit/data/papers/9.pdf',\n",
       " '/data/tommaso/llm4scilit/data/papers/8.pdf']"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import glob\n",
    "glob.glob(str(DATA_PATH / \"papers/*\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='We determined that 144 proteins showed significant differential abundance between the IA and control SF proteomes, of which 11 protein candidates were selected for future follow-up studies.Similar analyses applied to our peptidomic data identified 15 peptide sequences, originating from 4 protein precursors, to have significant differential abundance in IA compared to the control SF peptidome.Pathway enrichment analysis of the IA SF peptidome along with AMP prediction suggests a possible mechanistic role of microbes in eliciting an immune response which drives the development of IA.', metadata={'text': 'We determined that 144 proteins showed significant differential abundance between the IA and control SF proteomes, of which 11 protein candidates were selected for future follow-up studies.Similar analyses applied to our peptidomic data identified 15 peptide sequences, originating from 4 protein precursors, to have significant differential abundance in IA compared to the control SF peptidome.Pathway enrichment analysis of the IA SF peptidome along with AMP prediction suggests a possible mechanistic role of microbes in eliciting an immune response which drives the development of IA.', 'para': '2', 'bboxes': \"[[{'page': '1', 'x': '101.12', 'y': '422.98', 'h': '424.81', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '434.98', 'h': '340.13', 'w': '9.24'}], [{'page': '1', 'x': '405.45', 'y': '434.98', 'h': '120.66', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '446.98', 'h': '468.92', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '458.98', 'h': '225.40', 'w': '9.24'}], [{'page': '1', 'x': '290.71', 'y': '458.98', 'h': '234.48', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '470.98', 'h': '460.78', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '482.98', 'h': '91.59', 'w': '9.24'}]]\", 'pages': \"('1', '1')\", 'section_title': 'Results:', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and 
peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The discovery-phase data generated herein has provided a basis for the identification of candidates with the greatest potential to serve as novel serum biomarkers specific to inflammatory arthritides.Moreover, these findings facilitate the understanding of possible disease mechanisms specific to each subtype.', metadata={'text': 'The discovery-phase data generated herein has provided a basis for the identification of candidates with the greatest potential to serve as novel serum biomarkers specific to inflammatory arthritides.Moreover, these findings facilitate the understanding of possible disease mechanisms specific to each subtype.', 'para': '1', 'bboxes': \"[[{'page': '1', 'x': '122.15', 'y': '497.98', 'h': '394.30', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '509.98', 'h': '391.31', 'w': '9.24'}], [{'page': '1', 'x': '456.63', 'y': '509.98', 'h': '63.75', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '521.98', 'h': '374.26', 'w': '9.24'}]]\", 'pages': \"('1', '1')\", 'section_title': 'Conclusions:', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content=\"Inflammatory arthritis (IA) is characterized by synovial hyperplasia leading to degradation of adjacent articular cartilage and bone [1].The term encompasses several forms of inflammatory joint diseases that when taken together, have an annual incidence ranging from 115 to 271 per 100,000 adults [2].IA is a multifactorial disease driven by the complex interplay of both genetics and the environment.Rheumatoid arthritis (RA), the most common and potentially destructive IA, has a well-established association with class II major histocompatibility complex (MHC) alleles while the spondyloarthritides, such as psoriatic arthritis (PsA), are more frequently associated with class I MHC alleles [3].Susceptibility to IA increases when genetic predisposition is complemented by environmental risk factors such as smoking, obesity and more recently, microbial infection and intestinal dysbiosis [4][5][6].The exact etiology of IA is still poorly understood with studies aimed at delineating the molecular pathways driving loss of immunological tolerance to the body's self-antigens.Alterations to the adaptive and innate immune system perpetuate systemic inflammation and lead to an elevated risk of developing comorbid conditions such as cardiovascular disease, metabolic syndrome, diabetes and depression [7,8].Naturally, there is a compelling need to identify markers of aberrant immune pathways relevant to IA which may advance current insights into the molecular mechanisms of the disease and serve as clinical markers for disease monitoring and treatment responses.\", metadata={'text': \"Inflammatory arthritis (IA) is characterized by synovial hyperplasia leading to degradation of adjacent articular cartilage and bone [1].The term encompasses several forms of inflammatory joint diseases that when taken together, have an annual incidence ranging from 115 to 271 per 100,000 adults [2].IA is a multifactorial disease driven by the complex interplay of both 
genetics and the environment.Rheumatoid arthritis (RA), the most common and potentially destructive IA, has a well-established association with class II major histocompatibility complex (MHC) alleles while the spondyloarthritides, such as psoriatic arthritis (PsA), are more frequently associated with class I MHC alleles [3].Susceptibility to IA increases when genetic predisposition is complemented by environmental risk factors such as smoking, obesity and more recently, microbial infection and intestinal dysbiosis [4][5][6].The exact etiology of IA is still poorly understood with studies aimed at delineating the molecular pathways driving loss of immunological tolerance to the body's self-antigens.Alterations to the adaptive and innate immune system perpetuate systemic inflammation and lead to an elevated risk of developing comorbid conditions such as cardiovascular disease, metabolic syndrome, diabetes and depression [7,8].Naturally, there is a compelling need to identify markers of aberrant immune pathways relevant to IA which may advance current insights into the molecular mechanisms of the disease and serve as clinical markers for disease monitoring and treatment responses.\", 'para': '7', 'bboxes': \"[[{'page': '2', 'x': '56.69', 'y': '101.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '113.84', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '125.84', 'h': '98.24', 'w': '11.68'}], [{'page': '2', 'x': '158.95', 'y': '125.84', 'h': '131.59', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '137.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '149.83', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '161.83', 'h': '124.09', 'w': '11.68'}], [{'page': '2', 'x': '183.72', 'y': '161.83', 'h': '106.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '173.85', 'h': '233.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '185.84', 'h': '94.37', 'w': '11.68'}], [{'page': '2', 'x': '155.55', 'y': 
'185.84', 'h': '135.01', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '197.84', 'h': '233.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '209.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '221.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '233.85', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '245.84', 'h': '212.58', 'w': '11.68'}], [{'page': '2', 'x': '272.28', 'y': '245.84', 'h': '18.27', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '257.85', 'h': '233.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '269.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '281.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '293.83', 'h': '127.47', 'w': '11.68'}], [{'page': '2', 'x': '187.45', 'y': '293.83', 'h': '103.09', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '305.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '317.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '329.85', 'h': '184.18', 'w': '11.68'}], [{'page': '2', 'x': '243.59', 'y': '329.85', 'h': '46.94', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '341.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '353.84', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '365.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '377.83', 'h': '233.85', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '389.83', 'h': '24.69', 'w': '11.68'}], [{'page': '2', 'x': '84.82', 'y': '389.83', 'h': '205.76', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '401.82', 'h': '233.85', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '413.82', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '425.81', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '437.81', 'h': '203.55', 'w': '11.68'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Introduction', 'section_number': 'None', 'paper_title': 'Elucidating the 
endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The rise in high-throughput technologies, such as next-generation gene sequencing and mass spectrometry (MS), facilitate the discovery of key modulators of disease.Specifically, MS-based approaches provide an essential analytical platform for the identification, quantification and characterization of candidate biomarkers.Biomarkers may come in the form of a molecular signature, a clinical feature or even as an imaging parameter.Molecular biomarkers may be further subtyped into the domains of genomics, transcriptomics, proteomics, metabolomics or peptidomics.Due to the importance of proteins in pathophysiological processes, there is increased interest in resolving the proteomic profile of biospecimens related to IA.Similarly, peptides play a seminal role in mediating physiological functions by serving as neurotransmitters, hormones, antibiotics and immune regulators [9].During IA, joint pain and inflammation are driven by aberrant proteolysis resulting in the production of inflammatory peptides and the destruction of joint cartilage and bone.Synovial fluid (SF), a proximal fluid which bathes the intrinsic joint structures, is an important reservoir of putative protein and peptide biomarkers whose abundance levels fluctuate in response to pathological changes due to disease [10].', metadata={'text': 'The rise in high-throughput technologies, such as next-generation gene sequencing and mass spectrometry (MS), facilitate the discovery of key modulators of disease.Specifically, MS-based approaches provide an essential analytical platform for the identification, quantification and characterization of candidate biomarkers.Biomarkers may come in the form of a molecular signature, a clinical feature or even as an imaging parameter.Molecular biomarkers may be further subtyped into the domains of genomics, transcriptomics, proteomics, metabolomics or peptidomics.Due to the importance of proteins in pathophysiological processes, there is 
increased interest in resolving the proteomic profile of biospecimens related to IA.Similarly, peptides play a seminal role in mediating physiological functions by serving as neurotransmitters, hormones, antibiotics and immune regulators [9].During IA, joint pain and inflammation are driven by aberrant proteolysis resulting in the production of inflammatory peptides and the destruction of joint cartilage and bone.Synovial fluid (SF), a proximal fluid which bathes the intrinsic joint structures, is an important reservoir of putative protein and peptide biomarkers whose abundance levels fluctuate in response to pathological changes due to disease [10].', 'para': '7', 'bboxes': \"[[{'page': '2', 'x': '64.69', 'y': '449.80', 'h': '225.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '461.80', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '473.85', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '485.84', 'h': '45.00', 'w': '11.68'}], [{'page': '2', 'x': '106.03', 'y': '485.84', 'h': '184.52', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '497.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '509.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '521.85', 'h': '36.67', 'w': '11.68'}], [{'page': '2', 'x': '96.18', 'y': '521.85', 'h': '194.37', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '533.85', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '545.84', 'h': '44.89', 'w': '11.68'}], [{'page': '2', 'x': '105.44', 'y': '545.84', 'h': '185.11', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '557.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '569.85', 'h': '200.97', 'w': '11.68'}], [{'page': '2', 'x': '261.20', 'y': '569.85', 'h': '29.37', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '581.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '593.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '605.85', 'h': '191.41', 'w': 
'11.68'}], [{'page': '2', 'x': '251.27', 'y': '605.85', 'h': '39.28', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '617.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '629.84', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '641.83', 'h': '177.40', 'w': '11.68'}], [{'page': '2', 'x': '240.69', 'y': '641.83', 'h': '49.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '653.83', 'h': '233.85', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '665.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '677.85', 'h': '233.88', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '689.84', 'h': '23.12', 'w': '11.68'}], [{'page': '2', 'x': '82.70', 'y': '689.84', 'h': '207.82', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '701.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '713.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '89.32', 'h': '233.88', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '101.32', 'h': '116.75', 'w': '11.68'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Introduction', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='In the current study, we performed MS-based proteomic and peptidomic analyses of SF from RA and PsA patients to identify and quantify significant proteins and peptides related to the aetiopathogenesis of IA.Differential abundance analyses highlighted the capacity for dysregulated SF proteins and peptides to reflect disease activity while pathway analysis and antimicrobial peptide (AMP) prediction alluded to a larger role of microbes in the initiation and progression of IA.These findings provide the means for discovering novel candidates to serve as possible biomarkers of IA while simultaneously, highlighting possible mechanistic networks responsible for the disease progression of RA and PsA.', metadata={'text': 'In the current study, we performed MS-based proteomic and peptidomic analyses of SF from RA and PsA patients to identify and quantify significant proteins and peptides related to the aetiopathogenesis of IA.Differential abundance analyses highlighted the capacity for dysregulated SF proteins and peptides to reflect disease activity while pathway analysis and antimicrobial peptide (AMP) prediction alluded to a larger role of microbes in the initiation and progression of IA.These findings provide the means for discovering novel candidates to serve as possible biomarkers of IA while simultaneously, highlighting possible mechanistic networks responsible for the disease progression of RA and PsA.', 'para': '2', 'bboxes': \"[[{'page': '2', 'x': '312.72', 'y': '113.32', 'h': '225.87', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '125.33', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '137.32', 'h': '233.88', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '149.32', 'h': '202.76', 'w': '11.68'}], [{'page': '2', 'x': '511.58', 'y': '149.32', 'h': '27.00', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '161.33', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '173.32', 'h': '233.85', 'w': 
'11.68'}, {'page': '2', 'x': '304.72', 'y': '185.32', 'h': '233.88', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '197.31', 'h': '233.85', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '209.31', 'h': '150.32', 'w': '11.68'}], [{'page': '2', 'x': '458.13', 'y': '209.31', 'h': '80.45', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '221.33', 'h': '233.88', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '233.32', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '245.33', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '257.32', 'h': '159.57', 'w': '11.68'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Introduction', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Research ethics board approval was received for the study from the University Health Network, Mount Sinai Hospital and the University of Calgary.Informed consent was obtained from all patients.', metadata={'text': 'Research ethics board approval was received for the study from the University Health Network, Mount Sinai Hospital and the University of Calgary.Informed consent was obtained from all patients.', 'para': '1', 'bboxes': \"[[{'page': '2', 'x': '304.72', 'y': '305.33', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '317.32', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '329.32', 'h': '158.46', 'w': '11.68'}], [{'page': '2', 'x': '465.58', 'y': '329.32', 'h': '73.00', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '341.31', 'h': '125.37', 'w': '11.68'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Patients and SF collection', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='SF samples for the study were obtained, retrospectively, from 10 cases with RA, 10 cases with PsA and 10 cadaveric controls.RA patients were classified according to the 1987 American College of Rheumatology (ACR) classification criteria [11].PsA patients satisfied the Classification Criteria for Psoriatic Arthritis (CASPAR) [12].', metadata={'text': 'SF samples for the study were obtained, retrospectively, from 10 cases with RA, 10 cases with PsA and 10 cadaveric controls.RA patients were classified according to the 1987 American College of Rheumatology (ACR) classification criteria [11].PsA patients satisfied the Classification Criteria for Psoriatic Arthritis (CASPAR) [12].', 'para': '2', 'bboxes': \"[[{'page': '2', 'x': '312.72', 'y': '353.31', 'h': '225.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '365.30', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '377.33', 'h': '53.52', 'w': '11.68'}], [{'page': '2', 'x': '360.58', 'y': '377.33', 'h': '178.02', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '389.32', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '401.33', 'h': '80.06', 'w': '11.68'}], [{'page': '2', 'x': '388.11', 'y': '401.33', 'h': '150.47', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '413.33', 'h': '207.09', 'w': '11.68'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Patients and SF collection', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Cadaveric control SF were obtained from joints through the Southern Alberta Tissue Donation Program.Inclusion criteria consisted of an age of 18 years or older with no medical history of arthritis, joint injury or joint surgery (including visual inspection of cartilage surfaces during recovery), no prescription anti-inflammatory medications and availability within 4 h of death.Exclusion criteria for all disease cohorts included patients receiving therapeutic biological drugs and the presence of other causes of inflammation (e.g.infection and/or crystal disease) or co-morbidities (e.g.cancer).', metadata={'text': 'Cadaveric control SF were obtained from joints through the Southern Alberta Tissue Donation Program.Inclusion criteria consisted of an age of 18 years or older with no medical history of arthritis, joint injury or joint surgery (including visual inspection of cartilage surfaces during recovery), no prescription anti-inflammatory medications and availability within 4 h of death.Exclusion criteria for all disease cohorts included patients receiving therapeutic biological drugs and the presence of other causes of inflammation (e.g.infection and/or crystal disease) or co-morbidities (e.g.cancer).', 'para': '4', 'bboxes': \"[[{'page': '2', 'x': '312.72', 'y': '425.32', 'h': '225.87', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '437.32', 'h': '233.88', 'w': '11.68'}], [{'page': '2', 'x': '304.72', 'y': '449.31', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '461.31', 'h': '233.89', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '473.30', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '485.30', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '497.29', 'h': '204.09', 'w': '11.68'}], [{'page': '2', 'x': '512.45', 'y': '497.29', 'h': '26.13', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '509.33', 'h': '233.88', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '521.32', 'h': 
'233.83', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '533.32', 'h': '160.24', 'w': '11.68'}], [{'page': '2', 'x': '469.66', 'y': '533.32', 'h': '68.90', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '545.31', 'h': '154.80', 'w': '11.68'}], [{'page': '2', 'x': '461.83', 'y': '545.31', 'h': '32.43', 'w': '11.68'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Patients and SF collection', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='IA SF samples (both RA and PsA) were obtained through needle aspiration of knee joints and kept on ice.Samples were transferred to centrifuge tubes and spun at 160 RCF for 10 min at 4 °C.The supernatant was transferred to a sterile 1.5 mL centrifuge tube and spun at 2000 RCF for another 10 min at 4 °C.Samples were immediately stored at -80 °C until further processing.SF samples from cadavers were collected without the use of lavage.Samples were centrifuged at 3000 RCF for 15 min and stored at -80 °C until further processing.', metadata={'text': 'IA SF samples (both RA and PsA) were obtained through needle aspiration of knee joints and kept on ice.Samples were transferred to centrifuge tubes and spun at 160 RCF for 10 min at 4 °C.The supernatant was transferred to a sterile 1.5 mL centrifuge tube and spun at 2000 RCF for another 10 min at 4 °C.Samples were immediately stored at -80 °C until further processing.SF samples from cadavers were collected without the use of lavage.Samples were centrifuged at 3000 RCF for 15 min and stored at -80 °C until further processing.', 'para': '5', 'bboxes': \"[[{'page': '2', 'x': '304.72', 'y': '581.33', 'h': '233.90', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '593.32', 'h': '197.31', 'w': '11.68'}], [{'page': '2', 'x': '504.76', 'y': '593.32', 'h': '33.83', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '605.32', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '617.31', 'h': '77.57', 'w': '11.68'}], [{'page': '2', 'x': '385.33', 'y': '617.31', 'h': '153.23', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '629.31', 'h': '233.85', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '641.30', 'h': '94.55', 'w': '11.68'}], [{'page': '2', 'x': '401.69', 'y': '641.30', 'h': '136.92', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '653.30', 'h': '7.62', 'w': '11.68'}, {'page': '2', 'x': '317.47', 'y': '652.90', 'h': '7.64', 'w': '16.21'}, {'page': '2', 'x': '326.34', 'y': 
'653.30', 'h': '134.05', 'w': '11.68'}], [{'page': '2', 'x': '465.52', 'y': '653.30', 'h': '73.08', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '665.29', 'h': '209.12', 'w': '11.68'}], [{'page': '2', 'x': '517.19', 'y': '665.29', 'h': '21.40', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '677.33', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '689.32', 'h': '7.62', 'w': '11.68'}, {'page': '2', 'x': '314.65', 'y': '688.92', 'h': '7.64', 'w': '16.21'}, {'page': '2', 'x': '323.52', 'y': '689.32', 'h': '122.76', 'w': '11.68'}]]\", 'pages': \"('2', '2')\", 'section_title': 'SF sample preparation', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='At the time of analysis, samples were blinded, randomized, thawed on ice and their respective total protein concentrations were measured with a Pierce Coomassie (Bradford) total protein assay.', metadata={'text': 'At the time of analysis, samples were blinded, randomized, thawed on ice and their respective total protein concentrations were measured with a Pierce Coomassie (Bradford) total protein assay.', 'para': '0', 'bboxes': \"[[{'page': '2', 'x': '312.72', 'y': '701.32', 'h': '225.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '713.33', 'h': '233.88', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '89.33', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '101.32', 'h': '120.95', 'w': '11.68'}]]\", 'pages': \"('2', '3')\", 'section_title': 'SF sample preparation', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='For proteomic investigations, SF samples were first adjusted to 300 µg total protein in 50 mM ammonium bicarbonate (ABC).Protein concentration was conducted using Amicon Ultra-0.5 centrifugal filter units (10 kDa molecular weight cut-off; MilliporeSigma) which were pre-equilibrated with 400 uL of 50 mM ABC.Samples were loaded and spun at 10,000 RPM for 35 min at 4 °C and transferred to a new tube by spinning upside at 5000 RPM for 2 min.', metadata={'text': 'For proteomic investigations, SF samples were first adjusted to 300 µg total protein in 50 mM ammonium bicarbonate (ABC).Protein concentration was conducted using Amicon Ultra-0.5 centrifugal filter units (10 kDa molecular weight cut-off; MilliporeSigma) which were pre-equilibrated with 400 uL of 50 mM ABC.Samples were loaded and spun at 10,000 RPM for 35 min at 4 °C and transferred to a new tube by spinning upside at 5000 RPM for 2 min.', 'para': '2', 'bboxes': \"[[{'page': '3', 'x': '56.69', 'y': '145.21', 'h': '233.88', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '157.21', 'h': '233.84', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '169.20', 'h': '79.32', 'w': '11.68'}], [{'page': '3', 'x': '138.23', 'y': '169.20', 'h': '152.33', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '181.20', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '193.19', 'h': '233.84', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '205.19', 'h': '195.97', 'w': '11.68'}], [{'page': '3', 'x': '256.70', 'y': '205.19', 'h': '33.83', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '217.18', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '229.18', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '241.18', 'h': '62.93', 'w': '11.68'}]]\", 'pages': \"('3', '3')\", 'section_title': 'SF sample preparation for proteomic analysis', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis 
using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Concentrates were collected and brought to a total volume of 100 µL with 50 mM ABC.Proteins were denatured with powdered urea to a final concentration of 8 M. Dithiothreitol (DTT) (Sigma-Aldrich) was added to each concentrate sample to a final concentration of 5 mM and incubated at 60 °C for 45 min.This was followed by alkylation with 15 mM iodoacetamide (IAM) (Sigma-Aldrich) at room temperature in the dark for 45 min.Samples were diluted fivefold with 50 mM ABC to prevent inhibition of trypsin activity by high concentrations of urea.Concentrate samples were digested with trypsin (Sigma-Aldrich) in a 1:50 (trypsin to total protein) ratio for 20 h at 37 °C and then dropwise acidified to a pH of 2 with formic acid (FA) to inhibit trypsin activity.Samples were reduced to 300 µL via speed vacuum concentration and stored at -20 °C until subjected to liquid chromatography-tandem mass spectrometry (LC-MS/MS).', metadata={'text': 'Concentrates were collected and brought to a total volume of 100 µL with 50 mM ABC.Proteins were denatured with powdered urea to a final concentration of 8 M. 
Dithiothreitol (DTT) (Sigma-Aldrich) was added to each concentrate sample to a final concentration of 5 mM and incubated at 60 °C for 45 min.This was followed by alkylation with 15 mM iodoacetamide (IAM) (Sigma-Aldrich) at room temperature in the dark for 45 min.Samples were diluted fivefold with 50 mM ABC to prevent inhibition of trypsin activity by high concentrations of urea.Concentrate samples were digested with trypsin (Sigma-Aldrich) in a 1:50 (trypsin to total protein) ratio for 20 h at 37 °C and then dropwise acidified to a pH of 2 with formic acid (FA) to inhibit trypsin activity.Samples were reduced to 300 µL via speed vacuum concentration and stored at -20 °C until subjected to liquid chromatography-tandem mass spectrometry (LC-MS/MS).', 'para': '5', 'bboxes': \"[[{'page': '3', 'x': '64.69', 'y': '253.17', 'h': '225.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '265.21', 'h': '145.83', 'w': '11.68'}], [{'page': '3', 'x': '206.29', 'y': '265.21', 'h': '84.27', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '277.21', 'h': '233.84', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '289.21', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '301.20', 'h': '233.89', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '313.20', 'h': '144.15', 'w': '11.68'}], [{'page': '3', 'x': '203.88', 'y': '313.20', 'h': '86.69', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '325.19', 'h': '233.84', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '337.19', 'h': '233.86', 'w': '11.68'}], [{'page': '3', 'x': '56.69', 'y': '349.18', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '361.21', 'h': '233.88', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '373.21', 'h': '31.59', 'w': '11.68'}], [{'page': '3', 'x': '91.32', 'y': '373.21', 'h': '199.24', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '385.20', 'h': '233.87', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '397.20', 'h': '233.88', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': 
'409.19', 'h': '197.11', 'w': '11.68'}], [{'page': '3', 'x': '256.73', 'y': '409.19', 'h': '33.83', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '421.19', 'h': '233.85', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '433.18', 'h': '55.66', 'w': '11.68'}, {'page': '3', 'x': '115.92', 'y': '432.78', 'h': '7.64', 'w': '16.21'}, {'page': '3', 'x': '124.80', 'y': '433.18', 'h': '165.75', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '445.21', 'h': '212.40', 'w': '11.68'}]]\", 'pages': \"('3', '3')\", 'section_title': 'SF sample preparation for proteomic analysis', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Peptides were isolated based on a protocol described by Kamphorst et al. [13].Fifty microliters of SF were diluted in 235 µL of 50 mM ABC and 15 µL dimethyl sulfoxide (DMSO) for peptidomic analysis.Peptide concentration was conducted using Amicon Ultra-0.5 centrifugal filter units (10 kDa MWCO; MilliporeSigma) which were pre-equilibrated with 250 µL of 50 mM ABC. SF samples were spun at 10 000 RPM for 60 min at 4 °C then washed with 100 µL of 50 mM of ABC and spun for another 10 min.Filtrates were acidified with 5 µL of FA.', metadata={'text': 'Peptides were isolated based on a protocol described by Kamphorst et al. [13].Fifty microliters of SF were diluted in 235 µL of 50 mM ABC and 15 µL dimethyl sulfoxide (DMSO) for peptidomic analysis.Peptide concentration was conducted using Amicon Ultra-0.5 centrifugal filter units (10 kDa MWCO; MilliporeSigma) which were pre-equilibrated with 250 µL of 50 mM ABC. SF samples were spun at 10 000 RPM for 60 min at 4 °C then washed with 100 µL of 50 mM of ABC and spun for another 10 min.Filtrates were acidified with 5 µL of FA.', 'para': '3', 'bboxes': \"[[{'page': '3', 'x': '56.69', 'y': '489.10', 'h': '233.88', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '501.10', 'h': '88.97', 'w': '11.68'}], [{'page': '3', 'x': '148.21', 'y': '501.10', 'h': '142.36', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '513.09', 'h': '233.84', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '525.09', 'h': '139.02', 'w': '11.68'}], [{'page': '3', 'x': '199.26', 'y': '525.09', 'h': '91.29', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '537.08', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '549.10', 'h': '233.85', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '561.10', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '573.09', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '585.09', 'h': '233.89', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '597.08', 'h': 
'30.69', 'w': '11.68'}], [{'page': '3', 'x': '89.69', 'y': '597.08', 'h': '159.80', 'w': '11.68'}]]\", 'pages': \"('3', '3')\", 'section_title': 'SF sample preparation for peptidomics analysis', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Peptides were desalted using one hydrophilic-lipophilic-balanced reverse-phase cartridge per sample (Oasis HLB).Each cartridge [1 mL (30 mg); Waters cat# WAT094225] was first pre-equilibrated with 1 mL of 90% acetonitrile (ACN), 0.1% FA and 0.02% trifluoroacetic acid (TFA) and then washed with 3 mL of buffer A (5% ACN, 0.1% FA, 0.02% TFA).The SF sample was then passed through the cartridge and washed a second time with 3 mL of buffer A. Peptides were eluted with 700 µL of 60% ACN, 0.1% FA and 0.02% TFA and each eluate was reduced to a volume of less than 300 µL and stored at -20 °C until subjected to LC-MS/MS.', metadata={'text': 'Peptides were desalted using one hydrophilic-lipophilic-balanced reverse-phase cartridge per sample (Oasis HLB).Each cartridge [1 mL (30 mg); Waters cat# WAT094225] was first pre-equilibrated with 1 mL of 90% acetonitrile (ACN), 0.1% FA and 0.02% trifluoroacetic acid (TFA) and then washed with 3 mL of buffer A (5% ACN, 0.1% FA, 0.02% TFA).The SF sample was then passed through the cartridge and washed a second time with 3 mL of buffer A. 
Peptides were eluted with 700 µL of 60% ACN, 0.1% FA and 0.02% TFA and each eluate was reduced to a volume of less than 300 µL and stored at -20 °C until subjected to LC-MS/MS.', 'para': '2', 'bboxes': \"[[{'page': '3', 'x': '64.69', 'y': '609.08', 'h': '225.86', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '621.10', 'h': '233.84', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '633.10', 'h': '53.38', 'w': '11.68'}], [{'page': '3', 'x': '113.12', 'y': '633.10', 'h': '177.46', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '645.09', 'h': '233.85', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '657.09', 'h': '233.85', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '669.08', 'h': '233.89', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '681.08', 'h': '122.57', 'w': '11.68'}], [{'page': '3', 'x': '183.89', 'y': '681.08', 'h': '106.67', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '693.07', 'h': '233.88', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '705.07', 'h': '233.88', 'w': '11.68'}, {'page': '3', 'x': '56.69', 'y': '717.06', 'h': '233.89', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '89.29', 'h': '233.82', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '100.89', 'h': '7.64', 'w': '16.21'}, {'page': '3', 'x': '314.67', 'y': '101.29', 'h': '149.77', 'w': '11.68'}]]\", 'pages': \"('3', '3')\", 'section_title': 'SF sample preparation for peptidomics analysis', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Processed samples were desalted using C-18 OMIX Pipette Tips (Agilent Technologies, USA) and eluted in 3 µL of MS buffer B (65% ACN, 0.1% FA in H 2 O).The eluates were then diluted with 57 µL of MS buffer A (0.1% FA in H 2 O) and 28 µL were injected onto a 2 cm C18 trap column, packed with Varian Pursuit (5 µm C18), with an 8 µm tip (New Objective).The LC setup was coupled online to a Q Exactive (Thermo Fisher Scientific, USA) mass spectrometer with a nanoelectrospray ionization source (Proxeon Biosystems).Samples for direct proteomic analysis as well as samples for direct peptidomics analysis underwent a 60-min linear gradient using MS buffer A and MS buffer B. Eluted peptides were subjected to tandem mass spectrometry in positive ion mode.Data acquisition was conducted via Thermo XCalibur v.3.0.63 (Thermo Fisher Scientific, USA).', metadata={'text': 'Processed samples were desalted using C-18 OMIX Pipette Tips (Agilent Technologies, USA) and eluted in 3 µL of MS buffer B (65% ACN, 0.1% FA in H 2 O).The eluates were then diluted with 57 µL of MS buffer A (0.1% FA in H 2 O) and 28 µL were injected onto a 2 cm C18 trap column, packed with Varian Pursuit (5 µm C18), with an 8 µm tip (New Objective).The LC setup was coupled online to a Q Exactive (Thermo Fisher Scientific, USA) mass spectrometer with a nanoelectrospray ionization source (Proxeon Biosystems).Samples for direct proteomic analysis as well as samples for direct peptidomics analysis underwent a 60-min linear gradient using MS buffer A and MS buffer B. 
Eluted peptides were subjected to tandem mass spectrometry in positive ion mode.Data acquisition was conducted via Thermo XCalibur v.3.0.63 (Thermo Fisher Scientific, USA).', 'para': '4', 'bboxes': \"[[{'page': '3', 'x': '304.72', 'y': '138.58', 'h': '233.88', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '150.58', 'h': '233.87', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '162.58', 'h': '198.22', 'w': '11.68'}, {'page': '3', 'x': '502.95', 'y': '166.97', 'h': '3.44', 'w': '8.18'}, {'page': '3', 'x': '506.39', 'y': '162.58', 'h': '13.12', 'w': '11.68'}], [{'page': '3', 'x': '523.19', 'y': '162.58', 'h': '15.40', 'w': '11.68'}, {'page': '3', 'x': '304.73', 'y': '174.58', 'h': '233.85', 'w': '11.68'}, {'page': '3', 'x': '304.73', 'y': '186.58', 'h': '31.91', 'w': '11.68'}, {'page': '3', 'x': '336.64', 'y': '190.97', 'h': '3.44', 'w': '8.18'}, {'page': '3', 'x': '340.08', 'y': '186.58', 'h': '198.48', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '198.58', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '210.58', 'h': '121.39', 'w': '11.68'}], [{'page': '3', 'x': '429.06', 'y': '210.58', 'h': '109.53', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '222.57', 'h': '233.89', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '234.57', 'h': '233.85', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '246.56', 'h': '124.67', 'w': '11.68'}], [{'page': '3', 'x': '434.13', 'y': '246.56', 'h': '104.45', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '258.58', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '270.58', 'h': '233.85', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '282.58', 'h': '233.87', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '294.57', 'h': '211.84', 'w': '11.68'}], [{'page': '3', 'x': '519.17', 'y': '294.57', 'h': '19.40', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '306.57', 'h': '233.85', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '318.56', 'h': '132.70', 'w': '11.68'}]]\", 'pages': \"('3', 
'3')\", 'section_title': 'LC-MS/MS', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The resulting proteomic and peptidomic raw data files were uploaded into MaxQuant v.1.5.2.8 (www.coxdocs.org) [14] with the integrated Andromeda search engine.MS and MS/MS spectra were searched against a reverted version of the SwissProt human protein database (version July 2017) for protein identification and a randomized version of the SwissProt human protein database for peptide identification.Search parameters for proteomic analysis included carbamidomethylation of cysteines as a fixed modification and oxidized methionine and N-terminal acetylation as variable modifications.Trypsin was the chosen digestion enzyme and a maximum of two missed cleavages were allowed.Search parameters for peptidomic analysis included oxidized methionine and oxidized proline as variable modifications.An unspecific enzyme search was the chosen digestion method.Both proteomic and peptidomic data were initially searched against a smaller \"human first search\" database with a peptide tolerance of 20 ppm for mass recalibration.The main search was performed using the Swissprot human protein database (version July 2017) with a peptide tolerance of 4.5 ppm.Data was analyzed using label-free quantification (LFQ) with a minimum ratio count of 1 and the \"Match between runs\" interval set to 2 min.The peptide-spectrum match and protein false discovery rate were set to 1%.', metadata={'text': 'The resulting proteomic and peptidomic raw data files were uploaded into MaxQuant v.1.5.2.8 (www.coxdocs.org) [14] with the integrated Andromeda search engine.MS and MS/MS spectra were searched against a reverted version of the SwissProt human protein database (version July 2017) for protein identification and a randomized version of the SwissProt human protein database for peptide identification.Search parameters for proteomic analysis included carbamidomethylation of cysteines as a fixed modification and oxidized methionine and N-terminal acetylation as variable 
modifications.Trypsin was the chosen digestion enzyme and a maximum of two missed cleavages were allowed.Search parameters for peptidomic analysis included oxidized methionine and oxidized proline as variable modifications.An unspecific enzyme search was the chosen digestion method.Both proteomic and peptidomic data were initially searched against a smaller \"human first search\" database with a peptide tolerance of 20 ppm for mass recalibration.The main search was performed using the Swissprot human protein database (version July 2017) with a peptide tolerance of 4.5 ppm.Data was analyzed using label-free quantification (LFQ) with a minimum ratio count of 1 and the \"Match between runs\" interval set to 2 min.The peptide-spectrum match and protein false discovery rate were set to 1%.', 'para': '10', 'bboxes': \"[[{'page': '3', 'x': '304.72', 'y': '355.84', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '367.84', 'h': '233.86', 'w': '11.68'}], [{'page': '3', 'x': '304.72', 'y': '379.83', 'h': '233.85', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '391.83', 'h': '29.49', 'w': '11.68'}], [{'page': '3', 'x': '338.30', 'y': '391.83', 'h': '200.27', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '403.82', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '415.84', 'h': '233.83', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '427.84', 'h': '233.88', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '439.83', 'h': '150.06', 'w': '11.68'}], [{'page': '3', 'x': '459.90', 'y': '439.83', 'h': '78.69', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '451.83', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '463.82', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '475.84', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '487.84', 'h': '22.75', 'w': '11.68'}], [{'page': '3', 'x': '331.63', 'y': '487.84', 'h': '206.95', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '499.84', 'h': '203.21', 
'w': '11.68'}], [{'page': '3', 'x': '510.96', 'y': '499.84', 'h': '27.63', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '511.83', 'h': '233.87', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '523.83', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '535.84', 'h': '22.75', 'w': '11.68'}], [{'page': '3', 'x': '333.47', 'y': '535.84', 'h': '205.13', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '547.84', 'h': '74.99', 'w': '11.68'}], [{'page': '3', 'x': '383.16', 'y': '547.84', 'h': '155.44', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '559.83', 'h': '233.85', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '571.83', 'h': '233.86', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '583.82', 'h': '77.07', 'w': '11.68'}], [{'page': '3', 'x': '384.01', 'y': '583.82', 'h': '154.60', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '595.82', 'h': '233.89', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '607.81', 'h': '152.96', 'w': '11.68'}], [{'page': '3', 'x': '461.10', 'y': '607.81', 'h': '77.52', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '619.81', 'h': '233.87', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '631.80', 'h': '233.89', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '643.80', 'h': '55.04', 'w': '11.68'}], [{'page': '3', 'x': '363.64', 'y': '643.80', 'h': '174.96', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '655.79', 'h': '140.36', 'w': '11.68'}]]\", 'pages': \"('3', '3')\", 'section_title': 'Protein identification and quantification', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Pathway analysis of dysregulated proteins identified by LC-MS/MS was conducted using the functional-analysis tool Ingenuity Pathway Analysis (IPA; http://www.ingenuity.com)[15].To determine the specificity of identified proteins at the tissue and biological fluid level, proteomic datasets were searched against ProteomicsDB (http:// www.Prote omics DB.org), a web-based database of mass spectrometry-generated proteomics data [16].Pathway analysis of SF peptides was conducted through the Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.8 with reference to the Kyoto Encyclopedia of Genes and Genomes (KEGG) [17].Annotations with q-values of less than 0.05 were considered statistically significant.Identification of known AMPs in the SF peptidome was determined by comparison with experimentally validated human AMPs taken from the Collection of Anti-Microbial Peptides (CAMP R3 ) (http:// www.camp.bicnirrh.res.in/)database [18].AMP prediction of the identified peptides was performed using the support vector machine (SVM) learning algorithm developed for the CAMP R3 database.Peptides with an SVM score of 0.8 or higher were predicted to be antimicrobial.', metadata={'text': 'Pathway analysis of dysregulated proteins identified by LC-MS/MS was conducted using the functional-analysis tool Ingenuity Pathway Analysis (IPA; http://www.ingenuity.com)[15].To determine the specificity of identified proteins at the tissue and biological fluid level, proteomic datasets were searched against ProteomicsDB (http:// www.Prote omics DB.org), a web-based database of mass spectrometry-generated proteomics data [16].Pathway analysis of SF peptides was conducted through the Database for Annotation, Visualization and Integrated Discovery (DAVID) 6.8 with reference to the Kyoto Encyclopedia of Genes and Genomes (KEGG) [17].Annotations with q-values of less than 0.05 were considered statistically significant.Identification of known AMPs in 
the SF peptidome was determined by comparison with experimentally validated human AMPs taken from the Collection of Anti-Microbial Peptides (CAMP R3 ) (http:// www.camp.bicnirrh.res.in/)database [18].AMP prediction of the identified peptides was performed using the support vector machine (SVM) learning algorithm developed for the CAMP R3 database.Peptides with an SVM score of 0.8 or higher were predicted to be antimicrobial.', 'para': '10', 'bboxes': \"[[{'page': '3', 'x': '304.72', 'y': '693.10', 'h': '233.87', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '705.10', 'h': '233.84', 'w': '11.68'}, {'page': '3', 'x': '304.72', 'y': '717.09', 'h': '233.87', 'w': '11.68'}], [{'page': '4', 'x': '56.69', 'y': '88.58', 'h': '38.36', 'w': '11.68'}], [{'page': '4', 'x': '98.35', 'y': '88.58', 'h': '18.46', 'w': '11.68'}], [{'page': '4', 'x': '120.12', 'y': '88.58', 'h': '170.41', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '100.58', 'h': '233.88', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '112.57', 'h': '233.85', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '124.57', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '136.56', 'h': '204.08', 'w': '11.68'}], [{'page': '4', 'x': '268.46', 'y': '136.56', 'h': '22.09', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '148.58', 'h': '233.87', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '160.58', 'h': '233.85', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '172.57', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '184.58', 'h': '195.92', 'w': '11.68'}], [{'page': '4', 'x': '256.18', 'y': '184.58', 'h': '34.37', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '196.58', 'h': '233.83', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '208.58', 'h': '93.53', 'w': '11.68'}], [{'page': '4', 'x': '153.33', 'y': '208.58', 'h': '137.23', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '220.57', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '232.57', 'h': '233.87', 'w': 
'11.68'}, {'page': '4', 'x': '56.69', 'y': '244.56', 'h': '190.57', 'w': '11.68'}, {'page': '4', 'x': '247.26', 'y': '248.97', 'h': '7.70', 'w': '8.18'}, {'page': '4', 'x': '254.96', 'y': '244.58', 'h': '35.59', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '256.58', 'h': '67.94', 'w': '11.68'}], [{'page': '4', 'x': '124.64', 'y': '256.58', 'h': '44.50', 'w': '11.68'}], [{'page': '4', 'x': '173.18', 'y': '256.58', 'h': '58.02', 'w': '11.68'}], [{'page': '4', 'x': '235.25', 'y': '256.58', 'h': '55.30', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '268.58', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '280.58', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '292.58', 'h': '85.56', 'w': '11.68'}, {'page': '4', 'x': '142.25', 'y': '296.97', 'h': '7.70', 'w': '8.18'}, {'page': '4', 'x': '153.76', 'y': '292.58', 'h': '37.74', 'w': '11.68'}], [{'page': '4', 'x': '195.31', 'y': '292.58', 'h': '95.23', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '304.58', 'h': '231.28', 'w': '11.68'}]]\", 'pages': \"('3', '4')\", 'section_title': 'Bioinformatic analyses', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='A linear model was fitted to examine the effects of age and sex on the protein and peptide expression data using the LIMMA package in R [19].Due to the nature of data generated by LC-MS/MS, protein quantification is often skewed and imposes limits on statistical inference.To circumvent assumptions of normality, the Mann-Whitney U test coupled to the Benjamini-Hochberg correction to control for multiple hypothesis testing was performed for comparisons between two independent groups.Adjusted p values of less than 0.05 were considered statistically significant.Differential abundance of proteins and peptides were computed with the myTAI package in R, generating a ratio of log-transformed extracted ion currents in one group against the second group, considered to be the reference group [20].A volcano plot was used to visualize the results of the Mann-Whitney U test.', metadata={'text': 'A linear model was fitted to examine the effects of age and sex on the protein and peptide expression data using the LIMMA package in R [19].Due to the nature of data generated by LC-MS/MS, protein quantification is often skewed and imposes limits on statistical inference.To circumvent assumptions of normality, the Mann-Whitney U test coupled to the Benjamini-Hochberg correction to control for multiple hypothesis testing was performed for comparisons between two independent groups.Adjusted p values of less than 0.05 were considered statistically significant.Differential abundance of proteins and peptides were computed with the myTAI package in R, generating a ratio of log-transformed extracted ion currents in one group against the second group, considered to be the reference group [20].A volcano plot was used to visualize the results of the Mann-Whitney U test.', 'para': '5', 'bboxes': \"[[{'page': '4', 'x': '56.69', 'y': '370.15', 'h': '233.83', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '382.15', 'h': '233.84', 'w': '11.68'}, {'page': '4', 'x': 
'56.69', 'y': '394.14', 'h': '126.69', 'w': '11.68'}], [{'page': '4', 'x': '186.16', 'y': '394.14', 'h': '104.38', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '406.14', 'h': '233.88', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '418.13', 'h': '204.55', 'w': '11.68'}], [{'page': '4', 'x': '263.45', 'y': '418.13', 'h': '27.10', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '430.16', 'h': '233.85', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '442.15', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '454.15', 'h': '233.85', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '466.14', 'h': '194.44', 'w': '11.68'}], [{'page': '4', 'x': '253.71', 'y': '466.14', 'h': '36.85', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '478.14', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '490.16', 'h': '33.51', 'w': '11.68'}], [{'page': '4', 'x': '93.20', 'y': '490.16', 'h': '197.36', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '502.15', 'h': '233.84', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '514.15', 'h': '233.83', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '526.14', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '538.16', 'h': '76.53', 'w': '11.68'}], [{'page': '4', 'x': '136.70', 'y': '538.16', 'h': '153.83', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '550.15', 'h': '165.53', 'w': '11.68'}]]\", 'pages': \"('4', '4')\", 'section_title': 'Statistical analyses and data visualizations were completed with R (R Foundation for Statistical Computing).', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Demographics, disease characteristics and concomitant therapies of recruited patients are summarized in Table 1.', metadata={'text': 'Demographics, disease characteristics and concomitant therapies of recruited patients are summarized in Table 1.', 'para': '0', 'bboxes': \"[[{'page': '4', 'x': '56.69', 'y': '603.73', 'h': '233.85', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '615.72', 'h': '233.86', 'w': '11.68'}]]\", 'pages': \"('4', '4')\", 'section_title': 'Clinical characteristics of recruited patients', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Collectively, 389 unique proteins were identified across all IA SF proteomic samples.When assessing each cohort individually, 377 unique proteins were identified in RA patient samples, 369 unique proteins were identified in PsA patient samples and 399 proteins were identified in control patient samples.A review of the overlap between proteomes of each cohort revealed 347 proteins to be common to all three patient groups.', metadata={'text': 'Collectively, 389 unique proteins were identified across all IA SF proteomic samples.When assessing each cohort individually, 377 unique proteins were identified in RA patient samples, 369 unique proteins were identified in PsA patient samples and 399 proteins were identified in control patient samples.A review of the overlap between proteomes of each cohort revealed 347 proteins to be common to all three patient groups.', 'para': '2', 'bboxes': \"[[{'page': '4', 'x': '56.69', 'y': '657.30', 'h': '233.85', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '669.30', 'h': '116.50', 'w': '11.68'}], [{'page': '4', 'x': '175.43', 'y': '669.30', 'h': '115.12', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '681.29', 'h': '233.83', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '693.29', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '705.28', 'h': '233.84', 'w': '11.68'}, {'page': '4', 'x': '56.69', 'y': '717.28', 'h': '98.40', 'w': '11.68'}], [{'page': '4', 'x': '157.69', 'y': '717.28', 'h': '132.83', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '343.38', 'h': '233.85', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '355.37', 'h': '146.57', 'w': '11.68'}]]\", 'pages': \"('4', '4')\", 'section_title': 'Holistic protein and peptide mining', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='A total of 226 unique peptide sequences were identified across all IA SF samples originating from a total of 48 unique proteins.Inter-cohort comparisons identified 184 unique peptides in RA patient samples, 175 unique peptides in PsA patient samples and 192 unique peptides in control patient samples.Comparisons between the SF peptidomes of arthritic and control conditions revealed 95 peptides to be common to all three groups.', metadata={'text': 'A total of 226 unique peptide sequences were identified across all IA SF samples originating from a total of 48 unique proteins.Inter-cohort comparisons identified 184 unique peptides in RA patient samples, 175 unique peptides in PsA patient samples and 192 unique peptides in control patient samples.Comparisons between the SF peptidomes of arthritic and control conditions revealed 95 peptides to be common to all three groups.', 'para': '2', 'bboxes': \"[[{'page': '4', 'x': '312.72', 'y': '367.37', 'h': '225.87', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '379.40', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '391.39', 'h': '81.49', 'w': '11.68'}], [{'page': '4', 'x': '389.78', 'y': '391.39', 'h': '148.80', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '403.39', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '415.39', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '427.38', 'h': '110.00', 'w': '11.68'}], [{'page': '4', 'x': '417.62', 'y': '427.38', 'h': '120.98', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '439.38', 'h': '233.84', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '451.37', 'h': '187.39', 'w': '11.68'}]]\", 'pages': \"('4', '4')\", 'section_title': 'Holistic protein and peptide mining', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': 
'/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Next, we investigated the overlap between the proteins identified through our peptidomic approach and those identified through our proteomic approach by comparing the IA-associated proteins originating from both experiments.Of the 48 precursor proteins from our peptidomic study, 25 proteins were also found in the IA SF proteome (Fig. 1).Taken together, they have yielded the combined identification of 412 proteins in IA SF.A complete list of identified proteins and peptides are reported in Additional file 1: Tables S1, S2 and S3.', metadata={'text': 'Next, we investigated the overlap between the proteins identified through our peptidomic approach and those identified through our proteomic approach by comparing the IA-associated proteins originating from both experiments.Of the 48 precursor proteins from our peptidomic study, 25 proteins were also found in the IA SF proteome (Fig. 1).Taken together, they have yielded the combined identification of 412 proteins in IA SF.A complete list of identified proteins and peptides are reported in Additional file 1: Tables S1, S2 and S3.', 'para': '3', 'bboxes': \"[[{'page': '4', 'x': '312.72', 'y': '463.37', 'h': '225.89', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '475.36', 'h': '233.87', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '487.36', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '499.35', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '511.40', 'h': '27.69', 'w': '11.68'}], [{'page': '4', 'x': '334.67', 'y': '511.40', 'h': '203.93', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '523.39', 'h': '233.85', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '535.39', 'h': '31.50', 'w': '11.68'}], [{'page': '4', 'x': '339.36', 'y': '535.39', 'h': '199.24', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '547.39', 'h': '157.69', 'w': '11.68'}], [{'page': '4', 'x': '465.22', 'y': '547.39', 'h': '73.37', 'w': '11.68'}, {'page': '4', 'x': '304.72', 
'y': '559.38', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '571.40', 'h': '134.72', 'w': '11.68'}]]\", 'pages': \"('4', '4')\", 'section_title': 'Holistic protein and peptide mining', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Differential abundance analyses were conducted to detect dysregulated proteins in the SF of: (1) IA compared to control and (2) RA compared to PsA.Using non-parametric statistical tests, 144 proteins were determined to have statistically significant differential abundance in IA SF with 54 proteins showing significant upregulation and 90 proteins showing significant downregulation (Fig. 2).When comparing RA and PsA proteomes, no proteins showed significant differences in abundance after correcting for multiple hypothesis testing.However, with respect to an unadjusted p value, 22 proteins were differentially abundant between the two groups with 13 proteins demonstrating significant upregulation in RA relative to PsA and 9 proteins showing significant upregulation in PsA relative to RA. Significantly dysregulated proteins in IA compared to control and significantly dysregulated proteins in RA compared to PsA are summarized in Additional file 1: Tables S4 andS5, respectively.Dysregulated functional pathways likely to be associated with the significantly upregulated and downregulated proteins of IA SF were detected with IPA.Core analyses determined the top 5 canonical pathways associated with upregulated proteins to be: (1) LXR/RXR activation, (2) FXR/RXR activation, (3) acute phase response signaling, (4) atherosclerosis signaling and (5) IL-12 signaling and production in macrophages, several of which Fig. 1 Venn diagram of proteins identified in the IA SF proteome and peptidome.The total number of proteins identified was 412, with 364 proteins detected in the proteome, 23 proteins detected in the peptidome and 25 proteins detected in both Fig. 
2 Volcano plot of significantly differentially abundant proteins identified in the IA SF proteome relative to control SF.A total of 144 proteins, highlighted in blue and found above the y-intercept oflog 10 (0.05), were determined to have statistically significant differential abundance in IA SF have been previously associated with IA.Details regarding the top diseases and disorders as well as molecular and cellular functions associated with both groups of dysregulated proteins can be found in Additional file 1: Table S6.As the data suggests, upregulated proteins show more distinct relations to inflammatory and immunological processes while downregulated proteins demonstrate stronger relations to metabolic processes.Ultimately, to identify the strongest candidate biomarkers to be validated in IA patient serum, we focused on upregulated proteins in the SF.', metadata={'text': 'Differential abundance analyses were conducted to detect dysregulated proteins in the SF of: (1) IA compared to control and (2) RA compared to PsA.Using non-parametric statistical tests, 144 proteins were determined to have statistically significant differential abundance in IA SF with 54 proteins showing significant upregulation and 90 proteins showing significant downregulation (Fig. 2).When comparing RA and PsA proteomes, no proteins showed significant differences in abundance after correcting for multiple hypothesis testing.However, with respect to an unadjusted p value, 22 proteins were differentially abundant between the two groups with 13 proteins demonstrating significant upregulation in RA relative to PsA and 9 proteins showing significant upregulation in PsA relative to RA. 
Significantly dysregulated proteins in IA compared to control and significantly dysregulated proteins in RA compared to PsA are summarized in Additional file 1: Tables S4 andS5, respectively.Dysregulated functional pathways likely to be associated with the significantly upregulated and downregulated proteins of IA SF were detected with IPA.Core analyses determined the top 5 canonical pathways associated with upregulated proteins to be: (1) LXR/RXR activation, (2) FXR/RXR activation, (3) acute phase response signaling, (4) atherosclerosis signaling and (5) IL-12 signaling and production in macrophages, several of which Fig. 1 Venn diagram of proteins identified in the IA SF proteome and peptidome.The total number of proteins identified was 412, with 364 proteins detected in the proteome, 23 proteins detected in the peptidome and 25 proteins detected in both Fig. 2 Volcano plot of significantly differentially abundant proteins identified in the IA SF proteome relative to control SF.A total of 144 proteins, highlighted in blue and found above the y-intercept oflog 10 (0.05), were determined to have statistically significant differential abundance in IA SF have been previously associated with IA.Details regarding the top diseases and disorders as well as molecular and cellular functions associated with both groups of dysregulated proteins can be found in Additional file 1: Table S6.As the data suggests, upregulated proteins show more distinct relations to inflammatory and immunological processes while downregulated proteins demonstrate stronger relations to metabolic processes.Ultimately, to identify the strongest candidate biomarkers to be validated in IA patient serum, we focused on upregulated proteins in the SF.', 'para': '10', 'bboxes': \"[[{'page': '4', 'x': '304.72', 'y': '609.30', 'h': '233.87', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '621.30', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '633.29', 'h': '164.53', 'w': '11.68'}], 
[{'page': '4', 'x': '473.75', 'y': '633.29', 'h': '64.83', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '645.30', 'h': '233.87', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '657.30', 'h': '233.87', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '669.29', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '681.30', 'h': '233.86', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '693.30', 'h': '52.05', 'w': '11.68'}], [{'page': '4', 'x': '360.37', 'y': '693.30', 'h': '178.22', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '705.30', 'h': '233.85', 'w': '11.68'}, {'page': '4', 'x': '304.72', 'y': '717.29', 'h': '192.97', 'w': '11.68'}], [{'page': '4', 'x': '500.09', 'y': '717.29', 'h': '38.49', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '501.24', 'h': '233.88', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '513.23', 'h': '233.89', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '525.23', 'h': '233.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '537.23', 'h': '233.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '549.24', 'h': '233.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '561.23', 'h': '233.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '573.24', 'h': '233.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '585.24', 'h': '226.18', 'w': '11.68'}], [{'page': '5', 'x': '312.72', 'y': '597.23', 'h': '225.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '609.24', 'h': '233.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '621.24', 'h': '209.30', 'w': '11.68'}], [{'page': '5', 'x': '518.55', 'y': '621.24', 'h': '20.02', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '633.23', 'h': '233.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '645.24', 'h': '233.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '657.24', 'h': '233.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '669.23', 'h': '233.86', 'w': '11.68'}, {'page': '5', 'x': '304.72', 'y': '681.24', 'h': '233.84', 'w': 
'11.68'}, {'page': '5', 'x': '62.94', 'y': '451.38', 'h': '259.50', 'w': '7.39'}], [{'page': '5', 'x': '323.86', 'y': '451.38', 'h': '182.25', 'w': '7.39'}, {'page': '5', 'x': '62.94', 'y': '461.38', 'h': '344.77', 'w': '7.39'}, {'page': '5', 'x': '63.07', 'y': '637.74', 'h': '212.63', 'w': '7.39'}, {'page': '5', 'x': '63.07', 'y': '647.74', 'h': '169.06', 'w': '7.39'}], [{'page': '5', 'x': '233.88', 'y': '647.74', 'h': '29.01', 'w': '7.39'}, {'page': '5', 'x': '63.07', 'y': '657.74', 'h': '208.95', 'w': '7.39'}, {'page': '5', 'x': '63.07', 'y': '667.74', 'h': '6.35', 'w': '7.39'}, {'page': '5', 'x': '71.17', 'y': '667.53', 'h': '6.24', 'w': '13.23'}, {'page': '5', 'x': '78.41', 'y': '667.74', 'h': '187.66', 'w': '8.79'}, {'page': '5', 'x': '63.07', 'y': '677.74', 'h': '97.08', 'w': '7.39'}, {'page': '6', 'x': '56.69', 'y': '88.58', 'h': '169.21', 'w': '11.68'}], [{'page': '6', 'x': '228.93', 'y': '88.58', 'h': '61.63', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '100.58', 'h': '233.83', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '112.58', 'h': '233.85', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '124.57', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '136.57', 'h': '36.59', 'w': '11.68'}], [{'page': '6', 'x': '95.49', 'y': '136.57', 'h': '195.04', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '148.56', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '160.58', 'h': '233.85', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '172.58', 'h': '173.96', 'w': '11.68'}], [{'page': '6', 'x': '234.14', 'y': '172.58', 'h': '56.43', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '184.57', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '196.58', 'h': '233.88', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '208.58', 'h': '73.15', 'w': '11.68'}]]\", 'pages': \"('4', '6')\", 'section_title': 'Dysregulated proteins in IA SF', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome 
and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Tissue and fluid specificity of upregulated proteins were used to narrow down the list of candidates deemed likely to be associated with IA, RA and PsA.We concentrated on proteins which displayed strong abundance in SF, bone, bone marrow or immune regulatory cells according to ProteomicsDB.Immunoglobulins were excluded from further analysis.The resulting list of upregulated proteins compared to the reference group consisted of 38 IAspecific, 8 RA-specific and 9 PsA-specific unique protein candidates.High abundance proteins in serum, as identified according to the literature [21,22], were excluded due to the likelihood that they were serum contaminants at the time of joint aspiration.Moreover, this ensured protein candidates were unlikely to be overexpressed in the serum of non-diseased patients.Following this filtering step, the final list of candidate biomarkers consisted of 5, 4 and 2 upregulated proteins which we deemed likely to be associated with IA, RA and PsA, respectively (Table 2).', metadata={'text': 'Tissue and fluid specificity of upregulated proteins were used to narrow down the list of candidates deemed likely to be associated with IA, RA and PsA.We concentrated on proteins which displayed strong abundance in SF, bone, bone marrow or immune regulatory cells according to ProteomicsDB.Immunoglobulins were excluded from further analysis.The resulting list of upregulated proteins compared to the reference group consisted of 38 IAspecific, 8 RA-specific and 9 PsA-specific unique protein candidates.High abundance proteins in serum, as identified according to the literature [21,22], were excluded due to the likelihood that they were serum contaminants at the time of joint aspiration.Moreover, this ensured protein candidates were unlikely to be overexpressed in the serum of non-diseased patients.Following this filtering step, the final list of candidate biomarkers consisted of 5, 4 and 2 upregulated proteins which we deemed 
likely to be associated with IA, RA and PsA, respectively (Table 2).', 'para': '6', 'bboxes': \"[[{'page': '6', 'x': '64.69', 'y': '220.57', 'h': '225.86', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '232.57', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '244.56', 'h': '160.38', 'w': '11.68'}], [{'page': '6', 'x': '220.15', 'y': '244.56', 'h': '70.39', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '256.56', 'h': '233.89', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '268.55', 'h': '233.87', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '280.55', 'h': '73.12', 'w': '11.68'}], [{'page': '6', 'x': '132.73', 'y': '280.55', 'h': '157.82', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '292.54', 'h': '65.60', 'w': '11.68'}], [{'page': '6', 'x': '124.62', 'y': '292.54', 'h': '165.94', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '304.54', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '316.53', 'h': '233.85', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '328.53', 'h': '46.03', 'w': '11.68'}], [{'page': '6', 'x': '106.09', 'y': '328.53', 'h': '184.46', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '340.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '352.58', 'h': '233.83', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '364.57', 'h': '131.17', 'w': '11.68'}], [{'page': '6', 'x': '192.23', 'y': '364.57', 'h': '98.33', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '376.57', 'h': '233.85', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '388.56', 'h': '147.52', 'w': '11.68'}], [{'page': '6', 'x': '207.03', 'y': '388.56', 'h': '83.52', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '400.58', 'h': '233.84', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '412.58', 'h': '233.89', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '424.57', 'h': '233.89', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '436.57', 'h': '38.23', 'w': '11.68'}]]\", 'pages': \"('6', '6')\", 'section_title': 'Dysregulated 
proteins in IA SF', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Differential abundance analyses were conducted to detect strongly dysregulated peptides in the SF of: (1) IA compared to control and (2) RA compared to PsA.For both comparisons, no peptides showed statistically significant differences in abundance after correcting for multiple hypothesis testing, with the exception of the peptide sequence DSGEGDFLAEGGGV when comparing IA to the control.Alternatively, with respect to the unadjusted p value, 11 peptides were determined to be significantly differentially abundant in IA SF with 10 peptides showing significant upregulation and 1 peptide showing significant downregulation (Table 3).A complete list of dysregulated peptides in IA compared to control and dysregulated peptides in RA compared to PsA are summarized in Additional file 1: Tables S7 andS8, respectively.', metadata={'text': 'Differential abundance analyses were conducted to detect strongly dysregulated peptides in the SF of: (1) IA compared to control and (2) RA compared to PsA.For both comparisons, no peptides showed statistically significant differences in abundance after correcting for multiple hypothesis testing, with the exception of the peptide sequence DSGEGDFLAEGGGV when comparing IA to the control.Alternatively, with respect to the unadjusted p value, 11 peptides were determined to be significantly differentially abundant in IA SF with 10 peptides showing significant upregulation and 1 peptide showing significant downregulation (Table 3).A complete list of dysregulated peptides in IA compared to control and dysregulated peptides in RA compared to PsA are summarized in Additional file 1: Tables S7 andS8, respectively.', 'para': '3', 'bboxes': \"[[{'page': '6', 'x': '56.69', 'y': '472.58', 'h': '233.87', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '484.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '56.69', 'y': '496.58', 'h': '194.72', 'w': '11.68'}], [{'page': '6', 'x': '254.59', 'y': '496.58', 'h': '35.98', 'w': 
'11.68'}, {'page': '6', 'x': '56.69', 'y': '508.58', 'h': '233.85', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '88.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '100.57', 'h': '233.88', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '112.57', 'h': '233.87', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '124.56', 'h': '47.67', 'w': '11.68'}], [{'page': '6', 'x': '355.25', 'y': '124.56', 'h': '183.36', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '136.56', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '148.56', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '160.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '172.58', 'h': '147.67', 'w': '11.68'}], [{'page': '6', 'x': '457.70', 'y': '172.58', 'h': '80.90', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '184.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '196.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '208.58', 'h': '226.18', 'w': '11.68'}]]\", 'pages': \"('6', '6')\", 'section_title': 'Dysregulated peptides in IA SF', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='When comparing RA and PsA peptidomes, 5 peptides showed differential abundance between the two groups with all 5 peptides demonstrating significant upregulation in PsA SF relative to RA SF (Table 4).', metadata={'text': 'When comparing RA and PsA peptidomes, 5 peptides showed differential abundance between the two groups with all 5 peptides demonstrating significant upregulation in PsA SF relative to RA SF (Table 4).', 'para': '0', 'bboxes': \"[[{'page': '6', 'x': '312.72', 'y': '220.58', 'h': '225.84', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '232.57', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '244.57', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '256.58', 'h': '169.42', 'w': '11.68'}]]\", 'pages': \"('6', '6')\", 'section_title': 'Dysregulated peptides in IA SF', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='KEGG analysis revealed significantly enriched pathways (fold enrichment in brackets) related to the IA SF peptidome. Figure 3 illustrates the top KEGG pathways among which complement and coagulation cascades [23], Staphylococcus aureus infection [18], protein digestion and absorption [17] and extracellular matrix (ECM)-receptor interaction [14] were significantly enriched.', metadata={'text': 'KEGG analysis revealed significantly enriched pathways (fold enrichment in brackets) related to the IA SF peptidome. Figure 3 illustrates the top KEGG pathways among which complement and coagulation cascades [23], Staphylococcus aureus infection [18], protein digestion and absorption [17] and extracellular matrix (ECM)-receptor interaction [14] were significantly enriched.', 'para': '1', 'bboxes': \"[[{'page': '6', 'x': '304.72', 'y': '292.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '304.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '316.58', 'h': '25.34', 'w': '11.68'}], [{'page': '6', 'x': '332.34', 'y': '316.58', 'h': '206.22', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '328.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '340.58', 'h': '233.89', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '352.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '364.57', 'h': '177.38', 'w': '11.68'}]]\", 'pages': \"('6', '6')\", 'section_title': 'Pathway enrichment analysis of the SF peptidome', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Accumulating evidence suggests a crucial role of intestinal resident flora in chronic activation of innate and adaptive immune responses leading to inflammatory disorders. Microorganisms residing in the intestine play an important role in maintaining systemic homeostasis through the delicate balance of the immune system response. Perturbations in the composition of the intestinal microbiota have been shown to elicit inappropriate immune cell activation leading to an inflammatory cascade and eventually, clinical disease [24]. Specifically, perturbations of the gut epithelial cell layer and/or increased exposure to microbial metabolites may be primary triggers of an inflammatory cascade resulting in joint pathology [25]. Protective mechanisms, such as the expression of AMPs, have naturally developed to oppose microbial dysbiosis. AMPs are fundamental effectors of the innate immune response with a broad spectrum of microbicidal activity. Under inflammatory conditions, the synovial membrane has demonstrated an altered pattern of expression of AMPs relative to healthy controls and suggests a valuable role of these proteins in the differential diagnosis of inflammatory joint disease [23].', metadata={'text': 'Accumulating evidence suggests a crucial role of intestinal resident flora in chronic activation of innate and adaptive immune responses leading to inflammatory disorders. Microorganisms residing in the intestine play an important role in maintaining systemic homeostasis through the delicate balance of the immune system response. Perturbations in the composition of the intestinal microbiota have been shown to elicit inappropriate immune cell activation leading to an inflammatory cascade and eventually, clinical disease [24]. Specifically, perturbations of the gut epithelial cell layer and/or increased exposure to microbial metabolites may be primary triggers of an inflammatory cascade resulting in joint pathology [25]. Protective mechanisms, 
such as the expression of AMPs, have naturally developed to oppose microbial dysbiosis. AMPs are fundamental effectors of the innate immune response with a broad spectrum of microbicidal activity. Under inflammatory conditions, the synovial membrane has demonstrated an altered pattern of expression of AMPs relative to healthy controls and suggests a valuable role of these proteins in the differential diagnosis of inflammatory joint disease [23].', 'para': '6', 'bboxes': \"[[{'page': '6', 'x': '304.72', 'y': '400.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '412.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '424.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '436.57', 'h': '40.41', 'w': '11.68'}], [{'page': '6', 'x': '348.54', 'y': '436.57', 'h': '190.06', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '448.57', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '460.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '472.58', 'h': '38.56', 'w': '11.68'}], [{'page': '6', 'x': '346.55', 'y': '472.58', 'h': '192.03', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '484.58', 'h': '233.86', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '496.58', 'h': '233.85', 'w': '11.68'}, {'page': '6', 'x': '304.72', 'y': '508.58', 'h': '182.90', 'w': '11.68'}], [{'page': '6', 'x': '490.78', 'y': '508.58', 'h': '47.78', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '464.20', 'h': '233.86', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '476.19', 'h': '233.86', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '488.20', 'h': '233.85', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '500.19', 'h': '83.82', 'w': '11.68'}], [{'page': '7', 'x': '143.43', 'y': '500.19', 'h': '147.10', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '512.19', 'h': '233.85', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '524.18', 'h': '82.03', 'w': '11.68'}], [{'page': '7', 'x': '142.42', 'y': '524.18', 
'h': '148.15', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '536.18', 'h': '233.84', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '548.17', 'h': '88.90', 'w': '11.68'}], [{'page': '7', 'x': '151.46', 'y': '548.17', 'h': '139.11', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '560.17', 'h': '233.86', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '572.20', 'h': '233.87', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '584.19', 'h': '233.86', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '596.20', 'h': '216.26', 'w': '11.68'}]]\", 'pages': \"('6', '7')\", 'section_title': 'Antimicrobial peptides in IA SF', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Putative AMPs in the SF peptidome of IA were predicted with the assistance of an SVM learning algorithm (Additional file 1: Table S9). Overall, 26 peptide sequences originating from 8 proteins (complement C4-A, fibrinogen beta chain, fibrinogen alpha chain, annexin A1, collagen type III alpha 1 chain, collagen type I alpha 1 chain, gliomedin and EMI domain-containing protein 1) were predicted to have antimicrobial activity with an SVM score of 0.8 or higher (Table 5).', metadata={'text': 'Putative AMPs in the SF peptidome of IA were predicted with the assistance of an SVM learning algorithm (Additional file 1: Table S9). Overall, 26 peptide sequences originating from 8 proteins (complement C4-A, fibrinogen beta chain, fibrinogen alpha chain, annexin A1, collagen type III alpha 1 chain, collagen type I alpha 1 chain, gliomedin and EMI domain-containing protein 1) were predicted to have antimicrobial activity with an SVM score of 0.8 or higher (Table 5).', 'para': '1', 'bboxes': \"[[{'page': '7', 'x': '64.69', 'y': '608.19', 'h': '225.86', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '620.20', 'h': '233.86', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '632.20', 'h': '148.12', 'w': '11.68'}], [{'page': '7', 'x': '209.01', 'y': '632.20', 'h': '81.54', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '644.19', 'h': '233.88', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '656.19', 'h': '233.88', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '668.18', 'h': '233.90', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '680.18', 'h': '233.86', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '692.17', 'h': '233.88', 'w': '11.68'}, {'page': '7', 'x': '56.69', 'y': '704.17', 'h': '182.79', 'w': '11.68'}]]\", 'pages': \"('7', '7')\", 'section_title': 'Antimicrobial peptides in IA SF', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass 
spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='In the current study, a comparative MS-based approach coupled to statistical and bioinformatics analyses was performed on IA SF relative to control SF, and RA SF relative to PsA SF, to detect notable differences in both the proteomic and peptidomic data. Studies using an MS-based approach to evaluate the proteome of similar inflammatory diseases, including psoriasis [26], systemic lupus erythematosus [27], and ankylosing spondylitis [28], corroborate the robustness of such analytical methodologies. The investigation of a proximal joint fluid, such as SF, was preferred since its protein and peptide expression patterns are expected to be reflective of the pathophysiological state of the joint. As such, elucidating the SF proteome and peptidome during the progression of IA can provide novel insights into molecular drivers of the disease.', metadata={'text': 'In the current study, a comparative MS-based approach coupled to statistical and bioinformatics analyses was performed on IA SF relative to control SF, and RA SF relative to PsA SF, to detect notable differences in both the proteomic and peptidomic data. Studies using an MS-based approach to evaluate the proteome of similar inflammatory diseases, including psoriasis [26], systemic lupus erythematosus [27], and ankylosing spondylitis [28], corroborate the robustness of such analytical methodologies. The investigation of a proximal joint fluid, such as SF, was preferred since its protein and peptide expression patterns are expected to be reflective of the pathophysiological state of the joint. As such, elucidating the SF proteome and peptidome during the progression of IA can provide novel insights into molecular drivers of the disease.', 'para': '3', 'bboxes': \"[[{'page': '7', 'x': '304.72', 'y': '476.68', 'h': '233.86', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '488.68', 'h': '233.84', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '500.67', 'h': '233.87', 'w': '11.68'}, 
{'page': '7', 'x': '304.72', 'y': '512.67', 'h': '233.84', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '524.66', 'h': '157.29', 'w': '11.68'}], [{'page': '7', 'x': '467.01', 'y': '524.66', 'h': '71.58', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '536.66', 'h': '233.84', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '548.65', 'h': '233.87', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '560.65', 'h': '233.88', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '572.64', 'h': '233.86', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '584.68', 'h': '41.68', 'w': '11.68'}], [{'page': '7', 'x': '351.47', 'y': '584.68', 'h': '187.12', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '596.68', 'h': '233.83', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '608.67', 'h': '233.88', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '620.67', 'h': '148.87', 'w': '11.68'}], [{'page': '7', 'x': '456.20', 'y': '620.67', 'h': '82.36', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '632.66', 'h': '233.85', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '644.66', 'h': '233.89', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '656.65', 'h': '46.82', 'w': '11.68'}]]\", 'pages': \"('7', '7')\", 'section_title': 'Discussion', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The molecular pathways involved in the pathogenesis of IA are also overrepresented in the current study based on functional network analysis of IA SF proteins and peptides. Prominent mechanisms related to the identification of upregulated proteins include: (1) acute phase response signaling, (2) antimicrobial response, (3) inflammatory response, (4) IL-12 signaling and production in macrophages and (5) cell-to-cell signaling and interaction. Similarly, interaction networks were established through pathway enrichment analysis of IA SF peptides. Of interest was the enrichment of Staphylococcus aureus infection. As previously highlighted, correlative studies are beginning to recognize a fundamental interplay between the microbiome and immune system response in the etiology of IA [29,30]. Although the role of S. aureus in the progression of IA has yet to be clarified, the enrichment of this pathway, as reflected by the peptides identified in our study, reinforces this developing hypothesis. Overall, our analyses resulted in the identification of 144 differentially expressed proteins in the IA SF proteome. Comparison of RA SF to PsA SF identified 22 differentially expressed proteins. Since we are interested in identifying putative markers which can be further validated in patient serum, we decided to focus solely on upregulated proteins in each arthritic condition. High-potential candidate biomarkers were selected on the basis of several molecular features including: differential abundance, fluid and tissue specificity, immunoglobulin status and abundance in the plasma proteome. Our list of dysregulated proteins in IA was reduced to a total of 5 promising protein candidates representative of intrinsic joint structures including the articular cartilage, synovial membrane and synoviocytes. The re-discovery of several upregulated proteins which have been previously implicated in IA, such as CD5 molecule-like (CD5L), matrix metalloproteinase (MMP)-3, 
defensin alpha 3 (DEFA3), S100 calcium-binding protein (S100) A8, and A9, provided an internal validation of our analytical proteomic approach [31,32]. The application of similar, stringent filtering criteria on protein candidates of RA and PsA resulted in 4 RA-specific and 2 PsA-specific promising protein candidates.', metadata={'text': 'The molecular pathways involved in the pathogenesis of IA are also overrepresented in the current study based on functional network analysis of IA SF proteins and peptides. Prominent mechanisms related to the identification of upregulated proteins include: (1) acute phase response signaling, (2) antimicrobial response, (3) inflammatory response, (4) IL-12 signaling and production in macrophages and (5) cell-to-cell signaling and interaction. Similarly, interaction networks were established through pathway enrichment analysis of IA SF peptides. Of interest was the enrichment of Staphylococcus aureus infection. As previously highlighted, correlative studies are beginning to recognize a fundamental interplay between the microbiome and immune system response in the etiology of IA [29,30]. Although the role of S. 
aureus in the progression of IA has yet to be clarified, the enrichment of this pathway, as reflected by the peptides identified in our study, reinforces this developing hypothesis. Overall, our analyses resulted in the identification of 144 differentially expressed proteins in the IA SF proteome. Comparison of RA SF to PsA SF identified 22 differentially expressed proteins. Since we are interested in identifying putative markers which can be further validated in patient serum, we decided to focus solely on upregulated proteins in each arthritic condition. High-potential candidate biomarkers were selected on the basis of several molecular features including: differential abundance, fluid and tissue specificity, immunoglobulin status and abundance in the plasma proteome. Our list of dysregulated proteins in IA was reduced to a total of 5 promising protein candidates representative of intrinsic joint structures including the articular cartilage, synovial membrane and synoviocytes. The re-discovery of several upregulated proteins which have been previously implicated in IA, such as CD5 molecule-like (CD5L), matrix metalloproteinase (MMP)-3, defensin alpha 3 (DEFA3), S100 calcium-binding protein (S100) A8, and A9, provided an internal validation of our analytical proteomic approach [31,32]. The application of similar, stringent filtering criteria on protein candidates of RA and PsA resulted in 4 RA-specific and 2 PsA-specific promising protein candidates.', 'para': '12', 'bboxes': \"[[{'page': '7', 'x': '312.72', 'y': '668.65', 'h': '225.88', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '680.64', 'h': '233.89', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '692.64', 'h': '233.86', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '704.68', 'h': '21.95', 'w': '11.68'}], [{'page': '7', 'x': '328.87', 'y': '704.68', 'h': '209.69', 'w': '11.68'}, {'page': '7', 'x': '304.72', 'y': '716.68', 'h': '233.84', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '388.74', 'h': 
'233.85', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '400.73', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '412.74', 'h': '233.85', 'w': '11.68'}], [{'page': '8', 'x': '56.69', 'y': '424.73', 'h': '233.85', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '436.73', 'h': '195.15', 'w': '11.68'}], [{'page': '8', 'x': '254.51', 'y': '436.73', 'h': '36.04', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '448.74', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '460.74', 'h': '18.72', 'w': '11.68'}], [{'page': '8', 'x': '80.23', 'y': '460.74', 'h': '210.33', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '472.73', 'h': '233.85', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '484.73', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '496.74', 'h': '81.73', 'w': '11.68'}], [{'page': '8', 'x': '141.02', 'y': '496.74', 'h': '149.54', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '508.73', 'h': '233.83', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '520.73', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '532.73', 'h': '196.19', 'w': '11.68'}], [{'page': '8', 'x': '64.69', 'y': '544.72', 'h': '225.89', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '556.72', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '568.74', 'h': '27.84', 'w': '11.68'}], [{'page': '8', 'x': '89.00', 'y': '568.74', 'h': '201.55', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '580.73', 'h': '134.25', 'w': '11.68'}], [{'page': '8', 'x': '194.19', 'y': '580.73', 'h': '96.38', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '592.73', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '604.73', 'h': '233.83', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '616.72', 'h': '205.76', 'w': '11.68'}], [{'page': '8', 'x': '266.65', 'y': '616.72', 'h': '23.93', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '628.72', 'h': '233.88', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '640.71', 'h': '233.84', 'w': 
'11.68'}, {'page': '8', 'x': '56.69', 'y': '652.71', 'h': '233.88', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '664.70', 'h': '197.39', 'w': '11.68'}], [{'page': '8', 'x': '257.74', 'y': '664.70', 'h': '32.83', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '676.70', 'h': '233.87', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '688.69', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '700.69', 'h': '233.87', 'w': '11.68'}, {'page': '8', 'x': '56.69', 'y': '712.68', 'h': '120.05', 'w': '11.68'}], [{'page': '8', 'x': '179.60', 'y': '712.68', 'h': '110.97', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '388.68', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '400.74', 'h': '233.88', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '412.73', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '424.73', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '436.74', 'h': '233.85', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '448.73', 'h': '77.16', 'w': '11.68'}], [{'page': '8', 'x': '386.15', 'y': '448.73', 'h': '152.42', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '460.73', 'h': '233.85', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '472.73', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '484.72', 'h': '78.17', 'w': '11.68'}]]\", 'pages': \"('7', '8')\", 'section_title': 'Discussion', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Our analytical approach also yielded the discovery of novel putative biomarkers which, to our knowledge, have yet to be described in the context of IA. This includes the identification of alpha-ketoglutarate-dependent dioxygenase (FTO), family with sequence similarity 21 member C (FAM21C; more commonly known as WASH complex subunit 2C, WASHC2C) and T-box transcription factor (TBX3). Of these candidates, only TBX3 has been previously observed in IA at the genetic level [33]. A genome-wide association study (GWAS) identified the single nucleotide polymorphism (SNP), rs12579024, located nearest the TBX3 gene, to be strongly associated with RA in a Korean population (p value < 0.0001). The functional roles of TBX3 have, thus far, been primarily described in relation to the morphogenesis of limbs and organs [34] as well as oncogenic processes [35]. A recent study by Willmer et al. [36] attempted to delineate the molecular mechanisms driven by TBX3 and identified cyclin-dependent kinase inhibitor p21 WAF1 (p21), a key mediator of cell cycle arrest, to be a primary repressed target of TBX3. Interestingly, p21 has also been implicated in the regulation of proinflammatory cytokines and MMP production in synovial fibroblasts, both of which greatly promote inflammation and joint destruction during the pathogenesis of RA [37]. Isolated RA synovial fibroblasts have shown reduced expression of p21 relative to osteoarthritis (OA) synovial fibroblasts and adenovirus-mediated delivery of p21 suppresses the spontaneous production of IL-6 and MMP1 in RA synovial fibroblasts. In support of this, p21 -/- mice maintain an enhanced experimental IA with markedly increased numbers of macrophages and articular destruction [38]. This phenotype is resolved, however, with the administration of a p21-peptide mimetic. When taken with our own findings, it is conceivable that the upregulation of TBX3 in the synovial joint may lead to reduced p21 expression in synovial 
fibroblasts and promotes the proinflammatory state distinctive of IA pathogenesis. These findings corroborate with our hypothesis that delineating the IA proteome may highlight underlying mechanisms related to the progression of inflammatory arthritic disease and serve as novel targets for screening and therapeutic purposes.', metadata={'text': 'Our analytical approach also yielded the discovery of novel putative biomarkers which, to our knowledge, have yet to be described in the context of IA. This includes the identification of alpha-ketoglutarate-dependent dioxygenase (FTO), family with sequence similarity 21 member C (FAM21C; more commonly known as WASH complex subunit 2C, WASHC2C) and T-box transcription factor (TBX3). Of these candidates, only TBX3 has been previously observed in IA at the genetic level [33]. A genome-wide association study (GWAS) identified the single nucleotide polymorphism (SNP), rs12579024, located nearest the TBX3 gene, to be strongly associated with RA in a Korean population (p value < 0.0001). The functional roles of TBX3 have, thus far, been primarily described in relation to the morphogenesis of limbs and organs [34] as well as oncogenic processes [35]. A recent study by Willmer et al. 
[36] attempted to delineate the molecular mechanisms driven by TBX3 and identified cyclin-dependent kinase inhibitor p21 WAF1 (p21), a key mediator of cell cycle arrest, to be a primary repressed target of TBX3. Interestingly, p21 has also been implicated in the regulation of proinflammatory cytokines and MMP production in synovial fibroblasts, both of which greatly promote inflammation and joint destruction during the pathogenesis of RA [37]. Isolated RA synovial fibroblasts have shown reduced expression of p21 relative to osteoarthritis (OA) synovial fibroblasts and adenovirus-mediated delivery of p21 suppresses the spontaneous production of IL-6 and MMP1 in RA synovial fibroblasts. In support of this, p21 -/- mice maintain an enhanced experimental IA with markedly increased numbers of macrophages and articular destruction [38]. This phenotype is resolved, however, with the administration of a p21-peptide mimetic. When taken with our own findings, it is conceivable that the upregulation of TBX3 in the synovial joint may lead to reduced p21 expression in synovial fibroblasts and promotes the proinflammatory state distinctive of IA pathogenesis. These findings corroborate with our hypothesis that delineating the IA proteome may highlight underlying mechanisms related to the progression of inflammatory arthritic disease and serve as novel targets for screening and therapeutic purposes.', 'para': '11', 'bboxes': \"[[{'page': '8', 'x': '312.72', 'y': '496.72', 'h': '225.88', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '508.71', 'h': '233.87', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '520.71', 'h': '162.16', 'w': '11.68'}], [{'page': '8', 'x': '469.31', 'y': '520.71', 'h': '69.28', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '532.70', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '544.74', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '556.74', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '568.73', 'h': 
'233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '580.74', 'h': '80.16', 'w': '11.68'}], [{'page': '8', 'x': '388.23', 'y': '580.74', 'h': '150.34', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '592.73', 'h': '233.89', 'w': '11.68'}], [{'page': '8', 'x': '304.72', 'y': '604.73', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '616.73', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '628.72', 'h': '233.87', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '640.72', 'h': '214.72', 'w': '11.68'}], [{'page': '8', 'x': '523.19', 'y': '640.72', 'h': '15.40', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '652.71', 'h': '233.83', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '664.71', 'h': '233.86', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '676.70', 'h': '195.59', 'w': '11.68'}], [{'page': '8', 'x': '503.10', 'y': '676.70', 'h': '35.46', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '688.70', 'h': '233.87', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '700.69', 'h': '233.87', 'w': '11.68'}, {'page': '8', 'x': '304.72', 'y': '712.69', 'h': '160.75', 'w': '11.68'}, {'page': '8', 'x': '465.47', 'y': '710.76', 'h': '18.48', 'w': '8.18'}, {'page': '8', 'x': '488.11', 'y': '712.74', 'h': '50.48', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '463.49', 'h': '233.86', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '475.48', 'h': '66.52', 'w': '11.68'}], [{'page': '9', 'x': '127.31', 'y': '475.48', 'h': '163.24', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '487.49', 'h': '233.87', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '499.48', 'h': '233.87', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '511.48', 'h': '233.86', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '523.49', 'h': '141.21', 'w': '11.68'}], [{'page': '9', 'x': '202.41', 'y': '523.49', 'h': '88.16', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '535.48', 'h': '233.89', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '547.48', 'h': 
'233.86', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '559.49', 'h': '233.87', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '571.48', 'h': '233.86', 'w': '11.68'}], [{'page': '9', 'x': '56.69', 'y': '581.23', 'h': '233.86', 'w': '13.95'}, {'page': '9', 'x': '56.70', 'y': '595.48', 'h': '233.87', 'w': '11.68'}, {'page': '9', 'x': '56.70', 'y': '607.48', 'h': '188.38', 'w': '11.68'}], [{'page': '9', 'x': '249.74', 'y': '607.48', 'h': '40.81', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '619.49', 'h': '233.84', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '631.48', 'h': '96.86', 'w': '11.68'}], [{'page': '9', 'x': '156.77', 'y': '631.48', 'h': '133.78', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '643.49', 'h': '233.87', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '655.48', 'h': '233.83', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '667.48', 'h': '233.86', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '679.49', 'h': '170.17', 'w': '11.68'}], [{'page': '9', 'x': '230.41', 'y': '679.49', 'h': '60.14', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '691.48', 'h': '233.86', 'w': '11.68'}, {'page': '9', 'x': '56.69', 'y': '703.48', 'h': '233.85', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '463.48', 'h': '233.87', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '475.47', 'h': '233.85', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '487.47', 'h': '39.73', 'w': '11.68'}]]\", 'pages': \"('8', '9')\", 'section_title': 'Discussion', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Comparisons of RA and PsA revealed high-priority protein candidates specific to each disease.In RA SF, coagulation factor XII, SPARC-like protein 1, Rab GDP dissociation inhibitor beta and immunoglobulin gamma Fc region receptor III-A (FCGR3A) were notably upregulated; of which, activating FCGR3A has demonstrated important roles in sustaining the inflammatory response through the secretion of cytokines and proteases from the immune cell on which it is expressed [39].Likewise, allelic studies have demonstrated SNPs that may serve as susceptibility markers for RA [40].Taken together, the therapeutic targeting of FCGR3A may facilitate future management of RA.', metadata={'text': 'Comparisons of RA and PsA revealed high-priority protein candidates specific to each disease.In RA SF, coagulation factor XII, SPARC-like protein 1, Rab GDP dissociation inhibitor beta and immunoglobulin gamma Fc region receptor III-A (FCGR3A) were notably upregulated; of which, activating FCGR3A has demonstrated important roles in sustaining the inflammatory response through the secretion of cytokines and proteases from the immune cell on which it is expressed [39].Likewise, allelic studies have demonstrated SNPs that may serve as susceptibility markers for RA [40].Taken together, the therapeutic targeting of FCGR3A may facilitate future management of RA.', 'para': '3', 'bboxes': \"[[{'page': '9', 'x': '312.72', 'y': '499.46', 'h': '225.87', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '511.46', 'h': '186.19', 'w': '11.68'}], [{'page': '9', 'x': '495.60', 'y': '511.46', 'h': '42.98', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '523.45', 'h': '233.87', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '535.45', 'h': '233.84', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '547.44', 'h': '233.86', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '559.49', 'h': '233.84', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '571.48', 'h': '233.87', 'w': 
'11.68'}, {'page': '9', 'x': '304.72', 'y': '583.48', 'h': '233.88', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '595.48', 'h': '192.75', 'w': '11.68'}], [{'page': '9', 'x': '500.64', 'y': '595.48', 'h': '37.95', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '607.47', 'h': '233.87', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '619.47', 'h': '147.45', 'w': '11.68'}], [{'page': '9', 'x': '456.38', 'y': '619.47', 'h': '82.23', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '631.46', 'h': '233.85', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '643.46', 'h': '81.73', 'w': '11.68'}]]\", 'pages': \"('9', '9')\", 'section_title': 'Discussion', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content=\"Of the two PsA-specific protein candidates we identified, periostin (POSTN) has been previously investigated in our studies of the PsA tissue proteome as a potential serum marker of PsA [41].Although serum validation of POSTN did not reveal statistically significant differences  between PsA and control serum, its elevated levels in both PsA lesional skin as well as SF alludes to an important role of the protein in the pathobiology of PsA and may serve as part of a panel of biomarkers to differentiate between the onset of PsA and RA.Differential abundance analyses of peptide sequences identified 11 peptides to be significantly dysregulated in IA SF compared to the control group.Upregulated peptide sequences were primarily derived from FGA while single sequences originated from collagen type I alpha 1 (COL1A1) and coiled-coil serine rich protein 2 (CCSER2).All significant FGA-derived peptide fragments were representative of the 16-amino acid residue (ADSGEGDFLAEGGGVR) of fibrinopeptide A (FpA) located at the NH 2 -terminal end of FGA.The lack of detection of the full-length FpA peptide sequence in IA SF can be rationalized by the peptide's short half-life of 3-5 min in the blood plasma [42].FpA is a cleavage product of thrombin-induced conversion of fibrinogen into a fibrin clot.Fibrin deposition in the SF or on the synovial membrane is a consistent feature of IA and is believed to perpetuate inflammation and joint tissue destruction through synovial cell activation [43,44].Liu et al. 
demonstrated that stimulation of synovial fibroblasts with fibrin(ogen) resulted in the upregulated expression of IL-8 and intercellular adhesion molecule 1 (ICAM-1) for the recruitment and retention, respectively, of lymphocytes within the arthritic joint [43].Elevated abundance of FGA and FpA in serum has been observed in patients with inflammation-associated diseases including systemic lupus erythematosus, Crohn's disease, ischemic heart disease and gastric cancer [45][46][47][48].These findings highlight the non-specific indication of inflammation by FpA and its associated peptide fragments, and advocates for its utility as a sensitivity index of disease activity in patients with IA.Moreover, targeting FGA in the synovial joint may be a necessary therapeutic intervention to modulate the inflammatory response.Comparisons of peptide abundance between RA and PsA identified FGA and FGB-related peptide sequences to be consistently downregulated in RA relative to PsA.Although this may be indicative of a discriminatory ability for FGA and FGB peptide fragments to differentiate between the onset of RA and PsA, this outcome does not corroborate with the finding that RA patients are at a greater increased risk of venous thromboembolism relative to PsA patients [49].Targeted quantification in a second set of SF samples is necessary to verify this finding.\", metadata={'text': \"Of the two PsA-specific protein candidates we identified, periostin (POSTN) has been previously investigated in our studies of the PsA tissue proteome as a potential serum marker of PsA [41].Although serum validation of POSTN did not reveal statistically significant differences  between PsA and control serum, its elevated levels in both PsA lesional skin as well as SF alludes to an important role of the protein in the pathobiology of PsA and may serve as part of a panel of biomarkers to differentiate between the onset of PsA and RA.Differential abundance analyses of peptide sequences identified 11 
peptides to be significantly dysregulated in IA SF compared to the control group.Upregulated peptide sequences were primarily derived from FGA while single sequences originated from collagen type I alpha 1 (COL1A1) and coiled-coil serine rich protein 2 (CCSER2).All significant FGA-derived peptide fragments were representative of the 16-amino acid residue (ADSGEGDFLAEGGGVR) of fibrinopeptide A (FpA) located at the NH 2 -terminal end of FGA.The lack of detection of the full-length FpA peptide sequence in IA SF can be rationalized by the peptide's short half-life of 3-5 min in the blood plasma [42].FpA is a cleavage product of thrombin-induced conversion of fibrinogen into a fibrin clot.Fibrin deposition in the SF or on the synovial membrane is a consistent feature of IA and is believed to perpetuate inflammation and joint tissue destruction through synovial cell activation [43,44].Liu et al. demonstrated that stimulation of synovial fibroblasts with fibrin(ogen) resulted in the upregulated expression of IL-8 and intercellular adhesion molecule 1 (ICAM-1) for the recruitment and retention, respectively, of lymphocytes within the arthritic joint [43].Elevated abundance of FGA and FpA in serum has been observed in patients with inflammation-associated diseases including systemic lupus erythematosus, Crohn's disease, ischemic heart disease and gastric cancer [45][46][47][48].These findings highlight the non-specific indication of inflammation by FpA and its associated peptide fragments, and advocates for its utility as a sensitivity index of disease activity in patients with IA.Moreover, targeting FGA in the synovial joint may be a necessary therapeutic intervention to modulate the inflammatory response.Comparisons of peptide abundance between RA and PsA identified FGA and FGB-related peptide sequences to be consistently downregulated in RA relative to PsA.Although this may be indicative of a discriminatory ability for FGA and FGB peptide fragments to differentiate 
between the onset of RA and PsA, this outcome does not corroborate with the finding that RA patients are at a greater increased risk of venous thromboembolism relative to PsA patients [49].Targeted quantification in a second set of SF samples is necessary to verify this finding.\", 'para': '14', 'bboxes': \"[[{'page': '9', 'x': '312.72', 'y': '655.45', 'h': '225.86', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '667.49', 'h': '233.85', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '679.48', 'h': '233.85', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '691.48', 'h': '109.36', 'w': '11.68'}], [{'page': '9', 'x': '416.94', 'y': '691.48', 'h': '121.62', 'w': '11.68'}, {'page': '9', 'x': '304.72', 'y': '703.48', 'h': '233.88', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '88.58', 'h': '233.83', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '100.58', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '112.58', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '124.58', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '136.57', 'h': '139.01', 'w': '11.68'}], [{'page': '10', 'x': '64.69', 'y': '148.57', 'h': '225.85', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '160.56', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '172.56', 'h': '178.64', 'w': '11.68'}], [{'page': '10', 'x': '239.84', 'y': '172.56', 'h': '50.72', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '184.55', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '196.55', 'h': '233.84', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '208.54', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '220.54', 'h': '53.19', 'w': '11.68'}], [{'page': '10', 'x': '114.36', 'y': '220.54', 'h': '176.20', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '232.58', 'h': '233.88', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '244.58', 'h': '233.84', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '256.57', 'h': 
'79.28', 'w': '11.68'}, {'page': '10', 'x': '135.97', 'y': '260.97', 'h': '3.44', 'w': '8.18'}, {'page': '10', 'x': '139.41', 'y': '256.58', 'h': '97.59', 'w': '11.68'}], [{'page': '10', 'x': '241.58', 'y': '256.58', 'h': '48.99', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '268.58', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '280.57', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '292.57', 'h': '137.51', 'w': '11.68'}], [{'page': '10', 'x': '196.42', 'y': '292.57', 'h': '94.13', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '304.58', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '316.58', 'h': '43.08', 'w': '11.68'}], [{'page': '10', 'x': '102.60', 'y': '316.58', 'h': '187.95', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '328.57', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '340.57', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '352.56', 'h': '168.11', 'w': '11.68'}], [{'page': '10', 'x': '228.21', 'y': '352.56', 'h': '62.34', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '364.58', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '376.58', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '388.57', 'h': '233.84', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '400.57', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '412.58', 'h': '147.99', 'w': '11.68'}], [{'page': '10', 'x': '208.01', 'y': '412.58', 'h': '82.54', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '424.58', 'h': '233.88', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '436.57', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '448.58', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '460.58', 'h': '170.75', 'w': '11.68'}], [{'page': '10', 'x': '230.70', 'y': '460.58', 'h': '59.85', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '472.57', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': 
'484.57', 'h': '233.84', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '496.56', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '508.56', 'h': '70.91', 'w': '11.68'}], [{'page': '10', 'x': '131.12', 'y': '508.56', 'h': '159.43', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '520.58', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '532.58', 'h': '161.14', 'w': '11.68'}], [{'page': '10', 'x': '222.86', 'y': '532.58', 'h': '67.69', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '544.57', 'h': '233.88', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '556.57', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '568.56', 'h': '154.32', 'w': '11.68'}], [{'page': '10', 'x': '213.92', 'y': '568.56', 'h': '76.65', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '580.56', 'h': '233.84', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '592.55', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '604.55', 'h': '233.88', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '616.54', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '628.54', 'h': '233.87', 'w': '11.68'}], [{'page': '10', 'x': '56.69', 'y': '640.53', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '652.53', 'h': '126.90', 'w': '11.68'}]]\", 'pages': \"('9', '10')\", 'section_title': 'Discussion', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The advent of high-throughput microbial DNA sequencing has marked a renewed interest in the complex interplay of the intestinal microbiome and inflammatory diseases.Studies suggest that the induction of autoimmunity is closely linked to intestinal dysbiosis and leads to distal synovitis and joint pathology [50].There exist several protective mechanisms to prevent changes in the gut microbiota including the physicochemical barrier of antimicrobial proteins and peptides.AMPs are a collective of naturally-occurring, cationic peptides released by lymphocytes of the innate immune system.Of the 26 peptides predicted to have antimicrobial activity, 13 of them originated from FGA or FGB precursor proteins.Despite the pro-inflammatory impression associated with the accumulation of FGA and FGB in the SF, their presence may be critical to the activation of microbicidal activity.Soluble fibrinogen and fibrin matrices have demonstrated antimicrobial host defense through their ability to physically entrap bacteria in addition to the recruitment and engagement of host immune cells which in turn, facilitate the removal of invading pathogens [51].Taken together, the deposition of fibrin during the progression of IA may initially serve the favourable purpose of limiting bacterial infection through the activation of antimicrobial host defense mechanisms.However, its added role in the recruitment and activation of leukocytes may exacerbate synovial joint inflammation thereby fueling joint disease.', metadata={'text': 'The advent of high-throughput microbial DNA sequencing has marked a renewed interest in the complex interplay of the intestinal microbiome and inflammatory diseases.Studies suggest that the induction of autoimmunity is closely linked to intestinal dysbiosis and leads to distal synovitis and joint pathology [50].There exist several protective mechanisms to prevent changes in the gut microbiota including the physicochemical barrier of 
antimicrobial proteins and peptides.AMPs are a collective of naturally-occurring, cationic peptides released by lymphocytes of the innate immune system.Of the 26 peptides predicted to have antimicrobial activity, 13 of them originated from FGA or FGB precursor proteins.Despite the pro-inflammatory impression associated with the accumulation of FGA and FGB in the SF, their presence may be critical to the activation of microbicidal activity.Soluble fibrinogen and fibrin matrices have demonstrated antimicrobial host defense through their ability to physically entrap bacteria in addition to the recruitment and engagement of host immune cells which in turn, facilitate the removal of invading pathogens [51].Taken together, the deposition of fibrin during the progression of IA may initially serve the favourable purpose of limiting bacterial infection through the activation of antimicrobial host defense mechanisms.However, its added role in the recruitment and activation of leukocytes may exacerbate synovial joint inflammation thereby fueling joint disease.', 'para': '8', 'bboxes': \"[[{'page': '10', 'x': '64.69', 'y': '664.52', 'h': '225.87', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '676.52', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '688.58', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '700.58', 'h': '69.66', 'w': '11.68'}], [{'page': '10', 'x': '130.92', 'y': '700.58', 'h': '159.66', 'w': '11.68'}, {'page': '10', 'x': '56.69', 'y': '712.58', 'h': '233.88', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '88.58', 'h': '206.58', 'w': '11.68'}], [{'page': '10', 'x': '515.04', 'y': '88.58', 'h': '23.54', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '100.58', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '112.57', 'h': '233.88', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '124.57', 'h': '203.03', 'w': '11.68'}], [{'page': '10', 'x': '512.97', 'y': '124.57', 'h': '25.59', 'w': '11.68'}, {'page': 
'10', 'x': '304.72', 'y': '136.56', 'h': '233.84', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '148.56', 'h': '233.88', 'w': '11.68'}], [{'page': '10', 'x': '304.72', 'y': '160.55', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '172.58', 'h': '233.83', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '184.58', 'h': '36.11', 'w': '11.68'}], [{'page': '10', 'x': '343.64', 'y': '184.58', 'h': '194.94', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '196.58', 'h': '233.89', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '208.58', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '220.58', 'h': '63.80', 'w': '11.68'}], [{'page': '10', 'x': '373.48', 'y': '220.58', 'h': '165.11', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '232.58', 'h': '233.84', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '244.57', 'h': '233.88', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '256.57', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '268.56', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '280.58', 'h': '39.63', 'w': '11.68'}], [{'page': '10', 'x': '347.11', 'y': '280.58', 'h': '191.44', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '292.58', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '304.57', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '316.58', 'h': '207.20', 'w': '11.68'}], [{'page': '10', 'x': '515.29', 'y': '316.58', 'h': '23.29', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '328.58', 'h': '233.88', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '340.58', 'h': '233.84', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '352.57', 'h': '116.17', 'w': '11.68'}]]\", 'pages': \"('10', '10')\", 'section_title': 'Discussion', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': 
'/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Though these findings are limited by lack of verification in a subsequent set of SF samples, the identification of IA-specific candidates using a label-free, MS-based approach has shown biological relevance and prospective utility for clinical applications.Future follow-up studies will address verification and validation efforts of selected protein and peptide candidates in a new set of SF and serum samples, respectively.We do acknowledge the limitation of sex discrepancy amongst the IA SF samples in our study which may have influenced the proteins and peptides identified.However, to compensate for this discrepancy between each subtype of IA, our control group consisted of an equal number of male and female SF samples.Moreover, we tested the influence of both sex and age on our data using a linear model and found there to be no effect by either predictor.', metadata={'text': 'Though these findings are limited by lack of verification in a subsequent set of SF samples, the identification of IA-specific candidates using a label-free, MS-based approach has shown biological relevance and prospective utility for clinical applications.Future follow-up studies will address verification and validation efforts of selected protein and peptide candidates in a new set of SF and serum samples, respectively.We do acknowledge the limitation of sex discrepancy amongst the IA SF samples in our study which may have influenced the proteins and peptides identified.However, to compensate for this discrepancy between each subtype of IA, our control group consisted of an equal number of male and female SF samples.Moreover, we tested the influence of both sex and age on our data using a linear model and found there to be no effect by either predictor.', 'para': '4', 'bboxes': \"[[{'page': '10', 'x': '312.72', 'y': '364.57', 'h': '225.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '376.58', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '304.72', 
'y': '388.58', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '400.57', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '412.58', 'h': '155.45', 'w': '11.68'}], [{'page': '10', 'x': '466.10', 'y': '412.58', 'h': '72.46', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '424.58', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '436.57', 'h': '233.84', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '448.57', 'h': '133.64', 'w': '11.68'}], [{'page': '10', 'x': '440.89', 'y': '448.57', 'h': '97.72', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '460.56', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '472.56', 'h': '233.85', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '484.55', 'h': '78.65', 'w': '11.68'}], [{'page': '10', 'x': '386.26', 'y': '484.55', 'h': '152.32', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '496.58', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '508.58', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '520.58', 'h': '18.62', 'w': '11.68'}], [{'page': '10', 'x': '326.83', 'y': '520.58', 'h': '211.73', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '532.58', 'h': '233.90', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '544.57', 'h': '127.38', 'w': '11.68'}]]\", 'pages': \"('10', '10')\", 'section_title': 'Discussion', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='A technical limitation of this study includes the lack of fractionation of digested proteins and peptides which likely contributed to the low fold change ratios of our candidate biomarkers.Pre-fractionation methods are important for reducing the complexity of biological fluids and tissues.The proteomic profile of IA SF is markedly shifted compared to healthy SF with a greater concentration of pro-inflammatory cytokines, immunoglobulins, matrix-degrading enzymes and acute-phase markers.The dynamic range between proteins in diseased SF can vary by a factor of 10 10 [10] and the likelihood, therefore, of masking potentially clinically-relevant proteins within the low-abundance proteome increases and may be exacerbated by analysis of unfractionated biological samples.However, improving accessibility to low-concentration proteins comes at the cost of longer analysis times and lower reproducibility [52].Moreover, fractionation technologies have previously failed to significantly extend the sampling of the proteome relative to the unfractionated proteome [53].', metadata={'text': 'A technical limitation of this study includes the lack of fractionation of digested proteins and peptides which likely contributed to the low fold change ratios of our candidate biomarkers.Pre-fractionation methods are important for reducing the complexity of biological fluids and tissues.The proteomic profile of IA SF is markedly shifted compared to healthy SF with a greater concentration of pro-inflammatory cytokines, immunoglobulins, matrix-degrading enzymes and acute-phase markers.The dynamic range between proteins in diseased SF can vary by a factor of 10 10 [10] and the likelihood, therefore, of masking potentially clinically-relevant proteins within the low-abundance proteome increases and may be exacerbated by analysis of unfractionated biological samples.However, improving accessibility to low-concentration proteins comes at the cost of longer analysis 
times and lower reproducibility [52].Moreover, fractionation technologies have previously failed to significantly extend the sampling of the proteome relative to the unfractionated proteome [53].', 'para': '5', 'bboxes': \"[[{'page': '10', 'x': '312.72', 'y': '556.57', 'h': '225.88', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '568.56', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '580.56', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '592.55', 'h': '95.03', 'w': '11.68'}], [{'page': '10', 'x': '406.46', 'y': '592.55', 'h': '132.13', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '604.55', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '616.54', 'h': '48.33', 'w': '11.68'}], [{'page': '10', 'x': '356.55', 'y': '616.54', 'h': '182.01', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '628.54', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '640.58', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '652.58', 'h': '216.25', 'w': '11.68'}], [{'page': '10', 'x': '523.20', 'y': '652.58', 'h': '15.40', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '664.57', 'h': '233.88', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '676.57', 'h': '70.36', 'w': '11.68'}, {'page': '10', 'x': '375.09', 'y': '674.60', 'h': '6.89', 'w': '8.18'}, {'page': '10', 'x': '385.54', 'y': '676.58', 'h': '153.04', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '688.58', 'h': '233.87', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '700.57', 'h': '233.86', 'w': '11.68'}, {'page': '10', 'x': '304.72', 'y': '712.58', 'h': '233.85', 'w': '11.68'}], [{'page': '11', 'x': '56.69', 'y': '88.58', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '100.58', 'h': '233.88', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '112.57', 'h': '108.37', 'w': '11.68'}], [{'page': '11', 'x': '168.16', 'y': '112.57', 'h': '122.39', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '124.58', 'h': 
'233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '136.58', 'h': '233.84', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '148.57', 'h': '60.49', 'w': '11.68'}]]\", 'pages': \"('10', '11')\", 'section_title': 'Discussion', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Chronic inflammation in IA is orchestrated by a complex network of signaling pathways which are expected to be represented in the protein and peptide expression patterns of SF.Therefore, proteomic and peptidomic analysis of SF can reflect the molecular underpinnings of IA and enhance our understanding of principal drivers at the apex of this disease.Overall, through the application of high-throughput, label-free MS, this discovery-phase study has generated a comprehensive proteomic dataset representative of IA SF and its specific subtypes.We discovered 5 protein candidates and 10 peptide candidates upregulated in IA SF, of which 3 proteins have yet to be described in IA.Moreover, subtype-specific analyses identified 4 RA-specific protein candidates, 2 PsA-specific protein candidates and 5 PsA-specific peptide candidates.Several of these candidates have been associated with inflammatory pathways at the genetic level but have not been investigated at the protein level and therefore, require functional experimentation to elucidate their role in the pathogenesis of IA.The data presented herein underscores the potential for proteins and peptides to elucidate mechanistic pathways related to the onset of arthritic disease in addition to their capacity to serve as informative clinical biomarkers.', metadata={'text': 'Chronic inflammation in IA is orchestrated by a complex network of signaling pathways which are expected to be represented in the protein and peptide expression patterns of SF.Therefore, proteomic and peptidomic analysis of SF can reflect the molecular underpinnings of IA and enhance our understanding of principal drivers at the apex of this disease.Overall, through the application of high-throughput, label-free MS, this discovery-phase study has generated a comprehensive proteomic dataset representative of IA SF and its specific subtypes.We discovered 5 protein candidates and 10 peptide candidates upregulated in IA SF, of which 
3 proteins have yet to be described in IA.Moreover, subtype-specific analyses identified 4 RA-specific protein candidates, 2 PsA-specific protein candidates and 5 PsA-specific peptide candidates.Several of these candidates have been associated with inflammatory pathways at the genetic level but have not been investigated at the protein level and therefore, require functional experimentation to elucidate their role in the pathogenesis of IA.The data presented herein underscores the potential for proteins and peptides to elucidate mechanistic pathways related to the onset of arthritic disease in addition to their capacity to serve as informative clinical biomarkers.', 'para': '6', 'bboxes': \"[[{'page': '11', 'x': '56.69', 'y': '188.44', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '200.44', 'h': '233.85', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '212.43', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '224.44', 'h': '46.64', 'w': '11.68'}], [{'page': '11', 'x': '106.51', 'y': '224.44', 'h': '184.05', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '236.44', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '248.44', 'h': '233.88', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '260.43', 'h': '98.26', 'w': '11.68'}], [{'page': '11', 'x': '157.87', 'y': '260.43', 'h': '132.67', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '272.43', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '284.42', 'h': '233.85', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '296.42', 'h': '199.34', 'w': '11.68'}], [{'page': '11', 'x': '258.69', 'y': '296.42', 'h': '31.87', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '308.44', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '320.44', 'h': '233.84', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '332.43', 'h': '73.25', 'w': '11.68'}], [{'page': '11', 'x': '136.13', 'y': '332.43', 'h': '154.42', 'w': '11.68'}, {'page': '11', 'x': '56.69', 
'y': '344.43', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '356.44', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '368.44', 'h': '31.59', 'w': '11.68'}], [{'page': '11', 'x': '91.00', 'y': '368.44', 'h': '199.54', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '380.44', 'h': '233.85', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '392.43', 'h': '233.83', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '404.43', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '416.42', 'h': '124.16', 'w': '11.68'}], [{'page': '11', 'x': '183.48', 'y': '416.42', 'h': '107.07', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '428.42', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '440.41', 'h': '233.85', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '452.41', 'h': '233.86', 'w': '11.68'}, {'page': '11', 'x': '56.69', 'y': '464.40', 'h': '129.87', 'w': '11.68'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Conclusions', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Additional file 1. Table S1: Complete protein group report for proteomics.Table S2: Complete peptide report for proteomics.Table S3: Complete spectra search output for peptidomics.Table S4: Complete list of significantly dysregulated human proteins identified in inflammatory arthritis synovial fluid relative to control synovial fluid.Table S5: Complete list of significantly dysregulated human proteins identified in rheumatoid synovial fluid relative to psoriatic arthritis synovial fluid.Table S6: Functional pathways and regulatory networks associated with significantly dysregulated proteins in IA SF.Table S7: Complete list of significantly dysregulated human peptides identified in inflammatory arthritis synovial fluid relative to control synovial fluid.Table S8: Complete list of significantly dysregulated human peptides identified in rheumatoid synovial fluid relative to psoriatic arthritis synovial fluid.Table S9: Complete list of all predicted antimicrobial peptides in inflammatory arthritis synovial fluid Abbreviations IA: inflammatory arthritis; RA: rheumatoid arthritis; PsA: psoriatic arthritis; MHC: major histocompatibility complex; MS: mass spectrometry; SF: synovial fluid; AMP: antimicrobial peptide; ACR : American College of Rheumatology; CASPAR: classification criteria for psoriatic arthritis; ABC: ammonium bicarbonate; DTT: dithiothreitol; IAM: iodoacetamide; FA: formic acid; LC-MS/MS: liquid chromatography-tandem mass spectrometry; DMSO: dimethyl sulfoxide; ACN: acetonitrile; TFA: trifluoroacetic acid; LFQ: label-free quantification; IPA: ingenuity pathway analysis; DAVID: database for annotation, visualization and integrated discovery; KEGG: Kyoto Encyclopedia of Genes and Genomes; CAMP R3 : collection of anti-microbial peptides; SVM: support vector machine; FGA: fibrinogen alpha chain; CPB2: carboxypeptidase B2; FGB: fibrinogen beta chain; F2: prothrombin; TLR: toll-like receptor; TNF-α: tumor necrosis factor alpha; IL: 
interleukin; CD5L: CD5 molecule-like; MMP: matrix metalloproteinase; S100: S100 calcium-binding protein; DEFA3: defensin alpha 3; FTO: alphaketoglutarate-dependent dioxygenase; FAM21C: family with sequence similarity 21 member C; TBX3: T-box transcription factor; GWAS: genome-wide association study; SNP: single nucleotide polymorphism; p21: cyclin-dependent kinase inhibitor p21 WAF1 ; OA: osteoarthritis; FCGR3A: immunoglobulin gamma Fc region receptor III-A; POSTN: periostin; PGK1: phosphoglycerate kinase 1; COL1A1: collagen type I alpha 1; CCSER2: coiled-coil serine rich protein 2; FpA: fibrinopeptide A; ICAM-1: intercellular adhesion molecule 1.', metadata={'text': 'Additional file 1. Table S1: Complete protein group report for proteomics.Table S2: Complete peptide report for proteomics.Table S3: Complete spectra search output for peptidomics.Table S4: Complete list of significantly dysregulated human proteins identified in inflammatory arthritis synovial fluid relative to control synovial fluid.Table S5: Complete list of significantly dysregulated human proteins identified in rheumatoid synovial fluid relative to psoriatic arthritis synovial fluid.Table S6: Functional pathways and regulatory networks associated with significantly dysregulated proteins in IA SF.Table S7: Complete list of significantly dysregulated human peptides identified in inflammatory arthritis synovial fluid relative to control synovial fluid.Table S8: Complete list of significantly dysregulated human peptides identified in rheumatoid synovial fluid relative to psoriatic arthritis synovial fluid.Table S9: Complete list of all predicted antimicrobial peptides in inflammatory arthritis synovial fluid Abbreviations IA: inflammatory arthritis; RA: rheumatoid arthritis; PsA: psoriatic arthritis; MHC: major histocompatibility complex; MS: mass spectrometry; SF: synovial fluid; AMP: antimicrobial peptide; ACR : American College of Rheumatology; CASPAR: classification criteria for psoriatic 
arthritis; ABC: ammonium bicarbonate; DTT: dithiothreitol; IAM: iodoacetamide; FA: formic acid; LC-MS/MS: liquid chromatography-tandem mass spectrometry; DMSO: dimethyl sulfoxide; ACN: acetonitrile; TFA: trifluoroacetic acid; LFQ: label-free quantification; IPA: ingenuity pathway analysis; DAVID: database for annotation, visualization and integrated discovery; KEGG: Kyoto Encyclopedia of Genes and Genomes; CAMP R3 : collection of anti-microbial peptides; SVM: support vector machine; FGA: fibrinogen alpha chain; CPB2: carboxypeptidase B2; FGB: fibrinogen beta chain; F2: prothrombin; TLR: toll-like receptor; TNF-α: tumor necrosis factor alpha; IL: interleukin; CD5L: CD5 molecule-like; MMP: matrix metalloproteinase; S100: S100 calcium-binding protein; DEFA3: defensin alpha 3; FTO: alphaketoglutarate-dependent dioxygenase; FAM21C: family with sequence similarity 21 member C; TBX3: T-box transcription factor; GWAS: genome-wide association study; SNP: single nucleotide polymorphism; p21: cyclin-dependent kinase inhibitor p21 WAF1 ; OA: osteoarthritis; FCGR3A: immunoglobulin gamma Fc region receptor III-A; POSTN: periostin; PGK1: phosphoglycerate kinase 1; COL1A1: collagen type I alpha 1; CCSER2: coiled-coil serine rich protein 2; FpA: fibrinopeptide A; ICAM-1: intercellular adhesion molecule 1.', 'para': '8', 'bboxes': \"[[{'page': '11', 'x': '62.69', 'y': '530.70', 'h': '204.27', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '539.70', 'h': '22.70', 'w': '6.93'}], [{'page': '11', 'x': '87.04', 'y': '539.70', 'h': '151.96', 'w': '6.93'}], [{'page': '11', 'x': '240.64', 'y': '539.70', 'h': '28.68', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '548.70', 'h': '146.86', 'w': '6.93'}], [{'page': '11', 'x': '211.19', 'y': '548.70', 'h': '69.88', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '557.70', 'h': '212.59', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '566.70', 'h': '159.22', 'w': '6.93'}], [{'page': '11', 'x': '223.55', 'y': '566.70', 'h': '59.96', 'w': 
'6.93'}, {'page': '11', 'x': '62.69', 'y': '575.70', 'h': '207.88', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '584.70', 'h': '176.78', 'w': '6.93'}], [{'page': '11', 'x': '241.12', 'y': '584.70', 'h': '28.68', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '593.70', 'h': '221.85', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '602.70', 'h': '89.17', 'w': '6.93'}], [{'page': '11', 'x': '153.50', 'y': '602.70', 'h': '114.70', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '611.70', 'h': '219.50', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '620.70', 'h': '109.25', 'w': '6.93'}], [{'page': '11', 'x': '173.58', 'y': '620.70', 'h': '98.95', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '629.70', 'h': '209.87', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '638.70', 'h': '137.56', 'w': '6.93'}], [{'page': '11', 'x': '201.90', 'y': '638.70', 'h': '77.47', 'w': '6.93'}, {'page': '11', 'x': '62.69', 'y': '647.70', 'h': '220.05', 'w': '6.93'}, {'page': '11', 'x': '56.69', 'y': '673.71', 'h': '45.65', 'w': '6.93'}, {'page': '11', 'x': '56.69', 'y': '682.71', 'h': '232.75', 'w': '6.93'}, {'page': '11', 'x': '56.69', 'y': '691.71', 'h': '226.35', 'w': '6.93'}, {'page': '11', 'x': '56.69', 'y': '700.71', 'h': '233.85', 'w': '6.93'}, {'page': '11', 'x': '56.69', 'y': '709.71', 'h': '213.67', 'w': '6.93'}, {'page': '11', 'x': '56.69', 'y': '718.71', 'h': '218.10', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '88.58', 'h': '216.40', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '97.58', 'h': '223.99', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '106.58', 'h': '216.69', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '115.58', 'h': '229.09', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '124.58', 'h': '226.88', 'w': '8.24'}, {'page': '11', 'x': '304.72', 'y': '133.58', 'h': '231.25', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '142.58', 'h': '220.08', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '151.58', 'h': '233.55', 'w': '6.93'}, 
{'page': '11', 'x': '304.72', 'y': '160.58', 'h': '216.71', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '169.58', 'h': '232.85', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '178.58', 'h': '229.01', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '187.58', 'h': '224.24', 'w': '6.93'}, {'page': '11', 'x': '304.72', 'y': '195.10', 'h': '231.14', 'w': '8.41'}, {'page': '11', 'x': '304.73', 'y': '205.58', 'h': '223.60', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '214.58', 'h': '231.67', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '223.58', 'h': '177.35', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Additional file', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='We thank Ihor Batruch for his support in mass spectrometric analysis.', metadata={'text': 'We thank Ihor Batruch for his support in mass spectrometric analysis.', 'para': '0', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '250.58', 'h': '204.70', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Acknowledgements', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='We thank Ihor Batruch for his support in mass spectrometric analysis.', metadata={'text': 'We thank Ihor Batruch for his support in mass spectrometric analysis.', 'para': '0', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '250.58', 'h': '204.70', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Acknowledgements', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The mass spectrometry proteomics and peptidomics datasets supporting the conclusions of this article are available in the PRIDE Archive via the PRIDE partner repository with the data set identifier PXD011872; http://www.ebi.ac.uk/ pride /archi ve/ (username: reviewer92309@ebi.ac.uk and password: 3hXihB2 s).', metadata={'text': 'The mass spectrometry proteomics and peptidomics datasets supporting the conclusions of this article are available in the PRIDE Archive via the PRIDE partner repository with the data set identifier PXD011872; http://www.ebi.ac.uk/ pride /archi ve/ (username: reviewer92309@ebi.ac.uk and password: 3hXihB2 s).', 'para': '0', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '385.58', 'h': '231.09', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '394.58', 'h': '232.71', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '403.58', 'h': '226.34', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '412.58', 'h': '232.94', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Availability of data and materials', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The mass spectrometry proteomics and peptidomics datasets supporting the conclusions of this article are available in the PRIDE Archive via the PRIDE partner repository with the data set identifier PXD011872; http://www.ebi.ac.uk/ pride /archi ve/ (username: reviewer92309@ebi.ac.uk and password: 3hXihB2 s).', metadata={'text': 'The mass spectrometry proteomics and peptidomics datasets supporting the conclusions of this article are available in the PRIDE Archive via the PRIDE partner repository with the data set identifier PXD011872; http://www.ebi.ac.uk/ pride /archi ve/ (username: reviewer92309@ebi.ac.uk and password: 3hXihB2 s).', 'para': '0', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '385.58', 'h': '231.09', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '394.58', 'h': '232.71', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '403.58', 'h': '226.34', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '412.58', 'h': '232.94', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Availability of data and materials', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='This work was supported by The Krembil Foundation.', metadata={'text': 'This work was supported by The Krembil Foundation.', 'para': '0', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '358.58', 'h': '158.35', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Funding', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='This work was supported by The Krembil Foundation.', metadata={'text': 'This work was supported by The Krembil Foundation.', 'para': '0', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '358.58', 'h': '158.35', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Funding', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content=\"Authors' contributions SM, EPD, and VC participated in the conceptualization of the study and experimental design.VC provided RA and PsA SF samples.EK provided RA SF samples and RK provided control cadaveric SF samples.IB provided mass spectrometry expertise and assisted with mass spectrometry analysis.KL provided statistical expertise and assisted with the statistical analysis.SM drafted the manuscript.SM, EPD, and VC prepared the final version of the manuscript.All authors read and approved the final manuscript.\", metadata={'text': \"Authors' contributions SM, EPD, and VC participated in the conceptualization of the study and experimental design.VC provided RA and PsA SF samples.EK provided RA SF samples and RK provided control cadaveric SF samples.IB provided mass spectrometry expertise and assisted with mass spectrometry analysis.KL provided statistical expertise and assisted with the statistical analysis.SM drafted the manuscript.SM, EPD, and VC prepared the final version of the manuscript.All authors read and approved the final manuscript.\", 'para': '7', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '268.58', 'h': '72.39', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '277.58', 'h': '210.67', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '286.58', 'h': '62.65', 'w': '6.93'}], [{'page': '11', 'x': '368.75', 'y': '286.58', 'h': '108.10', 'w': '6.93'}], [{'page': '11', 'x': '478.50', 'y': '286.58', 'h': '45.67', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '295.58', 'h': '173.00', 'w': '6.93'}], [{'page': '11', 'x': '479.37', 'y': '295.58', 'h': '50.78', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '304.58', 'h': '206.83', 'w': '6.93'}], [{'page': '11', 'x': '513.20', 'y': '304.58', 'h': '21.33', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '313.58', 'h': '194.37', 'w': '6.93'}], [{'page': '11', 'x': '500.74', 'y': '313.58', 'h': '32.68', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '322.58', 'h': 
'46.91', 'w': '6.93'}], [{'page': '11', 'x': '353.27', 'y': '322.58', 'h': '182.31', 'w': '6.93'}], [{'page': '11', 'x': '304.73', 'y': '331.58', 'h': '153.28', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Ethics approval and consent to participate', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Human research ethics board approval was received for the study from the University Health Network, Mount Sinai Hospital and the University of Calgary.Informed consent was obtained from all patients.', metadata={'text': 'Human research ethics board approval was received for the study from the University Health Network, Mount Sinai Hospital and the University of Calgary.Informed consent was obtained from all patients.', 'para': '1', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '439.58', 'h': '222.73', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '448.58', 'h': '231.99', 'w': '6.93'}], [{'page': '11', 'x': '304.73', 'y': '457.58', 'h': '146.68', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Ethics approval and consent to participate', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Not applicable.', metadata={'text': 'Not applicable.', 'para': '0', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '484.58', 'h': '44.93', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Ethics approval and consent to participate', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The authors declare that they have no competing financial interest.', metadata={'text': 'The authors declare that they have no competing financial interest.', 'para': '0', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '511.58', 'h': '200.08', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Ethics approval and consent to participate', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.', metadata={'text': 'Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.', 'para': '0', 'bboxes': \"[[{'page': '13', 'x': '304.72', 'y': '193.58', 'h': '223.21', 'w': '6.93'}, {'page': '13', 'x': '304.72', 'y': '202.58', 'h': '119.32', 'w': '6.93'}]]\", 'pages': \"('13', '13')\", 'section_title': 'Ethics approval and consent to participate', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Human research ethics board approval was received for the study from the University Health Network, Mount Sinai Hospital and the University of Calgary.Informed consent was obtained from all patients.', metadata={'text': 'Human research ethics board approval was received for the study from the University Health Network, Mount Sinai Hospital and the University of Calgary.Informed consent was obtained from all patients.', 'para': '1', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '439.58', 'h': '222.73', 'w': '6.93'}, {'page': '11', 'x': '304.73', 'y': '448.58', 'h': '231.99', 'w': '6.93'}], [{'page': '11', 'x': '304.73', 'y': '457.58', 'h': '146.68', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Ethics approval and consent to participate', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Not applicable.', metadata={'text': 'Not applicable.', 'para': '0', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '484.58', 'h': '44.93', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Consent for publication', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The authors declare that they have no competing financial interest.', metadata={'text': 'The authors declare that they have no competing financial interest.', 'para': '0', 'bboxes': \"[[{'page': '11', 'x': '304.73', 'y': '511.58', 'h': '200.08', 'w': '6.93'}]]\", 'pages': \"('11', '11')\", 'section_title': 'Competing interests', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.', metadata={'text': 'Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.', 'para': '0', 'bboxes': \"[[{'page': '13', 'x': '304.72', 'y': '193.58', 'h': '223.21', 'w': '6.93'}, {'page': '13', 'x': '304.72', 'y': '202.58', 'h': '119.32', 'w': '6.93'}]]\", 'pages': \"('13', '13')\", 'section_title': \"Publisher's Note\", 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'})]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from langchain.document_loaders.parsers import GrobidParser\n",
    "from langchain.document_loaders.generic import GenericLoader\n",
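    "\n",
    "# Parse a single paper (2.pdf) with GROBID; GrobidParser needs a reachable GROBID server.\n",
    "# With segment_sentences=False each Document holds one paragraph rather than one sentence.\n",
    "loader = GenericLoader.from_filesystem(\n",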
    "    DATA_PATH / \"papers/\",\n",
    "    glob=\"2.pdf\",\n",
    "    suffixes=[\".pdf\"],\n",
    "    parser=GrobidParser(segment_sentences=False),\n",
    ")\n",
    "docs = loader.load()\n",
    "docs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/data/tommaso/mambaforge/envs/llm4scilit/lib/python3.10/site-packages/spacy/pipeline/lemmatizer.py:211: UserWarning: [W108] The rule-based lemmatizer did not find POS annotation for one or more tokens. Check that your pipeline includes components that assign token.pos, typically 'tagger'+'attribute_ruler' or 'morphologizer'.\n",
      "  warnings.warn(Warnings.W108)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[Document(page_content='We determined that 144 proteins showed significant differential abundance between the IA and control SF proteomes, of which 11 protein candidates were selected for future follow-up studies.\\n\\nSimilar analyses applied to our peptidomic data identified 15 peptide sequences, originating from 4 protein precursors, to have significant differential abundance in IA compared to the control SF peptidome.\\n\\nPathway enrichment analysis of the IA SF peptidome along with AMP prediction suggests a possible mechanistic role of microbes in eliciting an immune response which drives the development of IA.', metadata={'text': 'We determined that 144 proteins showed significant differential abundance between the IA and control SF proteomes, of which 11 protein candidates were selected for future follow-up studies.Similar analyses applied to our peptidomic data identified 15 peptide sequences, originating from 4 protein precursors, to have significant differential abundance in IA compared to the control SF peptidome.Pathway enrichment analysis of the IA SF peptidome along with AMP prediction suggests a possible mechanistic role of microbes in eliciting an immune response which drives the development of IA.', 'para': '2', 'bboxes': \"[[{'page': '1', 'x': '101.12', 'y': '422.98', 'h': '424.81', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '434.98', 'h': '340.13', 'w': '9.24'}], [{'page': '1', 'x': '405.45', 'y': '434.98', 'h': '120.66', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '446.98', 'h': '468.92', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '458.98', 'h': '225.40', 'w': '9.24'}], [{'page': '1', 'x': '290.71', 'y': '458.98', 'h': '234.48', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '470.98', 'h': '460.78', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '482.98', 'h': '91.59', 'w': '9.24'}]]\", 'pages': \"('1', '1')\", 'section_title': 'Results:', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid 
proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The discovery-phase data generated herein has provided a basis for the identification of candidates with the greatest potential to serve as novel serum biomarkers specific to inflammatory arthritides.\\n\\nMoreover, these findings facilitate the understanding of possible disease mechanisms specific to each subtype.', metadata={'text': 'The discovery-phase data generated herein has provided a basis for the identification of candidates with the greatest potential to serve as novel serum biomarkers specific to inflammatory arthritides.Moreover, these findings facilitate the understanding of possible disease mechanisms specific to each subtype.', 'para': '1', 'bboxes': \"[[{'page': '1', 'x': '122.15', 'y': '497.98', 'h': '394.30', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '509.98', 'h': '391.31', 'w': '9.24'}], [{'page': '1', 'x': '456.63', 'y': '509.98', 'h': '63.75', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '521.98', 'h': '374.26', 'w': '9.24'}]]\", 'pages': \"('1', '1')\", 'section_title': 'Conclusions:', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='Inflammatory arthritis (IA) is characterized by synovial hyperplasia leading to degradation of adjacent articular cartilage and bone [1].The\\n\\nterm encompasses several forms of inflammatory joint diseases that when taken together, have an annual incidence ranging from 115 to 271 per 100,000 adults [2].IA\\n\\nis a multifactorial disease driven by the complex interplay of both genetics and the environment.\\n\\nRheumatoid arthritis (RA), the most common and potentially destructive IA, has a well-established association with class II major histocompatibility complex (MHC) alleles while the spondyloarthritides, such as psoriatic arthritis (PsA), are more frequently associated with class I MHC alleles [', metadata={'text': \"Inflammatory arthritis (IA) is characterized by synovial hyperplasia leading to degradation of adjacent articular cartilage and bone [1].The term encompasses several forms of inflammatory joint diseases that when taken together, have an annual incidence ranging from 115 to 271 per 100,000 adults [2].IA is a multifactorial disease driven by the complex interplay of both genetics and the environment.Rheumatoid arthritis (RA), the most common and potentially destructive IA, has a well-established association with class II major histocompatibility complex (MHC) alleles while the spondyloarthritides, such as psoriatic arthritis (PsA), are more frequently associated with class I MHC alleles [3].Susceptibility to IA increases when genetic predisposition is complemented by environmental risk factors such as smoking, obesity and more recently, microbial infection and intestinal dysbiosis [4][5][6].The exact etiology of IA is still poorly understood with studies aimed at delineating the molecular pathways driving loss of immunological tolerance to the body's self-antigens.Alterations to the adaptive and innate immune system perpetuate systemic inflammation and lead to an elevated risk of developing comorbid conditions such 
as cardiovascular disease, metabolic syndrome, diabetes and depression [7,8].Naturally, there is a compelling need to identify markers of aberrant immune pathways relevant to IA which may advance current insights into the molecular mechanisms of the disease and serve as clinical markers for disease monitoring and treatment responses.\", 'para': '7', 'bboxes': \"[[{'page': '2', 'x': '56.69', 'y': '101.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '113.84', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '125.84', 'h': '98.24', 'w': '11.68'}], [{'page': '2', 'x': '158.95', 'y': '125.84', 'h': '131.59', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '137.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '149.83', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '161.83', 'h': '124.09', 'w': '11.68'}], [{'page': '2', 'x': '183.72', 'y': '161.83', 'h': '106.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '173.85', 'h': '233.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '185.84', 'h': '94.37', 'w': '11.68'}], [{'page': '2', 'x': '155.55', 'y': '185.84', 'h': '135.01', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '197.84', 'h': '233.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '209.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '221.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '233.85', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '245.84', 'h': '212.58', 'w': '11.68'}], [{'page': '2', 'x': '272.28', 'y': '245.84', 'h': '18.27', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '257.85', 'h': '233.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '269.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '281.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '293.83', 'h': '127.47', 'w': '11.68'}], [{'page': '2', 'x': '187.45', 'y': '293.83', 'h': '103.09', 'w': '11.68'}, {'page': '2', 
'x': '56.69', 'y': '305.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '317.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '329.85', 'h': '184.18', 'w': '11.68'}], [{'page': '2', 'x': '243.59', 'y': '329.85', 'h': '46.94', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '341.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '353.84', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '365.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '377.83', 'h': '233.85', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '389.83', 'h': '24.69', 'w': '11.68'}], [{'page': '2', 'x': '84.82', 'y': '389.83', 'h': '205.76', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '401.82', 'h': '233.85', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '413.82', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '425.81', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '437.81', 'h': '203.55', 'w': '11.68'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Introduction', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content=\"3].Susceptibility to IA increases when genetic predisposition is complemented by environmental risk factors such as smoking, obesity and more recently, microbial infection and intestinal dysbiosis [4][5][6].The exact etiology of IA is still poorly understood with studies aimed at delineating the molecular pathways driving loss of immunological tolerance to the body's self-antigens.\\n\\nAlterations to the adaptive and innate immune system perpetuate systemic inflammation and lead to an elevated risk of developing comorbid conditions such as cardiovascular disease, metabolic syndrome, diabetes and depression [7,8].Naturally, there is a compelling need to identify markers of aberrant immune pathways relevant to IA which may advance current insights into the molecular mechanisms of the disease and serve as clinical markers for disease monitoring and treatment responses.\", metadata={'text': \"Inflammatory arthritis (IA) is characterized by synovial hyperplasia leading to degradation of adjacent articular cartilage and bone [1].The term encompasses several forms of inflammatory joint diseases that when taken together, have an annual incidence ranging from 115 to 271 per 100,000 adults [2].IA is a multifactorial disease driven by the complex interplay of both genetics and the environment.Rheumatoid arthritis (RA), the most common and potentially destructive IA, has a well-established association with class II major histocompatibility complex (MHC) alleles while the spondyloarthritides, such as psoriatic arthritis (PsA), are more frequently associated with class I MHC alleles [3].Susceptibility to IA increases when genetic predisposition is complemented by environmental risk factors such as smoking, obesity and more recently, microbial infection and intestinal dysbiosis [4][5][6].The exact etiology of IA is still poorly understood with studies aimed at delineating the molecular pathways driving loss of immunological tolerance to the 
body's self-antigens.Alterations to the adaptive and innate immune system perpetuate systemic inflammation and lead to an elevated risk of developing comorbid conditions such as cardiovascular disease, metabolic syndrome, diabetes and depression [7,8].Naturally, there is a compelling need to identify markers of aberrant immune pathways relevant to IA which may advance current insights into the molecular mechanisms of the disease and serve as clinical markers for disease monitoring and treatment responses.\", 'para': '7', 'bboxes': \"[[{'page': '2', 'x': '56.69', 'y': '101.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '113.84', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '125.84', 'h': '98.24', 'w': '11.68'}], [{'page': '2', 'x': '158.95', 'y': '125.84', 'h': '131.59', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '137.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '149.83', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '161.83', 'h': '124.09', 'w': '11.68'}], [{'page': '2', 'x': '183.72', 'y': '161.83', 'h': '106.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '173.85', 'h': '233.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '185.84', 'h': '94.37', 'w': '11.68'}], [{'page': '2', 'x': '155.55', 'y': '185.84', 'h': '135.01', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '197.84', 'h': '233.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '209.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '221.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '233.85', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '245.84', 'h': '212.58', 'w': '11.68'}], [{'page': '2', 'x': '272.28', 'y': '245.84', 'h': '18.27', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '257.85', 'h': '233.83', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '269.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '281.84', 'h': '233.86', 'w': 
'11.68'}, {'page': '2', 'x': '56.69', 'y': '293.83', 'h': '127.47', 'w': '11.68'}], [{'page': '2', 'x': '187.45', 'y': '293.83', 'h': '103.09', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '305.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '317.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '329.85', 'h': '184.18', 'w': '11.68'}], [{'page': '2', 'x': '243.59', 'y': '329.85', 'h': '46.94', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '341.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '353.84', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '365.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '377.83', 'h': '233.85', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '389.83', 'h': '24.69', 'w': '11.68'}], [{'page': '2', 'x': '84.82', 'y': '389.83', 'h': '205.76', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '401.82', 'h': '233.85', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '413.82', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '425.81', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '437.81', 'h': '203.55', 'w': '11.68'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Introduction', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}),\n",
       " Document(page_content='The rise in high-throughput technologies, such as next-generation gene sequencing and mass spectrometry (MS), facilitate the discovery of key modulators of disease.\\n\\nSpecifically, MS-based approaches provide an essential analytical platform for the identification, quantification and characterization of candidate biomarkers.\\n\\nBiomarkers may come in the form of a molecular signature, a clinical feature or even as an imaging parameter.\\n\\nMolecular biomarkers may be further subtyped into the domains of genomics, transcriptomics, proteomics, metabolomics or peptidomics.\\n\\nDue to the importance of proteins in pathophysiological processes, there is increased interest in resolving the proteomic profile of biospecimens related to IA.Similarly, peptides play a seminal role in mediating physiological functions by serving as neurotransmitters, hormones, antibiotics and immune regulators [9].During IA', metadata={'text': 'The rise in high-throughput technologies, such as next-generation gene sequencing and mass spectrometry (MS), facilitate the discovery of key modulators of disease.Specifically, MS-based approaches provide an essential analytical platform for the identification, quantification and characterization of candidate biomarkers.Biomarkers may come in the form of a molecular signature, a clinical feature or even as an imaging parameter.Molecular biomarkers may be further subtyped into the domains of genomics, transcriptomics, proteomics, metabolomics or peptidomics.Due to the importance of proteins in pathophysiological processes, there is increased interest in resolving the proteomic profile of biospecimens related to IA.Similarly, peptides play a seminal role in mediating physiological functions by serving as neurotransmitters, hormones, antibiotics and immune regulators [9].During IA, joint pain and inflammation are driven by aberrant proteolysis resulting in the production of inflammatory peptides and the destruction 
of joint cartilage and bone.Synovial fluid (SF), a proximal fluid which bathes the intrinsic joint structures, is an important reservoir of putative protein and peptide biomarkers whose abundance levels fluctuate in response to pathological changes due to disease [10].', 'para': '7', 'bboxes': \"[[{'page': '2', 'x': '64.69', 'y': '449.80', 'h': '225.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '461.80', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '473.85', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '485.84', 'h': '45.00', 'w': '11.68'}], [{'page': '2', 'x': '106.03', 'y': '485.84', 'h': '184.52', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '497.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '509.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '521.85', 'h': '36.67', 'w': '11.68'}], [{'page': '2', 'x': '96.18', 'y': '521.85', 'h': '194.37', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '533.85', 'h': '233.87', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '545.84', 'h': '44.89', 'w': '11.68'}], [{'page': '2', 'x': '105.44', 'y': '545.84', 'h': '185.11', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '557.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '569.85', 'h': '200.97', 'w': '11.68'}], [{'page': '2', 'x': '261.20', 'y': '569.85', 'h': '29.37', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '581.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '593.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '605.85', 'h': '191.41', 'w': '11.68'}], [{'page': '2', 'x': '251.27', 'y': '605.85', 'h': '39.28', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '617.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '629.84', 'h': '233.84', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '641.83', 'h': '177.40', 'w': '11.68'}], [{'page': '2', 'x': '240.69', 'y': '641.83', 'h': '49.86', 'w': '11.68'}, {'page': 
'2', 'x': '56.69', 'y': '653.83', 'h': '233.85', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '665.83', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '677.85', 'h': '233.88', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '689.84', 'h': '23.12', 'w': '11.68'}], [{'page': '2', 'x': '82.70', 'y': '689.84', 'h': '207.82', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '701.84', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '56.69', 'y': '713.85', 'h': '233.86', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '89.32', 'h': '233.88', 'w': '11.68'}, {'page': '2', 'x': '304.72', 'y': '101.32', 'h': '116.75', 'w': '11.68'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Introduction', 'section_number': 'None', 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry', 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'})]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import spacy\n",
    "# spacy.require_gpu(gpu_id=1)\n",
    "\n",
    "import spacy_transformers  # needed by SpacyTextSplitter when using the en_core_web_trf pipeline\n",
    "from langchain.text_splitter import SpacyTextSplitter\n",
    "from itertools import chain\n",
    "\n",
    "# Split on spaCy sentence boundaries, then pack sentences into chunks of at most ~1000 characters\n",
    "splitter = SpacyTextSplitter(chunk_size=1000, pipeline=\"en_core_web_trf\")\n",
    "chunks = splitter.split_documents(docs)\n",
    "chunks[:5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "db_paper_2 = FAISS.from_documents(chunks, model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "db.merge_from(db_paper_2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[Document(page_content='These serum proteins have strong potential to serve as diagnostic and prognostic biomarkers of RA and can also be evaluated to fill the gaps in the current knowledge of pathogenesis of RA.These\\n\\nfindings can be validated in larger cohorts from different populations to identify diagnostic and prognostic biomarkers of RA.', metadata={'text': 'RA is a complex disease that is influenced by an intricate interactome of various environmental, genetic and microbial factors that influence the immune homeostasis.Owing to the complex genetic architecture accompanied by a plethora of microbial and environmental triggers that an organism is exposed to this has made the identification of diagnostic and prognostic markers challenging.Our study has explored the serum proteomics of this complex autoimmune disorder in a relatively understudied Pakistani population to identify disease biomarkers that are DE among various serotypes of RA patients and healthy controls.We identified that PZP, SELENOP, C4BP beta chain, ApoM, NAMLAA, CPN catalytic chain, OIT3, CPN subunit 2, ApoC1 and ApoCIII were DE between the RA patients and healthy controls.These serum proteins have strong potential to serve as diagnostic and prognostic biomarkers of RA and can also be evaluated to fill the gaps in the current knowledge of pathogenesis of RA.These findings can be validated in larger cohorts from different populations to identify diagnostic and prognostic biomarkers of RA.', 'para': '5', 'bboxes': \"[[{'page': '15', 'x': '187.65', 'y': '173.66', 'h': '371.62', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '186.22', 'h': '394.62', 'w': '9.58'}], [{'page': '15', 'x': '166.39', 'y': '198.77', 'h': '392.88', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '211.32', 'h': '392.88', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '223.88', 'h': '229.10', 'w': '9.58'}], [{'page': '15', 'x': '401.31', 'y': '223.88', 'h': '157.97', 'w': '9.58'}, {'page': '15', 'x': 
'166.10', 'y': '236.43', 'h': '393.18', 'w': '9.58'}, {'page': '15', 'x': '166.10', 'y': '248.98', 'h': '393.57', 'w': '9.58'}, {'page': '15', 'x': '166.10', 'y': '261.54', 'h': '130.46', 'w': '9.58'}], [{'page': '15', 'x': '299.65', 'y': '261.54', 'h': '260.87', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '274.09', 'h': '392.88', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '286.64', 'h': '201.22', 'w': '9.58'}], [{'page': '15', 'x': '370.71', 'y': '286.64', 'h': '188.57', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '299.19', 'h': '392.88', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '311.75', 'h': '238.67', 'w': '9.58'}], [{'page': '15', 'x': '407.54', 'y': '311.75', 'h': '151.74', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '324.30', 'h': '392.88', 'w': '9.58'}, {'page': '15', 'x': '166.39', 'y': '336.85', 'h': '28.14', 'w': '9.58'}]]\", 'pages': \"('15', '15')\", 'section_title': 'Conclusions', 'section_number': '5.', 'paper_title': 'LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients', 'file_path': '/data/tommaso/llm4scilit/data/papers/1.pdf'}),\n",
       " Document(page_content='Rheumatoid factor (RF) and anti-citrullinated peptide antibodies (ACPA) are considered as the main serological markers for RA that have been included in the 2010 American College of Rheumatology (ACR)/European League against Rheumatism (EULAR) classification criteria for RA [7][8][9].Based on 2010 ACR/EULAR classification criteria for RA, clinically diagnosed RA patients can be categorized into four serotypes: (i) positive for both RF and ACPA, (ii) positive for RF and negative for ACPA, (iii) negative for RF and positive for ACPA and (iv) negative for both RF and ACPA.However\\n\\n, the levels of RF are also perturbed in connective tissue diseases [10] and some chronic infectious diseases such as hepatitis B and hepatitis C virus infections [11].RF\\n\\nis thus not a specific diagnostic marker for', metadata={'text': 'Rheumatoid factor (RF) and anti-citrullinated peptide antibodies (ACPA) are considered as the main serological markers for RA that have been included in the 2010 American College of Rheumatology (ACR)/European League against Rheumatism (EULAR) classification criteria for RA [7][8][9].Based on 2010 ACR/EULAR classification criteria for RA, clinically diagnosed RA patients can be categorized into four serotypes: (i) positive for both RF and ACPA, (ii) positive for RF and negative for ACPA, (iii) negative for RF and positive for ACPA and (iv) negative for both RF and ACPA.However, the levels of RF are also perturbed in connective tissue diseases [10] and some chronic infectious diseases such as hepatitis B and hepatitis C virus infections [11].RF is thus not a specific diagnostic marker for RA.ACPA is comparatively a more specific biomarker and two-thirds of the individuals ultimately diagnosed with RA were tested positive for ACPAs 6-10 years before diagnosis [12,13].A total of 1-3% of the healthy population may also test positive for ACPAs suggesting the decreased specificity of this biomarker 
[14][15][16][17].Therefore, it is important to discover the biomarkers for the diagnosis of RA with both increased sensitivity and specificity.', 'para': '6', 'bboxes': \"[[{'page': '2', 'x': '187.65', 'y': '223.58', 'h': '373.27', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '236.13', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '248.68', 'h': '394.53', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '261.24', 'h': '133.81', 'w': '9.58'}], [{'page': '2', 'x': '303.29', 'y': '261.24', 'h': '257.23', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '273.79', 'h': '393.08', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '286.34', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.10', 'y': '298.90', 'h': '272.66', 'w': '9.58'}], [{'page': '2', 'x': '441.85', 'y': '298.90', 'h': '117.43', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '311.45', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '324.00', 'h': '240.16', 'w': '9.58'}], [{'page': '2', 'x': '409.64', 'y': '324.00', 'h': '149.63', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '336.55', 'h': '67.99', 'w': '9.58'}], [{'page': '2', 'x': '236.99', 'y': '336.55', 'h': '322.28', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '349.11', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '361.66', 'h': '107.38', 'w': '9.58'}], [{'page': '2', 'x': '276.86', 'y': '361.66', 'h': '282.42', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '374.21', 'h': '325.69', 'w': '9.58'}], [{'page': '2', 'x': '495.20', 'y': '374.21', 'h': '64.08', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '386.77', 'h': '393.27', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '399.32', 'h': '65.18', 'w': '9.58'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Introduction', 'section_number': '1.', 'paper_title': 'LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients', 'file_path': 
'/data/tommaso/llm4scilit/data/papers/1.pdf'}),\n",
       " Document(page_content='is thus not a specific diagnostic marker for\\n\\nRA.ACPA is comparatively a more specific biomarker and two-thirds of the individuals ultimately diagnosed with RA were tested positive for ACPAs 6-10 years before diagnosis [12,13].A total of 1-3% of the healthy population may also test positive for ACPAs suggesting the decreased specificity of this biomarker [14][15][16][17].Therefore\\n\\n, it is important to discover the biomarkers for the diagnosis of RA with both increased sensitivity and specificity.', metadata={'text': 'Rheumatoid factor (RF) and anti-citrullinated peptide antibodies (ACPA) are considered as the main serological markers for RA that have been included in the 2010 American College of Rheumatology (ACR)/European League against Rheumatism (EULAR) classification criteria for RA [7][8][9].Based on 2010 ACR/EULAR classification criteria for RA, clinically diagnosed RA patients can be categorized into four serotypes: (i) positive for both RF and ACPA, (ii) positive for RF and negative for ACPA, (iii) negative for RF and positive for ACPA and (iv) negative for both RF and ACPA.However, the levels of RF are also perturbed in connective tissue diseases [10] and some chronic infectious diseases such as hepatitis B and hepatitis C virus infections [11].RF is thus not a specific diagnostic marker for RA.ACPA is comparatively a more specific biomarker and two-thirds of the individuals ultimately diagnosed with RA were tested positive for ACPAs 6-10 years before diagnosis [12,13].A total of 1-3% of the healthy population may also test positive for ACPAs suggesting the decreased specificity of this biomarker [14][15][16][17].Therefore, it is important to discover the biomarkers for the diagnosis of RA with both increased sensitivity and specificity.', 'para': '6', 'bboxes': \"[[{'page': '2', 'x': '187.65', 'y': '223.58', 'h': '373.27', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '236.13', 'h': '392.88', 'w': '9.58'}, 
{'page': '2', 'x': '166.39', 'y': '248.68', 'h': '394.53', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '261.24', 'h': '133.81', 'w': '9.58'}], [{'page': '2', 'x': '303.29', 'y': '261.24', 'h': '257.23', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '273.79', 'h': '393.08', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '286.34', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.10', 'y': '298.90', 'h': '272.66', 'w': '9.58'}], [{'page': '2', 'x': '441.85', 'y': '298.90', 'h': '117.43', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '311.45', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '324.00', 'h': '240.16', 'w': '9.58'}], [{'page': '2', 'x': '409.64', 'y': '324.00', 'h': '149.63', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '336.55', 'h': '67.99', 'w': '9.58'}], [{'page': '2', 'x': '236.99', 'y': '336.55', 'h': '322.28', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '349.11', 'h': '392.88', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '361.66', 'h': '107.38', 'w': '9.58'}], [{'page': '2', 'x': '276.86', 'y': '361.66', 'h': '282.42', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '374.21', 'h': '325.69', 'w': '9.58'}], [{'page': '2', 'x': '495.20', 'y': '374.21', 'h': '64.08', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '386.77', 'h': '393.27', 'w': '9.58'}, {'page': '2', 'x': '166.39', 'y': '399.32', 'h': '65.18', 'w': '9.58'}]]\", 'pages': \"('2', '2')\", 'section_title': 'Introduction', 'section_number': '1.', 'paper_title': 'LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients', 'file_path': '/data/tommaso/llm4scilit/data/papers/1.pdf'}),\n",
       " Document(page_content='For validation, serum samples were collected and processed from RA patients (n = 60) (mean age ± SD = 41.495 ± 12.8275) and healthy controls (n = 20) (mean age ± SD = 45.4 ± 11.31) from the same population.\\n\\nThe demographics and clinical characteristics of the experimental and validation cohort are shown in Table 1.', metadata={'text': 'For validation, serum samples were collected and processed from RA patients (n = 60) (mean age ± SD = 41.495 ± 12.8275) and healthy controls (n = 20) (mean age ± SD = 45.4 ± 11.31) from the same population.The demographics and clinical characteristics of the experimental and validation cohort are shown in Table 1.', 'para': '1', 'bboxes': \"[[{'page': '3', 'x': '187.65', 'y': '160.81', 'h': '372.02', 'w': '9.58'}, {'page': '3', 'x': '166.10', 'y': '173.05', 'h': '394.17', 'w': '9.90'}, {'page': '3', 'x': '166.07', 'y': '185.60', 'h': '256.73', 'w': '9.90'}], [{'page': '3', 'x': '425.92', 'y': '185.92', 'h': '133.36', 'w': '9.58'}, {'page': '3', 'x': '166.39', 'y': '198.47', 'h': '343.00', 'w': '9.58'}]]\", 'pages': \"('3', '3')\", 'section_title': 'Study Subjects and Serum Collection', 'section_number': '2.1.', 'paper_title': 'LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients', 'file_path': '/data/tommaso/llm4scilit/data/papers/1.pdf'})]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.as_retriever(search_kwargs={\"filter\": {\"paper_title\": \"LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients\"}}).get_relevant_documents(\"What are the main serological markers for RA?\")"
   ]
  },
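  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Attempt: pass the filter as `search_kwargs` at call time. The top hit inspected next still comes from the other paper,\n",
    "# so this keyword is evidently not applied as a metadata filter here.\n",
    "results = db.as_retriever().get_relevant_documents(\"What are the main serological markers for RA?\", search_kwargs={\"metadata\": {\"paper_title\": \"Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry\"}})"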
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'LC-MS/MS-Based Serum Protein Profiling for Identification of Candidate Biomarkers in Pakistani Rheumatoid Arthritis Patients'"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results[0].metadata[\"paper_title\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "134"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "db.index.ntotal"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'text': 'We determined that 144 proteins showed significant differential abundance between the IA and control SF proteomes, of which 11 protein candidates were selected for future follow-up studies.Similar analyses applied to our peptidomic data identified 15 peptide sequences, originating from 4 protein precursors, to have significant differential abundance in IA compared to the control SF peptidome.Pathway enrichment analysis of the IA SF peptidome along with AMP prediction suggests a possible mechanistic role of microbes in eliciting an immune response which drives the development of IA.',\n",
       " 'para': '2',\n",
       " 'bboxes': \"[[{'page': '1', 'x': '101.12', 'y': '422.98', 'h': '424.81', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '434.98', 'h': '340.13', 'w': '9.24'}], [{'page': '1', 'x': '405.45', 'y': '434.98', 'h': '120.66', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '446.98', 'h': '468.92', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '458.98', 'h': '225.40', 'w': '9.24'}], [{'page': '1', 'x': '290.71', 'y': '458.98', 'h': '234.48', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '470.98', 'h': '460.78', 'w': '9.24'}, {'page': '1', 'x': '63.12', 'y': '482.98', 'h': '91.59', 'w': '9.24'}]]\",\n",
       " 'pages': \"('1', '1')\",\n",
       " 'section_title': 'Results:',\n",
       " 'section_number': 'None',\n",
       " 'paper_title': 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry',\n",
       " 'file_path': '/data/tommaso/llm4scilit/data/papers/2.pdf'}"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chunks[0].metadata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry',\n",
       " 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry',\n",
       " 'Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry']"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[x.metadata[\"paper_title\"] for x in db.as_retriever(search_kwargs={\"filter\": {\"paper_title\": \"Elucidating the endogenous synovial fluid proteome and peptidome of inflammatory arthritis using label-free mass spectrometry\"}}).get_relevant_documents(\"What are the main serological markers for RA?\")]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['ask_paper', 'usu sus', 'asd']"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import shlex\n",
    "\n",
    "shlex.split('ask_paper \"usu sus\" asd')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llm4scilit",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.12"
  },
  "orig_nbformat": 4
 },
 "nbformat": 4,
 "nbformat_minor": 2
}