Wikipedia:Reference desk/Archives/Computing/2019 August 28



August 28

Help with extracting info from json api

Using xidel, how do I extract only the chapter numbers from this JSON API? I tried the following:

xidel https://mangadex.org/api/manga/32846 -e '$json/chapter'

but I get a lot of other sub-information that I don't want. I only want the chapter numbers; for example, the output should look like this:

504195
522807
576777
614751
622223
626558
644663
644664
668861
670108
697858

I'm a novice with JSON, so any help is greatly appreciated. Thanks. 77.103.40.144 (talk) 14:52, 28 August 2019 (UTC)

I see two potential problems with the JSON file: the chapter numbers don't have a label, like "chapter_number", and "chapter" is used to define two different things at two different levels. As a result, you retrieve not only the chapter number but also all the other info available on that chapter (and possibly whatever sits under the other label called "chapter"; I can't tell for sure because you didn't include your actual output). Ideally the JSON file format would be fixed to address these issues. ALL items should have UNIQUE labels. Avoid using the label "chapter" for two different things at two different levels by renaming the second one to something like "chapter_heading" (but be aware that changing this label could break other queries that have the bad label hard-coded). Then your -e clause would look like "-e '$json/chapter/chapter_number'". However, if you can't do this, some options come to mind:
  1. Only retrieve the first string found under each chapter. This addresses the first issue only.
  2. Only retrieve the first 6 digits found under each chapter (if we can assume the length is always 6). This addresses the first issue only.
  3. Only retrieve the first level of strings under each chapter, since the additional info is at sublevels under the chapter number. In other words, don't do a recursive search. This addresses the first issue only.
  4. Explicitly exclude the other items listed under the chapter: "volume", "chapter", "title", "lang_code", "group_id", "group_name", "group_id_2", "group_name_2", "group_id_3", "group_name_3", and "timestamp". This may require giving the full path, to avoid confusion over the two different "chapter" labels. But the missing "chapter_number" label will again be a problem: you can't exclude "$json/chapter/chapter_number/chapter" until you make the partial file fix, or "$json/chapter/chapter_number/chapter_heading" until you make the full file fix. If xidel allows wildcards, then excluding something like "$json/chapter/*/chapter" (along with the other labels) may work without fixing the JSON file format.
I don't know the syntax for each approach, but hopefully you can look that up. If xidel can't do what you want, you might try redirecting or piping the output to a command-line utility that can. There you could remove everything from the output that isn't a six-digit number, including the timestamp, which is a number with more digits. (There are other six-digit numbers in the original JSON file, but hopefully your xidel command at least filters those out.) If you include your current output and the OS you are using, others should be able to tell you how to filter it down to what you need using a redirect or pipe. SinisterLefty (talk) 15:11, 28 August 2019 (UTC)
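
The filtering step described above could be sketched as follows. This is a minimal Python 3 illustration, not a tested pipeline: the function name six_digit_numbers and the sample text are hypothetical, and the pattern would also pick up any other standalone six-digit value (a group ID, for instance) that reaches it.

```python
import re


def six_digit_numbers(text):
    """Return every standalone run of exactly six digits, in order of appearance."""
    # The lookarounds reject digits that are part of a longer number,
    # so a ten-digit timestamp does not produce a match.
    return re.findall(r"(?<!\d)\d{6}(?!\d)", text)


# Hypothetical sample resembling unfiltered output; the values are made up.
sample = '''"504195": {"volume": "1", "chapter": "1", "timestamp": 1545004800}
"522807": {"volume": "1", "chapter": "2", "timestamp": 1547683200}'''

print("\n".join(six_digit_numbers(sample)))
```

To use this at the end of a pipe, the sample string could be swapped for sys.stdin.read().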

It looks to me like the chapter numbers are dictionary keys rather than values. That is a little bit weird, but OK. Going by the xidel/JSONiq docs, you would use something like keys($json.chapter), but that's probably not quite right, and I'd have to install those programs and experiment for a while to test it. Here is a Python 2 script that gets the chapter numbers (as strings):

import json, urllib

# Send a custom User-Agent ("App/1.7" is the example from the urllib docs)
class AppURLopener(urllib.FancyURLopener):
    version = "App/1.7"
urllib._urlopener = AppURLopener()

url = 'https://mangadex.org/api/manga/32846'
jstring = urllib.urlopen(url).read()
j = json.loads(jstring)
# The chapter numbers are the keys of the "chapter" dictionary
print j['chapter'].keys()

When I run the above, it prints

[u'668861', u'644663', u'644664', u'576777', u'670108', u'522807', u'504195', u'614751', u'697858', u'626558', u'622223']

173.228.123.207 (talk) 06:57, 29 August 2019 (UTC)
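
For readers on Python 3, the same idea (reading the keys of the "chapter" object) might look like the sketch below. It runs on a small hypothetical stand-in for the API response rather than fetching the URL; the inner field names are taken from the discussion above, while the "manga" entry, all values, and the function name chapter_ids are assumptions made for illustration. Sorting by int is added only so the result comes out in the order the question asked for.

```python
import json

# Hypothetical, heavily trimmed stand-in for the API response.
sample = '''
{
  "manga": {"title": "Example"},
  "chapter": {
    "522807": {"volume": "1", "chapter": "2", "lang_code": "gb"},
    "504195": {"volume": "1", "chapter": "1", "lang_code": "gb"},
    "614751": {"volume": "", "chapter": "4", "lang_code": "gb"}
  }
}
'''


def chapter_ids(json_text):
    """Return the keys of the top-level "chapter" object, sorted numerically."""
    data = json.loads(json_text)
    # Iterating over a dict yields its keys; they are strings, so sort by int value.
    return sorted(data["chapter"], key=int)


for chapter_id in chapter_ids(sample):
    print(chapter_id)
```

The keys stay strings, as in the Python 2 output above; only the sort compares them as integers.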