bibliometrics
-
- Posts: 5
- Joined: October 28 2005
- Affiliation: INAF/Oss. Astronomico Roma/ Italy
- Contact:
bibliometrics
Hi all
I hope this is the correct forum. I am experimenting with a web program that computes some bibliometric statistics (citations etc.) for a list of names (e.g. you can input the telephone directory of your department and get a ranking...). It uses the databases at NASA-ADS. If you want to experiment with it and give me some feedback, here it is:
http://odisseo.mporzio.astro.it/bibliomatrix
Luca Amendola
-
- Posts: 144
- Joined: September 24 2004
- Affiliation: University College London (UCL)
- Contact:
bibliometrics
Quite fun, thanks!
Some feedback: it currently appears to strip off all but the first letter of the first name, e.g. "Bridle, Sarah" -> "Bridle, S", and seems to miss some papers (is it scanning PRD? Or does it miss papers on which I am Sarah instead of S?).
Sarah
-
- Posts: 129
- Joined: September 24 2004
- Affiliation: University of Rome
- Contact:
bibliometrics
Impressive! I think you should write to the NASA-ADS people: they should implement it on their web site.
-
- Posts: 129
- Joined: September 24 2004
- Affiliation: University of Rome
- Contact:
bibliometrics
Hi,
Since we are entering a "citations madness mode", I think there is one interesting thing you could do (in my opinion) :-)
We know that first authorships are important. However, papers often list authors in alphabetical order, so people with names starting with A will statistically have more first authorships than people with names starting with Z.
If, with the power of bibliomatrix, you make a graph of first authorships as a function of the first letter of the name, you should see this correlation. If the correlation is strong, you might consider including a new prefactor that corrects the number of first authorships for the "alphabetical order effect".
Does it make any sense? :-)
Cheers
Alessandro
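A rough Python sketch of the check suggested above, assuming author lists are already available as plain lists of surnames in byline order. The function name and the sample surnames are made up for illustration and have nothing to do with how bibliomatrix works internally.

```python
from collections import defaultdict


def first_author_rate_by_initial(papers):
    """Map each surname initial to the fraction of its bylines that are first-author slots.

    `papers` is a list of author lists, each in the order printed on the paper.
    """
    firsts = defaultdict(int)       # first-author appearances per initial
    appearances = defaultdict(int)  # all appearances per initial
    for authors in papers:
        for position, surname in enumerate(authors):
            initial = surname[0].upper()
            appearances[initial] += 1
            if position == 0:
                firsts[initial] += 1
    return {k: firsts[k] / appearances[k] for k in sorted(appearances)}


# Hypothetical sample data, not real publication records.
papers = [
    ["Abel", "Moss", "Zeller"],
    ["Baker", "Zeller"],
    ["Zeller", "Abel"],
    ["Abel", "Baker"],
]
rates = first_author_rate_by_initial(papers)
for initial, rate in rates.items():
    print(initial, round(rate, 2))
```

A rate that falls steadily from A to Z over a large sample would be the correlation in question; on a handful of papers like this it means nothing.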
-
- Posts: 87
- Joined: February 24 2005
- Affiliation: Institute of Astronomy, Nicolaus Copernicus University
- Contact:
Re: bibliometrics
Alessandro Melchiorri wrote: We know that first authorships are important. However, papers often list authors in alphabetical order, so people with names starting with A will statistically have more first authorships than people with names starting with Z.
If, with the power of bibliomatrix, you make a graph of first authorships as a function of the first letter of the name, you should see this correlation. If the correlation is strong, you might consider including a new prefactor that corrects the number of first authorships for the "alphabetical order effect".
Does it make any sense? :-)
Cheers
Alessandro
Well, I don't think people with names like Amendola or Bridle are likely to be unbiased in deciding the relevance of this issue ;).
-
- Posts: 5
- Joined: October 28 2005
- Affiliation: INAF/Oss. Astronomico Roma/ Italy
- Contact:
bibliometrics
Thanks for the feedback.
To Sarah: my program should retrieve all of your papers listed at NASA-ADS (and not Spires or anything else) with author Bridle, S., regardless of whether they were originally submitted as Sarah, Sara or any other S. name (but click on the checkboxes at the bottom for a complete search). If you find a mismatch, please let me know exactly which paper is missing.
To Alessandro: I got several suggestions for alternative weighting schemes. In fact I am thinking of leaving the choice of weighting scheme to the user in the future (so we can have fun finding the weights that maximise our performance... MCMC lovers are invited).
Luca
-
- Posts: 8
- Joined: September 25 2004
- Affiliation: University of Oslo
- Contact:
bibliometrics
Hi Luca,
Nice! Thanks for setting up the website.
I just wonder why it is not possible to also include SPIRES references and citations in your scheme. NASA-ADS misses several citations when one searches for more theoretical physics articles, which is a pity if one works at the interface of particle physics and astrophysics... in my case it misses about 30% of the citations... :)
Isn't it possible to use both databases and be sure a reference/citation is only counted once in the search?
All the best,
David
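One way the "counted once" part could work, sketched in Python under the assumption that both databases expose a shared identifier for each citing paper (here an arXiv-style number). The identifiers below are invented, and neither `normalise_id` nor `merge_citations` is part of any ADS or SPIRES interface.

```python
def normalise_id(identifier):
    """Lower-case the identifier and drop an 'arXiv:' prefix so both sources agree."""
    ident = identifier.strip().lower()
    if ident.startswith("arxiv:"):
        ident = ident[len("arxiv:"):]
    return ident


def merge_citations(*sources):
    """Return the union of citing-paper identifiers, each counted once."""
    merged = set()
    for citing_ids in sources:
        merged.update(normalise_id(i) for i in citing_ids)
    return merged


# Invented identifiers standing in for the citing papers found by each database.
ads = ["astro-ph/0501001", "arXiv:hep-ph/0502002"]
spires = ["hep-ph/0502002", "hep-th/0503003", "ASTRO-PH/0501001"]
total = merge_citations(ads, spires)
print(len(total))
```

Citing papers that lack a common identifier in the two databases would need fuzzier matching (title, journal reference), which this sketch does not attempt.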
-
- Posts: 144
- Joined: September 24 2004
- Affiliation: University College London (UCL)
- Contact:
bibliometrics
Hi Luca,
Thanks for the tip about the check boxes. I had to check the physics one to get some PRD papers. But then it does find everything.
Sarah
-
- Posts: 87
- Joined: February 24 2005
- Affiliation: Institute of Astronomy, Nicolaus Copernicus University
- Contact:
bibliometrics - i found a bug (or feature = compliment :)
debugging
Going through the ADS by hand, ticking astro + phys + arXiv, I get 378 citations for all my articles (Roukema B.), but with bibliomatrix I get 421. This is either a bug or a compliment ;). There's not much of an ambiguity problem with my name. There's a retired(?) US senator (I think) with my surname who gets a good Google ranking, but she doesn't seem to publish astronomy articles (luckily).
Luca - I'm just wondering if you could trace the difference here. From neural memory, I think that when I checked bibliomatrix a few days ago, it gave me about 376, so if you changed the code in the last few days, maybe that might explain the difference of an extra 45 citations?
Wishlist
- The ADS has various options for disambiguation, e.g. to distinguish Bond, J. Richard from Bond, James The Name Is, or combining with institute names with AND, etc. - this would be a useful feature for people not wanting to have to correct this by hand.
- bibliomatrix requires ascii characters, but since the ADS seems to assume that ł and l are equivalent (ł is a Polish letter), this functions OK (e.g. a test on Krełowski seems to give the same result either by hand through ADS or through bibliomatrix).
-
- Posts: 5
- Joined: October 28 2005
- Affiliation: INAF/Oss. Astronomico Roma/ Italy
- Contact:
bibliometrics
The problem was in fact due to the way ADS dealt with the dates of preprints, assigning them both the date of the preprint and the date of the final publication. This, combined with Bibliomatrix's time-splitting procedure for long lists of names, caused double-counting of some papers in some rare cases. This problem has now been solved by the (incredibly efficient) staff at ADS and, as a consequence, Bibliomatrix is now working properly too. Unfortunately this means that there are no longer citations for free.
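To illustrate the mechanism, here is a toy Python reconstruction, not the actual Bibliomatrix or ADS code. It assumes a record carrying two dates matches a date-limited query if either date falls in the window, so splitting the search into time windows can return it twice. The record layout and `query_window` are hypothetical, and the deduplication step is only an alternative client-side guard, since the real fix was made on the ADS side.

```python
def query_window(records, start, end):
    """Stand-in for a date-limited search: match if any of the record's dates is in [start, end]."""
    return [r for r in records if any(start <= y <= end for y in r["years"])]


# Toy records: paper-A has a preprint year and a journal year, paper-B only one year.
records = [
    {"id": "paper-A", "years": [2004, 2005], "citations": 10},
    {"id": "paper-B", "years": [2005], "citations": 4},
]
windows = [(2000, 2004), (2005, 2009)]

# Summing window by window counts paper-A once per window it matches.
naive_total = sum(r["citations"] for w in windows for r in query_window(records, *w))

# Keying the results on a unique identifier counts every paper once.
unique = {}
for w in windows:
    for r in query_window(records, *w):
        unique[r["id"]] = r
deduped_total = sum(r["citations"] for r in unique.values())

print(naive_total, deduped_total)
```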
-
- Posts: 87
- Joined: February 24 2005
- Affiliation: Institute of Astronomy, Nicolaus Copernicus University
- Contact:
Re: bibliometrics
Thanks for debugging so quickly :). I checked and it looks OK now.
Luca Amendola wrote: Unfortunately this means that there are no longer citations for free.
Just to clarify this for people who haven't been following the last few comments: Luca means there are no longer "gratuitous" citations (extra citations due to a software bug ;).
-
- Posts: 183
- Joined: September 24 2004
- Affiliation: Brookhaven National Laboratory
- Contact:
bibliometrics
In the banana republic of Slovenia there is this concept of "clean citations". A "clean citation" is a citation for which the citing and cited articles have no common authors. The "exclude self-citations" option on ADS doesn't quite achieve this effect. Does anybody know how to get this number easily?
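If the author lists of the cited paper and of every citing paper can be exported, the definition above reduces to a set-disjointness test. A minimal Python sketch, assuming names are already normalised to identical "Surname, I." strings; the names are placeholders, and homonyms or spelling variants would need extra care.

```python
def is_clean(cited_authors, citing_authors):
    """A citation is 'clean' when the cited and citing papers share no author."""
    return set(cited_authors).isdisjoint(citing_authors)


def count_clean(cited_authors, citing_papers):
    """Count the citing papers that have no author in common with the cited paper."""
    return sum(is_clean(cited_authors, authors) for authors in citing_papers)


# Placeholder names, not real authors.
cited = ["Abel, A.", "Baker, B."]
citing = [
    ["Clark, C."],              # clean
    ["Baker, B.", "Dunn, D."],  # shares a coauthor of the cited paper: not clean
    ["Abel, A."],               # plain self-citation: not clean
    ["Evans, E.", "Ford, F."],  # clean
]
n_clean = count_clean(cited, citing)
print(n_clean)
```

The second citing paper is the interesting case: a filter that only drops citations by the one author being searched for would keep it whenever that author is Abel, which may be why such a filter does not quite give the clean count.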
Also, has anybody else noticed that those greedy idiots at SCI Web of Science not only have a terrible interface but also seem to be missing lots of citations, seemingly because they don't count papers that were cited as astro-ph papers, even if published later? (Maybe I am just using it in a wrong manner. Again, in the BROS, SCI is the only thing that counts.)
-
- Posts: 87
- Joined: February 24 2005
- Affiliation: Institute of Astronomy, Nicolaus Copernicus University
- Contact:
bibliometrics - non-ascii letters in names and normalising b
hi luca,
non-ascii letters in names
Although the ADS interactive interface correctly deals with names with non-ascii characters (if you enter them as ascii), your script seems to ignore most of the names which are in non-ascii characters.
You can test this with names like
Katarzyński
Goździewski
or in ascii:
Katarzynski
Gozdziewski
I think it's OK to keep the input in ascii, but it's a problem that the result ignores the non-ascii entries.
The simplest solution would probably be, after retrieving the data from the ADS, to replace all letters like ń, ż, ź, ł, ą, ę, ś, ć, ó and also non-Polish latin-2 and latin-1 letters by their ascii equivalents.
Here's a uuencoded sed script to convert latin-2 Polish letters to ascii:
Code:
begin 744 iso2_drop.sh
M(R$O8FEN+W-H"B,C(&QI8V5N8V4Z($=03"!O<B!P=6)L:6,@9&]M86EN(&%S
M('EO=2!L:6ME("!"+B!2;W5K96UA(#(P,#8*(R,@<V-R:7!T('1O(')E<&QA
M8V4@25-/+3@X-3DM,B`H:7-O+6QA=&EN+3(I(&-H87)A8W1E<G,@8GD@<&QA
M:6X@87-C:6D*(R,@92YG+B`@L2!B96-O;65S(&$*(R,@"B,C('5S86=E.@HC
M(R!I<V\R7V1R;W`N<V@@:6YP=71F:6QE(`H*96-H;R`G)24@5T%23DE.1SH@
M5&AI<R!I<R!A(%9%4ED@<VEM<&QE('-C<FEP="X@270@:7,@;F]T('5S97(@
M9G)I96YD;'DN)PIE8VAO("<E)2!U<V%G93HG"F5C:&\@)R4E(&ES;S)S;&%S
M:"YS:"!I;G!U=&9I;&4@)PH*<V5D("UE("=S?+%\87PG9R`D,2!\(%P*<V5D
M("UE("=S?.9\8WPG9R`@?"!<"G-E9"`M92`G<WSJ?&5\)V<@('P@7`IS960@
M+64@)W-\LWQL?"=G("!\(%P*<V5D("UE("=S?/%\;GPG9R`@?"!<"G-E9"`M
M92`G<WSS?&]\)V<@('P@7`IS960@+64@)W-\MGQS?"=G("!\(%P*<V5D("UE
M("=S?+]\>GPG9R`@?"!<"G-E9"`M92`G<WR\?'A\)V<@('P@7`IS960@+64@
M)W-\H7Q!?"=G("!\(%P*<V5D("UE("=S?,9\0WPG9R`@?"!<"G-E9"`M92`G
M<WS*?$5\)V<@('P@7`IS960@+64@)W-\HWQ,?"=G("!\(%P*<V5D("UE("=S
M?-%\3GPG9R`@?"!<"G-E9"`M92`G<WS3?$]\)V<@('P@7`IS960@+64@)W-\
MIGQ3?"=G("!\(%P*<V5D("UE("=S?*]\6GPG9R`@?"!<"G-E9"`M92`G<WRL
0?%A\)V<@(`H*97AI="`Q"@``
`
end
iconv can also help, e.g.
Code:
iconv -f utf8 -t latin2
except that iconv crashes whenever it finds a letter that it doesn't recognise. Maybe there's an "ignore-errors" option - it's GPL so can be used freely provided your own code is GPL-compatible.
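The same replacement can be sketched in Python with only the standard library, assuming the names arrive as Unicode strings. `to_ascii` and the `EXTRA` table are illustrative names, and the table only covers a few letters that have no canonical decomposition, so it would need extending for other alphabets.

```python
import unicodedata

# Letters such as ł are not "base letter + accent" in Unicode, so map them by hand.
EXTRA = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "ß": "ss"})


def to_ascii(text):
    """Replace accented Latin letters by their unaccented ascii equivalents."""
    # Decompose e.g. ń into n plus a combining accent, then drop the accents.
    decomposed = unicodedata.normalize("NFKD", text.translate(EXTRA))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything still outside ascii is dropped silently instead of raising an error.
    return stripped.encode("ascii", "ignore").decode("ascii")


for name in ["Katarzyński", "Goździewski", "Krełowski"]:
    print(name, "->", to_ascii(name))
```

Dropping unknown characters silently avoids a crash on unrecognised letters, at the price of possibly mangling a name without warning.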
h-statistic
Jorge E. Hirsch's statistic looks pretty cool - http://arxiv.org/abs/physics/0508025 - any chance of also calculating scientific age (years since first published paper) = n and then calculating
m = h/n ?
Essentially, m = 1.0 indicates a good scientist and m = 2.0 an excellent scientist.
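For reference, a short Python sketch of both quantities, taking n as the number of years since the first published paper. The citation counts and years are invented, and `m_quotient` is just a label chosen here for m = h/n.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def m_quotient(citations, first_year, current_year):
    """m = h / n, with n the scientific age in years (at least 1 to avoid dividing by zero)."""
    n = max(current_year - first_year, 1)
    return h_index(citations) / n


# Invented citation counts for one author whose first paper appeared in 2000.
cites = [25, 8, 5, 3, 3, 1, 0]
h = h_index(cites)
m = m_quotient(cites, first_year=2000, current_year=2006)
print(h, m)
```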
-
- Posts: 5
- Joined: October 28 2005
- Affiliation: INAF/Oss. Astronomico Roma/ Italy
- Contact:
bibliometrics
Boud, thanks for the feedback.
I will soon add the Polish characters to my script's list of non-ascii characters so that they are consistently replaced with the corresponding ascii characters both in input and in output (of course this is what the script was meant to do, but some characters escaped my attention).
As for your m=h/age, you can in fact compute it easily from the results of Bibliomatrix: for a single author the output includes h and the number of publications ordered by year. You can also compute m'=h/authors/age, which I am sure is a measure of something, although I don't know what.
I could do it also for a list of authors, maybe in the future...
-
- Posts: 87
- Joined: February 24 2005
- Affiliation: Institute of Astronomy, Nicolaus Copernicus University
- Contact:
Re: bibliometrics
hi Luca,
A whole lot of HESS papers came out last year with about 140 or so authors. Some people at my institute are a bit puzzled that their 1/140-th or so contribution to these papers does not get recognised by the present bibliomatrix script. Any chance that the server can cope with doubling the limit from 80 to 160 authors to search among when deciding co-authorship?
cheers
boud