Discussion:
GSOC 2015 Call for Ideas
Serge Stinckwich
2015-02-15 14:23:46 UTC
Dear pharoers,

This year the Pharo consortium (and community) is going to take part in
Google Summer of Code[1] as a standalone organization. This is
an opportunity to promote Pharo, get some work done and have students
paid.

We are currently at the most important stage: we are preparing the
organization application, hoping that we will be accepted and
granted a decent number of project slots. Everyone can help with the
application by submitting ideas for student projects.

The current list can be found at:
https://github.com/pharo-project/pharo-project-proposals/blob/master/Topics.st

It is in STON format, and the result is generated at: http://gsoc.pharo.org/

Please add your ideas following the format of the existing projects and
open a pull request with them (you will need a GitHub account).
Preferably submit ideas with possible mentors, but if none are
available at the moment ideas without mentors are also welcome.

The template for submitting projects is:

PharoTopic new
    title: 'The name of your project';
    contact: 'email address';
    supervisors: 'Supervisors names';
    keywords: 'keywords separated by spaces';
    context: 'a description of the context of the project';
    goal: 'description of the goal';
    level: 'Beginner or Intermediate or Advanced';
    yourself.

We will need a lot of project ideas before February 20th, 2015, the
deadline for applying to GSOC 2015.

Do not hesitate to ask questions. Administrators of this year’s
application are Serge Stinckwich <***@gmail.com> and
Yuriy Tymchuk <***@me.com>

If you don't know how to edit the list, please send your project
following the template to the administrators.

[1]: https://www.google-melange.com/gsoc/homepage/google/gsoc2015

Cheers,
--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/
Sebastian Sastre
2015-02-15 19:48:47 UTC
Great!

Forked repo, starred and PR sent:
https://github.com/pharo-project/pharo-project-proposals/pull/1

keep up the good work!
Ben Coman
2015-02-16 14:48:43 UTC
Looks like a great

On Mon, Feb 16, 2015 at 3:48 AM, Sebastian Sastre wrote:
Post by Sebastian Sastre
Great!
https://github.com/pharo-project/pharo-project-proposals/pull/1
keep up the good work!
I see this was merged 19 hours ago, but it doesn't show up on the page for me,
e.g. "7GUIs".


btw,
* Perhaps pharo-users would be a less daunting mailing list for students to begin
their enquiries on.
* Any chance of having working URLs, so it's easier to engage candidates with
links to further info?
Serge Stinckwich
2015-02-17 18:28:04 UTC
We have about 45 project ideas at the moment.
We really need more project ideas from more people (not only RMOD guys).

Even if you have a vague idea, you can contribute.

Thank you.

Regards,

--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/
Andrea Ferretti
2015-02-18 08:52:02 UTC
I am not an expert in Pharo in any conceivable way, but I have tried
to learn it over the past few weeks.

I think that an area where the interactive tools of Pharo would really
shine is data visualization, inspection and mining. Pharo
already has very good tools (such as Roassal2) for data
visualization, but I think it lacks support for machine learning and
scientific computing in general. The most useful thing I would like to
see come to Pharo (apart from performance improvements in the VM itself)
would be something like scikit-learn for Python. Also, these tasks
often involve consuming data from various sources, such as CSV and
JSON files. NeoCSV and NeoJSON are still a little too rigid for the
task: libraries like pandas let you just feed in a CSV file and try to
make heads or tails of the content without having to define much of
a schema beforehand.

The second thing I can think of is a modern, minimal web framework. I
know of Seaside and Seaside REST, but they feel too big and complex.
Most web projects today involve some kind of stateless API that
is consumed by single-page applications or mobile apps, and Seaside
does not really feel like a good solution for these cases. I would
like to see something much smaller, like Spray (Scala) or Sinatra
(Ruby).
j***@objektfabrik.de
2015-02-18 09:00:55 UTC
Andrea,

I'm not sure I would call Sinatra lightweight.

If you want to build a RESTful API, there is absolutely no need for
Seaside at all. You can just use the underlying HTTP server layer, which in
Pharo is Zinc. There is also a REST interface for Zinc, so
all you need is already there. It's just that there is not enough noise
around it ;-)
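A minimal sketch of that point: a stateless JSON endpoint answered directly by Zinc's ZnServer, with no Seaside involved. It assumes Zinc HTTP Components and NeoJSON are loaded in the image; port 8080 and the echoed fields are arbitrary choices for illustration, and it uses a plain request block rather than the Zinc REST layer mentioned above.

```smalltalk
"Sketch: a stateless JSON endpoint answered directly by Zinc, no Seaside.
 Assumes Zinc HTTP Components and NeoJSON are loaded; port 8080 is arbitrary."
| server |
server := ZnServer startDefaultOn: 8080.
server onRequestRespond: [ :request |
	| payload |
	"Echo back what was asked, as a plain Dictionary"
	payload := Dictionary new
		at: 'method' put: request method asString;
		at: 'uri' put: request uri printString;
		yourself.
	"Serialize with NeoJSON and tag the entity as application/json"
	ZnResponse ok: (ZnEntity
		with: (NeoJSONWriter toString: payload)
		type: ZnMimeType applicationJson) ].
"When done: server stop"
```

Each request is handled on its own by the block, with no session state, which is the style single-page and mobile clients expect; routing on `request uri` would be the natural next step.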

I cannot comment on your machine learning and CSV remarks. I don't feel
a need for these, or I can live quite well with what's available (e.g.
NeoCSV). But that may be because I mostly write boring
business software, where data files are well-defined for
interoperability anyway, or the format is my responsibility, which
makes things even easier ;-).

Joachim
--
-----------------------------------------------------------------------
Objektfabrik Joachim Tuchel mailto:***@objektfabrik.de
Fliederweg 1 http://www.objektfabrik.de
D-71640 Ludwigsburg http://joachimtuchel.wordpress.com
Telefon: +49 7141 56 10 86 0 Fax: +49 7141 56 10 86 1
Sven Van Caekenberghe
2015-02-18 09:12:48 UTC
Post by Andrea Ferretti
Also, these tasks
often involve consuming data from various sources, such as CSV and
Json files. NeoCSV and NeoJSON are still a little too rigid for the
task - libraries like pandas allow to just feed a csv file and try to
make head or tails of the content without having to define too much of
a schema beforehand
Both NeoCSV and NeoJSON can operate in two ways: (1) without defining any schemas, or (2) with schemas and mappings. The quick and dirty exploratory style is most certainly possible.

'my-data.csv' asFileReference readStreamDo: [ :in | (NeoCSVReader on: in) upToEnd ].

=> an array of arrays

'my-data.json' asFileReference readStreamDo: [ :in | (NeoJSONReader on: in) next ].

=> objects structured using dictionaries and arrays
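For contrast with the schema-free calls above, mode (2) might look like the following sketch. Inline strings stand in for files so the snippet is self-contained; the column layout and the use of Point as a target class are illustrative choices, not part of the original message.

```smalltalk
"Mode (2), CSV: declare one converter per column instead of getting raw Strings.
 withCRs turns each backslash into a carriage return, giving three lines."
(NeoCSVReader on: 'x,y\1,2\3,4' withCRs readStream)
	skipHeader;          "discard the x,y header line"
	addIntegerField;     "column 1 converted to an Integer"
	addIntegerField;     "column 2 converted to an Integer"
	upToEnd.
"=> an array of arrays of Integers rather than Strings"

"Mode (2), JSON: map the instance variables of a class, then read a typed object."
(NeoJSONReader on: '{"x":1,"y":2}' readStream)
	mapInstVarsFor: Point;
	nextAs: Point.
"=> a Point"
```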

Sven
Andrea Ferretti
2015-02-18 09:26:26 UTC
Thank you Sven. I think this should be emphasized and prominent on the
home page*. Still, libraries such as pandas are even more lenient,
doing things such as:

- autodetecting which fields are numeric in CSV files
- filling in missing data based on statistics (for instance, you
can say: where the field `age` is missing, use the average age)

There is probably room for something built on top of Neo.
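A sketch of what "something built on top of Neo" could mean for the two heuristics above: read raw rows with NeoCSV, call a column numeric when every non-blank cell parses as a number, then fill blanks with the column mean. The heuristics, sample data and variable names are assumptions for illustration, not an existing NeoCSV feature; note that `Number readFrom:onError:` may accept a number followed by trailing garbage.

```smalltalk
"Sketch of pandas-like heuristics layered on NeoCSV output (not a NeoCSV feature)."
| rows header data isBlank parse numericColumns |
rows := (NeoCSVReader on: 'name,age\ann,30\bob,\cy,40' withCRs readStream) upToEnd.
header := rows first.
data := rows allButFirst.
"NeoCSV may answer nil or an empty String for an empty field; treat both as missing"
isBlank := [ :cell | cell isNil or: [ cell isEmpty ] ].
"Try to read a number; answer nil when that fails"
parse := [ :cell | Number readFrom: cell onError: [ nil ] ].
"A column is numeric when every non-blank cell parses"
numericColumns := (1 to: header size) select: [ :col |
	data allSatisfy: [ :row |
		(isBlank value: (row at: col)) or: [ (parse value: (row at: col)) notNil ] ] ].
"Convert numeric columns in place, filling blanks with the column mean"
numericColumns do: [ :col | | values mean |
	values := (data reject: [ :row | isBlank value: (row at: col) ])
		collect: [ :row | parse value: (row at: col) ].
	mean := values isEmpty ifTrue: [ 0 ] ifFalse: [ values sum / values size ].
	data do: [ :row |
		row at: col put: ((isBlank value: (row at: col))
			ifTrue: [ mean ]
			ifFalse: [ parse value: (row at: col) ]) ] ].
data
```

Keeping this as a separate pass leaves the core reader fast, while the guessing logic can grow independently.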


* By the way, I suggest that the documentation on Neo could benefit
from a reorganization. Right now, the first topic in the NeoJSON
paper introduces JSON itself. I would argue that everyone who tries
to use the library already knows what JSON is. Yet there is no
example of how to read JSON from a file anywhere in the document.
Andrea Ferretti
2015-02-18 09:35:15 UTC
For an example of what I am talking about, see

http://pandas.pydata.org/pandas-docs/version/0.15.2/io.html#csv-text-files

I agree that this is definitely too many options, but it gets the job
done for quick and dirty exploration.

The fact is that working with a dump of a table from your DB, whose
content you know, requires different tools than exploring the latest
open data that your local municipality has put online, in yet
another messy format.

Enterprise programmers deal more often with the former, data
scientists with the latter, and I think there is room for both kinds of
tools.
Sven Van Caekenberghe
2015-02-18 09:39:18 UTC
Well, you are certainly free to contribute.

Heuristic interpretation of data could be useful, but it looks like an addition on top; the core library should be fast and efficient.
Andrea Ferretti
2015-02-18 09:53:28 UTC
I am sorry if the previous messages came off as too harsh. The Neo
tools are perfectly fine for their intended use.

What I was trying to say is that a good idea for a GSoC project would
be to develop a framework for data analysis that would be useful for
data scientists, and in particular this would include something to
import unstructured data more freely.
Sven Van Caekenberghe
2015-02-18 10:01:12 UTC
OK, try making a proposal then: http://gsoc.pharo.org has the instructions and the current list, and you probably know more about data science than I do.
Andrea Ferretti
2015-02-18 10:06:52 UTC
I am sorry, I must have misunderstood the purpose of this thread. I
read "Even if you have a vague idea, you can contribute." and tried to
give a couple of vague ideas.

I did not really mean that I would be able to mentor such a project, or have the time to.
stepharo
2015-02-23 08:46:00 UTC
Post by Andrea Ferretti
I am sorry, I must have misunderstood the purpose of this thread. I
read "Even if you have a vague idea, you can contribute." and tried to
give a couple of vague ideas.
No, this is good :) We should get influences and ideas from other people.

Now, since you know certain domains better than us, you could consider
starting a little project to improve the situation (you know, one brick at a
time), and in doing so you will become an accomplished and zen pharoer :)
Maybe with something as simple as one hour per week you can really
improve the situation.

Stef
Serge Stinckwich
2015-02-18 10:14:53 UTC
Post by Sven Van Caekenberghe
OK, try making a proposal then, http://gsoc.pharo.org has the instructions and the current list, you probably know more about data science than I do.
Post by Andrea Ferretti
I am sorry if the previous messages came off as too harsh. The Neo
tools are perfectly fine for their intended use.
What I was trying to say is that a good idea for a SoC project would
be to develop a framework for data analysis that would be useful for
data scientists, and in particular this would include something to
import unstructured data more freely.
Sorry Andrea, I didn't see your message because I'm not on the pharo-users
mailing list, only on pharo-dev.
I'm also really interested in having a GSoC project to develop a data
analysis framework.
Let's talk together to discuss a proposal.

Regards,
--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/
Andrea Ferretti
2015-02-19 08:36:26 UTC
Hi Serge,

as I said, I do not really have the time now to get involved in a GSoC
proposal, but I can give you my perspective. There are two sides to
the story.

The first one is complementary to SciSmalltalk: in order to analyze
data, you need to get data in first. So, one may want to read - say -
a CSV, and have a number of heuristics, such as:

- autodetection of encoding
- autodetection of quotes and delimiter
- autodetection of columns containing numbers or dates
- the possibility to indicate that some markers, such as "N/A",
represent missing values
- the possibility to indicate a replacement for missing values, such
as 0, or "", or the average or the minimum of the other values in the
column
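A minimal sketch of the last two heuristics (missing-value markers and mean replacement) using only plain Pharo collections, applied to rows shaped like the array of arrays NeoCSVReader returns without converters. This is an illustration, not an existing library feature; the rows, the marker list, and the column index are hypothetical.

```smalltalk
| rows missingMarkers ageIndex knownAges averageAge cleaned |
"Hypothetical rows: every field is still a String, as read without converters"
rows := #(#('Alice' '30') #('Bob' 'N/A') #('Carol' '20')).
missingMarkers := #('N/A' '').   "strings to treat as missing values"
ageIndex := 2.
"Parse only the values that are actually present"
knownAges := ((rows collect: [ :row | row at: ageIndex ])
	reject: [ :each | missingMarkers includes: each ])
		collect: [ :each | each asNumber ].
averageAge := knownAges average.   "replacement value: the column mean"
"Copy each row (literal arrays are read-only), substituting the mean for markers"
cleaned := rows collect: [ :row |
	| value |
	value := row at: ageIndex.
	row copy
		at: ageIndex put: ((missingMarkers includes: value)
			ifTrue: [ averageAge ]
			ifFalse: [ value asNumber ]);
		yourself ].
```

Swapping `average` for `min`, or for a constant such as 0, gives the other replacement policies listed above.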

See http://pandas.pydata.org/pandas-docs/version/0.15.2/io.html#csv-text-files
for some examples.

It may be worth making this a sequence that is read
and processed lazily, to deal with CSV files bigger than memory.
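A minimal sketch of lazy processing with NeoCSVReader's stream protocol (`atEnd`/`next`), assuming the NeoCSV package is loaded. Records are pulled one at a time and folded into running totals, so only the current row is held in memory. The data and column layout are hypothetical, and a string stream stands in for a large file.

```smalltalk
| csv reader count total |
"Hypothetical data; for a real file, do the same inside
 'big.csv' asFileReference readStreamDo: [ :in | ... ]"
csv := 'name,age', String lf, 'Alice,30', String lf, 'Bob,20'.
reader := NeoCSVReader on: csv readStream.
reader skipHeader; addField; addIntegerField.
count := 0.
total := 0.
"Pull one record at a time instead of materializing all rows with upToEnd"
[ reader atEnd ] whileFalse: [
	| row |
	row := reader next.
	count := count + 1.
	total := total + row second ].
```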

When data is finally in, usually the first task is doing some
processing, inspection or visualization. The Smalltalk collections are
good for processing (although some lazy variants might help), and
Roassal and the inspectors are perfect for visualization and browsing.

The second part comes when one wants to run some algorithms.
While there is no need to have the fanciest ones, there should be some
of the basics, such as:

- some form of regression (linear, logistic...)
- some form of clustering (k-means, DBSCAN, canopy...)
- SVM
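As an illustration of the simplest item on this list, here is a sketch of one-variable linear regression by ordinary least squares in plain Pharo. It is the textbook closed form, not code taken from SciSmalltalk, and the sample data are hypothetical (chosen to lie exactly on y = 2x + 1).

```smalltalk
| xs ys meanX meanY slope intercept |
"Hypothetical sample data lying exactly on y = 2x + 1"
xs := #(1 2 3 4).
ys := #(3 5 7 9).
meanX := xs average.
meanY := ys average.
"Ordinary least squares:
 slope = sum((x - meanX) * (y - meanY)) / sum((x - meanX) squared)"
slope := (xs with: ys collect: [ :x :y | (x - meanX) * (y - meanY) ]) sum
	/ (xs collect: [ :x | (x - meanX) squared ]) sum.
intercept := meanY - (slope * meanX).
```

Because Pharo keeps exact Fractions, the result here is exactly 2 and 1, with no floating-point rounding.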

Another thing which would be useful is support for linear algebra,
leveraging native libraries such as BLAS or LAPACK.

In short: just copying R, or numpy + pandas + scikit-learn would
already be a giant leap forward.

Actually, some of the things I have mentioned above are already (I
think) in SciSmalltalk, which brings me to the next point:
documentation. There is really no point in having all these tools if
people do not know they are there.

For this to become useful, there should be a dedicated site,
highlighting what is already available, in what state (experimental,
partial, stable...) and how to use it.

Ideally, I would also include some tutorials, for instance for dealing
with standard problems such as Kaggle competitions. Here I think
Smalltalk would have an edge, since these tutorials could be in the
form of Prof Stef. Still, it would be nice if some form of the
tutorials were also on the web, to make them discoverable.

Best,
Andrea
stepharo
2015-02-21 12:44:36 UTC
Indeed, these would be nice to have, but they will not magically happen :)
There is a 400-page book on SciTalk.

Stef
Werner Kassens
2015-02-22 17:34:31 UTC
Post by stepharo
Indeed these are nice to have, now they will not magically happen :)
There is a 400 pages book on SciTalk.
Post by Andrea Ferretti
Actually, some of the things I have mentioned above are already (I
think) in SciSmalltalk, which brings me to the next point:
documentation. There is really no point in having all these tools if
people do not know they are there.
For this to become useful, there should be a dedicated site,
highlighting what is already available, in what state (experimental,
partial, stable...) and how to use it.
Hi Stéphane, hi Andrea,
some documentation for SciTalk can be found here:
https://github.com/SergeStinckwich/SciSmalltalk/wiki/SciSmalltalk-Contents
werner
stepharo
2015-02-23 07:58:25 UTC
I will add these ideas to the list of Pharo topics (if Serge has not done
it yet), but only after the GSoC results are announced,
to avoid breaking the web site.

Stef
stepharo
2015-02-23 08:41:18 UTC
Post by Andrea Ferretti
I am sorry if the previous messages came off as too harsh. The Neo
tools are perfectly fine for their intended use.
What I was trying to say is that a good idea for a SoC project would
be to develop a framework for data analysis that would be useful for
data scientists, and in particular this would include something to
import unstructured data more freely.
Yes, we need that for the Agile Visualization book and distributions.

Stef
Sven Van Caekenberghe
2015-02-18 09:35:46 UTC
Post by Andrea Ferretti
Thank you Sven. I think this should be emphasized and prominent on the
home page*. Still, libraries such as pandas are even more lenient,
- autodetecting which fields are numeric in CSV files
- allowing to fill missing data based on statistics (for instance, you
can say: where the field `age` is missing, use the average age)
Probably there is room for something built on top of Neo
* by the way, I suggest that the documentation on Neo could benefit
from a reorganization. Right now, the first topic on the NeoJSON
paper introduces JSON itself. I would argue that everyone that tries
to use the library knows what JSON is already. Still, there is no
example of how to read JSON from a file in the whole document.
These libraries (NeoCSV, NeoJSON, STON) were all written to depend only on a limited character stream API. It was a design decision not to depend on a File API, because at the time we were transitioning from the old FileStreams to FileSystem.

And I disagree about the JSON introduction ;-) You might know it, but that is not the case for everyone. Like not everyone knows CSV, HTTP, ...

But I do agree that sometimes I too would like a convenience method here or there ;-)
Sven
Serge Stinckwich
2015-02-27 10:47:03 UTC
Dear all,

Last week, Uko and I submitted the Pharo proposal for GSoC 2015, and we
are now waiting for the answer from Google.
Accepted organisations will be announced on March 2, 2015.

We now have more than 50 project ideas, but we are still looking for
more ideas from the community!

Don't be shy, propose your project idea here:
https://github.com/pharo-project/pharo-project-proposals/blob/master/Topics.st

regards,

--
Serge Stinckwich
UCBN & UMI UMMISCO 209 (IRD/UPMC)
Every DSL ends up being Smalltalk
http://www.doesnotunderstand.org/